Cloud Infrastructure as Code (IaC): Automating IP Address Management

Context

In the modern era, cloud infrastructure is often deployed using declarative code such as Terraform or Bicep, or less declaratively using cloud development SDKs such as Pulumi (TM).

This includes network topologies built from cloud IaaS components such as VNETs (with private IP address space), Virtual Network Gateways, cloud-to-datacenter links (e.g. ExpressRoute), Virtual Network peering links, service endpoints and public-facing IP addresses used to communicate with services on the internet.

For large organisations (including enterprises), the management of private address space is complex enough on premises, let alone in the cloud.

Usually a dedicated networking team is responsible for the assignment, provisioning, reclamation and renumbering of IP address space for cloud teams.

This work is typically carried out using manual or semi-manual tools, ranging from Excel spreadsheets to an enterprise product such as Infoblox (TM) IPAM.

The management of DNS records is often tied to this process when a centralised tool such as Infoblox is in place.

Cloud Engineers then consume the networking information managed with these tools to configure IaC for the cloud using a separate workflow.

Problem Statement

The management of IP address space in the cloud is an ongoing, often manual, iterative task involving mistakes and a degree of trial and error in getting address space correctly subnetted, assigned and configured.

When this process feeds the configuration of infrastructure code for the cloud (e.g. Terraform IaC), it can result in code errors and deployment failures as the complexity of reflecting the right IP address assignments in the code increases.

All this would be manageable if the infrastructure code (often written in Terraform or Bicep), the automation (CI/CD pipelines) and the infrastructure "state" were robust and recovered easily from configuration errors.

However, this is emphatically not the case in general for the cloud. The following problems are common:

  • Networking configuration errors lead to IaC failures at deployment time in the cloud, even after tests pass before the actual deployment, simply because of hidden business rules of the cloud platform vendor.
  • Recovery from those errors costs hours or days of DevOps team effort.
  • In a complex organisation with an extensive cloud hub-and-spoke network topology, these errors can add up to days of service downtime in a given year and a great deal of frustration for DevOps engineers.
  • To sum up: the more complex an organisation's networking requirements in the cloud, the greater the need for a reliable system for managing IP address pools in code.

What is the ask?

As a Network Manager responsible for address space across multiple clouds, I want to be able to provision pools of address space flexibly to VNETs, subnets, interfaces, physical links and gateways in the cloud, and to track utilisation by team and project as well as ownership of address space.

I want primarily to:

  • Provision and reclaim address space as applications are deployed to and removed from the cloud,
  • Track the history of ownership,
  • Make IP address space available to Infrastructure Code via automated API queries.

In addition to these core requirements (or "wants"):

  • I want smooth integration of IP address provisioning with CI/CD pipelines and network automation tools, without having to do subnet calculations by hand every time I assign a range from the reserved pool of available IP addresses (see the sketch after this list).
  • I would prefer that Cloud Application or Engineering teams manage the detailed subnet allocations in the cloud for their specific applications directly, rather than relying on the Network Team.
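
To make "no manual subnet calculations" concrete, here is a minimal Python sketch using the standard-library ipaddress module to carve a reserved pool into fixed-size subnets. The pool and prefix length are arbitrary example values, not part of the requirements above.

```python
import ipaddress

# Example reserved pool handed to a cloud team (illustrative value only).
pool = ipaddress.ip_network("10.42.0.0/16")

# Carve the pool into /24 subnets instead of working out boundaries by hand.
subnets = list(pool.subnets(new_prefix=24))

print(f"{pool} yields {len(subnets)} /24 subnets")
print("first three:", [str(s) for s in subnets[:3]])
# -> '10.42.0.0/24', '10.42.1.0/24', '10.42.2.0/24'
```

Logic like this, wrapped behind an API, is what lets a pipeline request "the next /24 from my pool" without anyone opening a subnet calculator.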

What I want to avoid:

  • I do not want to maintain the specifics of cloud address assignment in documentation based on MS Excel or Notepad. While such documents should be available on demand, they should not be the core means of managing IP address space in the organisation.
  • I do not want to buy a million-dollar Infoblox solution to manage my entire organisational address space - I have enough infrastructure to maintain as it is.

What Problem Does This Solve?

Reduction of Administrative Errors:

Managing IP addresses in a consistent, reliable and error-free way prevents the automation failures caused by address space overlap when provisioning IP addresses to cloud infrastructure. It also reduces the cascading infrastructure code bugs and failures which result in application downtime when mistakes with IP address assignment inevitably happen.
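
As a simplified illustration of the kind of check that prevents such failures, the sketch below rejects a proposed allocation that overlaps anything already recorded. The function name and the example ranges are assumptions for illustration, not part of any existing tool.

```python
import ipaddress

def validate_no_overlap(proposed: str, existing: list[str]) -> None:
    """Raise if the proposed CIDR overlaps any already-allocated range."""
    new_net = ipaddress.ip_network(proposed)
    for cidr in existing:
        if new_net.overlaps(ipaddress.ip_network(cidr)):
            raise ValueError(f"{proposed} overlaps existing allocation {cidr}")

# Illustrative data: two ranges already assigned to other teams.
allocated = ["10.42.0.0/24", "10.42.1.0/24"]

validate_no_overlap("10.42.2.0/24", allocated)        # passes silently
try:
    validate_no_overlap("10.42.1.128/25", allocated)  # overlaps 10.42.1.0/24
except ValueError as err:
    print("rejected:", err)
```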

Improvement of configuration consistency:

The IPAM solution would also carry out tasks such as ensuring IP address ranges remain contiguous as they are allocated and de-allocated within a VNET/VPC.
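
One way such a rule could be implemented, again sketched with the standard ipaddress module: always hand out the lowest-numbered free subnet of the requested size within the VNET range, which keeps allocations packed contiguously from the bottom of the address space. The ranges and function name are illustrative assumptions.

```python
import ipaddress

def next_free_subnet(vnet_cidr: str, allocated: list[str], new_prefix: int):
    """Return the lowest-numbered free subnet of the requested size in the VNET."""
    vnet = ipaddress.ip_network(vnet_cidr)
    taken = [ipaddress.ip_network(c) for c in allocated]
    for candidate in vnet.subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(t) for t in taken):
            return candidate
    raise RuntimeError(f"no free /{new_prefix} left in {vnet_cidr}")

# Two subnets are already in use, so the next contiguous /24 is returned.
print(next_free_subnet("10.42.0.0/16", ["10.42.0.0/24", "10.42.1.0/24"], 24))
# -> 10.42.2.0/24
```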

Smoother, Faster Cloud Deployment Workflow:

Automating the provisioning and assignment of address space, as well as the consistent management of addresses, speeds up the cloud deployment workflow and reduces the IT workload on teams.

What Value Does This Add?

Increased reliability in IP address management leads to more resilient cloud automation and infrastructure overall, which in turn saves the cost and time spent maintaining infrastructure. Unifying the IP address management workflow and the cloud configuration workflow speeds up operational processes and delivery dramatically (if done right!).

This reliability and operational efficiency can be achieved with effective automation (note the emphasis!).

The advantages of automation (including self-service) seem compelling in terms of opportunities for operational agility:

  • Networking and infrastructure teams could pre-provision broad ranges of address space for the cloud in the IP address database, after which cloud engineers could allocate or de-allocate ranges programmatically to teams, functions, applications or other organisations.
  • Application teams deploying infrastructure to our cloud could access IP address provisioning information to configure their infrastructure code in the deployment pipeline, at run time or beforehand. They could obtain the correct IP address ranges for any given application programmatically, without needing the help of cloud infrastructure engineers. All of this could be part of a single unified automation workflow instead of separate processes requiring manual coordination between teams.

Build Tasks: A quick brainstorm

How would we build this "Cloud Native IPAM Solution" ?

Because we're "agile" we jump straight into thinking about what we should build. After all, who in this era has time for "architecture" right???

The following key components spring to mind at first glance:

  • IPAM IP calculation engine (IP address calculator)
  • IPAM query and update interface (API, CLI and/or UI)
  • IP address space tracking database of some sort
  • Provisioning/deprovisioning workflow along the lines of a CI/CD pipeline
  • Integration with IaC through a templating engine

At a very high level, our "Cloud Native IPAM Solution" fits into the enterprise as pictured below:

Integration of the IPAM service into existing cloud context

A little Architecture Wouldn't Hurt, Would it?

Well, maybe we shouldn't dismiss "Architecture" so hastily; after all, this thing is beginning to look a little complicated!

Let's draw some pictures!

At a slightly more detailed systems level, we envision the components of the IPAM solution to have the following interrelationships:

IP Address Management for Infrastructure-as-Code Implementations

The solution can be summarised by enumerating its components:

  1. Data Model: Provides an abstract scheme for collecting all the properties of IP addresses and address spaces and their interrelationships as one consistent information structure.
  2. Business Logic Engine: Program logic which applies the correct rules for operating on the Data Model for IP addressing. This could be code which provides operations for manipulating the information in the data model according to allowed rules.
  3. Relational Schema: The SQL relational schema implementing the data model in a format which can be represented in a database.
  4. Relational DB: An actual physical database which hosts the relational schema of the data model and provides features for managing the information and the model programmatically.
  5. Database Storage: The physical storage medium which will be used by the relational database software to store the actual IP addressing data.
  6. API: An interface which allows programmatic access to the features of the data model and the information within it. This would include commands to allocate IP address ranges, search for available addresses, decommission address space/ranges, assign ranges to applications and teams, and check the status of addresses. The API would be consumed by users and other scripts via web interfaces, CLIs and protocols such as REST, WebSockets and gRPC.
  7. Deployment Automation Process: An automated process (e.g. script, CI/CD pipeline, AI agent, management service) which consumes IP address information to configure software running in the cloud or to generate configuration code such as Terraform or Bicep (IaC DSLs).
  8. User Portal: The User Portal provides a very lightweight interface to inspect the IP address information in the IPAM database, annotate IP address information and carry out CRUD operations on it.
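
Purely as an illustration of how the conceptual model above might look in code, the following Python dataclasses capture address spaces, allocations and ownership history. The entity and field names are assumptions rather than a finalised design.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class AddressSpace:
    """A top-level pool reserved for a cloud, region or hub (e.g. a /16)."""
    cidr: str
    cloud: str                    # e.g. "azure"
    description: str = ""

@dataclass
class Allocation:
    """A range carved out of an AddressSpace and assigned to a team/application."""
    cidr: str
    parent_cidr: str              # the AddressSpace it was taken from
    owner_team: str
    application: str
    allocated_at: datetime
    released_at: Optional[datetime] = None   # set when the range is reclaimed

@dataclass
class OwnershipEvent:
    """History record supporting the 'track the history of ownership' requirement."""
    cidr: str
    event: str                    # "allocated" | "transferred" | "released"
    team: str
    timestamp: datetime
```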

Getting Down to Nuts and Bolts

After this rather "lean" design exercise, we should be ready to translate our moderately abstract description of the solution into actual technologies and components which could be used to build a prototype.

To do this we transform our eight-point description into a more detailed specification of the system:

  1. Data Model: Provides an abstract scheme for collecting all the properties of IP addresses and address spaces and their interrelationships as one consistent information structure.
  2. Business Logic Engine: Program logic which applies the correct rules for operating on the Data Model for IP addressing. This could be Python code providing functions that wrap subnetting calculations and updates of address metadata (CRUD).
  3. Relational Schema: A SQL relational schema implementing the data model in a format which can be represented in a database.
  4. Relational DB: An actual physical database which hosts the relational schema of the data model and provides features for managing the information and the model programmatically. We should prefer a "service-less" (embedded) database to reduce the number of moving parts in the solution - perhaps SQLite, or PostgreSQL if needed.
  5. Database Storage: The physical storage medium which will be used by the relational database software to store the actual IP addressing data. The type of storage medium (spinning rust, solid state/RAM) would be influenced by the degree to which we need the solution to scale.
  6. API: An interface which allows programmatic access to the features of the data model and the information within it. This would include commands to allocate IP address ranges, search for available addresses, decommission address space/ranges, assign ranges to applications and teams, and check the status of addresses. The API would be consumed by users and other scripts via web interfaces, CLIs and protocols such as REST, WebSockets and gRPC.
  7. Deployment Automation Process: Here we would generate the IP address configuration consumed by Terraform IaC for the deployment of user VNETs/VPCs and the subnet layout within them.
  8. User Portal: The User Portal provides a very lightweight interface to inspect the IP address information in the IPAM database, annotate it and carry out CRUD operations on it. This could be implemented as a containerized web frontend built with a lightweight framework such as Flask or Django.
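
To make the "service-less" database option concrete, here is one possible SQLite rendering of the schema, created through Python's standard sqlite3 module. The table and column names mirror the illustrative data model above and are assumptions, not a finished design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS address_space (
    id          INTEGER PRIMARY KEY,
    cidr        TEXT NOT NULL UNIQUE,
    cloud       TEXT NOT NULL,
    description TEXT
);

CREATE TABLE IF NOT EXISTS allocation (
    id               INTEGER PRIMARY KEY,
    cidr             TEXT NOT NULL UNIQUE,
    address_space_id INTEGER NOT NULL REFERENCES address_space(id),
    owner_team       TEXT NOT NULL,
    application      TEXT,
    allocated_at     TEXT NOT NULL,   -- ISO-8601 timestamp
    released_at      TEXT             -- NULL while the range is in use
);

CREATE TABLE IF NOT EXISTS ownership_event (
    id            INTEGER PRIMARY KEY,
    allocation_id INTEGER NOT NULL REFERENCES allocation(id),
    event         TEXT NOT NULL,      -- 'allocated' | 'transferred' | 'released'
    team          TEXT NOT NULL,
    occurred_at   TEXT NOT NULL
);
"""

def init_db(path: str = "ipam.db") -> sqlite3.Connection:
    """Create the IPAM database file and schema if they do not already exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

if __name__ == "__main__":
    init_db()
```

Swapping SQLite for PostgreSQL later would mean changing the connection layer, not the model.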

Now Let Us Build!

We have enough of a picture of our IPAM solution in mind by now to implement a prototype/PoC and test whether it integrates with Terraform infrastructure code for our Azure cloud network implementation.

The following tasks (in no particular order) will be the key steps in our build process:

  • We will structure our Terraform code to consume IP address configuration provided by an external IPAM database, perhaps via environment variables in a deployment pipeline (Jenkins?) or by direct query to the IPAM API (see the sketch after this list).
  • We will implement a CI/CD pipeline to extract the IP addresses and metadata from the IPAM via its API and configure them in Terraform IaC for specific deployments.
  • We will implement a data model, translate it to a SQL schema, configure it in an RDBMS.
  • We will implement a "rules engine" with an API to operate on the data in the db according to the data model.
  • We will implement a lightweight API to allow programmatic access to the functions in the data model from external scripts and systems.
  • We will implement a lightweight web UI to carry out all operations on the data in the data model and maintain the IP address data.
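
As a sketch of the first two tasks, the script below queries a hypothetical IPAM API for an application's allocation and writes it out as terraform.tfvars.json, which Terraform loads automatically during plan/apply. The endpoint URL, the application name, the response fields (vnet_cidr, subnet_cidrs) and the Terraform variable names are all assumptions for illustration.

```python
import json
import urllib.request

# Hypothetical IPAM API endpoint and application name (assumptions).
IPAM_API = "https://ipam.example.internal/api/v1"
APPLICATION = "payments-service"

def fetch_allocation(app: str) -> dict:
    """Ask the IPAM API for the ranges allocated to an application."""
    url = f"{IPAM_API}/allocations?application={app}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def write_tfvars(allocation: dict, path: str = "terraform.tfvars.json") -> None:
    """Write the allocation as Terraform input variables."""
    tfvars = {
        "vnet_address_space": [allocation["vnet_cidr"]],
        "subnet_prefixes": allocation["subnet_cidrs"],
    }
    with open(path, "w") as fh:
        json.dump(tfvars, fh, indent=2)

if __name__ == "__main__":
    write_tfvars(fetch_allocation(APPLICATION))
```

The same values could instead be exported as TF_VAR_* environment variables in a Jenkins stage; the point is that the pipeline, not a human, carries the addresses from the IPAM database into the Terraform run.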

But first: Principles

Hold on. Not so fast.

We should settle on architectural guiding principles to evaluate and ensure the success of our implementation, define expectations of the solution and build efficiently:

  • Lean Architecture: Design the MVP first
  • Scalability is not a concern at this point
  • Agile Implementation: Deliver a working solution quickly, simply.
  • Evaluate based on fitness for purpose in an actual usage scenario
  • Robust implementation: Avoid fragile components and fragile design, even at the cost of features.
  • Containerize and package the entire self-contained solution in a highly modular fashion.
  • COTS technologies with minimal code
  • Monolithic design in the MVP to aid "self-contained packaging"

With the approach understood, we can now move on to the actual implementation ...

The Implementation

We begin with the Data Model, without which the design is essentially meaningless.

Minimum Viable Data Model (Conceptual Representation)