The importance of User Interfaces for Network Automation
One of the least talked about aspects of network automation workflows is the ability to interface with the different tools and frameworks that enable these workflows. Consider a scenario where you are part of a small team of Network Engineers at a company with a small but growing customer base. You realize the need to automate your device and link provisioning workflows and hack up a few Python scripts (or Ansible playbooks, if you prefer) that get the job done in the beginning.
Over time, your company grows exponentially and so does your production network environment. You now have a dedicated team of engineers solely responsible for network deployment and bandwidth provisioning. In such a scenario, it becomes imperative to allow other engineers to run automation workflows for device management and provisioning, and careful thought must be given to the end user experience in order to ensure that your automation tools don't act as blockers for the overall growth of company infrastructure. This still leaves you, the original author of those frameworks, in charge of maintaining these tools, but you are now able to offload trivial and repetitive tasks to other teams so you can focus on higher priority problems. This post explores how one can approach the problem initially and evolve the user interface over time.
Day 1: The CLI
During your initial foray into automating a production network, you often want to keep things as simple as possible. In fact, it is a good idea to focus as much as possible on the business goals and logic while quickly throwing together a trivial way to accept user input or provide user interaction with your tool. Consider the example of pushing a configuration to a router in the network. For the sake of this discussion, we can ignore the actual implementation details since they are not very relevant. Assuming that we have some Python code called push_config.py that does the job behind the scenes, your day 1 implementation might just be a simple argument parser using the built-in sys module that accepts, at a minimum, the IP address of the device and its credentials:
import sys

def push_config(router_ip, username, password, config):
    # your logic here
    return True

if __name__ == '__main__':
    # Positional arguments: no validation, no help text
    router_ip = sys.argv[1]
    username = sys.argv[2]
    password = sys.argv[3]
    config_file = sys.argv[4]
    with open(config_file) as f:
        config = f.read()
    push_config(router_ip, username, password, config)
The script can then be invoked using python:
python push_config.py 10.1.1.200 userfoo passfoo config.txt
Of course, the problem with this approach is that it does not scale with a growing number of devices and scripts to maintain. It is also fragile and requires a lot of custom error handling and sanity checking. This becomes a bigger problem when you try to offload the duties to other engineers to help with day-to-day config pushes. In the real world, frameworks like Nornir take care of the inventory management for you by providing a way to build inventories dynamically or load them via inventory files. If you are just starting out with ad hoc scripts, using a well known Python module such as argparse (the older optparse is now deprecated) or the more powerful Click module can alleviate some of the problems with the simple argv based approach. These libraries make your tooling a lot more user friendly and allow the use of helpful annotations and usage instructions that enable teams to easily operate the network.
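To illustrate, here is a minimal sketch of the same script rewritten with argparse; the option names and help text are my own choices for the example, not a fixed convention:
import argparse

def push_config(router_ip, username, password, config):
    # your logic here
    return True

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Push a config file to a router')
    parser.add_argument('router_ip', help='IP address of the target router')
    parser.add_argument('config_file', help='Path to the config file to push')
    parser.add_argument('--username', required=True, help='Router username')
    parser.add_argument('--password', required=True, help='Router password')
    args = parser.parse_args()
    with open(args.config_file) as f:
        config = f.read()
    push_config(args.router_ip, args.username, args.password, config)
Running the script with -h now prints auto-generated usage instructions for free, which is exactly the kind of annotation that helps other engineers operate the tool safely.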
Day 2: The API
The CLI approach works well when you have a confined domain of engineers operating and running your tools. Let's say that over time, you expand the functionality of your Python scripts to also fetch critical pieces of information from network devices for the purposes of analysis or monitoring. You are now also required to make this information accessible to certain application owners so they can incorporate network data into their own workflows. Your tooling is now mission critical to the growth of the organization or company infrastructure, and the simple CLI interface does not work anymore - especially when application owners need a programmatic way of ingesting your data. Enter the Application Programming Interface, or API. An API allows you to convert your tooling into a service or a framework via a client-server model, in which the server exposes "endpoints" that clients can invoke remotely in order to perform specific tasks or run scripts in the backend. The data is then returned in a structured format for easy ingestion by other clients. A perfect example of this is a RESTful API, which uses HTTP requests and responses.
Considering our business use case, remember that we now have multiple teams using our tooling, and we would still need to allow CLI access to our deployment team in order to not break their workflows. Do we then need the additional overhead of maintaining not just the original CLI based execution logic but also a parallel REST API that allows for remote execution? The answer is that we do not, and therein lies the idea of API driven frameworks. In a nutshell, you write all your tooling such that any and all user/client interaction happens via the API.
Going with our example above, here is a short snippet of code that creates a basic HTTP server using the Flask Python framework and allows running our config push tool via the API. The CLI and REST API are disaggregated and maintained as two separate pieces of code. This allows constraining all the business logic to a central API (possibly residing on a machine in a datacenter) while distributing the CLI code to end users to run on their personal computers. A change or a fix to one can be made without affecting the other.
api.py:
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/pushConfig', methods=['POST'])
def push_config():
    # Pull the function "arguments" out of the JSON request body
    body = request.json
    ip = body['ip_address']
    username = body['username']
    password = body['password']
    config = body['config']
    # your logic here
    return jsonify({'status': 'success'})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
push_config.py:
import click
import requests

api_url = 'http://remote_host_ip:8080'

@click.command()
@click.option('--username', required=True, help='Router username')
@click.option('--password', required=True, help='Router password')
@click.argument('router_ip')
@click.argument('config_file')
def push_config(username, password, router_ip, config_file):
    with open(config_file) as f:
        config = f.read()
    body = {
        'ip_address': router_ip,
        'username': username,
        'password': password,
        'config': config,
    }
    resp = requests.post(api_url + '/pushConfig', json=body)
    resp.raise_for_status()

if __name__ == '__main__':
    push_config()
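Invoking the client now looks much the same as before, except with friendlier, self-documenting options:
python push_config.py --username userfoo --password passfoo 10.1.1.200 config.txt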
As you can tell, the CLI program now interacts directly with the API over the network in order to push the config. This is known as a Remote Procedure Call, or RPC, where you are calling a "procedure" or a function/method remotely over the network using an API that the remote server exposes. Notice how the function name is part of the URL and the function "arguments" are passed as the request body. There are other, less widely used RPC mechanisms out there such as JSON-RPC or SOAP, but for the vast majority of use cases - REST is best.
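To make the RPC mechanics concrete, here is what the same call might look like when issued directly with curl (the host, port and config contents are placeholders):
curl -X POST http://remote_host_ip:8080/pushConfig \
    -H 'Content-Type: application/json' \
    -d '{"ip_address": "10.1.1.200", "username": "userfoo", "password": "passfoo", "config": "hostname router1"}'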
Day 3: The UI
Fast forward six more months, and your operations team is now involved in several different projects. Pushing out configs to devices is now seamless and efficient thanks to the framework you delivered. In fact, it has worked well enough for the operations team to delegate simple circuit turn-up tasks out to field engineers. However, the field engineers don't have much of a clue about what a piece of configuration does nor how it's generated. All they are concerned with is pushing out a given piece of configuration to a set of devices in the quickest possible manner. Wouldn't it be great if there were a web interface where an authorized person could simply upload a piece of configuration (handed to them by some other means) and press a button to push it out to the given devices?
Because we have made our framework API driven, a web UI can be easily bolted on. It becomes just another consumer of our API, similar to our already existing CLI tool. Of course, this requires a bit of HTML or Javascript knowledge but it is a worthwhile investment in my humble opinion - one whose dividends quickly pay off. API driven development has numerous advantages such as easier and faster deployments, more maintainability, increased resiliency and modularity.
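To give a rough idea of how little glue is needed, here is a hypothetical sketch of a Flask app that serves a bare-bones form and relays the submission to our existing API; the form fields and host are illustrative assumptions, and a real UI would add authentication, styling and validation:
from flask import Flask, request, render_template_string
import requests

app = Flask(__name__)
api_url = 'http://remote_host_ip:8080'

# A deliberately minimal form matching the API's expected fields
FORM = '''
<form method="post">
  Router IP: <input name="ip_address"><br>
  Username: <input name="username"><br>
  Password: <input name="password" type="password"><br>
  Config: <textarea name="config"></textarea><br>
  <button type="submit">Push Config</button>
</form>
'''

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        # The UI is just another API client: forward the form data as JSON
        resp = requests.post(api_url + '/pushConfig', json=request.form.to_dict())
        return 'Success!' if resp.ok else 'Failed: %s' % resp.status_code
    return render_template_string(FORM)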
Day 4 and beyond
You now have a framework set up that allows for seamless runs of your workflows and helps the organization grow and scale. As a further enhancement, you could now introduce the ability to store the results of your job runs in a central database, thereby maintaining a history and allowing for analysis and reporting. Using a framework like Flask means you could use one of the several available Object-Relational Mapping (ORM) plugins, such as Flask-SQLAlchemy, to easily integrate a database adapter into your server. As your tooling gets more mission critical, you could consider performance upgrades by switching to more modern mechanisms such as gRPC, and provide redundancy and scale by fronting your server instances with a high performance load balancer. Of course, all this is easier said than done and requires proper planning, resources and expertise.
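As a minimal sketch of the database idea, assuming Flask-SQLAlchemy and a throwaway SQLite file (the model fields and helper are my own invention for the example):
from datetime import datetime
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# SQLite keeps the example self-contained; production would point at a real database
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///jobs.db'
db = SQLAlchemy(app)

class JobRun(db.Model):
    # One row per config push, so history can be queried and reported on
    id = db.Column(db.Integer, primary_key=True)
    router_ip = db.Column(db.String(64))
    status = db.Column(db.String(16))
    timestamp = db.Column(db.DateTime, default=datetime.utcnow)

with app.app_context():
    db.create_all()

def record_job(router_ip, status):
    # Called from the /pushConfig handler after each run
    db.session.add(JobRun(router_ip=router_ip, status=status))
    db.session.commit()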
A note on “batteries included” frameworks
Frameworks like Flask encourage a modular, distributed, API driven approach that allows developing each component (server, Web interface, database adapters etc.) independently and iteratively. They also provide more fine grained control over each component.
A vastly different approach to creating REST APIs compared to the Flask example above can be found in "batteries included" frameworks, the most common one being Django. While Flask is a simple micro-framework that can be easily extended via plugins, Django includes just about everything you need to create a full fledged Web framework, including APIs and UIs. The rich feature set does come at the cost of increased complexity, which makes things harder to understand and to troubleshoot when they break. As usual when it comes to technology choices, tradeoffs need to be made!
User Interface Best Practices
Whether it's the CLI or the UI, here are some things to consider when designing your interface:
- Provide clear messaging: Your workflows can fail for various reasons. The last thing an end user wants to see when that happens is a huge 100 line traceback with the actual error buried somewhere in it. Provide clear, concise status messages as things progress and error messages when things fail.
Bad:
python push_config.py --router 1.2.3.4 --username foo --password bar
Pushing config to 1.2.3.4...
Traceback (most recent call last):
File "<some_library.py>", line 2, in <module>
File "<some_other_lib.py>", line 2, in do_something_that_might_error
File "<push_config>>", line 2, in raise_error
ConnectAuthError(1.2.3.4): Connection Refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "push_config.py", line 4, in <module>
raise ValueError('failed') from e
ValueError: failed
Good:
python push_config.py --router 1.2.3.4 --username foo --password bar
[I] Generating Configs
[I] Pushing Configs to 1.2.3.4
[E] Bad Username or Password !
- Use Colors Strategically: Use red to highlight errors so that they stand out, and bold, bright colors for items that require special attention (see the sketch after the example below). For a Web UI, use typography to create hierarchy and clarity.
python push_config.py --router 1.2.3.4 --username foo --password bar
[I] Generating Configs
[I] Pushing Configs to 1.2.3.4
[W] Using insecure http !
[E] Bad Username or Password !
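One minimal way to get there in a CLI tool is with raw ANSI escape codes, so no extra dependency is needed (libraries like colorama are a more portable alternative):
# ANSI escape sequences for common message colors
RED = '\033[31m'
YELLOW = '\033[33m'
RESET = '\033[0m'

def log_warning(msg):
    # Warnings stand out in yellow
    print(YELLOW + '[W] ' + msg + RESET)

def log_error(msg):
    # Errors stand out in red
    print(RED + '[E] ' + msg + RESET)

log_warning('Using insecure http !')
log_error('Bad Username or Password !')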
- Provide configurability: Think about the parameters or options that could potentially be variable during the execution of your program. When in doubt, make it a configurable parameter, as in the sketch below! For a CLI program, these could be provided via the argument parser. Things like logging verbosity, connection and execution timeouts etc. should all be made configurable. When deploying an API server, these options could be provided either via the CLI or via environment variables. For items that you don't want the user to control but still need to be variable, provide sane default values.
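For example, here is a hypothetical Click setup that exposes verbosity and a connection timeout, with environment variable fallbacks and sane defaults (the option and variable names are assumptions for the sketch):
import click

@click.command()
@click.option('--timeout', default=30, show_default=True, envvar='PUSH_TIMEOUT',
              help='Connection timeout in seconds')
@click.option('--verbose', is_flag=True, envvar='PUSH_VERBOSE',
              help='Enable verbose logging')
def push_config(timeout, verbose):
    if verbose:
        click.echo('Using a timeout of %d seconds' % timeout)
    # connection and push logic would go here

if __name__ == '__main__':
    push_config()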
- Keep things simple: This is the holy grail of any and all systems that should provide years of continued reliable service. Keep a well defined scope in mind and only implement the needed functionality within the tool. Feature bloat is real and can slow things down considerably over time when problems arise (and they will). The future owner of your tool or system will thank you when ownership changes hands!