Turbocharge your Jinja2 Templates

Optimize your network configuration templating using Jinja2 tweaks

Posted by mayuresh82 on August 23, 2021 - 10 minute read

Jinja2 Templating for Network Configuration Management

A critical part of network automation is network configuration management, which involves building tooling or frameworks for maintaining, modifying, verifying and pushing configurations to network devices. To scale in a multi-vendor network environment, these configurations are often maintained as templates that are rendered using variables (usually YAML files or Python dictionaries) according to specific business logic. Jinja2 is a Python-based templating engine commonly used for this purpose. It has gained a lot of popularity in the network automation community, thanks in part to frameworks like Ansible adopting it. Jinja2 is inherently quite fast and expressive, but there are a few internal tweaks we can use to further enhance its performance, especially when working at scale. This blog post explores some of those techniques.

A quick Jinja2 primer

At the heart of the Jinja2 engine is the Environment. A Loader is first used to load templates from different resources, the most common of which is the file system. Templates can also be loaded from Python packages, modules, dictionaries etc. An instance of this Loader class is passed into the Environment, which then uses it to load the appropriate templates and render them according to specific configuration settings. You usually create one shared environment for a specific set of configuration settings. Once a template (which is just a blob of text) is selected, the Environment compiles it into native Python bytecode, and the compiled template is represented by a Template object.
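As a minimal sketch of that Loader, Environment and Template pipeline, the snippet below uses Jinja2's DictLoader so that it runs without any files on disk. The template name hostname.j2 and its contents are made up for illustration.

```python
from jinja2 import DictLoader, Environment, Template

# DictLoader serves templates from a plain dict instead of the file system.
# The template name and body are illustrative only.
loader = DictLoader({"hostname.j2": "hostname {{ hostname }}"})

# The Environment holds the configuration and uses the loader to find templates.
env = Environment(loader=loader)

# get_template() loads the source, compiles it and returns a Template object.
template = env.get_template("hostname.j2")

# render() fills in the variables (the "context") and returns a string.
rendered = template.render(hostname="rtr1")
print(rendered)
```

Swapping DictLoader for FileSystemLoader('templates') gives the same flow with templates read from a directory.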

Given a template such as this one:

hostname {{hostname}}
!
logging host {{syslog_host}}
!
{% for interface in interfaces %}
interface {{interface}}
no switchport
!
{% endfor %}

which mixes Jinja2-specific notation (the double-brace syntax) with Python-like statements ( for xxx in … ), Jinja2 uses an internal lexer to separate the two and to “translate” the text into smaller units (called tokens). These tokens are parsed into an abstract syntax tree (AST), from which Jinja2 generates Python source code that Python itself then compiles into executable bytecode. I won’t dive into too many details, but at a high level this looks something like this:

[Figure: Jinja2 compilation]

For a more in-depth look into things like AST, Lexers and Parsers, check out I Wrote a Programming Language.
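To make these stages a little more concrete, here is a small sketch that pokes at each one using the Environment's lex(), parse() and compile() methods, plus jinja2.meta.find_undeclared_variables(). The template text is a shortened, made-up version of the one above, and the printed output is only meant for exploration.

```python
from jinja2 import Environment, meta

# A shortened, illustrative template (not the full one from this post).
SOURCE = (
    "hostname {{ hostname }}\n"
    "{% for interface in interfaces %}"
    "interface {{ interface }}\n"
    "{% endfor %}"
)

env = Environment()

# 1. Lexing: the source is split into (lineno, token_type, value) tuples.
token_types = [token_type for _lineno, token_type, _value in env.lex(SOURCE)]

# 2. Parsing: the tokens become an abstract syntax tree (AST).
ast = env.parse(SOURCE)
top_level_nodes = [type(node).__name__ for node in ast.body]

# The AST can also tell us which variables the template expects as context.
undeclared = meta.find_undeclared_variables(ast)

# 3. Code generation: raw=True returns the generated Python source as a
#    string instead of a compiled code object.
python_source = env.compile(SOURCE, raw=True)

print(token_types)
print(top_level_nodes)
print(sorted(undeclared))
print(python_source[:300])
```

Reading the generated Python source is a handy way to see why compilation is the expensive step that the caches discussed below try to avoid.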

Optimizing the Jinja2 compilation process

If you are compiling just a single template (say, for a single device), J2 is usually blazing fast (even for really large templates) and nothing else is really needed. However, let's think about a large production network consisting of thousands of network devices (think CLOS DC) from more than one vendor. Regardless of the size of the network, you would typically want to keep configuration generation as fast as possible in order to maximize productivity and improve efficiency.

For this exercise, we will assume a network of 10 Cisco IOS devices utilizing the same configuration template. The variables required to generate the configs will be stored in a YAML file. The folder structure for our files looks like this:

/ __
    |__ templates/
              |______ cisco_ios.j2
    |__ routers.yaml

The dummy template and variable file look like this:

> Cisco IOS Template

version 12.3
no service pad
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname {{hostname}}
!
boot-start-marker
boot-end-marker
!
no aaa new-model
ip subnet-zero
!
{% if enable_cef is defined and enable_cef %}
ip cef
{% endif %}
ip ips po max-events 100
no ftp-server write-enable
!
{% for intf in interfaces %}
interface {{intf.name}}
 {% if intf.ip_address is defined %}
 ip address {{intf.ip_address}}
 {% elif intf.vlan is defined %}
 switchport mode access
 switchport access vlan {{intf.vlan}}
 {% endif %}
 no shutdown
 duplex auto
 speed auto
!
{% endfor %}
!
!
router bgp {{bgp.bgp_asn}}
 router-id {{bgp.router_id}}
 bgp log-neighbor-changes
 {% for n in bgp.neighbors %}
 neighbor {{n.address}} remote-as {{n.asn}}
 {% for af in n.address_families %}
 address-family {{af.afi}} {{af.safi|default('unicast')}}
  neighbor {{n.address}} activate
  {% for network in af.networks %}
  network {{network.net}} mask {{network.mask}}
  {% endfor %}
  exit-address-family
 {% endfor %}
 {% endfor %}
!
ip classless
!
no ip http server
no ip http secure-server
!
control-plane
!
line con 0
 no modem enable
 transport preferred all
 transport output all
line aux 0
 transport preferred all
 transport output all
line vty 0 4
 login
 transport preferred all
 transport input all
 transport output all
!
end

> YAML variable file

routers:
  - hostname: rtr1
    enable_cef: true
    interfaces:
      - name: GigabitEthernet0/0
        ip_address: 10.1.1.1/24
      - name: GigabitEthernet0/1
        ip_address: 10.1.2.1/24
      - name: GigabitEthernet0/3
        vlan: 100
      - name: GigabitEthernet0/4
        vlan: 200
      - name: vlan100
        ip_address: 172.16.1.1/24
      - name: vlan200
        ip_address: 172.16.2.1/24
    bgp:
      bgp_asn: 1000
      router_id: 1.1.1.1
      neighbors:
        - address: 10.1.1.2
          asn: 2000
          address_families:
            - afi: ipv4
              networks:
                - net: 172.16.1.0
                  mask: 255.255.255.0
                - net: 172.16.2.0
                  mask: 255.255.255.0
        - address: 10.1.2.2
          asn: 3000
          address_families:
            - afi: ipv4
              networks:
                - net: 172.16.1.0
                  mask: 255.255.255.0
                - net: 172.16.2.0
                  mask: 255.255.255.0
            - afi: vpnv4
              safi: vrf FOO
              networks:
                - net: 172.16.0.0
                  mask: 255.255.0.0
  - hostname: rtr2
    enable_cef: true
    interfaces:
      - name: GigabitEthernet0/0
        ip_address: 10.2.1.1/24
      - name: GigabitEthernet0/1
        ip_address: 10.2.2.1/24
      - name: GigabitEthernet0/3
        vlan: 100
      - name: GigabitEthernet0/4
        vlan: 200
      - name: vlan100
        ip_address: 172.16.3.1/24
      - name: vlan200
        ip_address: 172.16.4.1/24
    bgp:
      bgp_asn: 1001
      router_id: 1.1.1.2
      neighbors:
        - address: 10.2.1.2
          asn: 2000
          address_families:
            - afi: ipv4
              networks:
                - net: 172.16.3.0
                  mask: 255.255.255.0
                - net: 172.16.4.0
                  mask: 255.255.255.0
        - address: 10.2.2.2
          asn: 3000
          address_families:
            - afi: ipv4
              networks:
                - net: 172.16.3.0
                  mask: 255.255.255.0
                - net: 172.16.4.0
                  mask: 255.255.255.0
            - afi: vpnv4
              safi: vrf FOO
              networks:
                - net: 172.16.0.0
                  mask: 255.255.0.0
    <...>
   

Rendering a template is a pretty straightforward process in Jinja land. You decide on a Loader to use (e.g. FileSystemLoader to load templates from a file system), pass the loader into an instance of the Environment (along with other configuration settings if required), get the specific Template object by name, and call render() on it, passing in the contextual data. Let's write some basic Python code to read variables from the YAML document and use the corresponding dictionary as context for rendering our IOS template:

import yaml
from jinja2 import Environment, FileSystemLoader

def render_configs(router_data):
    configs = []
    for router in router_data['routers']:
        # Naive version: a new loader and environment are built for every router
        l = FileSystemLoader('templates')
        e = Environment(loader=l)
        t = e.get_template('cisco_ios.j2')
        configs.append(t.render(**router))
    return configs

if __name__ == '__main__':
    with open('routers.yaml') as f:
        # safe_load avoids the Loader argument that yaml.load() requires in recent PyYAML
        router_data = yaml.safe_load(f)
    configs = render_configs(router_data)
    print(configs)

example 1

Executing this code in IPython using the magic %timeit function gives the following results:


In [1]: %timeit render_configs(router_data)
54.7 ms ± 723 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

So rendering the configs for all 10 routers takes an average of 54.7 ms per call (measured by %timeit over 7 runs of 10 loops each). Now let's restructure the render function slightly, as shown below:

def render_configs(router_data):
    l = FileSystemLoader('templates')
    e =  Environment(loader=l)
    configs = []
    for router in router_data['routers']:
        t = e.get_template('cisco_ios.j2')
        configs.append(t.render(**router))
    return configs

example 2

After executing it again in IPython:


In [2]: %timeit render_configs(router_data)
6.29 ms ± 176 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This time it takes only 6.29 ms per loop, roughly an 8.7x speed improvement! Analyzing the change we made, you can see that we essentially moved from creating a new loader and a new environment for each of the 10 devices to using the same shared loader and environment for all the routers. Jinja2’s Environment features an internal in-memory template cache (implemented as an LRU cache) that stores the most recently used compiled templates (up to cache_size entries, 400 by default). Because we are using the same environment and the same template across all routers, every router after the first gets a cache hit, which removes the need to compile the same template each time.
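One quick way to observe the in-memory cache is to check object identity: a shared Environment hands back the very same Template object on repeated lookups, while a second Environment has to build its own. The sketch below assumes a one-line stand-in template served from a DictLoader rather than the real cisco_ios.j2 file.

```python
from jinja2 import DictLoader, Environment

# Stand-in for templates/cisco_ios.j2 so the sketch needs no files on disk.
TEMPLATES = {"cisco_ios.j2": "hostname {{ hostname }}"}

# One shared environment: the second lookup is answered from the LRU cache.
shared_env = Environment(loader=DictLoader(TEMPLATES))
first = shared_env.get_template("cisco_ios.j2")
second = shared_env.get_template("cisco_ios.j2")
served_from_cache = first is second

# A brand-new environment starts with an empty cache, so it compiles the
# template again and returns a different Template object.
other_env = Environment(loader=DictLoader(TEMPLATES))
recompiled = other_env.get_template("cisco_ios.j2") is not first

print(served_from_cache, recompiled)
```

This mirrors the difference between example 1 (a new environment per router) and example 2 (one environment for all routers).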

In fact, this is the recommended way to use the Environment in the real world where you would routinely render multiple templates across multiple devices at the same time. Also in the real world, you would probably want to use some form of parallelism to render configuration templates across multiple devices depending on the framework or tooling in use. The LRU cache used internally by Jinja2 is thread-safe by design. The above function can be re-written to use one thread per router:

import threading

def render_configs(router_data):
    l = FileSystemLoader('templates')
    e =  Environment(loader=l)
    configs = []
    threads = []
    def _render(router):
        t = e.get_template('cisco_ios.j2')
        configs.append(t.render(**router))
    for router in router_data['routers']:
        t = threading.Thread(target=_render, args=(router,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return configs

example 3


In [3]: %timeit render_configs(router_data)
16 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

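We see that performance here is worse than in the single-threaded case. This presumably happens because template rendering is a CPU-bound task: CPython's global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so the threads cannot actually render in parallel, and the additional overhead of creating and switching between threads ends up slowing the program down.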
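One side effect of the threaded version worth knowing about is that configs are appended in whatever order the threads finish, which is not guaranteed to match the order of the routers. If you do need threads (for example because rendering is interleaved with I/O), a possible variation is to use concurrent.futures.ThreadPoolExecutor, whose map() returns results in input order. This is only a sketch: the template is a one-line stand-in, max_workers=4 is an arbitrary choice, and it is not expected to be any faster than the single-threaded code.

```python
from concurrent.futures import ThreadPoolExecutor

from jinja2 import DictLoader, Environment

# One shared environment for all worker threads (stand-in template).
ENV = Environment(loader=DictLoader({"cisco_ios.j2": "hostname {{ hostname }}"}))


def render_one(router):
    """Render the config for a single router dict."""
    return ENV.get_template("cisco_ios.j2").render(**router)


def render_configs(router_data, max_workers=4):
    # pool.map() yields results in the same order as the input iterable,
    # regardless of which thread finishes first.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(render_one, router_data["routers"]))


router_data = {"routers": [{"hostname": f"rtr{i}"} for i in range(1, 11)]}
configs = render_configs(router_data)
print(configs[0], configs[-1])
```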

Bytecode Caching

Jinja2 supports external caching of the bytecode that results from compiling templates. Disk-based caches, memcached etc. can be used for this purpose. Let's see, in our final experiment, the result of enabling a disk cache. We simply pass an instance of the FileSystemBytecodeCache that ships with Jinja2 into the environment before rendering:

from jinja2 import FileSystemBytecodeCache

def render_configs(router_data):
    l = FileSystemLoader('templates')
    # Assumes the templates/.cache directory already exists
    e = Environment(loader=l, bytecode_cache=FileSystemBytecodeCache('templates/.cache'))
    configs = []
    for router in router_data['routers']:
        t = e.get_template('cisco_ios.j2')
        configs.append(t.render(**router))
    return configs

example 4

Running this in IPython, we get:


In [4]: %timeit render_configs(router_data)
1.05 ms ± 32.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

We now get a render time of just 1.05 ms per loop, a super impressive 6x speedup over the previous single-threaded example! So what is going on here?

The only thing different in this scenario is that Jinja2 stores the compiled byte-code for a template to disk at the location templates/.cache. This happens on the first run of the 1000 iterations that the %timeit function performs. Thus if a template is only ever rendered once, this cache has no effect. The real power of the file system based cache is seen during subsequent runs.

In the case of running render_configs() 100 times in example 2 above, the template needs to be loaded and compiled at the start of each loop, for the first router being rendered (i.e. all 100 times), because every call creates a new Environment with an empty in-memory cache. After the first router is rendered, subsequent routers using the same template see a hit in the in-memory cache. Compare this with example 4, which uses the file system cache: there the compiled bytecode is found on disk every subsequent time an attempt is made to load the template, which includes the 999 remaining times the function executes. Thus, for the 1000 iterations, the template only really needs to be compiled once!
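The difference can be checked by counting compilations. The sketch below uses a hypothetical CountingEnvironment subclass (not part of Jinja2) that overrides compile() to count calls, a temporary directory as the cache location, and a one-line stand-in template.

```python
import tempfile

from jinja2 import DictLoader, Environment, FileSystemBytecodeCache


class CountingEnvironment(Environment):
    """Hypothetical helper: counts how often a template gets compiled."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.compile_calls = 0

    def compile(self, *args, **kwargs):
        self.compile_calls += 1
        return super().compile(*args, **kwargs)


TEMPLATES = {"cisco_ios.j2": "hostname {{ hostname }}"}


def compiles_for_fresh_env(cache_dir):
    """Build a brand-new environment, render once, report compile count."""
    env = CountingEnvironment(
        loader=DictLoader(TEMPLATES),
        bytecode_cache=FileSystemBytecodeCache(cache_dir),
    )
    env.get_template("cisco_ios.j2").render(hostname="rtr1")
    return env.compile_calls


with tempfile.TemporaryDirectory() as cache_dir:
    first_run = compiles_for_fresh_env(cache_dir)   # compiles, writes to disk
    second_run = compiles_for_fresh_env(cache_dir)  # loads bytecode from disk

print(first_run, second_run)
```

Each call here plays the role of one render_configs() invocation: the environment is new every time, but only the first one has to compile.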

Note that the file system bytecode cache is “smart” enough to invalidate on-disk bytecode if the template source code changes. This is done using an internal checksum that gets assigned to each cache “bucket” and is used to ensure that the source code is up to date. Similarly, the Environment's auto_reload setting (True by default) checks the staleness of templates in the in-memory cache as well. This is useful in cases where you have a long-running service (e.g. a web service) that loads templates from the file system that may change over time.
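Both invalidation mechanisms can be tried out in a few lines. This sketch uses a DictLoader with a made-up logging.j2 template whose source is changed between renders, and a temporary directory for the disk cache.

```python
import tempfile

from jinja2 import DictLoader, Environment, FileSystemBytecodeCache

NAME = "logging.j2"  # illustrative template name
OLD = "logging host 10.0.0.1"
NEW = "logging host 10.0.0.2"

# 1. In-memory cache: auto_reload decides whether staleness is checked.
templates = {NAME: OLD}
reloading_env = Environment(loader=DictLoader(templates))  # auto_reload=True
pinned_env = Environment(loader=DictLoader(templates), auto_reload=False)

reloading_env.get_template(NAME)  # populate both in-memory caches
pinned_env.get_template(NAME)

templates[NAME] = NEW  # the template source changes underneath us

fresh = reloading_env.get_template(NAME).render()  # notices the change
stale = pinned_env.get_template(NAME).render()     # keeps the cached copy


# 2. Bytecode cache: the checksum of the source guards the on-disk bytecode.
def render_with_disk_cache(source, cache_dir):
    env = Environment(
        loader=DictLoader({NAME: source}),
        bytecode_cache=FileSystemBytecodeCache(cache_dir),
    )
    return env.get_template(NAME).render()


with tempfile.TemporaryDirectory() as cache_dir:
    before = render_with_disk_cache(OLD, cache_dir)  # writes bytecode
    after = render_with_disk_cache(NEW, cache_dir)   # checksum mismatch

print(fresh, "|", stale, "|", after)
```

Turning auto_reload off trades freshness for skipping the staleness check on every lookup, which may be acceptable when templates never change during the lifetime of the process.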

Conclusion

Jinja2 is already pretty fast and expressive right out of the box. By combining the best practices and tweaks highlighted here with good template design, you can build a scalable and efficient configuration rendering system that will serve you well for years to come, until something better comes along of course :)

