How to speed up Ansible playbooks drastically ?

Dernière mise à jour : 8 sept. 2021

Introduction


Ansible is a well known open source automation engine which can automate, provision, handle configuration management and orchestration. As it doesn’t need an agent by using SSH protocol, and because you don’t need to write code using simple modules, Ansible eases the deployment and management of your applications !

Before discussing how we can optimize your Ansible configuration, here is a quick reminder on how it works. You can see on the following picture an “Ansible Management node”. This host perform operations on target infrastructure by pushing configurations through playbooks and roles. Each host called in the playbook will be configured as you expect them to be. Finally, hosts can be organized in groups through an inventory file which helps us decide which hosts we are controlling, when and for what purpose :

We can use Ansible for so many tasks as provision virtual machines, apply configurations or even patch them.

However, in some contexts like continuous delivery, having fast Ansible scripts (called playbooks) is required to get rapid feedback as well as reducing the possible Ansible load on the target servers.

In this article, we are going to see important concepts which can provide Ansible with a great performance, and finally go through some benchmarking to quantify the possible improvements.


Good practices to speed up playbooks


Yum calls are expensive !


One important Ansible module is the yum module. You can use it to install, upgrade, downgrade, remove, or list packages and groups with the yum package manager (or apt for debian). A common issue is to invoke several times the same module in multiple tasks like so :

- name: install the latest version of nginx
  yum:
    name: nginx
    state: latest

- name: install the latest version of postgresql
  yum:
    name: postgresql
    state: latest

- name: install the latest version of postgresql-server
  yum:
    name: postgresql-server
    state: latest

Yums are expensive ! Ansible is smart and knows how to group yum or apt transactions to install multiple packages into a single transaction, so it’s a huge optimization to install all the required packages in a single task :

- name: Install a list of packages
  yum:
    name:
      - nginx
      - postgresql
      - postgresql-server
    state: present


Avoid using Shell or Command modules


To run a shell command on an Ansible host, you can use modules like shell or command. Both are really time consumers as we will see in the benchmark. Always check if there isn’t a more appropriate module :

- name: Create a directory (BAD WAY using a shell command)
  shell: mkdir /tmp/sokube
      
- name: Create a directory (GOOD WAY using a module)
  file:
    path: /tmp/sokube
    state: directory

It won’t be just faster but it will also leverage the idempotent property of the modules. It means that after 1 run of a playbook to set things to a desired state, further runs of the same playbook should result in 0 change. In simpler terms, idempotency means that Ansible playbooks can be executed several times without any side effects so that consistency of the environment is maintened.

We will see in our benchmark how efficient it is when you use modules, instead of shell commands.


Select the best Strategy


When running a playbook, Ansible uses a strategy that is basically the playbook’s workflow. It’s important to select the correct strategy if we want to improve efficiency. The default one is linear: it will run each task on a number of hosts and wait for each task to complete before starting the next one.

If the target is independent, we can consider the “free” strategy. Tasks will be processed independently on the status of tasks on other hosts, as explained in the following picture :

We can define custom strategies by developing plugins or use existing plugins like mitogen, which we will discuss later on this page.


Increase forks


Forks define the maximum number of simultaneous connections Ansible made on each task. It will help you manage how many hosts should get affected simultaneously. By default, the parameter is 5, which means that only 5 hosts will be configured at the same time. We can improve that value as far as it doesn’t interfere with your infrastructure’s resources.

Forks can be configured in the the ansible.cfg file:


forks=25


Configure Async tasks


By default Ansible runs tasks synchronously, holding the connection to the remote node open until the action is completed. When the task is truly independent, that is no other task is expecting to be finished to get started, defining the task as asynchronous can truly optimize the overall execution, as show in the below example:

---
- name: My Playbook to test Async and Poll
  hosts: webservers
  tasks:
    - name: Copy the script from Ansible host to node for testing
      copy: 
        src: "my-longrunning-script.sh"
        dest: "/tmp"
    
    - name: Execute the long running script
      shell:
        "chmod a+x /tmp/longrunningscript.sh &&  /tmp/my-longrunning-script.sh 60" # Run for 60 seconds 
      async: 120