Mastering Ansible Check Mode

At first glance, working with Ansible check mode can seem difficult to the point of not being worth pursuing. However, with a few tricks, check mode can be tamed and used in your daily Ansible life. Read on for more ...

Overview

ansible-playbook provides a --check option, which enables Ansible check mode. In this mode, Ansible runs through your playbook and reports what it would do, but makes no actual changes. This allows you to validate your code before applying it for real, providing several benefits including:

allowing you to catch syntax errors in your code
allowing you to validate the changes that Ansible intends to make, to ensure they match your expectations

Additionally, you can pass --diff. With this option, for each task where Ansible detects that a change is necessary, it outputs the difference between the current state and the state it intends to produce.

A simple example of these options:

ansible-playbook --check --diff some_playbook.yml

Limitations

At face value, this seems like a great feature. However, it comes with a couple of caveats, which can leave you wondering whether check mode is of any practical use. Indeed, Ansible's own documentation has this to say about check mode:

Check mode is just a simulation, and if you have steps that use conditionals that depend on the results of prior commands, it may be less useful for you.

The first caveat is that support for check mode is optional. Many modules support check mode, but not all. Those that do not are silently skipped when running in check mode. Two modules that fall into this category are command and shell. Given that these modules allow arbitrary host commands to be run, which could result in something changing on the host, the modules play it safe and simply do nothing. This makes sense for commands that would make a change, but these modules are often used to run commands that assess the status of a host, the result of which is then used as a conditional to trigger Ansible to carry out other actions. A contrived example:

- name: Run a command
  command: echo 'not-configured'
  register: command_status

- name: Include some more tasks if previous command output is 'not-configured'
  include_tasks: some_more_tasks.yml
  when: command_status.stdout == 'not-configured'

As the command module does not support check mode, it will simply be skipped when check mode is in effect. This will cause the second task to fail (along with the rest of the play on that host), because the command_status variable has not been initialised. There are ways to change the conditional to suppress the error, which would at least allow the rest of the play to be evaluated. However, that is not optimal, because the contents of the include file will then not be evaluated in check mode either.
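
A minimal sketch of one such change, assuming a simple guard on the registered result is acceptable, is to compare stdout only when it is actually present. The include_tasks spelling is used here as the current name for a dynamic include:

```yaml
- name: Include some more tasks if previous command output is 'not-configured'
  include_tasks: some_more_tasks.yml
  # Guard: only compare stdout when the registered result actually has one.
  # If the command task was skipped and stdout is missing, the condition is
  # false and the include is skipped instead of failing the play.
  when: command_status.stdout is defined and command_status.stdout == 'not-configured'
```

This keeps the play running in check mode, but the included tasks are still never evaluated there, which is the drawback described above.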

The second caveat is that check mode can fail a play that would complete without error if it were run for real. Consider the following example:

- name: Install a package
  package:
    name: some-package

- name: Add a line to the some-package config file
  lineinfile:
    path: /etc/some-package/some-package.conf
    regexp: "^some-option ="
    line: "some-option = some-value"

In this example we install some-package, which as part of its installation routine creates the file /etc/some-package/some-package.conf. We then go on to change one of the parameters in this config file. The problem here is that if you run check mode before this code has actually been applied, the second task will fail because /etc/some-package/some-package.conf does not exist yet, as the package has not been installed. When run for real, however, all will work without issue.

Possibly confusingly, this is only an issue prior to these tasks being run for real. Consider what happens if we later want to modify the value of the option in the config file. In this case, we update our playbook and run it in check mode - this will correctly report the changes to the config file, given the package is installed and so the config file does now exist.

Making check mode work for us

Selectively Turning Check Mode Off

Any task can have a check_mode property defined. This is a boolean which, when set to false or no, causes check mode to be ignored and the task to be run for real. This can help deal with the first problem we ran into above. Let's change our example to fix the problem it demonstrated:

- name: Run a command
  command: echo 'not-configured'
  register: command_status
  check_mode: false

- name: Include some more tasks if previous command output is 'not-configured'
  include_tasks: some_more_tasks.yml
  when: command_status.stdout == 'not-configured'

With this modification, our first task will run for real, whether or not we are running in check mode. This means that the command_status variable will always contain a valid value and thus trigger the conditional, when appropriate, even in check mode. The knock-on effect is that the include task will now be correctly triggered in check mode, and so its contents can also be evaluated.

Note: This is generally only appropriate for commands/tasks that are not going to make a change to a host. Commands that just identify the status of the host are perfect for this treatment.
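
For read-only status commands like this, a common companion setting is changed_when: false, so that the forced run is not reported as a change. The sketch below reuses the contrived echo command from the example and is illustrative only:

```yaml
- name: Run a read-only status command
  command: echo 'not-configured'
  register: command_status
  check_mode: false     # always run for real, even under --check
  changed_when: false   # a status query never changes the host
```

Without changed_when: false, the command module reports every successful run as changed, which would add noise to a check mode report.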

Gracefully handling check mode errors

Sorting out our second example is a little more involved. One way that some people choose to handle the condition it highlighted is to make use of ansible_check_mode. This variable is set automatically by Ansible on every ansible-playbook run, and is simply a boolean that indicates whether check mode is in effect or not. Therefore, one method of dealing with our issue could be to modify the second task in the second example:

- name: Add a line to the some-package config file
  lineinfile:
    path: /etc/some-package/some-package.conf
    regexp: "^some-option ="
    line: "some-option = some-value"
  when: not ansible_check_mode

Doing this will allow check mode to always run successfully. However, it means that the lineinfile task will never be tested in check mode. Whilst this helps with the case of check mode being used before the playbook has been applied, it also means that should we ever want to edit the config file later down the line, we can't use check mode to test and validate the changes before applying them.
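
One possible middle ground, sketched here as an assumption rather than as part of the approach above, is to look for the file with the stat module first and skip the lineinfile task only when check mode is in effect and the file is missing. The registered variable name some_package_conf is hypothetical:

```yaml
- name: Check whether the some-package config file exists
  stat:
    path: /etc/some-package/some-package.conf
  register: some_package_conf

- name: Add a line to the some-package config file
  lineinfile:
    path: /etc/some-package/some-package.conf
    regexp: "^some-option ="
    line: "some-option = some-value"
  # Run for real always; in check mode, run only if the file already exists.
  when: not ansible_check_mode or some_package_conf.stat.exists
```

With this guard, a check mode run made before the package is installed skips the edit, while later check mode runs can still report pending changes to the file.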

An alternative makes use of Ansible's block feature. This feature has a number of uses, one of which can help us in this case. Have a look at the second example, modified to use the block feature:

- block:

    - name: Install a package
      package:
        name: some-package

    - name: Add a line to the some-package config file
      lineinfile:
        path: /etc/some-package/some-package.conf
        regexp: "^some-option ="
        line: "some-option = some-value"

  rescue:

    - name: If we are running in check mode, output a friendly message explaining when errors can be ignored
      debug:
        msg: "Config file missing errors can be ignored if the package has not yet been installed."
      changed_when: true
      when: ansible_check_mode

    - name: If we are not in check mode, then errors are real and should result in the playbook failing
      fail:
        msg: "Tasks in block failed. Review the errors for more details."
      when: not ansible_check_mode

One way the Ansible block feature can be used is like a try/catch structure in other languages. In our case, the tasks under the block: statement will run. If either of them results in an error, the tasks under the rescue: statement will be run. This feature allows us to handle the error and let the play continue.

In this rescue: section we have two tasks. Each has a conditional, meaning that one will run when check mode is in effect and the other will run if we are running the play for real.

If running in check mode, all we do is output a message explaining the conditions under which it is OK for a failure to occur in check mode. Once the playbook finishes, its overall status will be successful; however, the output will register that one or more tasks needed to be rescued. Browsing through the output of the play, it is then possible to review the errors that triggered the rescue, and right next to them will be our message describing when it is OK to ignore those errors. With that in mind, I like to add the changed_when: true parameter to the debug task, as this forces Ansible to treat it as a task that resulted in a change, ensuring that it is highlighted in colourized output.

If, by contrast, we are not running in check mode, then any failure should be treated as a failure, and so we trigger the fail task to ensure this happens. Doing this means the play will behave as normal, skipping the remaining tasks for that host and reporting as failed once the playbook is complete. Again, users can browse the output and review the errors from the failed tasks.

Admittedly, this approach adds a significant amount of additional code, simply to allow check mode to work. You will have to make the decision whether it is worth it in your own use cases.

Conclusion

Hopefully the techniques above show that it is possible to turn check mode into a useful tool. It is by no means perfect, and there will still be situations where we cannot practically model the behaviour that we want. However, with some extra code and some discipline, we can cover most situations and, most importantly, mitigate the false positives that can lead people to abandon check mode altogether.

Author:	Stewart Middleton
Published:	2020-06-05