Handling Errors in Ansible

Hello readers, in this blog we will look at how to handle errors in Ansible playbooks. There are several ways to do this, and we will go through each of them and see how to use it in our playbooks.

By default, Ansible checks the return codes of commands and modules and fails fast: as soon as a task fails on a host, Ansible stops running further tasks on that host and continues on the other hosts. This means we have to deal with these failures unless we decide otherwise.
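
As a minimal sketch of this default behaviour (the task names and messages are only illustrative), the play below runs /bin/false, which always exits with a non-zero code. Ansible marks that task as failed and does not run the remaining task on that host:

```yaml
- hosts: all
  tasks:
    # /bin/false always exits with return code 1, so Ansible marks this task as failed
    - name: Run a command that fails
      command: /bin/false

    # Because the previous task failed, this task is not run on that host
    - name: Print a message
      debug:
        msg: "This message is never printed"
```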

Let us start by looking at how to change Ansible's default behaviour for certain tasks so that errors are handled the way we need.

How to Ignore Failed Commands?

An Ansible playbook stops executing further tasks on a host as soon as a task fails on that host. In some cases, however, we might want to continue executing tasks on that host even after a failure. To do this, we add ignore_errors to the task, as shown below:

- name: some task
  command: /bin/false
  ignore_errors: yes

Keep in mind that ignore_errors only works when the task is able to run and return a failed result. It does not make Ansible ignore undefined variables or syntax errors, which we still have to fix, and it does not prevent connection failures or execution issues.
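
A common companion pattern is to register the result of a task whose errors are ignored and react to it later. In this sketch, the variable name false_result is hypothetical; the is failed test checks whether the registered task failed:

```yaml
- hosts: all
  tasks:
    - name: Run a command that fails, but keep going
      command: /bin/false
      register: false_result   # hypothetical variable name
      ignore_errors: yes

    # Runs only because the failure above was ignored; is failed checks the registered result
    - name: React to the ignored failure
      debug:
        msg: "The command failed with return code {{ false_result.rc }}"
      when: false_result is failed
```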

How to Reset Unreachable Hosts?

Whenever Ansible cannot connect to a host, it marks the host as ‘UNREACHABLE’ and removes it from the list of active hosts for the run. To reset this, we can use meta: clear_host_errors, which reactivates all the hosts in the play so that subsequent tasks can try to reach them again. For example:

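- hosts: all
  tasks:
    - name: Record that the host was reachable
      set_fact:
        was_accessible: "up"

    # Reactivate hosts that were marked as unreachable or failed earlier in the play
    - name: Clear host errors
      meta: clear_host_errors

    - name: Say hello
      debug:
        msg: "Hello"

    # Only runs on hosts where the fact above was set
    - name: Say hello again
      debug:
        msg: "Hello again, I am up."
      when: was_accessible is defined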
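
The example above never actually loses a host, so here is a sketch of where clear_host_errors makes a difference. It assumes some hosts in the inventory may be unreachable at first: any host that fails the first ping is dropped from the run, and clear_host_errors puts it back so that the second ping tries it again. This only helps while at least one host is still active; if every host becomes unreachable, the play ends before the meta task is reached.

```yaml
- hosts: all
  gather_facts: false   # skip fact gathering so the first ping is the first connection attempt
  tasks:
    # Any host that cannot be reached here is marked UNREACHABLE and removed from the run
    - name: Check connectivity
      ping:

    # Reactivate every host in the play, including the ones marked UNREACHABLE above
    - name: Clear host errors
      meta: clear_host_errors

    # Hosts that were unreachable earlier are tried again here
    - name: Check connectivity again
      ping:
```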

Running Handlers Despite Failures

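By default, when a task fails on a host, handlers that were already notified on that host do not run. As a result, an unrelated failure can leave the host in an unexpected state. For example, a task might update a configuration file and notify a handler to restart a service; if a later task in the same play fails, the file is changed but the service is never restarted.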

To make handlers run anyway, we can use any of the following options:
1. Passing the --force-handlers command-line option
2. Including force_handlers: true in a play
3. Setting force_handlers = True in the ansible.cfg configuration file

- hosts: all
  force_handlers: true

When handlers are forced, Ansible runs every notified handler, even on hosts where a task has failed.
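
To see why this matters, consider a play in which a configuration change notifies a handler and a later, unrelated task fails. In this sketch, the service name myapp, the template app.conf.j2 and the destination path are hypothetical. With force_handlers enabled, the restart handler still runs on the host where the later task failed. Note that the handler is only notified if the template task actually reports a change.

```yaml
- hosts: all
  become: true
  force_handlers: true   # run notified handlers even if a later task fails
  tasks:
    - name: Update the application configuration (hypothetical file names)
      template:
        src: app.conf.j2
        dest: /etc/myapp/app.conf
      notify: Restart myapp

    - name: An unrelated task that fails
      command: /bin/false

  handlers:
    # Runs despite the failure above because force_handlers is enabled
    - name: Restart myapp
      service:
        name: myapp
        state: restarted
```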

How to Define Failures?

Ansible provides the failed_when conditional, which lets us define what “failure” means for a task. As with other conditionals in Ansible, a list of multiple failed_when conditions is joined with an implicit and, so the task is marked as failed only when all the conditions are met. To register a failure when any one of several conditions is met, we have to write the conditions as a single string joined with an explicit or operator.

- name: Web page fetcher
  hosts: all

  tasks:
    - name: Fetch webpage
      uri:
        url: https://somewebsite.com
        return_content: true
      register: output

    - name: Check Content
      debug:
        msg: "Checking content..."
      # Both conditions must be true (implicit "and") for this task to fail
      failed_when:
        - '"Some Content" not in output.content'
        - '"Some other content" not in output.content'

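For a command task whose result is registered as output, an explicit or makes the task fail when either condition is true:

failed_when: output.rc == 0 or "No such" not in output.stdout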
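
failed_when can also relax the default rule. grep exits with code 1 when it finds no matching lines and with 2 when an error occurs, so the sketch below (the log path and the threshold of 100 are hypothetical) treats "no matches" as success and fails only if grep itself errors or if it counts more than 100 lines containing ERROR:

```yaml
- hosts: all
  tasks:
    - name: Count error lines in a log file (hypothetical path)
      command: grep -c ERROR /var/log/myapp.log
      register: grep_result
      changed_when: false
      # grep exits 1 when nothing matches, which is not treated as a failure here.
      # Fail only if grep itself errors (rc 2) or too many error lines are found.
      failed_when: grep_result.rc > 1 or (grep_result.stdout | int) > 100
```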

How to Abort a Play?

Sometimes a failure on a single host should abort the entire play on all hosts, rather than only stopping execution on the host that failed. In this scenario, we can use the any_errors_fatal option. It ends the play and prevents any subsequent plays from running. When an error is encountered, all hosts in the current batch are given the opportunity to finish the fatal task, and then execution of the play stops.

We can enable this option at the play level, as shown below:

- hosts: somehosts
  any_errors_fatal: true
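
As a fuller sketch, the playbook below has two plays. The readiness marker file /etc/myapp/ready is a hypothetical check: if it is missing on any host, the pre-flight play stops on all hosts and the deployment play never starts.

```yaml
- name: Pre-flight checks
  hosts: all
  any_errors_fatal: true   # a failure on any host stops this play and all later plays
  tasks:
    - name: Require a readiness marker file on every host (hypothetical path)
      command: test -f /etc/myapp/ready
      changed_when: false

- name: Deployment
  hosts: all
  tasks:
    # Only reached if the pre-flight play finished without errors on every host
    - name: Deploy
      debug:
        msg: "Deploying..."
```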

Conclusion

Throughout this blog, we have seen that there are multiple ways to handle errors in Ansible playbooks. We have also seen how to define what “failure” means in our playbooks and which actions we can take when failures occur!