Hello readers, in this blog we will look at how to handle errors in Ansible Playbooks. There are several ways to do this, and we will go through each of them and see how to use it in a Playbook.
By default, Ansible checks the return codes of commands and modules, and it fails fast. This means that we are forced to deal with failures unless we decide otherwise.
Let us start by looking at how to change Ansible's default behaviour for certain tasks so that error handling works as per our requirements.
How to Ignore Failed Commands?
By default, Ansible stops executing tasks on a host as soon as a task fails on that host. But in some cases, we might want to continue executing tasks on that host even after a failure. To do so, we can write tasks that look like the one below:
- name: some task
  command: /bin/false
  ignore_errors: yes
Keep in mind that this feature only works when the task is able to run and returns a value associated with failure. If we have any undefined variables or syntax errors, we will still get an error which we will have to address. Also, ignore_errors does not make Ansible ignore connection or execution issues, such as a host that cannot be reached.
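One way to make use of a tolerated failure is to register the task result and act on it in a later task. The following is a minimal sketch rather than part of the original playbook: the task names and the variable name probe_result are hypothetical.

```yaml
- hosts: all
  tasks:
    # /bin/false always exits non-zero, so this task always fails.
    - name: Run a command that may fail
      command: /bin/false
      register: probe_result   # hypothetical variable name
      ignore_errors: yes

    # Runs only on hosts where the previous task failed.
    - name: React to the ignored failure
      debug:
        msg: "The command failed with return code {{ probe_result.rc }}"
      when: probe_result is failed
```

Because the failure is ignored, the host stays in the play and the second task can inspect the registered result.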
How to Reset Unreachable Hosts?
Whenever Ansible cannot connect to a host, it marks the host as ‘UNREACHABLE’ and removes it from the list of active hosts for the run. To reset this list, we can use meta: clear_host_errors, which reactivates all the hosts associated with the play so that subsequent tasks can try to use them again. It can be used as shown below:
- hosts: all
  tasks:
    - set_fact:
        was_accessible: "up"
    - meta: clear_host_errors
    - debug:
        msg: "Hello"
    - when:
        - was_accessible is defined
      debug:
        msg: "Hello again, I am up."
Running Handlers Despite Failures
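Ansible runs handlers at the end of each play. If a task notifies a handler but another task fails later in the play, the handler will not run on that host by default. As a result, the host may be left in an unexpected state even though the failure is unrelated to the change that notified the handler.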
To tackle this problem, we can use the following options:
1. Using the --force-handlers command line option
2. Including force_handlers: True in a play
3. Setting force_handlers=True in ansible.cfg configuration file.
- hosts: all
  force_handlers: true
When we force handlers to run, the handlers will run when notified even if a task has failed on the host.
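To make the effect concrete, here is a small sketch of a play in which a handler is notified and a later task fails. The task names and the handler name "Report change" are hypothetical and chosen only for illustration.

```yaml
- hosts: all
  force_handlers: true
  tasks:
    # The command module reports "changed" by default, so the handler is notified.
    - name: Task that notifies a handler
      command: /bin/true
      notify: Report change   # hypothetical handler name

    # Without force_handlers, this failure would stop the notified handler from running.
    - name: Task that fails afterwards
      command: /bin/false

  handlers:
    - name: Report change
      debug:
        msg: "Handler ran even though a later task failed."
```

With force_handlers removed from this play, the handler would be skipped on every host where the second task fails.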
How to Define Failures?
Ansible provides the failed_when conditional to let us define what “failure” means for a task. When failed_when is given a list of conditions, they are joined with an implicit and, so the task is marked as failed only when all of the conditions are met. To register a failure when any one of several conditions is met, we can combine them in a single expression with the or operator.
- name: Web page fetcher
  hosts: all
  tasks:
    - name: Fetch webpage
      uri:
        url: https://somewebsite.com
        return_content: true
      register: output
    - name: Check Content
      debug:
        msg: "Checking content..."
      failed_when:
        - '"Some Content" not in output.content'
        - '"Some other content" not in output.content'
With the or operator, the condition is written as a single expression instead of a list:
  failed_when: output.number == 0 or "No such" not in output.stdout
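As a fuller illustration of an or condition, the sketch below redefines failure in terms of a command's return code. It is an assumption-laden example, not part of the playbook above: the file paths and the variable name diff_result are hypothetical.

```yaml
- hosts: all
  tasks:
    # diff exits with 0 when the files are identical, 1 when they differ,
    # and 2 or more when something went wrong.
    - name: Fail when the files are identical or diff itself errors
      command: diff /tmp/file1 /tmp/file2   # hypothetical paths
      register: diff_result
      failed_when: diff_result.rc == 0 or diff_result.rc >= 2
```

Here a return code of 1, which Ansible would normally treat as a failure, counts as success because it is the only value that satisfies neither condition.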
How to Abort a Play?
When there are failures in a play, it is sometimes essential to abort the entire play instead of just dropping the failed host. In this scenario, we can use the any_errors_fatal option. When a task returns an error, the hosts in the current batch are given the opportunity to finish the fatal task, and after that the execution of the play is stopped on all hosts. Subsequent tasks and plays are not run.
We can use this option as shown below:
- hosts: somehosts
  any_errors_fatal: true
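A slightly longer sketch shows where the play stops. The task names are hypothetical, and /bin/false simply stands in for any step that fails.

```yaml
- hosts: somehosts
  any_errors_fatal: true
  tasks:
    # Stands in for a step that fails on at least one host.
    - name: Step that must succeed everywhere
      command: /bin/false

    # Not reached on any host, because the task above failed.
    - name: Follow-up step
      debug:
        msg: "All hosts passed the previous step."
```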
Conclusion
We have seen throughout this blog that there are multiple ways to handle errors in Ansible playbooks. We also saw that we can define what “failure” means in our playbooks and what actions we can take when we encounter one!
References
https://docs.ansible.com/ansible/latest/user_guide/playbooks_error_handling.html
