Error Handling in Pentaho Data Integration

Error handling is an important part of building any application. It makes an engineer's life easier when there is a proper way to understand what went wrong. Pentaho Data Integration (Kettle) provides a simple mechanism for error handling. The only effort required is to define an error handling output on a step and route the error rows to it.

Transformation steps may encounter errors at many levels. They may encounter unexpected data or problems with the execution environment. Depending on the nature of the error, the step may decide to stop the transformation by throwing an exception, or it may support the PDI Error Handling feature, which allows us to divert bad rows to an error handling step.

Throwing a KettleException: Calling a Hard Stop

If a step encounters an error during row processing, it may log an error and stop the transformation. This is done by calling setErrors(), stopAll(), setOutputDone(), and returning false from processRow(). Alternatively, the step can throw a KettleException, which also causes the transformation to stop.

It is sensible to stop the transformation when there is a problem with the environment or configuration of a step. 
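
The hard-stop sequence described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PDI API itself: HardStopStepSketch is a hypothetical stand-in class, and only the method names setErrors(), stopAll(), setOutputDone() and processRow() mirror the ones named in the text. The simplified processRow(String) signature and the integer-parsing check are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a PDI step. It does not extend the real BaseStep;
// it only imitates the calls named in the text so the flow can be run alone.
class HardStopStepSketch {
    long errors = 0;
    boolean stopped = false;
    boolean outputDone = false;
    final List<String> log = new ArrayList<>();

    void setErrors(long n) { errors = n; }
    void stopAll() { stopped = true; }
    void setOutputDone() { outputDone = true; }
    void logError(String message) { log.add(message); }

    // Returns true if the step wants to be called again for the next row,
    // false once it is finished (normally or because of an error).
    boolean processRow(String row) {
        if (row == null) {            // assumed convention: null means no more input
            setOutputDone();
            return false;
        }
        try {
            Integer.parseInt(row.trim());   // example check: the row must be an integer
        } catch (NumberFormatException e) {
            logError("Unexpected value: " + row);
            setErrors(1);             // record that an error occurred
            stopAll();                // ask the whole transformation to stop
            setOutputDone();          // signal that no more rows will follow
            return false;             // do not call processRow() again
        }
        return true;
    }
}
```

Calling processRow("42") on this sketch returns true and leaves the step running, while processRow("abc") records one error, marks the step as stopped, and returns false.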

Implementing Per-Row Error Handling

You may want to divert bad rows to a specific error handling step. This capability is referred to as the Error Handling feature. A step supporting this feature overrides the supportsErrorHandling() method inherited from BaseStepMeta to return true. This enables you to specify a target step for bad rows in the Spoon UI. During runtime, the step checks whether you configured a target step for error rows by calling getStepMeta().isDoingErrorHandling(). If error rows are diverted, the step passes the offending input row to putError() and provides additional information about the errors encountered. It does not throw a KettleException.
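
The per-row flow described above can be sketched as follows. This is a self-contained illustration under stated assumptions, not PDI code: ErrorRoutingStepSketch is a hypothetical stand-in, isDoingErrorHandling() stands in for getStepMeta().isDoingErrorHandling(), the putError() here omits the row metadata argument that the real method takes, and IllegalStateException stands in for KettleException. The field name "id" and the code "PARSE_001" are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a step that supports the Error Handling feature.
class ErrorRoutingStepSketch {
    private final boolean doingErrorHandling;          // true if an error target step is configured
    final List<Object[]> goodRows = new ArrayList<>();
    final List<Object[]> errorRows = new ArrayList<>();

    ErrorRoutingStepSketch(boolean doingErrorHandling) {
        this.doingErrorHandling = doingErrorHandling;
    }

    // A step that can divert bad rows advertises it by returning true here.
    boolean supportsErrorHandling() { return true; }

    boolean isDoingErrorHandling() { return doingErrorHandling; }

    void putRow(Object[] row) { goodRows.add(row); }

    // Records the offending row together with the error details.
    void putError(Object[] row, long nrErrors, String description,
                  String fieldName, String errorCode) {
        errorRows.add(new Object[] { row, nrErrors, description, fieldName, errorCode });
    }

    void processRow(Object[] row) {
        try {
            int id = Integer.parseInt(String.valueOf(row[0]));
            putRow(new Object[] { id });
        } catch (NumberFormatException e) {
            if (isDoingErrorHandling()) {
                // Divert the bad row instead of stopping the transformation.
                putError(row, 1L, e.getMessage(), "id", "PARSE_001");
            } else {
                // No error target configured: fall back to a hard stop.
                throw new IllegalStateException("Cannot parse id", e);
            }
        }
    }
}
```

With error handling enabled, a row such as { "x7" } ends up in errorRows and processing continues with the next row. With it disabled, the same row raises an exception, which corresponds to the hard stop.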

The image below gives a basic idea of error handling in Pentaho Data Integration. The red dotted line that starts from the “CSV Input” step indicates the error hop.

To set up error handling, we simply right-click the input step, or whichever step we want to capture errors from, and select “Define error handling…”.

Once we have selected “Define error handling…”, we will see the screen below. We can choose any field names we like. These fields describe what went wrong: whenever a row causes an error, the error details are written to these fields. With them, we can decide on the intended action and know what went wrong.

The fields that can be defined are as follows:

  • Nr of errors fieldname: an integer field that holds the number of errors found in the row.
  • Error descriptions fieldname: holds the error descriptions, such as “Error inserting row” or “Data truncation: Out of range value adjusted for column ‘id’ at row 1”.
  • Error fields fieldname: holds the names of the fields that caused the error.
  • Error codes fieldname: holds the error code values.
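
To make the four fields concrete, here is a small sketch of what a row on the error hop might look like once the error fields are appended to the original input row. It is an illustration only: ErrorFieldsSketch is a hypothetical helper, and the names error_count, error_desc, error_field and error_code are assumed examples of what could be typed into the dialog, not PDI defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that models an error row as "input fields + four error fields".
class ErrorFieldsSketch {
    static Map<String, Object> toErrorRow(Map<String, Object> inputRow, long nrErrors,
                                          String descriptions, String fields, String codes) {
        // Keep the original fields first, in their original order.
        Map<String, Object> out = new LinkedHashMap<>(inputRow);
        out.put("error_count", nrErrors);     // Nr of errors fieldname (assumed name)
        out.put("error_desc", descriptions);  // Error descriptions fieldname (assumed name)
        out.put("error_field", fields);       // Error fields fieldname (assumed name)
        out.put("error_code", codes);         // Error codes fieldname (assumed name)
        return out;
    }
}
```

For an input row containing only id = "x7", the resulting error row has five entries: the original id followed by the four error fields, which a downstream step can log or write to a file.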

If you want to read about the basics of PDI, you can follow this blog.

