Paradigms in Pentaho Data Integration


Pentaho Data Integration (PDI) has three paradigms for storing user input:

  1. Arguments
  2. Parameters
  3. Variables

Arguments

A PDI argument is a named, user-supplied, single-value input given as a command-line argument (when running a transformation or job manually from Pan or Kitchen, or as part of a script).

Each transformation or job can have a maximum of 10 arguments. Each argument is declared as a space-separated value given after the rest of the Pan or Kitchen command line:
sh pan.sh -file:/example_transformations/example.ktr argOne argTwo argThree
In the above example, the values argOne, argTwo, and argThree are passed into the transformation, where they will be handled according to the way the transformation is designed. If the transformation is not designed to handle arguments, nothing will happen. Typically these values would be numbers, words (strings), or variables (system or script variables, not PDI variables).
In Spoon, you can test argument handling by defining a set of arguments when you run a transformation or job. To do this, type values into the Arguments fields in the Execute a Job or Execute a Transformation dialogue.
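To make the positional behavior concrete, here is a minimal Python sketch (not Pentaho code) of how the tokens after pan.sh could split into options and up to ten positional arguments. The rule that any "-name:value" token is an option, and the helper names split_command_line and command_line_argument, are assumptions made for illustration; inside a real transformation, arguments are typically read with a step such as Get System Info.

```python
# Conceptual model of Pan/Kitchen positional arguments (not Pentaho code).
# Assumption: a token shaped like "-name:value" is an option; every other
# token is a positional argument. Real Pan accepts more option forms.

MAX_ARGUMENTS = 10  # limit per transformation or job, as stated above


def split_command_line(tokens):
    """Return (options, arguments) for a Pan-style list of tokens."""
    options = {}
    arguments = []
    for token in tokens:
        if token.startswith("-") and ":" in token:
            name, _, value = token[1:].partition(":")
            options[name] = value
        else:
            arguments.append(token)
    if len(arguments) > MAX_ARGUMENTS:
        raise ValueError(
            f"at most {MAX_ARGUMENTS} arguments are allowed, got {len(arguments)}"
        )
    return options, arguments


def command_line_argument(arguments, position):
    """1-based lookup; None when nothing was supplied at that position."""
    if 1 <= position <= len(arguments):
        return arguments[position - 1]
    return None


if __name__ == "__main__":
    tokens = "-file:/example_transformations/example.ktr argOne argTwo argThree".split()
    options, arguments = split_command_line(tokens)
    print(options["file"])                      # /example_transformations/example.ktr
    print(command_line_argument(arguments, 2))  # argTwo
    print(command_line_argument(arguments, 4))  # None
```

Because arguments are positional, a transformation that expects a value in position 4 simply sees nothing when only three are supplied, which matches the "nothing will happen" behavior described above.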

Parameters

Parameters are like local variables: they are reusable inputs that apply only to the specific transformation in which they are defined. When defining a parameter, you can assign it a default value to use if no value is supplied for it.
This default-value feature makes parameters unique among dynamic input types in PDI.
If there is a name collision between a parameter and a variable, the parameter will take precedence.
To define a parameter, right-click on the transformation workspace and select Transformation settings from the context menu (or just press Ctrl-T), then click on the Parameters tab.
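The resolution rules above (a supplied value beats the default, and a parameter beats a variable of the same name) can be modelled in a few lines. This is a hedged Python sketch, not Kettle's implementation; it assumes caller values arrive as Pan/Kitchen "-param:NAME=VALUE" tokens, and the parameter names INPUT_DIR and RUN_DATE are hypothetical.

```python
# Conceptual model of parameter resolution (not Kettle's implementation).
# Assumptions: declared parameters map name -> default (None = no default);
# caller-supplied values arrive as "-param:NAME=VALUE" tokens.


def params_from_command_line(tokens):
    """Collect NAME=VALUE pairs from -param:NAME=VALUE tokens."""
    supplied = {}
    for token in tokens:
        if token.startswith("-param:"):
            name, _, value = token[len("-param:"):].partition("=")
            supplied[name] = value
    return supplied


def resolve_parameters(declared, supplied):
    """A supplied value wins; otherwise fall back to the declared default."""
    resolved = {}
    for name, default in declared.items():
        if name in supplied:
            resolved[name] = supplied[name]
        elif default is not None:
            resolved[name] = default
        # a parameter with neither a value nor a default stays unresolved here
    return resolved


def lookup(name, parameters, variables):
    """On a name collision, the parameter takes precedence over the variable."""
    if name in parameters:
        return parameters[name]
    return variables.get(name)


if __name__ == "__main__":
    declared = {"INPUT_DIR": "/data/in", "RUN_DATE": None}  # hypothetical
    supplied = params_from_command_line(["-file:/x.ktr", "-param:RUN_DATE=2020-01-31"])
    params = resolve_parameters(declared, supplied)
    print(params)  # {'INPUT_DIR': '/data/in', 'RUN_DATE': '2020-01-31'}
    print(lookup("INPUT_DIR", params, {"INPUT_DIR": "/var/in"}))  # /data/in
```

Keeping parameters and variables in separate mappings makes the precedence rule a single ordered lookup rather than something that depends on which value was written last.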
VFS Properties
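VFS properties are set through variables whose names take the form vfs.scheme.property.host, made up of the following subparts.
The vfs subpart is required to identify this as a virtual filesystem configuration property.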

The scheme subpart represents the VFS driver’s scheme (or VFS type), such as http, sftp, or zip.

The property subpart is the name of a VFS driver’s ConfigBuilder’s setter (the specific VFS element that you want to set). The optional host subpart defines a specific IP address or hostname that this setting applies to.
You must consult each scheme’s API reference to determine which properties you can create variables for.
Apache provides VFS scheme documentation at http://commons.apache.org/vfs/apidocs/index.html. The org.apache.commons.vfs.provider package lists each of the configurable VFS providers (ftp, http, sftp, etc.).

Each provider has a FileSystemConfigBuilder class that in turn has set*(FileSystemOptions, Object) methods. If a method’s second parameter is a String or a number (Integer, Long, etc.), then you can create a PDI variable to set the value for VFS dialogues.
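Because the host subpart is optional and may itself contain dots (an IP address or a hostname), a variable name of the form vfs.scheme.property.host has to be split on its first three dots only. The Python sketch below illustrates that reading; the property name "timeout" and the addresses are hypothetical placeholders, not settings taken from any particular VFS provider.

```python
from typing import NamedTuple, Optional


class VfsProperty(NamedTuple):
    scheme: str          # VFS driver scheme, e.g. "sftp", "http", "zip"
    prop: str            # name of the ConfigBuilder setting
    host: Optional[str]  # None means "applies to every host"


def parse_vfs_variable(name):
    """Split vfs.scheme.property[.host]; the host may itself contain dots."""
    parts = name.split(".", 3)  # at most 4 pieces, so "10.0.0.5" stays whole
    if len(parts) < 3 or parts[0] != "vfs" or not parts[1] or not parts[2]:
        raise ValueError(f"not a VFS property variable: {name!r}")
    host = parts[3] if len(parts) == 4 and parts[3] else None
    return VfsProperty(scheme=parts[1], prop=parts[2], host=host)


def applies_to(setting, host):
    """A setting without a host applies everywhere; otherwise only to that host."""
    return setting.host is None or setting.host == host


if __name__ == "__main__":
    # "timeout" and the addresses are hypothetical placeholders.
    print(parse_vfs_variable("vfs.sftp.timeout.10.0.0.5"))
    # VfsProperty(scheme='sftp', prop='timeout', host='10.0.0.5')
    print(applies_to(parse_vfs_variable("vfs.http.timeout"), "example.com"))  # True
```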

Variables

A variable in PDI is a piece of user-supplied information that can be used dynamically and programmatically in a variety of different scopes.

A variable can be local to a single step or be available to the entire JVM that PDI is running in.
PDI variables can be used in steps in both jobs and transformations.

You define variables with the Set Variable step in a transformation, by hand through the kettle.properties file, or through the Set Environment Variables dialogue in the Edit menu.
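The kettle.properties file holds Java .properties-style KEY=VALUE lines. As a rough illustration only, the Python sketch below reads such lines into a dictionary. It deliberately ignores the rest of the .properties syntax (escapes, line continuations, ":" or whitespace separators), and the entries shown are hypothetical.

```python
# Simplified reader for kettle.properties-style text (illustration only).
# Handles KEY=VALUE lines and '#' or '!' comments; does NOT handle the full
# Java .properties grammar (escapes, continuations, other separators).


def read_kettle_properties(text):
    variables = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:  # lines without '=' are skipped in this sketch
            # surrounding whitespace is stripped (a simplification)
            variables[key.strip()] = value.strip()
    return variables


if __name__ == "__main__":
    sample = """
# hypothetical entries
OUTPUT_DIR=/data/out
DB_HOST = localhost
"""
    print(read_kettle_properties(sample))
    # {'OUTPUT_DIR': '/data/out', 'DB_HOST': 'localhost'}
```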
The Get Variable step can explicitly retrieve a value from a variable, or you can reference the variable in any PDI text field that has the diamond dollar sign icon next to it, using a metadata string in either the Unix or Windows format:
• ${VARIABLE}
• %%VARIABLE%%
Both formats can be used, and even mixed. In fact, you can create variable recursion by alternating between the Unix and Windows syntaxes. For example, if you wanted to resolve a variable that depends on another variable, you could use this example: ${%%inner_var%%}.
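To see why alternating the two syntaxes gives recursion, here is a small Python sketch of a substitution pass. It assumes Windows-style references are resolved before Unix-style ones (the ordering that lets ${%%inner_var%%} work) and that unknown names are left as written; both are assumptions of this sketch rather than documented PDI behavior, and the variable names are hypothetical.

```python
import re

# Assumed resolution order: Windows-style %%NAME%% first, then Unix-style ${NAME}.
WINDOWS_REF = re.compile(r"%%([^%]+)%%")
UNIX_REF = re.compile(r"\$\{([^${}]+)\}")


def substitute(text, variables):
    """Replace variable references in text; unknown names are left untouched."""

    def replace(match):
        return variables.get(match.group(1), match.group(0))

    text = WINDOWS_REF.sub(replace, text)  # inner %%...%% references resolve first
    text = UNIX_REF.sub(replace, text)     # then any ${...} they produced
    return text


if __name__ == "__main__":
    variables = {"inner_var": "outer_var", "outer_var": "/data/out", "TMP": "/tmp"}
    print(substitute("${TMP}/a and %%TMP%%/b", variables))  # /tmp/a and /tmp/b
    print(substitute("${%%inner_var%%}", variables))        # /data/out
    print(substitute("${MISSING}", variables))              # ${MISSING}
```

In the nested case, the first pass turns ${%%inner_var%%} into ${outer_var}, and the second pass resolves that to /data/out.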

Variable Scope

The scope of a variable is defined by the location of its definition.

There are two types of variables: global environment variables and Kettle variables.

Environment Variables

This is the traditional variable type in PDI. You define an environment variable through the Set Environment Variables dialogue in the Edit menu, or by hand by passing it as an option to the Java Virtual Machine (JVM) with the -D flag. Environment variables are an easy way to specify the location of temporary files in a platform-independent way; for example, the ${java.io.tmpdir} variable points to the /tmp/ directory on Unix/Linux/OS X and to a directory under C:\Documents and Settings\ on Windows.
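The -D flag sets a JVM system property. As a sketch of that mechanism only, the Python function below turns -Dname=value options into a mapping seeded with a platform-dependent temporary directory. Python's tempfile.gettempdir() stands in for Java's java.io.tmpdir here, which is an analogy rather than the same lookup, and the option values are hypothetical.

```python
import tempfile


def system_properties_from_jvm_options(options):
    """Turn JVM-style '-Dname=value' options into a name -> value mapping."""
    # Platform-dependent default, standing in for java.io.tmpdir.
    properties = {"java.io.tmpdir": tempfile.gettempdir()}
    for option in options:
        if option.startswith("-D"):  # other JVM options (e.g. -Xmx2g) are ignored
            name, _, value = option[2:].partition("=")
            if name:
                properties[name] = value  # a bare -Dname maps to an empty string
    return properties


if __name__ == "__main__":
    props = system_properties_from_jvm_options(
        ["-Xmx2g", "-DOUTPUT_DIR=/data/out", "-Djava.io.tmpdir=/scratch"]
    )
    print(props["OUTPUT_DIR"])      # /data/out
    print(props["java.io.tmpdir"])  # /scratch
```

An explicit -D option overrides the platform default, which is why passing properties on the command line is a convenient way to point temporary files somewhere specific.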

The only problem with using environment variables is that they cannot be used dynamically. Changes to environment variables are visible to all software running on the virtual machine, so if you run two or more transformations or jobs at the same time on the same application server, you may get conflicts.
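The conflict is easy to reproduce with any shared, process-wide mapping. In this Python sketch, a single dictionary stands in for JVM-wide environment variables, and two threads stand in for two jobs that happen to use the same hypothetical variable name, OUTPUT_DIR. A barrier forces both writes to happen before either read, so the outcome does not depend on scheduling luck.

```python
import threading

# Assumption: one shared dict stands in for JVM-wide environment variables.
shared_environment = {}


def run_job(job_name, barrier, results):
    shared_environment["OUTPUT_DIR"] = f"/data/{job_name}"  # each job sets "its" value
    barrier.wait()                                          # both jobs have now written
    results[job_name] = shared_environment["OUTPUT_DIR"]    # read it back afterwards


def simulate_two_jobs():
    barrier = threading.Barrier(2)
    results = {}
    threads = [
        threading.Thread(target=run_job, args=(name, barrier, results))
        for name in ("job_a", "job_b")
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    results = simulate_two_jobs()
    # Both jobs read the same value, so one of them got the other job's directory.
    print(results)
```

Whichever job wrote last wins, and the other job silently reads a directory it never set.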
Kettle Variables
Kettle variables provide a way to store small pieces of information dynamically in a narrower scope than environment variables. A Kettle variable is local to Kettle, and can be scoped down to the job or transformation in which it is set, or up to a related job. The Set Variable step in a transformation allows you to specify the related job that you want to limit the scope to; for example, the parent job, grandparent job, or the root job.
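A narrower scope can be pictured as a chain of jobs, each holding its own variables. The Python sketch below models a Set Variable step that writes into the parent, grandparent, or root job, and models visibility as a lookup that walks up the parent chain. That lookup is an approximation (PDI itself passes variables down to child jobs and transformations when they start), and the Scope class, set_variable helper, and job names are hypothetical.

```python
class Scope:
    """A job or transformation holding its own variables, linked to its parent job."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.variables = {}

    def get(self, key):
        """Look in this scope, then in each enclosing job up to the root."""
        scope = self
        while scope is not None:
            if key in scope.variables:
                return scope.variables[key]
            scope = scope.parent
        return None

    def ancestor(self, levels):
        scope = self
        for _ in range(levels):
            if scope.parent is None:
                raise ValueError(f"{self.name} has no ancestor {levels} level(s) up")
            scope = scope.parent
        return scope

    def root(self):
        scope = self
        while scope.parent is not None:
            scope = scope.parent
        return scope


def set_variable(transformation, key, value, target):
    """Store the value in the parent, grandparent, or root job of the caller."""
    if target == "parent":
        scope = transformation.ancestor(1)
    elif target == "grandparent":
        scope = transformation.ancestor(2)
    elif target == "root":
        scope = transformation.root()
    else:
        raise ValueError(f"unknown target scope: {target!r}")
    scope.variables[key] = value


if __name__ == "__main__":
    root_job = Scope("root_job")
    child_job = Scope("child_job", parent=root_job)
    transformation = Scope("load_customers", parent=child_job)

    set_variable(transformation, "BATCH_ID", "42", "parent")
    print(child_job.get("BATCH_ID"), root_job.get("BATCH_ID"))  # 42 None

    set_variable(transformation, "RUN_TAG", "nightly", "root")
    print(child_job.get("RUN_TAG"))  # nightly
```

Unlike the shared environment, a value stored in child_job here is invisible to root_job and to any unrelated job, which is the point of the narrower Kettle scope.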

Conclusion

In this post, we learned about the three paradigms PDI offers for storing user input: arguments, parameters, and variables.


Written by 

Chiranjeev Kumar is a software intern at Knoldus. He is passionate about Java programming. He is recognized as a good team player, a dedicated and responsible professional, and a technology enthusiast. He is a quick learner and curious to learn new technologies. His hobbies include listening to music and playing video games.