Reading Time: 5 minutes Merging Data Streams. This section of the blog discusses what to keep in mind when merging data from different streams. Although many steps can receive multiple inputs and therefore merge data, it is better to use specialized steps, because they ensure that the rows are merged in a particular order.
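The ordering point can be illustrated outside Pentaho. The Python sketch below is not Pentaho's implementation. It assumes two hypothetical input streams that are each already sorted by an `id` field. It contrasts two deterministic ways of combining them: a sorted merge (via the standard library's `heapq.merge`) that keeps the combined output ordered by the key, and an append that emits every row of the first stream before any row of the second.

```python
import heapq
from operator import itemgetter

# Two hypothetical input streams, each already sorted by "id".
stream_a = [{"id": 1, "src": "a"}, {"id": 4, "src": "a"}, {"id": 7, "src": "a"}]
stream_b = [{"id": 2, "src": "b"}, {"id": 3, "src": "b"}, {"id": 9, "src": "b"}]


def sorted_merge(*streams, key):
    """Merge pre-sorted streams into one stream that stays sorted by `key`.

    heapq.merge consumes its inputs lazily, so generators work as well as lists.
    """
    return heapq.merge(*streams, key=key)


def append_streams(head, tail):
    """Emit every row of `head` first, then every row of `tail`."""
    yield from head
    yield from tail


merged = list(sorted_merge(stream_a, stream_b, key=itemgetter("id")))
appended = list(append_streams(stream_a, stream_b))

print([row["id"] for row in merged])    # [1, 2, 3, 4, 7, 9]
print([row["id"] for row in appended])  # [1, 4, 7, 2, 3, 9]
```

Both functions produce the same rows in a predictable order. A generic step that simply reads from several inputs offers no such guarantee about how rows from different streams interleave.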

Pentaho – Hadoop Cluster connection

Reading Time: 2 minutes Prerequisite: Basic overview of Pentaho. With Pentaho, you can solve many big-data analytics problems without writing a single line of code and generate the required results or output for analysis. It can establish connections with Big Data platforms such as Google Dataproc, Hortonworks Data Platform (HDP), and Amazon Elastic MapReduce (EMR), and it can also be integrated with their services, such as HDFS, …

Introduction to Pentaho Data Integration

Reading Time: 4 minutes Pentaho Data Integration (PDI) provides Extract, Transform, and Load (ETL) capabilities that facilitate capturing, cleansing, and storing data in a uniform, consistent format that is accessible and relevant to end users and IoT technologies. What is Pentaho? Pentaho is business intelligence (BI) software: a suite of tools that provides solutions for Data …
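To make the three ETL stages concrete, here is a minimal Python sketch, assuming a small hypothetical CSV input and an in-memory SQLite table named `sales`. PDI builds the equivalent pipeline visually from steps; this code only illustrates the idea of capturing raw rows, cleansing them into one consistent format, and storing the result.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with inconsistent formatting (stray spaces, mixed case,
# a missing value). In PDI this would come from an input step.
RAW_CSV = """name,country,amount
 Alice ,in,100
BOB,IN,250
carol,us,
"""


def extract(text):
    """Extract: read the raw rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse rows into one consistent format."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # drop rows with a missing amount
        clean.append((
            row["name"].strip().title(),     # " Alice " -> "Alice", "BOB" -> "Bob"
            row["country"].strip().upper(),  # "in" -> "IN"
            int(amount),
        ))
    return clean


def load(rows, conn):
    """Load: store the cleansed rows in a database table."""
    conn.execute("CREATE TABLE sales (name TEXT, country TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM sales ORDER BY name").fetchall())
# [('Alice', 'IN', 100), ('Bob', 'IN', 250)]
```

Keeping each stage in its own function mirrors how an ETL tool separates input, transformation, and output steps, so each stage can be changed or tested independently.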

PDI: An Introduction to Spoon

Reading Time: 4 minutes Prerequisites: Basic knowledge of Big Data and ETL. What is PDI? PDI stands for Pentaho Data Integration. It is a tool that provides ETL capabilities for managing huge and complex data ingestion pipelines. Its use cases include loading huge data sets into databases, performing simple to complex transformations on data, migrating data between different databases, and many more. Installing PDI in your …