ETL

Apache Beam: Side input Pattern

Reading Time: 3 minutes

Apache Beam is a unified programming model for defining both batch and streaming data-parallel processing pipelines. It is a modern way of defining data processing pipelines and offers a rich set of APIs and mechanisms for solving complex use cases. In some use cases, the pipeline we define needs to consume additional inputs alongside its main input. For example, in streaming analytics applications, it… Continue Reading
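
As a rough illustration of what such "additional inputs" look like in Beam, here is a minimal Java sketch of the side input pattern: a small extra PCollection is turned into a PCollectionView and read inside a ParDo alongside the main input. The word list and the maxLength threshold are made-up values for the example, not taken from the post.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class SideInputExample {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    // Main input: a collection of words to process (example data).
    PCollection<String> words =
        pipeline.apply("Words", Create.of("beam", "side", "input"));

    // Additional input: a single value broadcast to all workers as a side input.
    PCollectionView<Integer> maxLengthView =
        pipeline.apply("MaxLength", Create.of(4)).apply(View.asSingleton());

    // Read the side input inside the DoFn to filter the main input.
    PCollection<String> filtered =
        words.apply("FilterByMaxLength",
            ParDo.of(new DoFn<String, String>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                int maxLength = c.sideInput(maxLengthView);
                if (c.element().length() <= maxLength) {
                  c.output(c.element());
                }
              }
            }).withSideInputs(maxLengthView));

    pipeline.run().waitUntilFinish();
  }
}
```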

PDI: An Introduction to Spoon

Reading Time: 4 minutes

Prerequisites: Basic knowledge of Big Data and ETL. What is PDI? PDI stands for Pentaho Data Integration. It is a tool that provides ETL capabilities to effectively manage huge and complex data ingestion pipelines. Its use cases include: loading huge data sets into databases, performing simple to complex transformations on data, data migration between different databases, and many more… Installing PDI in your… Continue Reading
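
Although most PDI work happens visually in Spoon, a transformation designed there and saved as a .ktr file can also be run from Java through the Kettle API. The sketch below is a minimal, hedged example assuming a standard PDI installation on the classpath; the file name load_customers.ktr is hypothetical.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunTransformation {
  public static void main(String[] args) throws Exception {
    // Initialise the Kettle environment (loads steps, plugins, etc.).
    KettleEnvironment.init();

    // Load a transformation created in Spoon; the .ktr path is hypothetical.
    TransMeta transMeta = new TransMeta("load_customers.ktr");

    // Execute the transformation and wait for it to finish.
    Trans trans = new Trans(transMeta);
    trans.execute(new String[0]);
    trans.waitUntilFinished();

    if (trans.getErrors() > 0) {
      throw new RuntimeException("Transformation finished with errors");
    }
  }
}
```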