The Apache Beam Programming Guide is intended for Beam users who want to use the Beam SDKs to create data processing pipelines. It provides guidance for using the Beam SDK classes to build and test your pipeline. The programming guide is not intended as an exhaustive reference, but as a language-agnostic, high-level guide to programmatically building your Beam pipeline. As the programming guide is filled out, the text will include code samples in multiple languages to help illustrate how to implement Beam concepts in your pipelines.
To use Beam, you first need to create a driver program using the classes in one of the Beam SDKs. Your driver program defines your pipeline, including all of the inputs, transforms, and outputs; it also sets execution options for your pipeline. These include the Pipeline Runner, which in turn determines what back-end your pipeline will run on.
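As a minimal sketch of how a driver program picks up those execution options, they can be parsed from command-line arguments. The flag value below is only an example, and it assumes the chosen runner's artifact (such as the Direct Runner) is on the classpath:

import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class OptionsExample {
  public static void main(String[] args) {
    // Parse execution options from the command line, e.g. --runner=DirectRunner.
    PipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().create();
    // The configured runner determines which back-end executes the pipeline.
    System.out.println("Runner: " + options.getRunner().getSimpleName());
  }
}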
Terms
Pipeline: A Pipeline encapsulates your entire data processing task, from start to finish. In other words, it is a single program that reads input data, transforms that data, and writes output data. All Beam programs must create a Pipeline.
PCollection: A PCollection represents a distributed dataset that your Beam pipeline operates on. The data can come from a bounded source such as a file, or from an unbounded source such as Kafka.
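For instance, a minimal sketch of a bounded PCollection read from text files (assuming a Pipeline p already exists; the path is hypothetical):

PCollection<String> lines = p.apply(TextIO.read().from("/tmp/input/*.txt"));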
PTransform: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes one or more PCollection objects as input, performs a processing operation on their elements, and produces zero or more output PCollection objects.
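Continuing from the hypothetical lines collection above, a small sketch of a PTransform that produces a new PCollection without modifying its input:

PCollection<String> upper =
    lines.apply(MapElements.into(TypeDescriptors.strings())
        .via((String s) -> s.toUpperCase()));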
IO transforms: Beam ships with a large number of IOs, library transforms for reading data from and writing data to external storage systems.
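For instance, TextIO pairs a read from one system with a write to another (both paths here are hypothetical):

PCollection<String> input = p.apply(TextIO.read().from("/data/input/*.csv"));
input.apply(TextIO.write().to("/data/output/result").withSuffix(".txt"));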
Steps to construct a pipeline:
- Create a Pipeline object and set the pipeline execution options, including the Pipeline Runner.
- Create an initial PCollection for pipeline data, either using the IOs to read data from an external storage system, or using a Create transform to build a PCollection from in-memory data.
- Apply PTransforms to each PCollection. Transforms can change, filter, group, analyze, or otherwise process the elements in a PCollection. A transform creates a new output PCollection without modifying the input collection. A typical pipeline applies subsequent transforms to each new output PCollection in turn until processing is complete. However, note that a pipeline does not have to be a single straight line of transforms applied one after another: think of PCollections as variables and PTransforms as functions applied to these variables; the shape of the pipeline can be an arbitrarily complex processing graph.
- Use IOs to write the final, transformed PCollection(s) to an external sink.
- Run the pipeline using the designated Pipeline Runner. (A sketch mapping these steps to code follows this list.)
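A minimal sketch that maps the five steps above to code (the paths and the counting logic are illustrative assumptions, not part of the original example):

// Step 1: create the pipeline and its execution options.
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
// Step 2: create an initial PCollection by reading from external storage.
PCollection<String> lines = p.apply(TextIO.read().from("/tmp/input/*.txt"));
// Step 3: apply PTransforms; here, count all elements.
PCollection<Long> count = lines.apply(Count.globally());
// Step 4: write the final, transformed PCollection to an external sink.
count
    .apply(MapElements.into(TypeDescriptors.strings())
        .via((Long n) -> Long.toString(n)))
    .apply(TextIO.write().to("/tmp/output/count"));
// Step 5: run the pipeline on the designated runner and wait for completion.
p.run().waitUntilFinish();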

Data source: where the pipeline reads its data from.
Output: the path where the processed data is written.

The pipeline itself is created from a set of execution options:
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
Code extract to write data to a text file:
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class WriteToFile {
  public static void main(String[] args) {
    // Create the pipeline with default execution options.
    PipelineOptions options = PipelineOptionsFactory.create();
    Pipeline p = Pipeline.create(options);
    // Build a PCollection from in-memory data and write it out as text.
    List<String> list = Arrays.asList("a", "b", "c");
    p.apply(Create.of(list))
        .apply(TextIO.write().to("/home/knoldus/Downloads/rkr/textfile").withSuffix(".txt"));
    // Run the pipeline and block until it finishes.
    p.run().waitUntilFinish();
  }
}
Code snippet details: this code first creates the pipeline, then takes the elements from the List and writes them to a text file, provided you pass a valid path in the proper format. Note that TextIO.write() may shard the output across multiple files (named like textfile-00000-of-00001.txt by default); call withNumShards(1) on the write transform if you want a single file.
References:
https://beam.apache.org/documentation/programming-guide/