Working with the Hadoop FileSystem API


There are a number of ways to read data from and write data to the Hadoop Distributed File System (HDFS). In this post we will use the FileSystem API to create and write a file in HDFS, and then read that file back from HDFS and write it to the local file system.

To start with:

1) First, include the required dependencies in build.sbt (for an sbt project):


libraryDependencies ++= Seq(
	"org.apache.hadoop" % "hadoop-common" % "2.8.0",
	"org.apache.hadoop" % "hadoop-hdfs" % "2.8.0"
)

2) The next step is to configure the FileSystem:


// Imports used by the snippets in this post
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * This method configures the file system.
 * @param coreSitePath path to core-site.xml of the Hadoop installation
 * @param hdfsSitePath path to hdfs-site.xml of the Hadoop installation
 * @return a configured FileSystem instance, or null if configuration fails
 */
public FileSystem configureFilesystem(String coreSitePath, String hdfsSitePath) {
    FileSystem fileSystem = null;

    try {
        // Load the cluster settings (fs.defaultFS etc.) from the two site files
        Configuration conf = new Configuration();
        Path hdfsCoreSitePath = new Path(coreSitePath);
        Path hdfsHDFSSitePath = new Path(hdfsSitePath);
        conf.addResource(hdfsCoreSitePath);
        conf.addResource(hdfsHDFSSitePath);

        fileSystem = FileSystem.get(conf);
        return fileSystem;
    } catch (Exception ex) {
        System.out.println("Error occurred while configuring FileSystem");
        ex.printStackTrace();
        return fileSystem;
    }
}
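
For example, on a typical single-node setup the two configuration files live under the installation's etc/hadoop directory. A minimal usage sketch (the paths below are placeholders for your own installation, not from the snippet above):

// Placeholder paths; adjust to wherever core-site.xml and hdfs-site.xml live on your machine
FileSystem fileSystem = configureFilesystem(
        "/usr/local/hadoop/etc/hadoop/core-site.xml",
        "/usr/local/hadoop/etc/hadoop/hdfs-site.xml");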

 

3) After configuring the FileSystem, we are ready to read from or write to HDFS.

Let us start by writing a file from the local file system to HDFS. For this operation we will use the
“void copyFromLocalFile(Path src, Path dst)”
method of the FileSystem API.


/**
 * Copies a local file into HDFS.
 * @param fileSystem Hadoop FileSystem instance
 * @param sourcePath the sample input file on the local file system to be written to HDFS
 * @param destinationPath path on HDFS where the sample input file will be written
 * @return Constants.SUCCESS if the copy succeeds, Constants.FAILURE otherwise
 */
public String writeToHDFS(FileSystem fileSystem, String sourcePath, String destinationPath) {
    try {
        Path inputPath = new Path(sourcePath);
        Path outputPath = new Path(destinationPath);
        fileSystem.copyFromLocalFile(inputPath, outputPath);
        return Constants.SUCCESS;
    } catch (IOException ex) {
        System.out.println("Some exception occurred while writing file to HDFS");
        return Constants.FAILURE;
    }
}
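
The Constants class used for the return values is not part of the FileSystem API and is not shown in the snippets; a minimal sketch of what it could look like:

// Hypothetical helper class assumed by writeToHDFS above and readFileFromHdfs below
public final class Constants {
    public static final String SUCCESS = "SUCCESS";
    public static final String FAILURE = "FAILURE";

    private Constants() {
    }
}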

 

Next, we can read a file from HDFS and store it on our local file system. For this operation we can use the
“void copyToLocalFile(Path src, Path dst)”
method of the FileSystem API.


/**
 * Copies a file from HDFS to the local file system.
 * @param fileSystem Hadoop FileSystem instance
 * @param hdfsStorePath path on HDFS where the sample input file is present
 * @param localSystemPath location on the local file system to which the data read from HDFS will be written
 * @return Constants.SUCCESS if the copy succeeds, Constants.FAILURE otherwise
 */
public String readFileFromHdfs(FileSystem fileSystem, String hdfsStorePath, String localSystemPath) {
    try {
        Path hdfsPath = new Path(hdfsStorePath);
        Path localPath = new Path(localSystemPath);
        fileSystem.copyToLocalFile(hdfsPath, localPath);
        return Constants.SUCCESS;
    } catch (IOException ex) {
        System.out.println("Some exception occurred while reading file from HDFS");
        return Constants.FAILURE;
    }
}
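
One thing worth knowing: copyToLocalFile also writes a matching .crc checksum file next to the local copy. If that is not wanted, the four-argument overload can bypass it by going through the raw local file system, for example:

// delSrc = false keeps the source file on HDFS; useRawLocalFileSystem = true skips the local .crc file
fileSystem.copyToLocalFile(false, hdfsPath, localPath, true);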

 

4) The final step is to close the FileSystem once we are done reading from or writing to HDFS:


/**
 * Closes the FileSystem instance.
 * @param fileSystem Hadoop FileSystem instance to close
 */
public void closeFileSystem(FileSystem fileSystem) {
    try {
        fileSystem.close();
    } catch (Exception ex) {
        System.out.println("Unable to close Hadoop FileSystem : " + ex);
    }
}
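
Putting it all together, a minimal sketch of an end-to-end run might look like this (the HdfsFileOperations class name and all paths are our own placeholders, not taken from the snippets above):

public static void main(String[] args) {
    // Hypothetical class holding the four methods shown in this post
    HdfsFileOperations ops = new HdfsFileOperations();

    // Placeholder paths; adjust them to your own installation and data
    FileSystem fileSystem = ops.configureFilesystem(
            "/usr/local/hadoop/etc/hadoop/core-site.xml",
            "/usr/local/hadoop/etc/hadoop/hdfs-site.xml");

    System.out.println("Write: " + ops.writeToHDFS(fileSystem,
            "/tmp/sample-input.txt", "/user/hduser/sample-input.txt"));
    System.out.println("Read: " + ops.readFileFromHdfs(fileSystem,
            "/user/hduser/sample-input.txt", "/tmp/sample-output.txt"));

    ops.closeFileSystem(fileSystem);
}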

 

References:

1) https://hadoop.apache.org/docs/r2.7.1/api/index.html?org/apache/hadoop/fs/FileSystem.html


