Working with the Hadoop FileSystem API


Reading data from and writing data to the Hadoop Distributed File System (HDFS) can be done in a number of ways. In this post we will use the FileSystem API to write a file to HDFS, and then read that file back from HDFS to the local file system.

To start with:

1) First, we need to include the Hadoop dependencies (shown here for an sbt project):


libraryDependencies ++= Seq(
	"org.apache.hadoop" % "hadoop-common" % "2.8.0",
	"org.apache.hadoop" % "hadoop-hdfs" % "2.8.0"
)
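
The Java methods shown in the following steps belong to one helper class. As a minimal sketch, they rely on these imports from hadoop-common (plus a Constants class providing the SUCCESS and FAILURE strings used below):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;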

2) The next step is to configure the FileSystem:


/**
 * Configures the file system.
 * @param coreSitePath path to core-site.xml in the Hadoop installation
 * @param hdfsSitePath path to hdfs-site.xml in the Hadoop installation
 * @return a Hadoop FileSystem instance, or null if configuration fails
 */
public FileSystem configureFilesystem(String coreSitePath, String hdfsSitePath) {
    FileSystem fileSystem = null;
    try {
        // Load the cluster settings from the Hadoop site files
        Configuration conf = new Configuration();
        conf.addResource(new Path(coreSitePath));
        conf.addResource(new Path(hdfsSitePath));
        fileSystem = FileSystem.get(conf);
        return fileSystem;
    } catch (Exception ex) {
        System.out.println("Error occurred while configuring FileSystem");
        ex.printStackTrace();
        return fileSystem; // still null if configuration failed
    }
}
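
As a usage sketch, assuming a typical single-node setup where the site files live under /usr/local/hadoop/etc/hadoop (adjust these placeholder paths for your cluster):

FileSystem fileSystem = configureFilesystem(
        "/usr/local/hadoop/etc/hadoop/core-site.xml",
        "/usr/local/hadoop/etc/hadoop/hdfs-site.xml");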

 

3) After configuring the FileSystem, we are ready to read from or write to HDFS:

Let us start by writing something to HDFS from the local file system. To perform this operation we use the
“void copyFromLocalFile(Path src, Path dst)”
method of the FileSystem API.


/**
 * Copies a file from the local file system to HDFS.
 * @param fileSystem the Hadoop FileSystem instance
 * @param sourcePath the sample input file on the local file system
 * @param destinationPath the path on HDFS where the input file will be written
 * @return Constants.SUCCESS on success, Constants.FAILURE otherwise
 */
public String writeToHDFS(FileSystem fileSystem, String sourcePath, String destinationPath) {
    try {
        Path inputPath = new Path(sourcePath);
        Path outputPath = new Path(destinationPath);
        fileSystem.copyFromLocalFile(inputPath, outputPath);
        return Constants.SUCCESS;
    } catch (IOException ex) {
        System.out.println("Some exception occurred while writing the file to HDFS");
        return Constants.FAILURE;
    }
}
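
For example, copying a local file into HDFS (both paths are hypothetical placeholders):

String writeStatus = writeToHDFS(fileSystem, "/home/user/sample.txt", "/user/hadoop/sample.txt");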

 

Next, we can read from HDFS and store the file on our local file system. To perform this operation we use the
“void copyToLocalFile(Path src, Path dst)”
method of the FileSystem API.


/**
 * Copies a file from HDFS to the local file system.
 * @param fileSystem the Hadoop FileSystem instance
 * @param hdfsStorePath the path on HDFS where the sample input file is present
 * @param localSystemPath the location on the local file system where the file read from HDFS will be written
 * @return Constants.SUCCESS on success, Constants.FAILURE otherwise
 */
public String readFileFromHdfs(FileSystem fileSystem, String hdfsStorePath, String localSystemPath) {
    try {
        Path hdfsPath = new Path(hdfsStorePath);
        Path localPath = new Path(localSystemPath);
        fileSystem.copyToLocalFile(hdfsPath, localPath);
        return Constants.SUCCESS;
    } catch (IOException ex) {
        System.out.println("Some exception occurred while reading the file from HDFS");
        return Constants.FAILURE;
    }
}
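
Reading it back is symmetric (again with placeholder paths):

String readStatus = readFileFromHdfs(fileSystem, "/user/hadoop/sample.txt", "/home/user/sample-copy.txt");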

 

4) The final step is to close the FileSystem once we are done reading from or writing to HDFS:


/**
 * Closes the FileSystem instance.
 * @param fileSystem the Hadoop FileSystem instance to close
 */
public void closeFileSystem(FileSystem fileSystem) {
    try {
        fileSystem.close();
    } catch (Exception ex) {
        System.out.println("Unable to close Hadoop FileSystem: " + ex);
    }
}
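
Since close() releases the connection to the cluster, a safe pattern is to do the work in a try/finally block so the FileSystem is always closed. A minimal sketch using the methods above (the path variables are placeholders):

FileSystem fileSystem = configureFilesystem(coreSitePath, hdfsSitePath);
try {
    writeToHDFS(fileSystem, localInputPath, hdfsPath);
    readFileFromHdfs(fileSystem, hdfsPath, localOutputPath);
} finally {
    closeFileSystem(fileSystem);
}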

 

References:

1) Hadoop FileSystem API documentation: https://hadoop.apache.org/docs/r2.7.1/api/index.html?org/apache/hadoop/fs/FileSystem.html



