In this blog, I will present a Java program that appends to a file in HDFS. I will be using Maven as the build tool.
To start with, we need to add the following Maven dependencies to pom.xml:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.8.0</version>
    </dependency>
</dependencies>
Now we need to import the following classes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.*;
We will be using the org.apache.hadoop.conf.Configuration class to set the file system configuration according to the setup of the installed Hadoop cluster.
Let us now start by configuring the file system:
public FileSystem configureFileSystem(String coreSitePath, String hdfsSitePath) {
    FileSystem fileSystem = null;
    try {
        Configuration conf = new Configuration();
        conf.setBoolean("dfs.support.append", true);
        Path coreSite = new Path(coreSitePath);
        Path hdfsSite = new Path(hdfsSitePath);
        conf.addResource(coreSite);
        conf.addResource(hdfsSite);
        fileSystem = FileSystem.get(conf);
    } catch (IOException ex) {
        System.out.println("Error occurred while configuring FileSystem");
    }
    return fileSystem;
}
Make sure that the property "dfs.support.append" in hdfs-site.xml is set to true. You can either set it manually by editing the hdfs-site.xml file or programmatically using:
conf.setBoolean("dfs.support.append", true);
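For the manual route, the entry in hdfs-site.xml is a standard Hadoop property block (it goes inside the file's existing `<configuration>` element):

```xml
<property>
    <name>dfs.support.append</name>
    <value>true</value>
</property>
```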
Now that the file system is configured, we can access the files stored in HDFS.
Let us start with appending to a file in HDFS:
public String appendToFile(FileSystem fileSystem, String content, String dest) throws IOException {
    Path destPath = new Path(dest);
    if (!fileSystem.exists(destPath)) {
        System.err.println("File doesn't exist");
        return "Failure";
    }
    Boolean isAppendable = Boolean.valueOf(fileSystem.getConf().get("dfs.support.append"));
    if (isAppendable) {
        FSDataOutputStream fsAppend = fileSystem.append(destPath);
        PrintWriter writer = new PrintWriter(fsAppend);
        writer.append(content);
        writer.flush();
        fsAppend.hflush();
        writer.close();
        fsAppend.close();
        return "Success";
    } else {
        System.err.println("Please set the dfs.support.append property to true");
        return "Failure";
    }
}
Now, to see whether the data has been correctly written to HDFS, let us write a method that reads from HDFS and returns the content as a String:
public String readFromHdfs(FileSystem fileSystem, String hdfsFilePath) {
    Path hdfsPath = new Path(hdfsFilePath);
    StringBuilder fileContent = new StringBuilder();
    try {
        BufferedReader bfr = new BufferedReader(new InputStreamReader(fileSystem.open(hdfsPath)));
        String str;
        while ((str = bfr.readLine()) != null) {
            fileContent.append(str).append("\n");
        }
        bfr.close();  // close the reader to release the underlying HDFS stream
    } catch (IOException ex) {
        System.out.println("----------Could not read from HDFS---------\n");
    }
    return fileContent.toString();
}
Now that we have successfully written to and read from a file in HDFS, it's time to close the file system:
public void closeFileSystem(FileSystem fileSystem) {
    try {
        fileSystem.close();
    } catch (IOException ex) {
        System.out.println("----------Could not close the FileSystem----------");
    }
}
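The overall flow above — open, append, flush, read back, close — can be sanity-checked locally with plain java.nio, independently of any cluster. This is only an analogy sketch (the class and file names here are arbitrary, and it exercises the local filesystem rather than the HDFS API), but it shows the same append-then-verify pattern:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LocalAppendDemo {

    // Append content to a local file (creating it if absent),
    // then read the whole file back -- mirroring the HDFS flow:
    // appendToFile(...) followed by readFromHdfs(...).
    static String appendAndRead(Path file, String content) throws IOException {
        Files.write(file, content.getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return new String(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("append-demo", ".txt");
        appendAndRead(tmp, "first line\n");
        // The second append must preserve the first line.
        String all = appendAndRead(tmp, "second line\n");
        System.out.print(all);
        Files.delete(tmp);
    }
}
```

Against HDFS, the only differences are that the stream comes from fileSystem.append(destPath) and that hflush() is needed to make the data visible to readers.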
Before executing the code, you should have Hadoop running on your system. To start it, go to your HADOOP_HOME and run the following command:
./sbin/start-all.sh
For the complete program, refer to my GitHub repository: https://github.com/ksimar/HDFS_AppendAPI
Happy Coding!!