Preparation
- Get all .jar files in hadoop-2.7.x/share/hadoop.
- Add them to the Build Path: Build Path –> Configure Build Path… –> Libraries –> Add External JARs…, then add all the .jar files.
- Copy the log4j.properties file to the HadoopAPIDemo/src dir.
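The post does not show the contents of log4j.properties; a minimal console-only configuration (illustrative values of my own, not taken from the post) would look roughly like this:

```properties
# Minimal log4j 1.x setup: log INFO and above to the console.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{1} - %m%n
```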
Hadoop API
These functions are implemented with the Configuration and FileSystem classes: Configuration configures the connection to the remote HDFS, and FileSystem provides access to the HDFS file system. The full code can be found on GitHub.
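All of the snippets below share the same boilerplate, shown here as a minimal self-contained sketch with the imports the later snippets omit. The NameNode address and file path are the ones used throughout this post; the class name is just a placeholder.

```java
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        // Point the client at the remote NameNode.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.137.201:8020");

        // Obtain a handle to the HDFS file system.
        FileSystem fs = FileSystem.get(conf);

        // Stream a file to stdout; 'false' leaves System.out open.
        try (InputStream in = fs.open(new Path("/usr/centos/hadoop/a.txt"))) {
            IOUtils.copyBytes(in, System.out, 1024, false);
        }
        fs.close();
    }
}
```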
Read file
By URL
```java
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
URL url = new URL("hdfs://192.168.137.201:8020/usr/centos/hadoop/a.txt");
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();
```
By API
```java
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://192.168.137.201:8020");
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/usr/centos/hadoop/a.txt");
FSDataInputStream fis = fs.open(p);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
IOUtils.copyBytes(fis, baos, 1024); // copy in hadoop api
```
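To actually see what was read, the buffered bytes can be turned into a String and the input stream closed; a rough continuation of the snippet above, reusing its variables:

```java
// Convert the buffered bytes to a String (platform default charset) and print.
System.out.println(new String(baos.toByteArray()));
// Release the HDFS input stream.
IOUtils.closeStream(fis);
```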
Make directory
Give write permission first:

```bash
$> hdfs dfs -chmod 777 /usr
```
```java
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://192.168.137.201:8020");
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/usr/win7admin");
fs.mkdirs(p);
```
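Note that mkdirs() also creates any missing parent directories (like mkdir -p) and returns a boolean success flag, so the result can be checked directly; the nested path below is a hypothetical example, not from the post:

```java
// mkdirs() creates missing parents as well and reports success.
boolean created = fs.mkdirs(new Path("/usr/win7admin/sub/dir")); // hypothetical path
System.out.println("created: " + created);
```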
Put file
```java
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://192.168.137.201:8020");
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/usr/win7admin/win7.txt");
FSDataOutputStream fsdos = fs.create(p);
fsdos.writeBytes("Hello, Windows7!");
fsdos.close();
```
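One caveat: writeBytes(), inherited from DataOutputStream, discards the high byte of every character, which is fine for ASCII strings like the one above but mangles non-ASCII text. A drop-in alternative for that line, writing explicit UTF-8 bytes, could look like this:

```java
// Write the string as UTF-8 bytes instead of truncated chars.
fsdos.write("Hello, Windows7!".getBytes(java.nio.charset.StandardCharsets.UTF_8));
```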
Remove file
```java
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://192.168.137.201:8020");
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/usr/win7admin");
fs.delete(p, true);
```
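The second argument to delete() requests a recursive delete, which is required when the path is a non-empty directory. As an optional follow-up (a sketch, not part of the original post), the result can be verified and the client connection released:

```java
// Confirm the directory is gone, then close the client connection.
if (!fs.exists(p)) {
    System.out.println("/usr/win7admin removed");
}
fs.close();
```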