hadoop-5

HDFS basic operations and configurations.

Commands:

$> hadoop version // show the Hadoop version.
$> hadoop fs // generic file system shell.
$> hadoop jar <jar> // run a jar file.
/***** File *****/
// `hdfs dfs` is equivalent to `hadoop fs` for HDFS paths
$> hdfs dfs -mkdir -p /usr/centos/hadoop // create a directory (and its parents) on HDFS.
$> hdfs dfs -ls -R / // recursively list directories on HDFS.
$> hdfs dfs -put <localfile> <hdfs dir> // upload a local file to HDFS.
$> hdfs dfs -rm <hdfsfile> // remove a file on HDFS; use `-rm -r` for a directory.
$> hdfs dfs -appendToFile <localfile> <hdfsfile> // append a local file to a file on HDFS.
$> hdfs dfs -cat <hdfsfile> // print a file on HDFS.

Block storage

  1. Block size
    • Disk seek time: 10 ms
    • Disk transfer rate: 100 MB/s
    • Block size: 128 MB
    • Seek time : transfer time = 1 : 100, so one block should take about 1 s to transfer (about 100 MB, rounded up to 128 MB)
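
The arithmetic behind these figures can be sketched as below. It assumes the ratio means "seek time should be about 1% of transfer time", and that the result is rounded up to the next power of two. The variable names are illustrative only.

```shell
#!/bin/sh
# Assumed reading of the figures above: seek time is ~1% of transfer time.
seek_ms=10            # disk seek time in milliseconds
rate_mb_per_s=100     # disk transfer rate in MB/s
ratio=100             # transfer time = ratio * seek time

# Time one block should take to transfer, and how much data fits in that time.
transfer_ms=$((seek_ms * ratio))
ideal_block_mb=$((rate_mb_per_s * transfer_ms / 1000))

# Round up to the next power of two (an assumption about how 128 MB is reached).
block_mb=1
while [ "$block_mb" -lt "$ideal_block_mb" ]; do
  block_mb=$((block_mb * 2))
done

echo "transfer time: ${transfer_ms} ms"
echo "ideal block:   ${ideal_block_mb} MB"
echo "rounded block: ${block_mb} MB"
```

With the numbers from the list, this prints a transfer time of 1000 ms, an ideal block of 100 MB, and a rounded block of 128 MB.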
  2. Configure the tmp directory

    • Show all config items for each module
      Decompress hadoop-2.7.4.tar.gz
      Get the files share/hadoop/[common | hdfs | mapreduce | yarn]/hadoop-xxxx-2.7.4.jar (mapreduce -> hadoop-mapreduce-client-core-2.7.4.jar)
      Decompress each jar and get its xxxx-default.xml
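
A rough sketch of these steps as a script follows. It assumes the tarball is already unpacked, and `HADOOP_DIR` and `default_xml_for` are hypothetical names introduced here. The jar names for common, hdfs and yarn are my understanding of the 2.7.4 layout, so check them against your unpacked directory.

```shell
#!/bin/sh
# Map a module name to the jar (relative to the unpacked hadoop-2.7.4 directory)
# that is assumed to hold its *-default.xml, followed by the XML file name.
default_xml_for() {
  case "$1" in
    common)    echo "share/hadoop/common/hadoop-common-2.7.4.jar core-default.xml" ;;
    hdfs)      echo "share/hadoop/hdfs/hadoop-hdfs-2.7.4.jar hdfs-default.xml" ;;
    mapreduce) echo "share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.4.jar mapred-default.xml" ;;
    yarn)      echo "share/hadoop/yarn/hadoop-yarn-common-2.7.4.jar yarn-default.xml" ;;
    *)         return 1 ;;
  esac
}

# HADOOP_DIR is a hypothetical location of the unpacked tarball.
HADOOP_DIR="${HADOOP_DIR:-./hadoop-2.7.4}"

for module in common hdfs mapreduce yarn; do
  # Split the "jar xml" pair into positional parameters $1 and $2.
  set -- $(default_xml_for "$module")
  jar_path="$HADOOP_DIR/$1"
  xml_name="$2"
  if [ -f "$jar_path" ]; then
    # A jar is a zip archive; `unzip -p` writes the named entry to stdout.
    unzip -p "$jar_path" "$xml_name" > "$xml_name"
    echo "extracted $xml_name"
  else
    echo "skipped $module: $jar_path not found"
  fi
done
```

Each extracted xxxx-default.xml lists every property of that module with its default value, which is a convenient place to look up the names used in core-site.xml and hdfs-site.xml.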
    • core-site.xml
      hadoop.tmp.dir is /home/centos01/hadoop

      $> xsync core-site.xml
      $> hdfs namenode -format // format namenode only
      $> start-dfs.sh // start hdfs
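
For reference, the hadoop.tmp.dir setting would look roughly like this inside core-site.xml. This is a minimal fragment using the path from these notes. Any other properties the cluster needs in that file are omitted.

```xml
<!-- core-site.xml: sketch showing only hadoop.tmp.dir -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/centos01/hadoop</value>
  </property>
</configuration>
```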
    • hdfs-site.xml
      dfs.namenode.name.dir is file://${hadoop.tmp.dir}/dfs/namenode // NameNode
      dfs.datanode.data.dir is file://${hadoop.tmp.dir}/dfs/data // DataNode
      dfs.namenode.checkpoint.dir is file://${hadoop.tmp.dir}/dfs/namesecondary // Secondary NameNode

      $> xsync hdfs-site.xml
      $> hdfs namenode -format // format namenode only
      $> start-dfs.sh // start hdfs
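
As a sketch, the three directory settings could be written in hdfs-site.xml as follows. The values are the ones from these notes. Note that in Hadoop 2.7 the DataNode property is named dfs.datanode.data.dir. Other properties are omitted.

```xml
<!-- hdfs-site.xml: sketch showing only the storage directories -->
<configuration>
  <property>
    <!-- where the NameNode keeps its metadata -->
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/namenode</value>
  </property>
  <property>
    <!-- where each DataNode stores its blocks -->
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data</value>
  </property>
  <property>
    <!-- where the Secondary NameNode stores checkpoints -->
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/namesecondary</value>
  </property>
</configuration>
```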