hadoop-2

Run Hadoop in pseudo-distributed mode.

Start Hadoop

  1. Specify env for Hadoop
    Edit ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh

    export JAVA_HOME=/opt/jdk
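
    Setting JAVA_HOME alone is not enough for pseudo-distributed mode: core-site.xml and hdfs-site.xml also need minimal entries. A sketch, assuming the NameNode RPC port 8020 and a single DataNode; CONF_DIR defaults to a demo path here, point it at ${HADOOP_HOME}/etc/hadoop on a real install:

    ```shell
    # Sketch: minimal pseudo-distributed configuration files.
    # CONF_DIR is a demo path by default (assumption), not the real install dir.
    CONF_DIR="${CONF_DIR:-/tmp/hadoop-conf-demo}"
    mkdir -p "$CONF_DIR"

    # core-site.xml: where clients find the NameNode (RPC port 8020)
    cat > "$CONF_DIR/core-site.xml" <<'EOF'
    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:8020</value>
      </property>
    </configuration>
    EOF

    # hdfs-site.xml: only one DataNode, so replication factor 1
    cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
    EOF
    echo "wrote config to $CONF_DIR"
    ```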
  2. Start all processes

    • Format and start

      $> hdfs namenode -format   // format the file system (hadoop namenode -format is deprecated)
      $> start-all.sh            // start all daemons
      $> jps                     // verify the daemons are running
    • Check file system by WebUI

      $> /etc/init.d/iptables stop // stop firewall

      Open in browser:
      http://localhost:50070
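
    The `jps` check above can be scripted. A sketch, assuming the five-daemon set of a pseudo-distributed cluster; `check_daemons` is a made-up helper name:

    ```shell
    # Sketch: report any missing Hadoop daemon in `jps`-style output.
    # Reads the process list from stdin so it can be tested without a cluster.
    check_daemons() {
      local procs missing=0
      procs=$(cat)    # normally piped in from `jps`
      for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        if ! printf '%s\n' "$procs" | grep -qw "$d"; then
          echo "MISSING: $d"
          missing=1
        fi
      done
      [ "$missing" -eq 0 ] && echo "all daemons up"
      return "$missing"
    }
    # usage: jps | check_daemons
    ```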

  3. Check HDFS file system
    $> hdfs dfs -ls /
    $> hdfs dfs -mkdir -p /usr/centos01/hadoop // make a directory
    $> hdfs dfs -ls -R /   // recursive listing (-lsr is deprecated)

Hadoop Introduction

  1. Hadoop ports
    50070 –> namenode http port
    50075 –> datanode http port
    50090 –> secondary namenode http port
    8020 –> namenode rpc port
    50010 –> datanode data transfer port
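    The ports above can be kept as a small quick-reference helper (a sketch; `list_hadoop_ports` is a made-up name, and the host is assumed to be localhost):

    ```shell
    # Sketch: print the Hadoop 2 default ports listed above as host:port pairs.
    list_hadoop_ports() {
      # printf reuses the format string for each name/address pair
      printf '%-26s %s\n' \
        'namenode http'           'localhost:50070' \
        'datanode http'           'localhost:50075' \
        'secondary namenode http' 'localhost:50090' \
        'namenode rpc'            'localhost:8020' \
        'datanode'                'localhost:50010'
    }
    list_hadoop_ports
    ```

    Once the daemons are up, the http ports can be spot-checked with e.g. `curl -s http://localhost:50070/`.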
  2. Hadoop's 4 modules
    • common + hdfs
      NameNode, SecondaryNameNode, DataNode
    • mapreduce + yarn
      ResourceManager, NodeManager
  3. Hadoop scripts
    • ALL
      start-all.sh: start all daemons (deprecated in Hadoop 2 in favor of start-dfs.sh + start-yarn.sh)
      stop-all.sh: stop all daemons
    • HDFS
      start-dfs.sh: start NameNode, DataNode, SecondaryNameNode
      stop-dfs.sh: stop NameNode, DataNode, SecondaryNameNode
    • YARN
      start-yarn.sh: start ResourceManager, NodeManager
      stop-yarn.sh: stop ResourceManager, NodeManager
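
    The script-to-daemon mapping above can be captured in a small helper, handy when scripting checks around `jps` (a sketch; `daemons_for` is a made-up name, not part of Hadoop):

    ```shell
    # Sketch: map each Hadoop control script to the daemons it manages.
    daemons_for() {
      case "$1" in
        start-dfs.sh|stop-dfs.sh)   echo "NameNode DataNode SecondaryNameNode" ;;
        start-yarn.sh|stop-yarn.sh) echo "ResourceManager NodeManager" ;;
        start-all.sh|stop-all.sh)   echo "NameNode DataNode SecondaryNameNode ResourceManager NodeManager" ;;
        *) echo "unknown script: $1" >&2; return 1 ;;
      esac
    }
    daemons_for start-yarn.sh
    ```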