hadoop-1

An introduction to setting up Hadoop.

Show Full Path in Terminal

Edit /etc/profile and add:

export PS1='[\u@\h `pwd`]\$'

then reload it:

$> source /etc/profile

Install JDK 8

  1. Install from the tar.gz package.
  2. Decompress the package.
  3. Edit /etc/profile and append to PATH:

    export JAVA_HOME=/path/to/jdk
    export PATH=$PATH:$JAVA_HOME/bin
  4. Test:

    $> source /etc/profile
    $> java -version
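What the two export lines do can be sketched with a scratch directory standing in for the real JDK; every path below is a placeholder, not an actual install location:

```shell
# Stand-in JDK: a scratch directory with an executable bin/java stub
# (placeholder only -- the real JDK unpacks wherever you choose).
JAVA_HOME="$(mktemp -d)/jdk"
mkdir -p "$JAVA_HOME/bin"
printf '#!/bin/sh\necho "stub java 1.8.0"\n' > "$JAVA_HOME/bin/java"
chmod +x "$JAVA_HOME/bin/java"

# The same two lines the /etc/profile edit adds:
export JAVA_HOME
export PATH=$PATH:$JAVA_HOME/bin

# java is now resolvable through $PATH lookup in any shell
# that sources these lines.
command -v java
```

Putting the lines in /etc/profile just makes every login shell run these same exports.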

Install Hadoop 2.7

  1. Download the tar.gz package.
  2. Decompress the package.
  3. Edit /etc/profile and append to PATH:

    export HADOOP_HOME=/path/to/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
  4. Test:

    $> source /etc/profile
    $> hadoop version
    $> hdfs dfs -ls /
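Unlike the JDK, Hadoop splits its commands across two directories (user commands in bin, daemon control scripts in sbin), which is why the PATH line adds both. A minimal sketch with placeholder stubs standing in for the real distribution:

```shell
# Stand-in layout: placeholder stubs, not the real distribution.
HADOOP_HOME="$(mktemp -d)/hadoop"
mkdir -p "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"
printf '#!/bin/sh\necho "stub hadoop"\n'    > "$HADOOP_HOME/bin/hadoop"
printf '#!/bin/sh\necho "stub start-dfs"\n' > "$HADOOP_HOME/sbin/start-dfs.sh"
chmod +x "$HADOOP_HOME/bin/hadoop" "$HADOOP_HOME/sbin/start-dfs.sh"

# The same lines the /etc/profile edit adds -- note both bin and sbin:
export HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

command -v start-dfs.sh   # resolvable only because sbin is also on PATH
```

Forgetting sbin is a common reason `start-dfs.sh` comes back "command not found" even though `hadoop` itself works.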

Hadoop Configuration

  1. Standalone (local) mode
  2. Pseudo-distributed mode

    • cd into ${HADOOP_HOME}/etc/hadoop
    • edit files

      1. edit core-site.xml

        <?xml version="1.0"?>
        <configuration>
          <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost/</value>
          </property>
        </configuration>
      2. edit hdfs-site.xml

        <?xml version="1.0"?>
        <configuration>
          <property>
            <name>dfs.replication</name>
            <value>1</value>
          </property>
        </configuration>
      3. edit mapred-site.xml

        $> cp mapred-site.xml.template mapred-site.xml

        <?xml version="1.0"?>
        <configuration>
          <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
          </property>
        </configuration>
      4. edit yarn-site.xml

        <?xml version="1.0"?>
        <configuration>
          <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>localhost</value>
          </property>
          <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
          </property>
        </configuration>
      5. SSH

        • Three packages

          openssh ---> ssh-keygen
          openssh-clients ---> ssh
          openssh-server ---> sshd
        • Check ssh

          $> yum list installed | grep openssh // check `openssh`, `openssh-server`, `openssh-clients`
          $> ps -Af | grep sshd // check whether the sshd process is running
        • Generate ssh keys in the client

          $> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

          ~/.ssh/id_rsa is the private key and ~/.ssh/id_rsa.pub is the public key.

        • Append the public key to ~/.ssh/authorized_keys on the server

          $> cat id_rsa.pub >> authorized_keys
          $> chmod 644 authorized_keys // turn off write permission for group and others
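The append-and-chmod mechanics above can be sketched in a scratch directory standing in for the server's ~/.ssh; the key line below is a placeholder, not a real public key:

```shell
# Scratch stand-in for the server-side ~/.ssh directory.
SSH_DIR="$(mktemp -d)"
# Placeholder public-key line (a real one comes from ~/.ssh/id_rsa.pub).
PUBKEY='ssh-rsa AAAAB3placeholder user@client'

# Same steps as above: append the key, then drop group/other write bits.
printf '%s\n' "$PUBKEY" >> "$SSH_DIR/authorized_keys"
chmod 644 "$SSH_DIR/authorized_keys"

ls -l "$SSH_DIR/authorized_keys"
```

The permission step matters because sshd, with its default StrictModes setting, ignores an authorized_keys file that is writable by group or others.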
  3. Fully distributed mode

  4. Config by symbolic link

    • Create three config folders alongside ${HADOOP_HOME}/etc/hadoop:

      ${HADOOP_HOME}/etc/local
      ${HADOOP_HOME}/etc/pseudo
      ${HADOOP_HOME}/etc/full
    • Make a symbolic link named hadoop pointing to the config set you need (remove the existing ${HADOOP_HOME}/etc/hadoop directory or link first):

      $> ln -s pseudo hadoop // switch to pseudo mode
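The switching workflow can be sketched end to end in a scratch directory that mimics the ${HADOOP_HOME}/etc layout (all paths here are stand-ins):

```shell
# Scratch stand-in for ${HADOOP_HOME}/etc with the three config sets.
ETC="$(mktemp -d)"
mkdir -p "$ETC/local" "$ETC/pseudo" "$ETC/full"
cd "$ETC"

# Point the hadoop name at the pseudo config set.
ln -s pseudo hadoop
readlink hadoop           # -> pseudo

# Switching later: -f replaces the old link, -n treats the link itself
# as the target instead of descending into the directory it points to.
ln -sfn full hadoop
readlink hadoop           # -> full
```

Hadoop reads its configuration from ${HADOOP_HOME}/etc/hadoop by default, so repointing the link (and restarting the daemons) is all a mode switch needs.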