Install Hadoop Multinode Cluster using CDH4 in RHEL/CentOS 6.5


Hadoop is an open source programming framework developed by Apache to process big data. It uses HDFS (Hadoop Distributed File System) to store the data across all the datanodes in the cluster in a distributive manner, and the MapReduce model to process the data.

The NameNode (NN) is the master daemon that controls HDFS, and the JobTracker (JT) is the master daemon for the MapReduce engine.

In this tutorial I am using two CentOS 6.3 VMs named 'master' and 'node' (master and node are my hostnames). The 'master' IP is 172.21.17.175 and the node IP is '172.21.17.188'. The following instructions also work on other RHEL/CentOS 6.x versions.

 hostname

master
 ifconfig|grep 'inet addr'|head -1

inet addr:172.21.17.175  Bcast:172.21.19.255  Mask:255.255.252.0
 hostname

node
 ifconfig|grep 'inet addr'|head -1

inet addr:172.21.17.188  Bcast:172.21.19.255  Mask:255.255.252.0

First, make sure that all cluster hosts are present in the '/etc/hosts' file (on each node), if you do not have DNS set up.

 cat /etc/hosts

172.21.17.175 master
172.21.17.188 node
 cat /etc/hosts

172.21.17.175 master
172.21.17.188 node

Install Hadoop Multinode Cluster in CentOS

We will use the official CDH repository to install CDH4 on all hosts (Master and Node) in the cluster.

Go to the official CDH download page and grab the CDH4 (i.e. 4.6) version, or use the following wget commands to download the repository package and install it.

## on 32-bit System ##
# wget http://archive.cloudera.com/cdh4/one-click-install/redhat/6/i386/cloudera-cdh-4-0.i386.rpm
# yum --nogpgcheck localinstall cloudera-cdh-4-0.i386.rpm

## on 64-bit System ##
# wget http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
# yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm

Before installing the Hadoop Multinode Cluster, add the Cloudera Public GPG Key to your repository by running one of the following commands according to your system architecture.

## on 32-bit System ##
# rpm --import http://archive.cloudera.com/cdh4/redhat/6/i386/cdh/RPM-GPG-KEY-cloudera

## on 64-bit System ##
# rpm --import http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera

Next, run the following commands to install and set up the JobTracker and the NameNode on the Master server.

 yum clean all 
 yum install hadoop-0.20-mapreduce-jobtracker
 yum clean all
 yum install hadoop-hdfs-namenode
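
Optionally, you can confirm that the packages were installed by querying the RPM database (a quick check, not part of the original procedure):

 rpm -qa | grep hadoop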

Similarly, run the following commands on the Master server to set up the Secondary NameNode.

 yum clean all 
 yum install hadoop-hdfs-secondarynamenode

Next, set up the TaskTracker and DataNode on all cluster hosts (Node) except the JobTracker, NameNode, and Secondary (or Standby) NameNode hosts (on node in this case).

 yum clean all
 yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode

You can install the Hadoop client on a separate machine (in this case I have installed it on the datanode; you can install it on any machine).

 yum install hadoop-client

Now that we are done with the above steps, let's move forward to deploy HDFS (to be done on all nodes).

Copy the default configuration to the /etc/hadoop directory (on each node in the cluster).

 cp -r /etc/hadoop/conf.dist /etc/hadoop/conf.my_cluster

Use the alternatives command to set your custom configuration directory, as follows (on each node in the cluster).

 alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
reading /var/lib/alternatives/hadoop-conf

 alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
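
To confirm that the custom directory is now the active configuration, you can query the alternatives system (a quick verification step, assuming the standard alternatives CLI):

 alternatives --display hadoop-conf

The output should show the link currently pointing to /etc/hadoop/conf.my_cluster.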

Now open the 'core-site.xml' file and update fs.defaultFS on each node in the cluster.

 cat /etc/hadoop/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
 <name>fs.defaultFS</name>
 <value>hdfs://master/</value>
</property>
</configuration>

Next, update dfs.permissions.superusergroup in 'hdfs-site.xml' on each node in the cluster.

 cat /etc/hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
     <name>dfs.name.dir</name>
     <value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
  </property>
  <property>
     <name>dfs.permissions.superusergroup</name>
     <value>hadoop</value>
  </property>
</configuration>

Note: Please make sure that the above configuration is present on all nodes (do it on one node and run scp to copy it to the rest of the nodes).
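
For example, a copy from master to node might look like the following (assuming SSH access between the hosts; this mirrors the scp step used later for 'mapred-site.xml'):

 scp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml node:/etc/hadoop/conf/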

Next, update dfs.name.dir or dfs.namenode.name.dir in 'hdfs-site.xml' on the NameNode (Master), and dfs.datanode.data.dir on the DataNode (Node). Please change the values as shown below.

 cat /etc/hadoop/conf/hdfs-site.xml
<property>
 <name>dfs.namenode.name.dir</name>
 <value>file:///data/1/dfs/nn,/nfsmount/dfs/nn</value>
</property>
 cat /etc/hadoop/conf/hdfs-site.xml
<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:///data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>

Execute the commands below to create the directory structure and manage user permissions on the NameNode (Master) and DataNode (Node) machines.

 mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
 chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
 mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
 chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn

Format the NameNode (on Master) by issuing the following command.

 sudo -u hdfs hdfs namenode -format
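
If the format succeeds, the name directories configured in 'hdfs-site.xml' should now contain a 'current' subdirectory holding the filesystem image (a quick sanity check, assuming the paths shown above):

 ls /data/1/dfs/nn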

Add the following property to the 'hdfs-site.xml' file and replace the value as shown, on the Master.

<property>
  <name>dfs.namenode.http-address</name>
  <value>172.21.17.175:50070</value>
  <description>
    The address and port on which the NameNode UI will listen.
  </description>
</property>

Note: In our case the value should be the IP address of the master VM.

Now let's deploy MRv1 (MapReduce version 1). Open the 'mapred-site.xml' file and add the values as shown below.

 cp hdfs-site.xml mapred-site.xml
 vi mapred-site.xml
 cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
 <name>mapred.job.tracker</name>
 <value>master:8021</value>
</property>
</configuration>

Next, copy the 'mapred-site.xml' file to the node machine using the following scp command.

 scp /etc/hadoop/conf/mapred-site.xml node:/etc/hadoop/conf/
mapred-site.xml                                                                      100%  200     0.2KB/s   00:00

Now configure local storage directories for use by the MRv1 daemons. Again open the 'mapred-site.xml' file and make the changes shown below for every TaskTracker.

<property>
 <name>mapred.local.dir</name>
 <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local</value>
</property>

After specifying these directories in the 'mapred-site.xml' file, you must create the directories and assign the correct file permissions to them on each node in your cluster.

mkdir -p /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
chown -R mapred:hadoop /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
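
You can verify the ownership with ls; the directories should be owned by mapred with group hadoop (a quick check, not part of the original procedure):

 ls -ld /data/1/mapred/local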

Now run the following command to start HDFS on every node in the cluster.

 for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
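
Once the services are up, you can list the running Hadoop daemons on each node with the JDK's jps tool (assuming a JDK is installed and on the PATH); the NameNode should appear on master and the DataNode on node:

 sudo jps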

It is required to create /tmp in HDFS with the proper permissions, exactly as mentioned below.

 sudo -u hdfs hadoop fs -mkdir /tmp
 sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
 sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
 sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
 sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred

Now verify the HDFS file structure.

 sudo -u hdfs hadoop fs -ls -R /

drwxrwxrwt   - hdfs   hadoop          0 2014-05-29 09:58 /tmp
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var/lib
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache
drwxr-xr-x   - mapred hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred
drwxr-xr-x   - mapred hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt   - mapred hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging

Once you have started HDFS and created '/tmp', but before you start the JobTracker, please create the HDFS directory specified by the 'mapred.system.dir' parameter (by default ${hadoop.tmp.dir}/mapred/system) and change its owner to mapred.

 sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system
 sudo -u hdfs hadoop fs -chown mapred:hadoop /tmp/mapred/system
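
You can confirm the ownership change as follows; the 'system' directory should be listed as owned by mapred:

 sudo -u hdfs hadoop fs -ls /tmp/mapred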

To start MapReduce, please start the TaskTracker (TT) and JobTracker (JT) services.

 service hadoop-0.20-mapreduce-tasktracker start

Starting Tasktracker:                               [  OK  ]
starting tasktracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-tasktracker-node.out
 service hadoop-0.20-mapreduce-jobtracker start

Starting Jobtracker:                                [  OK  ]

starting jobtracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-master.out

Next, create a home directory for each hadoop user. It is recommended that you do this on the NameNode; for example:

 sudo -u hdfs hadoop fs -mkdir /user/<user>
 sudo -u hdfs hadoop fs -chown <user> /user/<user>

Note: where <user> is the Linux username of each user.

Alternatively, you can create the home directory as follows.

 sudo -u hdfs hadoop fs -mkdir /user/$USER
 sudo -u hdfs hadoop fs -chown $USER /user/$USER
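
With the daemons running and a home directory in place, you can optionally sanity-check the cluster by running the bundled pi example (the jar path below is the usual CDH4 MRv1 location; adjust it if your installation differs):

 sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 2 10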

Open your browser and type the URL as http://ip_address_of_namenode:50070 to access the NameNode.

Open another tab in your browser and type the URL as http://ip_address_of_jobtracker:50030 to access the JobTracker.
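
If the pages do not load, a quick reachability check from the command line can help separate network problems from service problems (using the example IP of our master, which runs both daemons):

 curl -I http://172.21.17.175:50070
 curl -I http://172.21.17.175:50030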

This procedure has been successfully tested on RHEL/CentOS 5.X/6.X. Please comment below if you face any issues with the installation; I will help you out with solutions.