
Hadoop Cluster Configuration

Date: 2017-08-17 17:28:59  Author: Wang Peng

 

This post covers how to configure the Hadoop parameters for a Hadoop HA cluster, and how to start, stop, and verify the environment.

I. Modifying the configuration files

On the test platform used here, six files are configured: hadoop-env.sh, core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and slaves. In production the details will vary as requirements grow, so treat the values below as a reference for learning.

The files live in:
/home/user/workspace/hadoop/etc/hadoop/

1. hadoop-env.sh
export JAVA_HOME=/home/user/workspace/jdk


export HADOOP_CLASSPATH=.:$CLASSPATH:$HADOOP_CLASSPATH:$HADOOP_HOME/bin

 

export HADOOP_LOG_DIR=/home/user/yarn_data/hadoop/log
2. core-site.xml
<configuration>
<property>
   <name>fs.defaultFS</name>
   <value>hdfs://cluster1</value>
</property>
<property>
   <name>io.file.buffer.size</name>
   <value>131072</value>
</property>
<property>
   <name>hadoop.tmp.dir</name>
   <value>/home/user/yarn_data/tmp</value>
   <description>Abase for other temporary directories.</description>
</property>
<property>
   <name>hadoop.proxyuser.hduser.hosts</name>
   <value>*</value>
</property>
<property>
   <name>hadoop.proxyuser.hduser.groups</name>
   <value>*</value>
</property>
<property>
   <name>ha.zookeeper.quorum</name>
   <value>master:2181,master0:2181,slave1:2181,slave2:2181,slave3:2181</value>
</property>
</configuration>
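The values in a `*-site.xml` file can be sanity-checked programmatically. A minimal Python sketch (the XML string below is a trimmed copy of the core-site.xml above, used purely as test input):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the core-site.xml above, used here as test input.
CORE_SITE = """
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://cluster1</value></property>
  <property><name>ha.zookeeper.quorum</name>
    <value>master:2181,master0:2181,slave1:2181,slave2:2181,slave3:2181</value></property>
</configuration>
"""

def load_props(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value").strip()
            for p in root.findall("property")}

props = load_props(CORE_SITE)
print(props["fs.defaultFS"])                          # hdfs://cluster1
print(len(props["ha.zookeeper.quorum"].split(",")))   # 5
```

Note that `fs.defaultFS` points at the logical nameservice `cluster1`, not a single host; hdfs-site.xml (next) tells clients how to resolve it.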
3. hdfs-site.xml
<configuration>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>/home/user/dfs/name</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>/home/user/dfs/data</value>
</property>
<property>
   <name>dfs.replication</name>
   <value>2</value>
</property>
<property>
   <name>dfs.permissions</name>
   <value>false</value>
</property>
<property>
   <name>dfs.permissions.enabled</name>
   <value>false</value>
</property>
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
<property>
   <name>dfs.datanode.max.xcievers</name>
   <value>4096</value>
</property>
<property>
   <name>dfs.nameservices</name>
   <value>cluster1</value>
</property>
<property>
   <name>dfs.ha.namenodes.cluster1</name>
   <value>hadoop1,hadoop2</value>
</property>
<property>
   <name>dfs.namenode.rpc-address.cluster1.hadoop1</name>
   <value>master:9000</value>
</property>
<property>
   <name>dfs.namenode.rpc-address.cluster1.hadoop2</name>
   <value>master0:9000</value>
</property>
<property>
   <name>dfs.namenode.http-address.cluster1.hadoop1</name>
   <value>master:50070</value>
</property>
<property>
   <name>dfs.namenode.http-address.cluster1.hadoop2</name>
   <value>master0:50070</value>
</property>
<property>
   <name>dfs.namenode.servicerpc-address.cluster1.hadoop1</name>
   <value>master:53310</value>
</property>
<property>
   <name>dfs.namenode.servicerpc-address.cluster1.hadoop2</name>
   <value>master0:53310</value>
</property>
<property>
   <name>dfs.namenode.shared.edits.dir</name>
   <value>qjournal://slave1:8485;slave2:8485;slave3:8485/cluster1</value>
</property>
<property>
   <name>dfs.journalnode.edits.dir</name>
   <value>/home/user/yarn_data/journal</value>
</property>
<property>
   <name>dfs.journalnode.http-address</name>
   <value>0.0.0.0:8480</value>
</property>
<property>
   <name>dfs.journalnode.rpc-address</name>
   <value>0.0.0.0:8485</value>
</property>
<property>
   <name>dfs.client.failover.proxy.provider.cluster1</name>
   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
   <name>dfs.ha.automatic-failover.enabled.cluster1</name>
   <value>true</value>
</property>
<property>
   <name>ha.zookeeper.quorum</name>
   <value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
<property>
   <name>dfs.ha.fencing.methods</name>
   <value>sshfence</value>
</property>
<property>
   <name>dfs.ha.fencing.ssh.private-key-files</name>
   <value>/home/user/.ssh/id_rsa</value>
</property>
<property>
   <name>dfs.ha.fencing.ssh.connect-timeout</name>
   <value>10000</value>
</property>
<property>
   <name>dfs.namenode.handler.count</name>
   <value>100</value>
</property>
</configuration>
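The HA-related entries above form a lookup chain: nameservice → NameNode IDs → per-ID RPC addresses. This is essentially what `ConfiguredFailoverProxyProvider` does when a client opens `hdfs://cluster1`; a sketch of that resolution (the dict mirrors the hdfs-site.xml entries above):

```python
# Sketch of how a client resolves the logical nameservice "cluster1" into
# concrete NameNode RPC addresses, mirroring the hdfs-site.xml entries above.
conf = {
    "dfs.nameservices": "cluster1",
    "dfs.ha.namenodes.cluster1": "hadoop1,hadoop2",
    "dfs.namenode.rpc-address.cluster1.hadoop1": "master:9000",
    "dfs.namenode.rpc-address.cluster1.hadoop2": "master0:9000",
}

def resolve_namenodes(conf, nameservice):
    """Return the RPC addresses of all NameNodes in a nameservice."""
    ids = conf[f"dfs.ha.namenodes.{nameservice}"].split(",")
    return [conf[f"dfs.namenode.rpc-address.{nameservice}.{i}"] for i in ids]

print(resolve_namenodes(conf, "cluster1"))  # ['master:9000', 'master0:9000']
```

The client tries these addresses in turn and sticks with whichever NameNode reports itself Active.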
 
4. yarn-site.xml

One of the core files:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
   <name>yarn.resourcemanager.connect.retry-interval.ms</name>
   <value>2000</value>
</property>
<property>
   <name>yarn.resourcemanager.ha.enabled</name>
   <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
<property>
   <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
   <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master</value>
</property>
<property>
   <name>yarn.resourcemanager.hostname.rm2</name>
   <value>master0</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.id</name>
  <value>rm1</value><!-- rm1 on the primary node; set this to rm2 on the standby node -->
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-state-store.address</name>
  <value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>gagcluster-yarn</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
  <value>5000</value>  
</property>
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>master:8132</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm1</name>
  <value>master:8130</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>master:8188</value>
</property>
<property>
   <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
   <value>master:8131</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address.rm1</name>
  <value>master:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.admin.address.rm1</name>
  <value>master:23142</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>master0:8132</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>master0:8130</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>master0:8188</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
  <value>master0:8131</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address.rm2</name>
  <value>master0:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.admin.address.rm2</name>
  <value>master0:23142</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/home/user/yarn_data/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/home/user/yarn_data/log/hadoop</value>
</property>
<property>
  <name>mapreduce.shuffle.port</name>
  <value>23080</value>
</property>
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
    <value>/yarn-leader-election</value>
</property>
</configuration>
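The one per-host difference in this file is `yarn.resourcemanager.ha.id`: it must be `rm1` on master and `rm2` on master0, matching the `yarn.resourcemanager.hostname.rm*` entries. A tiny helper (hypothetical, for illustration) that derives the local RM ID from the hostname makes the rule explicit:

```python
# Hypothetical helper: derive the local yarn.resourcemanager.ha.id from the
# hostname, matching the yarn.resourcemanager.hostname.rm* entries above.
RM_HOSTS = {"rm1": "master", "rm2": "master0"}

def local_rm_id(hostname):
    """Return the RM ID this host should use, or raise if it is not an RM."""
    for rm_id, host in RM_HOSTS.items():
        if host == hostname:
            return rm_id
    raise ValueError(f"{hostname} is not a ResourceManager host")

print(local_rm_id("master"))   # rm1
print(local_rm_id("master0"))  # rm2
```

Forgetting to change this value on master0 is a common mistake: both ResourceManagers would then claim the rm1 identity.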
5. mapred-site.xml

One of the core files:
<configuration>
<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>
<property>
   <name>mapreduce.jobhistory.address</name>
   <value>0.0.0.0:10020</value>
</property>
<property>
   <name>mapreduce.jobhistory.webapp.address</name>
   <value>0.0.0.0:19888</value>
</property>
</configuration>
6. slaves

One of the core files: it lists the hostnames of the worker (DataNode) machines, one per line.

(screenshot of the slaves file)

II. Configuring environment variables

Add the following to /etc/profile on every machine:
# set hadoop environment
export HADOOP_PREFIX=/home/user/workspace/hadoop
export PATH=$HADOOP_PREFIX/bin:${PATH}
export PATH=$PATH:$HADOOP_PREFIX/sbin
III. Starting the Hadoop HA environment

The procedure for the very first start differs from subsequent starts.

1. Starting the Hadoop HA environment for the first time

1) Start ZooKeeper on every machine:
     zkServer.sh start
2) On one of the NameNodes (usually the primary), create the HA znode in ZooKeeper:
    hdfs zkfc -formatZK
3) Start the JournalNode daemon on each journal node:
   ./sbin/hadoop-daemon.sh start journalnode
4) On the primary NameNode, format the NameNode and JournalNode directories:
    hadoop namenode -format cluster1
5) On the primary NameNode, start the NameNode process:
   ./sbin/hadoop-daemon.sh start namenode
6) On the standby NameNode, run the following commands in order. The first command formats the standby's directory and copies the metadata over from the primary NameNode; it does not reformat the JournalNode directories.
    hdfs namenode -bootstrapStandby
   ./sbin/hadoop-daemon.sh start namenode
7) On both NameNodes, start the ZKFC:
   ./sbin/hadoop-daemon.sh start zkfc
8) On every DataNode, start the DataNode process:
   ./sbin/hadoop-daemon.sh start datanode
2. Subsequent starts of the Hadoop HA environment

1) Start ZooKeeper on every machine:
    zkServer.sh start
2) On both NameNodes, start the ZKFC:
    ./sbin/hadoop-daemon.sh start zkfc
3) On the primary NameNode:
./sbin/start-all.sh
4) On the standby NameNode:
./sbin/yarn-daemon.sh start resourcemanager


3. Results of starting the Hadoop HA environment

(screenshots of the running processes on each node)

IV. Stopping the Hadoop HA environment

1) On the primary NameNode:
./sbin/stop-dfs.sh
2) On both NameNodes:
./sbin/hadoop-daemon.sh stop zkfc
3) On every machine:
zkServer.sh stop
V. Verifying the Hadoop HA environment

After starting the environment, open the NameNode web UIs configured above (http://master:50070 and http://master0:50070). You should see that master is in the Active state and master0 is in the Standby state. If you kill the NameNode process on master, or shut the machine down, master0 will switch from Standby to Active.
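Beyond the web UI, the HA state can also be checked with `hdfs haadmin -getServiceState <namenode-id>`, or scraped from the NameNode's JMX endpoint. A sketch that parses such a JMX response (the JSON string below is an illustrative sample, not captured from a live cluster, and the bean name is an assumption based on the standard `NameNodeStatus` MBean):

```python
import json

# Illustrative sample of a NameNode JMX response (not captured from a live
# cluster). The real endpoint would be something like:
#   http://master:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus
SAMPLE = '{"beans":[{"name":"Hadoop:service=NameNode,name=NameNodeStatus","State":"active"}]}'

def namenode_state(jmx_json):
    """Extract the HA state ("active"/"standby") from a JMX query response."""
    beans = json.loads(jmx_json)["beans"]
    return beans[0]["State"]

print(namenode_state(SAMPLE))  # active
```

Polling both NameNodes this way is a simple basis for a monitoring check that exactly one of them is Active.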

(screenshots of the two NameNode web UIs)

