Goal: install a Hadoop cluster on virtual machines and successfully run Hadoop's bundled wordcount example program.
| Environment | Description |
|---|---|
| OS | CentOS 7 (64-bit) |
| Hadoop | hadoop-2.8.5 |
| Java | java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64 |
Cluster planning
- Single Node cluster

| Role | Hostname |
|---|---|
| Master | hserver1 |
| Slave | hserver1 |
- Multi Node cluster

There are 3 virtual machines in total; each VM's role is planned as follows:

| Role | Hostname |
|---|---|
| Master | hserver1 |
| Slave | hserver2, hserver3 |
| SERVER | IP ADDR | PROCESS |
|---|---|---|
| hserver1 | 192.168.48.200 | NameNode, DataNode, NodeManager, ResourceManager, JobHistoryServer |
| hserver2 | 192.168.48.201 | DataNode, NodeManager |
| hserver3 | 192.168.48.202 | DataNode, SecondaryNameNode, NodeManager |
For convenience, everything here is done as root (which is actually unsafe).
Single Node installation
Preparation
Install the JDK and set up its environment variables. Download hadoop-2.8.5.tar.gz from the official site and extract it to /usr/local.
Change the three servers' IPs to their corresponding static addresses and set their hostnames (a sketch follows the note below).
Note: Hadoop 3.0+ uses different default ports from earlier versions. For example, on Hadoop-3.2.1 the NameNode WebUI defaults to port 9870; for details see: Default ports used by HDFS services.
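A hedged sketch of the hostname and static-IP steps on CentOS 7 (the connection name ens33 and the gateway/DNS addresses are assumptions; adjust them to your VM):

```bash
# Set the hostname (run on each VM with its own name).
hostnamectl set-hostname hserver1

# Give the NIC a static address via NetworkManager, then re-activate it.
nmcli con mod ens33 ipv4.method manual \
    ipv4.addresses 192.168.48.200/24 \
    ipv4.gateway 192.168.48.2 \
    ipv4.dns 192.168.48.2
nmcli con up ens33
```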
Configuring Hadoop
Hadoop's configuration files all live under etc/hadoop.
Set the environment variables
1.1 hadoop-env.sh: set the JAVA environment and the Hadoop home directory

```bash
# Set Hadoop-specific environment variables here.
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64
HADOOP_HOME=/usr/local/hadoop-2.8.5
export JAVA_HOME=${JAVA_HOME}
export HADOOP_HOME
```

1.2 yarn-env.sh: configure the JAVA environment

```bash
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64
```
As noted in the reference 关于hadoop3搭建的一些问题的解决 (a post on solving Hadoop 3 setup issues), the following error appears when the running JDK version is higher than 9:

```
Error injecting constructor, java.lang.NoClassDefFoundError: javax/activation/DataSource
at org.apache.hadoop.yarn.server.nodemanager.webapp.JAXBContextResolver.<init>(JAXBContextResolver.java:52)
at org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer$NMWebApp.setup(WebServer.java:153)
while locating org.apache.hadoop.yarn.server.nodemanager.webapp.JAXBContextResolver
```

The fix is to add these environment variables to yarn-env.sh:

```bash
export YARN_RESOURCEMANAGER_OPTS="--add-modules=ALL-SYSTEM"
export YARN_NODEMANAGER_OPTS="--add-modules=ALL-SYSTEM"
```

- Configure core-site.xml:
```xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://hserver1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-2.8.5/tmp</value>
<description>temporary data dir</description>
</property>
```

- Configure hdfs-site.xml:

```xml
<configuration>
```
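The hdfs-site.xml block above is truncated in the source. A minimal sketch of what it plausibly contained, inferred from the NameNode format log later in this post (the hdfs/name path and replication factor of 3 appear in that log; the data-dir path is an assumption):

```xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop-2.8.5/hdfs/name</value>
    </property>
    <property>
        <!-- assumed to mirror the name-dir layout -->
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop-2.8.5/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
```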
- Configure mapred-site.xml:

```xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hserver1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hserver1:19888</value>
</property>
</configuration>
```
- Configure yarn-site.xml:

```xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hserver1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hserver1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hserver1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hserver1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hserver1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hserver1:8088</value>
</property>
</configuration>
```

With that, the basic Hadoop configuration is done.
Starting Hadoop
- Format the namenode:

```bash
[gt@hserver1 hadoop-2.8.5]$ pwd
/usr/local/hadoop-2.8.5
[gt@hserver1 hadoop-2.8.5]$ sudo bash ./bin/hdfs namenod -format
Error: Could not find or load main class namenod
[gt@hserver1 hadoop-2.8.5]$ sudo bash ./bin/hdfs namenode -format
18/11/13 04:04:28 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = root
STARTUP_MSG: host = localhost/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.8.5
STARTUP_MSG: classpath = ...
STARTUP_MSG: java = 1.8.0_191
************************************************************/
18/11/13 04:04:28 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
18/11/13 04:04:28 INFO namenode.NameNode: createNameNode [-format]
...
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/
```
This reported: Please specify HADOOP_NAMENODE_USER, meaning the user running the Hadoop namenode must be specified. Add the following at the top of hadoop-env.sh (the export prefix can be dropped if an export appears later), or add these lines to the top of start-all.sh, stop-all.sh, start-dfs.sh, stop-dfs.sh, start-yarn.sh, and stop-yarn.sh:

```bash
export HADOOP_NAMENODE_USER=root
export HADOOP_SECONDARYNAMENODE_USER=root
export HADOOP_JOBTRACKER_USER=root
export HADOOP_DATANODE_USER=root
export HADOOP_TASKTRACKER_USER=root
```

Starting Hadoop
Start Hadoop directly with the sbin/start-all.sh script, and stop it with stop-all.sh.
hadoop-2.8.5 reports this script as deprecated (hadoop-3.1.1 does not); you can run start-dfs.sh and then start-yarn.sh instead.

```bash
[gt@hserver1 hadoop-2.8.5]$ sudo bash ./sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hserver1]
hserver1: starting namenode, logging to /usr/local/hadoop-2.8.5/logs/hadoop-root-namenode-hserver1.out
localhost: starting datanode, logging to /usr/local/hadoop-2.8.5/logs/hadoop-root-datanode-hserver1.out
Starting secondary namenodes [hserver1]
hserver1: starting secondarynamenode, logging to /usr/local/hadoop-2.8.5/logs/hadoop-root-secondarynamenode-hserver1.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.8.5/logs/yarn-gt-resourcemanager-hserver1.out
localhost: starting nodemanager, logging to /usr/local/hadoop-2.8.5/logs/yarn-root-nodemanager-hserver1.out
```

Note: starting Hadoop may prompt for user passwords, because the scripts log in over ssh; ssh keys enable passwordless login instead.
For example, as root, the id_rsa.pub public key generated by ssh-keygen must be appended to /root/.ssh/authorized_keys; a sketch follows.
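A minimal sketch of that setup (the empty passphrase and root-to-root login are lab-only assumptions, not a recommendation):

```bash
# Generate a key pair once, without a passphrase, so scripts can log in unattended.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Authorize the key for logins back into this same host (single-node case).
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify that ssh no longer prompts for a password.
ssh hserver1 true
```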
- Viewing the NameNode web UI
Per the configuration, it is served at dfs.http.address (hserver1:50070) or dfs.secondary.http.address (hserver1:50090).
Enter hserver1:50070 or 192.168.48.200:50070 in the browser address bar.


Note: a 127.0.0.1 hserver1 entry must be added to /etc/hosts before access by hostname works.
- Viewing the ResourceManager
Per the configuration, yarn.resourcemanager.webapp.address is hserver1:8088.
Enter hserver1:8088 in the browser.
Running the wordcount example
- In the Hadoop home directory, create a directory input and the files f1.in, f2.in:

```bash
[gt@hserver1 hadoop-2.8.5]$ cat input/f1.in
hello hadoop
hello java
hello world
```

- Create the directory inside the running Hadoop (to delete it: hdfs dfs -rm -r -f /input):

```bash
[gt@hserver1 hadoop-2.8.5]$ sudo bash ./bin/hdfs dfs -mkdir /input
```

The newly created hdfs directory is visible in the NameNode web UI at hserver1:50070/explorer.html#/
or from the shell:

```bash
sudo ./bin/hadoop dfs -ls /input
```
- Upload the files
Upload the files under /usr/local/hadoop-2.8.5/input to the Hadoop directory:

```bash
[gt@hserver1 hadoop-2.8.5]$ sudo ./bin/hdfs dfs -put input/* /input
[gt@hserver1 hadoop-2.8.5]$ sudo ./bin/hdfs dfs -ls /input
Found 2 items
-rw-r--r-- 3 root supergroup 36 2018-11-13 07:54 /input/f1.in
-rw-r--r-- 3 root supergroup 2710 2018-11-13 07:54 /input/f2.in
```
- Run the wordcount example
The wordcount example lives at share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar.
Run the command below (it writes to a new Hadoop directory /output). In the NameNode UI's Live Nodes page you can find each node's address and check its status:

```bash
sudo ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /input /output/wordcount.out
```

```
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hserver1]
hserver1: starting namenode, logging to /usr/local/hadoop-2.8.5/logs/hadoop-root-namenode-hserver1.out
localhost: starting datanode, logging to /usr/local/hadoop-2.8.5/logs/hadoop-root-datanode-hserver1.out
Starting secondary namenodes [hserver1]
hserver1: starting secondarynamenode, logging to /usr/local/hadoop-2.8.5/logs/hadoop-root-secondarynamenode-hserver1.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.8.5/logs/yarn-gt-resourcemanager-hserver1.out
localhost: starting nodemanager, logging to /usr/local/hadoop-2.8.5/logs/yarn-root-nodemanager-hserver1.out
18/11/13 21:06:17 INFO client.RMProxy: Connecting to ResourceManager at hserver1/127.0.0.1:8032
18/11/13 21:06:17 INFO input.FileInputFormat: Total input files to process : 2
18/11/13 21:06:18 INFO mapreduce.JobSubmitter: number of splits:2
18/11/13 21:06:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1542171961373_0001
18/11/13 21:06:19 INFO impl.YarnClientImpl: Submitted application application_1542171961373_0001
18/11/13 21:06:19 INFO mapreduce.Job: The url to track the job: http://hserver1:8088/proxy/application_1542171961373_0001/
18/11/13 21:06:19 INFO mapreduce.Job: Running job: job_1542171961373_0001
18/11/13 21:06:27 INFO mapreduce.Job: Job job_1542171961373_0001 running in uber mode : false
18/11/13 21:06:27 INFO mapreduce.Job: map 0% reduce 0%
18/11/13 21:06:36 INFO mapreduce.Job: map 50% reduce 0%
18/11/13 21:06:37 INFO mapreduce.Job: map 100% reduce 0%
18/11/13 21:06:43 INFO mapreduce.Job: map 100% reduce 100%
18/11/13 21:06:45 INFO mapreduce.Job: Job job_1542171961373_0001 completed successfully
18/11/13 21:06:45 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=3766
FILE: Number of bytes written=480592
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2940
HDFS: Number of bytes written=2633
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=1
Launched map tasks=3
Launched reduce tasks=1
Data-local map tasks=3
Total time spent by all maps in occupied slots (ms)=14943
Total time spent by all reduces in occupied slots (ms)=5311
Total time spent by all map tasks (ms)=14943
Total time spent by all reduce tasks (ms)=5311
Total vcore-milliseconds taken by all map tasks=29886
Total vcore-milliseconds taken by all reduce tasks=10622
Total megabyte-milliseconds taken by all map tasks=15301632
Total megabyte-milliseconds taken by all reduce tasks=5438464
Map-Reduce Framework
Map input records=16
Map output records=429
Map output bytes=4450
Map output materialized bytes=3772
Input split bytes=194
Combine input records=429
Combine output records=283
Reduce input groups=283
Reduce shuffle bytes=3772
Reduce input records=283
Reduce output records=283
Spilled Records=566
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=665
CPU time spent (ms)=1700
Physical memory (bytes) snapshot=709869568
Virtual memory (bytes) snapshot=6373437440
Total committed heap usage (bytes)=494927872
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=2746
File Output Format Counters
Bytes Written=2633
```

To check the run's results, look at the output files:

```bash
[root@hserver1 hadoop-2.8.5]# sudo ./bin/hdfs dfs -ls /output
```

For the full counts, look at the contents of part-r-00000, e.g. as shown below. And with that, the example run is finally over!!! I could honestly weep with relief!!!
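To print the counts directly, a hedged one-liner (the output path matches the job command above):

```bash
sudo ./bin/hdfs dfs -cat /output/wordcount.out/part-r-00000
```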
Problems encountered along the way
Error 1: Error: Could not find or load main class jar
A typo: the command is hadoop jar, not hdfs jar.
Error 2:

```bash
[gt@hserver1 hadoop-2.8.5]$ sudo ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /input /output
18/11/13 08:07:17 INFO client.RMProxy: Connecting to ResourceManager at hserver1/127.0.0.1:8032
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hserver1:9000/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:141)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
```

The log says /output already exists, so make sure the output directory does not exist before running.
Error 3:

```
18/11/13 08:54:23 INFO mapreduce.Job: Task Id : attempt_1542127670113_0001_m_000000_0, Status : FAILED
```
Problem 4: the run hangs forever at mapreduce.Job: map 100% reduce 0%
The memory was too small; in my tests at least 2 GB is needed. The cpu-vcores for yarn and mapred also have to be configured.
yarn-site.xml:

```xml
<property>
```
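The yarn-site.xml block above is truncated in the source; a sketch of the vcores property it most plausibly carried (the property name yarn.nodemanager.resource.cpu-vcores and the value 2 are assumptions, chosen to match the mapred settings below):

```xml
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
</property>
```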
mapred-site.xml:

```xml
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>2</value>
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>2</value>
</property>
```

Error 5:

```
INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1542168958134_0002
```

Cause: the yarn setting yarn.scheduler.maximum-allocation-mb was too small; in my tests it needs at least 2 GB.
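A sketch of the corresponding yarn-site.xml entry (2048 MB is assumed from the 2 GB figure above):

```xml
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
</property>
```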
Multi Node cluster setup
Preparation
The Multi Node cluster builds on the Single Node setup: first make full clones of the hserver1 VM to get hserver2 and hserver3.
Set the memory of hserver2 and hserver3 to 2 GB (my host has 8 GB of RAM, so three 4 GB VMs cannot run at once). Boot all three VMs; change hserver2's hostname to hserver2 and IP to 192.168.48.201, and hserver3's hostname to hserver3 and IP to 192.168.48.202.
Add hserver2 and hserver3 to the /etc/hosts file:

```
127.0.0.1 hserver1
::1 hserver1
192.168.48.200 hserver1
192.168.48.201 hserver2
192.168.48.202 hserver3
```

Note that the hostname mapped to 127.0.0.1 differs on each machine. Once done, use ping hserver1 to verify the change.
Passwordless login
Copy hserver1's ssh public key to hserver2 and hserver3:

```bash
sudo scp ~/.ssh/id_rsa.pub root@hserver2:~
sudo scp ~/.ssh/id_rsa.pub root@hserver3:~
```

Since my VMs are straight clones, hserver2 and hserver3 already had hserver1's public key.
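If the key is not already authorized, a sketch of the append step on each slave (assuming the key landed in root's home directory as copied above):

```bash
# Append hserver1's public key to the authorized keys and lock down permissions.
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```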
Configuring Hadoop
Designate the master and the slaves
The master is determined by which machine runs the NameNode.
To designate the slaves, edit etc/hadoop/slaves and remove localhost:

```
hserver1
hserver2
hserver3
```

Make hserver3 the SecondaryNameNode
Edit etc/hadoop/hdfs-site.xml:

```xml
<property>
<name>dfs.secondary.http.address</name>
<value>hserver3:50090</value><!-- this setting makes hserver3 the Secondary NameNode -->
</property>
```

Copy Hadoop to hserver2 and hserver3
```bash
sudo scp -r /usr/local/hadoop-2.8.5 root@hserver2:/usr/local/
sudo scp -r /usr/local/hadoop-2.8.5 root@hserver3:/usr/local/
```

You could also copy just the etc part, since the clones already have Hadoop and only etc changed; copying everything is slow. A sketch of that follows.
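A hedged sketch of the lighter copy (paths as used throughout this post):

```bash
# Only sync the changed configuration directory to the slaves.
sudo scp -r /usr/local/hadoop-2.8.5/etc/hadoop root@hserver2:/usr/local/hadoop-2.8.5/etc/
sudo scp -r /usr/local/hadoop-2.8.5/etc/hadoop root@hserver3:/usr/local/hadoop-2.8.5/etc/
```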
Starting Hadoop
Format the NameNode on the NameNode host (hserver1):

```bash
[gt@hserver1 hadoop-2.8.5]$ sudo ./bin/hdfs namenode -format
18/11/14 01:26:49 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = root
STARTUP_MSG: host = hserver1/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.8.5
STARTUP_MSG: classpath = ...
************************************************************/
18/11/14 01:26:49 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
18/11/14 01:26:49 INFO namenode.NameNode: createNameNode [-format]
18/11/14 01:26:55 WARN common.Util: Path /usr/local/hadoop-2.8.5/hdfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
18/11/14 01:26:55 WARN common.Util: Path /usr/local/hadoop-2.8.5/hdfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-9878c048-fcde-4a68-89ef-d8ea83cf74c1
18/11/14 01:26:56 INFO namenode.FSEditLog: Edit logging is async:true
18/11/14 01:26:56 INFO namenode.FSNamesystem: KeyProvider: null
18/11/14 01:26:56 INFO namenode.FSNamesystem: fsLock is fair: true
18/11/14 01:26:56 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
18/11/14 01:26:56 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
18/11/14 01:26:56 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
18/11/14 01:26:56 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
18/11/14 01:26:56 INFO blockmanagement.BlockManager: The block deletion will start around 2018 Nov 14 01:26:56
18/11/14 01:26:56 INFO util.GSet: Computing capacity for map BlocksMap
18/11/14 01:26:56 INFO util.GSet: VM type = 64-bit
18/11/14 01:26:56 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
18/11/14 01:26:56 INFO util.GSet: capacity = 2^21 = 2097152 entries
18/11/14 01:26:56 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
18/11/14 01:26:56 INFO blockmanagement.BlockManager: defaultReplication = 3
18/11/14 01:26:56 INFO blockmanagement.BlockManager: maxReplication = 512
18/11/14 01:26:56 INFO blockmanagement.BlockManager: minReplication = 1
18/11/14 01:26:56 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
18/11/14 01:26:56 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
18/11/14 01:26:56 INFO blockmanagement.BlockManager: encryptDataTransfer = false
18/11/14 01:26:56 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
18/11/14 01:26:56 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE)
18/11/14 01:26:56 INFO namenode.FSNamesystem: supergroup = supergroup
18/11/14 01:26:56 INFO namenode.FSNamesystem: isPermissionEnabled = false
18/11/14 01:26:56 INFO namenode.FSNamesystem: HA Enabled: false
18/11/14 01:26:56 INFO namenode.FSNamesystem: Append Enabled: true
18/11/14 01:26:57 INFO util.GSet: Computing capacity for map INodeMap
18/11/14 01:26:57 INFO util.GSet: VM type = 64-bit
18/11/14 01:26:57 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
18/11/14 01:26:57 INFO util.GSet: capacity = 2^20 = 1048576 entries
18/11/14 01:26:57 INFO namenode.FSDirectory: ACLs enabled? false
18/11/14 01:26:57 INFO namenode.FSDirectory: XAttrs enabled? true
18/11/14 01:26:57 INFO namenode.NameNode: Caching file names occurring more than 10 times
18/11/14 01:26:57 INFO util.GSet: Computing capacity for map cachedBlocks
18/11/14 01:26:57 INFO util.GSet: VM type = 64-bit
18/11/14 01:26:57 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
18/11/14 01:26:57 INFO util.GSet: capacity = 2^18 = 262144 entries
18/11/14 01:26:57 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
18/11/14 01:26:57 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
18/11/14 01:26:57 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
18/11/14 01:26:57 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
18/11/14 01:26:57 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
18/11/14 01:26:57 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
18/11/14 01:26:57 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
18/11/14 01:26:57 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
18/11/14 01:26:57 INFO util.GSet: Computing capacity for map NameNodeRetryCache
18/11/14 01:26:57 INFO util.GSet: VM type = 64-bit
18/11/14 01:26:57 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
18/11/14 01:26:57 INFO util.GSet: capacity = 2^15 = 32768 entries
18/11/14 01:26:57 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1814272250-127.0.0.1-1542187617617
18/11/14 01:26:57 INFO common.Storage: Storage directory /usr/local/hadoop-2.8.5/hdfs/name has been successfully formatted.
18/11/14 01:26:57 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop-2.8.5/hdfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
18/11/14 01:26:58 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop-2.8.5/hdfs/name/current/fsimage.ckpt_0000000000000000000 of size 320 bytes saved in 0 seconds.
18/11/14 01:26:58 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/11/14 01:26:58 INFO util.ExitUtil: Exiting with status 0
18/11/14 01:26:58 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hserver1/127.0.0.1
```

Start dfs on the master (the host configured with hdfs, hserver1):
```bash
[gt@hserver1 hadoop-2.8.5]$ sudo ./sbin/start-dfs.sh
Starting namenodes on [hserver1]
hserver1: starting namenode, logging to /usr/local/hadoop-2.8.5/logs/hadoop-root-namenode-hserver1.out
hserver3: starting datanode, logging to /usr/local/hadoop-2.8.5/logs/hadoop-root-datanode-hserver3.out
hserver1: starting datanode, logging to /usr/local/hadoop-2.8.5/logs/hadoop-root-datanode-hserver1.out
hserver2: starting datanode, logging to /usr/local/hadoop-2.8.5/logs/hadoop-root-datanode-hserver2.out
Starting secondary namenodes [hserver3]
hserver3: starting secondarynamenode, logging to /usr/local/hadoop-2.8.5/logs/hadoop-root-secondarynamenode-hserver3.out
```

Start yarn on the host configured with the resource manager (hserver1):
```bash
[gt@hserver1 hadoop-2.8.5]$ sudo ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.8.5/logs/yarn-gt-resourcemanager-hserver1.out
hserver2: starting nodemanager, logging to /usr/local/hadoop-2.8.5/logs/yarn-root-nodemanager-hserver2.out
hserver3: starting nodemanager, logging to /usr/local/hadoop-2.8.5/logs/yarn-root-nodemanager-hserver3.out
hserver1: starting nodemanager, logging to /usr/local/hadoop-2.8.5/logs/yarn-root-nodemanager-hserver1.out
```

Use jps (requires the JDK, Java Development Kit) to check the processes running on each server:

```bash
[gt@hserver1 hadoop-2.8.5]$ sudo jps
4082 DataNode
4419 ResourceManager
3956 NameNode
4523 NodeManager
4940 Jps
```

```bash
[gt@hserver2 hadoop-2.8.5]$ sudo jps
4434 Jps
3977 DataNode
4140 NodeManager
```

```bash
[gt@hserver3 hadoop-2.8.5]$ sudo jps
4004 NodeManager
3847 SecondaryNameNode
4359 Jps
3756 DataNode
```

Running the wordcount example
First put the files under hserver1's input directory into the input directory on dfs:
```bash
[gt@hserver1 hadoop-2.8.5]$ sudo ./bin/hdfs dfs -mkdir /input
[sudo] password for gt:
[gt@hserver1 hadoop-2.8.5]$ sudo ./bin/hdfs dfs -mkdir /output
[gt@hserver1 hadoop-2.8.5]$ sudo ./bin/hdfs dfs -put input/* /input
[gt@hserver1 hadoop-2.8.5]$ sudo ./bin/hdfs dfs -ls /input
Found 2 items
-rw-r--r-- 3 root supergroup 36 2018-11-14 01:56 /input/f1.in
-rw-r--r-- 3 root supergroup 2710 2018-11-14 01:56 /input/f2.in
```

Run the wordcount example:
```bash
[gt@hserver1 hadoop-2.8.5]$ sudo ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /input /output/wordcount.out
18/11/14 01:58:47 INFO client.RMProxy: Connecting to ResourceManager at hserver1/127.0.0.1:8032
18/11/14 01:58:49 INFO input.FileInputFormat: Total input files to process : 2
18/11/14 01:58:50 INFO mapreduce.JobSubmitter: number of splits:2
18/11/14 01:58:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1542188103773_0001
18/11/14 01:59:02 INFO impl.YarnClientImpl: Submitted application application_1542188103773_0001
18/11/14 01:59:02 INFO mapreduce.Job: The url to track the job: http://hserver1:8088/proxy/application_1542188103773_0001/
18/11/14 01:59:02 INFO mapreduce.Job: Running job: job_1542188103773_0001
18/11/14 02:02:24 INFO mapreduce.Job: Job job_1542188103773_0001 running in uber mode : false
18/11/14 02:02:24 INFO mapreduce.Job: map 0% reduce 0%
18/11/14 02:07:09 INFO mapreduce.Job: map 100% reduce 0%
18/11/14 02:07:28 INFO mapreduce.Job: map 100% reduce 100%
18/11/14 02:07:32 INFO mapreduce.Job: Job job_1542188103773_0001 completed successfully
18/11/14 02:07:35 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=3766
FILE: Number of bytes written=480592
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2940
HDFS: Number of bytes written=2633
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=536968
Total time spent by all reduces in occupied slots (ms)=11662
Total time spent by all map tasks (ms)=536968
Total time spent by all reduce tasks (ms)=11662
Total vcore-milliseconds taken by all map tasks=1073936
Total vcore-milliseconds taken by all reduce tasks=23324
Total megabyte-milliseconds taken by all map tasks=549855232
Total megabyte-milliseconds taken by all reduce tasks=11941888
Map-Reduce Framework
Map input records=16
Map output records=429
Map output bytes=4450
Map output materialized bytes=3772
Input split bytes=194
Combine input records=429
Combine output records=283
Reduce input groups=283
Reduce shuffle bytes=3772
Reduce input records=283
Reduce output records=283
Spilled Records=566
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=43965
CPU time spent (ms)=54050
Physical memory (bytes) snapshot=663736320
Virtual memory (bytes) snapshot=6380019712
Total committed heap usage (bytes)=470286336
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=2746
File Output Format Counters
Bytes Written=2633
[gt@hserver1 hadoop-2.8.5]$

```

Stopping Hadoop
```bash
[gt@hserver1 hadoop-2.8.5]$ sudo sbin/stop-all.sh
[sudo] password for gt:
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [hserver1]
hserver1: stopping namenode
hserver1: stopping datanode
hserver3: stopping datanode
hserver2: stopping datanode
Stopping secondary namenodes [hserver3]
hserver3: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
resourcemanager did not stop gracefully after 5 seconds: killing with kill -9
hserver3: no nodemanager to stop
hserver1: stopping nodemanager
hserver2: no nodemanager to stop
hserver1: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
```

Problems encountered
hdfs cannot connect to port 9000 when uploading files
How the problem appeared
The error in question occurred while uploading a file with hdfs dfs -put ...:

```
hdfs@vm1:/home/gt> hdfs dfs -put ./as_training_simple.utf8 /
2019-12-05 21:46:54,708 WARN ipc.Client: Address change detected. Old: vm1/192.168.56.5:9000 New: vm1/127.0.0.1:9000 put: Call From localhost/127.0.0.1 to vm1:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
```
But checking with jps showed that every component had already started (including the NameNode and DataNode):

Then netstat -nltp showed that a process was indeed listening on port 9000:
Yet trying to connect with telnet vm1 9000 also returned connection refused:
Troubleshooting ideas
Maybe the NameNode is not started. In that case, try starting dfs (start-dfs.sh), or reformat the NameNode (hdfs namenode -format) and then start it. This was not my case: I had already created directories on dfs, my namenode was running, and 9000 was listening.
Maybe the DataNode is not started. I found this suggestion online; I don't see why it would make port 9000 refuse connections, but I ruled it out anyway.
Also, the dfs.datanode.data.dir directory must have at least 755 permissions.
This did not apply to me either: my DataNode was running and the directory had the required permissions.
Maybe a firewall is blocking it. On Ubuntu 18.04 LTS the firewall is ufw, which is disabled by default. On my openSUSE Leap 15.1 it is firewalld, enabled by default, but I had already turned it off, so this was not it either.
Maybe the IP being listened on is only the loopback address: this was my case!!!
Looking at netstat -nltp, I found I could reach the namenode Web UI at http://192.168.56.5:9870 in a browser, yet telnet 192.168.56.5 9000 failed. Comparing the two ports, their local IP addresses differed: 9870 was listening on the wildcard 0.0.0.0, while 9000 was listening on the loopback 127.0.0.1. Changing that local address looked promising, but where? In the Hadoop configuration, only fs.defaultFS in core-site.xml holds such an address. Online advice says to change hdfs://localhost:9000 to hdfs://<your-ip>:9000, but mine was already hdfs://vm1:9000, so I changed it to the IP form hdfs://192.168.56.5:9000. After a restart, nothing changed. Where else, then? The /etc/hosts file! Mine originally looked like this:
```
127.0.0.1 localhost vm1
# special IPv6 addresses
::1 localhost vm1 ipv6-localhost ipv6-loopback
fe00::0 ipv6-localnet
ff00::0 ipv6-mcastprefix
ff02::1 ipv6-allnodes
ff02::2 ipv6-allrouters
ff02::3 ipv6-allhosts
10.0.2.5 vm1 localhost
10.0.2.6 vm2
10.0.2.7 vm3
192.168.56.5 vm1
192.168.56.6 vm2
192.168.56.7 vm3
```

So I removed the vm1 after 127.0.0.1, removed vm1 from the ::1 IPv6 line, removed localhost from the 10.0.2.5 line, and restarted dfs (start-dfs.sh):

```
hdfs@vm1:/home/gt> start-dfs.sh
Starting namenodes on [vm1] vm1: Warning: Permanently added the ECDSA host key for IP address '10.0.2.5' to the list of known hosts.
Starting datanodes
Starting secondary namenodes [vm3]
```

This showed the namenode and datanode were both up. I tried the put again; unfortunately it still failed. Checking port 9000:

The local address in front of 9000 had changed to 10.0.2.5, but that is the IP from the NAT adapter, through which the VMs cannot ping each other, so that IP is no good either. Edit the hosts file once more, reordering the IPs that map to vm1 so that the Host-Only adapter's IP comes first:

```
127.0.0.1 localhost
# special IPv6 addresses
# ::1 localhost vm1 ipv6-localhost ipv6-loopback
fe00::0 ipv6-localnet
ff00::0 ipv6-mcastprefix
ff02::1 ipv6-allnodes
ff02::2 ipv6-allrouters
ff02::3 ipv6-allhosts
192.168.56.5 vm1 # this IP must come first
192.168.56.6 vm2
192.168.56.7 vm3
10.0.2.5 vm1 # the NAT IP goes after it
10.0.2.6 vm2
10.0.2.7 vm3
```

After restarting dfs (start-dfs.sh), uploading files to dfs succeeded; the state of port 9000 was now:

The local address on port 9000 was now the Host-Only IP. My guess is that hosts-file resolution of a hostname is "first match wins". In short, when you hit a 9000 Connection Refused exception, check (commands sketched below):
- whether the NameNode is actually working,
- whether port 9000 is listening,
- whether a firewall is enabled or blocking port 9000,
- whether the local address listening on 9000 is visible to the other VMs (an IP under bridged or Host-Only mode).
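The same checklist in command form, as a hedged sketch (hostnames and ports follow this post; on Ubuntu use ufw status instead of firewall-cmd):

```bash
jps                          # is the NameNode in the process list?
netstat -nltp | grep 9000    # which local address is 9000 bound to?
sudo firewall-cmd --state    # is firewalld running at all?
telnet vm1 9000              # is the port reachable? (try from another VM too)
```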
Sigh, what I thought was a clever trick really burned me! One more small issue: I wanted to use VirtualBox port forwarding while also giving the NAT adapter a static IP, which left me unable to reach the VM through the forwarded ports (the VM could not even ping 10.0.2.1). Switching the IP back to DHCP fixed it. Along the way I also hit a problem where hdfs dfsadmin -report showed the datanodes with Remaining Usage: 0B. I can't reproduce it now, so no screenshot, but it was likewise caused by the DataNode failing to reach the NameNode: the datanodes' disk space was never reported to the namenode, so the total capacity showed as 0B.
Summary
As a mainstream open-source distributed processing framework, Hadoop is worth knowing how to set up and use at a basic level.
Looking back, the main steps were:
- Prepare the Hadoop package, the Java environment, and passwordless ssh login
- Configure Hadoop; the main configuration files are hadoop-env.sh, yarn-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml,
and the main things configured are the NameNode, DataNode, and Yarn
- Format the NameNode
- Start and stop Hadoop
- Run the wordcount example
An open problem: security was never considered here. If Hadoop is exposed directly to the public internet, there is a known virus that submits scripts through yarn's port 8088 to run a cryptomining program.
You can check for it like this:

```bash
sudo ls /var/tmp/java # if this exists, the machine is most likely infected
sudo crontab -l -u root # a strange cron job here also means infection
```
Hadoop should be controlled (started and stopped) by a dedicated user, as described on the official site. Once exposed to the public internet, a firewall must be configured; a sketch follows.
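A hedged firewalld sketch of such a restriction (the zone name and trusted source range are assumptions; adapt them to your network):

```bash
# Expose the YARN web UI only to the trusted LAN, not to the whole internet.
sudo firewall-cmd --permanent --new-zone=hadoop
sudo firewall-cmd --permanent --zone=hadoop --add-source=192.168.48.0/24
sudo firewall-cmd --permanent --zone=hadoop --add-port=8088/tcp
sudo firewall-cmd --reload
```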
In short, Hadoop deserves continued, deeper study, including the principles behind MapReduce.