Category: hadoop | Published: 2018-04-13 11:48:00
This post describes how to run a C++ program via Hadoop Pipes on Ubuntu, assuming you have already set up the Hadoop environment as described in the previous post.
First, clone this minimal example:
https://github.com/alexanderkoumis/hadoop-wordcount-cpp.git
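For reference, the core of a Pipes word count looks roughly like the following. This is a sketch of the classic word count from the Hadoop Pipes documentation, so the repository's actual code may differ in its details:

#include <string>
#include <vector>

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

class WordCountMapper : public HadoopPipes::Mapper {
public:
  WordCountMapper(HadoopPipes::TaskContext& context) {}
  // Called once per input record; the value is one line of text.
  void map(HadoopPipes::MapContext& context) {
    std::vector<std::string> words =
        HadoopUtils::splitString(context.getInputValue(), " ");
    for (size_t i = 0; i < words.size(); ++i) {
      context.emit(words[i], "1");  // emit (word, "1") for each token
    }
  }
};

class WordCountReducer : public HadoopPipes::Reducer {
public:
  WordCountReducer(HadoopPipes::TaskContext& context) {}
  // Called once per key; sums the "1"s emitted by the mapper.
  void reduce(HadoopPipes::ReduceContext& context) {
    int sum = 0;
    while (context.nextValue()) {
      sum += HadoopUtils::toInt(context.getInputValue());
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(sum));
  }
};

int main() {
  // runTask connects to the Java parent over a socket and drives
  // the mapper/reducer defined above.
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<WordCountMapper, WordCountReducer>());
}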
Then build it; you may need to adjust the Makefile for your installation.
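If the Makefile does not match your setup, the link line typically looks something like this. The paths assume the Pipes headers and static libraries live under $HADOOP_INSTALL; exact locations and the OpenSSL libraries needed vary by distribution:

% g++ -o wordcount wordcount.cpp \
    -I$HADOOP_INSTALL/include \
    -L$HADOOP_INSTALL/lib/native \
    -lhadooppipes -lhadooputils -lcrypto -lssl -lpthread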
Hadoop Pipes must run in pseudo-distributed or fully distributed mode. The following shows how to configure pseudo-distributed mode.
In pseudo-distributed mode the daemons must be started, which requires SSH to be installed. Make sure the user can SSH to localhost and log in without entering a password:
% sudo apt-get install ssh
% ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
% cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Test it with:
% ssh localhost
If it works, no password prompt should appear.
By default the configuration files live in $HADOOP_INSTALL/etc/hadoop. For pseudo-distributed mode, edit the following four files.
core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4.5</value>
  </property>
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>99.0</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>[[paste the output of 'hadoop classpath' here]]</value>
  </property>
</configuration>
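The value for yarn.application.classpath can be generated on the command line:

% hadoop classpath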
Format HDFS and start the daemons:
% hdfs namenode -format
% start-all.sh
After starting, you can check the daemons with jps:
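In a working pseudo-distributed setup, the output should list roughly the following daemons (the PIDs in front of each name will differ):

% jps
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps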
Upload the compiled wordcount binary and the input file to HDFS:
% hdfs dfs -put wordcount /
% hdfs dfs -put sotu_2015.txt /
Run the Pipes job. The two -D options tell Pipes to use the Java record reader and writer instead of C++ implementations:
% mapred pipes -D mapreduce.pipes.isjavarecordreader=true \
    -D mapreduce.pipes.isjavarecordwriter=true \
    -input /sotu_2015.txt -output /output -program /wordcount
The job fails with the following error:
Error: java.io.FileNotFoundException: /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1523779524051_0001/jobTokenPassword (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:236)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:219)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:318)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:307)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:338)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:401)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:464)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1026)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:703)
at org.apache.hadoop.mapred.pipes.Application.writePasswordToLocalFile(Application.java:173)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:109)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:72)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
The stack trace shows the task failing to create jobTokenPassword under the NodeManager's local directory, where it has no write permission. Searching the Hadoop source for jobTokenPassword leads to
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/pipes/Application.java
where the path is constructed as:
String localPasswordFile = new File(conf.get(MRConfig.LOCAL_DIR))
+ Path.SEPARATOR + "jobTokenPassword";
The corresponding code in version 3.0.1, which does not have this problem, writes the file to the task's working directory instead:
String localPasswordFile = new File(".") + Path.SEPARATOR
+ "jobTokenPassword";
A search on GitHub shows that this is a bug introduced in 3.1.0 by the following commit:
https://github.com/apache/hadoop/commit/995cba65fe29966583e36f9491d9a27b323918ae
After reverting the line to its 3.0.1 form and rebuilding, the job completes successfully, and the result can be viewed with:
% hdfs dfs -cat /output/part-00000