
Running Pipes C++ Programs on Hadoop

Category: hadoop | Published: 2018-04-13 11:48:00


Overview

This post describes how to run a C++ program through Hadoop Pipes on Ubuntu. It assumes you have already set up a Hadoop environment following the previous post.

Example

First, clone this minimal example:

% git clone https://github.com/alexanderkoumis/hadoop-wordcount-cpp.git

Then build it; you may need to adjust the repository's Makefile for your environment.
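
For reference, here is a minimal sketch of what a Pipes word-count program looks like. The class names are illustrative and the repository's actual source may differ slightly, but the structure (a Mapper, a Reducer, and a TemplateFactory passed to runTask) is the same:

#include <string>
#include <vector>

#include "Pipes.hh"
#include "TemplateFactory.hh"
#include "StringUtils.hh"

// Emits <word, "1"> for every space-separated token in the input line.
class WordCountMapper : public HadoopPipes::Mapper {
public:
  WordCountMapper(HadoopPipes::TaskContext& context) {}
  void map(HadoopPipes::MapContext& context) {
    std::vector<std::string> words =
        HadoopUtils::splitString(context.getInputValue(), " ");
    for (size_t i = 0; i < words.size(); ++i) {
      context.emit(words[i], "1");
    }
  }
};

// Sums the "1"s emitted for each word.
class WordCountReducer : public HadoopPipes::Reducer {
public:
  WordCountReducer(HadoopPipes::TaskContext& context) {}
  void reduce(HadoopPipes::ReduceContext& context) {
    int sum = 0;
    while (context.nextValue()) {
      sum += HadoopUtils::toInt(context.getInputValue());
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(sum));
  }
};

int main() {
  // runTask drives the wire protocol between this binary and the Java framework;
  // it returns true on success.
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<WordCountMapper, WordCountReducer>()) ? 0 : 1;
}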

Configuring Pseudo-Distributed Mode

Hadoop Pipes must run in pseudo-distributed or fully distributed mode. This section shows how to configure pseudo-distributed mode.

Configuring SSH

Pseudo-distributed mode requires the Hadoop daemons to be running, and starting them requires a working SSH installation. Make sure your user can SSH to localhost and log in without entering a password:

% sudo apt-get install ssh
% ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
% cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Test it with:

% ssh localhost

If everything is set up correctly, you will log in without being prompted for a password.

Editing the Configuration Files

By default, the configuration files are located in $HADOOP_INSTALL/etc/hadoop.

  • core-site.xml
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost</value>
    </property>
</configuration>
  • hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
  • mapred-site.xml
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:8021</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
  • yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>4.5</value>
    </property>
    <property>
        <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
        <value>99.0</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>[[copy 'hadoop classpath' output to here]]</value>
    </property>
</configuration>
  • hadoop-env.sh: add JAVA_HOME (see the note after this list)
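
Two of these settings deserve a note. The value of yarn.application.classpath should be the literal output of the hadoop classpath command, so run it and paste the result into the placeholder above:

% hadoop classpath

For hadoop-env.sh, the JAVA_HOME path depends on where your JDK is installed; on Ubuntu it is typically something like the following (the exact path is an assumption, adjust it to your system):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64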

Formatting the NameNode

% hadoop namenode -format

Starting the Daemons

% start-all.sh

Once they are up, you can verify the daemons with jps.
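
In pseudo-distributed mode you should see one instance of each daemon, something like the following (PIDs will differ):

% jps
11233 NameNode
11368 DataNode
11562 SecondaryNameNode
11721 ResourceManager
11838 NodeManager
12001 Jps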

Running the Program

  • Copy the word-count binary and the input text into HDFS:
% hdfs dfs -put wordcount /
% hdfs dfs -put sotu_2015.txt /
  • Run mapred pipes; the two -D properties tell Pipes to use the Java record reader and writer, so the C++ binary only has to implement map and reduce:
% mapred pipes -D mapreduce.pipes.isjavarecordreader=true \
    -D mapreduce.pipes.isjavarecordwriter=true \
    -input /sotu_2015.txt -output /output -program /wordcount

Troubleshooting an Error

Error: java.io.FileNotFoundException: /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1523779524051_0001/jobTokenPassword (Permission denied)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:236)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:219)
    at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:318)
    at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:307)
    at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:338)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:401)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:464)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1026)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:703)
    at org.apache.hadoop.mapred.pipes.Application.writePasswordToLocalFile(Application.java:173)
    at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:109)
    at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:72)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

Searching the source for jobTokenPassword shows that the relevant code is in:

hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/pipes/Application.java

String localPasswordFile = new File(conf.get(MRConfig.LOCAL_DIR))
    + Path.SEPARATOR + "jobTokenPassword";

The corresponding code in version 3.0.1, which does not have this problem, reads:

String localPasswordFile = new File(".") + Path.SEPARATOR
    + "jobTokenPassword";

A search on GitHub shows that this is a bug introduced by a change in 3.1.0:

https://github.com/apache/hadoop/commit/995cba65fe29966583e36f9491d9a27b323918ae
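
One way to apply the fix is to revert that change in a 3.1.0 source checkout, rebuild just the affected module, and replace the jar in your installation. A sketch, assuming a standard Maven build and install layout (the source directory and jar path are assumptions):

% cd hadoop-3.1.0-src
% mvn package -pl hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core -am -DskipTests
% cp hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/hadoop-mapreduce-client-core-3.1.0.jar \
      $HADOOP_INSTALL/share/hadoop/mapreduce/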

After fixing it and recompiling, rerun the job; once it succeeds, you can view the results with:

% hdfs dfs -cat /output/part-00000

References