IBM Big Data Processing Platform BigInsights (2)

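This post continues from "A First Look at the IBM Big Data Processing Platform BigInsights (1)". It covers a few basic Hadoop commands and runs a simple WordCount program with MapReduce.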

1. Create a test directory on the HDFS file system

hadoop fs -mkdir /user/biadmin/test

2. Copy a file into the test directory

hadoop fs -put /var/adm/ibmvmcoc-postinstall/BIlicense_en.txt /user/biadmin/test

3. Check that the file now appears in the test directory

biadmin@bivm:/etc/ibmvmcoc-postinstall> hadoop fs -ls /user/biadmin/test

Found 1 items

-rw-r--r--   1 biadmin biadmin      62949 2016-01-01 22:34 /user/biadmin/test/BIlicense_en.txt

4. Run a simple MapReduce program

WordCount is a small Java program written for Hadoop MapReduce that counts how many times each word occurs in a text. For more about WordCount, see http://wiki.apache.org/hadoop/WordCount
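
To make the idea concrete, here is a minimal single-process sketch of the WordCount logic in Python. It is not the Java program shipped in the jar and does not use any Hadoop API; the function names (`map_phase`, `reduce_phase`, `word_count`) are made up for illustration. It assumes words are split on whitespace, as the stock Hadoop example does.

```python
from collections import Counter


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def reduce_phase(pairs):
    """Sum the counts emitted for each distinct word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map step over every line, then reduce all pairs at once."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    return reduce_phase(pairs)


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the end"]
    for word, count in sorted(word_count(sample).items()):
        print(f"{word}\t{count}")
```

On a real cluster the map and reduce steps run as separate tasks and the framework groups the pairs by word between them; this sketch simply does everything in one process.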

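The program is in hadoop-example.jar. The input is the contents of the test directory created above, and the output is written to the WordCount_output subdirectory. WordCount_output is a relative path, so it resolves under the user's HDFS home directory. If the directory does not exist, it is created automatically.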

biadmin@bivm:/etc/ibmvmcoc-postinstall> hadoop jar /opt/ibm/biginsights/IHC/hadoop-example.jar wordcount /user/biadmin/test WordCount_output

16/01/01 22:36:08 INFO input.FileInputFormat: Total input paths to process : 1

16/01/01 22:36:18 INFO mapred.JobClient: Running job: job_201601012120_0001

16/01/01 22:36:19 INFO mapred.JobClient:  map 0% reduce 0%

16/01/01 22:37:58 INFO mapred.JobClient:  map 100% reduce 0%

16/01/01 22:39:07 INFO mapred.JobClient:  map 100% reduce 100%

16/01/01 22:39:14 INFO mapred.JobClient: Job complete: job_201601012120_0001

16/01/01 22:39:15 INFO mapred.JobClient: Counters: 29

16/01/01 22:39:15 INFO mapred.JobClient:   File System Counters

16/01/01 22:39:15 INFO mapred.JobClient:     FILE: BYTES_READ=33219

16/01/01 22:39:15 INFO mapred.JobClient:     FILE: BYTES_WRITTEN=419738

16/01/01 22:39:15 INFO mapred.JobClient:     HDFS: BYTES_READ=63073

16/01/01 22:39:15 INFO mapred.JobClient:     HDFS: BYTES_WRITTEN=24073

16/01/01 22:39:15 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.JobCounter

16/01/01 22:39:15 INFO mapred.JobClient:     TOTAL_LAUNCHED_MAPS=1

16/01/01 22:39:15 INFO mapred.JobClient:     TOTAL_LAUNCHED_REDUCES=1

16/01/01 22:39:15 INFO mapred.JobClient:     DATA_LOCAL_MAPS=1

16/01/01 22:39:15 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=95300

16/01/01 22:39:15 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=50249

16/01/01 22:39:15 INFO mapred.JobClient:     FALLOW_SLOTS_MILLIS_MAPS=0

16/01/01 22:39:15 INFO mapred.JobClient:     FALLOW_SLOTS_MILLIS_REDUCES=0

16/01/01 22:39:15 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.TaskCounter

16/01/01 22:39:15 INFO mapred.JobClient:     MAP_INPUT_RECORDS=755

16/01/01 22:39:15 INFO mapred.JobClient:     MAP_OUTPUT_RECORDS=9865

16/01/01 22:39:15 INFO mapred.JobClient:     MAP_OUTPUT_BYTES=102036

16/01/01 22:39:15 INFO mapred.JobClient:     MAP_OUTPUT_MATERIALIZED_BYTES=33219

16/01/01 22:39:15 INFO mapred.JobClient:     SPLIT_RAW_BYTES=124

16/01/01 22:39:15 INFO mapred.JobClient:     COMBINE_INPUT_RECORDS=9865

16/01/01 22:39:15 INFO mapred.JobClient:     COMBINE_OUTPUT_RECORDS=2322

16/01/01 22:39:15 INFO mapred.JobClient:     REDUCE_INPUT_GROUPS=2322

16/01/01 22:39:15 INFO mapred.JobClient:     REDUCE_SHUFFLE_BYTES=33219

16/01/01 22:39:15 INFO mapred.JobClient:     REDUCE_INPUT_RECORDS=2322

16/01/01 22:39:15 INFO mapred.JobClient:     REDUCE_OUTPUT_RECORDS=2322

16/01/01 22:39:15 INFO mapred.JobClient:     SPILLED_RECORDS=4644

16/01/01 22:39:15 INFO mapred.JobClient:     CPU_MILLISECONDS=22130

16/01/01 22:39:15 INFO mapred.JobClient:     PHYSICAL_MEMORY_BYTES=538050560

16/01/01 22:39:15 INFO mapred.JobClient:     VIRTUAL_MEMORY_BYTES=3549384704

16/01/01 22:39:15 INFO mapred.JobClient:     COMMITTED_HEAP_BYTES=2097152000

16/01/01 22:39:15 INFO mapred.JobClient:   File Input Format Counters

16/01/01 22:39:15 INFO mapred.JobClient:     Bytes Read=62949

16/01/01 22:39:15 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.output.FileOutputFormat$Counter

16/01/01 22:39:15 INFO mapred.JobClient:     BYTES_WRITTEN=24073
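
The counters in this log tell a small story: the single map task read 755 lines and emitted 9865 word occurrences, the combiner collapsed those into 2322 records, and the reducer wrote 2322 distinct words. As a rough sketch of how one might pull such numbers out of a saved job log, the Python snippet below collects every `NAME=VALUE` line printed by `JobClient` into a dictionary. The function name `parse_counters` is hypothetical, and the parsing assumes the log layout shown above.

```python
def parse_counters(log_lines):
    """Collect NAME=VALUE counters from JobClient log lines into a dict."""
    counters = {}
    for line in log_lines:
        # Only counter lines contain both the JobClient tag and an '=' sign.
        if "JobClient:" not in line or "=" not in line:
            continue
        body = line.split("JobClient:", 1)[1].strip()
        name, _, value = body.rpartition("=")
        value = value.strip()
        if value.isdigit():
            counters[name.strip()] = int(value)
    return counters


if __name__ == "__main__":
    log = [
        "16/01/01 22:39:15 INFO mapred.JobClient:     MAP_OUTPUT_RECORDS=9865",
        "16/01/01 22:39:15 INFO mapred.JobClient:     COMBINE_OUTPUT_RECORDS=2322",
        "16/01/01 22:39:15 INFO mapred.JobClient:     REDUCE_OUTPUT_RECORDS=2322",
    ]
    c = parse_counters(log)
    ratio = c["MAP_OUTPUT_RECORDS"] / c["COMBINE_OUTPUT_RECORDS"]
    print(f"distinct words: {c['REDUCE_OUTPUT_RECORDS']}")
    print(f"combiner shrank the map output by a factor of {ratio:.2f}")
```

Counters that share a name, such as the FILE and HDFS byte counts, stay distinct because the prefix before the counter name is kept as part of the key.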

The WordCount_output directory has been created automatically:

biadmin@bivm:/etc/ibmvmcoc-postinstall> hadoop fs -ls WordCount_output

Found 3 items

-rw-r--r--   1 biadmin biadmin          0 2016-01-01 22:39 WordCount_output/_SUCCESS

drwx--x--x   - biadmin biadmin          0 2016-01-01 22:36 WordCount_output/_logs

-rw-r--r--   1 biadmin biadmin      24073 2016-01-01 22:39 WordCount_output/part-r-00000

biadmin@bivm:~> hadoop fs -cat WordCount_output/*00
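
The result file lists one word per line followed by its count. As a possible next step, the Python sketch below picks out the most frequent words from such lines. It assumes the default output layout of a word and a count separated by a tab. The function name `top_words` and the sample data are made up for illustration and are not taken from the actual job output.

```python
def top_words(lines, n=10):
    """Return the n most frequent (word, count) pairs from 'word<TAB>count' lines."""
    rows = []
    for line in lines:
        word, sep, count = line.rstrip("\n").rpartition("\t")
        # Skip lines that do not look like 'word<TAB>number'.
        if sep and count.isdigit():
            rows.append((word, int(count)))
    # Highest count first; ties are broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows[:n]


if __name__ == "__main__":
    sample = ["IBM\t12\n", "license\t7\n", "the\t40\n", "of\t25\n"]
    for word, count in top_words(sample, n=3):
        print(f"{count}\t{word}")
```

Passing `sys.stdin` instead of the sample list would let the function read the output of `hadoop fs -cat` through a pipe.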
