
Merge branch 'master' into 'master'

Master

See merge request xuesong/cloud-computing-course!6
wangyu
陆雪松 3 years ago
commit 83111e835a
34 changed files with 2667 additions and 7 deletions
  1. Assignment3.md (+16 -7)
  2. file/assignment3/hadoop/capacity-scheduler.xml (+146 -0)
  3. file/assignment3/hadoop/configuration.xsl (+40 -0)
  4. file/assignment3/hadoop/container-executor.cfg (+4 -0)
  5. file/assignment3/hadoop/core-site.xml (+65 -0)
  6. file/assignment3/hadoop/excludes (+0 -0)
  7. file/assignment3/hadoop/hadoop-env.cmd (+85 -0)
  8. file/assignment3/hadoop/hadoop-env.sh (+126 -0)
  9. file/assignment3/hadoop/hadoop-metrics.properties (+75 -0)
  10. file/assignment3/hadoop/hadoop-metrics2.properties (+68 -0)
  11. file/assignment3/hadoop/hadoop-policy.xml (+226 -0)
  12. file/assignment3/hadoop/hdfs-site.xml (+120 -0)
  13. file/assignment3/hadoop/httpfs-env.sh (+55 -0)
  14. file/assignment3/hadoop/httpfs-log4j.properties (+35 -0)
  15. file/assignment3/hadoop/httpfs-signature.secret (+1 -0)
  16. file/assignment3/hadoop/httpfs-site.xml (+17 -0)
  17. file/assignment3/hadoop/id_rsa (+27 -0)
  18. file/assignment3/hadoop/kms-acls.xml (+135 -0)
  19. file/assignment3/hadoop/kms-env.sh (+59 -0)
  20. file/assignment3/hadoop/kms-log4j.properties (+41 -0)
  21. file/assignment3/hadoop/kms-site.xml (+173 -0)
  22. file/assignment3/hadoop/log4j.properties (+323 -0)
  23. file/assignment3/hadoop/mapred-env.cmd (+20 -0)
  24. file/assignment3/hadoop/mapred-env.sh (+14 -0)
  25. file/assignment3/hadoop/mapred-queues.xml.template (+92 -0)
  26. file/assignment3/hadoop/mapred-site.xml (+51 -0)
  27. file/assignment3/hadoop/mapred-site.xml.template (+21 -0)
  28. file/assignment3/hadoop/slaves (+1 -0)
  29. file/assignment3/hadoop/ssl-client.xml.example (+80 -0)
  30. file/assignment3/hadoop/ssl-server.xml.example (+88 -0)
  31. file/assignment3/hadoop/yarn-env.cmd (+60 -0)
  32. file/assignment3/hadoop/yarn-env.sh (+127 -0)
  33. file/assignment3/hadoop/yarn-excludes (+0 -0)
  34. file/assignment3/hadoop/yarn-site.xml (+276 -0)

Assignment3.md (+16 -7)

@@ -91,6 +91,7 @@
1. su hadoop: switch to the hadoop user
2. hdfs dfsadmin -report: view cluster information
3. exit (log out of this user when finished, to avoid permission problems in later operations)
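A minimal sketch of this check on the Master node, using exactly the three commands above:
```
su hadoop               # switch to the hadoop user
hdfs dfsadmin -report   # print capacity and status for the cluster and each DataNode
exit                    # drop back to root to avoid later permission issues
```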
@@ -100,7 +101,7 @@
### 2) Getting familiar with basic commands
- > Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS for short). We can log in to the Master node and work with files through a few basic commands, which are similar to the commands we use on a Linux system.
+ > Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS for short). We can log in to the Master node and work with files through a few basic commands, which are similar to the commands we use on a Linux system. `Run all operations as the root user to avoid permission problems`
#### Listing files
@@ -112,7 +113,7 @@
> hadoop fs -mkdir <paths> takes the URIs given as paths and creates those directories. It behaves like Unix mkdir -p, creating every level of parent directory along the path.
>
> eg: hadoop fs -mkdir /dir1 /dir2 (the directories are created in the HDFS file system, not the local file system)
#### Uploading files
@@ -151,9 +152,9 @@
>
> 2. Create a folder test in the HDFS file system, then upload the file into that folder `use mkdir and put`
>
- > 3. Use the cat command to view the file contents and take a screenshot `the screenshot must include the folder information`
+ > 3. Use the cat command to view the file contents and take a screenshot `the screenshot must include the folder information` (this views the info.txt in the HDFS file system, not the local file system)
>
- > `When operating, mainly the user and directory used, to avoid permission denied problems`
+ > `When operating, pay attention to the user and directory you use, to avoid permission denied problems`
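A minimal shell sketch of steps 2 and 3 above (assuming the file from step 1 is a local info.txt in the current directory; the paths are illustrative):
```
hadoop fs -mkdir /test           # create the folder in HDFS
hadoop fs -put info.txt /test/   # upload the local file into it
hadoop fs -cat /test/info.txt    # print the file from HDFS, not the local copy
```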
### 3) Client setup
@@ -180,7 +181,7 @@
```
sh /root/install_uhadoop_client_new.sh client_ip client_user password port
client_ip: client machine IP (the UHost you applied for)
- client_user: the username on the client machine under which the client will be installed
+ client_user: the username on the client machine under which the client will be installed (root)
password: the client machine's root password
port: the client machine's ssh connection port (usually 22)
```
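For concreteness, a hypothetical filled-in invocation (the IP and password below are placeholders, not values from the course):
```
sh /root/install_uhadoop_client_new.sh 10.9.100.100 root 'YourRootPassword' 22
```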
@@ -226,6 +227,10 @@ hadoop jar /home/hadoop/hadoop-examples.jar wordcount /input /output if /ou
`**************Assignment 4: count the word frequency of all files under the /home/hadoop/etc/hadoop directory, take a screenshot, and insert it into the lab report***************`
`******Special note: once Assignment 4 is done, please delete the UHadoop cluster, EIP, UHost, and other resources******`
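A sketch of one way to run Assignment 4 with the example jar shown above (the /input4 and /output4 paths are made up; the output directory must not already exist):
```
hadoop fs -mkdir /input4
hadoop fs -put /home/hadoop/etc/hadoop/* /input4/
hadoop jar /home/hadoop/hadoop-examples.jar wordcount /input4 /output4
hadoop fs -cat /output4/part-* | head   # peek at the word counts
```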
#### How WordCount works
> MapReduce consists of two main steps, a Map step and a Reduce step. A story that circulates widely online explains it well: suppose you need to count how many books a library holds. To finish the task, you can assign Xiao Ming to count bookshelf 1 and Xiao Hong to count bookshelf 2; this assigning is the Map step. Finally, once everyone has counted the shelves they are responsible for, the individual results are added together; that is the Reduce step. The figure below shows how WordCount is implemented; see [WordCount implementation](https://hadoop.apache.org/docs/r1.0.4/cn/mapred_tutorial.html#用法).
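The same map/shuffle/reduce shape can be sketched as a plain shell pipeline (an analogy only, not how Hadoop actually executes; input.txt is a stand-in for any text file):
```
# map: emit one word per line
tr -s '[:space:]' '\n' < input.txt |
# shuffle: sorting brings identical words together
sort |
# reduce: count each group of identical words
uniq -c
```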
@@ -234,7 +239,11 @@ hadoop jar /home/hadoop/hadoop-examples.jar wordcount /input /output if /ou
`**************Assignment 5: implement a single-threaded WordCount in any language, record the running time, then provide a screenshot of the running time and insert it into the lab report (optional)***************`
`Assignment requirements`
> Folder to count: [/home/hadoop/etc/hadoop](file/assignment3/hadoop/)
>
> Run it locally and record the execution time (a shell sketch follows below)
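As one possible baseline (the assignment allows any language), a single-threaded shell version; `time` records the running time, and wordcount.txt is an arbitrary output name:
```
time {
  cat /home/hadoop/etc/hadoop/* |
    tr -s '[:space:]' '\n' |   # split text into one word per line
    sort |                     # group identical words
    uniq -c |                  # count each word
    sort -rn > wordcount.txt   # most frequent words first
}
```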
`******Special note: when the experiment is over, please delete the UHadoop cluster, EIP, UHost, and other resources******`

file/assignment3/hadoop/capacity-scheduler.xml (+146 -0)

@@ -0,0 +1,146 @@
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.scheduler.capacity.maximum-applications</name>
<value>10000</value>
<description>
Maximum number of applications that can be pending and running.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.1</value>
<description>
Maximum percent of resources in the cluster which can be used to run
application masters i.e. controls number of concurrent running
applications.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
<description>
The ResourceCalculator implementation to be used to compare
Resources in the scheduler.
The default i.e. DefaultResourceCalculator only uses Memory while
DominantResourceCalculator uses dominant-resource to compare
multi-dimensional resources such as Memory, CPU etc.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default</value>
<description>
The queues at the this level (root is the root queue).
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>100</value>
<description>Default queue target capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
<value>1</value>
<description>
Default queue user limit a percentage from 0.0 to 1.0.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>100</value>
<description>
The maximum capacity of the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.state</name>
<value>RUNNING</value>
<description>
The state of the default queue. State can be one of RUNNING or STOPPED.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
<value>*</value>
<description>
The ACL of who can submit jobs to the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
<value>*</value>
<description>
The ACL of who can administer jobs on the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.node-locality-delay</name>
<value>40</value>
<description>
Number of missed scheduling opportunities after which the CapacityScheduler
attempts to schedule rack-local containers.
Typically this should be set to number of nodes in the cluster, By default is setting
approximately number of nodes in one rack which is 40.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.queue-mappings</name>
<value></value>
<description>
A list of mappings that will be used to assign jobs to queues
The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
Typically this list will be used to map users to queues,
for example, u:%user:%user maps all users to queues with the same name
as the user.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
<value>false</value>
<description>
If a queue mapping is present, will it override the value specified
by the user? This can be used by administrators to place jobs in queues
that are different than the one specified by the user.
The default is false.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
<value>1</value>
<description>
Controls the number of OFF_SWITCH assignments allowed
during a node's heartbeat. Increasing this value can improve
scheduling rate for OFF_SWITCH containers. Lower values reduce
"clumping" of applications on particular nodes. The default is 1.
Legal values are 1-MAX_INT. This config is refreshable.
</description>
</property>
</configuration>

file/assignment3/hadoop/configuration.xsl (+40 -0)

@@ -0,0 +1,40 @@
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"/>
<xsl:template match="configuration">
<html>
<body>
<table border="1">
<tr>
<td>name</td>
<td>value</td>
<td>description</td>
</tr>
<xsl:for-each select="property">
<tr>
<td><a name="{name}"><xsl:value-of select="name"/></a></td>
<td><xsl:value-of select="value"/></td>
<td><xsl:value-of select="description"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

file/assignment3/hadoop/container-executor.cfg (+4 -0)

@@ -0,0 +1,4 @@
yarn.nodemanager.linux-container-executor.group=#configured value of yarn.nodemanager.linux-container-executor.group
banned.users=#comma separated list of users who can not run applications
min.user.id=1000#Prevent other super-users
allowed.system.users=##comma separated list of system users who CAN run applications

file/assignment3/hadoop/core-site.xml (+65 -0)

@@ -0,0 +1,65 @@
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://Ucluster</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>uhadoop-ia1nlbku-master1:2181,uhadoop-ia1nlbku-master2:2181,uhadoop-ia1nlbku-core1:2181</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>hadoop</value>
</property>
<property>
<name>ipc.maximum.data.length</name>
<value>69415731</value>
</property>
</configuration>

file/assignment3/hadoop/excludes (+0 -0)


file/assignment3/hadoop/hadoop-env.cmd (+85 -0)

@@ -0,0 +1,85 @@
@echo off
@rem Licensed to the Apache Software Foundation (ASF) under one or more
@rem contributor license agreements. See the NOTICE file distributed with
@rem this work for additional information regarding copyright ownership.
@rem The ASF licenses this file to You under the Apache License, Version 2.0
@rem (the "License"); you may not use this file except in compliance with
@rem the License. You may obtain a copy of the License at
@rem
@rem http://www.apache.org/licenses/LICENSE-2.0
@rem
@rem Unless required by applicable law or agreed to in writing, software
@rem distributed under the License is distributed on an "AS IS" BASIS,
@rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@rem See the License for the specific language governing permissions and
@rem limitations under the License.
@rem Set Hadoop-specific environment variables here.
@rem The only required environment variable is JAVA_HOME. All others are
@rem optional. When running a distributed configuration it is best to
@rem set JAVA_HOME in this file, so that it is correctly defined on
@rem remote nodes.
@rem The java implementation to use. Required.
set JAVA_HOME=%JAVA_HOME%
@rem The jsvc implementation to use. Jsvc is required to run secure datanodes.
@rem set JSVC_HOME=%JSVC_HOME%
@rem set HADOOP_CONF_DIR=
@rem Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
if exist %HADOOP_HOME%\contrib\capacity-scheduler (
if not defined HADOOP_CLASSPATH (
set HADOOP_CLASSPATH=%HADOOP_HOME%\contrib\capacity-scheduler\*.jar
) else (
set HADOOP_CLASSPATH=%HADOOP_CLASSPATH%;%HADOOP_HOME%\contrib\capacity-scheduler\*.jar
)
)
@rem The maximum amount of heap to use, in MB. Default is 1000.
@rem set HADOOP_HEAPSIZE=
@rem set HADOOP_NAMENODE_INIT_HEAPSIZE=""
@rem Extra Java runtime options. Empty by default.
@rem set HADOOP_OPTS=%HADOOP_OPTS% -Djava.net.preferIPv4Stack=true
@rem Command specific options appended to HADOOP_OPTS when specified
if not defined HADOOP_SECURITY_LOGGER (
set HADOOP_SECURITY_LOGGER=INFO,RFAS
)
if not defined HDFS_AUDIT_LOGGER (
set HDFS_AUDIT_LOGGER=INFO,NullAppender
)
set HADOOP_NAMENODE_OPTS=-Dhadoop.security.logger=%HADOOP_SECURITY_LOGGER% -Dhdfs.audit.logger=%HDFS_AUDIT_LOGGER% %HADOOP_NAMENODE_OPTS%
set HADOOP_DATANODE_OPTS=-Dhadoop.security.logger=ERROR,RFAS %HADOOP_DATANODE_OPTS%
set HADOOP_SECONDARYNAMENODE_OPTS=-Dhadoop.security.logger=%HADOOP_SECURITY_LOGGER% -Dhdfs.audit.logger=%HDFS_AUDIT_LOGGER% %HADOOP_SECONDARYNAMENODE_OPTS%
@rem The following applies to multiple commands (fs, dfs, fsck, distcp etc)
set HADOOP_CLIENT_OPTS=%HADOOP_CLIENT_OPTS%
@rem set heap args when HADOOP_HEAPSIZE is empty
if not defined HADOOP_HEAPSIZE (
set HADOOP_CLIENT_OPTS=-Xmx512m %HADOOP_CLIENT_OPTS%
)
@rem set HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData %HADOOP_JAVA_PLATFORM_OPTS%"
@rem On secure datanodes, user to run the datanode as after dropping privileges
set HADOOP_SECURE_DN_USER=%HADOOP_SECURE_DN_USER%
@rem Where log files are stored. %HADOOP_HOME%/logs by default.
@rem set HADOOP_LOG_DIR=%HADOOP_LOG_DIR%\%USERNAME%
@rem Where log files are stored in the secure data environment.
set HADOOP_SECURE_DN_LOG_DIR=%HADOOP_LOG_DIR%\%HADOOP_HDFS_USER%
@rem The directory where pid files are stored. /tmp by default.
@rem NOTE: this should be set to a directory that can only be written to by
@rem the user that will run the hadoop daemons. Otherwise there is the
@rem potential for a symlink attack.
set HADOOP_PID_DIR=%HADOOP_PID_DIR%
set HADOOP_SECURE_DN_PID_DIR=%HADOOP_PID_DIR%
@rem A string representing this instance of hadoop. %USERNAME% by default.
set HADOOP_IDENT_STRING=%USERNAME%

file/assignment3/hadoop/hadoop-env.sh (+126 -0)

@@ -0,0 +1,126 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use.
export JAVA_HOME=${JAVA_HOME}
# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol. Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/conf"}
# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
if [ "$HADOOP_CLASSPATH" ]; then
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
else
export HADOOP_CLASSPATH=$f
fi
done
export hadoop_pid_dir_prefix=/var/run/hadoop-hdfs
export hdfs_log_dir_prefix=/var/log/hadoop-hdfs
# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""
export HADOOP_HEAPSIZE=1024
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_NAMENODE_HEAPSIZE=3072
export HADOOP_DATANODE_HEAPSIZE=3413
export namenode_opt_newsize=128m
export namenode_opt_maxnewsize=128m
export namenode_heapsize=${HADOOP_NAMENODE_HEAPSIZE}m
export dtnode_heapsize=${HADOOP_DATANODE_HEAPSIZE}m
# Enable extra debugging of Hadoop's JAAS binding, used to set up
# Kerberos security.
# export HADOOP_JAAS_DEBUG=true
# Extra Java runtime options. Empty by default.
# For Kerberos debugging, an extended option set logs more invormation
# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
SHARED_HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=${hdfs_log_dir_prefix}/hs_err_pid%p.log -XX:NewSize=${namenode_opt_newsize} -XX:MaxNewSize=${namenode_opt_maxnewsize} -Xloggc:$hdfs_log_dir_prefix/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms${namenode_heapsize} -Xmx${namenode_heapsize} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,RFAAUDIT"
export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=${hdfs_log_dir_prefix}/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:${hdfs_log_dir_prefix}/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms${dtnode_heapsize} -Xmx${dtnode_heapsize} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,RFAAUDIT ${HADOOP_DATANODE_OPTS} -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS"
# set heap args when HADOOP_HEAPSIZE is empty
if [ "$HADOOP_HEAPSIZE" = "" ]; then
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
fi
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"
# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol. This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}
# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=$hdfs_log_dir_prefix
# Where log files are stored in the secure data environment.
#export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}
###
# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""
###
# Advanced Users Only!
###
# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
export HADOOP_PID_DIR=$hadoop_pid_dir_prefix
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER

file/assignment3/hadoop/hadoop-metrics.properties (+75 -0)

@@ -0,0 +1,75 @@
# Configuration of the "dfs" context for null
dfs.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "dfs" context for file
#dfs.class=org.apache.hadoop.metrics.file.FileContext
#dfs.period=10
#dfs.fileName=/tmp/dfsmetrics.log
# Configuration of the "dfs" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# dfs.period=10
# dfs.servers=localhost:8649
# Configuration of the "mapred" context for null
mapred.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "mapred" context for file
#mapred.class=org.apache.hadoop.metrics.file.FileContext
#mapred.period=10
#mapred.fileName=/tmp/mrmetrics.log
# Configuration of the "mapred" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# mapred.period=10
# mapred.servers=localhost:8649
# Configuration of the "jvm" context for null
#jvm.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "jvm" context for file
#jvm.class=org.apache.hadoop.metrics.file.FileContext
#jvm.period=10
#jvm.fileName=/tmp/jvmmetrics.log
# Configuration of the "jvm" context for ganglia
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# jvm.period=10
# jvm.servers=localhost:8649
# Configuration of the "rpc" context for null
rpc.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "rpc" context for file
#rpc.class=org.apache.hadoop.metrics.file.FileContext
#rpc.period=10
#rpc.fileName=/tmp/rpcmetrics.log
# Configuration of the "rpc" context for ganglia
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# rpc.period=10
# rpc.servers=localhost:8649
# Configuration of the "ugi" context for null
ugi.class=org.apache.hadoop.metrics.spi.NullContext
# Configuration of the "ugi" context for file
#ugi.class=org.apache.hadoop.metrics.file.FileContext
#ugi.period=10
#ugi.fileName=/tmp/ugimetrics.log
# Configuration of the "ugi" context for ganglia
# ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# ugi.period=10
# ugi.servers=localhost:8649

file/assignment3/hadoop/hadoop-metrics2.properties (+68 -0)

@@ -0,0 +1,68 @@
# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period, in seconds
*.period=10
# The namenode-metrics.out will contain metrics from all context
#namenode.sink.file.filename=namenode-metrics.out
# Specifying a special sampling period for namenode:
#namenode.sink.*.period=8
#datanode.sink.file.filename=datanode-metrics.out
#resourcemanager.sink.file.filename=resourcemanager-metrics.out
#nodemanager.sink.file.filename=nodemanager-metrics.out
#mrappmaster.sink.file.filename=mrappmaster-metrics.out
#jobhistoryserver.sink.file.filename=jobhistoryserver-metrics.out
# the following example split metrics of different
# context to different sinks (in this case files)
#nodemanager.sink.file_jvm.class=org.apache.hadoop.metrics2.sink.FileSink
#nodemanager.sink.file_jvm.context=jvm
#nodemanager.sink.file_jvm.filename=nodemanager-jvm-metrics.out
#nodemanager.sink.file_mapred.class=org.apache.hadoop.metrics2.sink.FileSink
#nodemanager.sink.file_mapred.context=mapred
#nodemanager.sink.file_mapred.filename=nodemanager-mapred-metrics.out
#
# Below are for sending metrics to Ganglia
#
# for Ganglia 3.0 support
# *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink30
#
# for Ganglia 3.1 support
# *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
# *.sink.ganglia.period=10
# default for supportsparse is false
# *.sink.ganglia.supportsparse=true
#*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
#*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
# Tag values to use for the ganglia prefix. If not defined no tags are used.
# If '*' all tags are used. If specifiying multiple tags separate them with
# commas. Note that the last segment of the property name is the context name.
#
#*.sink.ganglia.tagsForPrefix.jvm=ProcesName
#*.sink.ganglia.tagsForPrefix.dfs=
#*.sink.ganglia.tagsForPrefix.rpc=
#*.sink.ganglia.tagsForPrefix.mapred=
#namenode.sink.ganglia.servers=yourgangliahost_1:8649,yourgangliahost_2:8649
#datanode.sink.ganglia.servers=yourgangliahost_1:8649,yourgangliahost_2:8649
#resourcemanager.sink.ganglia.servers=yourgangliahost_1:8649,yourgangliahost_2:8649
#nodemanager.sink.ganglia.servers=yourgangliahost_1:8649,yourgangliahost_2:8649
#mrappmaster.sink.ganglia.servers=yourgangliahost_1:8649,yourgangliahost_2:8649
#jobhistoryserver.sink.ganglia.servers=yourgangliahost_1:8649,yourgangliahost_2:8649

file/assignment3/hadoop/hadoop-policy.xml (+226 -0)

@@ -0,0 +1,226 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>security.client.protocol.acl</name>
<value>*</value>
<description>ACL for ClientProtocol, which is used by user code
via the DistributedFileSystem.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.client.datanode.protocol.acl</name>
<value>*</value>
<description>ACL for ClientDatanodeProtocol, the client-to-datanode protocol
for block recovery.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.datanode.protocol.acl</name>
<value>*</value>
<description>ACL for DatanodeProtocol, which is used by datanodes to
communicate with the namenode.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.inter.datanode.protocol.acl</name>
<value>*</value>
<description>ACL for InterDatanodeProtocol, the inter-datanode protocol
for updating generation timestamp.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.namenode.protocol.acl</name>
<value>*</value>
<description>ACL for NamenodeProtocol, the protocol used by the secondary
namenode to communicate with the namenode.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.admin.operations.protocol.acl</name>
<value>*</value>
<description>ACL for AdminOperationsProtocol. Used for admin commands.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.refresh.user.mappings.protocol.acl</name>
<value>*</value>
<description>ACL for RefreshUserMappingsProtocol. Used to refresh
users mappings. The ACL is a comma-separated list of user and
group names. The user and group list is separated by a blank. For
e.g. "alice,bob users,wheel". A special value of "*" means all
users are allowed.</description>
</property>
<property>
<name>security.refresh.policy.protocol.acl</name>
<value>*</value>
<description>ACL for RefreshAuthorizationPolicyProtocol, used by the
dfsadmin and mradmin commands to refresh the security policy in-effect.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.ha.service.protocol.acl</name>
<value>*</value>
<description>ACL for HAService protocol used by HAAdmin to manage the
active and stand-by states of namenode.</description>
</property>
<property>
<name>security.zkfc.protocol.acl</name>
<value>*</value>
<description>ACL for access to the ZK Failover Controller
</description>
</property>
<property>
<name>security.qjournal.service.protocol.acl</name>
<value>*</value>
<description>ACL for QJournalProtocol, used by the NN to communicate with
JNs when using the QuorumJournalManager for edit logs.</description>
</property>
<property>
<name>security.mrhs.client.protocol.acl</name>
<value>*</value>
<description>ACL for HSClientProtocol, used by job clients to
communciate with the MR History Server job status etc.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<!-- YARN Protocols -->
<property>
<name>security.resourcetracker.protocol.acl</name>
<value>*</value>
<description>ACL for ResourceTrackerProtocol, used by the
ResourceManager and NodeManager to communicate with each other.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.resourcemanager-administration.protocol.acl</name>
<value>*</value>
<description>ACL for ResourceManagerAdministrationProtocol, for admin commands.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.applicationclient.protocol.acl</name>
<value>*</value>
<description>ACL for ApplicationClientProtocol, used by the ResourceManager
and applications submission clients to communicate with each other.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.applicationmaster.protocol.acl</name>
<value>*</value>
<description>ACL for ApplicationMasterProtocol, used by the ResourceManager
and ApplicationMasters to communicate with each other.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.containermanagement.protocol.acl</name>
<value>*</value>
<description>ACL for ContainerManagementProtocol protocol, used by the NodeManager
and ApplicationMasters to communicate with each other.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.resourcelocalizer.protocol.acl</name>
<value>*</value>
<description>ACL for ResourceLocalizer protocol, used by the NodeManager
and ResourceLocalizer to communicate with each other.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.job.task.protocol.acl</name>
<value>*</value>
<description>ACL for TaskUmbilicalProtocol, used by the map and reduce
tasks to communicate with the parent tasktracker.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.job.client.protocol.acl</name>
<value>*</value>
<description>ACL for MRClientProtocol, used by job clients to
communciate with the MR ApplicationMaster to query job status etc.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
<property>
<name>security.applicationhistory.protocol.acl</name>
<value>*</value>
<description>ACL for ApplicationHistoryProtocol, used by the timeline
server and the generic history service client to communicate with each other.
The ACL is a comma-separated list of user and group names. The user and
group list is separated by a blank. For e.g. "alice,bob users,wheel".
A special value of "*" means all users are allowed.</description>
</property>
</configuration>

file/assignment3/hadoop/hdfs-site.xml (+120 -0)

@@ -0,0 +1,120 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data/dfs/nn</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/dfs/dn</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>Ucluster</value>
</property>
<property>
<name>dfs.ha.namenodes.Ucluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.Ucluster.nn1</name>
<value>uhadoop-ia1nlbku-master1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.Ucluster.nn2</name>
<value>uhadoop-ia1nlbku-master2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.Ucluster.nn1</name>
<value>uhadoop-ia1nlbku-master1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.Ucluster.nn2</name>
<value>uhadoop-ia1nlbku-master2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://uhadoop-ia1nlbku-master1:8485;uhadoop-ia1nlbku-master2:8485;uhadoop-ia1nlbku-core1:8485/Ucluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/dfs/jn</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.Ucluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence(hadoop:22)</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
<description>SSH connection timeout, in milliseconds, to use with the builtin shfence fencer. </description>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/conf/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.heartbeat.recheck-interval</name>
<value>45000</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>7320</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>8192</value>
</property>
<property>
<name>dfs.image.compress</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.num.checkpoints.retained</name>
<value>12</value>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>20</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>20</value>
</property>
<property>
<name>dfs.socket.timeout</name>
<value>900000</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hadoop/conf/excludes</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
</configuration>

file/assignment3/hadoop/httpfs-env.sh (+55 -0)

@@ -0,0 +1,55 @@
#!/bin/bash
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. See accompanying LICENSE file.
#
# Set httpfs specific environment variables here.
# Settings for the Embedded Tomcat that runs HttpFS
# Java System properties for HttpFS should be specified in this variable
#
# export CATALINA_OPTS=
# HttpFS logs directory
#
# export HTTPFS_LOG=${HTTPFS_HOME}/logs
export HTTPFS_LOG=/var/log/hadoop-hdfs
# HttpFS temporary directory
#
# export HTTPFS_TEMP=${HTTPFS_HOME}/temp
# The HTTP port used by HttpFS
#
# export HTTPFS_HTTP_PORT=14000
export HTTPFS_HTTP_PORT=14000
# The Admin port used by HttpFS
#
# export HTTPFS_ADMIN_PORT=`expr ${HTTPFS_HTTP_PORT} + 1`
# The hostname HttpFS server runs on
#
# export HTTPFS_HTTP_HOSTNAME=`hostname -f`
# Indicates if HttpFS is using SSL
#
# export HTTPFS_SSL_ENABLED=false
# The location of the SSL keystore if using SSL
#
# export HTTPFS_SSL_KEYSTORE_FILE=${HOME}/.keystore
# The password of the SSL keystore if using SSL
#
# export HTTPFS_SSL_KEYSTORE_PASS=password

file/assignment3/hadoop/httpfs-log4j.properties (+35 -0)

@@ -0,0 +1,35 @@
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. See accompanying LICENSE file.
#
# If the Java System property 'httpfs.log.dir' is not defined at HttpFSServer start up time
# Setup sets its value to '${httpfs.home}/logs'
log4j.appender.httpfs=org.apache.log4j.DailyRollingFileAppender
log4j.appender.httpfs.DatePattern='.'yyyy-MM-dd
log4j.appender.httpfs.File=${httpfs.log.dir}/httpfs.log
log4j.appender.httpfs.Append=true
log4j.appender.httpfs.layout=org.apache.log4j.PatternLayout
log4j.appender.httpfs.layout.ConversionPattern=%d{ISO8601} %5p %c{1} [%X{hostname}][%X{user}:%X{doAs}] %X{op} %m%n
log4j.appender.httpfsaudit=org.apache.log4j.DailyRollingFileAppender
log4j.appender.httpfsaudit.DatePattern='.'yyyy-MM-dd
log4j.appender.httpfsaudit.File=${httpfs.log.dir}/httpfs-audit.log
log4j.appender.httpfsaudit.Append=true
log4j.appender.httpfsaudit.layout=org.apache.log4j.PatternLayout
log4j.appender.httpfsaudit.layout.ConversionPattern=%d{ISO8601} %5p [%X{hostname}][%X{user}:%X{doAs}] %X{op} %m%n
log4j.logger.httpfsaudit=INFO, httpfsaudit
log4j.logger.org.apache.hadoop.fs.http.server=INFO, httpfs
log4j.logger.org.apache.hadoop.lib=INFO, httpfs

file/assignment3/hadoop/httpfs-signature.secret (+1 -0)

@@ -0,0 +1 @@
hadoop httpfs secret

file/assignment3/hadoop/httpfs-site.xml (+17 -0)

@@ -0,0 +1,17 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
</configuration>

file/assignment3/hadoop/id_rsa (+27 -0)

@@ -0,0 +1,27 @@
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAtbn8y8LombFy9dmv5efbOJSOEMpYe4lWOYIMKb9k2X/lJOrN
CRhHzWJa8n2xn6zyCWZ7ZZEBn7DY/YAmATyZ5NVRGj/XNQ9RwjokqWmMMA9u0anC
j7JMn5gaApufsEsNp3aokaBQZQqdZONijAtEb/62GcbOoq8h/tg8x7Uk+3Fu7X6d
hrUck3dTSQq2qHdi24sSq9zvu4oGF8guL5Eqxijre+eYylzaJC2LUwco/UInluf9
Vzx0ch2A5qxyH5DzmpItZFMg2Z80alwQxx+1LnHfednbT1606LJkgair29h6R72i
svrztasrPawTq84MMCxiar32PRmIrKipKPfiCwIDAQABAoIBACUMauZbsTIMRESt
AbhcYYwSdTglGI7u+94zjilAtN3Gvj+dgvmUsqbDo4kGaR0FlD6oXwXg3zTgSAy+
gIEGCtXlS2iPlV9i5Sc01V6YfxUZQF2MP3cuQYLT7pGTiqXVV05J2an+xgUjed0k
omWssmImypdMubne/I5JJXMNkiGUsSMMg5BzXMwrTwoSDfak5Bj8xVMEMLXm4XVJ
Gnar1EvSXa3jfVUSxzhN20bpUXol11Y0x04ssJWcHyFO9DHOJmjR/ycIq0AG5g15
ue2XhJAQFV/vhq2yzxvB5tzfS1Hu7eDLSOL3Eke170MPOqetzjgbW5j5Ty6GoYSn
BI87jbkCgYEA8evYw9W3w7oNah8Rt4jysreX0dw+Uy+yG9AjKw6gA6oTU+Ck1HTV
yD+jsINinLVTCQBE6LrlO79N8CcXaj1dhL1KPMI5Ku+mpatPLkNEC2s9/lUEszpX
blsd7/WB8tMURV6WR5wqnFmj5npUxhwoxTmd/h6qogx7rxjRzyD2vAUCgYEAwE1b
TiXjPv3uXI8PbrzAA1sOvf0sOudmHZZ6bH7DnAg9pYzeRGjWykBB0fuQ2AMzPnhc
lIZ/hSRhEsLz5xX8BHy+clQLxs4DZpkxpUXY0w+VCF1Rpec0+fpVT3UjBWv5MDjH
1DcjUc6kDrFlyyiEMH1lG2ymUsGMWxN+YoE/ks8CgYEApmZI5PrduX0BuRrBvvIt
rYvmm2zYWbOW2NajOfyHR732KV19Qr1SRrivSLw2Wf/Gq4xJ2aKkBiKh4yugSW0I
JENnCr+1Prk0cQOSJQoThZ8wNv4Xi4f3l2qI/wJpbbKOYOCckYjzLjPiLqe6I8I+
sNneuGozh976PAfgWI4d6FkCgYAPpHs/4R8aGafRCaYUuO/Zlged9sEpTXdmAr6U
or8gqx7wn4PZBroqG43/GbqPh7scYsgNIN+woePvlcInUwd8CfWn8SRAGLP4HZAH
RKY9jO/vjT++Ag+yIeXcn8eogj7z6DqBDbcmyWtY8p84JmSSWTDnSTBCXRIgunY2
ZxMXywKBgQCfF5n1OwOwByXwuCfhYfs2B+t6WvkGqdQQuYW+LJGcJU1AkO2NvYxp
YWRybrlLvUz3crzJAxHO2If1jdpffDVY781HBYoeZwKfTg0Q+ts1zOW5PkR3rVZM
TCfMhcjgB1chwsR6Wf9N0NPRIc3+QSZStNOajCn/ATVTGliYObtb7w==
-----END RSA PRIVATE KEY-----

file/assignment3/hadoop/kms-acls.xml (+135 -0)

@@ -0,0 +1,135 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<!-- This file is hot-reloaded when it changes -->
<!-- KMS ACLs -->
<property>
<name>hadoop.kms.acl.CREATE</name>
<value>*</value>
<description>
ACL for create-key operations.
If the user is not in the GET ACL, the key material is not returned
as part of the response.
</description>
</property>
<property>
<name>hadoop.kms.acl.DELETE</name>
<value>*</value>
<description>
ACL for delete-key operations.
</description>
</property>
<property>
<name>hadoop.kms.acl.ROLLOVER</name>
<value>*</value>
<description>
ACL for rollover-key operations.
If the user is not in the GET ACL, the key material is not returned
as part of the response.
</description>
</property>
<property>
<name>hadoop.kms.acl.GET</name>
<value>*</value>
<description>
ACL for get-key-version and get-current-key operations.
</description>
</property>
<property>
<name>hadoop.kms.acl.GET_KEYS</name>
<value>*</value>
<description>
ACL for get-keys operations.
</description>
</property>
<property>
<name>hadoop.kms.acl.GET_METADATA</name>
<value>*</value>
<description>
ACL for get-key-metadata and get-keys-metadata operations.
</description>
</property>
<property>
<name>hadoop.kms.acl.SET_KEY_MATERIAL</name>
<value>*</value>
<description>
Complementary ACL for CREATE and ROLLOVER operations to allow the client
to provide the key material when creating or rolling a key.
</description>
</property>
<property>
<name>hadoop.kms.acl.GENERATE_EEK</name>
<value>*</value>
<description>
ACL for generateEncryptedKey CryptoExtension operations.
</description>
</property>
<property>
<name>hadoop.kms.acl.DECRYPT_EEK</name>
<value>*</value>
<description>
ACL for decryptEncryptedKey CryptoExtension operations.
</description>
</property>
<property>
<name>default.key.acl.MANAGEMENT</name>
<value>*</value>
<description>
default ACL for MANAGEMENT operations for all key acls that are not
explicitly defined.
</description>
</property>
<property>
<name>default.key.acl.GENERATE_EEK</name>
<value>*</value>
<description>
default ACL for GENERATE_EEK operations for all key acls that are not
explicitly defined.
</description>
</property>
<property>
<name>default.key.acl.DECRYPT_EEK</name>
<value>*</value>
<description>
default ACL for DECRYPT_EEK operations for all key acls that are not
explicitly defined.
</description>
</property>
<property>
<name>default.key.acl.READ</name>
<value>*</value>
<description>
default ACL for READ operations for all key acls that are not
explicitly defined.
</description>
</property>
</configuration>

file/assignment3/hadoop/kms-env.sh (+59 -0)

@@ -0,0 +1,59 @@
#!/bin/bash
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. See accompanying LICENSE file.
#
# Set kms specific environment variables here.
# Settings for the Embedded Tomcat that runs KMS
# Java System properties for KMS should be specified in this variable
#
# export CATALINA_OPTS=
# KMS logs directory
#
# export KMS_LOG=${KMS_HOME}/logs
# KMS temporary directory
#
# export KMS_TEMP=${KMS_HOME}/temp
# The HTTP port used by KMS
#
# export KMS_HTTP_PORT=16000
# The Admin port used by KMS
#
# export KMS_ADMIN_PORT=`expr ${KMS_HTTP_PORT} + 1`
# The maximum number of Tomcat handler threads
#
# export KMS_MAX_THREADS=1000
# The maximum size of Tomcat HTTP header
#
# export KMS_MAX_HTTP_HEADER_SIZE=65536
# The location of the SSL keystore if using SSL
#
# export KMS_SSL_KEYSTORE_FILE=${HOME}/.keystore
# The password of the SSL keystore if using SSL
#
# export KMS_SSL_KEYSTORE_PASS=password
# The full path to any native libraries that need to be loaded
# (For eg. location of natively compiled tomcat Apache portable
# runtime (APR) libraries
#
# export JAVA_LIBRARY_PATH=${HOME}/lib/native

file/assignment3/hadoop/kms-log4j.properties (+41 -0)

@@ -0,0 +1,41 @@
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. See accompanying LICENSE file.
#
# If the Java System property 'kms.log.dir' is not defined at KMS start up time
# Setup sets its value to '${kms.home}/logs'
log4j.appender.kms=org.apache.log4j.DailyRollingFileAppender
log4j.appender.kms.DatePattern='.'yyyy-MM-dd
log4j.appender.kms.File=${kms.log.dir}/kms.log
log4j.appender.kms.Append=true
log4j.appender.kms.layout=org.apache.log4j.PatternLayout
log4j.appender.kms.layout.ConversionPattern=%d{ISO8601} %-5p %c{1} - %m%n
log4j.appender.kms-audit=org.apache.log4j.DailyRollingFileAppender
log4j.appender.kms-audit.DatePattern='.'yyyy-MM-dd
log4j.appender.kms-audit.File=${kms.log.dir}/kms-audit.log
log4j.appender.kms-audit.Append=true
log4j.appender.kms-audit.layout=org.apache.log4j.PatternLayout
log4j.appender.kms-audit.layout.ConversionPattern=%d{ISO8601} %m%n
log4j.logger.kms-audit=INFO, kms-audit
log4j.additivity.kms-audit=false
log4j.rootLogger=ALL, kms
log4j.logger.org.apache.hadoop.conf=ERROR
log4j.logger.org.apache.hadoop=INFO
log4j.logger.com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator=OFF
# make zookeeper log level an explicit config, and not changing with rootLogger.
log4j.logger.org.apache.zookeeper=INFO
log4j.logger.org.apache.curator=INFO

file/assignment3/hadoop/kms-site.xml (+173 -0)

@@ -0,0 +1,173 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<!-- KMS Backend KeyProvider -->
<property>
<name>hadoop.kms.key.provider.uri</name>
<value>jceks://file@/${user.home}/kms.keystore</value>
<description>
URI of the backing KeyProvider for the KMS.
</description>
</property>
<property>
<name>hadoop.security.keystore.java-keystore-provider.password-file</name>
<value>kms.keystore.password</value>
<description>
If using the JavaKeyStoreProvider, the file name for the keystore password.
</description>
</property>
<!-- KMS Cache -->
<property>
<name>hadoop.kms.cache.enable</name>
<value>true</value>
<description>
Whether the KMS will act as a cache for the backing KeyProvider.
When the cache is enabled, operations like getKeyVersion, getMetadata,
and getCurrentKey will sometimes return cached data without consulting
the backing KeyProvider. Cached values are flushed when keys are deleted
or modified.
</description>
</property>
<property>
<name>hadoop.kms.cache.timeout.ms</name>
<value>600000</value>
<description>
Expiry time for the KMS key version and key metadata cache, in
milliseconds. This affects getKeyVersion and getMetadata.
</description>
</property>
<property>
<name>hadoop.kms.current.key.cache.timeout.ms</name>
<value>30000</value>
<description>
Expiry time for the KMS current key cache, in milliseconds. This
affects getCurrentKey operations.
</description>
</property>
<!-- KMS Audit -->
<property>
<name>hadoop.kms.audit.aggregation.window.ms</name>
<value>10000</value>
<description>
Duplicate audit log events within the aggregation window (specified in
ms) are quashed to reduce log traffic. A single message for aggregated
events is printed at the end of the window, along with a count of the
number of aggregated events.
</description>
</property>
<!-- KMS Security -->
<property>
<name>hadoop.kms.authentication.type</name>
<value>simple</value>
<description>
Authentication type for the KMS. Can be either &quot;simple&quot;
or &quot;kerberos&quot;.
</description>
</property>
<property>
<name>hadoop.kms.authentication.kerberos.keytab</name>
<value>${user.home}/kms.keytab</value>
<description>
Path to the keytab with credentials for the configured Kerberos principal.
</description>
</property>
<property>
<name>hadoop.kms.authentication.kerberos.principal</name>
<value>HTTP/localhost</value>
<description>
The Kerberos principal to use for the HTTP endpoint.
The principal must start with 'HTTP/' as per the Kerberos HTTP SPNEGO specification.
</description>
</property>
<property>
<name>hadoop.kms.authentication.kerberos.name.rules</name>
<value>DEFAULT</value>
<description>
Rules used to resolve Kerberos principal names.
</description>
</property>
<!-- Authentication cookie signature source -->
<property>
<name>hadoop.kms.authentication.signer.secret.provider</name>
<value>random</value>
<description>
Indicates how the secret to sign the authentication cookies will be
stored. Options are 'random' (default), 'string' and 'zookeeper'.
If using a setup with multiple KMS instances, 'zookeeper' should be used.
</description>
</property>
<!-- Configuration for 'zookeeper' authentication cookie signature source -->
<property>
<name>hadoop.kms.authentication.signer.secret.provider.zookeeper.path</name>
<value>/hadoop-kms/hadoop-auth-signature-secret</value>
<description>
The Zookeeper ZNode path where the KMS instances will store and retrieve
the secret from.
</description>
</property>
<property>
<name>hadoop.kms.authentication.signer.secret.provider.zookeeper.connection.string</name>
<value>#HOSTNAME#:#PORT#,...</value>
<description>
The Zookeeper connection string, a list of hostnames and port comma
separated.
</description>
</property>
<property>
<name>hadoop.kms.authentication.signer.secret.provider.zookeeper.auth.type</name>
<value>none</value>
<description>
The Zookeeper authentication type, 'none' (default) or 'sasl' (Kerberos).
</description>
</property>
<property>
<name>hadoop.kms.authentication.signer.secret.provider.zookeeper.kerberos.keytab</name>
<value>/etc/hadoop/conf/kms.keytab</value>
<description>
The absolute path for the Kerberos keytab with the credentials to
connect to Zookeeper.
</description>
</property>
<property>
<name>hadoop.kms.authentication.signer.secret.provider.zookeeper.kerberos.principal</name>
<value>kms/#HOSTNAME#</value>
<description>
The Kerberos service principal used to connect to Zookeeper.
</description>
</property>
</configuration>
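A quick way to sanity-check a KMS configured like this is the `hadoop key` client. A minimal sketch, assuming the KMS runs on uhadoop-ia1nlbku-master1 at Hadoop's default KMS port 16000 (both host and port are assumptions, not taken from this file):

```
# list keys exposed by the KMS; with hadoop.kms.authentication.type=simple
# no Kerberos ticket is required
hadoop key list -provider kms://http@uhadoop-ia1nlbku-master1:16000/kms -metadata
```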

+ 323 - 0  file/assignment3/hadoop/log4j.properties
@@ -0,0 +1,323 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Define some default values that can be overridden by system properties
hadoop.root.logger=INFO,console
hadoop.log.dir=.
hadoop.log.file=hadoop.log
# Define the root logger to the system property "hadoop.root.logger".
log4j.rootLogger=${hadoop.root.logger}, EventCounter
# Logging Threshold
log4j.threshold=ALL
# Null Appender
log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
#
# Rolling File Appender - cap space usage at 5gb.
#
hadoop.log.maxfilesize=256MB
hadoop.log.maxbackupindex=20
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
#
# Daily Rolling File Appender
#
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
# Rollover at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
#
# console
# Add "console" to rootlogger above if you want to use this
#
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
#
# TaskLog Appender
#
#Default values
hadoop.tasklog.taskid=null
hadoop.tasklog.iscleanup=false
hadoop.tasklog.noKeepSplits=4
hadoop.tasklog.totalLogFileSize=100
hadoop.tasklog.purgeLogSplits=true
hadoop.tasklog.logsRetainHours=12
log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
#
# HDFS block state change log from block manager
#
# Uncomment the following to log normal block state change
# messages from BlockManager in NameNode.
#log4j.logger.BlockStateChange=DEBUG
#
#Security appender
#
hadoop.security.logger=INFO,NullAppender
hadoop.security.log.maxfilesize=256MB
hadoop.security.log.maxbackupindex=20
log4j.category.SecurityLogger=${hadoop.security.logger}
hadoop.security.log.file=SecurityAuth-${user.name}.audit
log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}
#
# Daily Rolling Security appender
#
log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
#
# hadoop configuration logging
#
# Uncomment the following line to turn off configuration deprecation warnings.
# log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
#
# hdfs audit logging
#
hdfs.audit.logger=INFO,NullAppender
hdfs.audit.log.maxfilesize=256MB
hdfs.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.RFAAUDIT.MaxFileSize=${hdfs.audit.log.maxfilesize}
log4j.appender.RFAAUDIT.MaxBackupIndex=${hdfs.audit.log.maxbackupindex}
#
# NameNode metrics logging.
# The default is to retain two namenode-metrics.log files up to 64MB each.
#
namenode.metrics.logger=INFO,NullAppender
log4j.logger.NameNodeMetricsLog=${namenode.metrics.logger}
log4j.additivity.NameNodeMetricsLog=false
log4j.appender.NNMETRICSRFA=org.apache.log4j.RollingFileAppender
log4j.appender.NNMETRICSRFA.File=${hadoop.log.dir}/namenode-metrics.log
log4j.appender.NNMETRICSRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.NNMETRICSRFA.layout.ConversionPattern=%d{ISO8601} %m%n
log4j.appender.NNMETRICSRFA.MaxBackupIndex=1
log4j.appender.NNMETRICSRFA.MaxFileSize=64MB
#
# DataNode metrics logging.
# The default is to retain two datanode-metrics.log files up to 64MB each.
#
datanode.metrics.logger=INFO,NullAppender
log4j.logger.DataNodeMetricsLog=${datanode.metrics.logger}
log4j.additivity.DataNodeMetricsLog=false
log4j.appender.DNMETRICSRFA=org.apache.log4j.RollingFileAppender
log4j.appender.DNMETRICSRFA.File=${hadoop.log.dir}/datanode-metrics.log
log4j.appender.DNMETRICSRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DNMETRICSRFA.layout.ConversionPattern=%d{ISO8601} %m%n
log4j.appender.DNMETRICSRFA.MaxBackupIndex=1
log4j.appender.DNMETRICSRFA.MaxFileSize=64MB
#
# mapred audit logging
#
mapred.audit.logger=INFO,NullAppender
mapred.audit.log.maxfilesize=256MB
mapred.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
log4j.appender.MRAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.MRAUDIT.MaxFileSize=${mapred.audit.log.maxfilesize}
log4j.appender.MRAUDIT.MaxBackupIndex=${mapred.audit.log.maxbackupindex}
# Custom Logging levels
#log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
#log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
#log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=DEBUG
# Jets3t library
log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
# AWS SDK & S3A FileSystem
log4j.logger.com.amazonaws=ERROR
log4j.logger.com.amazonaws.http.AmazonHttpClient=ERROR
log4j.logger.org.apache.hadoop.fs.s3a.S3AFileSystem=WARN
#
# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop Metrics.
#
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
#
# Job Summary Appender
#
# Use following logger to send summary to separate file defined by
# hadoop.mapreduce.jobsummary.log.file :
# hadoop.mapreduce.jobsummary.logger=INFO,JSA
#
hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
hadoop.mapreduce.jobsummary.log.maxfilesize=256MB
hadoop.mapreduce.jobsummary.log.maxbackupindex=20
log4j.appender.JSA=org.apache.log4j.RollingFileAppender
log4j.appender.JSA.File=${hadoop.log.dir}/${hadoop.mapreduce.jobsummary.log.file}
log4j.appender.JSA.MaxFileSize=${hadoop.mapreduce.jobsummary.log.maxfilesize}
log4j.appender.JSA.MaxBackupIndex=${hadoop.mapreduce.jobsummary.log.maxbackupindex}
log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.logger.org.apache.hadoop.mapred.JobInProgress$JobSummary=${hadoop.mapreduce.jobsummary.logger}
log4j.additivity.org.apache.hadoop.mapred.JobInProgress$JobSummary=false
#
# shuffle connection log from shuffleHandler
# Uncomment the following line to enable logging of shuffle connections
# log4j.logger.org.apache.hadoop.mapred.ShuffleHandler.audit=DEBUG
#
# Yarn ResourceManager Application Summary Log
#
# Set the ResourceManager summary log filename
yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
# Set the ResourceManager summary log level and appender
yarn.server.resourcemanager.appsummary.logger=${hadoop.root.logger}
#yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
# To enable AppSummaryLogging for the RM,
# set yarn.server.resourcemanager.appsummary.logger to
# <LEVEL>,RMSUMMARY in hadoop-env.sh
# Appender for ResourceManager Application Summary Log
# Requires the following properties to be set
# - hadoop.log.dir (Hadoop Log directory)
# - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
# - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
log4j.appender.RMSUMMARY.MaxFileSize=256MB
log4j.appender.RMSUMMARY.MaxBackupIndex=20
log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
# HS audit log configs
#mapreduce.hs.audit.logger=INFO,HSAUDIT
#log4j.logger.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=${mapreduce.hs.audit.logger}
#log4j.additivity.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=false
#log4j.appender.HSAUDIT=org.apache.log4j.DailyRollingFileAppender
#log4j.appender.HSAUDIT.File=${hadoop.log.dir}/hs-audit.log
#log4j.appender.HSAUDIT.layout=org.apache.log4j.PatternLayout
#log4j.appender.HSAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
#log4j.appender.HSAUDIT.DatePattern=.yyyy-MM-dd
# Http Server Request Logs
#log4j.logger.http.requests.namenode=INFO,namenoderequestlog
#log4j.appender.namenoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.namenoderequestlog.Filename=${hadoop.log.dir}/jetty-namenode-yyyy_mm_dd.log
#log4j.appender.namenoderequestlog.RetainDays=3
#log4j.logger.http.requests.datanode=INFO,datanoderequestlog
#log4j.appender.datanoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.datanoderequestlog.Filename=${hadoop.log.dir}/jetty-datanode-yyyy_mm_dd.log
#log4j.appender.datanoderequestlog.RetainDays=3
#log4j.logger.http.requests.resourcemanager=INFO,resourcemanagerrequestlog
#log4j.appender.resourcemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.resourcemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-resourcemanager-yyyy_mm_dd.log
#log4j.appender.resourcemanagerrequestlog.RetainDays=3
#log4j.logger.http.requests.jobhistory=INFO,jobhistoryrequestlog
#log4j.appender.jobhistoryrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.jobhistoryrequestlog.Filename=${hadoop.log.dir}/jetty-jobhistory-yyyy_mm_dd.log
#log4j.appender.jobhistoryrequestlog.RetainDays=3
#log4j.logger.http.requests.nodemanager=INFO,nodemanagerrequestlog
#log4j.appender.nodemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
#log4j.appender.nodemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-nodemanager-yyyy_mm_dd.log
#log4j.appender.nodemanagerrequestlog.RetainDays=3
# WebHdfs request log on datanodes
# Specify -Ddatanode.webhdfs.logger=INFO,HTTPDRFA on datanode startup to
# direct the log to a separate file.
#datanode.webhdfs.logger=INFO,console
#log4j.logger.datanode.webhdfs=${datanode.webhdfs.logger}
#log4j.appender.HTTPDRFA=org.apache.log4j.DailyRollingFileAppender
#log4j.appender.HTTPDRFA.File=${hadoop.log.dir}/hadoop-datanode-webhdfs.log
#log4j.appender.HTTPDRFA.layout=org.apache.log4j.PatternLayout
#log4j.appender.HTTPDRFA.layout.ConversionPattern=%d{ISO8601} %m%n
#log4j.appender.HTTPDRFA.DatePattern=.yyyy-MM-dd
# Appender for viewing information for errors and warnings
yarn.ewma.cleanupInterval=300
yarn.ewma.messageAgeLimitSeconds=86400
yarn.ewma.maxUniqueMessages=250
log4j.appender.EWMA=org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender
log4j.appender.EWMA.cleanupInterval=${yarn.ewma.cleanupInterval}
log4j.appender.EWMA.messageAgeLimitSeconds=${yarn.ewma.messageAgeLimitSeconds}
log4j.appender.EWMA.maxUniqueMessages=${yarn.ewma.maxUniqueMessages}
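Because the root logger comes from the `hadoop.root.logger` system property defined at the top of this file, you can raise verbosity for a single command without editing anything. A sketch using the environment override honored by the stock `hadoop` launcher:

```
# one-off DEBUG logging to the console for a single HDFS command
HADOOP_ROOT_LOGGER="DEBUG,console" hadoop fs -ls /
```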

+ 20 - 0  file/assignment3/hadoop/mapred-env.cmd
@@ -0,0 +1,20 @@
@echo off
@rem Licensed to the Apache Software Foundation (ASF) under one or more
@rem contributor license agreements. See the NOTICE file distributed with
@rem this work for additional information regarding copyright ownership.
@rem The ASF licenses this file to You under the Apache License, Version 2.0
@rem (the "License"); you may not use this file except in compliance with
@rem the License. You may obtain a copy of the License at
@rem
@rem http://www.apache.org/licenses/LICENSE-2.0
@rem
@rem Unless required by applicable law or agreed to in writing, software
@rem distributed under the License is distributed on an "AS IS" BASIS,
@rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@rem See the License for the specific language governing permissions and
@rem limitations under the License.
set HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
set HADOOP_MAPRED_ROOT_LOGGER=%HADOOP_LOGLEVEL%,RFA

+ 14 - 0  file/assignment3/hadoop/mapred-env.sh
@@ -0,0 +1,14 @@
export JAVA_HOME=/usr/java/latest
export HADOOP_MAPRED_LOG_DIR=/var/log/hadoop-mapreduce
export HADOOP_MAPRED_PID_DIR=/var/run/hadoop-mapreduce
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
#export HADOOP_JOB_HISTORYSERVER_OPTS=
#export HADOOP_MAPRED_LOG_DIR="" # Where log files are stored. $HADOOP_MAPRED_HOME/logs by default.
#export HADOOP_JHS_LOGGER=INFO,RFA # Hadoop JobSummary logger.
#export HADOOP_MAPRED_PID_DIR= # The pid files are stored. /tmp by default.
#export HADOOP_MAPRED_IDENT_STRING= #A string representing this instance of hadoop. $USER by default
#export HADOOP_MAPRED_NICENESS= #The scheduling priority for daemons. Defaults to 0.
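Note that `HADOOP_JOB_HISTORYSERVER_HEAPSIZE` is exported unconditionally above, so a heap change means editing this file and bouncing the daemon rather than overriding it from the shell. A sketch, assuming the standard Hadoop 2.x `sbin` layout:

```
# apply a heap change made in mapred-env.sh
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
```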

+ 92 - 0  file/assignment3/hadoop/mapred-queues.xml.template
@@ -0,0 +1,92 @@
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- This is the template for queue configuration. The format supports nesting of
queues within queues - a feature called hierarchical queues. All queues are
defined within the 'queues' tag which is the top level element for this
XML document. The queue acls configured here for different queues are
checked for authorization only if the configuration property
mapreduce.cluster.acls.enabled is set to true. -->
<queues>
<!-- Configuration for a queue is specified by defining a 'queue' element. -->
<queue>
<!-- Name of a queue. Queue name cannot contain a ':' -->
<name>default</name>
<!-- properties for a queue, typically used by schedulers,
can be defined here -->
<properties>
</properties>
<!-- State of the queue. If running, the queue will accept new jobs.
If stopped, the queue will not accept new jobs. -->
<state>running</state>
<!-- Specifies the ACLs to check for submitting jobs to this queue.
If set to '*', it allows all users to submit jobs to the queue.
If set to ' '(i.e. space), no user will be allowed to do this
operation. The default value for any queue acl is ' '.
For specifying a list of users and groups the format to use is
user1,user2 group1,group2
It is only used if authorization is enabled in Map/Reduce by setting
the configuration property mapreduce.cluster.acls.enabled to true.
Irrespective of this ACL configuration, the user who started the
cluster and cluster administrators configured via
mapreduce.cluster.administrators can do this operation. -->
<acl-submit-job> </acl-submit-job>
<!-- Specifies the ACLs to check for viewing and modifying jobs in this
queue. Modifications include killing jobs, tasks of jobs or changing
priorities.
If set to '*', it allows all users to view, modify jobs of the queue.
If set to ' '(i.e. space), no user will be allowed to do this
operation.
For specifying a list of users and groups the format to use is
user1,user2 group1,group2
It is only used if authorization is enabled in Map/Reduce by setting
the configuration property mapreduce.cluster.acls.enabled to true.
Irrespective of this ACL configuration, the user who started the
cluster and cluster administrators configured via
mapreduce.cluster.administrators can do the above operations on all
the jobs in all the queues. The job owner can do all the above
operations on his/her job irrespective of this ACL configuration. -->
<acl-administer-jobs> </acl-administer-jobs>
</queue>
<!-- Here is a sample of a hierarchical queue configuration
where q2 is a child of q1. In this example, q2 is a leaf level
queue as it has no queues configured within it. Currently, ACLs
and state are only supported for the leaf level queues.
Note also the usage of properties for the queue q2.
<queue>
<name>q1</name>
<queue>
<name>q2</name>
<properties>
<property key="capacity" value="20"/>
<property key="user-limit" value="30"/>
</properties>
</queue>
</queue>
-->
</queues>
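The ACLs in this template are only enforced when `mapreduce.cluster.acls.enabled` is true, and the queue itself is picked at submission time. A hedged example of targeting a named queue (the example jar and the `/input`, `/output` paths are placeholders):

```
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  wordcount -Dmapreduce.job.queuename=default /input /output
```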

+ 51 - 0  file/assignment3/hadoop/mapred-site.xml
@@ -0,0 +1,51 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>uhadoop-ia1nlbku-master2:10020</value>
</property>
<property>
<name>mapreduce.job.reduce.slowstart.completedmaps</name>
<value>0.95</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1843M</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx1843M</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>50</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>uhadoop-ia1nlbku-master2:19888</value>
</property>
</configuration>
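Note the ratio kept above: each 2048 MB container runs with `-Xmx1843M`, roughly 90% of the container, leaving headroom for non-heap JVM memory so YARN does not kill the task. Per-job overrides should preserve that ratio; a sketch (jar and paths are placeholders):

```
# give map tasks a 4 GB container with a ~90% heap, for this job only
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  wordcount \
  -Dmapreduce.map.memory.mb=4096 \
  -Dmapreduce.map.java.opts=-Xmx3686M \
  /input /output
```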

+ 21 - 0  file/assignment3/hadoop/mapred-site.xml.template
@@ -0,0 +1,21 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
</configuration>

+ 1 - 0  file/assignment3/hadoop/slaves
@@ -0,0 +1 @@
localhost

+ 80 - 0  file/assignment3/hadoop/ssl-client.xml.example
@@ -0,0 +1,80 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<property>
<name>ssl.client.truststore.location</name>
<value></value>
<description>Truststore to be used by clients like distcp. Must be
specified.
</description>
</property>
<property>
<name>ssl.client.truststore.password</name>
<value></value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.truststore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
<property>
<name>ssl.client.truststore.reload.interval</name>
<value>10000</value>
<description>Truststore reload check interval, in milliseconds.
Default value is 10000 (10 seconds).
</description>
</property>
<property>
<name>ssl.client.keystore.location</name>
<value></value>
<description>Keystore to be used by clients like distcp. Must be
specified.
</description>
</property>
<property>
<name>ssl.client.keystore.password</name>
<value></value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.keystore.keypassword</name>
<value></value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.keystore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
</configuration>
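The truststore referenced above has to be created out of band; its password then goes into `ssl.client.truststore.password`. A minimal `keytool` sketch (the alias, certificate file, and paths are assumptions):

```
# import the cluster CA certificate into a JKS truststore for clients
keytool -importcert -noprompt -alias hadoop-ca -file ca.crt \
  -keystore /etc/hadoop/conf/truststore.jks -storepass changeit
```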

+ 88 - 0  file/assignment3/hadoop/ssl-server.xml.example
@@ -0,0 +1,88 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<property>
<name>ssl.server.truststore.location</name>
<value></value>
<description>Truststore to be used by NN and DN. Must be specified.
</description>
</property>
<property>
<name>ssl.server.truststore.password</name>
<value></value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.server.truststore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
<property>
<name>ssl.server.truststore.reload.interval</name>
<value>10000</value>
<description>Truststore reload check interval, in milliseconds.
Default value is 10000 (10 seconds).
</description>
</property>
<property>
<name>ssl.server.keystore.location</name>
<value></value>
<description>Keystore to be used by NN and DN. Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.password</name>
<value></value>
<description>Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.keypassword</name>
<value></value>
<description>Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
<property>
<name>ssl.server.exclude.cipher.list</name>
<value>TLS_ECDHE_RSA_WITH_RC4_128_SHA,SSL_DHE_RSA_EXPORT_WITH_DES40_CBC_SHA,
SSL_RSA_WITH_DES_CBC_SHA,SSL_DHE_RSA_WITH_DES_CBC_SHA,
SSL_RSA_EXPORT_WITH_RC4_40_MD5,SSL_RSA_EXPORT_WITH_DES40_CBC_SHA,
SSL_RSA_WITH_RC4_128_MD5</value>
<description>Optional. The weak security cipher suites that you want excluded
from SSL communication.</description>
</property>
</configuration>
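Likewise, the server keystore must exist before the HTTPS endpoints can start. A sketch assuming one self-signed key per host (the CN, validity, and paths are placeholders; production clusters would use CA-signed certificates):

```
# create a host keystore for the NN/DN HTTPS endpoints
keytool -genkeypair -alias "$(hostname -f)" -keyalg RSA -keysize 2048 \
  -dname "CN=$(hostname -f)" -validity 365 \
  -keystore /etc/hadoop/conf/keystore.jks -storepass changeit -keypass changeit
```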

+ 60 - 0  file/assignment3/hadoop/yarn-env.cmd
@@ -0,0 +1,60 @@
@echo off
@rem Licensed to the Apache Software Foundation (ASF) under one or more
@rem contributor license agreements. See the NOTICE file distributed with
@rem this work for additional information regarding copyright ownership.
@rem The ASF licenses this file to You under the Apache License, Version 2.0
@rem (the "License"); you may not use this file except in compliance with
@rem the License. You may obtain a copy of the License at
@rem
@rem http://www.apache.org/licenses/LICENSE-2.0
@rem
@rem Unless required by applicable law or agreed to in writing, software
@rem distributed under the License is distributed on an "AS IS" BASIS,
@rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@rem See the License for the specific language governing permissions and
@rem limitations under the License.
@rem User for YARN daemons
if not defined HADOOP_YARN_USER (
set HADOOP_YARN_USER=%yarn%
)
if not defined YARN_CONF_DIR (
set YARN_CONF_DIR=%HADOOP_YARN_HOME%\conf
)
if defined YARN_HEAPSIZE (
@rem echo run with Java heapsize %YARN_HEAPSIZE%
set JAVA_HEAP_MAX=-Xmx%YARN_HEAPSIZE%m
)
if not defined YARN_LOG_DIR (
set YARN_LOG_DIR=%HADOOP_YARN_HOME%\logs
)
if not defined YARN_LOGFILE (
set YARN_LOGFILE=yarn.log
)
@rem default policy file for service-level authorization
if not defined YARN_POLICYFILE (
set YARN_POLICYFILE=hadoop-policy.xml
)
if not defined YARN_ROOT_LOGGER (
set YARN_ROOT_LOGGER=%HADOOP_LOGLEVEL%,console
)
set YARN_OPTS=%YARN_OPTS% -Dhadoop.log.dir=%YARN_LOG_DIR%
set YARN_OPTS=%YARN_OPTS% -Dyarn.log.dir=%YARN_LOG_DIR%
set YARN_OPTS=%YARN_OPTS% -Dhadoop.log.file=%YARN_LOGFILE%
set YARN_OPTS=%YARN_OPTS% -Dyarn.log.file=%YARN_LOGFILE%
set YARN_OPTS=%YARN_OPTS% -Dyarn.home.dir=%HADOOP_YARN_HOME%
set YARN_OPTS=%YARN_OPTS% -Dyarn.id.str=%YARN_IDENT_STRING%
set YARN_OPTS=%YARN_OPTS% -Dhadoop.home.dir=%HADOOP_YARN_HOME%
set YARN_OPTS=%YARN_OPTS% -Dhadoop.root.logger=%YARN_ROOT_LOGGER%
set YARN_OPTS=%YARN_OPTS% -Dyarn.root.logger=%YARN_ROOT_LOGGER%
if defined JAVA_LIBRARY_PATH (
set YARN_OPTS=%YARN_OPTS% -Djava.library.path=%JAVA_LIBRARY_PATH%
)
set YARN_OPTS=%YARN_OPTS% -Dyarn.policy.file=%YARN_POLICYFILE%

+ 127 - 0  file/assignment3/hadoop/yarn-env.sh
@@ -0,0 +1,127 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
export HADOOP_YARN_HOME=/home/hadoop
export YARN_LOG_DIR=/var/log/hadoop-yarn
export YARN_PID_DIR=/var/run/hadoop-yarn
export HADOOP_LIBEXEC_DIR=/home/hadoop/libexec
export JAVA_HOME=/usr/java/latest
# User for YARN daemons
export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}
# resolve links - $0 may be a softlink
export YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_YARN_HOME/conf}"
# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
if [ "$JAVA_HOME" != "" ]; then
#echo "run java in $JAVA_HOME"
JAVA_HOME=$JAVA_HOME
fi
if [ "$JAVA_HOME" = "" ]; then
echo "Error: JAVA_HOME is not set."
exit 1
fi
JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m
# For setting YARN specific HEAP sizes please use this
# Parameter and set appropriately
# YARN_HEAPSIZE=1000
# check envvars which might override default args
if [ "$YARN_HEAPSIZE" != "" ]; then
JAVA_HEAP_MAX="-Xmx""$YARN_HEAPSIZE""m"
fi
# Resource Manager specific parameters
# Specify the max Heapsize for the ResourceManager using a numerical value
# in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set
# the value to 1000.
# This value will be overridden by an Xmx setting specified in either YARN_OPTS
# and/or YARN_RESOURCEMANAGER_OPTS.
# If not specified, the default value will be picked from either YARN_HEAPMAX
# or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two.
export YARN_RESOURCEMANAGER_HEAPSIZE=1024
# Specify the max Heapsize for the timeline server using a numerical value
# in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set
# the value to 1000.
# This value will be overridden by an Xmx setting specified in either YARN_OPTS
# and/or YARN_TIMELINESERVER_OPTS.
# If not specified, the default value will be picked from either YARN_HEAPMAX
# or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two.
export YARN_TIMELINESERVER_HEAPSIZE=1000
# Specify the JVM options to be used when starting the ResourceManager.
# These options will be appended to the options specified as YARN_OPTS
# and therefore may override any similar flags set in YARN_OPTS
#export YARN_RESOURCEMANAGER_OPTS=
# Node Manager specific parameters
# Specify the max Heapsize for the NodeManager using a numerical value
# in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set
# the value to 1000.
# This value will be overridden by an Xmx setting specified in either YARN_OPTS
# and/or YARN_NODEMANAGER_OPTS.
# If not specified, the default value will be picked from either YARN_HEAPMAX
# or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two.
export YARN_NODEMANAGER_HEAPSIZE=1024
# Specify the JVM options to be used when starting the NodeManager.
# These options will be appended to the options specified as YARN_OPTS
# and therefore may override any similar flags set in YARN_OPTS
#export YARN_NODEMANAGER_OPTS=
# so that filenames w/ spaces are handled correctly in loops below
IFS=
# default log directory & file
if [ "$YARN_LOG_DIR" = "" ]; then
YARN_LOG_DIR="$HADOOP_YARN_HOME/logs"
fi
if [ "$YARN_LOGFILE" = "" ]; then
YARN_LOGFILE='yarn.log'
fi
# default policy file for service-level authorization
if [ "$YARN_POLICYFILE" = "" ]; then
YARN_POLICYFILE="hadoop-policy.xml"
fi
# restore ordinary behaviour
unset IFS
YARN_OPTS="$YARN_OPTS -Dhadoop.log.dir=$YARN_LOG_DIR"
YARN_OPTS="$YARN_OPTS -Dyarn.log.dir=$YARN_LOG_DIR"
YARN_OPTS="$YARN_OPTS -Dhadoop.log.file=$YARN_LOGFILE"
YARN_OPTS="$YARN_OPTS -Dyarn.log.file=$YARN_LOGFILE"
YARN_OPTS="$YARN_OPTS -Dyarn.home.dir=$YARN_COMMON_HOME"
YARN_OPTS="$YARN_OPTS -Dyarn.id.str=$YARN_IDENT_STRING"
YARN_OPTS="$YARN_OPTS -Dhadoop.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
YARN_OPTS="$YARN_OPTS -Dyarn.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
YARN_OPTS="$YARN_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
fi
YARN_OPTS="$YARN_OPTS -Dyarn.policy.file=$YARN_POLICYFILE"
YARN_OPTS="$YARN_OPTS -Dfile.encoding=UTF-8"

+ 0 - 0  file/assignment3/hadoop/yarn-excludes


+ 276 - 0  file/assignment3/hadoop/yarn-site.xml
@@ -0,0 +1,276 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<configuration>
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>ucloud-yarn-rm-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>uhadoop-ia1nlbku-master1:2181,uhadoop-ia1nlbku-master2:2181,uhadoop-ia1nlbku-core1:2181</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.user-as-default-queue</name>
<value>true</value>
</property>
<property>
<name>yarn.scheduler.fair.allow-undeclared-pools</name>
<value>true</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>true</value>
</property>
<property>
<name>yarn.admin.acl</name>
<value>yarn,mapred,hdfs,hadoop</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>uhadoop-ia1nlbku-master1:23140</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>uhadoop-ia1nlbku-master1:23130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm1</name>
<value>uhadoop-ia1nlbku-master1:23189</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>uhadoop-ia1nlbku-master1:23188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>uhadoop-ia1nlbku-master1:23125</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>uhadoop-ia1nlbku-master1:23141</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>uhadoop-ia1nlbku-master2:23140</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>uhadoop-ia1nlbku-master2:23130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm2</name>
<value>uhadoop-ia1nlbku-master2:23189</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>uhadoop-ia1nlbku-master2:23188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>uhadoop-ia1nlbku-master2:23125</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>uhadoop-ia1nlbku-master2:23141</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>5460</value>
<description>Physical memory, in MB, to be made available to running containers.</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>65535</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/data/yarn/logs</value>
</property>
<property>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:23344</value>
</property>
<property>
<name>yarn.nodemanager.webapp.address</name>
<value>0.0.0.0:23999</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>5184000</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>86400</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://Ucluster/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
</value>
</property>
<property>
<name>yarn.nodemanager.container-executor.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value>
</property>
<property>
<name>yarn.nodemanager.linux-container-executor.group</name>
<value>hadoop</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>0.0.0.0:23333</value>
</property>
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/home/hadoop/conf/yarn-excludes</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>32</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://uhadoop-ia1nlbku-master2:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.timeline-service.hostname</name>
<value>uhadoop-ia1nlbku-master2</value>
</property>
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.generic-application-history.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.leveldb-timeline-store.path</name>
<value>/data/yarn/timeline</value>
</property>
<property>
<name>yarn.timeline-service.leveldb-state-store.path</name>
<value>/data/yarn/timeline</value>
</property>
<property>
<name>yarn.timeline-service.address</name>
<value>${yarn.timeline-service.hostname}:10200</value>
</property>
<property>
<name>yarn.timeline-service.webapp.address</name>
<value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
<name>yarn.timeline-service.webapp.https.address</name>
<value>${yarn.timeline-service.hostname}:8190</value>
</property>
<property>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.handler-thread-count</name>
<value>10</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
</configuration>
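With ResourceManager HA enabled and the `rm1`/`rm2` ids above, you can confirm which instance is active and that the NodeManagers registered. A sketch using the stock `yarn` CLI:

```
# one RM should report "active", the other "standby"
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

# registered NodeManagers and their states
yarn node -list -all
```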
