Apache Hive on YARN


YARN is a software rewrite that decouples MapReduce's resource management and scheduling capabilities from the data processing component, enabling Hadoop to support more varied processing approaches and a broader array of applications. For example, Hadoop clusters can now run interactive querying and streaming data applications simultaneously with MapReduce batch jobs. In this blog we will run Apache Hive on Apache YARN, so let's get started.

Add a file named yarn-site.xml to your /usr/local/hadoop/etc/hadoop folder with the following content:
<configuration>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx768m</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Execution framework.</description>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>1</value>
<description>The number of virtual cores required for each map task.</description>
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>1</value>
<description>The number of virtual cores required for each reduce task.</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
<description>Larger resource limit for maps.</description>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of maps.</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1024</value>
<description>Larger resource limit for reduces.</description>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of reduces.</description>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>jobtracker.alexjf.net:8021</value>
</property>
</configuration>
First, start HDFS.
Now start the YARN ResourceManager with the command yarn resourcemanager start
Then start the YARN NodeManager with the command yarn nodemanager start
Start the Hive CLI and fire an INSERT INTO query, since Hive runs it as a MapReduce job.
Now, why does this job fail? There are two ways to see the application logs. One is to run the command yarn logs -applicationId <applicationId>
The other is to open the Job Tracking URL shown in the YARN UI, for example: http://knoldus:8088/proxy/application_1494156353788_0001/
From the error message, you can see that the job is using more virtual memory than its current limit of 1.0 GB. This can be resolved in two ways: increase yarn.app.mapreduce.am.resource.mb to a higher value such as 4096, or specify the value when starting Hive. Now fire your query again.
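As a minimal sketch of the first option, the two ApplicationMaster entries in yarn-site.xml could be changed as shown here. The 4096 value is the one suggested above. The -Xmx3072m heap is an assumption: it keeps the same heap-to-container ratio as the original configuration (768 of 1024, about 75%) and should be adjusted to the memory actually available on your nodes.

```xml
<!-- Sketch only: replaces the two yarn.app.mapreduce.am.* entries in the configuration shown earlier -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <!-- Assumed heap size: about 75% of the container, mirroring 768m of 1024 in the original file -->
  <value>-Xmx3072m</value>
</property>
```

Keep the JVM heap below the container size so there is headroom for non-heap memory, and make sure the container size does not exceed the scheduler's maximum allocation (yarn.scheduler.maximum-allocation-mb), or the request will be rejected.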
Now, if you navigate to the YARN UI, you will see that the query was successful.