Apache Hive on YARN


YARN is a software rewrite that decouples MapReduce's resource management and scheduling capabilities from the data processing component, enabling Hadoop to support more varied processing approaches and a broader array of applications. For example, Hadoop clusters can now run interactive queries and streaming data applications alongside MapReduce batch jobs. In this blog we will run Apache Hive on Apache YARN, so let's get started.

  • Add a file named yarn-site.xml to your /usr/local/hadoop/etc/hadoop folder with the following content:
    <configuration>
    <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1024</value>
    </property>
    <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx768m</value>
    </property>
    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework.</description>
    </property>
    <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>1</value>
    <description>The number of virtual cores required for each map task.</description>
    </property>
    <property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <value>1</value>
    <description>The number of virtual cores required for each reduce task.</description>
    </property>
    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>
    <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
    <description>Larger resource limit for maps.</description>
    </property>
    <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx768m</value>
    <description>Heap-size for child jvms of maps.</description>
    </property>
    <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
    <description>Larger resource limit for reduces.</description>
    </property>
    <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx768m</value>
    <description>Heap-size for child jvms of reduces.</description>
    </property>
    <property>
    <name>mapreduce.jobtracker.address</name>
    <value>jobtracker.alexjf.net:8021</value>
    </property>
    </configuration>

     

  • First, start HDFS (for example, with Hadoop's start-dfs.sh script).
  • Next, start the YARN ResourceManager with the command yarn resourcemanager start
  • Then start the YARN NodeManager with the command yarn nodemanager start
  • Start the Hive CLI and run an INSERT INTO query, since that kind of query is executed as a MapReduce job.
  • The job fails. To find out why, look at the application logs, which you can reach in two ways. One is to run the command yarn logs -applicationId <applicationId>
  • The other is to open the job tracking URL shown in the YARN UI, for example Job Tracking URL: http://knoldus:8088/proxy/application_1494156353788_0001/
  • From the error message, you can see that the job is using more virtual memory than its current limit of 1.0 GB. This can be resolved in two ways: increase yarn.app.mapreduce.am.resource.mb to a higher value such as 4096 in the configuration file, or pass the same property when starting Hive (for example, through the --hiveconf option). Then run your query again.
  • The query now succeeds, and the YARN UI also shows the application as successful.
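
The first fix above can be sketched as a change to the configuration file. This is a minimal sketch, not a tuned setting: 4096 is the example value from the step above, while the -Xmx3072m heap is an assumption that keeps roughly the same heap-to-container ratio as the original 768m/1024 pair. The container size also has to fit within the memory your NodeManager and scheduler are allowed to hand out.

```xml
<!-- Sketch only: replaces the two ApplicationMaster properties in the file above. -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <!-- Example value taken from the text; pick one that fits your node's memory. -->
  <value>4096</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <!-- Assumed heap size: about 75% of the container, mirroring 768m out of 1024. -->
  <value>-Xmx3072m</value>
</property>
```

After editing the file, restart the ResourceManager and NodeManager so that the new values are picked up, then rerun the query.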

I hope this blog is helpful for anyone getting started with Hive and YARN.
