Apache Hive on YARN


YARN is a rewrite of Hadoop's resource layer that decouples MapReduce's resource management and scheduling capabilities from the data processing component, enabling Hadoop to support more varied processing approaches and a broader array of applications. For example, Hadoop clusters can now run interactive querying and streaming data applications simultaneously with MapReduce batch jobs. In this blog we will run Apache Hive on top of Apache YARN, so let's get started.

  • Add a file yarn-site.xml inside your /usr/local/hadoop/etc/hadoop folder with the following content:
    <configuration>
      <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>1024</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx768m</value>
      </property>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Execution framework.</description>
      </property>
      <property>
        <name>mapreduce.map.cpu.vcores</name>
        <value>1</value>
        <description>The number of virtual cores required for each map task.</description>
      </property>
      <property>
        <name>mapreduce.reduce.cpu.vcores</name>
        <value>1</value>
        <description>The number of virtual cores required for each reduce task.</description>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1024</value>
        <description>Larger resource limit for maps.</description>
      </property>
      <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx768m</value>
        <description>Heap size for child JVMs of maps.</description>
      </property>
      <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>1024</value>
        <description>Larger resource limit for reduces.</description>
      </property>
      <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx768m</value>
        <description>Heap size for child JVMs of reduces.</description>
      </property>
      <property>
        <name>mapreduce.jobtracker.address</name>
        <value>jobtracker.alexjf.net:8021</value>
      </property>
    </configuration>
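The heap sizes in this config are not arbitrary: a common rule of thumb (an assumption here, not stated in the original post) is to set the JVM heap (-Xmx) to roughly 75% of the container size, leaving headroom for non-heap JVM overhead. A quick check for the 1024 MB containers used above:

```shell
# Heap sizing sketch: heap ~ 75% of the YARN container size (rule of
# thumb; tune to your workload). For the 1024 MB containers above:
container_mb=1024
heap_mb=$(( container_mb * 3 / 4 ))
echo "-Xmx${heap_mb}m"    # prints -Xmx768m, matching the config
```

The same ratio explains the pairing of yarn.app.mapreduce.am.resource.mb=1024 with -Xmx768m in yarn.app.mapreduce.am.command-opts.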

     

  • First, start the DFS (e.g. with the start-dfs.sh script).
  • Now start the YARN ResourceManager with the command yarn resourcemanager.
  • Now start the YARN NodeManager with the command yarn nodemanager.
  • Start your Hive CLI and fire an INSERT INTO query; since it is a MapReduce query, it is submitted to YARN as an application.
  • Now, why does this job fail? There are two ways to see the application logs. One is by running the command yarn logs -applicationId <applicationId>.
  • The other way is by navigating to the job tracking URL shown in the YARN UI, e.g. Job Tracking URL: http://knoldus:8088/proxy/application_1494156353788_0001/
  • From the error message, you can see that the job is using more virtual memory than its current limit of 1.0 GB. This can be resolved in two ways: increase yarn.app.mapreduce.am.resource.mb to a higher value such as 4096, or set the property for the current session when starting Hive. Then fire your query again.
  • Now, if you navigate to the YARN UI, you can see that the query is successful.
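The "beyond virtual memory limits" failure above can be sanity-checked with a small calculation. YARN kills a container when its virtual memory exceeds yarn.nodemanager.vmem-pmem-ratio (default 2.1) times the container's physical allocation; the sketch below assumes that default is unchanged:

```shell
# Virtual-memory cap for a 1024 MB container with YARN's default
# vmem-pmem ratio of 2.1 (assumption: defaults unchanged):
container_mb=1024
vmem_ratio=2.1
vmem_cap_mb=$(awk -v c="$container_mb" -v r="$vmem_ratio" 'BEGIN { printf "%.1f", c * r }')
echo "${vmem_cap_mb} MB"    # prints 2150.4 MB
```

Raising yarn.app.mapreduce.am.resource.mb to 4096, as suggested above, raises this cap proportionally (to about 8.4 GB with the same ratio), which is why the query then succeeds.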

I hope this blog will be helpful for Hive and YARN starters.
