Creating a Simple Hive UDF in Scala


Sometimes the query you want to write can’t be expressed easily (or at all) using the built-in functions that Hive provides. By allowing you to write a user-defined function (UDF), Hive makes it easy to plug in your own processing code and invoke it from a Hive query. UDFs are normally written in Java, the language that Hive itself is written in, but any language that compiles to JVM bytecode will do, so in this post we will write one in Scala.

A UDF must satisfy the following two properties:

• A UDF must be a subclass of org.apache.hadoop.hive.ql.exec.UDF.

• A UDF must implement at least one evaluate() method.

The evaluate() method is not defined by an interface, since it may take an arbitrary number of arguments, of arbitrary types, and it may return a value of arbitrary type.

Hive introspects the UDF to find the evaluate() method that matches the Hive function that was invoked.
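
To make that lookup a little more concrete, here is a small sketch of reflection-based matching on evaluate() overloads. It is only an illustration under stated assumptions: TrimLike and EvaluateFinder are hypothetical names, the class deliberately does not extend UDF so that it runs without Hive on the classpath, and Hive's real resolver is more involved (as far as I know it also converts between Hive and Java types, which this sketch does not attempt).

```scala
// Illustration only: a plain class standing in for a UDF, with no Hive dependency.
// TrimLike and EvaluateFinder are hypothetical names, not part of Hive.
class TrimLike {
  // One-argument overload: strip whitespace from both ends.
  def evaluate(s: String): String =
    if (s == null) null else s.trim

  // Two-argument overload: strip any of the given characters from both ends.
  def evaluate(s: String, chars: String): String =
    if (s == null || chars == null) null
    else {
      val strippedLeft = s.dropWhile(c => chars.indexOf(c) >= 0)
      strippedLeft.reverse.dropWhile(c => chars.indexOf(c) >= 0).reverse
    }
}

object EvaluateFinder {
  // Find a public method named "evaluate" whose parameter types match exactly.
  def find(cls: Class[_], argTypes: Class[_]*): Option[java.lang.reflect.Method] =
    cls.getMethods.find { m =>
      m.getName == "evaluate" && m.getParameterTypes.toSeq == argTypes.toSeq
    }

  // Pick the overload that matches the runtime types of the (non-null) arguments
  // and invoke it reflectively.
  def call(target: AnyRef, args: AnyRef*): AnyRef = {
    val method = find(target.getClass, args.map(_.getClass): _*)
      .getOrElse(throw new NoSuchMethodException("no matching evaluate()"))
    method.invoke(target, args: _*)
  }
}
```

With this in place, EvaluateFinder.call(new TrimLike, "  hello ") picks the one-argument overload, while passing two strings picks the two-argument one. That is the same idea as one Hive function name being backed by several evaluate() signatures.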

Let's get started. I am using Scala 2.11. Add the following settings to your build.sbt file:

name := "hiveudf_example"

version := "1.0"

scalaVersion := "2.11.1"

unmanagedJars in Compile += file("/usr/lib/hive/lib/hive-exec-2.0.0.jar")

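In the unmanagedJars line, the path passed to file(...) points at the hive-exec jar under the lib directory of my Hive installation. I am hardcoding it here; replace it with the path on your machine, ideally using the hive-exec jar of the Hive version you will run the UDF on. Then create your main file as follows: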

package com.knoldus.udf

import org.apache.hadoop.hive.ql.exec.UDF

class Scala_Hive_Udf extends UDF {

  def evaluate(str: String): String = {
    // Hive passes SQL NULL as null; return null instead of throwing an NPE
    if (str == null) null else str.trim
  }

}
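
Before packaging anything, it can help to check the logic on its own. The following is a minimal sketch that copies the body of evaluate into a plain object so that it runs without Hive on the classpath. TrimLogic is a hypothetical name used only for this illustration, and the null guard is my assumption about how you would want a SQL NULL handled.

```scala
// Stand-in for the body of Scala_Hive_Udf.evaluate, with no Hive dependency.
// TrimLogic is a hypothetical helper name used only for this illustration.
object TrimLogic {
  def evaluate(str: String): String =
    if (str == null) null   // assumption: a NULL input should give a NULL result
    else str.trim           // String.trim drops leading/trailing chars <= U+0020

  def main(args: Array[String]): Unit = {
    println("[" + evaluate("  hello ") + "]")   // prints [hello]
    println("[" + evaluate("\thello\n") + "]")  // prints [hello]
    println(evaluate(null))                      // prints null
  }
}
```

Note that String.trim removes tabs and newlines as well as spaces, so a UDF built on it strips more than just space characters.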

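Scala_Hive_Udf simply exposes trim to Hive; you can put whatever logic you need in evaluate(). The next task is to build an assembly (fat) jar for the project, so that the Scala library, which Hive does not ship, is packaged together with the UDF. Add the sbt-assembly plugin to your project/plugins.sbt file: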

logLevel := Level.Warn


addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
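
One optional refinement: because hive-exec is added through unmanagedJars, sbt-assembly will, as far as I am aware, bundle it into the fat jar by default, even though Hive already has those classes. A setting along the following lines in build.sbt can keep it out. Treat it as a sketch: it assumes the jar file is named hive-exec-2.0.0.jar, as in the build.sbt shown earlier, and uses the assemblyExcludedJars key from sbt-assembly 0.14.x.

```scala
// build.sbt (sketch): keep the Hive jar out of the assembly.
// Assumes the unmanaged jar is named hive-exec-2.0.0.jar; adjust to your file name.
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp.filter(_.data.getName == "hive-exec-2.0.0.jar")
}
```

The resulting jar is then much smaller and contains only your classes and the Scala library.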

The next step is to build the jar. From the project root, run the following in a terminal (if you are already inside the sbt shell, type just assembly):

sbt assembly

You can find the jar inside the target folder (here, under target/scala-2.11/). Now register it with Hive: start the Hive CLI with the hive command, then add the jar with ADD JAR followed by the path to your jar:

Logging initialized using configuration in jar:file:/home/knoldus/Documents/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> ADD JAR /home/knoldus/Desktop/opensource/hiveudf_example/target/scala-2.11/hiveudf_example-assembly-1.0.jar
> ;
Added [/home/knoldus/Desktop/opensource/hiveudf_example/target/scala-2.11/hiveudf_example-assembly-1.0.jar] to class path

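Next, create a function backed by this UDF. I am calling it trim here, but since Hive already has a built-in trim, a distinct name is a safer choice for your own functions: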

hive> CREATE FUNCTION trim AS 'com.knoldus.udf.Scala_Hive_Udf';
OK
Time taken: 0.47 seconds

Now call the function as shown below:

hive> select trim(" hello ");
OK
hello
Time taken: 1.304 seconds, Fetched: 1 row(s)
hive>

This is the simplest way to create a UDF in Hive. I hope this post helps. Happy coding!
