Creating A Simple Hive Udf In Scala

Table of contents

Reading Time: 2 minutes

Sometimes the query you want to write can’t be expressed easily (or at all) using the built-in functions that Hive provides. By allowing you to write a user-defined function (UDF), Hive makes it easy to plug in your own processing code and invoke it from a Hive query,UDFs have to be written in Java, the language that Hive itself is written in. but in this blog we will write it in scala

A UDF must satisfy the following two properties:

• A UDF must be a subclass of org.apache.hadoop.hive.ql.exec.UDF.

• A UDF must implement at least one evaluate() method.

The evaluate() method is not defined by an interface, since it may take an arbitrary number of arguments, of arbitrary types, and it may return a value of arbitrary type.

Hive introspects the UDF to find the evaluate() method that matches the Hive function that was invoked.

lets get started scala version that i am using is scala 2.11,now add following properties in your build.sbt file

name := "hiveudf_example"

version := "1.0"

scalaVersion := "2.11.1"

unmanagedJars in Compile += file("/usr/lib/hive/lib/hive-exec-2.0.0.jar")

path in the file is the path of your hive home i am hardcording it u can give it yours,create your main file as follows

package com.knoldus.udf

import org.apache.hadoop.hive.ql.exec.UDF

class Scala_Hive_Udf extends UDF {

  def evaluate(str: String): String = {
    str.trim
  }

}

i am creating udf for trim method in hive,you can create any method you want,now next task is to create assembly for your project,add sbt assembly plugin in your plugins.sbt file

logLevel := Level.Warn


addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

next step is to create jar go to your sbt console and hit command

sbt assembly

you can find your jar inside the target folder,now submit this jar to hive as udf,first start hive using hive command and submit the jar using ADD JAR command followed by path of your jar

Logging initialized using configuration in jar:file:/home/knoldus/Documents/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> ADD JAR /home/knoldus/Desktop/opensource/hiveudf_example/target/scala-2.11/hiveudf_example-assembly-1.0.jar
> ;
Added [/home/knoldus/Desktop/opensource/hiveudf_example/target/scala-2.11/hiveudf_example-assembly-1.0.jar] to class path

create a function with this udf

hive> CREATE FUNCTION trim AS 'com.knoldus.udf.Scala_Hive_Udf';
OK
Time taken: 0.47 seconds

now we will call this function as below

hive> select trim(" hello ");
OK
hello
Time taken: 1.304 seconds, Fetched: 1 row(s)
hive>

this is the simplest way to create a udf in hive,i hope this blog helps happy coding

High performance systems

Data Engineering, Strategy and Analytics

Intelligence Driven Decisioning - AI/ML

Cloud Engineering

Architecture Strategy, Audit & Academy

Platforms

KDP

KDSP

Products

Premon

Studio9

Tech Hub

Akka

Scala

Rust

Spark

Functional Java

Kafka

Flink

ML/AI

DevOps

Data Warehouse

Travel

Retail

Finance

Healthcare

Media and Publishing

Consumer Internet

Hi-tech & IoT

Case Studies

Blogs

Books

Community

Resources

OS contributions

Webinars

Knolx

Check out our open positions

Services

Go to Overview

Accelerators

Go to Overview

Platforms

Products

TechHub

Industries

Go to Overview

Travel

Insights

Go to Overview

Creating A Simple Hive Udf In Scala

Share the Knol:

Related

COMPANY

Sign up to our newsletter

Certificates

Partners

© 2023 Knoldus, Inc. All Rights Reserved.

Part of NashTech

Privacy Policy | Sitemap

Discover more from Knoldus Blogs

Check out our
open positions