Integrating Presto With Carbondata

Presto is a well-known open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. It was developed by Facebook to analyse petabytes of data and was later open sourced. Presto does not provide any storage of its own but can be used with a variety of data sources such as Hive, Cassandra, relational databases, and even some proprietary data stores.

In this blog we are going to discuss how to use Presto to query data from another upcoming open source solution, CarbonData. CarbonData is a fully indexed, columnar, Hadoop-native data store for processing heavy analytical workloads and detailed queries on big data. It enables faster interactive queries by using advanced columnar storage, indexing, compression and encoding techniques to improve computing efficiency. Presto with CarbonData helps speed up queries by an order of magnitude over petabytes of data.

To install Presto, download the tarball for the latest version from the Presto website and untar it in the directory of your choice. The tarball contains a single top-level directory, in this case presto-server-0.187, which we will call the installation directory. All the configuration files for Presto live in the etc folder inside the installation directory. Configure the Presto server as described in the Presto deployment documentation, according to your server settings. After installing and configuring the Presto server, you can start it with the command below from the installation directory. This command runs Presto as a daemon.

bin/launcher start

Alternatively, if you want to run it in the foreground, use the command below. Personally I prefer this one, as I can see all the log messages and errors on the screen.

bin/launcher run
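
The configuration step mentioned above can be sketched roughly as follows for a single-node setup. This is only a minimal sketch: the property names follow the Presto deployment documentation, but every value here (heap size, memory limits, port, node id) is an illustrative assumption to adjust for your machine, and PRESTO_HOME is a hypothetical variable that defaults to a scratch directory so the script can be tried safely.

```shell
#!/bin/sh
# Hypothetical location; set PRESTO_HOME to your real installation directory.
# It defaults to a scratch directory so the sketch can be tried safely.
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)/presto-server-0.187}"
mkdir -p "$PRESTO_HOME/etc/catalog"

# Per-node settings: environment name, unique node id, data directory.
cat > "$PRESTO_HOME/etc/node.properties" <<EOF
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=$PRESTO_HOME/data
EOF

# JVM options, one per line; the heap size is an assumed example value.
cat > "$PRESTO_HOME/etc/jvm.config" <<'EOF'
-server
-Xmx4G
-XX:+UseG1GC
EOF

# Single-node setup: this process acts as both coordinator and worker.
cat > "$PRESTO_HOME/etc/config.properties" <<'EOF'
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=2GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
EOF

echo "Wrote Presto configuration under $PRESTO_HOME/etc"
```

The port 8080 chosen here is the one the Presto CLI connects to later in this post.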

The steps above get Presto running, but now we need to integrate Presto with CarbonData. To do that, first clone the CarbonData repository using the command below:

git clone https://github.com/apache/carbondata.git

Then do a complete build by running the command below inside the carbondata folder:

mvn -Pspark-2.1 -Phadoop-2.7.2 -DskipTests clean package

When the build is complete, you will see the following folder created inside the carbondata directory:

integration/presto/target/carbondata-presto-1.2.0-SNAPSHOT

Now we need to make changes on the Presto side so that the Presto engine can connect to CarbonData.

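Step 1: Create a carbondata.properties file inside the etc/catalog/ folder of the Presto installation directory. Presto takes the catalog name from the name of this file, so it is named to match the carbondata catalog passed to the Presto CLI later. The file has only two properties: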

connector.name=carbondata

carbondata-store=hdfs://localhost:54311/opt/example

connector.name specifies the connector that Presto loads for this catalog; the catalog name itself comes from the name of the properties file.

carbondata-store specifies the Carbondata store location.
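
Step 1 can be sketched as a small script. The two properties are the ones shown above; the file name carbondata.properties is an assumption chosen so that the resulting catalog name matches the --catalog carbondata option used with the Presto CLI, and PRESTO_HOME is a hypothetical variable that defaults to a scratch directory.

```shell
#!/bin/sh
# Hypothetical location; set PRESTO_HOME to your real installation directory.
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)/presto-server-0.187}"
mkdir -p "$PRESTO_HOME/etc/catalog"

# Presto names the catalog after this file (assumed here: "carbondata").
# The store location is the example value from this post; use your own.
cat > "$PRESTO_HOME/etc/catalog/carbondata.properties" <<'EOF'
connector.name=carbondata
carbondata-store=hdfs://localhost:54311/opt/example
EOF

cat "$PRESTO_HOME/etc/catalog/carbondata.properties"
```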

Step 2: Go to the plugin folder inside the Presto installation directory and create a folder named after the connector.name property. In this case it is carbondata, as shown in Step 1.

cd plugin

mkdir carbondata

Step 3: Copy all the JARs from integration/presto/target/carbondata-presto-1.2.0-SNAPSHOT to the carbondata folder created in Step 2.

cp <carbon-data-installation-directory>/integration/presto/target/carbondata-presto-1.2.0-SNAPSHOT/* <presto-installation-directory>/plugin/carbondata
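
Steps 2 and 3 can also be wrapped in a small helper that refuses to copy when the build output is missing. This is a sketch only: install_carbondata_plugin, CARBONDATA_HOME and PRESTO_HOME are hypothetical names introduced for illustration, standing in for the two placeholder directories in the command above.

```shell
#!/bin/sh
# install_carbondata_plugin SRC_DIR PRESTO_DIR
# Copies everything in SRC_DIR into PRESTO_DIR/plugin/carbondata.
install_carbondata_plugin() {
  src="$1"
  dest="$2/plugin/carbondata"
  if [ ! -d "$src" ]; then
    echo "build output not found: $src" >&2
    return 1
  fi
  # Create the plugin folder (Step 2), then copy the JARs (Step 3).
  mkdir -p "$dest" && cp "$src"/* "$dest"/
}

# Runs only when both (hypothetical) environment variables are set.
if [ -n "${CARBONDATA_HOME:-}" ] && [ -n "${PRESTO_HOME:-}" ]; then
  install_carbondata_plugin \
    "$CARBONDATA_HOME/integration/presto/target/carbondata-presto-1.2.0-SNAPSHOT" \
    "$PRESTO_HOME"
fi
```

For example, CARBONDATA_HOME=~/carbondata PRESTO_HOME=~/presto-server-0.187 sh install_plugin.sh would perform both steps, assuming the script is saved under that (hypothetical) file name.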

Restart the Presto server so that it loads the new plugin and catalog, and you are all set to execute queries on CarbonData using Presto. To execute the queries you can use the Presto CLI, which provides a terminal-based interactive shell for running queries. The CLI is a self-executing JAR file, which means it acts like a normal UNIX executable. You can download the Presto CLI from the Presto website; after downloading, rename the JAR to presto and make it executable with chmod +x.

The following command runs the Presto CLI:

./presto --server localhost:8080 --catalog carbondata --schema default

Once the Presto CLI has started, you can run any queries you want on CarbonData using Presto.
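
As a quick smoke test, the same connection options can be combined with the CLI's --execute option to run a statement non-interactively. The sketch below assumes the CLI has been saved as ./presto; PRESTO_CLI and run_query are hypothetical names introduced here, and the statements only list schemas and tables, since the tables in your store are not known in advance.

```shell
#!/bin/sh
# Hypothetical path to the downloaded CLI; override PRESTO_CLI if it differs.
PRESTO_CLI="${PRESTO_CLI:-./presto}"

# Run one SQL statement against the carbondata catalog and exit.
run_query() {
  "$PRESTO_CLI" --server localhost:8080 --catalog carbondata \
    --schema default --execute "$1"
}

if [ -x "$PRESTO_CLI" ]; then
  run_query "SHOW SCHEMAS"   # schemas visible through the catalog
  run_query "SHOW TABLES"    # tables in the default schema
else
  echo "Presto CLI not found at $PRESTO_CLI; skipping queries" >&2
fi
```

If both statements return without errors, Presto has loaded the connector and can reach the CarbonData store.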

Written by Bhavya Aggarwal
