Elasticsearch

Spark Structured Streaming with Elasticsearch

Reading Time: 3 minutes We have been working with streaming data for a long time, and Apache Spark makes that work much more convenient. Spark provides two APIs for streaming data: Spark Streaming, which is a separate library provided by Spark, and Structured Streaming, which is built on the Spark SQL library. We will discuss the trade-offs and differences between these two libraries in Continue Reading

Amazon ES – Secure your cluster from anonymous users! #2

Reading Time: 5 minutes In the previous blog, we learned how to create a domain on Amazon ES and how to create an index on the cluster using curl. Now, let’s look at how we can control access to an Amazon ES domain. One of the key benefits of using Amazon ES is that you can utilize AWS Identity and Access Management (IAM) to control access to your Continue Reading

Amazon ES – setting up the cluster! #1

Reading Time: 4 minutes Amazon Web Services (AWS) is a cloud services platform, providing compute power, database storage, content delivery, security options, and other functionality to allow businesses to build sophisticated applications with increased flexibility, scalability, and reliability. Amazon Elasticsearch is one of the services provided by AWS. Amazon ES: Amazon Elasticsearch Service, also called Amazon ES, is a managed service that makes it easy to create a domain, Continue Reading

Exploring JEST: Java HTTP REST Client

Reading Time: 2 minutes Elasticsearch is a real-time, distributed, open-source full-text search and analytics engine. To integrate Elasticsearch into our application, we need to use an API. Elasticsearch gives us two options: REST APIs and native clients. It’s easy to get confused about all the different ways to connect to Elasticsearch and why one of them should be preferred over the other. Available Elasticsearch clients are: Node Continue Reading

Java High-Level REST Client – Elasticsearch

Reading Time: 3 minutes Elasticsearch is an open-source, highly scalable full-text search and analytics engine. Using it, you can easily store, search, and analyze large amounts of data in real time. The Java REST client is the official client for Elasticsearch and comes in two flavors: Java Low-Level REST client – it allows communicating with an Elasticsearch cluster through HTTP and leaves request marshalling and response un-marshalling to users. Continue Reading

Deploying a 2 node Elasticsearch cluster on ec2 instance.

Reading Time: 4 minutes In this blog, we will focus on two major things: 1) the steps required to create a two-node Elasticsearch (v5.2, released on 31 Jan 2017) cluster on Linux instances (with CentOS as the default OS); 2) attaching an additional volume to the instances and changing the Elasticsearch configuration so that all Elasticsearch-related data is stored on the mounted volumes, since the default storage Continue Reading

Neo4j vs ElasticSearch & Full Text Search In Neo4j

Reading Time: 3 minutes Hello Graphistas, have you been missing this series 🙂? Welcome back to the Neo4j with Scala series 😉. Let’s start our journey again. So far, we have talked and learnt about using Neo4j with Scala and how easily we can integrate these two amazing technologies. Before starting the blog, here is a recap: Getting Started Neo4j with Scala : An Continue Reading

Autocomplete using Elasticsearch

Reading Time: 2 minutes You may have seen this in a movie database like IMDb: whenever a user enters ‘g’, the search bar suggests that they might be looking for Gone Girl or other movies that have ‘g’ in them. This is what autocomplete, or word completion, is, and it has become an essential part of many applications. Autocomplete speeds up human-computer interaction by predicting the Continue Reading

Meetup: Stream Processing Using Spark & Kafka

Reading Time: < 1 minute Knoldus organized a meetup on Friday, 9 September 2016. The topics covered in this meetup were: an overview of Spark Streaming; fault-tolerance semantics and performance tuning; and Spark Streaming integration with Kafka. The meetup code sample is available here. The real-time stream processing engine application code is available here.

Building Analytics Engine Using Akka, Kafka & ElasticSearch

Reading Time: 5 minutes In this blog, I will share my experience of building a scalable, distributed, and fault-tolerant analytics engine using Scala, Akka, Play, Kafka, and Elasticsearch. I would like to take you through the journey of building an analytics engine which was primarily used for text analysis. The inputs were structured, unstructured, and semi-structured data, and we were doing a lot of data crunching with it. The Analytics Continue Reading

Introduction to Elasticsearch in Scala

Reading Time: 2 minutes Elasticsearch is a real-time distributed search and analytics engine built on top of Apache Lucene. It is used for full-text search, structured search, and analytics. Lucene is just a library, and to leverage its power you need to use Java. Integrating Lucene directly with your application is a very complex task. Elasticsearch uses the indexing and searching capabilities of Lucene but hides the complexities Continue Reading

How to tokenize your search by N-Grams using Elastic Search in Scala?

Reading Time: 2 minutes N-grams can be used to search big data that contains compound words. The German language is famous for combining several small words into one massive compound word in order to capture precise or complex meanings. N-grams are the fragments into which a word is broken, and the more fragments that are relevant to the data, the more fragments will match. N-grams have their fragment length as Continue Reading
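To make the idea of breaking a word into fragments concrete, here is a minimal sketch in plain Python. It is not the code from the post and it does not call Elasticsearch. The function name `char_ngrams` is hypothetical, the parameter names `min_gram` and `max_gram` are borrowed from the settings of Elasticsearch's `ngram` tokenizer purely for familiarity, and lowercasing the input is an assumption of this sketch.

```python
def char_ngrams(word, min_gram=2, max_gram=3):
    """Return every character n-gram of `word` with length min_gram..max_gram.

    Hypothetical helper for illustration only; lowercasing is an assumption.
    """
    word = word.lower()
    grams = []
    # Shorter fragments first, then longer ones, each scanned left to right.
    for n in range(min_gram, max_gram + 1):
        for start in range(len(word) - n + 1):
            grams.append(word[start:start + n])
    return grams


def shared_fragments(query, indexed_word, min_gram=2, max_gram=3):
    """Return the set of n-grams that a query and an indexed word have in common."""
    return set(char_ngrams(query, min_gram, max_gram)) & set(
        char_ngrams(indexed_word, min_gram, max_gram)
    )
```

With these helpers, a short query such as "haus" shares fragments with a compound word such as "Krankenhaus", which is why fragment-based matching can find a small word inside a larger one.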

Implementing full text search with Couchbase and harnessing the power of Couchbase full text search (CBFT)

Reading Time: 5 minutes Hey folks! In this blog, we are going to introduce Couchbase full-text search. In my recent blog, we talked about how we can use Elasticsearch for full-text search and how we can connect it with Couchbase so that our data gets copied in real time and we can search on it too. But what if we do not want Continue Reading