Empower Scala with Apache Solr


Solr is a ready-to-use enterprise search server. Here I am going to show you how to use Scala and Solr together to add search to your Scala application.

Solr achieves fast search responses because it searches an index rather than the text itself. This is like finding the pages in a book related to a keyword by scanning the index at the back of the book, as opposed to reading every word on every page.

This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) to a keyword-centric data structure (word->pages).
Solr stores this index in a directory called index in the data directory.

A. Solr searches the indexed documents.

B. Each document has an ID and a list of terms.

C. For each term, Solr keeps a list of all the documents that contain that term.

D. Each document is identified by its ID.
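To make the word->documents idea concrete, here is a minimal sketch of an inverted index in plain Scala. It is only an illustration, not how Solr or Lucene implement it internally; the object name, the sample documents and the `search` helper are all made up for this example.

```scala
object InvertedIndexSketch {
  // Hypothetical sample data: document ID -> text (the document-centric view).
  val documents: Map[Int, String] = Map(
    1 -> "wings of fire",
    2 -> "harry and the wings"
  )

  // Invert it: term -> sorted list of the IDs of documents containing that term.
  val index: Map[String, List[Int]] =
    documents.toList
      .flatMap { case (id, text) =>
        text.split("\\s+").toList.distinct.map(term => (term, id))
      }
      .groupBy(_._1)
      .map { case (term, pairs) => term -> pairs.map(_._2).sorted }

  // A lookup is a single map access instead of a scan over every document.
  def search(term: String): List[Int] = index.getOrElse(term.toLowerCase, Nil)

  def main(args: Array[String]): Unit = {
    println(search("wings")) // List(1, 2)
    println(search("harry")) // List(2)
  }
}
```

A real index also stores term positions and frequencies and analyzes the text (lowercasing, stemming and so on) before indexing, which this sketch leaves out.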

Install Solr on Local Machine

1. Download the latest version of Solr
http://www.apache.org/dyn/closer.cgi/lucene/solr/3.6.1

2. Extract apache-solr-3.6.1.tgz or apache-solr-3.6.1.zip to any directory.
I have extracted the archive into the /mayank/solr directory.

3. Go to SOLR_HOME/example

4. Run the server with: java -jar start.jar

5. Your Solr server is now up and running.

[source language="scala"]
cd example
java -jar start.jar
[/source]

You should see something like this in the terminal.

[source language="scala"]
2011-10-02 05:20:27.120:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2011-10-02 05:20:27.212:INFO::jetty-6.1-SNAPSHOT
….
2011-10-02 05:18:27.645:INFO::Started SocketConnector@0.0.0.0:8983
[/source]

Solr is now running! You can now access the Solr Admin webapp by loading http://localhost:8983/solr/admin/ in your web browser.

So far so good. Let's discuss a use case where you can apply this knowledge in a Scala application.

Create a Scala application for a library.

– Create the Book

– Feed the Book to Solr (known as indexing)

– Search for the Book in Solr

Suppose the Book object has TITLE, ISBN, AUTHOR and DESCRIPTION fields.

A. Configure Solr

1. To enable Solr to store the above fields, we need to modify the configuration file schema.xml. This file is located in the SOLR_HOME/example/solr/conf directory.

Configure the schema.xml file.

[source language="xml"]
…..
<fields>
  <field name="isbn_s" type="string" indexed="false" stored="true" required="true" />
  <field name="name_s" type="text" indexed="true" stored="true" />
  <field name="author_s" type="string" indexed="false" stored="true"/>
  <field name="description_s" type="string" indexed="true" stored="true"/>
</fields>

<!-- Field to use to determine and enforce document uniqueness.
     Unless this field is marked with required="false", it will be a required field
-->
<uniqueKey>isbn_s</uniqueKey>

<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>name_s</defaultSearchField>
…..
[/source]

Note: It is not necessary to index every field. Here all the fields are stored, but only two of them, NAME and DESCRIPTION, are indexed, because searches run against these fields. Indexing more fields makes the index larger, which can slow down searching.
isbn_s is declared as the uniqueKey, which ensures that each ISBN is unique. If you add a different document with the same ISBN, the later one replaces the earlier one. Solr generally needs the uniqueKey field to be indexed in order to find the document it should replace, so if duplicates are not being replaced, try setting indexed="true" on isbn_s.
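The replace-on-duplicate behaviour of a uniqueKey can be pictured as a map keyed by ISBN. The sketch below only simulates that behaviour in memory with plain Scala and does not talk to Solr; `UniqueKeySketch`, `addAll` and the sample descriptions are hypothetical names and values chosen for illustration.

```scala
object UniqueKeySketch {
  case class Book(isbn: String, name: String, author: String, description: String)

  // Hypothetical in-memory stand-in for the index, keyed by ISBN.
  // Adding a book whose ISBN is already present overwrites the earlier entry.
  def addAll(books: List[Book]): Map[String, Book] =
    books.foldLeft(Map.empty[String, Book]) { (index, book) =>
      index + (book.isbn -> book)
    }

  def main(args: Array[String]): Unit = {
    val first  = Book("99921-58-10-7", "Wings of Fire", "Abdul Kalam", "First version")
    val second = first.copy(description = "Second version")

    val index = addAll(List(first, second))
    println(index.size)                         // 1
    println(index("99921-58-10-7").description) // Second version
  }
}
```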

B. Create the Scala Project

build.sbt

[source language="scala"]
name := "scala-solr"

organization := "com.knoldus"

version := "0.2-SNAPSHOT"

scalaVersion := "2.9.2"

resolvers += "Scala-tools" at "https://oss.sonatype.org/content/groups/scala-tools"

resolvers += "Google Api client" at "http://mavenrepo.google-api-java-client.googlecode.com/hg/"

{
  libraryDependencies ++= Seq(
    "net.databinder" %% "dispatch-http" % "0.8.8",
    "net.databinder" %% "dispatch-http-json" % "0.8.8",
    "org.apache.solr" % "solr-solrj" % "3.3.0",
    "junit" % "junit" % "4.7" % "test",
    "org.specs2" %% "specs2" % "1.12" % "test"
  )
}
[/source]

Indexing: Feed Solr with the Book objects. Here I have used SolrJ (the Solr client for Java) to index the Solr documents.

[source language="scala"]
package com.knoldus.feed

import org.apache.solr.common.SolrInputDocument
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer

case class Book(isbn: String, name: String, author: String, description: String)

class Feed2Solr {
  val url = "http://localhost:8983/solr"
  val server = new CommonsHttpSolrServer(url)

  // Add every book to Solr, then commit so the documents become searchable.
  def send(books: List[Book]) {
    books.foreach(book => server.add(getSolrDocument(book)))
    server.commit()
  }

  // Map a Book onto the fields declared in schema.xml.
  def getSolrDocument(book: Book): SolrInputDocument = {
    val document = new SolrInputDocument()
    document.addField("isbn_s", book.isbn)
    document.addField("name_s", book.name)
    document.addField("author_s", book.author)
    document.addField("description_s", book.description)
    document
  }
}

object Sender extends App {
  val book1 = Book("99921-58-10-7", "Wings of Fire", "Abdul Kalam", "Biography of A.P.J Abdul Kalam")
  val book2 = Book("59234-58-10-3", "Harry", "J.K Rowling", "Harry Potter's Adventures")
  (new Feed2Solr).send(List(book1, book2))
}
[/source]

Searching: Solr can return search results in either XML or JSON format. In the following implementation Solr responds with XML. The code parses the XML and produces the list of books that match the query.

[source language="scala"]
package com.knoldus.search

import com.knoldus.feed.Book
import scala.xml.Node
import dispatch.Http
import dispatch.url
import dispatch.XhtmlParsing._

class SearchFromSolr {
  /* This method builds Book objects from the XML returned by Solr.
     Here I use the Dispatch library, but you can use any technique to
     retrieve the Book objects from the XML. */

  def fetchAndExtractBook(query: String): Seq[Book] = {
    def createSolrURL(query: String) = "http://localhost:8983/solr/select?q=" + query + "&defType=edismax"
    val urlstr = createSolrURL(query)

    // Read the <str name="..."> children of one <doc> element into a Book.
    def getBook(xml: Node): Option[Book] = {
      var (isbn, name, author, description) = ("", "", "", "")
      (xml \\ "str").foreach { node =>
        if ((node \\ "@name").text == "isbn_s") isbn = node.text
        if ((node \\ "@name").text == "name_s") name = node.text
        if ((node \\ "@name").text == "author_s") author = node.text
        if ((node \\ "@name").text == "description_s") description = node.text
      }
      val book = Book(isbn, name, author, description)
      Some(book)
    }

    // Fetch the response and convert every <doc> element into a Book.
    Http(url(urlstr) </> { xml =>
      (xml \\ "doc").map(node => getBook(node)).flatten
    })
  }
}

object Search extends App {
  // Select all the books
  (new SearchFromSolr).fetchAndExtractBook("*:*").foreach(println(_))

  // Select all the books where the book name is like "Harry"
  (new SearchFromSolr).fetchAndExtractBook("name_s:harry").foreach(println(_))

  // Select all the books where the book name is like "Harry" or "wings"
  (new SearchFromSolr).fetchAndExtractBook("name_s:harry+OR+wings").foreach(println(_))
}
[/source]
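The createSolrURL helper above concatenates the query into the URL as-is, which is why the last query spells out its spaces as "+". One possible refinement, sketched here as an assumption rather than as part of the original code, is to URL-encode the query with the JDK's `java.net.URLEncoder` so that callers can pass an ordinary query string. The object name `SolrUrlSketch` is made up for this example, and the base URL is the one used throughout this post.

```scala
import java.net.URLEncoder

object SolrUrlSketch {
  // Base URL of the local example server used in this post; adjust for your setup.
  val baseUrl = "http://localhost:8983/solr/select"

  // Encode the query so that spaces, colons and other special characters
  // travel safely in the query string.
  def createSolrURL(query: String): String =
    baseUrl + "?q=" + URLEncoder.encode(query, "UTF-8") + "&defType=edismax"

  def main(args: Array[String]): Unit = {
    println(createSolrURL("name_s:harry OR wings"))
    // http://localhost:8983/solr/select?q=name_s%3Aharry+OR+wings&defType=edismax
  }
}
```

With this variant the queries would be written with plain spaces, for example "name_s:harry OR wings", because the encoder turns each space into "+" itself.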
