Empower Scala with Apache Solr


Solr is a ready-to-use enterprise search server. Here I am going to show how you can use Scala and Solr together to add search to your Scala application.

Solr achieves fast search responses because it searches an index rather than the text itself. This is like finding the pages of a book related to a keyword by scanning the index at the back, as opposed to reading every word on every page.

This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) to a keyword-centric data structure (word->pages).
Solr stores this index in a directory called index in the data directory.

A. Solr searches the indexed documents.

B. Each document has an ID and a list of terms.

C. For each term, Solr keeps a list of all the documents that contain that term.

D. Each document is identified by its ID.
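
To make points A to D concrete, here is a minimal sketch of an inverted index in plain Scala. It is only a toy model of the idea, not how Solr or Lucene implement it; the object name, the sample documents and the simple tokenizer are assumptions made for illustration.

```scala
// Toy inverted index: maps each lowercased term to the IDs of the documents containing it.
object InvertedIndexSketch {
  // Hypothetical sample documents: ID -> text
  val docs: Map[Int, String] = Map(
    1 -> "Wings of Fire",
    2 -> "Harry and the wings of an owl"
  )

  // Split text into lowercase terms on anything that is not a letter or digit
  def terms(text: String): Set[String] =
    text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toSet

  // Invert (id -> terms) into (term -> sorted ids)
  def build(documents: Map[Int, String]): Map[String, List[Int]] = {
    val pairs = for {
      (id, text) <- documents.toList
      term <- terms(text)
    } yield (term, id)
    pairs.groupBy(_._1).map { case (term, ps) => term -> ps.map(_._2).sorted }
  }

  // Looking up a term is a single map access, not a scan over every document
  def search(index: Map[String, List[Int]], term: String): List[Int] =
    index.getOrElse(term.toLowerCase, Nil)
}
```

With these sample documents, searching the index for "wings" returns both IDs, while "harry" returns only document 2.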

Install Solr on Local Machine

1. Download Solr (this article uses version 3.6.1)

http://www.apache.org/dyn/closer.cgi/lucene/solr/3.6.1

2. Extract apache-solr-3.6.1.tgz or apache-solr-3.6.1.zip to any directory.
I extracted the archive into the /mayank/solr directory.

3. Go to SOLR_HOME/example

4. Run the server with > java -jar start.jar

5. Your Solr server is now up and running

cd example
java -jar start.jar

You should see something like this in the terminal.

2011-10-02 05:20:27.120:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2011-10-02 05:20:27.212:INFO::jetty-6.1-SNAPSHOT
....
2011-10-02 05:18:27.645:INFO::Started SocketConnector@0.0.0.0:8983

Solr is now running! You can now access the Solr Admin webapp by loading http://localhost:8983/solr/admin/ in your web browser.

So far so good. Let's discuss a use case where you can apply this knowledge in a Scala application.

Create a Scala application for a library:

- Create the Book

- Feed the Book to Solr (known as indexing)

- Search for the Book in Solr

Suppose a Book object has TITLE, ISBN, AUTHOR and DESCRIPTION fields (the title is called name in the schema and code that follow).

A. Configure Solr

1. To enable Solr to store the above fields, we need to modify the configuration file schema.xml. This file is located in the SOLR_HOME/example/solr/conf directory.

Configure the schema.xml file.

.....
<fields>
   <field name="isbn_s" type="string" indexed="false" stored="true" required="true" />
   <field name="name_s" type="text" indexed="true" stored="true" />
   <field name="author_s" type="string" indexed="false" stored="true"/>
   <field name="description_s" type="string" indexed="true" stored="true"/>
 </fields>

 <!-- Field to use to determine and enforce document uniqueness. 
      Unless this field is marked with required="false", it will be a required field
   -->
 <uniqueKey>isbn_s</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>name_s</defaultSearchField>
.....

Note: It is not necessary to index every field. Here all the fields are stored, but only two of them are indexed. The NAME and DESCRIPTION fields are indexed because searches are applied to these fields. The larger the index grows, the slower searching becomes.
isbn_s is tagged as the uniqueKey, which ensures that each ISBN is unique. If you add a different document with the same ISBN, the later one replaces the previous one. Keep in mind that Solr finds the document to replace through the index, so the uniqueKey field should normally be declared with indexed="true" for this replacement to work.
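
The replacement behaviour of the uniqueKey can be pictured with a small sketch in plain Scala. This only mimics the semantics described in the note using a Map keyed by ISBN; it is not how Solr is implemented, and the BookDoc class and the second edition title are hypothetical.

```scala
// Toy model of uniqueKey behaviour: adding a document whose key
// already exists replaces the old one instead of creating a duplicate.
case class BookDoc(isbn: String, name: String)

object UniqueKeySketch {
  // The store is keyed by ISBN, so a second add with the same ISBN overwrites the first
  def add(store: Map[String, BookDoc], doc: BookDoc): Map[String, BookDoc] =
    store + (doc.isbn -> doc)

  val first  = BookDoc("99921-58-10-7", "Wings of Fire")
  // Hypothetical second document that reuses the same ISBN
  val second = BookDoc("99921-58-10-7", "Wings of Fire (2nd edition)")

  // After both adds the store holds a single entry: the later document
  val store: Map[String, BookDoc] =
    add(add(Map.empty[String, BookDoc], first), second)
}
```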

B. Create the Scala Project

build.sbt

name := "scala-solr"

organization := "com.knoldus"

version := "0.2-SNAPSHOT"

scalaVersion := "2.9.2"

resolvers += "Scala-tools" at "https://oss.sonatype.org/content/groups/scala-tools"

resolvers += "Google Api client" at "http://mavenrepo.google-api-java-client.googlecode.com/hg/"

libraryDependencies ++= Seq(
  "net.databinder" %% "dispatch-http" % "0.8.8",
  "net.databinder" %% "dispatch-http-json" % "0.8.8",
  "org.apache.solr" % "solr-solrj" % "3.3.0",
  "junit" % "junit" % "4.7" % "test",
  "org.specs2" %% "specs2" % "1.12" % "test"
)

Indexing: Feed Solr with the Book objects. Here I have used SolrJ (the Solr client for Java) to index the Solr documents.

package com.knoldus.feed

import org.apache.solr.common.SolrInputDocument
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer

case class Book(isbn: String, name: String, author: String, description: String)

class Feed2Solr {
  val url = "http://localhost:8983/solr"
  val server = new CommonsHttpSolrServer(url)

  // Adds every book to Solr, then commits so the documents become searchable
  def send(books: List[Book]) {
    books.foreach(book => server.add(getSolrDocument(book)))
    server.commit()
  }

  // Maps a Book onto the fields declared in schema.xml
  def getSolrDocument(book: Book): SolrInputDocument = {
    val document = new SolrInputDocument()
    document.addField("isbn_s", book.isbn)
    document.addField("name_s", book.name)
    document.addField("author_s", book.author)
    document.addField("description_s", book.description)
    document
  }
}

object Sender extends App {
  val book1 = new Book("99921-58-10-7", "Wings of Fire", "Abdul Kalam", "Biography of A.P.J Abdul Kalam")
  val book2 = new Book("59234-58-10-3", "Harry", "J.K Rowling", "Harry Potter's Adventures")
  (new Feed2Solr).send(List(book1, book2))
}

Searching: Solr can return search results in either XML or JSON format. In the following implementation Solr responds with XML. The code parses the XML and produces the list of Books that match the query.

package com.knoldus.search

import com.knoldus.feed.Book
import scala.xml.Node
import dispatch.Http
import dispatch.url
import dispatch.XhtmlParsing._

class SearchFromSolr {
  /* This method builds the Book objects from Solr's XML response.
     Here I use the Dispatch library, but you can use any technique to
     retrieve the Book objects from the XML. */

  def fetchAndExtractBook(query: String): Seq[Book] = {
    def createSolrURL(query: String) =
      "http://localhost:8983/solr/select?q=" + query + "&defType=edismax"
    val urlstr = createSolrURL(query)

    // Reads the <str name="..."> elements of one <doc> into a Book
    def getBook(xml: Node): Option[Book] = {
      var (isbn, name, author, description) = ("", "", "", "")
      (xml \\ "str").foreach { node =>
        val field = (node \\ "@name").text
        if (field == "isbn_s") isbn = node.text
        if (field == "name_s") name = node.text
        if (field == "author_s") author = node.text
        if (field == "description_s") description = node.text
      }
      Some(Book(isbn, name, author, description))
    }

    // Fetch the response, parse it as XML and convert every <doc> into a Book
    Http(url(urlstr) </> { xml =>
      (xml \\ "doc").map(node => getBook(node)).flatten
    })
  }
}

object Search extends App {
  // Select all the books
  (new SearchFromSolr).fetchAndExtractBook("*:*").foreach(println(_))

  // Select all the books whose name matches "harry"
  (new SearchFromSolr).fetchAndExtractBook("name_s:harry").foreach(println(_))

  // Select all the books whose name matches "harry" or "wings"
  (new SearchFromSolr).fetchAndExtractBook("name_s:harry+OR+wings").foreach(println(_))
}
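
The search code above concatenates the query straight into the URL, so spaces have to be written as + by hand. As a minimal sketch, assuming you would rather pass plain query strings, the URL could be built with java.net.URLEncoder instead. The object name SolrUrlSketch and the build helper are hypothetical; wt is Solr's response format parameter, which also accepts json.

```scala
import java.net.URLEncoder

object SolrUrlSketch {
  // Select URL used in this article; adjust host and port for your setup
  val selectUrl = "http://localhost:8983/solr/select"

  // URL-encode each parameter value so spaces and special characters are safe.
  // wt selects the response format ("xml" by default here, "json" also works).
  def build(query: String, wt: String = "xml"): String = {
    val params = Seq("q" -> query, "defType" -> "edismax", "wt" -> wt)
    params
      .map { case (key, value) => key + "=" + URLEncoder.encode(value, "UTF-8") }
      .mkString(selectUrl + "?", "&", "")
  }
}
```

For example, SolrUrlSketch.build("name_s:harry OR wings") encodes the colon as %3A and the spaces as +, so the query can be written the way you would type it in the Solr admin page.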

About mayankbairagi

Software Developer
This entry was posted in Java, Scala, Web.