Parsing XML into Scala case classes using xtract

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is a well-known, tag-based data format for transporting information from one system to another, and parsing it is a recurring task in real-world data processing.

Play-Json is an easy and convenient way to parse JSON (JavaScript Object Notation) data into Scala case classes, and it is widely used for reading and writing JSON. On one of my projects, I had to parse XML data into Scala case classes for further processing. I was quite familiar with JSON parsing with Play-Json and Json4s, but XML parsing was new to me, so I looked for libraries that can parse XML into Scala case classes. I came across a few alternatives:

  • JAXB : “JAXB stands for Java architecture for XML binding. It is used to convert XML to java object and java object to XML.”
  • Scalaxb: “Scalaxb is an XML data-binding tool that supports XSD and WSDL, and as output, it generates scala source files.”
  • Xtract: “Xtract is a Scala library for deserializing XML. It is heavily inspired by the combinators in the Play JSON library, in particular, the Reads[T] class.”

JAXB is specific to Java classes, and, judging by the blog posts I read, Scalaxb did not look mature enough to use yet. What the two have in common is that they are most suitable when you have a schema defined for your XML objects.

For my use case, no schema was defined, so I wanted a play-json-like library that could convert my XML data into Scala case classes. One of the solutions I found was the xtract library. As mentioned earlier, it is very similar to play-json and is built around readers for reading data. This post covers only parsing XML into Scala objects; the reverse conversion (Scala to XML) is not covered here.

Let’s start exploring this library by parsing an XML document into Scala case classes.

build.sbt:

Here is the “build.sbt” file that defines the dependencies for the xtract library. The xtract library uses a few classes for functional syntax from play-json, so we have to provide the play-json dependency as well.

name := "xtract-sample-app"

version := "1.0"

scalaVersion := "2.11.11"

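libraryDependencies ++= Seq(
  "com.lucidchart" %% "xtract" % "1.1.1",
  // %% appends the Scala binary version (_2.11) automatically
  "com.typesafe.play" %% "play-json" % "2.6.7",
  // scalatest is only needed on the test classpath
  "org.scalatest" %% "scalatest" % "3.0.4" % Test
)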

Sample XML Data:

Here is a sample XML document that contains a nested family tree:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Response>
    <person name="Raaj Kapoor" dob="14 December 1924" gender="male">
        <address street="Mumbai" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
        <wife name="Krishna Malhotra" dob="30 December 1930" gender="female"/>
        <kids name="Randheer Kapoor" dob="15 February 1947" gender="male">
            <address street="Mumbai" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
            <wife name="Babita" dob="NA" gender="female"/>
            <kids name="Karishma Kapoor" dob="25 June 1974" gender="female">
                <address street="Mumbai" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
                <husband name="Sanjay Kapoor" dob="NA" gender="male"/>
                <kids name="Samaira Kapoor" dob="NA" gender="female"/>
            </kids>
            <kids name="Kareena Kapoor" dob="21 September 1980" gender="female">
                <address street="Mumbai" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
                <husband name="Saif Ali Khan" dob="16 August 1970" gender="male"/>
                <kids name="Taimoor Ali Khan" dob="NA" gender="male"/>
            </kids>
        </kids>
        <kids name="Ritu Nanda" dob="30 October 1948" gender="female">
            <address street="Mumbai46" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
            <husband name="Ranjan Nanda" dob="NA" gender="male"/>
            <kids name="Nitasha Nanda" dob="NA" gender="female"/>
            <kids name="Nikhil Nanda" dob="NA" gender="male">
                <address street="Mumbai46" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
                <wife name="Shweta Bachchan Nanda" dob="80" gender="female"/>
                <kids name="Navya Naveli Nanda" dob="NA" gender="male"/>
                <kids name="Agastyle Nanda" dob="NA" gender="male"/>
            </kids>
        </kids>
        <kids name="Rishi Kappor" dob="4 September 1952" gender="male">
            <address street="Mumbai46" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
            <wife name="Neetu Singh Kapoor" dob="NA" gender="female"/>
            <kids name="Rishima Sahni" dob="NA" gender="female"/>
            <kids name="Ranveer Kapoor" dob="28 September 1982" gender="male"/>
        </kids>
        <kids name="Reema Jain" dob="NA" gender="male">
            <address street="Mumbai46" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
            <husband name="NA" dob="NA" gender="male"/>
            <kids name="Adar Jain" dob="NA" gender="male"/>
            <kids name="Arman Jain" dob="NA" gender="male"/>
        </kids>
        <kids name="Rajeev Kapoor" dob="NA" gender="male">
            <address street="Mumbai46mbai" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
            <wife name="NA" dob="NA" gender="female"/>
        </kids>
    </person>
</Response>

Note that the sample contains a variety of elements, and most of them are optional: a person may or may not have an address, a wife, a husband, or kids.
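To see how often an element is missing, the document can be queried with plain scala-xml before any reader is written. The following is a minimal sketch on a trimmed-down fragment of the sample; the object name `OptionalElements` and the helper `withoutAddress` are hypothetical names used only for illustration.

```scala
import scala.xml.{Elem, XML}

object OptionalElements {
  // A trimmed-down fragment in the same shape as the family-tree sample.
  val sample: Elem = XML.loadString(
    """<Response>
      |  <person name="Raaj Kapoor" dob="14 December 1924" gender="male">
      |    <address street="Mumbai" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
      |    <wife name="Krishna Malhotra" dob="30 December 1930" gender="female"/>
      |    <kids name="Randheer Kapoor" dob="15 February 1947" gender="male">
      |      <wife name="Babita" dob="NA" gender="female"/>
      |    </kids>
      |  </person>
      |</Response>""".stripMargin)

  // Names of every person-like element (person, wife, husband, kids)
  // that has no direct <address> child.
  def withoutAddress(root: Elem): Seq[String] =
    Seq("person", "wife", "husband", "kids")
      .flatMap(label => root \\ label)            // all descendants with that label
      .filter(node => (node \ "address").isEmpty) // keep those lacking an address
      .map(node => (node \ "@name").text)
}
```

Every name returned by `withoutAddress` is a case where a reader for the address has to tolerate a missing node, which is why the fields in the case classes are modelled as `Option` or `Seq`.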

XML Readers:

Just like the Reads of play-json, xtract lets us define readers that parse XML into Scala objects. Here are the readers for the sample above.

import com.lucidchart.open.xtract.XmlReader._
import com.lucidchart.open.xtract.{XmlReader, __}
import play.api.libs.functional.syntax._

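object Person {
  // One reader per constructor field, in declaration order, combined with `and`
  implicit val reader: XmlReader[Person] = (
    attribute[String]("name") and
      attribute[String]("dob") and
      attribute[String]("gender") and
      // <address> may be absent, so it is read as an Option
      (__ \ "address").read[Address].optional and
      // wife, husband and kids are Persons themselves, so the reader is passed lazily
      (__ \ "wife").lazyRead(first[Person]).optional and
      (__ \ "husband").lazyRead(first[Person]).optional and
      (__ \ "kids").lazyRead(seq[Person]).default(Nil)
    ) (apply _)
}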

case class Person(
                   name: String,
                   dob: String,
                   gender: String,
                   address: Option[Address],
                   wife: Option[Person],
                   husband: Option[Person],
                   kids: Seq[Person]
                 )

object Address {
  implicit val reader: XmlReader[Address] = (
    attribute[String]("street") and
      attribute[String]("city") and
      attribute[String]("state") and
      attribute[String]("pin") and
      attribute[String]("country")
    ) (apply _)
}

case class Address(
                    street: String,
                    city: String,
                    state: String,
                    pin: String,
                    country: String
                  )

case class Response(
                     person: Seq[Person]
                   )

object Response {
  implicit val reader: XmlReader[Response] = (__ \ "person").read(seq[Person]).default(Nil).map(apply _)
}

NOTE:
Here are a few important combinators that are used to parse XML into Scala case classes:

  • Reading attributes (attribute): An XmlReader that extracts a value from the attribute of the input NodeSeq.

    attribute[String]("state")

  • Reading nodes (read): Creates an XmlReader that reads the node(s) located at this XPath.

    (__ \ "address").read[Address].optional

  • Reading nodes recursively (lazyRead): Same as read, but takes the reader as a lazy argument so that it can be used in recursive definitions.

    (__ \ "wife").lazyRead(first[Person]).optional

  • Reading optional nodes (optional): Converts to a reader that always succeeds with an Option (None if it would have failed). Any errors are dropped.

    (__ \ "wife").lazyRead(first[Person]).optional

  • Reading nodes with default values (default): Uses a default value if unable to parse. It always succeeds and drops any errors.

    (__ \ "kids").lazyRead(seq[Person]).default(Nil)

  • Reading sequences/lists (seq): Reads each node in the NodeSeq with the reader, and succeeds with a PartialParseSuccess if any of the elements fail.

    (__ \ "person").read(seq[Person]).default(Nil).map(apply _)
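The combinators above can be tried out on a much smaller document. The following is a minimal sketch, assuming the same xtract 1.1.1 and play-json setup as the build file; the `Track` and `Playlist` models and the `CombinatorDemo` object are hypothetical and not part of the family-tree example.

```scala
import com.lucidchart.open.xtract.XmlReader._
import com.lucidchart.open.xtract.{XmlReader, __}
import play.api.libs.functional.syntax._
import scala.xml.XML

// Hypothetical model used only for this illustration.
case class Track(title: String, artist: Option[String])

object Track {
  implicit val reader: XmlReader[Track] = (
    attribute[String]("title") and            // required attribute
      attribute[String]("artist").optional    // a missing attribute becomes None
    )(apply _)
}

case class Playlist(name: String, tracks: Seq[Track])

object Playlist {
  implicit val reader: XmlReader[Playlist] = (
    attribute[String]("name") and
      (__ \ "track").read(seq[Track]).default(Nil) // zero or more <track> children
    )(apply _)
}

object CombinatorDemo {
  private val xml = XML.loadString(
    """<playlist name="Morning">
      |  <track title="First" artist="Someone"/>
      |  <track title="Second"/>
      |</playlist>""".stripMargin)

  // Both tracks parse; the second one has no artist.
  val parsed: Option[Playlist] = XmlReader.of[Playlist].read(xml).toOption

  // The required "name" attribute is missing, so the whole read fails.
  val missingName: Option[Playlist] =
    XmlReader.of[Playlist].read(XML.loadString("<playlist/>")).toOption
}
```

The contrast between the two values shows the design choice behind `optional` and `default`: a plain `attribute` or `read` makes the field mandatory and fails the enclosing reader, while the wrapped variants let the rest of the object parse.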

XML Helper:

This trait works as an XML helper for parsing XML data into Scala case classes.

import java.io.File
import com.knoldus.xtract.models._
import com.lucidchart.open.xtract.XmlReader
import scala.io.Source
import scala.xml.XML

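/**
  * This trait provides functionality to parse XML data into Scala case classes.
  */
trait XmlHelper {

  def xtract(filePath: String): Option[Response] = {
    // Read the whole file and close the underlying handle even if reading fails
    val source = Source.fromFile(new File(filePath))
    val xmlData = try source.getLines().mkString("\n") finally source.close()
    println("***File to be parsed: ")
    println(xmlData)
    val xml = XML.loadString(xmlData)
    // toOption drops any parse errors and yields None when parsing fails
    XmlReader.of[Response].read(xml).toOption
  }
}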
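Calling `toOption` is convenient, but it throws away the reason for a failure. As a hedged sketch, the result of `read` can instead be pattern matched, assuming the library exposes the three `ParseResult` cases `ParseSuccess`, `PartialParseSuccess` and `ParseFailure` with the fields used below; the `ResultReport` object and its `describe` method are hypothetical names.

```scala
import com.lucidchart.open.xtract.{ParseFailure, ParseResult, ParseSuccess, PartialParseSuccess}

object ResultReport {
  // Summarise a parse result without discarding the error list.
  // Assumption: ParseResult has exactly these three cases.
  def describe[A](result: ParseResult[A]): String = result match {
    case ParseSuccess(value) =>
      s"parsed: $value"
    case PartialParseSuccess(value, errors) =>
      s"parsed with ${errors.size} error(s): $value"
    case ParseFailure(errors) =>
      s"failed with ${errors.size} error(s)"
  }
}
```

A helper like this could be applied to `XmlReader.of[Response].read(xml)` in place of `.toOption` when the errors are worth logging.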

Sample App:

Here we have a simple application that takes an XML file and parses it into a complex Scala object.

import com.knoldus.xtract.util.XmlHelper

object XtractSampleApp extends App with XmlHelper {
  val path = "src/main/resources/person.xml"
  val response = xtract(path)
  println("***RESPONSE: " + response)
}

Sample Scala object after parsing:

After parsing the XML data, here is a sample outcome in the form of Scala case classes:

Some(Response(Vector(Person(Raaj Kapoor,14 December 1924,male,Some(Address(Mumbai,Mumbai,Maharashtra,36770047,India)),Some(Person(Krishna Malhotra,30 December 1930,female,None,None,None,Vector())),None,Vector(Person(Randheer Kapoor,15 February 1947,male,Some(Address(Mumbai,Mumbai,Maharashtra,36770047,India)),Some(Person(Babita,NA,female,None,None,None,Vector())),None,Vector(Person(Karishma Kapoor,25 June 1974,female,Some(Address(Mumbai,Mumbai,Maharashtra,36770047,India)),None,Some(Person(Sanjay Kapoor,NA,male,None,None,None,Vector())),Vector(Person(Samaira Kapoor,NA,female,None,None,None,Vector()))), Person(Kareena Kapoor,21 September 1980,female,Some(Address(Mumbai,Mumbai,Maharashtra,36770047,India)),None,Some(Person(Saif Ali Khan,16 August 1970,male,None,None,None,Vector())),Vector(Person(Taimoor Ali Khan,NA,male,None,None,None,Vector()))))), Person(Ritu Nanda,30 October 1948,female,Some(Address(Mumbai46,Mumbai,Maharashtra,36770047,India)),None,Some(Person(Ranjan Nanda,NA,male,None,None,None,Vector())),Vector(Person(Nitasha Nanda,NA,female,None,None,None,Vector()), Person(Nikhil Nanda,NA,male,Some(Address(Mumbai46,Mumbai,Maharashtra,36770047,India)),Some(Person(Shweta Bachchan Nanda,80,female,None,None,None,Vector())),None,Vector(Person(Navya Naveli Nanda,NA,male,None,None,None,Vector()), Person(Agastyle Nanda,NA,male,None,None,None,Vector()))))), Person(Rishi Kappor,4 September 1952,male,Some(Address(Mumbai46,Mumbai,Maharashtra,36770047,India)),Some(Person(Neetu Singh Kapoor,NA,female,None,None,None,Vector())),None,Vector(Person(Rishima Sahni,NA,female,None,None,None,Vector()), Person(Ranveer Kapoor,28 September 1982,male,None,None,None,Vector()))), Person(Reema Jain,NA,male,Some(Address(Mumbai46,Mumbai,Maharashtra,36770047,India)),None,Some(Person(NA,NA,male,None,None,None,Vector())),Vector(Person(Adar Jain,NA,male,None,None,None,Vector()), Person(Arman Jain,NA,male,None,None,None,Vector()))), Person(Rajeev Kapoor,NA,male,Some(Address(Mumbai46mbai,Mumbai,Maharashtra,36770047,India)),Some(Person(NA,NA,female,None,None,None,Vector())),None,Vector()))))))
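Once parsed, the result is an ordinary tree of case classes, so it can be processed with plain recursive functions. The following is a small sketch using a simplified stand-in, `Member`, that keeps only the name and the kids; `Member`, `FamilyTree` and the helper names are hypothetical, and the same functions would apply to the `Person` class by reading its `name` and `kids` fields.

```scala
// Simplified stand-in for Person: only the fields the traversal needs.
final case class Member(name: String, kids: Seq[Member] = Nil)

object FamilyTree {
  // A small slice of the family tree from the sample data.
  val sample: Member =
    Member("Raaj Kapoor", Seq(
      Member("Randheer Kapoor", Seq(
        Member("Karishma Kapoor", Seq(Member("Samaira Kapoor"))),
        Member("Kareena Kapoor"))),
      Member("Ritu Nanda")))

  // Depth-first list of every descendant's name.
  def descendants(m: Member): Seq[String] =
    m.kids.flatMap(kid => kid.name +: descendants(kid))

  // Number of generations below the given member (0 for someone without kids).
  def depth(m: Member): Int =
    if (m.kids.isEmpty) 0 else 1 + m.kids.map(depth).max
}
```

For the slice above, `descendants` lists the five names below Raaj Kapoor in document order, and `depth` reports three generations.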

Running application:

Step 1. Clone the git repo from here:
Git repository for the sample project

Step 2. Run the application using the following command:

sbt run

After running the application with the above command, you can see the outcome on the terminal. To explore further, you can experiment with the code and adapt the readers to your own XML.

Hope you enjoyed the post. In our next post, we will look more deeply into how the xtract library works and converts XML data into Scala case classes.

Thanks for reading!


Written by 

Girish is a Software Consultant, with experience of more than 3.5 years. He is a scala developer and very passionate about his interest towards Scala Eco-system. He has also done many projects in different languages like Java and Asp.net. He can work in both supervised and unsupervised environment and have a craze for computers whether working or not, he is almost always in front of his laptop's screen. His hobbies include reading books and listening to music. He is self motivated, dedicated and focused towards his work. He believes in developing quality products. He wants to work on different projects and different domains. He is curious to gain knowledge of different domains and try to provide solutions that can utilize resources and improve performance. His personal interests include reading books, video games, cricket and social networking. He has done Masters in Computer Applications from Lal Bahadur Shastri Institute of Management, New Delhi.
