CSV File Writer using Scala


So, you want to write a CSV file. Great idea! First, let’s understand what a CSV file is: it’s nothing but a comma-separated values file, a plain-text file whose values are separated by commas.
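To make that concrete, here is a minimal sketch in plain Scala that joins a few values into CSV lines. The object name CsvLineSketch and the helper toCsvLine are made up for illustration, and the sketch deliberately ignores quoting and escaping.

```scala
// A minimal sketch: build CSV lines by hand (no quoting or escaping).
object CsvLineSketch {
  // Joining the field values with commas gives one CSV line.
  def toCsvLine(fields: Seq[String]): String = fields.mkString(",")

  def main(args: Array[String]): Unit = {
    println(toCsvLine(Seq("id", "name", "age", "city"))) // id,name,age,city
    println(toCsvLine(Seq("1", "Deepak", "25", "Delhi"))) // 1,Deepak,25,Delhi
  }
}
```

Values that themselves contain commas or quotes need extra care, which is one reason to hand the job to a CSV library.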

The other day I needed a CSV file with some records in it and started asking people for one. Then I wondered why I couldn’t write a CSV file of my own, since borrowing one from others made little sense. That led me to write a piece of Scala code that generates a CSV file in a specified directory. You can generate your own CSV file with any number of fields and any number of records, and change the fields and the record count as and when required.

Come, let’s see how I made it happen.

Today we are going to make an SBT project.
First, you will need to add a dependency to your build.sbt file:

libraryDependencies += "au.com.bytecode" % "opencsv" % "2.4"
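For context, a complete build.sbt might look roughly like the sketch below. The project name and the Scala version are assumptions, not taken from the original project.

```scala
// build.sbt (a minimal sketch; the name and scalaVersion are assumptions)
name := "csv-writer"

// scala.collection.JavaConversions, which this post imports, was removed in
// Scala 2.13, so this sketch assumes a 2.12.x compiler.
scalaVersion := "2.12.18"

// A single % is used because opencsv is a plain Java library,
// so no Scala version suffix is appended to the artifact name.
libraryDependencies += "au.com.bytecode" % "opencsv" % "2.4"
```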

Now we will write the code. You can create an object or a class; in my case, it’s an object named MakeCSV.

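First of all, you will need to import a few packages. Note that scala.collection.JavaConversions is deprecated since Scala 2.12 and was removed in 2.13, so this code targets Scala 2.12 or earlier.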

import java.io.{BufferedWriter, FileWriter}

import scala.collection.JavaConversions._
import scala.collection.mutable.ListBuffer
import scala.util.Random

import au.com.bytecode.opencsv.CSVWriter

Now we will start writing the code in our object:

1. val outputFile = new BufferedWriter(new FileWriter("PATH_TO_STORE_FILE/output.csv")) // creates the output file, output.csv, in the given directory
2. val csvWriter = new CSVWriter(outputFile) // creates a CSVWriter object that writes to outputFile
3. val csvFields = Array("id", "name", "age", "city") // the header row of the CSV file; I have four fields here, and including a header is entirely optional
4. val nameList = List("Deepak", "Sangeeta", "Geetika", "Anubhav", "Sahil", "Akshay") // the values for the name field
5. val ageList = (24 to 26).toList // the values for the age field
6. val cityList = List("Delhi", "Kolkata", "Chennai", "Mumbai") // the values for the city field
7. val random = new Random() // the Random object used to pick random items from the lists above
8. val listOfRecords = new ListBuffer[Array[String]]() // the list buffer that holds all the records
9. listOfRecords += csvFields // adds the header row as the first record
10. for (i <- 1 to numberOfRecords) {
  listOfRecords += Array(i.toString, nameList(random.nextInt(nameList.length)),
    ageList(random.nextInt(ageList.length)).toString, cityList(random.nextInt(cityList.length)))
}
// this loop adds the records to the list buffer, using the Random object to pick random items from the lists; numberOfRecords stands for however many records you want, so define it as an Int before the loop
11. csvWriter.writeAll(listOfRecords.toList) // writes all the records to the CSV file
12. outputFile.close() // finally, closes the file after all the records have been written
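The random picking in step 10 can also be pulled out into small functions, which makes it easier to try on its own. This is only a sketch: the names RecordSketch, pick and randomRecord are made up for illustration, and the fixed seed is an assumption added so that repeated runs give the same rows.

```scala
import scala.util.Random

object RecordSketch {
  val nameList = List("Deepak", "Sangeeta", "Geetika", "Anubhav", "Sahil", "Akshay")
  val ageList  = (24 to 26).toList
  val cityList = List("Delhi", "Kolkata", "Chennai", "Mumbai")

  // Pick one random element from a non-empty list.
  def pick[A](items: List[A], random: Random): A =
    items(random.nextInt(items.length))

  // Build one record: the id plus a random name, age and city.
  def randomRecord(id: Int, random: Random): Array[String] =
    Array(id.toString, pick(nameList, random), pick(ageList, random).toString, pick(cityList, random))

  def main(args: Array[String]): Unit = {
    val random = new Random(42L) // fixed seed (an assumption) so runs are repeatable
    (1 to 3).foreach(i => println(randomRecord(i, random).mkString(",")))
  }
}
```

Each call to randomRecord returns an Array[String] with four entries, which is the row shape that the list buffer in step 8 holds.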

Here is the final code:

import java.io.{BufferedWriter, FileWriter}

import scala.collection.JavaConversions._
import scala.collection.mutable.ListBuffer
import scala.util.Random

import au.com.bytecode.opencsv.CSVWriter

object MakeCSV extends App {

  // Replace the path with the desired path and the filename with the desired filename.
  val outputFile = new BufferedWriter(new FileWriter("/home/deepak/Desktop/deepak19.csv"))
  val csvWriter = new CSVWriter(outputFile)
  val csvFields = Array("id", "name", "age", "city")
  val nameList = List("Deepak", "Sangeeta", "Geetika", "Anubhav", "Sahil", "Akshay")
  val ageList = (24 to 26).toList
  val cityList = List("Delhi", "Kolkata", "Chennai", "Mumbai")
  val random = new Random()
  val numberOfRecords = 100 // placeholder value: set this to the number of records you want
  val listOfRecords = new ListBuffer[Array[String]]()
  listOfRecords += csvFields
  for (i <- 1 to numberOfRecords) {
    listOfRecords += Array(i.toString, nameList(random.nextInt(nameList.length)),
      ageList(random.nextInt(ageList.length)).toString, cityList(random.nextInt(cityList.length)))
  }
  // The JavaConversions implicits turn the Scala List into the java.util.List that writeAll expects.
  csvWriter.writeAll(listOfRecords.toList)
  outputFile.close()
}

I tested the code by generating 9 million records in a CSV file, and it took 2 minutes and 22 seconds on my machine with an i5 processor and 8 GB of RAM. I am going to follow up with a new blog post in which I write the same code on Spark so that we can compare the performance. I really hope the performance will improve on Spark.
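For large files, one variation worth trying is to write each record as soon as it is generated instead of collecting everything in a ListBuffer first. The sketch below does that with CSVWriter's writeNext method. The object name StreamingCsvSketch and the helper writeRecords are made up for illustration, and I have not measured whether this is any faster; it mainly avoids holding all the rows in memory at once.

```scala
import java.io.{StringWriter, Writer}

import scala.util.Random

import au.com.bytecode.opencsv.CSVWriter

object StreamingCsvSketch {
  val nameList = List("Deepak", "Sangeeta", "Geetika", "Anubhav", "Sahil", "Akshay")
  val ageList  = (24 to 26).toList
  val cityList = List("Delhi", "Kolkata", "Chennai", "Mumbai")

  // Writes a header row plus `count` random records to `out`, one row at a time.
  def writeRecords(out: Writer, count: Int, random: Random): Unit = {
    val csvWriter = new CSVWriter(out)
    try {
      csvWriter.writeNext(Array("id", "name", "age", "city"))
      for (i <- 1 to count) {
        csvWriter.writeNext(Array(
          i.toString,
          nameList(random.nextInt(nameList.length)),
          ageList(random.nextInt(ageList.length)).toString,
          cityList(random.nextInt(cityList.length))))
      }
    } finally {
      csvWriter.close() // flushes and also closes the underlying writer
    }
  }

  def main(args: Array[String]): Unit = {
    // A StringWriter keeps the demo self-contained; pass a
    // BufferedWriter(new FileWriter(...)) instead to write to a file.
    val out = new StringWriter()
    writeRecords(out, 3, new Random())
    print(out.toString)
  }
}
```

Because writeRecords accepts any java.io.Writer, the same function can target a file or an in-memory buffer, and the try/finally makes sure the writer is closed even if something fails midway.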

You can find the mini project with all the code here.
If you run into any problems, do let me know in the comments.
If you enjoyed this post, I’d be very grateful if you’d help it spread. Keep smiling, keep coding!

Written by 

Deepak is a Software Consultant with more than 5 years of experience. He is very enthusiastic about his work and is a good team player. He has sound knowledge of different technologies, including Java, C++, C, HTML, CSS, JavaScript, and C#, and is always keen to learn new technologies.
