Speech Recognition with Scala

In this post I explain how to integrate speech recognition into a Scala project.

Speech recognition lets us recognize spoken language and convert it into text inside our projects. It is an increasingly common feature in electronic and computing devices, where it is used to make them smarter.

In this project we will use the CMU Sphinx toolkit, an open source toolkit that supports offline speech recognition. It provides several components, which you can choose from depending on the needs of your application: speech recognizers, speech decoders, software for acoustic model training, language models, and pronunciation dictionaries.

We will use the Sphinx4 speech recognizer, a pure Java speech recognition library. It can be used to identify speakers, adapt models, and recognize speech and convert it to text.

Now let us look at the code. First, include one of the following two pairs of dependencies in build.sbt:

libraryDependencies += "edu.cmu.sphinx" % "sphinx4-core" % "1.0-SNAPSHOT"
libraryDependencies += "edu.cmu.sphinx" % "sphinx4-data" % "1.0-SNAPSHOT"
or
libraryDependencies += "de.sciss" % "sphinx4-core" % "1.0.0"
libraryDependencies += "de.sciss" % "sphinx4-data" % "1.0.0"
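
A note on the SNAPSHOT coordinates: snapshot artifacts are generally not published to Maven Central, so sbt usually needs an extra resolver to find them. The line below is a sketch that assumes the Sphinx4 snapshots are hosted in the Sonatype OSS snapshot repository; the resolver name is arbitrary. One reader also reported needing the version `5prealpha-SNAPSHOT` instead of `1.0-SNAPSHOT`, so the version string may have to change as well.

```scala
// build.sbt
// Assumption: the Sphinx4 SNAPSHOT artifacts live in the Sonatype OSS snapshot repository.
resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"
```

The `de.sciss` release coordinates should not need this resolver.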

Next, import the Sphinx4 API into the application:

import edu.cmu.sphinx.api._

Next, set up the configuration for speech recognition:

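object SpeechRecognitionApp extends App {
  // Holds the locations of the models the recognizer needs.
  val configuration = new Configuration
  // Acoustic model: maps audio features to phonemes.
  configuration.setAcousticModelPath("file:models/acoustic/wsj")
  // Dictionary: maps words to phoneme sequences.
  configuration.setDictionaryPath("file:models/acoustic/wsj/dict/cmudict.0.6d")
  // Language model: describes which word sequences are likely.
  configuration.setLanguageModelPath("file:models/language/en-us.lm.dmp")
}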

The code above creates a Configuration object and sets the paths to the acoustic model, the pronunciation dictionary, and the language model.
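
Since all three settings are plain path strings, a typo in one of them only shows up once the recognizer tries to load the models. As a hedged sketch, a small pre-flight check can list the configured locations that do not exist on disk. The object `ModelPathCheck` and its methods are hypothetical helpers, not part of Sphinx4, and the check assumes `file:` style paths like the ones above (it does not handle other schemes).

```scala
import java.nio.file.{Files, Paths}

// Hypothetical helper, not part of the Sphinx4 API.
object ModelPathCheck {
  /** Removes a leading "file:" scheme, if present, leaving a plain filesystem path. */
  def stripFileScheme(location: String): String = location.stripPrefix("file:")

  /** Returns the configured locations whose target does not exist on disk. */
  def missing(locations: Seq[String]): Seq[String] =
    locations.filterNot(loc => Files.exists(Paths.get(stripFileScheme(loc))))
}
```

Calling `ModelPathCheck.missing(Seq("file:models/acoustic/wsj", "file:models/language/en-us.lm.dmp"))` before creating the recognizer returns the entries that could not be found, relative to the working directory.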

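The cmudict file is a plain-text pronunciation dictionary: it lists the words the recognizer can recognize, each followed by its phonemes, and it can be extended for a given language. An alternate pronunciation of the same word is marked with a number in parentheses. It looks something like this: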

ONE                  HH W AH N
ONE(2)               W AH N
TWO                  T UW
THREE                TH R IY
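
To make the format concrete, here is a minimal sketch of a parser for lines like the ones above. It assumes each line is a word followed by whitespace-separated phonemes, and that a suffix such as `(2)` marks an alternate pronunciation of the same word. The object `PronunciationDictionary` is a hypothetical helper written for illustration; Sphinx4 loads the dictionary itself and does not need this code.

```scala
// Hypothetical helper for illustration; Sphinx4 reads the dictionary on its own.
object PronunciationDictionary {
  // Matches an alternate-pronunciation suffix such as "(2)" at the end of a word.
  private val VariantSuffix = """\(\d+\)$""".r

  /** Parses dictionary lines into a map from word to all of its phoneme sequences. */
  def parse(lines: Seq[String]): Map[String, List[List[String]]] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)                       // skip blank lines
      .map(_.split("\\s+").toList)              // first token is the word, the rest are phonemes
      .collect { case word :: phonemes if phonemes.nonEmpty =>
        VariantSuffix.replaceFirstIn(word, "") -> phonemes
      }
      .groupBy(_._1)                            // merge "ONE" and "ONE(2)" under one key
      .map { case (word, entries) => word -> entries.map(_._2).toList }
}
```

Parsing the four sample lines gives three words, with `ONE` mapped to two phoneme sequences and `TWO` and `THREE` to one each.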

The last step is to create the recognizer object, start recognition, and read the results. Each result's hypothesis is printed as a string.

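println("Start speaking:")
val speechRecognizer = new LiveSpeechRecognizer(configuration)
// Start capturing from the microphone; `true` clears previously cached audio.
speechRecognizer.startRecognition(true)
// Read results until the recognizer returns null. Starting from null avoids
// throwing away the first utterance with an extra getResult call.
var result: SpeechResult = null
while ({result = speechRecognizer.getResult; result != null}) {
  println(result.getHypothesis)
}
speechRecognizer.stopRecognition()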
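
The assign-and-test loop above is a Java idiom. As a sketch of a more Scala-like alternative, the same "call until null" pattern can be wrapped in an iterator. `ResultPolling.drain` is a hypothetical helper, not part of Sphinx4; it works with any function that signals the end by returning null.

```scala
// Hypothetical helper, not part of the Sphinx4 API.
object ResultPolling {
  /** Calls `next` repeatedly and yields each value until `next` returns null. */
  def drain[A <: AnyRef](next: () => A): Iterator[A] =
    Iterator.continually(next()).takeWhile(_ != null)
}
```

With the recognizer from this post, the loop would then read `ResultPolling.drain(() => speechRecognizer.getResult).map(_.getHypothesis).foreach(println)`. The iterator is lazy, so each result is requested only when the previous one has been consumed.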

That is all: the code above is enough to enable speech recognition in a Scala project.

6 thoughts on “Speech Recognition with Scala”

To resolve the deps, had to change them to:

libraryDependencies += "edu.cmu.sphinx" % "sphinx4-core" % "5prealpha-SNAPSHOT"
libraryDependencies += "edu.cmu.sphinx" % "sphinx4-data" % "5prealpha-SNAPSHOT"

Nice concept to write a blog on…

Hi, I am getting the error “not enough arguments for constructor SpeechAligner”.

Can you please help?

Pallavi Singh says:

October 7, 2016 at 9:15 AM

Hi, can you send me the code block where you are getting the error? Then I may be able to help you better.

Hi Pallavi,
This is very helpful. Thanks.
Have you tried doing this demo in Databricks Community Edition (http://cmusphinx.sourceforge.net/wiki/) with Apache Spark?
It would help those interested in combining Sphinx with Spark to scale the transcription task up to terabytes of audio files in S3 or HDFS.