Text Mining

Tutorial: How to build a Tokenizer in Spark and Scala

Reading Time: 2 minutes

In our earlier blog post, A Simple Application in Spark and Scala, we explained how to build Spark and create a simple application with it. In this post, we will see how to build a fast tokenizer in Spark and Scala using sbt. Tokenization is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens ...
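
To make the definition of tokenization concrete, here is a minimal sketch in plain Scala. It is not the tokenizer built in the full post: the object name `SimpleTokenizer`, the lowercasing step, and the rule of splitting on any run of non-letter, non-digit characters are all assumptions made for illustration.

```scala
// Hypothetical helper for illustration only; not the implementation from the full post.
object SimpleTokenizer {
  // Lowercase the text, then split on any run of characters that are
  // neither letters nor digits. Empty fragments are dropped.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase
      .split("[^\\p{L}\\p{N}]+")
      .filter(_.nonEmpty)
      .toSeq
}
```

Because `tokenize` is an ordinary function from `String` to `Seq[String]`, it could be applied to each line of a Spark `RDD[String]` with `flatMap`, for example `lines.flatMap(SimpleTokenizer.tokenize)`, where `lines` is a hypothetical RDD of input text.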