Text Mining

MachineX: Ultimate guide to NLP (Part 1)

In this blog, we are going to look at some basic text operations used in NLP to solve different problems. This blog is part of the series "Ultimate guide to NLP", which focuses on basic text pre-processing techniques. Some of the major areas that we will cover in this series include the following: Text Pre-Processing, and Understanding of Text & Feature Engineering …
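As a rough illustration of what basic text pre-processing usually involves, here is a minimal sketch in plain Scala. It is not taken from the series. It assumes ASCII input and uses a tiny, hypothetical stop-word list. The steps are: lowercase the text, replace punctuation with spaces, split on whitespace, and drop stop words. The object name `TextPreprocessing` and the `StopWords` set are illustrative assumptions.

```scala
object TextPreprocessing {
  // Hypothetical, deliberately tiny stop-word list; real projects use a fuller list.
  val StopWords: Set[String] =
    Set("a", "an", "the", "is", "are", "of", "to", "and", "in")

  // Lowercase, strip punctuation (ASCII-only assumption), tokenize on whitespace,
  // and remove stop words.
  def preprocess(text: String): List[String] =
    text.toLowerCase
      .replaceAll("[^a-z0-9\\s]", " ") // replace punctuation and symbols with spaces
      .split("\\s+")                   // split on runs of whitespace
      .filter(token => token.nonEmpty && !StopWords.contains(token))
      .toList
}
```

For example, `TextPreprocessing.preprocess("The cat is on the mat!")` returns `List("cat", "on", "mat")`. Because of the ASCII-only character class, accented or non-Latin letters are treated as punctuation here, so a Unicode-aware pattern would be needed for such text.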

Tutorial: How to build a Tokenizer in Spark and Scala

In our earlier blog, "A Simple Application in Spark and Scala", we explained how to build Spark and create a simple application with it. In this blog, we will see how to build a fast tokenizer in Spark and Scala using sbt. Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens …
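The post builds its tokenizer with Spark and sbt. The sketch below leaves Spark out and shows only the tokenization step in plain Scala, so it runs without a cluster or build setup. The object name `SimpleTokenizer` and both regular expressions are illustrative assumptions, not the post's code. `whitespaceTokens` follows the behaviour of Spark ML's `Tokenizer`, which lowercases the input and then splits it on whitespace. `wordAndSymbolTokens` also emits each punctuation mark or symbol as its own token, which matches the definition of tokens above.

```scala
object SimpleTokenizer {
  // Same idea as Spark ML's Tokenizer: lowercase, then split on whitespace.
  // Empty strings (e.g. from leading spaces) are filtered out.
  def whitespaceTokens(text: String): List[String] =
    text.toLowerCase.split("\\s+").filter(_.nonEmpty).toList

  // A word is a run of word characters; any other non-space character is a symbol token.
  private val TokenPattern = """\w+|[^\w\s]""".r

  def wordAndSymbolTokens(text: String): List[String] =
    TokenPattern.findAllIn(text.toLowerCase).toList
}
```

For example, `SimpleTokenizer.wordAndSymbolTokens("Hello, Spark world!")` returns `List("hello", ",", "spark", "world", "!")`, while `whitespaceTokens` keeps the punctuation attached and gives `List("hello,", "spark", "world!")`. Inside a Spark job, the same function could be applied to each line of an RDD or Dataset; that wiring is left out here.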