Nutch

Harnessing the Power of Nutch with Scala

Reading Time: < 1 minute Knoldus recently spoke at the IndicThreads conference in New Delhi, India. We talked about Nutch and how easily we integrated it with Scala to build a scalable web crawler in fewer than 900 lines of code. The case study and demonstration showed attendees the power of Scala as a language of choice.

Intercepting Nutch Crawl Flow with a Scala Plugin

Reading Time: 4 minutes Apache Nutch is an open source web search project. One of its interesting uses is as a web crawler. Nutch provides several extension points through which we can plug in our own custom functionality. Some of the existing extension points can be found here. Its plugin system is similar to the one used in Eclipse.