Meetup: An Overview of Spark DataFrames with Scala

Knoldus organized a Meetup on Wednesday, 18 Nov 2015, which gave an overview of Spark DataFrames with Scala. Apache Spark is a distributed compute engine for large-scale data processing, and a wide range of organizations use it to process large datasets. Many Spark and Scala enthusiasts attended this session and learned why DataFrames are the best fit for building an application in Spark with Scala or any other language.


Below is the YouTube video of the whole session.

Posted in apache spark, big data, Scala, Spark

Akka Persistence Event Sourcing

This presentation gives a brief introduction to:
1) how event sourcing works (commands, domain events, event logs)
2) DDD (domain-driven design)
3) CQRS (command query responsibility segregation)
4) how an event-sourced architecture can be useful
5) Akka Persistence as a tool for event sourcing
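To make points 1 and 3 more concrete, here is a minimal in-memory sketch in plain Scala (no Akka) of the event-sourcing loop: a command is validated against the current state, an accepted command becomes domain events appended to an event log, and state is rebuilt by replaying that log. A separate read model projected from the same events stands in for the query side of CQRS. The bank-account domain and every name in it (`Deposit`, `Withdrawn`, `EventLog`, etc.) are hypothetical; in Akka Persistence, a persistent actor takes care of writing events to a durable journal and replaying them on recovery, which this sketch only imitates with an in-memory `Vector`.

```scala
// Hypothetical bank-account domain; an in-memory illustration, not the Akka Persistence API.
object EventSourcingSketch {

  // Commands express intent and may be rejected.
  sealed trait Command
  final case class Deposit(amount: Long) extends Command
  final case class Withdraw(amount: Long) extends Command

  // Domain events record facts that have already happened.
  sealed trait Event
  final case class Deposited(amount: Long) extends Event
  final case class Withdrawn(amount: Long) extends Event

  // Current state is derived purely from events.
  final case class Account(balance: Long) {
    def applyEvent(event: Event): Account = event match {
      case Deposited(a) => Account(balance + a)
      case Withdrawn(a) => Account(balance - a)
    }
  }

  // Recovery: rebuild state by replaying the whole event log from the start.
  def replay(log: Seq[Event]): Account = log.foldLeft(Account(0L))(_ applyEvent _)

  // Command side: validate against current state, then emit events or an error.
  def decide(state: Account, command: Command): Either[String, Seq[Event]] = command match {
    case Deposit(a) if a <= 0             => Left("amount must be positive")
    case Deposit(a)                       => Right(Seq(Deposited(a)))
    case Withdraw(a) if a <= 0            => Left("amount must be positive")
    case Withdraw(a) if a > state.balance => Left("insufficient funds")
    case Withdraw(a)                      => Right(Seq(Withdrawn(a)))
  }

  // Append-only event log (kept in memory here; a durable journal in practice).
  final class EventLog {
    private var events = Vector.empty[Event]
    def append(newEvents: Seq[Event]): Unit = events = events ++ newEvents
    def all: Vector[Event] = events
  }

  // Handle one command: replay state, decide, and persist only accepted events.
  def handle(log: EventLog, command: Command): Either[String, Seq[Event]] = {
    val result = decide(replay(log.all), command)
    result.foreach(log.append)
    result
  }

  // Query side (CQRS): a separate read model projected from the same events.
  def statement(log: Seq[Event]): Seq[String] = log.map {
    case Deposited(a) => s"+$a"
    case Withdrawn(a) => s"-$a"
  }

  def main(args: Array[String]): Unit = {
    val log = new EventLog
    println(handle(log, Deposit(100)))
    println(handle(log, Withdraw(30)))
    println(handle(log, Withdraw(500))) // rejected, so no event is stored
    println(replay(log.all))            // state rebuilt from the stored events
    println(statement(log.all))         // read model built from the same events
  }
}
```

Note that the rejected withdrawal leaves no trace in the log: only facts are stored, and both the current balance and the statement are derived from them.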


Watch the video session:

Posted in Scala

MeetUp on “BlinkDB and G-OLA: Supporting Continuous Answers with Error Bars in SparkSQL”

Big datasets are growing exponentially, but our need for quick, interactive responses to our queries remains as important as ever. This talk will give an overview of the various components of BlinkDB and introduce a new generalized online aggregation (G-OLA) paradigm in SparkSQL that incrementally processes massive amounts of data on clusters of tens, hundreds or thousands of machines while returning approximate answers. More precisely, this new execution model enables SparkSQL to present the user with meaningful approximate results (with error bars) that are continuously refined and updated, at a speed comfortable to the user, while it crunches larger and larger fractions of the whole dataset in the background. This not only removes the need to pre-process the data in advance for a wide range of queries, but also lets users observe the progress of a query and control its execution on the fly, enabling a smooth time/accuracy trade-off.
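The basic idea behind online aggregation, an answer with error bars that tightens as more of the data is processed, can be illustrated with a short Scala sketch. This is not how BlinkDB or G-OLA are implemented (they run inside SparkSQL across a cluster); it is a single-machine illustration that assumes rows are processed in random order, estimates an AVG with Welford's running variance, and reports a normal-approximation 95% interval (1.96 times the standard error) without finite-population correction. The data and batch size are hypothetical.

```scala
import scala.util.Random

object OnlineAggregationSketch {

  // One progress report: rows processed so far, running mean, 95% error bar.
  final case class Estimate(rowsSeen: Int, mean: Double, halfWidth: Double)

  // Processes rows batch by batch and reports a refined AVG estimate after each
  // batch. Welford's method keeps the running variance numerically stable.
  // Assumes rows arrive in random order, so every prefix is a random sample.
  def progressiveMean(rows: Seq[Double], batchSize: Int): Vector[Estimate] = {
    require(batchSize > 0, "batchSize must be positive")
    var n = 0
    var mean = 0.0
    var m2 = 0.0 // sum of squared deviations from the running mean
    val reports = Vector.newBuilder[Estimate]
    for (batch <- rows.grouped(batchSize)) {
      for (x <- batch) {
        n += 1
        val delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
      }
      // Normal-approximation 95% interval: 1.96 * standard error of the mean.
      // With a single row the sample variance is undefined, so the bar is infinite.
      val halfWidth =
        if (n < 2) Double.PositiveInfinity
        else 1.96 * math.sqrt(m2 / (n - 1) / n)
      reports += Estimate(n, mean, halfWidth)
    }
    reports.result()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical data, shuffled to simulate random-order processing.
    val data = new Random(42).shuffle((1 to 100000).map(_.toDouble).toVector)
    for (e <- progressiveMean(data, batchSize = 10000))
      println(f"after ${e.rowsSeen}%6d rows: ${e.mean}%.1f +/- ${e.halfWidth}%.1f")
  }
}
```

Because the standard error shrinks roughly as one over the square root of the number of rows seen, the error bar tends to narrow with every report, which is what lets a user stop a query early once the answer is accurate enough.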

Knoldus is organizing a one-hour session on 24th Nov 2015 at 6:00 PM. Mr. Sameer Agarwal from Databricks will give a session on “BlinkDB and G-OLA”. All of you are invited to join this session.

First Floor,
Above UCO Bank,
Near Rajendra Place Metro Station,  New Delhi, India

Please click here for more details.

Posted in Scala

Simplifying Sorting with Spark DataFrames

In our previous blog post, Using Spark DataFrames for Word Count, we saw how easy it has become to code in Spark using DataFrames. DataFrames have also made programming in Spark much more about logic than about technicalities.

So, let's continue our quest to simplify coding in Spark with DataFrames, this time via sorting. Sorting has always been an inseparable part of analytics: whether in e-commerce or the applied sciences, it is a critical task. Spark itself gained fame from the Daytona GraySort challenge, in which it set a new record.

Before DataFrames were introduced in Spark 1.3.0, the usual way to sort a pair RDD was sortByKey(), and it can sort a dataset by its key only. What would one do if sorting had to be done by value? One possible solution is to swap the key-value pairs and then apply sortByKey(), like this:

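val lines = sc.textFile("data.txt")
val rdd = lines
           .flatMap(_.split(" "))   // split each line into words
           .map((_, 1))             // pair every word with a count of 1
           .reduceByKey(_ + _)      // sum the counts per word
val sortedRDD = rdd
           .map(_.swap)             // (word, count) -> (count, word)
           .sortByKey(false)        // sort by count, descending
val data = sortedRDD.take(5)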

In the above code snippet, we want to find the 5 most frequent words in the “data.txt” file, so we swap each (word, count) pair and pass false to sortByKey() to sort in descending order. The code itself shows how cumbersome it is to sort by value with RDDs.

Now, let's see how Spark DataFrames simplify sorting, using the same example.

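import sqlContext.implicits._              // enables toDF() on RDDs
import org.apache.spark.sql.functions._    // provides desc()

val lines = sc.textFile("data.txt").toDF("line")
// explode emits one row per word into a new "word" column
// (DataFrame.explode is deprecated since Spark 2.0)
val df = lines.explode("line","word")((line: String) => line.split(" "))
val sortedDF = df.groupBy("word").count().sort(desc("count"))  // most frequent first
val data = sortedDF.take(5)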

As we can see, there is no need to swap values as we did with the RDD. Since the data is organised in columns, we can sort simply by naming the column to sort on. Also, DataFrames are no longer restricted to key-value pairs.
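For readers who want to check the sort-by-count logic without a cluster, the same top-5 computation can be sketched with plain Scala collections (this is not Spark code). As with the DataFrame, each count keeps a named field and is sorted on it directly, with no swapping. `WordCount` is a hypothetical case class standing in for the `word` and `count` columns, and using the word as a secondary sort key is an addition that makes ties come out in a fixed order.

```scala
object LocalSortSketch {

  // Named fields play the role of the DataFrame's "word" and "count" columns.
  final case class WordCount(word: String, count: Int)

  def topWords(lines: Seq[String], n: Int): Seq[WordCount] =
    lines
      .flatMap(_.split(" "))              // one element per word, like explode
      .filter(_.nonEmpty)                 // drop empty tokens from repeated spaces
      .groupBy(identity)                  // like groupBy("word")
      .map { case (word, ws) => WordCount(word, ws.size) } // like count()
      .toSeq
      .sortBy(wc => (-wc.count, wc.word)) // count descending; word breaks ties
      .take(n)

  def main(args: Array[String]): Unit = {
    val text = Seq("to be or not to be", "that is the question")
    topWords(text, 5).foreach(println)
  }
}
```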

Of course, there are a few shortcomings, such as the imports needed to work with DataFrames and their functions. But overall, coding in Spark with DataFrames was fun.

Posted in apache spark, big data, Spark

Script-less Test Automation – Create Automated Test-Cases Automatically

Test automation is the activity of creating test scripts that can run without human intervention on the same UI that customers and end users would see. These scripts can reduce execution time and cover all aspects of the application.

The first generation of test automation tools provided a macro recording facility that ran on a synchronous API rather than on the UI. The test execution engine would fire a command and then wait for the command to complete. A macro-based tool was good for such short automation steps: by combining a logical sequence of these short macros, your test flow was automated.

The second generation of test automation tools provided full-fledged scripting, or even object-oriented language support, to the automation engineer. Many tool APIs were available to simplify common Windows-based tasks such as launching applications, file operations and generating elementary test reports.

In a quest to achieve accelerated software delivery, organizations are increasingly adopting the Agile development methodology. However, maximizing the benefits of Agile software development requires testing to be performed concurrently with development.

What is script-less testing

Script-less is an approach to building an optimized test automation engine that empowers the testing team to quickly build automated test cases by sequencing ready-made, reusable code assets, to ensure full test coverage.

Script-less means no scripting or programming in the test automation tool's native language. It doesn't mean that no scripting is involved; it means that, while automating test cases, there is no need to program a script for each test case.

Script-less testing reduces the time required to create automated tests by considerably reducing the amount of scripting needed, but it is in no way a substitute for the actual coding of an organization's test automation tools. Script-less testing is a highly flexible, conventional testing framework with minimal code exposure.

A simple definition: script-less test automation is an approach that helps users automate their tests without having to script or code in any programming language. Test automation is a software development process that demands core technical capabilities, and more often than not it is driven by automation experts who might not have enough functional expertise in the application under test.

Script-less test automation provides a very easy-to-use interface:

All the good concepts of test automation, such as data-driven testing, keyword-driven testing and modularity, are wrapped in an easy-to-use software package. Everything a test automation script requires, such as the sequence of keywords for a test, the steps within each keyword, the data per step and the UI object definitions, is available through this interface.

Test automation will continue to deliver what it promises: reduced repetitive testing effort and accelerated regression testing. But going script-less helps you achieve these results much faster than script-based test automation does.

Script-less test automation strives to bridge the gap between functional and technical expertise by allowing functional experts to take the driver's seat in test automation. The script-less experience is achieved by abstracting the technology layer: automated test cases are built with a workflow-driven approach, and a set of keywords running in the background converts the workflow into scripts.
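The keyword layer described above can be sketched as a small interpreter: automation experts implement each keyword once, and a functional tester only writes a table of steps (keyword, UI object, data) that is run once per data row. This is a hypothetical, self-contained Scala illustration, not any particular tool's design: the in-memory `FakeUi` stands in for a real browser or desktop driver, the `${name}` placeholder syntax is an assumption, and the keyword names follow the examples in this post (“Click”, “Enter Text”) plus a hypothetical “Verify Text”.

```scala
import scala.collection.mutable

object KeywordDrivenSketch {

  // Stand-in for the application under test (a real tool would drive a UI).
  final class FakeUi {
    val fields: mutable.Map[String, String] = mutable.Map.empty
    val clicked: mutable.Buffer[String] = mutable.Buffer.empty
  }

  // One row of a test case, as a functional expert would write it.
  final case class Step(keyword: String, target: String, data: String = "")

  // A keyword returns None on success or Some(message) on failure.
  type Keyword = (FakeUi, String, String) => Option[String]

  // Keyword library, implemented once by automation experts and reused everywhere.
  val keywords: Map[String, Keyword] = Map(
    "Enter Text" -> ((ui: FakeUi, target: String, text: String) => {
      ui.fields(target) = text
      None
    }),
    "Click" -> ((ui: FakeUi, target: String, _: String) => {
      ui.clicked += target
      None
    }),
    "Verify Text" -> ((ui: FakeUi, target: String, expected: String) => {
      val actual = ui.fields.getOrElse(target, "")
      if (actual == expected) None
      else Some(s"$target: expected '$expected' but was '$actual'")
    })
  )

  // Data driving: replace ${name} placeholders with values from one data row.
  def bind(value: String, row: Map[String, String]): String =
    row.foldLeft(value) { case (acc, (name, v)) => acc.replace("${" + name + "}", v) }

  // Run the same steps once per data row; each row stops at its first failing step.
  def run(steps: Seq[Step], rows: Seq[Map[String, String]]): Seq[Either[String, Unit]] =
    rows.map { row =>
      val ui = new FakeUi
      val failures = steps.iterator.flatMap { step =>
        keywords.get(step.keyword) match {
          case None          => Some(s"unknown keyword '${step.keyword}'")
          case Some(keyword) => keyword(ui, step.target, bind(step.data, row))
        }
      }
      if (failures.hasNext) Left(failures.next()) else Right(())
    }

  def main(args: Array[String]): Unit = {
    // A "script-less" login test: a table of steps plus data rows, no new code.
    val loginTest = Seq(
      Step("Enter Text", "username", "${user}"),
      Step("Click", "loginButton"),
      Step("Verify Text", "username", "${user}")
    )
    val dataRows = Seq(Map("user" -> "alice"), Map("user" -> "bob"))
    run(loginTest, dataRows).zip(dataRows).foreach { case (result, row) =>
      println(s"$row -> $result")
    }
  }
}
```

Adding a new test here means adding rows to a table, while adding a new kind of action means adding one entry to the keyword library; that split is the division of labour between functional and technical experts described above.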


Benefits of script-less testing 

  1. Requires a one-time effort from automation experts to define the architecture and design the solution, class libraries, custom keywords, etc.
  2. Ability to keep up with increased product workload without an increase in resources.
  3. Faster execution of automated tests.
  4. Flexible keyword-driven/data-driven approach.
  5. Applicable to all areas of automated testing (smoke testing, regression testing, user acceptance testing).
  6. Designed for maintainability.
  7. Increased test coverage, resulting in fewer product escapes.

Some common myths about script-less test automation

  1. Script-less is no different from record and playback – Record and playback is a capability that users can choose to record their tests and play them back to execute them. The recorded scripts contain hard-coded test data inputs, cannot handle dynamic situations and are more error-prone, since they perform no validation or error handling on their own. Recorded scripts do not work in the long run, as they are not maintainable, scalable or reliable. Unlike record and playback, the script-less approach also provides the flexibility to manage dynamic objects and associate multiple data sets.
  2. Script-less means script-free – Script-less automation is an experience made possible by building keywords that are reusable across applications, tools, technologies, platforms, etc. The idea is to have our automation experts build a library of simple yet exhaustive keywords that functional experts can easily use, allowing them to quickly automate tests without any scripting. Keywords can be of different types: user actions such as “Click”, “Select Item” and “Enter Text”, and operations such as arithmetic, file, database and many more.
    As far as a complete automation suite is concerned, it needs to grow organically from within your environment by carefully integrating business and operational logic step by step, until it reaches a point where the people using it need no further scripting.
  3. Script-less test automation is neither maintainable nor reliable – Script-less test automation is a well-structured, methodical yet very flexible approach to handling all the complexities an application under test has to offer. It eliminates the complexities of test automation tools by building a layer on top of them, allowing functional users to automate. A well-designed script-less approach maintains complete traceability of all reusable components and maps dependencies throughout the test automation life cycle. Today there are several well-organized tools that offer script-less automation for use in many real-world test scenarios. The developers of these tools analysed numerous business cases, operational scenarios and deployment environments before building the reusable components for their tools, and the tools offer a high degree of practical reliability as well.

With the evolution of artificial intelligence, big data analytics and high-speed cloud computing, we may in the near future experience script-less testing in the literal sense, but for now it is safe to assume that script-less testing is a highly flexible, conventional testing framework with minimal code exposure to users.


Posted in Scala

MeetUp on “An Overview of Spark DataFrames with Scala”

Knoldus is organizing a one-hour session on 18th Nov 2015 at 6:00 PM. The topic is “An Overview of Spark DataFrames with Scala”. All of you are invited to join this session.

First Floor,
Above UCO Bank,
Near Rajendra Place Metro Station,  New Delhi, India

Please click here for more details.

Posted in apache spark, Spark

Service Virtualization in Testing

Applications are very important to businesses today, yet their development cost and quality remain challenges.
Service Virtualization allows developers, testers and performance teams to work in parallel, for faster delivery and higher application quality and reliability. It simulates the behavior of selected components within a composite application to enable end-to-end testing of the application as a whole.
Service Virtualization is not a substitute for testing the actual source code deployed as a composite application.
Development and testing teams need the system components that the application depends on; if these are not available, we virtualize them, and the teams can then access each virtualized component as if it were the actual component.

Why use Service Virtualization – Traditionally, the testing team had to wait for the application to be completed and deployed before it could begin functional testing, integration testing, performance testing, etc. If a service is not ready for testing, we can create a virtual service for it that behaves like the actual service.

Source: ontestautomation


How to use Virtual Services

  • Configure the consumer to use the virtual service endpoint URL instead of the real service endpoint URL.
  • Use the data set provided by the SV (service virtualization) team to test scenarios.
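The two steps above can be sketched in Scala using only the JDK: a tiny virtual service, built on the JDK's `com.sun.net.httpserver` package, replays a canned data set, and the consumer receives its endpoint URL as configuration, so switching between the virtual and the real service is a configuration change rather than a code change. The `/accounts/{id}` path, the JSON bodies and the `AccountClient` class are hypothetical. Commercial service virtualization tools such as those listed in this post record or model this behavior for you; the hand-written server here only illustrates the idea.

```scala
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import java.net.{InetSocketAddress, URI}
import java.nio.charset.StandardCharsets
import scala.io.Source

object ServiceVirtualizationSketch {

  // Canned data set, as an SV team might provide it (hypothetical values).
  val cannedResponses: Map[String, String] = Map(
    "/accounts/42" -> """{"id":42,"status":"ACTIVE"}""",
    "/accounts/7"  -> """{"id":7,"status":"SUSPENDED"}"""
  )

  // Start a virtual service on a free loopback port that replays canned responses.
  def startVirtualService(): HttpServer = {
    val server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0)
    server.createContext("/", new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = {
        val path = exchange.getRequestURI.getPath
        val (status, body) = cannedResponses.get(path) match {
          case Some(json) => (200, json)
          case None       => (404, """{"error":"not virtualized"}""")
        }
        val bytes = body.getBytes(StandardCharsets.UTF_8)
        exchange.getResponseHeaders.set("Content-Type", "application/json")
        exchange.sendResponseHeaders(status, bytes.length.toLong)
        exchange.getResponseBody.write(bytes)
        exchange.close()
      }
    })
    server.start()
    server
  }

  // The consumer only knows a base URL, injected as configuration.
  // fetchAccount throws for ids missing from the data set (HTTP 404).
  final class AccountClient(baseUrl: String) {
    def fetchAccount(id: Int): String = {
      val source = Source.fromURL(new URI(s"$baseUrl/accounts/$id").toURL, "UTF-8")
      try source.mkString finally source.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val server = startVirtualService()
    try {
      // In a real setup this URL would come from configuration, not be computed here.
      val baseUrl = s"http://127.0.0.1:${server.getAddress.getPort}"
      println(new AccountClient(baseUrl).fetchAccount(42))
    } finally server.stop(0)
  }
}
```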

When to use Virtual Services

  • Component / Services not available.
  • Dependency on Third party services.
  • Component / Services with limited access.

Service Virtualization Tools – Many tools are available on the market for service virtualization:

  • Parasoft Virtualize
  • CA Lisa Service Virtualization.
  • IBM Rational Test Virtualization Server.
  • HP Service Virtualization.

Benefits of Service Virtualization

  • Accelerates time to market.
  • Reduces risk.
  • Lowers costs associated with environments.
  • Avoids manually writing stubs or mocks.
  • Avoids maintaining stubs during Agile development.
  • No conflicts, and access to services at any time.
  • Reduces data setup time.
  • Eliminates delays caused by third-party services and avoids access fees.
  • Easily reconfigured for different testing needs and projects.
  • Can be used for training and knowledge transfer.
  • Makes offline scenarios easy to test.
  • Increases agility and quality.

Thanks !!

Posted in Scala

Set up the dev environment for the Ionic framework

Earlier in this series of Ionic blog posts, we saw why to choose the Ionic framework and what Crosswalk is; now we are going to set up the dev environment for the Ionic framework.

Ionic is an MIT-licensed, front-end, open source framework for creating hybrid mobile apps. Built on top of AngularJS and Apache Cordova, Ionic provides tools, plugins and services for developing hybrid mobile apps using Web technologies like CSS, HTML5, Sass and JavaScript. Apps can be built with these Web technologies and then distributed through native app stores to be installed on devices by using Cordova/PhoneGap. Ionic was created by Max Lynch, Ben Sperry, and Adam Bradley of Drifty Co. in 2013.

This Ionic setup is for the browser only. We will soon provide the environment setup steps for the mobile version.

Setting Up Ionic
Ionic is an npm module and requires Node.js. If you haven’t installed Node.js yet, install it first using the Node installer. Once you have done that, Ionic can be installed using Node’s package manager (npm).

npm install cordova ionic -g

This command installs the Ionic and Apache Cordova modules globally (that is what the -g flag does). Global modules go into npm's global location rather than your project directory: on Windows this is inside your user directory, and on Linux the modules can typically be found under /usr/local. Once Ionic is installed, you can create a project with the command below:

Starting an Ionic app

ionic start [appname] [template]

A starter template is what becomes the www directory within the Cordova project.

The default template names are:

  • tabs (Default)
  • sidemenu
  • maps
  • salesforce
  • tests
  • complex-list
  • blank

I have created an app with the sidemenu template; you can see some screenshots below.
You can find the code on GitHub.

[Screenshots of the sidemenu starter app, 2015-11-02]

Thanks !!

Posted in Scala

Why Ionic over others, and what is Crosswalk?

In our previous blog (Intro to Hybrid Mobile App Development and Ionic Framework), we saw what a hybrid mobile application is and what its architecture looks like, along with an introduction to the Ionic framework. Earlier we were wondering why to use the Ionic framework over others, so in this part of the series we compare them to understand why Ionic, and what differences and improvements make Ionic easy to use.

~Why Ionic?

PhoneGap gives you a blank slate, while Ionic provides UI components for you to use and to customize with the popular CSS extension language, Sass. Before building a hybrid app using PhoneGap or Cordova, you’d be in a state of choice paralysis: which UI framework would you use? Ionic removes that choice for you and gives you a solid foundation, with examples, so you can create your own apps quickly. You can get on with the business logic of your app without the overhead of picking a UI kit.

~What Ionic says:

“We’re using AngularJS and Sass”. It provides the UI components for you to use. It reduces the cognitive overhead to get up and running.

Here we encounter a new topic, Crosswalk. Without going into other details, let’s stick to the point with some FAQs about Crosswalk: what it is, why we use it, what its requirements are, how to use it, etc.

~What is Crosswalk?

  • Crosswalk is an open source project that allows you to specify a version of Chrome to use as your web browser in Android. The compiled app will have your code hosted inside of this Chrome webview.

~Why should I use Crosswalk?

  • Older Android versions (4.0-4.3) use Android’s default browser, which has significantly lower performance and standards compliance than modern Chrome. Using Crosswalk gives you a specific and more performant version of Chrome to use on all Android devices, in order to reduce fluctuations and fragmentation among devices.

~How does Crosswalk improve Cordova Android apps?

  • By designating a specific version of Chrome, you can skip the unexpected behavior from browsers that vary from device to device. Crosswalk also provides improved performance and ease of debugging.

~What can I expect, performance and size-wise?

  • On older Android devices (4.0-4.3), you’ll see about a 10x improvement in HTML/CSS rendering and JavaScript performance, as well as better CSS correctness. Bundling Chrome adds a small (~10-15 MB) increase to the size of your Android app.

~How do I report errors?


~What are the architectures for Android devices, and why do they exist?

  • There are two main architectures for Android devices: x86 and ARM. Both exist because device makers choose different processors, and different processors require the code to be compiled separately. Using Crosswalk, you may specify that you want two separate builds for x86 and ARM to keep each build small. If you make a single build, you have to bundle both versions of Crosswalk (x86/ARM) and end up with a larger build (~50-60 MB).

Now it’s time to get into some code and figure out how everything works, so in the upcoming posts in this series we will start developing our first Ionic Framework app.

Update: Ionic Framework dev setup environment

Thanks !!


Posted in Mobile Development

Introduction to GulpJS

This presentation covers an introduction to GulpJS, its tools, some code snippets, and how to start working with GulpJS.


Watch the video tutorial and enjoy the live coding:

Thank You.

Posted in Scala