This blog is all about big data testing and the scenarios we should keep in mind while performing it. Big data testing is the process of verifying the quality of big data applications; it can cover functional, performance, database, infrastructure, and security testing. Let's start.
What is Big Data?
Big data is the new buzzword in town. It refers to data that is huge in volume and growing exponentially with time. As the volume grows, it becomes difficult to process, handle, and manage with traditional systems. New computing technologies have been created to handle and manage these huge amounts of data, processing them far quicker than traditional systems could. It is therefore important to understand the tools and technologies used to handle big data.

Strategy behind testing Big Data
Testing such a huge amount of data definitely needs a well-planned strategy. The quality assurance team should focus on:
- Batch Data Processing Test – Batch processing deals with accessing and processing large volumes of data at rest, so this test involves running the application against huge datasets and validating the volume of data processed. Some of the tools used are HDFS, HCL Workload Automation, etc. (a minimal batch check is sketched after this list).
- Real-Time Data Processing Test – This deals with the data while the application is in real-time processing mode, as events stream in. Tools such as Spark are used (see the streaming sketch below).
- Interactive Data Processing Test – This covers ad-hoc, user-driven interaction with the data, so the testing process checks the modifying, retrieving, and displaying of information returned as query results. Tools such as HiveQL are used (see the query sketch below).
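To make the batch case concrete, here is a minimal sketch of a batch volume check in PySpark. The HDFS paths, file formats, and the `order_id` column are hypothetical placeholders, not part of any specific pipeline.

```python
# A minimal batch volume check, assuming PySpark is installed and the
# HDFS paths below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("BatchVolumeTest").getOrCreate()

# Hypothetical inputs: raw source data and the batch job's output.
source = spark.read.csv("hdfs:///data/raw/orders.csv", header=True)
output = spark.read.parquet("hdfs:///data/processed/orders")

# Volume check: every source record should survive the batch run.
assert source.count() == output.count(), "Record counts diverge after batch run"

# Spot-check that processing introduced no duplicate keys.
deduped = output.dropDuplicates(["order_id"])
assert output.count() == deduped.count(), "Duplicate order_id values in output"

spark.stop()
```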

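For the real-time case, here is a hedged sketch using Spark Structured Streaming's built-in rate source and memory sink; both are stand-ins for a real event stream and sink, chosen only because they need no external infrastructure.

```python
# A minimal real-time processing check, assuming PySpark; the rate source
# and memory sink stand in for a real stream and sink.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StreamingTest").getOrCreate()

# The rate source emits rows continuously, simulating incoming events.
stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Write into an in-memory table so the test can query what flowed through.
query = (stream.writeStream
         .format("memory")
         .queryName("stream_check")
         .outputMode("append")
         .start())

time.sleep(2)                 # let some events arrive
query.processAllAvailable()   # drain everything generated so far
count = spark.sql("SELECT COUNT(*) AS n FROM stream_check").first()["n"]
assert count > 0, "No events flowed through the streaming pipeline"

query.stop()
spark.stop()
```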

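And for the interactive case, a small sketch of an ad-hoc query check. It uses Spark's SQL interface as a stand-in for HiveQL; the orders table and its rows are invented for the example.

```python
# A minimal interactive query check, assuming PySpark; the orders table
# and its rows are made up for the example.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("InteractiveQueryTest").getOrCreate()

df = spark.createDataFrame(
    [(1, "laptop", 900.0), (2, "mouse", 25.0)],
    ["order_id", "product", "amount"],
)
df.createOrReplaceTempView("orders")

# Retrieval check: the ad-hoc query should return exactly the expected rows.
rows = spark.sql("SELECT product FROM orders WHERE amount > 100").collect()
assert [r["product"] for r in rows] == ["laptop"], "Unexpected query result"

spark.stop()
```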

Big data testing also goes through these three stages:
- Data Ingestion – Data is loaded into the big data system from source systems using extraction tools; the storage might be HDFS or MongoDB. The ingested data is then validated to confirm that what was stored matches the requirements.
- Data Processing – The ingested data is processed as per the requirements, and the processing logic is verified.
- Validation of the Output – The output is generated, validated, and then sent to the data warehouse (a reconciliation sketch follows this list).
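As a concrete illustration of the third stage, here is a hedged reconciliation sketch in PySpark; the paths, the `amount` column, and the warehouse location are all hypothetical placeholders.

```python
# A minimal output reconciliation sketch, assuming PySpark; the paths and
# the amount column are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("OutputValidation").getOrCreate()

processed = spark.read.parquet("hdfs:///data/processed/sales")  # job output
warehouse = spark.read.parquet("hdfs:///warehouse/sales")       # loaded copy

# Row-count reconciliation between the job output and the warehouse load.
assert processed.count() == warehouse.count(), "Row counts differ"

# Aggregate reconciliation: totals must survive the warehouse load intact.
p_total = processed.agg(F.sum("amount")).first()[0]
w_total = warehouse.agg(F.sum("amount")).first()[0]
assert p_total == w_total, f"Sum mismatch: {p_total} vs {w_total}"

spark.stop()
```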
Tools used in Big data
- Data Ingestion – Kafka, Zookeeper, Sqoop, Flume, Storm, Amazon Kinesis (a Kafka smoke test is sketched after this list).
- Data Processing – Hadoop (Map-Reduce), Cascading, Oozie, Hive, Pig.
- Storage – HDFS (Hadoop Distributed File System), Amazon S3, HBase.
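As promised above, here is a minimal ingestion smoke test using the kafka-python client. It assumes a broker at localhost:9092; the topic name and payload are hypothetical.

```python
# A minimal Kafka ingestion smoke test, assuming the kafka-python package
# and a broker at localhost:9092; topic and payload are hypothetical.
from kafka import KafkaProducer, KafkaConsumer

TOPIC = "ingestion-smoke-test"
PAYLOAD = b'{"order_id": 1, "amount": 100}'

# Produce a known message...
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send(TOPIC, value=PAYLOAD)
producer.flush()

# ...then confirm it can be read back, proving the ingestion path is alive.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # give up after 5 s of silence
)
assert PAYLOAD in (m.value for m in consumer), "Message was not ingested"
```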
Benefits of Big data testing
- Cost-effective storage
- Accurate data
- The right data at the right time
- Increased revenue
- Effective decision-making and business strategy



Challenges while testing Big data
- Testing big data is quite complicated; it requires strong skills and deep technical knowledge.
- Monitoring the sheer volume of data is a real challenge.
- No single tool covers the full end-to-end flow, so multiple tools must be combined.
- Special test environments are required for testing huge volumes of data.


