What is Big Data?
Big data is a collection of large datasets that cannot be processed using traditional computing techniques. Testing such datasets involves various tools, techniques, and frameworks. Big data relates to data creation, storage, retrieval, and analysis that is remarkable in terms of volume, variety, and velocity.
Big Data Testing Strategy
Testing a Big Data application is more about verifying its data processing than testing the individual features of the software product. When it comes to Big Data testing, performance and functional testing are key.
In Big Data testing, QA engineers verify the successful processing of terabytes of data using a commodity cluster and other supportive components. It demands a high level of testing skill because the processing is very fast. Processing may be of three types:
- Batch
- Real Time
- Interactive

Along with this, data quality is also an important factor in Big Data testing. Before testing the application, it is necessary to check the quality of the data, and this check should be considered part of database testing. It involves verifying various characteristics such as conformity, accuracy, duplication, consistency, validity, and data completeness.
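The quality checks above can be sketched as simple assertions over the data. This is a minimal illustration; the record layout ("id", "email", "age") and thresholds are hypothetical, not part of any specific tool:

```python
import re

# Minimal sketch of pre-test data quality checks.
# The record layout ("id", "email", "age") is hypothetical, for illustration only.
records = [
    {"id": 1, "email": "a@example.com", "age": "34"},
    {"id": 2, "email": "b@example.com", "age": ""},    # incomplete: missing age
    {"id": 2, "email": "c@example.com", "age": "41"},  # duplicated id
    {"id": 3, "email": "not-an-email", "age": "29"},   # invalid email format
]

def check_quality(records):
    """Count violations for a few of the quality dimensions listed above."""
    ids = [r["id"] for r in records]
    return {
        # duplication: the same key appearing more than once
        "duplicates": len(ids) - len(set(ids)),
        # completeness: mandatory fields must be non-empty
        "incomplete": sum(1 for r in records if not r["age"]),
        # validity/conformity: values must match the expected format
        "invalid_email": sum(
            1 for r in records
            if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r["email"])
        ),
    }

print(check_quality(records))
```

In practice these checks would run against a sample drawn from the staged data before the processing job itself is tested.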
Tools used in Big Data Scenarios
- Databases: CouchDB, MongoDB, Cassandra, Redis, ZooKeeper, HBase
- Hadoop, Hive, Pig, Cascading, Oozie, Kafka, S4, MapR, Flume
- S3, HDFS ( Hadoop Distributed File System)
- Elastic, Heroku, Google App Engine, EC2
- R, Yahoo! Pipes, Mechanical Turk, BigSheets, Datameer
Challenges in Big Data Testing
- Automation: Automation testing for Big Data requires someone with technical expertise. Also, automated tools are not equipped to handle unexpected problems that arise during testing.
- Virtualization: It is one of the integral phases of testing. Virtual machine latency creates timing problems in real-time Big Data testing. Also, managing images in Big Data is a hassle.
- Large Dataset
- Need to verify more data and need to do it faster
- Need to automate the testing effort
- Need to be able to test across different platforms
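One common way to verify more data faster is to automate a record-count reconciliation between the source systems and the loaded datasets. The dataset names and counts below are illustrative, not from any real cluster:

```python
# Hedged sketch: automated reconciliation of source vs. loaded record counts.
# All dataset names and counts are hypothetical examples.
source_counts = {"orders": 1_000_000, "clicks": 5_400_000, "users": 120_000}
loaded_counts = {"orders": 1_000_000, "clicks": 5_399_200, "users": 120_000}

def reconcile(source, loaded):
    """Return datasets whose loaded count differs from the source count."""
    return {
        name: (source[name], loaded.get(name, 0))
        for name in source
        if source[name] != loaded.get(name, 0)
    }

mismatches = reconcile(source_counts, loaded_counts)
print(mismatches)
```

A check like this can run unattended after every load, flagging only the datasets that need manual investigation.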
- As data engineering and data analytics advance to the next level, Big Data testing is inevitable.
- Big data processing could be Batch, Real-Time, or Interactive
- The 3 stages of testing Big Data applications are:
- Data staging validation
- “MapReduce” validation
- Output validation phase
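One way to approach the "MapReduce" validation stage is to recompute the same aggregation locally on a small sample and compare it against the job's output. The word-count job and the stubbed cluster result below are illustrative assumptions:

```python
from collections import Counter

# Hedged sketch: validating MapReduce output by recomputing the aggregation
# locally on a sample. The input lines and cluster result are stand-ins.
lines = ["big data testing", "testing big data applications"]

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the line
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    # Reduce: sum the emitted counts per key
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

pairs = [p for line in lines for p in map_phase(line)]
local_result = reduce_phase(pairs)

# In a real test this would be read from the job under test; stubbed here.
cluster_result = {"big": 2, "data": 2, "testing": 2, "applications": 1}
assert local_result == cluster_result
```

Running the same logic on a trusted reference implementation catches aggregation bugs that per-record checks in the staging and output phases would miss.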
- Architecture testing is an important phase of Big Data testing, as a poorly designed system may lead to unprecedented errors and performance degradation.
- Performance testing for Big data includes verifying
- Data throughput
- Data processing
- Sub-component performance
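Data throughput can be measured by timing a processing step and dividing the record count by the elapsed time. The workload below is a trivial stand-in for a real transformation:

```python
import time

# Hedged sketch: measuring data throughput of a processing step.
# `process` is a placeholder for the real transformation under test.
def process(chunk):
    return sum(chunk)

# 500 chunks x 1,000 records = 500,000 records of synthetic input
data = [list(range(1000)) for _ in range(500)]

start = time.perf_counter()
for chunk in data:
    process(chunk)
elapsed = time.perf_counter() - start

records = sum(len(chunk) for chunk in data)
throughput = records / elapsed  # records processed per second
print(f"{throughput:.0f} records/s")
```

In a real performance test the same measurement would be taken per sub-component (ingest, transform, write) to locate the bottleneck.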