Big Data Storage

BigQuery: Efficient Data Warehouse Schema Design

Reading Time: 3 minutes Conventional data warehouses support data models based on star schema and snowflake schema. In these models, there are a number of fact tables and dimension tables. To minimize redundancy, it is recommended to split data into multiple tables. This is a normalization process. Normalization is the technique of eliminating redundant data. It minimizes insertion, deletion, and update anomalies. It saves the disk Continue Reading
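
To make the normalization step in the excerpt above concrete, here is a minimal sketch in plain Python that splits a flat, redundant set of rows into one dimension table and one fact table. The rows, column names, and the normalize helper are all hypothetical and chosen only for illustration; they do not come from the original post.

```python
# Hypothetical denormalized rows: customer details repeat on every order.
flat_rows = [
    {"order_id": 1, "customer": "Asha", "city": "Pune", "amount": 120.0},
    {"order_id": 2, "customer": "Ben", "city": "Oslo", "amount": 75.5},
    {"order_id": 3, "customer": "Asha", "city": "Pune", "amount": 42.0},
]


def normalize(rows):
    """Split flat rows into a customer dimension and an orders fact table."""
    customer_ids = {}  # (customer, city) -> surrogate key
    dim_customer = []
    fact_orders = []
    for row in rows:
        key = (row["customer"], row["city"])
        if key not in customer_ids:
            # First time we see this customer: assign the next surrogate key.
            customer_ids[key] = len(customer_ids) + 1
            dim_customer.append(
                {"customer_id": customer_ids[key],
                 "customer": row["customer"],
                 "city": row["city"]}
            )
        # The fact row keeps only the key, not the repeated attributes.
        fact_orders.append(
            {"order_id": row["order_id"],
             "customer_id": customer_ids[key],
             "amount": row["amount"]}
        )
    return dim_customer, fact_orders


dim_customer, fact_orders = normalize(flat_rows)
```

Each customer now appears once in dim_customer, and every row in fact_orders refers to it through customer_id, so a change of city touches a single row rather than every order.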

Modernizing Data Storage for fuelling Digital Transformation

Reading Time: 5 minutes As companies mature in their digital transformation journey, old technologies and rules of doing business are being redefined. Capturing customers is no longer enough and companies are focusing on how to keep them engaged with hyper-personalized experiences. There’s an explosion of data sources as everyone and everything is connected with mobile devices, social media, and IoT. What this means for a business is an exponential Continue Reading

Apache Spark: Read Data from S3 Bucket

Reading Time: < 1 minute Amazon S3: accessing an S3 bucket through Spark. Edit the spark-defaults.conf file and add the three lines below, which set your S3 access key, secret key, and file system implementation. Replace the placeholders s3keys and yourkey with your own credentials, written without quotation marks:

spark.hadoop.fs.s3a.access.key s3keys
spark.hadoop.fs.s3a.secret.key yourkey
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
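
As a rough illustration of the same three settings, the sketch below builds them in Python and renders them in the "key value" form used by Spark's properties file. The helper names s3a_settings and render_conf are hypothetical, and reading the credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables is an assumption of this sketch, not something the original post prescribes.

```python
import os


def s3a_settings(access_key, secret_key):
    """Return the three S3A properties named in the post as a dict."""
    return {
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


def render_conf(settings):
    """Render the settings as 'key value' lines, one property per line."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


# Assumption: credentials come from the environment; the fallbacks are the
# placeholder strings used in the post, not real keys.
settings = s3a_settings(
    os.environ.get("AWS_ACCESS_KEY_ID", "s3keys"),
    os.environ.get("AWS_SECRET_ACCESS_KEY", "yourkey"),
)
conf_text = render_conf(settings)
```

If you would rather not edit the file, the same key/value pairs can be passed one at a time to SparkSession.builder.config(key, value) in PySpark. Either way, the S3AFileSystem class is provided by the hadoop-aws module, so that library needs to be on Spark's classpath.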

Big Data Landscape explained

Reading Time: 5 minutes Big Data has now evolved into a buzzword, and it seems everyone is either working on it or wants to work on it. However, most people associate Big Data with popular tools such as Hadoop, Spark, and Hive, and with NoSQL databases like Cassandra and HBase. HDFS made Big Data popular as it gave us an option to distribute the data Continue Reading