Author: Ayush Tiwari

Blue Pill Red Pill The Matrix of Thousands of Data Streams

Introduction to Databricks Delta

Reading Time: 2 minutes. A component of the Databricks Unified Analytics Platform, Databricks Delta is an analytics engine that provides a powerful transactional storage layer built on Apache Spark. It helps users build robust production data pipelines at scale, giving end users a consistent view of the data. Advantages of Databricks Delta: Databricks Delta offers multiple benefits, including the following. Query performance: Delta uses the following techniques to provide 10x to 100x faster query performance on Parquet than …

Databricks jobs

Reading Time: 2 minutes. Jobs: A job is a way to run non-interactive code in a Databricks cluster. For example, you can run an extract, transform, and load (ETL) workload interactively or on a schedule. You can also run jobs interactively in the notebook UI. Your job can consist of a single task or can be a large, multi-task workflow with complex dependencies. Databricks manages the task orchestration, cluster management, …

Introduction to GitLab CI/CD

Reading Time: 3 minutes. To use GitLab CI/CD: (1) Ensure you have runners available to run your jobs. If you don’t have a runner, install GitLab Runner and register a runner for your instance, project, or group. (2) Create a .gitlab-ci.yml file at the root of your repository. This file is where you define your CI/CD jobs. In GitLab, runners are agents that run your CI/CD jobs. You might already have runners available for your project, …

Understanding the Apache Spark Streaming

Reading Time: 2 minutes. The Spark Streaming module is a stream-processing module within Apache Spark. It uses the Spark cluster to scale to a high degree. Being based on Spark, it is also highly fault-tolerant: it can rerun failed tasks by checkpointing the data stream that is being processed. Four Major Aspects of Spark Streaming: fast recovery from failures and stragglers; better …

Spark Session

Understanding Spark Application Concepts

Reading Time: 3 minutes. Once you have downloaded Spark, set up the Spark shell, and run some short code examples, you should become familiar with some of the critical concepts of a Spark application in order to understand what is happening behind your sample code. Some important terms are: Application: A user program built on Spark using its APIs. It consists of a driver program and executors on the …

volatile

Difference Between Synchronized and Volatile in Java

Reading Time: 3 minutes. Even though synchronized and volatile both help to avoid multi-threading issues, they are completely different from each other. Before looking at the difference between them, let’s understand what synchronized and volatile provide in Java. Synchronization in Java: We all know that Java is a multi-threaded language in which multiple threads execute in parallel to complete program execution, so in this multi-threaded environment synchronization of Java …
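As a rough illustration of the distinction this excerpt introduces, the sketch below contrasts the two keywords. It is not taken from the original post: the class `SyncVolatileDemo` and all of its method names are hypothetical. The assumption is the usual one, that `synchronized` provides mutual exclusion (plus visibility) for a compound update such as `count++`, while `volatile` only guarantees that a write by one thread becomes visible to others, which is sufficient for a simple stop flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are illustrative, not from the original post.
class SyncVolatileDemo {
    // volatile: writes are visible to other threads, but updates are not atomic.
    private volatile boolean running = true;

    // Guarded by this instance's intrinsic lock via the synchronized methods below.
    private int count = 0;

    // synchronized: only one thread at a time can execute the read-modify-write.
    synchronized void increment() {
        count++;
    }

    synchronized int getCount() {
        return count;
    }

    void stop() {
        running = false;
    }

    boolean isRunning() {
        return running;
    }

    // Starts `threads` workers that each call increment() `perThread` times,
    // waits for all of them, and returns the final count.
    static int runSynchronized(int threads, int perThread) {
        SyncVolatileDemo demo = new SyncVolatileDemo();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    demo.increment();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return demo.getCount();
    }

    // Uses the volatile flag to ask a spinning worker to stop.
    // Returns true if the worker terminated within the timeout.
    static boolean stopWithVolatileFlag() {
        SyncVolatileDemo demo = new SyncVolatileDemo();
        Thread worker = new Thread(() -> {
            while (demo.isRunning()) {
                Thread.yield(); // spin until the flag change becomes visible
            }
        });
        worker.start();
        demo.stop();
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(runSynchronized(4, 10_000));
        System.out.println(stopWithVolatileFlag());
    }
}
```

Marking `count` as `volatile` instead of synchronizing `increment()` would not make the counter safe, because `count++` is a read followed by a write and two threads can interleave between those steps.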