Author: Ayush Tiwari

Databricks Job API

Reading Time: 4 minutes
The Databricks Jobs API follows the guiding principles of the REST (Representational State Transfer) architecture. You can authenticate to the Databricks REST API with either a Databricks personal access token or a password. Version 2.1 of the Jobs API supports jobs with multiple tasks. The Jobs API operations are described below. Creating a New Job: users can send a request to the server to create a new job. This endpoint uses the HTTP POST method, with a request body whose schema is as follows: Schema | Data Type | Description; name | String | The…
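A minimal sketch in Java 11+ (for `java.net.http`) of what a "create job" call could look like. It only builds the `HttpRequest` and does not send it. The workspace URL, token, cluster ID, notebook paths, and job/task names are hypothetical placeholders. The endpoint path `/api/2.1/jobs/create`, the Bearer-token header, and the fields `name`, `tasks`, `task_key`, `depends_on`, `existing_cluster_id`, and `notebook_task` follow the public Jobs API 2.1 reference, so check them against your workspace's documentation before relying on them.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) a Jobs API 2.1 "create" request.
// All names, IDs and paths below are hypothetical placeholders.
class JobsCreateRequest {

    // Two tasks: "transform" lists "ingest" in depends_on, so it waits for "ingest".
    static String createJobBody() {
        return "{"
            + "\"name\": \"example-etl-job\","
            + "\"tasks\": ["
            + "  {\"task_key\": \"ingest\","
            + "   \"existing_cluster_id\": \"CLUSTER_ID\","
            + "   \"notebook_task\": {\"notebook_path\": \"/Shared/ingest\"}},"
            + "  {\"task_key\": \"transform\","
            + "   \"depends_on\": [{\"task_key\": \"ingest\"}],"
            + "   \"existing_cluster_id\": \"CLUSTER_ID\","
            + "   \"notebook_task\": {\"notebook_path\": \"/Shared/transform\"}}"
            + "]}";
    }

    // POST to <workspace>/api/2.1/jobs/create, authenticating with a personal access token.
    static HttpRequest build(String workspaceUrl, String token) {
        return HttpRequest.newBuilder()
            .uri(URI.create(workspaceUrl + "/api/2.1/jobs/create"))
            .header("Authorization", "Bearer " + token)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(createJobBody()))
            .build();
    }
}
```

To actually submit it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. On success, the response body is a JSON object containing the new `job_id`.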

Introduction to Tagging in Git

Reading Time: 2 minutes
Like most version control systems, Git can tag specific points in a repository's history as important. This feature is generally used to mark release points (v1.0, v2.0, and so on). In this blog, you will learn about Git tagging: how to list existing tags, how to create and delete tags, and the different types of tags.
Listing tags
Creating tags
Git supports two types of tags: lightweight tags and annotated tags. Both let you reference a specific commit in your repository, but they differ in the amount of metadata they store.
Annotated tags
Annotated tags are stored as full objects in the Git database and carry additional metadata such as the tagger name, release notes, tag message, and date.
git tag -a rel-5.2.1 -m "first tag of 5.2 release"
The -m option specifies the tag message to be stored with the tag. If you don't specify a message for an annotated tag, Git launches your editor so you can enter it. If you run the git show command, you can see all the tag-related data.
Lightweight tags
Lightweight tags are the simplest way to add tags to a Git repository, as they store only the hash of the commit they refer to. They are created without the -a, -s, or -m options and contain no additional information. You can create a new lightweight tag by running the following command…

Introduction to Databricks Delta

Reading Time: 2 minutes
A component of the Databricks Unified Analytics Platform, Databricks Delta is an analytics engine that provides a powerful transactional storage layer built on Apache Spark. It helps users build robust production data pipelines at scale and gives end users a consistent view of the data. Advantages of Databricks Delta: using Databricks Delta has several benefits, described below. Query performance: Delta uses the following techniques to provide 10x to 100x faster query performance on Parquet than…

Databricks jobs

Reading Time: 2 minutes
Jobs: A job is a way to run non-interactive code in a Databricks cluster. For example, you can run an extract, transform, and load (ETL) workload interactively or on a schedule. You can also run jobs interactively in the notebook UI. A job can consist of a single task or be a large, multi-task workflow with complex dependencies. Databricks manages the task orchestration, cluster management,…

Introduction to GitLab CI/CD

Reading Time: 3 minutes
To use GitLab CI/CD: Ensure you have runners available to run your jobs. If you don't have a runner, install GitLab Runner and register a runner for your instance, project, or group. Create a .gitlab-ci.yml file at the root of your repository. This file is where you define your CI/CD jobs. In GitLab, runners are agents that run your CI/CD jobs. You might already have runners available for your project,…

Understanding the Apache Spark Streaming

Reading Time: 2 minutes
The Spark Streaming module is a stream-processing module within Apache Spark. It runs on the Spark cluster, which lets it scale to a high degree. Because it is built on Spark, it is also highly fault-tolerant: it can rerun failed tasks by checkpointing the data stream being processed. Four Major Aspects of Spark Streaming: Fast recovery from failures and stragglers; Better…

Understanding Spark Application Concepts

Reading Time: 3 minutes
Once you have downloaded Spark, set up the Spark shell, and run a few short code examples, you should become familiar with some of the critical concepts of a Spark application to understand what happens behind your sample code. Some important terms are: Application: a user program built on Spark using its APIs. It consists of a driver program and executors on the…

Difference Between Synchronized and Volatile in Java

Reading Time: 3 minutes
Even though synchronized and volatile both help avoid multithreading issues, they are quite different from each other. Before looking at the differences between them, let's understand what the synchronized keyword and volatile variables provide in Java. Synchronization in Java: Java is a multithreaded language in which multiple threads execute in parallel to complete program execution, so in this multithreaded environment synchronization of Java…
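A minimal sketch (Java 8+) of the distinction, using a hypothetical class `SyncVsVolatile` that is not taken from the post. The `volatile` flag only guarantees that one thread's write becomes visible to other threads. The `synchronized` methods additionally make the read-modify-write `count++` atomic by holding the object's monitor lock.

```java
// Hypothetical example class illustrating volatile (visibility) vs synchronized (mutual exclusion).
class SyncVsVolatile {

    // volatile: a write by one thread is visible to other threads,
    // but count++ on a volatile field would still NOT be atomic.
    private volatile boolean running = true;

    // Guarded by the monitor of "this": only accessed inside synchronized methods.
    private int count = 0;

    synchronized void increment() {
        count++; // read, add, write: made atomic by the lock
    }

    synchronized int getCount() {
        return count;
    }

    void stop() {
        running = false; // visible to the spinning worker because the field is volatile
    }

    boolean isRunning() {
        return running;
    }

    // Starts `threads` workers that each call increment() `perThread` times.
    static int countConcurrently(int threads, int perThread) {
        SyncVsVolatile s = new SyncVsVolatile();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    s.increment();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.getCount();
    }

    // A worker spins on the volatile flag and exits once stop() is called.
    static boolean stopsWorker() {
        SyncVsVolatile s = new SyncVsVolatile();
        Thread worker = new Thread(() -> {
            while (s.isRunning()) {
                Thread.yield(); // busy-wait until the flag flips
            }
        });
        worker.start();
        s.stop();
        try {
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }
}
```

If `increment()` were not synchronized, `countConcurrently(4, 10_000)` could return less than 40,000, because concurrent `count++` operations can interleave and lose updates. Declaring `count` volatile would not fix that, since volatile provides visibility but not atomicity.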