Smart Searching Through Trillions of Research Papers with Apache Spark ML

Great start with filter, map, flatMap, and for comprehension

Reading Time: 2 minutes Scala has a very rich collections library. Collections are containers that hold a linear set of values, and we apply operations like filter, map, flatMap, and for comprehensions to a collection to transform it into a new collection. filter Selects all elements of the collection that satisfy a predicate. Params: p – the predicate used to test elements Returns: A new collection consisting of Continue Reading
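The post above is about Scala, but the same transformations exist in most languages. As a minimal sketch, here are analogues of filter, map, and flatMap using Python comprehensions (the data is illustrative):

```python
nums = [1, 2, 3, 4, 5]

evens = [n for n in nums if n % 2 == 0]      # filter: keep elements satisfying a predicate
doubled = [n * 2 for n in nums]              # map: transform each element
pairs = [x for n in nums for x in (n, -n)]   # flatMap: map each element to several, then flatten

print(evens)    # [2, 4]
print(doubled)  # [2, 4, 6, 8, 10]
print(pairs)    # [1, -1, 2, -2, 3, -3, 4, -4, 5, -5]
```

In Scala these would be `nums.filter(_ % 2 == 0)`, `nums.map(_ * 2)`, and `nums.flatMap(n => Seq(n, -n))`; a for comprehension desugars into combinations of the three.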


Introduction to the Spring Data JDBC

Reading Time: 4 minutes Introduction Spring Data JDBC, part of the larger Spring Data family, makes it easy to implement JDBC-based repositories. It is a persistence framework that is not as complex as Spring Data JPA. It doesn’t provide caching, lazy loading, write-behind, or many other features of JPA. Nevertheless, it has its own ORM and provides most of the features we’re used to with Spring Data JPA, like mapped Continue Reading

How to Create Cronjobs with Kubernetes Client Python?

Reading Time: 3 minutes Hello Readers! In this blog, we will see how to create CronJobs with the Kubernetes Python client. Generally, we use kubectl commands for creating, listing, updating, and deleting Kubernetes resources, but in this blog, we will see how we can use Python to do these things with resources. Installation: From source: From PyPI directly: Now we have the python kubernetes package installed. Continue Reading
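As a sketch of the idea, a CronJob can be described as a plain-dict manifest and submitted through the Python client; the submission call is shown only in a comment since it needs a live cluster, and the name, image, and schedule here are illustrative:

```python
# Manifest for a CronJob that runs a container every minute.
cron_job = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "hello-cron"},          # illustrative name
    "spec": {
        "schedule": "*/1 * * * *",               # standard cron syntax: every minute
        "jobTemplate": {
            "spec": {
                "template": {
                    "spec": {
                        "containers": [{
                            "name": "hello",
                            "image": "busybox",
                            "command": ["echo", "Hello from the CronJob"],
                        }],
                        "restartPolicy": "OnFailure",
                    }
                }
            }
        },
    },
}

# With a configured cluster you would submit it roughly like this:
# from kubernetes import client, config
# config.load_kube_config()
# client.BatchV1Api().create_namespaced_cron_job(namespace="default", body=cron_job)
```

The dict mirrors the YAML you would otherwise pass to kubectl; the Python client accepts either these plain dicts or its typed `V1CronJob` model objects.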


How to create Statefulsets workloads Using Kubernetes Python Client

Reading Time: 3 minutes Hello Readers! In this blog, we are going to see how we can create a StatefulSet using the Kubernetes Python client. Kubectl is the primary tool for dealing with and managing your clusters from the command line. Through kubectl you can see the status of individual nodes, the pods on those nodes, policies, and much more besides. But in this blog we will see how we can use Continue Reading
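Along the same lines, a minimal StatefulSet manifest can be built as a plain dict; what distinguishes it from a Deployment is the `serviceName` field (a headless Service that gives each pod a stable network identity). Names and the image are illustrative, and the cluster call is only commented:

```python
# Manifest for a minimal two-replica StatefulSet.
stateful_set = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "web"},                 # illustrative name
    "spec": {
        "serviceName": "web",                    # headless Service for stable pod identities
        "replicas": 2,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "containers": [{"name": "nginx", "image": "nginx"}],
            },
        },
    },
}

# With a configured cluster you would submit it roughly like this:
# from kubernetes import client, config
# config.load_kube_config()
# client.AppsV1Api().create_namespaced_stateful_set(namespace="default", body=stateful_set)
```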

Apache Spark’s Developer-Friendly Structured APIs: Dataframe and Datasets

Reading Time: 3 minutes This is the second part of the blog series on Spark’s structured APIs: DataFrames and Datasets. In the first part we covered DataFrames, and I recommend you read that blog first if you are new to Spark. In this blog we’ll cover the Spark Datasets API, so let’s get started. The Datasets API Datasets are also the combination of two characteristics: typed and untyped Continue Reading

Lasso And Ridge Regression

Reading Time: 4 minutes In this blog, we will learn about the lasso and ridge regression techniques. We will compare and analyze the methods in detail. Introducing Linear Models Linear regression is a type of linear model and the most basic and commonly used predictive algorithm. This popularity cannot be dissociated from its simple yet effective architecture. A linear model assumes a linear relationship between input Continue Reading
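The core difference between the two techniques can be shown in closed form for a single feature: ridge adds the penalty to the denominator and shrinks the coefficient smoothly, while lasso soft-thresholds it and can drive it exactly to zero. A minimal pure-Python sketch on illustrative data:

```python
# One-feature ridge vs. lasso, closed form (illustrative data: y = 2x exactly).
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

sxx = sum(xi * xi for xi in x)                 # sum of x_i^2
sxy = sum(xi * yi for xi, yi in zip(x, y))     # sum of x_i * y_i

lam = 3.0                                      # regularization strength

w_ols = sxy / sxx                              # ordinary least squares slope (here: 2.0)
w_ridge = sxy / (sxx + lam)                    # ridge: shrinks toward 0, never reaches it
# lasso: soft-thresholding of the numerator; large enough lam zeroes the coefficient
w_lasso = (max(sxy - lam / 2, 0.0) / sxx) if sxy > 0 else (min(sxy + lam / 2, 0.0) / sxx)

print(w_ols, w_ridge, w_lasso)
```

Both penalized slopes come out smaller than the OLS slope, and raising `lam` far enough makes `w_lasso` exactly 0.0 while `w_ridge` only approaches it, which is why lasso is used for feature selection.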

Dataframe and Datasets: Apache Spark’s Developer-Friendly Structured APIs

Reading Time: 4 minutes This is a two-part blog series in which we first cover the DataFrame API and, in the second part, Datasets. Spark 2.x structured Spark by introducing two concepts: one to express computation using common patterns found in data analysis, such as filtering, selecting, counting, aggregating, and grouping; and the second to order and structure your data in a Continue Reading

Different Types of JOIN in Spark SQL

Reading Time: 3 minutes A join in Spark SQL is the functionality to join two or more datasets, similar to a table join in SQL-based databases. Spark represents datasets and DataFrames in tabular form. Spark SQL supports several types of joins: inner join, cross join, left outer join, right outer join, full outer join, left semi join, and left anti join. Joins scenarios Continue Reading
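To make the join types concrete, here is a pure-Python sketch of their semantics on two tiny keyed record sets; this is not the Spark API (in PySpark each would be `df1.join(df2, "key", how)` with `how` set to `"inner"`, `"left_outer"`, `"left_semi"`, or `"left_anti"`):

```python
left = [("a", 1), ("b", 2), ("c", 3)]
right = [("a", "x"), ("b", "y"), ("d", "z")]
rmap = dict(right)  # key -> right-side value

inner = [(k, v, rmap[k]) for k, v in left if k in rmap]        # rows with a match on both sides
left_outer = [(k, v, rmap.get(k)) for k, v in left]            # all left rows; None where no match
left_semi = [(k, v) for k, v in left if k in rmap]             # left rows that have a match (left columns only)
left_anti = [(k, v) for k, v in left if k not in rmap]         # left rows with no match

print(inner)       # [('a', 1, 'x'), ('b', 2, 'y')]
print(left_outer)  # [('a', 1, 'x'), ('b', 2, 'y'), ('c', 3, None)]
print(left_semi)   # [('a', 1), ('b', 2)]
print(left_anti)   # [('c', 3)]
```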

Apache Beam: Pipeline Fundamentals

Reading Time: 3 minutes An introduction to pipeline fundamentals. What is Apache Beam Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream processing. The Apache Beam programming model simplifies the mechanics of large-scale data processing. What is Beam Pipeline A Beam pipeline is a graph of all the data and computations in your data processing task. This Continue Reading
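The "graph of data and computations" idea can be sketched without Beam itself: each transform consumes the previous stage's output, mirroring Beam's PCollection-in, PCollection-out chain (real Beam code would use `beam.Pipeline` with the `|` operator and transforms like `beam.Create` and `beam.FlatMap`; this pure-Python analogue is only illustrative):

```python
from collections import Counter
from functools import reduce

def run_pipeline(source, *transforms):
    # Apply each transform to the previous stage's output,
    # like chaining PTransforms over PCollections in Beam.
    return reduce(lambda data, transform: transform(data), transforms, source)

word_counts = run_pipeline(
    ["to be or not to be"],
    lambda rows: [w for row in rows for w in row.split()],  # flat-map lines into words
    lambda words: Counter(words),                           # group and combine per key
)
print(word_counts)  # Counter({'to': 2, 'be': 2, 'or': 1, 'not': 1})
```

What Beam adds on top of this shape is the ability to execute the same graph on different runners, in batch or streaming mode, at scale.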

Databricks jobs

Reading Time: 2 minutes Jobs A job is a way to run non-interactive code in a Databricks cluster. For example, you can run an extract, transform, and load (ETL) workload interactively or on a schedule. You can also run jobs interactively in the notebook UI. Your job can consist of a single task or can be a large, multi-task workflow with complex dependencies. Databricks manages the task orchestration, cluster management, Continue Reading
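A multi-task workflow with dependencies is expressed as a list of tasks, each naming the tasks it depends on. Below is a sketch of such a definition in the shape accepted by the Databricks Jobs API 2.1; the job name, notebook paths, and schedule are illustrative, and submission via the REST API or CLI is not shown:

```python
# A three-task ETL job: transform runs after extract, load runs after transform.
job = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "extract",
            "notebook_task": {"notebook_path": "/Jobs/extract"},
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "extract"}],     # orchestration handled by Databricks
            "notebook_task": {"notebook_path": "/Jobs/transform"},
        },
        {
            "task_key": "load",
            "depends_on": [{"task_key": "transform"}],
            "notebook_task": {"notebook_path": "/Jobs/load"},
        },
    ],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}
```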

Introduction to GitLab CI/CD

Reading Time: 3 minutes To use GitLab CI/CD: Ensure you have runners available to run your jobs. Install GitLab Runner and register a runner for your instance, project, or group if you don’t have one. Create a .gitlab-ci.yml file at the root of your repository. This file is where you define your CI/CD jobs. In GitLab, runners are agents that run your CI/CD jobs. You might already have runners available for your project, Continue Reading
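A minimal `.gitlab-ci.yml` following the steps above might look like this (stage and job names are illustrative; the `script` lines are placeholders for your real build and test commands):

```yaml
# .gitlab-ci.yml at the repository root
stages:
  - build
  - test

build-job:
  stage: build
  script:
    - echo "Compiling the code..."

test-job:
  stage: test
  script:
    - echo "Running tests..."
```

Jobs in the same stage run in parallel on available runners; the `test` stage starts only after every `build` job succeeds.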

Setting up the Play Framework

Reading Time: 2 minutes What is Play? Play is a high-productivity Java and Scala web application framework that integrates the components and APIs you need for modern web application development. It is a web framework whose HTTP interface is simple and powerful. Play Requirements: To function correctly, a Play application only needs to include the Play JAR files. Because these JAR files are published to the Maven Repository, Continue Reading

Open Systems Interconnection (OSI) Model

Reading Time: 4 minutes Open Systems Interconnection Model Introduction : The Open Systems Interconnection Model is a way to isolate communication problems between two remote computers. The abstract model has several layers, and each layer has certain functions that are performed by the services of the respective layer. Layers in the Open Systems Interconnection Model : The process of communication between two endpoints in a network can Continue Reading