Cloud

KnolSnow: Load continuous data into Snowflake using Snowpipe

Reading Time: 5 minutes

In this blog, we will discuss loading streaming data into a Snowflake table using Snowpipe. Before that, if you haven't read the previous part of this series, i.e., Loading Bulk Data into Snowflake, I suggest you go through it first. Now that we're set, let's get started and see what Snowpipe is all about.

Introduction

Snowpipe is a mechanism provided by …

KnolSnow: Loading Data Into Snowflake

Reading Time: 5 minutes

This blog covers loading data into Snowflake, and I will explain the various steps involved in the process. So let's get started. Before moving ahead, you can visit the blog on the basics of the Snowflake Data Warehouse if you want to refresh your concepts. Now let's talk about the actual topic that brought you to this blog. To …

Migrating MLFlow Server To Cloud: Part 2

Reading Time: 4 minutes

In my previous blog, I discussed the first two phases of migrating an MLflow server to the cloud. In this blog, I'll discuss deploying the MLflow tracking server on Google Cloud Platform and migrating the existing data to it. I'll also talk about optimizing the overall environment along the way.

Deployment

Step 1: Copy Contents from Disk

Go to this link …

Migrating MLFlow Server To Cloud: Part 1

Reading Time: 4 minutes

The cloud migration process involves moving all or part of an organization's data, apps, and services from on-premises data centres to a public or private cloud, where they are accessible on demand over the Internet to authorized users. For most businesses considering cloud migration, the move is full of promise and potential: scalability, flexibility, reliability, cost-effectiveness, improved performance and disaster recovery, and simpler, faster deployment. Cloud …

Migration Assessment

Reading Time: 7 minutes

The first step in migration is to calculate the cost of the move and the cost of what you are currently running. This is useful whether you're planning a migration from an on-premises environment, a private hosting environment, or another cloud provider, or you're evaluating the opportunity to migrate and exploring what the assessment phase might look like. The assessment phase is …

Introduction to Cloud Migration

Reading Time: 4 minutes

Cloud migration is the process of moving digital business operations into the cloud to gain the advantages of a successful digital transformation. Cloud migration is like a physical move, except that instead of packing up and moving physical goods, you move data, applications, and IT processes from one data centre to another. Much like a move from a smaller office to a …

Networking in Google Cloud Platform

Reading Time: 6 minutes

A Virtual Private Cloud (VPC) network, or simply a network, is a virtual version of a physical network. In Google Cloud networking, networks provide data connections into and out of cloud resources, mostly Compute Engine instances. Securing networks is critical to securing data and controlling access to resources. Google Cloud networking achieves flexible, logical isolation of unrelated resources through its different levels.

Amazon EMR

Reading Time: 3 minutes

Businesses worldwide are discovering the power of new big data processing and analytics frameworks like Apache Hadoop and Apache Spark, but they are also discovering some of the challenges of operating these technologies in on-premises data lake environments. They may also have concerns about the future of their current distribution vendor. Common problems of on-premises big data environments include a lack of agility, excessive costs, …

Apache Spark: Read Data from S3 Bucket

Reading Time: < 1 minute

Amazon S3

Accessing S3 Bucket through Spark

Edit the spark-defaults.conf file

You need to add three lines containing your S3 access key, secret key, and file system …
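As a rough illustration of what such entries can look like, the sketch below renders three `spark-defaults.conf` lines. It assumes the Hadoop S3A connector (`hadoop-aws`) is on Spark's classpath; the original post's exact lines may differ (for example, older setups used the `s3n` scheme). The helper name `s3a_spark_defaults` is hypothetical. It relies on two documented behaviours: `spark-defaults.conf` holds whitespace-separated key/value pairs, and Spark copies every `spark.hadoop.*` property into the Hadoop configuration.

```python
import os

# Assumption: the S3A connector (hadoop-aws) is available to Spark.
S3A_IMPL = "org.apache.hadoop.fs.s3a.S3AFileSystem"


def s3a_spark_defaults(access_key: str, secret_key: str) -> str:
    """Render the three spark-defaults.conf entries for S3A access (hypothetical helper).

    Each line is "key value"; spark.hadoop.* keys are forwarded to Hadoop.
    """
    entries = [
        ("spark.hadoop.fs.s3a.access.key", access_key),  # your S3 access key
        ("spark.hadoop.fs.s3a.secret.key", secret_key),  # your S3 secret key
        ("spark.hadoop.fs.s3a.impl", S3A_IMPL),          # file system implementation
    ]
    return "\n".join(f"{key} {value}" for key, value in entries) + "\n"


if __name__ == "__main__":
    # Read credentials from the environment instead of hard-coding them.
    print(
        s3a_spark_defaults(
            os.environ.get("AWS_ACCESS_KEY_ID", "<access-key>"),
            os.environ.get("AWS_SECRET_ACCESS_KEY", "<secret-key>"),
        ),
        end="",
    )
```

With entries like these in place, bucket paths would use the `s3a://` scheme. Keeping secrets in a shared config file is convenient for local experiments, but environment variables or instance roles are usually safer elsewhere.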

Apache Spark

Deep Dive into Apache Spark Transformations and Action

Reading Time: 4 minutes

In our previous blog on Apache Spark, we briefly discussed what transformations and actions are. Now we will go deeper into the topic and understand what they actually are and why they play a vital role in working with Apache Spark.

What is Spark RDD?

Spark introduces the concept of an RDD (Resilient Distributed Dataset), an immutable, fault-tolerant, distributed collection of objects …
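The core distinction between transformations and actions is laziness: transformations only describe a new dataset, while actions trigger the computation. The sketch below is a toy, single-machine model of that idea in plain Python, not Spark's API. The class `ToyRDD` and its methods are hypothetical stand-ins that mimic the names `map`, `filter`, `collect`, and `count`.

```python
from typing import Callable, Iterable, Iterator, List


class ToyRDD:
    """Toy model of lazy RDD semantics (hypothetical; not Spark's API).

    A ToyRDD stores only a recipe (a zero-argument function that yields
    items). Transformations wrap the recipe; actions run it.
    """

    def __init__(self, source: Callable[[], Iterator]):
        self._source = source

    @staticmethod
    def parallelize(data: Iterable) -> "ToyRDD":
        items = list(data)  # snapshot of the input; never mutated
        return ToyRDD(lambda: iter(items))

    # Transformations: lazy, return a new ToyRDD, compute nothing yet.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (f(x) for x in self._source()))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: eager, pull data through the whole chain and return a result.
    def collect(self) -> List:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


if __name__ == "__main__":
    calls = []

    def square(x):
        calls.append(x)  # record when work actually happens
        return x * x

    evens = ToyRDD.parallelize(range(5)).map(square).filter(lambda y: y % 2 == 0)
    print(calls)            # []: no transformation has run yet
    print(evens.collect())  # [0, 4, 16]
    print(evens.count())    # 3, recomputed from the start
```

Note that `count()` reruns `square` for every element. This mirrors the behaviour of an uncached RDD in Spark, where each action re-evaluates the lineage unless the data is persisted.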

Defining your workflow: Why Not Airflow?

Reading Time: 4 minutes

What is Apache Airflow?

Airflow is a platform to programmatically author, schedule, and monitor workflows or data pipelines. It does this with Directed Acyclic Graphs (DAGs) of tasks. It is open source and, at the time of writing, was still in the incubator stage. It was started at Airbnb in 2014, and since then it has earned an excellent reputation, with approximately 800 contributors and 13,000 stars on GitHub. …
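To show why a workflow must be a *directed acyclic* graph, the sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to order a set of tasks and to group those that could run in parallel. This is not Airflow's scheduler or API, only an illustration of the underlying idea. The task names (`extract`, `transform`, `load`, `report`, `notify`) are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
    "notify": {"load"},
}


def run_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one valid execution order; raises CycleError if deps has a cycle."""
    return list(TopologicalSorter(deps).static_order())


def parallel_batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into batches whose upstream tasks have all finished."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # also raises CycleError for a non-acyclic graph
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # tasks with every dependency done
        batches.append(ready)
        ts.done(*ready)  # pretend the whole batch succeeded
    return batches


if __name__ == "__main__":
    print(parallel_batches(PIPELINE))
    try:
        run_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected: not a valid DAG")
```

A scheduler built on this idea can start `report` and `notify` together once `load` finishes. In Airflow itself, dependencies like these are typically declared between operators with the `>>` operator (e.g. `extract >> transform`).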