Reading Time: 2 minutes Anyone working with Spark is familiar with reading data from a local file, a table, or HDFS. But do you know how tricky it is to read data into Spark from an S3 bucket? This blog gives a step-by-step walkthrough of how to read data from an S3 bucket. Continue Reading
Reading Time: 4 minutes OpenEBS is the leading open-source project for container-attached and container-native storage on Kubernetes. OpenEBS adopts the Container Attached Storage (CAS) approach, where each workload is provided with a dedicated storage controller. OpenEBS implements granular storage policies and isolation that enable users to optimize storage for each specific workload. OpenEBS runs in userspace and does not have any Linux kernel module dependencies.
Reading Time: 3 minutes If your actors are distributed across several nodes in the cluster, Cluster Sharding allows you to interact with them using only their logical identifiers, without worrying about their physical location. Even if an actor relocates to a new node, Akka will take care of locating it for you. You just need to send a message to it as if it were located on your local node.
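The idea of addressing an entity by its logical identifier can be sketched in a few lines of plain Python. This is only a conceptual model, not Akka's API: `shard_for`, `ShardRouter`, the node names, and the shard count of 10 are all hypothetical names and values chosen for illustration. The sketch maps an entity id to a shard with a stable hash, then looks up which node currently hosts that shard.

```python
import zlib

NUM_SHARDS = 10  # assumed shard count, for illustration only


def shard_for(entity_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a logical entity id to a shard id using a stable hash."""
    return zlib.crc32(entity_id.encode("utf-8")) % num_shards


class ShardRouter:
    """Toy router that tracks which node currently hosts each shard."""

    def __init__(self, nodes, num_shards=NUM_SHARDS):
        self.num_shards = num_shards
        # Initial round-robin allocation of shards to nodes.
        self.shard_to_node = {s: nodes[s % len(nodes)] for s in range(num_shards)}

    def locate(self, entity_id):
        """Return the node hosting the entity; callers only know the id."""
        return self.shard_to_node[shard_for(entity_id, self.num_shards)]

    def relocate_shard(self, shard_id, new_node):
        """Move a whole shard (and every entity in it) to another node."""
        self.shard_to_node[shard_id] = new_node


router = ShardRouter(["node-a", "node-b", "node-c"])
entity = "user-42"
before = router.locate(entity)

# The shard moves, but the sender still uses the same logical id.
router.relocate_shard(shard_for(entity), "node-c")
after = router.locate(entity)
```

The sender never changes how it addresses `user-42`. Only the router's shard-to-node table is updated when a shard moves, which is the property the excerpt describes.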
Reading Time: 6 minutes The Internet of Things (IoT) is everywhere. From smart homes and smart cities to your fitness trackers and connected cars, we have seen them all, and there’s more to come. As we gear up for 2020, studies suggest that IoT will comprise 30 billion connected devices and that number may go up to 500 billion in another 10 years. IoT is changing the trajectory Continue Reading
Reading Time: 4 minutes Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. Here are the Spark core components All Continue Reading
Reading Time: 4 minutes Introduction You are developing a product trying to help your core business grow through technology. The easiest way to start is to design what you need and start coding it as a monolithic blob. The product becomes successful and you start seeing the impact that it has made on your business. The next thought is to make the product commercially available so that other businesses Continue Reading
Reading Time: 5 minutes Introduction & the Problem One of our key customers, a large cruise line, has ships that sail with a capacity running into a few thousand people on board. They are going through a successful digital transformation, which includes managing the full life cycle of a guest on mobile, data science-driven personalization, etc., and we are fortunate to be part of the whole journey. These ships generate varieties of Continue Reading
Reading Time: 4 minutes In a previous blog post I wrote about integration testing with H2, an in-memory database. I mentioned that H2 does not perfectly emulate other databases such as Oracle. This means that H2 cannot execute all queries meant to be executed by an Oracle database. This has been a problem for me as I have been writing integration tests for code that makes many calls to Continue Reading
Reading Time: 4 minutes What do you do when you get an error or issue with the code? What is the first thought that comes to your mind? What if you are trying to fix code that was written by someone else? That’s right. We check the logs. We all know that useful logs can provide the developer (especially when someone has to debug/maintain someone else’s code) Continue Reading
Reading Time: 3 minutes Akka Cluster Formation Every actor has an address in Akka. The actor could be present locally or could be remote. Remote actors require communication over the network. Each actor system in a cluster is called a member or node. A node is addressed by a combination of hostname, port, and UUID (regenerated when the actor system is restarted). An actor can join the cluster with this combination to Continue Reading
Reading Time: 3 minutes Hi guys, in this blog we will look at some excerpts from the slides by Martin Odersky and Dmitry Petrashko. How did things evolve from DOT to Dotty? So, let’s begin. The DOT calculus is intended to be a new minimal foundation of Scala, which is to be Scala 3. Its type structure is a blueprint for the types used internally in the compiler. DOT is a core Continue Reading
Reading Time: 3 minutes Does partitioning increase or decrease job performance? Spark splits data into partitions, and computation is done in parallel for each partition. It is very important to understand how data is partitioned and when you need to manually modify the partitioning to run Spark applications efficiently. Now, diving into our main topic, i.e., repartitioning vs. coalesce. What is Coalesce? The coalesce method reduces the number Continue Reading
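As a rough illustration of the distinction this post sets out to explain, the plain-Python sketch below models partitions as lists of records. It is a conceptual model only and does not use Spark: the functions `coalesce` and `repartition` here are hypothetical stand-ins that mimic the general behavior of the Spark methods of the same name (merging whole partitions versus redistributing every record), not Spark's actual implementation.

```python
def coalesce(partitions, n):
    """Merge existing partitions into at most n, moving whole partitions only."""
    if n >= len(partitions):
        # In this model, coalesce never increases the partition count.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)  # records in a partition stay together
    return merged


def repartition(partitions, n):
    """Redistribute every record across n partitions (a full shuffle)."""
    out = [[] for _ in range(n)]
    records = [r for part in partitions for r in part]
    for i, record in enumerate(records):
        out[i % n].append(record)  # round-robin gives near-equal sizes
    return out


# Four unevenly sized partitions holding ten records in total.
data = [[1, 2, 3, 4, 5, 6], [7], [8, 9], [10]]

coalesced = coalesce(data, 2)      # fewer partitions, sizes may stay skewed
reshuffled = repartition(data, 2)  # fewer partitions, sizes evened out

print([len(p) for p in coalesced])
print([len(p) for p in reshuffled])
```

In this model, coalescing is cheaper because no partition is split, but the result can remain skewed. Repartitioning touches every record and produces evenly sized partitions.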