Spring Cloud GCP and BigQuery: A Way to Handle Large Datasets

[Image: Spring Cloud GCP and BigQuery. Image courtesy of Google.]

Introduction

In this example, we are going to cover how to load data into a BigQuery table and how to query that table using the GCP console and the API. We will use Spring Cloud GCP to make our application ready to access Google Cloud Platform services.

Spring Cloud GCP and BigQuery

The BigQuery module of Spring Cloud GCP provides support for BigQuery in a Spring Boot application. It autoconfigures a BigQuery client for your project, so you can connect your application to BigQuery and load or query datasets and tables programmatically.

BigQuery is a serverless, highly scalable, and cost-effective multicloud data warehouse designed for business agility. Its data is organized into two main resources: datasets and tables. A dataset is a top-level container within a project that groups tables and controls access to them; a table holds both the schema definition and the rows of data, which live in BigQuery's managed storage.

What Is Spring Cloud GCP?

Spring Cloud GCP is a set of Spring Boot starters and autoconfiguration that helps you connect a Spring application to Google Cloud Platform (GCP) services. With it, you can:

Configure the project and credentials the application uses. The project ID, the service account key location, and other parameters go in your configuration file, and individual modules can override them.

Rely on the IAM roles granted to that service account to restrict which GCP APIs the application can call. Spring Cloud GCP picks up the credentials; the access policies themselves are defined in GCP.

Define properties for each service in your application, such as the default BigQuery dataset name, which the autoconfigured clients use when they create or access resources.

What Is BigQuery?

BigQuery is a serverless, highly scalable, and cost-effective multi-cloud data warehouse designed for business agility. It is a fully managed, no-ops service that lets you focus on analyzing data and turning insights into action.

It can handle terabytes of data at a time and supports complex analytical queries on billions of rows. You pay only for the resources you use; no upfront costs or long-term commitments are required.

Project Setup

Configure Spring Cloud GCP.

Create a project in the Google Cloud Platform Console.

Create a service account for your application to use. You can create a new service account or reuse an existing one (e.g., one created during setup). The following steps assume a new service account called [your-project].

Create a dataset and a table in BigQuery to store your data. This is where all of the data we load will be stored.
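As a rough illustration of the configuration step, the fragment below shows how the project, credentials, and default dataset might be set in application.properties. The property names are the ones Spring Cloud GCP reads; the values (bigquery-tutorial, the key file path, and my_dataset) are placeholders assumed for this sketch, not values from a real project.

```properties
# Placeholder values: replace with your own project, key file, and dataset.
spring.cloud.gcp.project-id=bigquery-tutorial
spring.cloud.gcp.credentials.location=file:/path/to/service_account/keyfile.json
spring.cloud.gcp.bigquery.dataset-name=my_dataset
```

If the credentials property is left out, the application falls back to the default credentials available in its environment.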

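Below is the Maven dependency to add to the project. The version is normally managed by the Spring Cloud GCP BOM. Note that newer Spring Cloud GCP releases publish this starter under the com.google.cloud group ID.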

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-gcp-starter-bigquery</artifactId>
</dependency>

And the Gradle coordinates:

dependencies {
    implementation("org.springframework.cloud:spring-cloud-gcp-starter-bigquery")
}

Below is a code sample from Google, included for reference only (the code is courtesy of Google):

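import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.FieldValue;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

// The class name QueryRunner is illustrative; any Spring-managed bean works.
@Component
public class QueryRunner {

  // BigQuery client object provided by our autoconfiguration.
  @Autowired
  BigQuery bigquery;

  public void runQuery() throws InterruptedException {
    // Placeholder query: replace `column` and `table` with real names.
    String query = "SELECT column FROM table;";
    QueryJobConfiguration queryConfig =
        QueryJobConfiguration.newBuilder(query).build();

    // Run the query using the BigQuery object and print every value of every row.
    for (FieldValueList row : bigquery.query(queryConfig).iterateAll()) {
      for (FieldValue val : row) {
        System.out.println(val);
      }
    }
  }
}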

For more details, see the following resources:
Sample application from Google.
Sample API implementation example from Google.

Creating a BigQuery Table

To create a BigQuery table, you’ll need to do the following:

  • Create a project and dataset.
  • Create a table schema in JSON format.
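If you prefer to create the dataset and table from code rather than from a schema file, the same two steps can be sketched with the google-cloud-bigquery client that Spring Cloud GCP autoconfigures. This is a minimal sketch, not the article's own method: the class name TableSetup and the two columns (name and age) are assumptions made for illustration.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.DatasetInfo;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;

public class TableSetup {

  // Hypothetical schema for a users table: a name column and an age column.
  static Schema usersSchema() {
    return Schema.of(
        Field.of("name", StandardSQLTypeName.STRING),
        Field.of("age", StandardSQLTypeName.INT64));
  }

  // Creates the dataset if it is missing, then creates the table inside it.
  static Table createUsersTable(BigQuery bigquery, String datasetName, String tableName) {
    if (bigquery.getDataset(datasetName) == null) {
      bigquery.create(DatasetInfo.newBuilder(datasetName).build());
    }
    TableInfo tableInfo =
        TableInfo.of(
            TableId.of(datasetName, tableName),
            StandardTableDefinition.of(usersSchema()));
    // Throws BigQueryException if the table already exists.
    return bigquery.create(tableInfo);
  }
}
```

In a Spring application, the BigQuery argument would be the autowired client bean, for example TableSetup.createUsersTable(bigquery, "my_dataset", "users").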

Querying the table using GCP’s console

The Google Cloud Platform Console provides a web-based interface for managing projects and services.

Start by opening the console and clicking on the “BigQuery” button.

Next, click on “bigquery-datasets-xxxxx” to open that dataset’s screen.

Underneath your dataset are all of its tables. Clicking on any given table reveals more information about it, including its name and schema.

Lastly, you can use the Query Editor to run queries against your data, for example to select all users who are 35 years old or older.

Loading Data into the Table from the Local File System

  • Create a directory for the source files.
  • Copy the source files into the directory.
  • Create the destination BigQuery table:

bq mk -t [PROJECT_ID]:[DATASET_ID].[TABLE]

  • Load the source file into the table:

bq load --autodetect --source_format=CSV \
  [PROJECT_ID]:[DATASET].[TABLE] \
  gs://[BUCKET]/[PATH].csv

where [BUCKET] is the bucket name and [PATH] is the path to your file in Google Cloud Storage (GCS). For example, to load users2.csv from the directory mydirectory2a in the bucket bucket2a, use gs://bucket2a/mydirectory2a/users2.csv. bq load also accepts a path on the local file system in place of the gs:// URI, which lets you load the files you copied in the earlier step directly.

Loading the Data to the Table Using API

  • Load the data using a load job.
  • The Spring Cloud GCP BigQuery module provides a convenient BigQueryTemplate class for this. Its writeDataToTable method takes the destination table name, an input stream with the source data, and the data format, and starts a load job for you. By default, the load job appends the rows from the source to the destination table, and the returned job tells you whether the load succeeded.
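To make the load job concrete, here is a minimal sketch that uploads a local CSV file through the plain google-cloud-bigquery client (the BigQuery bean that Spring Cloud GCP autoconfigures), which is a swapped-in alternative to the Spring helper so the example does not depend on a particular Spring Cloud GCP version. The class name CsvLoader is hypothetical, and the sketch assumes the destination table already exists with a matching schema and that the file has one header row.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.CsvOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.TableDataWriteChannel;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.WriteChannelConfiguration;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.file.Files;
import java.nio.file.Path;

public class CsvLoader {

  // Describes a CSV load into datasetName.tableName, skipping one header row (assumption).
  static WriteChannelConfiguration csvLoadConfig(String datasetName, String tableName) {
    CsvOptions csv = CsvOptions.newBuilder().setSkipLeadingRows(1).build();
    return WriteChannelConfiguration.newBuilder(TableId.of(datasetName, tableName))
        .setFormatOptions(csv)
        .build();
  }

  // Streams the file to BigQuery, then waits for the resulting load job to finish.
  static Job loadCsv(BigQuery bigquery, String datasetName, String tableName, Path csvFile)
      throws IOException, InterruptedException {
    TableDataWriteChannel writer = bigquery.writer(csvLoadConfig(datasetName, tableName));
    try (OutputStream out = Channels.newOutputStream(writer)) {
      Files.copy(csvFile, out);
    }
    // The job is available once the channel has been closed.
    Job job = writer.getJob().waitFor();
    if (job == null) {
      throw new IllegalStateException("Load job no longer exists");
    }
    if (job.getStatus().getError() != null) {
      throw new IllegalStateException("Load failed: " + job.getStatus().getError());
    }
    return job;
  }
}
```

A call such as CsvLoader.loadCsv(bigquery, "my_dataset", "users", Path.of("data/users.csv")) would then load the file; the dataset, table, and file names here are placeholders.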

Query the table using API

You can also query the table through the API. To do this, you need to:

  • Connect to the BigQuery service. See Connecting to BigQuery for details on how to connect with Cloud SDK tools and with GCP Console.
  • Give the client the credentials of your project. For example, if you have an existing service account named ‘service_account’ and a project named ‘bigquery-tutorial’, set the project ID to bigquery-tutorial through the spring.cloud.gcp.project-id property and point the application at that account’s key file:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account/keyfile.json

where /path/to/service_account/keyfile.json is the path to the JSON key file downloaded for the service account (on Windows, set the same variable with a Windows-style path). If the variable is not set, the Google client libraries fall back to Application Default Credentials, such as those created by gcloud auth application-default login or those of the service account attached to the environment the application runs in.
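Once the client has credentials, a query through the API might look like the sketch below. It uses a named query parameter instead of string concatenation for the age threshold. The class name UserQueries, the users table, and its name and age columns are assumptions made for illustration; only the google-cloud-bigquery calls themselves come from the client library.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.QueryParameterValue;
import java.util.ArrayList;
import java.util.List;

public class UserQueries {

  // Builds a standard SQL query for users whose age is at least minAge.
  static QueryJobConfiguration usersAtLeast(String datasetName, long minAge) {
    String sql =
        "SELECT name, age FROM `" + datasetName + ".users` WHERE age >= @minAge";
    return QueryJobConfiguration.newBuilder(sql)
        .addNamedParameter("minAge", QueryParameterValue.int64(minAge))
        .setUseLegacySql(false)
        .build();
  }

  // Runs the query and collects the name column of every matching row.
  static List<String> namesOfUsersAtLeast(BigQuery bigquery, String datasetName, long minAge)
      throws InterruptedException {
    List<String> names = new ArrayList<>();
    for (FieldValueList row : bigquery.query(usersAtLeast(datasetName, minAge)).iterateAll()) {
      names.add(row.get("name").getStringValue());
    }
    return names;
  }
}
```

Keeping the configuration in its own method makes it possible to inspect the SQL and parameters without contacting BigQuery, while namesOfUsersAtLeast needs a live, authenticated client.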

Conclusion

In conclusion, we have learned that BigQuery is a powerful cloud-based data warehouse that can be used for handling large amounts of data. It can work with a wide variety of sources and types of data, even if they are non-relational. BigQuery also provides an API and several ways to connect it with other services. However, there are some important limitations to keep in mind while using BigQuery:

  • The supported schema types differ based on the source type; some sources do not support all defined schemas
  • Data must be loaded into tables before querying them
  • Table information (e.g., stats) is only available upon loading the table’s contents

This post is part of a series that aims to give an overview of how to leverage Google’s data warehouse. BigQuery’s flexible pricing model makes it easy to pay for only the storage and queries you need.

In this blog post, we have covered basic configuration and usage examples of Google BigQuery using Spring Cloud GCP. We hope that you will find this series informative in your journey to learn more about Google Cloud Platform. Please feel free to leave your comments and suggestions on how we could improve these articles.

For more details on BigQuery, please visit our blogs.
For a detailed reference, see the Spring Cloud GCP documentation at the link below.
https://docs.spring.io/spring-cloud-gcp/docs/current/reference/html/index.html#introduction
