Spring GCP BigQuery

Reading Time: 3 minutes

BigQuery is Google’s fully managed, petabyte scale, low cost analytics data warehouse. It’s Serverless, highly scalable, and cost-effective multicloud data warehouse designed for business agility.

Architecture diagram showing 5 primary means of bringing data into BigQuery. On left, labeled 1 is a line tagged File (CSV, JSON, AVRO, etc.). This flows to Cloud Storage and BigQuery. Line 2, tagged Partner DTS connectors and Google BigQuery DT connectors flows into BigQuery DTS and on to BigQuery. Line 3, tagged Data Fusion plug-ins flows to Data Fusion and on to BigQuery. Line 4 is tagged SAP Data Services and flows to BigQuery. Line 5 is tagged Partner integrations and flows to BigQuery.

BigQuery Benefits

  • Gain insights with real-time and predictive analytics
    – Query streaming data in real time and get up-to-date information on all your business processes. Predict business outcomes easily with built-in machine learning–without the need to move data.
  • Access data and share insights with ease
    – Securely access and share analytical insights in your organization with a few clicks. Easily create stunning reports and dashboards using popular business intelligence tools, out of the box.
  • Protect your data and operate with trust
    – Rely on BigQuery’s robust security, governance, and reliability controls that offer high availability and a 99.99% uptime SLA. Protect your data with encryption by default and customer-managed encryption keys.

BigQuery Features

  • ML and predictive modeling with BigQuery ML
    BigQuery ML enables data scientists and data analysts to build and operationalize ML models on planet-scale structured or semi-structured data, directly inside BigQuery, using simple SQL—in a fraction of the time
  • Multicloud data analysis with BigQuery Omni
    – BigQuery Omn is a flexible, fully managed, multicloud analytics solution that allows you to cost-effectively and securely analyze data across clouds such as AWS and Azure. Use standard SQL and BigQuery’s familiar interface.
  • Interactive data analysis with BigQuery BI Engine
    – BigQuery BI Engine is an in-memory analysis service built into BigQuery that enables users to analyze large and complex datasets interactively with sub-second query response time and high concurrency. BI Engine natively integrates with Google’s Data Studio using BI Engine single node and natively accelerates any other business intelligence tools using BI Engine SQL interface.
  • Geospatial analysis with BigQuery GIS
    – BigQuery GIS Uniquely combines the serverless architecture of BigQuery with native support for geospatial analysis, so you can augment your analytics workflows with location intelligence. Simplify your analyses, see spatial data in fresh ways, and unlock entirely new lines of business with support for arbitrary points, lines, polygons, and multi-polygons in common geospatial data formats.

Loading data into BigQuery

There are several ways to ingest data into BigQuery:

  • Batch load a set of data records.
  • Stream individual records or batches of records.
  • Use queries to generate new data and append or overwrite the results to a table.
  • Use a third-party application or service.

Spring Cloud GCP BigQuery 

Spring Cloud GCP provides:

  • A convenience starter which provides autoconfiguration for the client BigQuery objects with credentials needed to interface with BigQuery.
  • A Spring Integration message handler for loading data into BigQuery tables in your Spring integration pipelines.

Maven configuration

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-gcp-starter-bigquery</artifactId>
</dependency>

Gradle configuration:

dependencies {
    implementation("org.springframework.cloud:spring-cloud-gcp-starter-bigquery")
}

Application properties

The following application properties may be configured with Spring Cloud GCP BigQuery

spring.cloud.gcp.bigquery.datasetName = <BigQuery datase>
spring.cloud.gcp.bigquery.enabled = <Enables or disables Spring Cloud GCP BigQuery autoconfiguration>
spring.cloud.gcp.bigquery.project-id = <GCP project ID of the project using BigQuery APIs>
spring.cloud.gcp.bigquery.credentials.location = <Credentials file location for authenticating with the Google Cloud BigQuery APIs>

BigQuery Client Object

The GcpBigQueryAutoConfiguration class configures an instance of BigQuery for you by inferring your credentials and Project ID from the machine’s environment.

Example usage:

// BigQuery client object provided by our autoconfiguration.
@Autowired
BigQuery bigquery;

public void runQuery() throws InterruptedException {
  String query = "SELECT column FROM table;";
  QueryJobConfiguration queryConfig =
      QueryJobConfiguration.newBuilder(query).build();

  // Run the query using the BigQuery object
  for (FieldValueList row : bigquery.query(queryConfig).iterateAll()) {
    for (FieldValue val : row) {
      System.out.println(val);
    }
  }
}

BigQueryTemplate

The BigQueryTemplate class is a wrapper over the BigQuery client object and makes it easier to load data into BigQuery tables. A BigQueryTemplate is scoped to a single dataset. The autoconfigured BigQueryTemplate instance will use the dataset provided through the property spring.cloud.gcp.bigquery.datasetName.

Below is a code snippet of how to load a CSV data InputStream to a BigQuery table.

// BigQuery client object provided by our autoconfiguration.
@Autowired
BigQueryTemplate bigQueryTemplate;

public void loadData(InputStream dataInputStream, String tableName) {
  ListenableFuture<Job> bigQueryJobFuture =
      bigQueryTemplate.writeDataToTable(
          tableName,
          dataFile.getInputStream(),
          FormatOptions.csv());

  // After the future is complete, the data is successfully loaded.
  Job job = bigQueryJobFuture.get();
}

Running the application

  1. Create a Google Cloud Platform project with billing enabled, if you don’t have one already.
  2. Enable the BigQuery API from the APIs & Services menu of the Google Cloud Console.
    1. Authenticate in one of two ways:
    1. Use the Google Cloud SDK to authenticate with application default credentials.
    2. Create a new service account, download its private key and point it such as  spring.cloud.gcp.credentials.location=file:/path/to/creds.json
  3. Create a BigQuery dataset for yourself by navigating to the BigQuery dashboard. Under the Resources panel, click on your project ID, and then click Create Dataset under your project.

Written by 

Abid Khan is a Lead Consultant at Knoldus Inc., postgraduate (MCA), and having 5+ years of experience in JavaSE, JavaEE, ORM framework, Spring, Spring-boot, RESTful Web Services, Kafka, MQTT, Rabbitmq, Docker, Redis, MySQL, Maven, GIT, etc. He is a well-developed professional with a prolific track record of designing, testing, and monitoring software as well as upgrading the existing programs.