BigQuery is Google’s fully managed, petabyte scale, low cost analytics data warehouse. It’s Serverless, highly scalable, and cost-effective multicloud data warehouse designed for business agility.
BigQuery Benefits
- Gain insights with real-time and predictive analytics
– Query streaming data in real time and get up-to-date information on all your business processes. Predict business outcomes easily with built-in machine learning–without the need to move data. - Access data and share insights with ease
– Securely access and share analytical insights in your organization with a few clicks. Easily create stunning reports and dashboards using popular business intelligence tools, out of the box. - Protect your data and operate with trust
– Rely on BigQuery’s robust security, governance, and reliability controls that offer high availability and a 99.99% uptime SLA. Protect your data with encryption by default and customer-managed encryption keys.
BigQuery Features
- ML and predictive modeling with BigQuery ML
– BigQuery ML enables data scientists and data analysts to build and operationalize ML models on planet-scale structured or semi-structured data, directly inside BigQuery, using simple SQL—in a fraction of the time - Multicloud data analysis with BigQuery Omni
– BigQuery Omn is a flexible, fully managed, multicloud analytics solution that allows you to cost-effectively and securely analyze data across clouds such as AWS and Azure. Use standard SQL and BigQuery’s familiar interface. - Interactive data analysis with BigQuery BI Engine
– BigQuery BI Engine is an in-memory analysis service built into BigQuery that enables users to analyze large and complex datasets interactively with sub-second query response time and high concurrency. BI Engine natively integrates with Google’s Data Studio using BI Engine single node and natively accelerates any other business intelligence tools using BI Engine SQL interface. - Geospatial analysis with BigQuery GIS
– BigQuery GIS Uniquely combines the serverless architecture of BigQuery with native support for geospatial analysis, so you can augment your analytics workflows with location intelligence. Simplify your analyses, see spatial data in fresh ways, and unlock entirely new lines of business with support for arbitrary points, lines, polygons, and multi-polygons in common geospatial data formats.
Loading data into BigQuery
There are several ways to ingest data into BigQuery:
- Batch load a set of data records.
- Stream individual records or batches of records.
- Use queries to generate new data and append or overwrite the results to a table.
- Use a third-party application or service.
Spring Cloud GCP BigQuery
Spring Cloud GCP provides:
- A convenience starter which provides autoconfiguration for the client BigQuery objects with credentials needed to interface with BigQuery.
- A Spring Integration message handler for loading data into BigQuery tables in your Spring integration pipelines.
Maven configuration
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-gcp-starter-bigquery</artifactId>
</dependency>
Gradle configuration:
dependencies {
implementation("org.springframework.cloud:spring-cloud-gcp-starter-bigquery")
}
Application properties
The following application properties may be configured with Spring Cloud GCP BigQuery
spring.cloud.gcp.bigquery.datasetName = <BigQuery datase>
spring.cloud.gcp.bigquery.enabled = <Enables or disables Spring Cloud GCP BigQuery autoconfiguration>
spring.cloud.gcp.bigquery.project-id = <GCP project ID of the project using BigQuery APIs>
spring.cloud.gcp.bigquery.credentials.location = <Credentials file location for authenticating with the Google Cloud BigQuery APIs>
BigQuery Client Object
The GcpBigQueryAutoConfiguration
class configures an instance of BigQuery for you by inferring your credentials and Project ID from the machine’s environment.
Example usage:
// BigQuery client object provided by our autoconfiguration.
@Autowired
BigQuery bigquery;
public void runQuery() throws InterruptedException {
String query = "SELECT column FROM table;";
QueryJobConfiguration queryConfig =
QueryJobConfiguration.newBuilder(query).build();
// Run the query using the BigQuery object
for (FieldValueList row : bigquery.query(queryConfig).iterateAll()) {
for (FieldValue val : row) {
System.out.println(val);
}
}
}
BigQueryTemplate
The BigQueryTemplate
class is a wrapper over the BigQuery
client object and makes it easier to load data into BigQuery tables. A BigQueryTemplate
is scoped to a single dataset. The autoconfigured BigQueryTemplate
instance will use the dataset provided through the property spring.cloud.gcp.bigquery.datasetName
.
Below is a code snippet of how to load a CSV data InputStream
to a BigQuery table.
// BigQuery client object provided by our autoconfiguration.
@Autowired
BigQueryTemplate bigQueryTemplate;
public void loadData(InputStream dataInputStream, String tableName) {
ListenableFuture<Job> bigQueryJobFuture =
bigQueryTemplate.writeDataToTable(
tableName,
dataFile.getInputStream(),
FormatOptions.csv());
// After the future is complete, the data is successfully loaded.
Job job = bigQueryJobFuture.get();
}
Running the application
- Create a Google Cloud Platform project with billing enabled, if you don’t have one already.
- Enable the BigQuery API from the APIs & Services menu of the Google Cloud Console.
- Authenticate in one of two ways:
- Use the Google Cloud SDK to authenticate with application default credentials.
- Create a new service account, download its private key and point it such as
spring.cloud.gcp.credentials.location=file:/path/to/creds.json
- Create a BigQuery dataset for yourself by navigating to the BigQuery dashboard. Under the Resources panel, click on your project ID, and then click Create Dataset under your project.