Dynamic Schema & Data Loading Into BigQuery

Reading Time: 3 minutes

BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence.

BigQuery’s serverless architecture lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes.

Scope – Dynamic schema & Data loading

In this blog, we will see how to –

  1. Upload local JSON data into a GCS bucket
  2. Create datasets
  3. Create a table by loading JSON data from the bucket
  4. Append data to a table with dynamic schema updates
  5. Overwrite a table's data

Prerequisites

  - A GCP project with billing enabled and the BigQuery API enabled
  - The Google Cloud SDK installed and authenticated (the script uses the bq and gsutil CLIs)
  - jq installed (used to compact the JSON before upload)
  - An existing GCS bucket to stage the JSON data

Usage

1. Clone the repo or download the template

git clone https://github.com/knoldus/Dynamic-schema-loading-into-BigQuery
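
Then move into the project directory and make the script executable (assuming the cloned directory keeps the repository name):

cd Dynamic-schema-loading-into-BigQuery
chmod +x loadTables.sh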

2. Export Required Variables

Set these variables to match your environment in env_vars.sh:

export PROJECT_ID=your_project_id
export GCS_BUCKET_ID=example-bucket-name
export DATASET=any_name_of_dataset
export BIGQUERY_TABLE=any_name_of_table
export GCS_BUCKET_PATH=json/data
export JSON_DATA_FILE=data.json

Then source the file so the variables are available in your current shell:

source env_vars.sh
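
A quick, optional sanity check that the variables were picked up:

echo "$PROJECT_ID $GCS_BUCKET_ID $DATASET $BIGQUERY_TABLE $GCS_BUCKET_PATH $JSON_DATA_FILE"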

3. Directory Structure

tree

Output should look like this:

├── data.json
├── env_vars.sh
├── loadTables.sh
├── README.md
└── vfv.json

0 directories, 5 files

4. Run the Script

./loadTables.sh

Output should look like:

Options...

1) Create a dataset in gcp-learn285 
2) Create table with JSON data 
3) Dynamic Append data in table 
4) Overwrite JSON Data into table 
5) Upload local JSON data into GCS bucket.

Make a choice [  1,2,3,4,5 or q to exit  ]

Make your choice and run the operations to update your BigQuery tables.

BigQuery Operations

1. Upload JSON to the GCS bucket

Run the script and choose this option. The function in loadTables.sh that handles it looks like this:

upload_local_data(){
  echo ""
  echo -e "Listing all JSON files in $(pwd) directory ... \n"
  ls | grep json
  echo ""
  read -p "Enter the json data file: (like: data.json) " json
  echo -e "\nUploading...."

  # Compact the JSON so each record sits on a single line,
  # which is what BigQuery's NEWLINE_DELIMITED_JSON format expects.
  cat "$json" | jq -c > "$JSON_DATA_FILE"

  # Copy the compacted file to the staging path in the GCS bucket.
  gsutil cp "$JSON_DATA_FILE" "gs://$GCS_BUCKET_ID/$GCS_BUCKET_PATH/$JSON_DATA_FILE"

  # Remove the temporary compacted copy.
  rm "$JSON_DATA_FILE"
}
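
After the upload, you can confirm the object landed in the bucket, for example:

gsutil ls gs://$GCS_BUCKET_ID/$GCS_BUCKET_PATH/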

2. Create Datasets

Run the script and choose the dataset option. The script prompts for a name and falls back to $DATASET if you just press Enter:

echo "Creating dataset into bigquery"
 read -p "Enter name of your Dataset. [default: $DATASET] " dataset

 if [[ -z "$dataset" ]];then
  dataset=$DATASET
 fi
 bq --location=us mk --dataset $PROJECT_ID:$dataset
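
You can confirm the dataset exists by listing the project's datasets:

bq ls --datasets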

3. Create a Table when loading

Run the script and choose this option to create the table while loading JSON data from the bucket. With --autodetect, BigQuery infers the schema from the data:

echo "Listing Datasets in $PROJECT_ID project..."   
bq ls --datasets
read -p "Enter name of your Dataset. [default: $DATASET] " dataset
if [[ -z "$dataset" ]];then
  dataset=$DATASET
 fi

read -p "Choose name of your BigQuery Table to be created in $dataset Dataset. [default: $BIGQUERY_TABLE] " table

if [[ -z "$table" ]];then
  table=$BIGQUERY_TABLE
 fi


bq --location=us load \
--noreplace \
--autodetect \
--schema_update_option=ALLOW_FIELD_ADDITION \
--source_format=NEWLINE_DELIMITED_JSON \
$PROJECT_ID:$dataset.$table \
gs://$GCS_BUCKET_ID/$GCS_BUCKET_PATH/$JSON_DATA_FILE
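
Once the load finishes, you can inspect the schema that --autodetect produced, for example:

bq show --schema --format=prettyjson $PROJECT_ID:$DATASET.$BIGQUERY_TABLE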

4. Update table with dynamic schema

Run the script and choose this option to append data to an existing table with dynamic schema loading. New fields found in the JSON are added to the table schema instead of failing the load:

echo "Listing Datasets in $PROJECT_ID project..."   
bq ls --datasets
read -p "Enter name of your Dataset. [default: $DATASET] " dataset
if [[ -z "$dataset" ]];then
  dataset=$DATASET
 fi
 echo "Listing Tables in $dataset dataset...."
bq ls $dataset
read -p "Choose name of your BigQuery Table in $dataset Dataset. [default: $BIGQUERY_TABLE] " table

if [[ -z "$table" ]];then
  table=$BIGQUERY_TABLE
 fi

bq --location=us load \
--noreplace \
--autodetect \
--schema_update_option=ALLOW_FIELD_ADDITION \
--source_format=NEWLINE_DELIMITED_JSON \
$PROJECT_ID:$dataset.$table \
gs://$GCS_BUCKET_ID/$GCS_BUCKET_PATH/$JSON_DATA_FILE 
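
For example (with hypothetical sample data), if the table was originally loaded from records like:

{"name": "Alice", "age": 30}

and you later append a file whose records carry an extra field:

{"name": "Bob", "age": 25, "city": "Pune"}

then ALLOW_FIELD_ADDITION lets the load succeed and BigQuery adds a nullable city column instead of rejecting the job.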

5. Overwrite Tables

Run the script and choose the overwrite option to replace the table's existing rows with the data from GCS. Note that bq load appends by default, so the --replace flag below is what makes this an overwrite:

echo "Listing Datasets in $PROJECT_ID project..."   
bq ls --datasets
read -p "Enter name of your Dataset. [default: $DATASET] " dataset
if [[ -z "$dataset" ]];then
  dataset=$DATASET
 fi
 echo "Listing Tables in $dataset dataset...."
bq ls $dataset
read -p "Choose name of your BigQuery Table in $dataset Dataset. [default: $BIGQUERY_TABLE] " table

if [[ -z "$table" ]];then
  table=$BIGQUERY_TABLE
 fi

bq --location=us load \
--autodetect \
--schema_update_option=ALLOW_FIELD_ADDITION \
--source_format=NEWLINE_DELIMITED_JSON \
$PROJECT_ID:$dataset.$table \
gs://$GCS_BUCKET_ID/$GCS_BUCKET_PATH/$JSON_DATA_FILE
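
To verify the overwrite, a quick row count over the table works, for example:

bq query --use_legacy_sql=false \
  "SELECT COUNT(*) AS row_count FROM \`$PROJECT_ID.$DATASET.$BIGQUERY_TABLE\`"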

Conclusion

With this script, you can upload local JSON data to a GCS bucket, create BigQuery datasets, and create, append to, or overwrite BigQuery tables while letting BigQuery detect and extend the schema automatically.

Written by 

Rahul Soni is a Software Consultant at Knoldus Software. He is always charged up for new things & learnings. He is dedicated to his work and believes in quality output. He loves to take deep dives into cloud technologies & different tools.
