BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence.
BigQuery’s serverless architecture lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes.
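For a quick feel of that, a single bq CLI call runs standard SQL with nothing to provision. A minimal sketch (the project, dataset, and table names here are placeholders, not part of this blog's template):
bq query --use_legacy_sql=false \
'SELECT COUNT(*) AS total_rows FROM `my_project.my_dataset.my_table`'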
Scope – Dynamic schema & Data loading
In this blog, we will see how to:
- Upload local JSON data into a GCS bucket
- Create datasets
- Load JSON data from the bucket into a new table
- Append data with a dynamic schema
- Overwrite table data
Prerequisites
- Google Cloud SDK (gcloud CLI) Download Link
- GCS Bucket
- BigQuery API enabled
gcloud services enable bigquerystorage.googleapis.com
gcloud services enable bigquerymigration.googleapis.com
gcloud services enable bigquery.googleapis.com
- Authentication set up: either a service account JSON key or gcloud auth login (see the sketch after this list)
- Download this template or clone it from the git repo.
https://github.com/knoldus/Dynamic-schema-loading-into-BigQuery
https://techhub.go1percent.com/dashboard/projects/devops/6294ddd8ea09b33f796287b7
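Either authentication route works for the bq and gsutil commands used below. A minimal sketch, assuming a key file named sa-key.json (the file name is a placeholder, not part of the template):
# Option 1: interactive login with your own Google account
gcloud auth login
gcloud config set project your_project_id

# Option 2: authenticate with a service account key
gcloud auth activate-service-account --key-file=sa-key.json
export GOOGLE_APPLICATION_CREDENTIALS=$(pwd)/sa-key.json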
Usage
1. Clone the repo or download the template
git clone https://github.com/knoldus/Dynamic-schema-loading-into-BigQuery
2. Export Required Variables
export PROJECT_ID=your_project_id
export GCS_BUCKET_ID=example-bucket-name
export DATASET=any_name_of_dataset
export BIGQUERY_TABLE=any_name_of_table
export GCS_BUCKET_PATH=json/data
export JSON_DATA_FILE=data.json
Set these values in env_vars.sh (included in the template) and load them into your shell:
source env_vars.sh
3. Verify the directory structure
tree
Output should look like this:
├── data.json
├── env_vars.sh
├── loadTables.sh
├── README.md
└── vfv.json
0 directories, 5 files
4. Run the Script
./loadTables.sh
Output should look like:
Options...
1) Create a dataset in gcp-learn285
2) Create table with JSON data
3) Dynamic Append data in table
4) Overwrite JSON Data into table
5) Upload local JSON data into GCS bucket.
Make a choice [ 1,2,3,4,5 or q to exit ]
Make your choice and run the operation to update your BigQuery tables.

BigQuery Operations
1. Upload JSON to the GCS bucket
Run the script and choose option 5 (Upload local JSON data into GCS bucket). The function that handles it:
upload_local_data(){
  echo ""
  echo -e "Listing all JSON files in $(pwd) directory ... \n"
  ls | grep json
  echo ""
  read -p "Enter the json data file: (like: data.json) " json
  echo -e "\nUploading...."
  # Compact the JSON so BigQuery can read it; if the file is a JSON array,
  # use `jq -c '.[]'` instead so each element lands on its own line.
  jq -c . "$json" > "$JSON_DATA_FILE"
  gsutil cp "$JSON_DATA_FILE" "gs://$GCS_BUCKET_ID/$GCS_BUCKET_PATH/$JSON_DATA_FILE"
  rm "$JSON_DATA_FILE"
}
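To see why the compaction step matters: NEWLINE_DELIMITED_JSON expects one object per line, so a JSON array has to be exploded. A quick illustration with made-up records (not the repo's data.json):
echo '[{"name":"alice","age":30},{"name":"bob","age":25}]' | jq -c '.[]'
# {"name":"alice","age":30}
# {"name":"bob","age":25}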
2. Create Datasets
Run the script and choose option 1 (Create a dataset):
echo "Creating dataset into bigquery"
read -p "Enter name of your Dataset. [default: $DATASET] " dataset
if [[ -z "$dataset" ]];then
dataset=$DATASET
fi
bq --location=us mk --dataset $PROJECT_ID:$dataset
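To confirm the dataset exists, you can list your datasets or describe the new one (a quick check, not part of loadTables.sh):
bq ls --datasets
bq show "$PROJECT_ID:$dataset"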
3. Create a table while loading
Run the script and choose option 2 (Create table with JSON data); the table is created as part of the load job:
echo "Listing Datasets in $PROJECT_ID project..."
bq ls --datasets
read -p "Enter name of your Dataset. [default: $DATASET] " dataset
if [[ -z "$dataset" ]];then
dataset=$DATASET
fi
read -p "Choose name of your BigQuery Table to be created in $dataset Dataset. [default: $BIGQUERY_TABLE] " table
if [[ -z "$table" ]];then
table=$BIGQUERY_TABLE
fi
bq --location=us load \
--noreplace \
--autodetect \
--schema_update_option=ALLOW_FIELD_ADDITION \
--source_format=NEWLINE_DELIMITED_JSON \
$PROJECT_ID:$dataset.$table \
gs://$GCS_BUCKET_ID/$GCS_BUCKET_PATH/$JSON_DATA_FILE
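Once the load finishes, you can inspect the auto-detected schema and the row count (a verification step, not part of the script):
bq show --schema --format=prettyjson "$PROJECT_ID:$dataset.$table"
bq query --use_legacy_sql=false \
  "SELECT COUNT(*) AS rows_loaded FROM \`$PROJECT_ID.$dataset.$table\`"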
4. Update a table with a dynamic schema
Run the script and choose option 3 (Dynamic Append data in table). With --autodetect and --schema_update_option=ALLOW_FIELD_ADDITION, new fields that appear in the incoming JSON are added to the table schema automatically:
echo "Listing Datasets in $PROJECT_ID project..."
bq ls --datasets
read -p "Enter name of your Dataset. [default: $DATASET] " dataset
if [[ -z "$dataset" ]];then
dataset=$DATASET
fi
echo "Listing Tables in $dataset dataset...."
bq ls $dataset
read -p "Choose name of your BigQuery Table in $dataset Dataset. [default: $BIGQUERY_TABLE] " table
if [[ -z "$table" ]];then
table=$BIGQUERY_TABLE
fi
bq --location=us load \
--noreplace \
--autodetect \
--schema_update_option=ALLOW_FIELD_ADDITION \
--source_format=NEWLINE_DELIMITED_JSON \
$PROJECT_ID:$dataset.$table \
gs://$GCS_BUCKET_ID/$GCS_BUCKET_PATH/$JSON_DATA_FILE
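To illustrate what the dynamic append gives you, suppose the table was first loaded from records like the first line below and the appended file introduces an email field (both records are made-up examples). After the load, the email column is added to the schema and the earlier rows show NULL for it.
{"name":"alice","age":30}
{"name":"bob","age":25,"email":"bob@example.com"}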
5. Overwrite Tables
Run the script and choose option 4 (Overwrite JSON Data into table). The load uses --replace, so the existing rows are truncated before the new data is written:
echo "Listing Datasets in $PROJECT_ID project..."
bq ls --datasets
read -p "Enter name of your Dataset. [default: $DATASET] " dataset
if [[ -z "$dataset" ]];then
dataset=$DATASET
fi
echo "Listing Tables in $dataset dataset...."
bq ls $dataset
read -p "Choose name of your BigQuery Table in $dataset Dataset. [default: $BIGQUERY_TABLE] " table
if [[ -z "$table" ]];then
table=$BIGQUERY_TABLE
fi
bq --location=us load \
--autodetect \
--schema_update_option=ALLOW_FIELD_ADDITION \
--source_format=NEWLINE_DELIMITED_JSON \
$PROJECT_ID:$dataset.$table \
gs://$GCS_BUCKET_ID/$GCS_BUCKET_PATH/$JSON_DATA_FILE
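A quick way to eyeball the overwritten table is bq head (a convenience command, not part of the script):
bq head -n 5 "$PROJECT_ID:$dataset.$table"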
Conclusion
With this BigQuery script, you can upload JSON data to a GCS bucket, create datasets, and create, append to, or overwrite BigQuery tables with dynamic schema handling.