In a previous blog, we had seen BigQuery facilitate efficient data warehouse schema design. BigQuery supports the nested & repeated columns. We can use a combination of ARRAY
and STRUCT
data types to define our schema in BigQuery. It enables to denormalize data efficiently in single table. In this blog, for the same schema of sales data, we will execute a few DML operations on nested array fields.
Schema

In the schema, products is of type RECORD means nested or struct type. And mode is REPEATED means array. So, products is array of struct type.
Sample data
{"odr_id":"1","str_loc":"NewYork","amount":"1450","cust_id" : "C012","products":[{"prdct_id":"1234","prdct_nm" : "Shampoo","pdct_prc": "140"},{"prdct_id":"421", "prdct_nm" : "Grinder","pdct_prc": "240"}]}
{"odr_id":"2","str_loc":"NewYork","amount":"1250","cust_id" : "C034","products":[{"prdct_id":"34P","prdct_nm" : "Table","pdct_prc": "140"},{"prdct_id":"42A","prdct_nm" : "Chair","pdct_prc": "440"}]}
Select Queries
SELECT * FROM demo.test_nested.sales



Recommends to not execute Select * query. In BigQuery, you can use preview option which is free of cost.
Retrieve product id and product name for customer C012
Note: we can’t get product id and product name with below query.
SELECT prdct_id, prdct_nm FROM l-ulti-data-dwh-6816.test_nested.sales
where cust_id = 'C012'
ERROR: Unrecognized name: prdct_id at [1:8]
To get from nested field need to cross join or use UNNEST.
SELECT prdct_id, prdct_nm FROM l-ulti-data-dwh-6816.test_nested.sales
, Unnest(products) where cust_id = 'C012'



Number of products purchases by customer C012
This we can get by finding the length of the array of products.
SELECT ARRAY_LENGTH(products) FROM l-ulti-data-dwh-6816.test_nested.sales where cust_id = 'C012';



Update Queries
Update product id of Shampoo p-103 for customer C012
UPDATE l-ulti-data-dwh-6816.test_nested.sales SET prdct_id = 'p103' where prdct_nm = 'Shampoo' and cust_id = 'C012'
Again the above query will not work for the nested field.
Unrecognized name: prdct_nm at [1:75]
To update nested field the correct query would be:
UPDATE l-ulti-data-dwh-6816.test_nested.sales SET products = ARRAY(SELECT AS STRUCT * REPLACE('p-103' AS prdct_id) FROM UNNEST(products) where prdct_nm = 'Shampoo') where cust_id='C012'



There can be more complex use cases where more complex SQL will be needing.


