This blog pertains to Time Travel and Fail-safe in Snowflake, and I will explain you all the things you need to know about these features with practical example. So let’s get started.
Introduction to Time Travel
Snowflake allows accessing historical data of a point in the past that may have been modified or deleted at the current time. Using time travel functionality a number of situations can be handled. For example, there can be accidental updates on data that we would want to revert. So using time travel, we can recover and query data before the update ran on that data. Similarly, if we accidentally drop the table, we can easily un-drop that table and return to the previous state. Also, from an analysis perspective, we can perform queries on a table over a particular period of time using the time travel functionality.
Using time travel we can perform the following functionalities with extreme ease:
a. We can query data from the past that has been changed or deleted.
b. We can restore databases, tables & schemas that may have been dropped.
c. We can also clone databases, tables & schemas of a specific time. This feature basically combines cloning and time travel together.
How does Snowflake perform Time Travel?
Snowflake saves the state of the data before making an update to the data. This is done by Snowflake internally and it is not visible to the user. But the duration from which the state of the data is saved is driven by the retention period.
The retention period may vary between 0 to 90 days. By default, the retention period is set to 1 day but that can be configured. We can configure it separately for each database, table & schema.
Invoking Time Travel
Snowflake provides us 3 method of time travel –
a. Using Timestamp – We can do time travel to any point of time before or after the specified timestamp.
b. Using Offset – We can do time travel to any previous point of time.
c. Using Query ID – We can do time travel to any point of time before or after the specified Query ID.
Here is an example, the Time Travel feature is used to query data at a specific time. We will also look into a scenario where we accidentally update a table and will see how data in that table looks before we ran the update query.
If you are not aware of loading data into the Snowflake table then, you should read this blog on Loading bulk data into Snowflake first and then come back, as I am not repeating the code here. ☺
- We set the time to ‘UTC’ (Coordinated Universal Time) which is a best practice to avoid any confusion in the time.
Sample Code: (Assuming you have CSV files placed in an S3 bucket and loaded data into Snowflake table)
ALTER SESSION SET TIMEZONE = 'UTC'; --Select the current timestamp SELECT CURRENT_TIMESTAMP; --Copy the timestamp somewhere given by the query --2020-07-13 16:04:34.068 +0000 --Check the result from the sales table SELECT * FROM my_db.public.sales LIMIT 20; --Before executing the update query, wait for 2-3 minutes to see the time travel effectively --Update query UPDATE my_db.public.sales SET CustomerName = 'Alice'; --Check the result from the sales table SELECT * FROM my_db.public.sales LIMIT 20; --Time travel using a timestamp --Time travel to time just BEFORE the update was run --Use the timestamp that you saved earlier SELECT * FROM my_db.public.sales BEFORE(TIMESTAMP => '2020-07-13 16:04:34.068'::timestamp); --Time travel using offset --Time travel to 5 minutes ago (i.e. before we ran the update) SELECT * FROM my_db.public.sales AT(OFFSET => -60 * 5) --Time travel to 15 minutes ago SELECT * FROM my_db.public.sales AT(OFFSET => -60 * 15) --Another update statement UPDATE my_db.public.sales SET CustomerID = NULL; --Copy the Query ID of the update statement --019498ec-000e-2244-0000-00001df52f75 --Check the result from the sales table SELECT * FROM my_db.public.sales LIMIT 20; --Time travel using Query ID --Time travel to the time before the update query was run SELECT * FROM my_db.public.sales BEFORE(STATEMENT => '019498ec-000e-2244-0000-00001df52f75');
You must be wondering how to store the recovered records back into the table. Well you can do this by creating a different table from the recovered records and then drop the original table (if needed).
--Recovery CREATE OR REPLACE TABLE my_db.public.sales_recovered AS SELECT * FROM my_db.public.sales AT(OFFSET => -60 * 5) --Verify the sales_recovered table SELECT COUNT(*) FROM my_db.public.sales_recovered; SELECT * FROM my_db.public.sales_recovered LIMIT 20;
Now, let see how you can modify the retention period for any table. (Default retention period is of 1 day)
--Modify the retention period ALTER TABLE my_db.public.sales_recovered SET DATA_RETENTION_TIME_IN_DAYS = 0; --Check the retention period along with other details SHOW TABLES LIKE 'sales' in my_db.public;
Time Travel with UNDROP
Undrop functionality is very helpful if we accidentally drop a table and want to undo it. In most databases, it would usually require a restore from a backup. But not so in Snowflake, as we can simply run an undrop command and recover the table. Undrop command can be used for database, table & schema.
Let’s look at an example, where we will drop a table and restore it back.
--Dropping a table DROP TABLE sales; SELECT COUNT(*) FROM sales; --Undropping a table UNDROP TABLE sales; --Check the result SELECT COUNT(*) FROM sales; SELECT * FROM sales LIMIT 20;
Snowflake provides the Fail-safe feature to ensure data protection. Fail-safe is the part of the continuous data protection lifecycle provided by Snowflake. It provides non-configurable 7 days further storage of historical data after the time travel period has ended.
So, basically, if we update or delete “n” number of rows from a table then these changed rows will first be retained in the time travel history for the configured retention period, which can be up to 90 days. Once the time travel history ends, Snowflake still keeps the data for a further 7 days as a backup.
It is, however, important to note that the time travel data can be directly queried by the customer but the fail-safe data can only be recovered by requesting Snowflake support. Also, time travel and fail-safe both contribute towards the storage cost.
Although data is backed up at regular intervals, there can be situations between the two back-ups during which the data has been changed accidentally or deleted due to some mistake. So the fail-safe provides a quick and efficient way to recover such data with minimal cost and minimal effort.
Key points for Fail-safe
- The storage for fail-safe can be of 0 or 7 days and it is not configurable.
- For permanent tables, the fail-safe is 7 days.
- For transient and temporary tables, the fail-safe is 0 days.
- No user queries are allowed on fail-safe.
- Data can only be recovered through Snowflake support.
We can check the fail-safe information for any table using the following query. Basically, it will show the amount of memory a table is occupying in the fail-safe region. (ACCOUNTADMIN has the privilege to see this information) –
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS; --WHERE TABLE_NAME = 'sales'; --Convert the result into GBs SELECT TABLE_NAME, ACTIVE_BYTES / (1024*1024*1024) AS STORAGE_USED, TIME_TRAVEL_BYTES / (1024*1024*1024) AS TIME_TRAVEL_STORAGE, FAILSAFE_BYTES / (1024*1024*1024) AS FAILSAFE_STORAGE FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS; --WHERE TABLE_NAME = 'sales';
This is all from this blog. Hope you enjoyed the blog and it helped you!! Stay connected for more future blogs. Thank you!! 🙂