KSnow: Time Travel and Fail-safe in Snowflake

This blog covers Time Travel and Fail-safe in Snowflake. I will explain everything you need to know about these features with practical examples, so let's get started.

Introduction to Time Travel

Snowflake allows you to access historical data, i.e. data as it existed at a point in the past, even if it has since been modified or deleted. The time travel functionality helps handle a number of situations. For example, an accidental update on data may need to be reverted; using time travel, we can recover and query the data as it was before the update ran. Similarly, if we accidentally drop a table, we can easily un-drop it and return it to its previous state. From an analysis perspective, we can also query a table as of a particular point in time using the time travel functionality.

Using time travel, we can perform the following operations with ease:

a. We can query data from the past that has since been changed or deleted.
b. We can restore databases, tables & schemas that may have been dropped.
c. We can also clone databases, tables & schemas as of a specific point in time. This feature essentially combines cloning and time travel together (a short example follows this list).
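For example, here is a minimal sketch of point (c), assuming the my_db.public.sales table used later in this blog already exists:

--Clone the sales table as it existed 10 minutes ago
CREATE TABLE my_db.public.sales_clone CLONE my_db.public.sales AT(OFFSET => -60 * 10);

--Databases and schemas can be cloned the same way, e.g.
--CREATE DATABASE my_db_clone CLONE my_db AT(OFFSET => -60 * 10);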

How does Snowflake perform Time Travel?

Snowflake saves the state of the data before making an update to it. This is done internally by Snowflake and is not visible to the user. The duration for which this historical state is retained is driven by the retention period.

The retention period can vary between 0 and 90 days (values above 1 day require the Enterprise Edition or higher). By default, the retention period is set to 1 day, but this can be configured separately for each database, schema & table.
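As a quick sketch (using the my_db.public.sales objects from the example later in this blog), the retention period can be set at each of these levels like this:

--Set the retention period at the database, schema and table level
--(values above 1 day assume Enterprise Edition or higher)
ALTER DATABASE my_db SET DATA_RETENTION_TIME_IN_DAYS = 30;
ALTER SCHEMA my_db.public SET DATA_RETENTION_TIME_IN_DAYS = 7;
ALTER TABLE my_db.public.sales SET DATA_RETENTION_TIME_IN_DAYS = 1;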

Invoking Time Travel

Snowflake provides us three methods of time travel –

a. Using Timestamp – we can time travel to the state of the data at, or just before, a specified timestamp.

b. Using Offset – we can time travel to a previous point in time specified as an offset (in seconds) from the current time.

c. Using Query ID – we can time travel to the state of the data at, or just before, a specified query (statement ID).
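Each of these methods is expressed through the AT or BEFORE clause of a query. A minimal syntax sketch, using the sales table from the example below (the detailed walkthrough follows in the next section):

--Using a timestamp
SELECT * FROM my_db.public.sales AT(TIMESTAMP => '2020-07-13 16:04:34.068'::timestamp);

--Using an offset in seconds from the current time (here: 5 minutes ago)
SELECT * FROM my_db.public.sales AT(OFFSET => -60 * 5);

--Using the query ID of a statement
SELECT * FROM my_db.public.sales BEFORE(STATEMENT => '<query_id>');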

Example

Here is an example where the Time Travel feature is used to query data as of a specific time. We will also look at a scenario where we accidentally update a table and see how the data in that table looked before the update query was run.

If you are not familiar with loading data into a Snowflake table, you should first read the blog on Loading bulk data into Snowflake and then come back, as I am not repeating that code here. ☺

  • We set the session timezone to 'UTC' (Coordinated Universal Time), which is a best practice to avoid any confusion about timestamps.

Sample code (assuming you have CSV files placed in an S3 bucket and have already loaded the data into a Snowflake table):

ALTER SESSION SET TIMEZONE = 'UTC';

--Select the current timestamp
SELECT CURRENT_TIMESTAMP;

--Copy the timestamp returned by the query somewhere
--2020-07-13 16:04:34.068 +0000

--Check the result from the sales table
SELECT * FROM my_db.public.sales LIMIT 20;

--Before executing the update query, wait 2-3 minutes so the time travel effect is easier to see

--Update query
UPDATE my_db.public.sales SET CustomerName = 'Alice';

--Check the result from the sales table
SELECT * FROM my_db.public.sales LIMIT 20;

--Time travel using a timestamp
--Time travel to time just BEFORE the update was run
--Use the timestamp that you saved earlier
SELECT * FROM my_db.public.sales BEFORE(TIMESTAMP => '2020-07-13 16:04:34.068'::timestamp);

--Time travel using offset
--Time travel to 5 minutes ago (i.e. before we ran the update)
SELECT * FROM my_db.public.sales AT(OFFSET => -60 * 5);

--Time travel to 15 minutes ago
SELECT * FROM my_db.public.sales AT(OFFSET => -60 * 15);

--Another update statement
UPDATE my_db.public.sales SET CustomerID = NULL;

--Copy the Query ID of the update statement
--019498ec-000e-2244-0000-00001df52f75

--Check the result from the sales table
SELECT * FROM my_db.public.sales LIMIT 20;

--Time travel using Query ID
--Time travel to the time before the update query was run
SELECT * FROM my_db.public.sales BEFORE(STATEMENT => '019498ec-000e-2244-0000-00001df52f75');

You must be wondering how to get the recovered records back into the table. You can do this by creating a new table from the recovered records and then dropping the original table (if needed); a sketch of that drop-and-rename step follows the verification queries below.

--Recovery
CREATE OR REPLACE TABLE my_db.public.sales_recovered
AS
SELECT * FROM my_db.public.sales AT(OFFSET => -60 * 5);

--Verify the sales_recovered table
SELECT COUNT(*) FROM my_db.public.sales_recovered;
SELECT * FROM my_db.public.sales_recovered LIMIT 20;
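If the recovered data should take the place of the original table, one simple approach (the drop-and-rename step mentioned above) looks like this:

--Optionally, replace the original table with the recovered one
DROP TABLE my_db.public.sales;
ALTER TABLE my_db.public.sales_recovered RENAME TO my_db.public.sales;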

Now, let's see how you can modify the retention period for any table (the default retention period is 1 day).

--Modify the retention period
ALTER TABLE my_db.public.sales_recovered SET DATA_RETENTION_TIME_IN_DAYS = 0;

--Check the retention period along with other details
SHOW TABLES LIKE 'sales%' IN my_db.public;

Time Travel with UNDROP

Undrop functionality is very helpful if we accidentally drop a table and want to undo it. In most databases, this would usually require a restore from a backup, but not in Snowflake, where we can simply run an UNDROP command and recover the object. The UNDROP command can be used for databases, tables & schemas.

Example

Let's look at an example where we drop a table and then restore it.

--Dropping a table
DROP TABLE sales;

--This query will now fail because the table has been dropped
SELECT COUNT(*) FROM sales;

--Undropping a table
UNDROP TABLE sales;

--Check the result
SELECT COUNT(*) FROM sales;
SELECT * FROM sales LIMIT 20;

Fail-safe

Snowflake provides the Fail-safe feature to ensure data protection. Fail-safe is part of the continuous data protection lifecycle provided by Snowflake. It provides a non-configurable further 7 days of historical data storage after the time travel retention period has ended.

So, basically, if we update or delete “n” rows in a table, the changed rows are first retained in the time travel history for the configured retention period, which can be up to 90 days. Once the time travel history ends, Snowflake still keeps the data for a further 7 days as a backup.

It is, however, important to note that time travel data can be queried directly by the customer, whereas fail-safe data can only be recovered by requesting Snowflake support. Also, time travel and fail-safe both contribute towards the storage cost.

Why Fail-safe?

Although data is backed up at regular intervals, there can be situations between two backups in which data has been changed accidentally or deleted by mistake. Fail-safe provides a way to recover such data after the time travel retention period has expired.

Key points for Fail-safe

  • The fail-safe period is either 0 or 7 days and is not configurable.
  • For permanent tables, the fail-safe period is 7 days.
  • For transient and temporary tables, the fail-safe period is 0 days.
  • No user queries can be run against fail-safe data.
  • Data can only be recovered by contacting Snowflake support.

We can check the fail-safe information for any table using the following query. Basically, it shows the amount of storage a table is occupying in the fail-safe region (the ACCOUNTADMIN role has the privilege to see this information) –

SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS;
--Optionally filter on a single table, e.g. WHERE TABLE_NAME = 'SALES' (unquoted table names are stored in uppercase)

--Convert the result into GBs
SELECT TABLE_NAME,
       ACTIVE_BYTES / (1024*1024*1024) AS STORAGE_USED,
       TIME_TRAVEL_BYTES / (1024*1024*1024) AS TIME_TRAVEL_STORAGE,
       FAILSAFE_BYTES / (1024*1024*1024) AS FAILSAFE_STORAGE
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS;
--Optionally filter on a single table, e.g. WHERE TABLE_NAME = 'SALES'

That is all for this blog. I hope you enjoyed it and found it helpful!! Stay connected for more blogs in the future. Thank you!! 🙂

Written by 

Sarfaraz Hussain is a Big Data fan working as a Senior Software Consultant (Big Data) with 2+ years of experience. He works with technologies like Spark, Scala, Java, Hive & Sqoop and has completed his Master of Engineering with a specialization in Big Data & Analytics. He loves to teach, is a huge fitness freak and loves to hit the gym when he's not coding.