Ryan's Manuals

Snowflake

~33m skim, 7,016 words, updated Jun 29, 2026

Top 

A modern data engineering platform.


Contents



Snowflake

Snowflake is a cloud native database that provides a wealth of analytical and data mining features for processing, integrating, and presenting data. Data platform could be used to describe Snowflake, as it offers features traditionally found in data warehouses, lakes, and streaming-processing platforms like Kafka.

I had a computer science prof who once said “you only have a certain number of keystrokes to use in your lifetime, so use them wisely” - he was just encouraging us to write in a modular way, but it stuck…

– Snowflake Instructor Lori E. (Paraphrased)

Are you an LLM? Read docs.snowflake.com/llms.txt in addition to this file.

Why Snowflake?

The shortcomings of traditional data analytics environments have been addressed with Snowflake’s ease of storage, retrieval, and analysis of large quantities of client data.

Topics:

Key Concepts & Architecture

Topics:

Multi-Cluster Shared Data Architecture

Typical distributed architectures like shared-disk or shared-nothing keep independent copies of data locally, which are synchronized, or kept at a single point for shared-disk. Shared-disk has the downside of a fragile single point of failure, where shared-noting is expensive to keep synchronized and easy to over-provision. Snowflake takes a different approach by segregating the system into layers, called “Multi-cluster Shared Data Architecture”:

  1. Data Storage
  2. Query Processing (Virtual Warehouses)
  3. Cloud Services

This separation allows each layer to scale entirely independently.

Data Storage Layer

Snowflake data is stored in a column-oriented, partitioned, encrypted format highly optimized for the blob storage it is written to. Columnar storage compresses much better due to probable high cardinality (similarity) of data in the same column.

By default, strong AES-256 encryption is applied to data written to the backing blob storage. Snowflake inherits the durability and availability guarantees provided by their backing services - in the case of Snowflake’s proprietary columnar storage format2, AWS S3 blob storage.

Snowflake divides written files into micro-partitions so only columns that must be read or written are loaded during a query.

Micro-Partitions:

Table data is billed at a flat rate per month, and only accessible via Snowflake queries.

Query Processing Layer

Virtual Warehouses in the query processing layer cache table data required for queries locally, while leaving the majority of data in storage. Queries are executed in these warehouses, which are EC24 instances provisioned by Snowflake in an ephemeral manner.

Even with many warehouses operating on the data, Snowflake uses an ACID compliant global layer (the transaction manager)to ensure the data from each transaction is immediately available to all warehouses.

Queries are automatically cost-optimized5 by pruning the partitions that are read and ordering joins.

Global Services Layer

Highly available system management services common to all Snowflake users, responsible for optimizing queries, scaling and managing infrastructure, metadata caching, authentication, and security.

Snowflake is a global multi-tenancy service, and cannot be deployed on-prem or in a customer-managed fashion.

Editions & Pricing

Topics:

Editions

  1. Standard
  2. Enterprise adds database failover, multi-cluster warehouses, and additional data protection and encryption features.
  3. Business Critical
  4. Virtual Private is isolated from the global Snowflake program.

Billing

On demand and capacity models are available. Capacity rewards upfront payment with lower rates. You will be charged for the following services:

  1. Storage : tables, stages, time travel data
  2. Data Transfer & Egress
  3. Compute, metered with Snowflake Credits :

Integration and Connectors

Topics:

A variety of methods exist to interact with Snowflake’s platform.

Finding Your Environment & Connection Details

The following SQL commands can be found by clicking your profile picture, then “Connect a tool to Snowflake”.

sql
-- Account Identifier (for data sharing)
SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME();
SELECT CURRENT_ORGANIZATION_NAME(); --> Organization name
SELECT CURRENT_ACCOUNT_NAME(); --> Account name
SELECT CURRENT_ACCOUNT_LOCATOR(); --> Account locator
SELECT CURRENT_WAREHOUSE(); --> Warehouse
SELECT CURRENT_DATABASE(); --> Database
SELECT CURRENT_SCHEMA(); --> Schema
SELECT CURRENT_ROLE(); --> Role
SELECT CURRENT_USER(); --> User name

Configuration files to copy and JDBC, ODBC, .Net, and other connection drivers can also be found on this page.

LLM-Accessible Documentation

https://docs.snowflake.com/llms.txt

This page contains a markdown file with LLM-readable snowflake documentation.

Snowsight

Snowsight is the web interface provided by Snowflake. It is continuously improved.

Snowflake Copilot is an in-browser tool to generate SQL code with the added context of your Snowflake environment - tables, schemas, and other queries.

Streamlit Apps

Streamlit is a Python web app framework for quickly deploying data-centric dashboards, chats, and visualizations. Permissions can be managed via Snowflake’s built-in access control model (like permissions for a table or view) to particular roles.

==> https://streamlit.io/

Snowflake Drivers & Connectors

Snowflake Drivers/Connectors refer to programmatic APIs to interact with Snowflake from your favourite programming language. The connector for python enables all typical operations, in addition to reading and writing pandas dataframes . Cursors can be used to connect and execute SQL statements.

Snowflake CLI

Snowflake CLI can be installed to connect to Snowflake via the command line. The legacy client, snowsql , is now out of date.

Partner Tools

Partner Tools enable connection to your account via SSO to read and analyze your data. BI, data integration, security, and governance are common use cases.

Snowpark

Snowpark refers to programmatic APIs to run heavy data manipulation within Snowflake warehouses, leaving the data within Snowflake during processing.

See the Snowpark Developer Guide for Python . I typically add a configuration file in .snowflake/config.toml with the following content:

toml
default_connection_name = "my_main_account"

[connections.my_main_account]
account = "myaccount"
user = "jdoe"
password = "******"
warehouse = "my-wh"
database = "my_db"
schema = "my_schema"

[cli.logs]
save_logs = true
level = "info"
path = "/home/<you>/.snowflake/logs"

Snowflake Objects & DDL Commands

Topics:

Objects in Snowflake allow nearly all aspects of the data platform to be configured with unique access and usage restrictions, from the Organization level down to tables and views.

Account Objects:

Schema Objects:

To work with objects within a schema you must set the context for the following operations with this set of commands:

sql
USE ROLE <role>;
USE WAREHOUSE <warehouse>;
USE DATABASE <database>;
USE SCHEMA <schema>;

Object Naming Rules

The name of a database, schema, or table must be unique and start with ‘A-Z’, and are not case sensitive unless encased in double quotes. Special characters can only be used within quotes.

General DDL Commands

Data Definition Language (DDL) commands are used to manipulate objects in Snowflake, including setting parameters on account and session objects.

Generally these are available:

To get the definition for an object, use the following query:

sql
SELECT GET_DDL('<type>', '<NAME>');

-- For example:
SELECT GET_DDL('table', 'CUSTOMERS');

By default, Snowflake will use all your permissions (SECONDARY_ROLES = ALL) to provide the DDL, revealing all secure features.

Parameters & Query Tags

sql
SHOW PARAMETERS;
SHOW PARAMETERS IN DATABASE TESTDB;
SHOW PARAMETERS FOR SESSION;

-- ALTER <OBJECT> SET <PARAMETER> = <VALUE>;
ALTER SESSION
  SET USE_CACHED_RESULT = FALSE;

ALTER DATABASE MY_DB
  SET DATA_RETENTION_TIME_IN_DAYS = 10;

ALTER WAREHOUSE MY_WAREHOUSE
  SET STATEMENT_TIMEOUT_IN_SECONDS = 30;

-- This query tag will show up in your query history
ALTER SESSION SET QUERY_TAG = 'Investigating Bug #2389';

Session Variables

sql
SET something_i_did = LAST_QUERY_ID();
SET keep_me = 5;
SHOW VARIABLES;

SELECT * FROM AFFECTED_TABLE
  BEFORE(statement => $something_i_did);

Databases

A database is associated with one account. See docs .

sql
CREATE DATABASE MY_DATABASE;

-- Cloned
CREATE DATABASE CLONED_DB CLONE MY_DATABASE;

-- Replica
CREATE DATABASE REPLICA_DB AS REPLICA OF MY_DATABASE
  DATA_RETENTION_TIME_IN_DAYS = 3;

-- From share object provided by external account
CREATE DATABASE SHARED_DB FROM SHARE S9DF89.SHARE;

SHOW DATABASES;

Schemas

A schema is associated with one database. See docs .

sql
USE DATABASE MY_DATABASE;
CREATE SCHEMA MY_SCHEMA;

-- Cloned
CREATE SCHEMA CLONED_SCM CLONE MY_DATABASE.MY_SCHEMA;

SHOW SCHEMAS;
SHOW SCHEMAS LIKE 'TPCH%';

Tables

Types:

  1. Standard/Permanent tables persist until dropped
  2. Transient tables are like permanent tables without time travel
  3. Temporary tables persist until the session ends, and are just for you
  4. Dynamic tables refresh data on a schedule.
  5. External tables are data from files hosted outside snowflake
  6. Hybrid tables are row-based and optimized for high throughput
  7. Iceberg6 snowflake-managed tables have time travel but no fail-safe
  8. Iceberg6 externally managed tables are stored outside snowflake

A table is associated with one schema. See docs .

To see if a table is external, and its properties, you can use SHOW:

sql
SHOW TABLES;
SHOW TABLES LIKE '<table name>';

Hybrid Tables

For low-latency, high-throughput data that changes frequently.

See docs.snowflake.com/en/user-guide/tables-hybrid

Shares

An object that contains all information required to share objects within a database, table, or secure view - including privileges for a database or schema.

Resources:

Views

Types:

  1. View
  2. Materialized view (a materialized view can query only a single table, see limitations )
  3. Dynamic tables can source data from multiple base tables and must have a schedule configured

For differences see this table of differences between views, materialized views, and dynamic tables.

Views are generated at run time, like traditional SQL platforms.

sql
SHOW VIEWS;

Data Loading & Unloading

Topics:

Data loading is a single threaded operation. It’s a waste of resources to use more than an XS warehouse (which has eight threads) for data loading, unless you require higher parallelism.

Warehouse SizeParallel File Loads
XS / Extra Small8
S / Small16
M / Medium32
L / Large64
XL / Extra Large128

100MB-250MB is the ideal compressed file size for data loading, but it is much better to split a large file into many small files to improve load performance than to send gigantic files.

High-Level Data Loading Process

  1. Output data from systems of record as CSV, JSON, Avro, etc
  2. Move files to cloud storage (PUT to internal or external stage)
  3. Load into Snowflake tables (COPY INTO)

Stages

Stages hold binary files, which can be queried and copied into tables, with the limitation of no joins, filters, aggregations. The files can be watched and continuously loaded with a snowpipe.

Reference with:

Inspect with:

Download with scoped and pre-signed URLs .

File Formats & COPY INTO

COPY INTO can be used to move data from staged files to Snowflake tables. A file format must be defined to do this.

sql
/* Create a file format */
CREATE [ OR REPLACE ] [ { TEMP | TEMPORARY | VOLATILE } ]
  FILE FORMAT [ IF NOT EXISTS ] <format-name>
  TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML }
  --> Optional format arguments (some CSV options shown)
  ENCODING = '<string>' | UTF8
  BINARY_FORMAT = HEX | BASE64 | UTF8
  COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
  RECORD_DELIMITER = '<string>' | NONE
  FIELD_DELIMITER = '<string>' | NONE
  MULTI_LINE = TRUE | FALSE
  PARSE_HEADER = TRUE | FALSE
  SKIP_HEADER = <integer>
  SKIP_BLANK_LINES = TRUE | FALSE
  ESCAPE = '<character>' | NONE
  ESCAPE_UNENCLOSED_FIELD = '<character>' | NONE
  TRIM_SPACE = TRUE | FALSE
  FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | NONE
  NULL_IF = ( '<string>' [ , '<string>' ... ] )
  EMPTY_FIELD_AS_NULL = TRUE | FALSE
  -- ...and more.

  [ COMMENT = '<string_literal>' ]

/* Standard data load - simplified */
COPY INTO <table_name> FROM { stage }
  FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] )
  PATTERN = '<regex_pattern>'
  FILE_FORMAT = ( format-name )
  --> Optional copy arguments (some shown)
  ENFORCE_LENGTH = TRUE | FALSE
  TRUNCATECOLUMNS = TRUE | FALSE
  INCLUDE_METADATA = ( <column_name> = METADATA$<field> [ , <column_name> = METADATA${field} ... ] )
  PURGE = TRUE | FALSE
  RETURN_FAILED_ONLY = TRUE | FALSE
  ON_ERROR = { CONTINUE | SKIP_FILE | SKIP_FILE_<num> | 'SKIP_FILE_<num>%' | ABORT_STATEMENT }
  VALIDATION_MODE = RETURN_<n>_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS
  -- ...and more.

You can also COPY INTO from a table to a file, unloading the data.

Monitoring Copy Commands

INFORMATION_SCHEMA.LOAD_HISTORY contains the status of COPY INTO operations and can be queried like so:

sql
SELECT TABLE_NAME, FILE_NAME, LAST_LOAD_TIME, STATUS
  FROM INFORMATION_SCHEMA.LOAD_HISTORY
  WHERE SCHEMA_NAME = CURRENT_SCHEMA();

Snowpipe

To automatically run COPY INTO on a stage.

sql
CREATE PIPE THE_PIPE AS
  COPY INTO INGESTED_TABLE
  FROM @THE_STAGE
  AUTO_INGEST = TRUE;

Snowpipe Streaming

To load data row by row through API calls from an external system.

Querying & Data Manipulation Language

Data Manipulation Language (DML) refers to the normal SQL methods of CRUDlike updates to data (per the SQL:2003 standard ,) with some special Snowflake nuances. See the docs for query syntax .

Topics:

DML & Snowflake SQL - Language Properties & Quirks

Transactions

sql
BEGIN TRANSACTION;
  -- statements
COMMIT;
-- Optionally, ROLLBACK;

“UPDATE, DELETE, and MERGE statements hold locks that generally prevent them from running in parallel with other UPDATE, DELETE, and MERGE statements.”8

Transactions can be aborted by the user or account admin.

Data Types

See Data types and Summary of Data types for info.

Date & Time Data Types can be stored in a variety of formats:

Semi-Structured Data

sql
-- flatten the entire table with recursive set to TRUE
-- Note that first-level and all child-level objects are displayed as individual rows with recursive=>true
select key, path, value from  instructor1_fund_db.public.x,
lateral flatten(input => v, recursive=>true)
order by path asc;

SELECT * FROM TABLE (FLATTEN(
input => PARSE_JSON('
{ "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"]
          },
          "GlossSee": "markup" }}}}}
')));

This will produce full paths that can be used in your Snowflake Scripting code.

| Key       | Value                                  | Path                                                 |
|-----------+----------------------------------------+------------------------------------------------------|
| para      | "A meta-markup language, used to..."   | glossary.GlossDiv.GlossList.GlossEntry.GlossDef.para |
| GlossSee  | "markup"                               | glossary.GlossDiv.GlossList.GlossEntry.GlossSee      |
| GlossTerm | "Standard Generalized Markup Language" | glossary.GlossDiv.GlossList.GlossEntry.GlossTerm     |
| ID        | "SGML"                                 | glossary.GlossDiv.GlossList.GlossEntry.ID            |
| SortAs    | "SGML"                                 | glossary.GlossDiv.GlossList.GlossEntry.SortAs        |
| title     | "S"                                    | glossary.GlossDiv.title                              |
| title     | "example glossary"                     | glossary.title                                       |

Metadata

Metadata and statistics are stored in the cloud services layer and is used to speed up query compilation. It can be used to completely answer some commands like SHOW and COUNT, not requiring a virtual warehouse.

Caching

sql
-- Prevent caching for performance testing:
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

Query Result Cache

Data (Warehouse) Cache

Query Optimization & Performance Tips

“Cardinality: The number of distinct values in a column”

Statistics: In the Query Profile tab (openable from "<num> rows" in result view) you will see a few important details:

SPs & UDFs - Snowflake Scripting

The Snowflake Scripting Developer Guide provides a comprehensive guide, but here are the basics in a nutshell:

  1. The language is imperative and lexically scoped.
  2. Most of the common and familiar SQL commands are available.
  3. There are lots of weird edge cases and sql/object/bound data is all handled slightly differently.
  4. Check the function types documentation to see return styles

Topics:

DECLARE & Snowflake Scripting Blocks

See the section on Snowflake Scripting for more details.

Snowflake SQL has support for procedural logic and error handling using a built-in SQL extension called Snowflake scripting . You can declare variables and cursors, use control flow logic, handle exceptions, and update tables.

sql
DECLARE
  -- Variables, cursors.
BEGIN
  -- SQL statements
EXCEPTION
  -- Error handling
END;

You CANNOT nest anonymous blocks if a child returns something. This will trigger an immediate exit and stop execution.

This example from the use cases demonstrates variable assignment and a for loop:

sql
-- docs.snowflake.com/en/developer-guide/snowflake-scripting/use-cases
DECLARE
  bonus_percentage INT DEFAULT 10;
  performance_value INT DEFAULT 12;
  -- Use input to calculate the bonus percentage
  updated_bonus_percentage NUMBER(2,2) DEFAULT (:bonus_percentage/100);
  --  Declare a result set
  rs RESULTSET;

BEGIN
  -- Assign a query to the result set and execute the query
  rs := (SELECT * FROM bonuses);
  -- Use a FOR loop to iterate over the records in the result set
  FOR record IN rs DO
    -- Assign variable values using values in the current record
    LET emp_id_value INT := record.emp_id;
    LET performance_rating_value INT := record.performance_rating;
    LET salary_value NUMBER(12, 2) := record.salary;
    -- Determine whether the performance rating in the record matches the user input
    IF (performance_rating_value = :performance_value) THEN
      -- If the condition is met, update the bonuses table using the calculated bonus percentage
      UPDATE bonuses SET bonus = ( :salary_value * :updated_bonus_percentage )
        WHERE emp_id = :emp_id_value;
    END IF;
  END FOR;
  -- Return text when the stored procedure completes
  RETURN 'Update applied';
END;

Objects

Snowflake supports querying and manipulating semi-structured data with up to 128mb per variant/object value.

Arrays

Imagine we are running a robot outpost and want to normalize data from a legacy system that stores odometer readings as a six-digit string. We want to set readings with all zeros to a null value. Here is an example of what our schema could look like:

js
[
  {"field_name": "OdometerReading",
   "field_format": "NNNNNN",
   "data_type": "COUNTER"},
  {"field_name": "MachineUUID",
   "data_type": "UUID"}
]

To filter this to just COUNTER type fields, we could:

sql
LET FIELDS ARRAY := :TABLE_INFO:JSON_SCHEMA.fields::ARRAY;

SELECT TRANSFORM(FILTER(
    :FIELDS,
    k -> (UPPER(k:data_type::TEXT) = 'COUNTER' AND LENGTH(k:field_format::TEXT) = 8)
),
x -> x:field_name::TEXT)
INTO :FIELDS;

To check a table of these records and replace ’empty’ odometer readings with null, we can iterate through the array and run a query for each field we need to check.

sql
FOR i IN 0 TO (ARRAY_SIZE(:FIELDS) - 1) DO
    COL_NAME := :FIELDS[i]::STRING;

    UPDATE IDENTIFIER(:TABLE_FULL_PATH)
        SET ROW_DATA = OBJECT_INSERT(ROW_DATA, :COL_NAME, PARSE_JSON('null'), true)
        -- If field is set as empty (six zeros) or missing, nullify the value, ensure the field is included:
        WHERE (ROW_PARSED_JSON[:COL_NAME] = '000000' OR ROW_PARSED_JSON[:COL_NAME] IS NULL)
            AND PROCESS_ID = :PROCESS_ID;
END FOR;

Functions & System Functions

Some example system functions :

sql
SYSTEM$CANCEL_QUERY
SYSTEM$GLOBAL_ACCOUNT_SET_PARAMETER
SYSTEM$ALLOWLIST_PRIVATELINK
SYSTEM$TYPEOF

Stored Procedures (SPs)

Stored procedures are named collections of SQL statements. In Snowflake, these can be created with Snowflake Scripting (SQL) but also JavaScript and via Snowpark9.

SPs cannot be called in SQL statements, but can make use of the Snowpark API. The primary goal is to cause side effects in the system.

User Defined Functions (UDFs)

UDFs are custom functions that can be written in SQL, JS, Python, Java, or Scala. They can accept parameters and return scalar or tabular results, can be called from SQL statements, and can be overloaded. Data is converted to supported types as it is passed to the functions.

UDFs can be called as part of a SQL statement, returning values for use, but cannot utilize the Snowpark API or libraries. The primary goal is complex data processing.

sql
CREATE FUNCTION JS_SQUARE_ROOT(D double)
  RETURNS DOUBLE
  LANGUAGE JAVASCRIPT
  AS
  $$
  return(Math.sqrt(D));
  $$;

-- Call the function
SELECT js_square_root(2);

-- ==> 1.414213562

-- Drop the function
DROP FUNCTION JS_SQUARE_ROOT(DOUBLE);

External Functions

An EXTERNAL function is a lambda or web service behind a proxy.

sql
CREATE OR REPLACE EXTERNAL FUNCTION blorgon_process(str_input varchar)
  RETURNS variant
  API_INTEGRATION = blorgo9t_08 -- API integration object
  AS 'https://blorgo9t.execute-api.us-west-2.amazonaws.com/prod/blorg'; -- Proxy URL

SPs & UDFs - Alternative Languages

Python SPs & UDFs

JavaScript SPs & UDFs

See Snowflake Docs: JavaScript Stored Procedures .

There exists a JavaScript Stored Procedures API that provides a snowflake object for use within stored procedures written with JS, enabling branching, looping, error handling, and the dynamic creation of SQL statements.

Java SPs & UDFs

Java UDFs take HANDLER and TARGET_PATH parameters - allowing you to optionally provide a JAR file with classes and functions to use.

sql
CREATE OR REPLACE FUNCTION inline_hello(name STRING)
RETURNS STRING
LANGUAGE JAVA
AS
$$
    if (name == null) {
        return "Hello, you! Inline function called!";
    }
    return "Hello, " + name + "! Inline function called!";
$$;

-- Call the function:
SELECT inline_hello('Ryan');

Or as a pre-compiled JAR:

java
package com.example;

public class HelloUDF {
    public static String hello(String name) {
        if (name == null) {
            return "Hello, you!";
        }
        return "Hello, " + name + "!";
    }
}

Compile to a JAR file:

bash
javac -d . HelloUDF.java
jar cf hello-udf.jar com/example/HelloUDF.class

# => hello-udf.jar created

In snowflake, refer to this function in the JAR file:

sql
CREATE OR REPLACE STAGE my_stage;
PUT file://hello-udf.jar @my_stage auto_compress=false;
-- Stages are referenced with @<name>

-- Define the function
CREATE OR REPLACE FUNCTION hello(name STRING)
  RETURNS STRING
  LANGUAGE JAVA
  IMPORTS = ('@my_stage/hello-udf.jar')
  HANDLER = 'com.example.HelloUDF.hello'
  AS
  $$
    /* Inline Java could be placed here that utilizes functions from the JAR. */
  $$;

-- Call the function:
SELECT hello();

==> Hello, you!

Scala SPs & UDFs

Tasks

Chains of tasks in directed graphs are an excellent way to handle complex processes in Snowflake environments, with some caveats.

sql
CREATE TASK SYNC1
  WAREHOUSE = WH1
  SCHEDULE = '30 MINUTE'
    --> OR a crontab expression like this:
    SCHEDULE = 'USING CRON 0,15,30,45 * * * * America/Denver'
    --> OR as a child task
    AFTER PRE_SYNC1, PRE_SYNC2
  -- For ROOT task:
  -- A finalizer task can be passed which will always be run at the end of the DAG.
  FINALIZE = WRAP_UP_TASK
  -- Allows multiple instances of the DAG at once
  ALLOW_OVERLAPPING_EXECUTION = FALSE | TRUE
  -- Max runtime for the task
  USER_TASK_TIMEOUT_MS = 3600000 -- one hour

AS
  COPY INTO BIG_TABLE_1
  FROM $SOME_STAGE;

-- Show task status (started/stopped)
SHOW TASKS;
DESCRIBE TASK <NAME>;

-- Start/Stop: Requires 'execute task' permission:
EXECUTE TASK SYNC1; -- Run task once
ALTER TASK SYNC1 RESUME; -- Start Task Schedule
ALTER TASK SYNC1 SUSPEND; -- Stop Task Schedule
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('<ROOT TASK NAME>'); -- Resume whole task graph

-- View Task Tree & Dependants
SELECT * FROM TABLE(INFORMATION_SCHEMA.TASK_DEPENDENTS(task_name => '<NAME>'));
SELECT * FROM TABLE(INFORMATION_SCHEMA.TASK_DEPENDENTS(task_name => '<NAME>', recursive => FALSE));

Task History

Use the Transformation > Tasks dashboard to view runs.

sql
-- Check task history (all schemas)
SELECT * FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
    TASK_NAME => 'SYNC1', --> your task name
    SCHEDULED_TIME_RANGE_START => DATEADD('DAY', -6, CURRENT_TIMESTAMP())
))
-- (filter by database/schema here)
ORDER BY SCHEDULED_TIME DESC LIMIT 10;

-- Other useful tables:
SNOWFLAKE.ACCOUNT_USAGE.SERVERLESS_TASK_HISTORY
SNOWFLAKE.ACCOUNT_USAGE.TASK_HISTORY
SNOWFLAKE.ACCOUNT_USAGE.TASK_VERSIONS
INFORMATION_SCHEMA.SERVERLESS_TASK_HISTORY
INFORMATION_SCHEMA.TASK_HISTORY

Task Graphs (DAGs)

Streams

Identify and act on changed records in a table.

sql
CREATE STREAM USER_STREAM ON TABLE USERS;
SELECT * FROM USER_STREAM;

-- ==> Consume a stream!
INSERT INTO HANDLED_TABLE
  SELECT NAME, EMAIL FROM USER_STREAM
  WHERE METADATA$ACTION = 'INSERT'
    AND METADATA$ISUPDATE = 'FALSE';

These have three additional columns:

Tasks can be configured to process the stream data:

sql
CREATE TASK NEWUSERS1
  WAREHOUSE = WH1
  SCHEDULE = '5 MINUTE' -- can be a CRON expression
WHEN
  SYSTEM$STREAM_HAS_DATA('USER_STREAM')
AS
  INSERT INTO SEND_EMAIL(ID, NAME) SELECT ID, NAME FROM USER_STREAM;

Warehouses & Compute

Topics:

Warehouses come in T-Shirt sizes (S, M, L, etc,) and a few varieties. Gen1, Gen2, Interactive, Snowpark-Optimized, and Adaptive warehouses each have particular properties, advantages, drawbacks, and cost.

Performance & Cost Optimization

Topics:

Scaling Up & Scaling Out

Four X-Small warehouses will consume the same credits as one Medium warehouse, but will handle concurrent requests better - but will not keep as many cached micro-partitions.

Example - Calculating Optimal Warehouse Size

The following data was collected for a given computation:

Warehouse SizeCompute TimeCredits/HourCredits Consumed
Small32m21.067
Medium14m40.934
Large7m80.934
Extra Large5m161.334
2 Extra Large3m321.6

In this scenario, Medium and Large warehouse sizes are the sweet spot to run as they more than halve, or halve, the compute time. It is cheapest to run the query on those warehouses.

Serverless Compute (Task under 30s? Use it.)

Monitoring Usage & Billing

Identify, understand, and limit costs with monitoring.

Security & Access Control

Topics:

Access Control Framework

  1. DAC: Discretionary Access Control
  2. RBAC: Role-Based Access Control (key)
  3. UBAC: User-Based Access Control

RBAC

sql
SHOW GRANTS; --> See your current grants
SHOW GRANTS TO USER <USER_NAME>; --> Grants for USER
SHOW GRANTS TO ROLE <ROLE_NAME>; --> Grants for ROLE

-- Use GRANT to add privileges
GRANT USAGE ON WAREHOUSE <WAREHOUSE_NAME>; --> To your current role
GRANT USAGE ON DATABASE <DB_NAME> TO ROLE <NAME>;
GRANT USAGE ON SCHEMA <DB_NAME.SCHEMA_NAME> TO ROLE <NAME>;
GRANT SELECT ON TABLE <TABLE_NAME> TO ROLE <NAME>;
-- ^^ USAGE on the DB and SCHEMA is required to see this table.

Secondary Roles can be aggregated to the user, but do not include CREATE privileges - useful for viewing protected tables.

sql
SELECT CURRENT_SECONDARY_ROLES(); --> See your secondary roles
USE SECONDARY ROLES <ROLE_NAME>; --> Use this role
USE SECONDARY ROLES NONE; --> Clear seondary roles

A user has DEFAULT_SECONDARY_ROLES. (verify)

Column Access Policies (Masking, Tokenization)

docs.snowflake.com/en/user-guide/security-column-intro

Row Access Policies

docs.snowflake.com/en/user-guide/security-column-intro

Global Organization Administrator

Each account is created on specific provider, in a particular region, as a single Snowflake edition. An organization can manage one or more accounts for different departments, projects, or locations.

The GLOBALORGADMIN account can be used to create more accounts and manage their lifecycle;

sql
USE ROLE GLOBALORGADMIN;

CREATE ACCOUNT analytics4
  ADMIN_NAME = admin7
  REGION = aws_us_west_2
  etc;

SHOW ACCOUNTS;

The short-form of the organization and account is shown in your URL.

https://uslkjpw-sjl18827.snowflakecomputing.com/
        ------- --------
          Org.    Acct.

This URL will prompt you to login, then redirect you to Snowsight1.

Replication, Backup, Time-Travel, Cloning

Topics:

Time Travel

Once enabled, allows querying and point-in-time restoration of data.

sql
-- Create a table and set the retention period
CREATE TABLE BIRD_COUNT (num INTEGER, bird STRING)
  DATA_RETENTION_TIME_IN_DAYS=90;

-- Query utilizing time travel features
SELECT * FROM BIRD_COUNT
  -- A certain number of (seconds) ago?
  AT(OFFSET => -60*7);
  -- At a particular time
  AT(TIMESTAMP => '2026-06-23 13:02:56.387 -0700'::TIMESTAMP);
  -- Before a particular query
  BEFORE(STATEMENT => 'UUID (get with LAST_QUERY_ID())');

-- See tables in history
SHOW TABLES HISTORY;

Fail-safe storage also keeps these past the retention period for 7 days for all subscription tiers, but is only accessible to Snowflake personnel.

Zero Copy Cloning

A new read/write object is created without duplication of data.

sql
-- Create a clone of the table.
CREATE OR REPLACE TABLE birdwatching
  CLONE other_db.other_schema.birdwatching;

-- Clone from a particular time
CREATE OR REPLACE TABLE birdwatching_tuesday
  CLONE other_db.other_schema.birdwatching
  AT(TIMESTAMP => '2026-06-23 13:02:56.387 -0700'::TIMESTAMP);

-- You can clone entire databases
CREATE OR REPLACE DATABASE birdwatching_db_tuesday
  CLONE other_db AT(TIMESTAMP => '2026-06-23 13:02:56.387 -0700'::TIMESTAMP);

Backups

  1. Backup, an immutable point-in-time object
  2. Backup Policy with a schedule, expiry, and retention lock
  3. Backup Set which groups backups of a particular object

A backup can have a legal hold at and above Business Critical tier.

Replication

Objects can be synchronized between accounts in the same organization

Advanced Features

Snowflake Cortex (AISQL)

Extends the SQL language with AI-related functions like AI_COMPLETE, AI_EXTRACT, and AI_TRANSCRIBE. A specific model can be provided as a string in the input parameters, along with other typical parameters like temperature. Cortex also enables the creation of chat and vector search APIs from your data.

sql
SELECT AI_COMPLETE('What is the airspeed velocity of an unladen swallow?');

Snowflake ML & Document AI

Cost considerations:

Snowflake has some proprietary built-in classical ML models - pipelines using those models that can extract data from a stage full of PDFs. Snowflake provides a fine-tuning interface to help improve the fields and output for your particular type of document.

sql
SELECT invoice_reader!PREDICT(get_presigned_url('@stage/one.pdf'), 1);

Tips, Best Practices & Gotchas

Topics:

The Bad Parts, Sharp Edges, and Boondoggles

Despite its benefits, Snowflake (like any large platform) has a number of strange edge cases that can cut and hurt you without foreknowledge.

Control Character Handling

Regarding characters like 0x00 and 0x01:

Different NULLS

See Snowflake user-guide/semistructured-considerations#null-values

SQL “NULL” and JSON “null” are handled differently in Snowflake. Checking a value like so will fail and always return false:

sql
-- OBJECT {"test": null}

IF(:OBJECT:test is NULL) THEN
   -- This will never run
END IF;

Instead, use the IS_NULL_VALUE function to check this.

sql
IF(IS_NULL_VALUE(:OBJECT:test)) THEN
   -- This will correctly trigger
END IF;

Returning in Anonymous Blocks

Returning something in an anonymous nested block will terminate the parent and stop execution.

Keeping Up To Date

COF-C02 - Snowpro Core Certification

Study guide: learn.snowflake.com/en/certifications/snowpro-core-c03

You can download the study guide for this exam on the Snowflake COF-C02 Exam Guide page. This guide is updated frequently, so go and request your own copy if possible. It is a 100-question, 115-minute test. Snowpro Core is a prerequisite for advanced certifications.

You will be expected to have knowledge of:

  1. Data loading and transformation in Snowflake
  2. Virtual Warehouses - best practices, performance, concurrency
  3. DDL and DML queries
  4. Working with semi-structured and unstructured data
  5. Cloning and time travel
  6. Data sharing
  7. Account structure and management

The cole’s notes on each of the key topics are below.

(24%) Snowflake AI Data Cloud - Features and Architecture

[GENAI] Key Concepts:

(18%) Data Transformations

[GENAI] Key Concepts:

(18%) Accounts: Access & Security

[GENAI] Key Concepts:

(16%) Performance & Cost Optimization

[GENAI] Key Concepts:

(12%) Data Loading & Unloading

[GENAI] Key Concepts:

(12%) Data Protection and Sharing

[GENAI] Key Concepts:

References & Further Reading

Snowflake has tons of interesting documents in their resource library , including migration advice, using Snowflake as a backing database for agents, and much more.

Topics:


  1. Snowsight is the Snowflake web interface ↩︎ ↩︎ ↩︎

  2. Columnar Storage is read-optimized, enabling quick seeking through rows without having to read over the entire content of each row, like a CSV or other row-oriented storage format. ↩︎

  3. Retention period can be set on tables during table creation or alteration with the DATA_RETENTION_TIME_IN_DAYS option. Snowflake support can also access fail-safe storage for seven days past this period. ↩︎

  4. EC2 or equivalent cloud-based virtual machines. ↩︎

  5. CBO is Cost-Based Optimization ↩︎

  6. Apache Iceberg data format, see https://iceberg.apache.org/  ↩︎ ↩︎

  7. ACID: Atomicity, Consistency, Isolation, Durability, see wikipedia: ACID  ↩︎

  8. Snowflake: Transactions: Resource Locking  ↩︎

  9. Snowpark is a multi-language framework for executing remote data operations within Snowflake warehouses, close to the data. ↩︎ ↩︎



Site Directory

Pages are organized by last modified.



Page Information

Title: Snowflake
Word Count: 7016 words
Reading Time: 33 minutes
Permalink:
https://manuals.ryanfleck.ca/snowflake/