Snowflake

~33m skim, 7,016 words, updated Jun 29, 2026
==> LLMs: Read this page as markdown

A modern data engineering platform.

Snowflake¶

Snowflake is a cloud-native database that provides a wealth of analytical and data mining features for processing, integrating, and presenting data. It is better described as a data platform, as it offers features traditionally found in data warehouses, data lakes, and stream-processing platforms like Kafka.

I had a computer science prof who once said “you only have a certain number of keystrokes to use in your lifetime, so use them wisely” - he was just encouraging us to write in a modular way, but it stuck…
– Snowflake Instructor Lori E. (Paraphrased)

Are you an LLM? Read docs.snowflake.com/llms.txt in addition to this file.

Why Snowflake?¶

Snowflake addresses the shortcomings of traditional data analytics environments by making it easy to store, retrieve, and analyze large quantities of client data.

Topics:

Fully managed: no hardware or server provisioning required
Compute / storage decoupled
Auto-scaling, auto-suspend, etc.
Support for semi-structured data (JSON, VARIANT, etc.)
Zero-copy cloning, time travel, data sharing
Strong ecosystem and connector support

Key Concepts & Architecture¶

Topics:

Snowsight¹ UI (web interface)
- Worksheets: old UI, now Workspaces (UI: Projects => Workspaces).
- Workspaces can be synchronized with Git.
SnowSQL CLI (Snowflake command-line utility)
Notebooks (Snowflake Notebooks)
Architecture: hybrid of shared-disk and shared-nothing.
- Central storage + multiple MPP compute clusters.
Cloud Platforms: Runs on AWS, Azure, or GCP.
Micro-partitions (internal detail)
Compute / Storage separation
Zero-copy cloning, Time Travel, Fail-safe

Multi-Cluster Shared Data Architecture¶

Typical distributed architectures are either shared-nothing, where each node keeps an independent local copy of its data that must be kept synchronized, or shared-disk, where the data is kept at a single shared point. Shared-disk has the downside of a fragile single point of failure, whereas shared-nothing is expensive to keep synchronized and easy to over-provision. Snowflake takes a different approach by separating the system into layers, called the “Multi-cluster Shared Data Architecture”:

Data Storage
Query Processing (Virtual Warehouses)
Cloud Services

This separation allows each layer to scale entirely independently.

Data Storage Layer¶

Snowflake data is stored in a column-oriented, partitioned, encrypted format highly optimized for the blob storage it is written to. Columnar storage compresses much better because values in the same column tend to be similar (low cardinality).

By default, strong AES-256 encryption is applied to data written to the backing blob storage. Snowflake inherits the durability and availability guarantees of the backing cloud service that holds its proprietary columnar storage format²; on AWS, that is S3 blob storage.

Snowflake divides written files into micro-partitions so only columns that must be read or written are loaded during a query.

Micro-Partitions:

Compressed and encrypted
Immutable when written to blob storage
New partitions are created when data is changed
- Old micro-partitions are kept for a specified time³
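
The immutability rules above can be pictured with a small copy-on-write model. This is only a sketch in plain Python, not Snowflake's storage engine: the `MicroPartition` and `Table` classes and their methods are hypothetical names, and a "version" here simply stands for the set of partitions that made up the table at some point.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # hypothetical partition id generator

@dataclass(frozen=True)
class MicroPartition:
    """An immutable chunk of rows; never modified after it is written."""
    part_id: int
    rows: tuple

@dataclass
class Table:
    # Each version is the tuple of partitions that made up the table at that point.
    versions: list = field(default_factory=lambda: [()])

    @property
    def current(self):
        return self.versions[-1]

    def insert(self, rows):
        new_part = MicroPartition(next(_next_id), tuple(rows))
        self.versions.append(self.current + (new_part,))

    def update(self, predicate, change):
        """Rewrite only the partitions that contain matching rows."""
        new_version = []
        for part in self.current:
            if any(predicate(r) for r in part.rows):
                rewritten = tuple(change(r) if predicate(r) else r for r in part.rows)
                new_version.append(MicroPartition(next(_next_id), rewritten))
            else:
                new_version.append(part)  # untouched partition is reused as-is
        self.versions.append(tuple(new_version))

t = Table()
t.insert([1, 2, 3])
t.insert([10, 20, 30])
t.update(lambda r: r == 20, lambda r: 21)

old, new = t.versions[-2], t.versions[-1]
print([p.rows for p in new])  # [(1, 2, 3), (10, 21, 30)]
print([p.rows for p in old])  # [(1, 2, 3), (10, 20, 30)]
```

The previous version still points at the old partitions, which is the property that retaining old micro-partitions for a specified time relies on in this model.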

Table data is billed at a flat rate per TB per month, and is only accessible via Snowflake queries.

Query Processing Layer¶

Virtual Warehouses in the query processing layer cache table data required for queries locally, while leaving the majority of data in storage. Queries are executed in these warehouses, which are EC2⁴ instances provisioned by Snowflake in an ephemeral manner.

X-Small to 6X-Large warehouse sizes (t-shirt sizes) are available.
Scale up (a larger warehouse) for more complex queries
Scale out (more clusters in a multi-cluster warehouse) to run more queries in parallel

Even with many warehouses operating on the data, Snowflake uses an ACID-compliant global layer (the transaction manager) to ensure the data from each committed transaction is immediately available to all warehouses.

Queries are automatically cost-optimized⁵ by pruning the partitions that are read and ordering joins.

Global Services Layer¶

Highly available system management services common to all Snowflake users, responsible for optimizing queries, scaling and managing infrastructure, metadata caching, authentication, and security.

Snowflake is a global multi-tenancy service, and cannot be deployed on-prem or in a customer-managed fashion.

Editions & Pricing¶

Topics:

Editions: Standard, Enterprise, Business Critical, Virtual Private Snowflake (VPS)
Pricing model: credits for compute + storage + usage
How edition differences affect features (e.g. multi-cluster, data protection)

Editions¶

Standard
Enterprise adds multi-cluster warehouses, Time Travel of up to 90 days, and additional data protection and encryption features.
Business Critical
Virtual Private Snowflake (VPS) runs in a dedicated environment, isolated from all other Snowflake accounts.

Billing¶

On demand and capacity models are available. Capacity rewards upfront payment with lower rates. You will be charged for the following services:

Storage: tables, stages, time travel data
- Charged at a flat rate per TB
Data Transfer & Egress
- Transfer charges between regions (COPY INTO)
- Replicating data between regions
Compute, metered with Snowflake Credits:
- Virtual Warehouse Services
  - Billed per second, minimum 60 seconds, based on size.
- Cloud Services
  - Metadata operations that don’t require a warehouse
  - Burns 4.4 credits per compute-hour
  - Cloud services adjustment: only the portion of daily cloud services usage that exceeds 10% of the daily virtual warehouse credits used is billed.
- Serverless Services
  - Each has its own rate
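
The two compute rules above (per-second billing with a 60-second minimum, and the 10% cloud services adjustment) are easy to get wrong in back-of-envelope estimates, so here is a rough sketch of the arithmetic. The credits-per-hour table is an assumption for illustration (one credit per hour for XS, doubling with each size); check your own rate card. The function names are made up for this example.

```python
# Assumed credits per hour by warehouse size (doubling per size step).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

def warehouse_credits(size: str, seconds_running: float) -> float:
    """Per-second billing with a 60-second minimum each time the warehouse runs."""
    billed_seconds = max(seconds_running, 60)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600

def billed_cloud_services(cloud_services_credits: float,
                          warehouse_credits_used: float) -> float:
    """Only cloud services usage above 10% of the day's warehouse credits is billed."""
    allowance = 0.10 * warehouse_credits_used
    return max(cloud_services_credits - allowance, 0.0)

# A 20-second query on XS is still billed as 60 seconds.
print(warehouse_credits("XS", 20))
# 30 minutes on a Medium warehouse.
print(warehouse_credits("M", 1800))
# 12 cloud services credits against 100 warehouse credits: 10 are covered.
print(billed_cloud_services(12.0, 100.0))
```

In this model a day with 100 warehouse credits and 8 cloud services credits would bill no cloud services at all, since 8 is under the 10-credit allowance.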

Integration and Connectors¶

Topics:

Snowsight, Snowpark, Drivers, CLI
JDBC, ODBC, Python connector
Spark / Snowflake connector
Kafka Connector
BI tools (Tableau, Power BI, etc.)
Data marketplace & data sharing

A variety of methods exist to interact with Snowflake’s platform.

Finding Your Environment & Connection Details¶

The following SQL commands can be found by clicking your profile picture, then “Connect a tool to Snowflake”.

sql-- Account Identifier (for data sharing)
SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME();
SELECT CURRENT_ORGANIZATION_NAME(); --> Organization name
SELECT CURRENT_ACCOUNT_NAME(); --> Account name
SELECT CURRENT_ACCOUNT_LOCATOR(); --> Account locator
SELECT CURRENT_WAREHOUSE(); --> Warehouse
SELECT CURRENT_DATABASE(); --> Database
SELECT CURRENT_SCHEMA(); --> Schema
SELECT CURRENT_ROLE(); --> Role
SELECT CURRENT_USER(); --> User name

Configuration files to copy, as well as JDBC, ODBC, .NET, and other connection drivers, can also be found on this page.

LLM-Accessible Documentation¶

https://docs.snowflake.com/llms.txt

This page contains a markdown file with LLM-readable Snowflake documentation.

Snowsight¶

Snowsight is the web interface provided by Snowflake. It is continuously improved.

Snowflake Copilot is an in-browser tool to generate SQL code with the added context of your Snowflake environment - tables, schemas, and other queries.

Streamlit Apps¶

Streamlit is a Python web app framework for quickly deploying data-centric dashboards, chats, and visualizations. Access to an app is granted to particular roles through Snowflake’s built-in access control model, just like permissions for a table or view.

==> https://streamlit.io/

Snowflake Drivers & Connectors¶

Snowflake Drivers/Connectors refer to programmatic APIs to interact with Snowflake from your favourite programming language. The connector for Python enables all typical operations, in addition to reading and writing pandas dataframes. Once connected, cursors are used to execute SQL statements.

Snowflake CLI¶

Snowflake CLI can be installed to connect to Snowflake via the command line. The legacy client, snowsql, is now out of date.

Partner Tools¶

Partner Tools enable connection to your account via SSO to read and analyze your data. BI, data integration, security, and governance are common use cases.

Snowpark¶

Snowpark refers to programmatic APIs to run heavy data manipulation within Snowflake warehouses, leaving the data within Snowflake during processing.

See the Snowpark Developer Guide for Python. I typically add a configuration file in .snowflake/config.toml with the following content:

tomldefault_connection_name = "my_main_account"

[connections.my_main_account]
account = "myaccount"
user = "jdoe"
password = "******"
warehouse = "my-wh"
database = "my_db"
schema = "my_schema"

[cli.logs]
save_logs = true
level = "info"
path = "/home/<you>/.snowflake/logs"

Snowflake Objects & DDL Commands¶

Topics:

Databases, Schemas, Tables, Views
External Tables, Streams, Tasks
Sequences, Stages
Examples of CREATE / ALTER / DROP
Cloning & object versioning
DDL = data definition language

Objects in Snowflake allow nearly all aspects of the data platform to be configured with unique access and usage restrictions, from the Organization level down to tables and views.

Account Objects:

Network Policy
User
Role
Database => Schema
Warehouse
Share
Resource Monitor

Schema Objects:

Stage
Pipe
Procedure
Function
Table
View
Task
Stream

To work with objects within a schema you must set the context for the following operations with this set of commands:

sqlUSE ROLE <role>;
USE WAREHOUSE <warehouse>;
USE DATABASE <database>;
USE SCHEMA <schema>;

Object Naming Rules¶

The name of a database, schema, or table must be unique within its parent object. Unquoted names must start with a letter (A-Z, a-z) or an underscore and are not case-sensitive; names enclosed in double quotes are case-sensitive. Special characters can only be used within quotes.
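
As a rough mental model of the case rules above: an unquoted name is folded to upper case before lookup, while a quoted name is taken exactly as typed. The helper below is a hypothetical sketch in plain Python, not part of any Snowflake client, and it ignores the character-set restrictions on unquoted names.

```python
def resolve_identifier(name: str) -> str:
    """Return the name an identifier resolves to, as typed in a statement."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Quoted: case is preserved; a doubled quote inside stands for one quote.
        return name[1:-1].replace('""', '"')
    # Unquoted: folded to upper case, so the typed case does not matter.
    return name.upper()

print(resolve_identifier("tableone"))    # TABLEONE
print(resolve_identifier("TABLEONE"))    # TABLEONE
print(resolve_identifier('"TableOne"'))  # TableOne
```

One consequence of this model is that `"TABLEONE"` in quotes and `tableone` without quotes refer to the same object, while `"TableOne"` does not.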

General DDL Commands¶

Data Definition Language (DDL) commands are used to manipulate objects in Snowflake, including setting parameters on account and session objects.

Generally these are available:

USE <object>
CREATE <object>
ALTER <object>
CREATE OR ALTER <object>
DROP <object>
SHOW <object-type>
DESCRIBE <object-type> <object>
COMMENT

To get the definition for an object, use the following query:

sqlSELECT GET_DDL('<type>', '<NAME>');

-- For example:
SELECT GET_DDL('table', 'CUSTOMERS');

By default, Snowflake will use all your permissions (SECONDARY_ROLES = ALL) to provide the DDL, revealing all secure features.

Parameters & Query Tags¶

Modify the behavior of Snowflake objects
Account, session, and object level
Objects use the nearest parameter, on itself or a parent
All parameters can be set at the account level by ACCOUNTADMIN
See the parameter docs

sqlSHOW PARAMETERS;
SHOW PARAMETERS IN DATABASE TESTDB;
SHOW PARAMETERS FOR SESSION;

-- ALTER <OBJECT> SET <PARAMETER> = <VALUE>;
ALTER SESSION
  SET USE_CACHED_RESULT = FALSE;

ALTER DATABASE MY_DB
  SET DATA_RETENTION_TIME_IN_DAYS = 10;

ALTER WAREHOUSE MY_WAREHOUSE
  SET STATEMENT_TIMEOUT_IN_SECONDS = 30;

-- This query tag will show up in your query history
ALTER SESSION SET QUERY_TAG = 'Investigating Bug #2389';
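
The rule that objects use the nearest parameter, on itself or a parent, can be sketched as a walk up the object hierarchy. This is a simplified illustration in plain Python: the `Scope` class is hypothetical, the hierarchy is reduced to account, database, schema, and table, and built-in defaults are left out.

```python
from typing import Optional

class Scope:
    """One level in a hypothetical account -> database -> schema -> table chain."""

    def __init__(self, name: str, parent: Optional["Scope"] = None):
        self.name = name
        self.parent = parent
        self.params: dict = {}

    def resolve(self, param: str):
        """Return (value, name of the level that set it), nearest level first."""
        scope = self
        while scope is not None:
            if param in scope.params:
                return scope.params[param], scope.name
            scope = scope.parent
        raise KeyError(param)

account = Scope("ACCOUNT")
account.params["DATA_RETENTION_TIME_IN_DAYS"] = 1

my_db = Scope("MY_DB", account)
my_db.params["DATA_RETENTION_TIME_IN_DAYS"] = 10  # like the ALTER DATABASE above

customers = Scope("CUSTOMERS", Scope("MY_SCHEMA", my_db))
other_db = Scope("OTHER_DB", account)

print(customers.resolve("DATA_RETENTION_TIME_IN_DAYS"))  # (10, 'MY_DB')
print(other_db.resolve("DATA_RETENTION_TIME_IN_DAYS"))   # (1, 'ACCOUNT')
```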

Session Variables¶

sqlSET something_i_did = LAST_QUERY_ID();
SET keep_me = 5;
SHOW VARIABLES;

SELECT * FROM AFFECTED_TABLE
  BEFORE(statement => $something_i_did);

Databases¶

A database is associated with one account. See docs.

sqlCREATE DATABASE MY_DATABASE;

-- Cloned
CREATE DATABASE CLONED_DB CLONE MY_DATABASE;

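-- Replica (secondary database) of a primary database in another account;
-- MYORG.ACCOUNT1 is a placeholder account identifier
CREATE DATABASE REPLICA_DB AS REPLICA OF MYORG.ACCOUNT1.MY_DATABASE
  DATA_RETENTION_TIME_IN_DAYS = 3;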

-- From share object provided by external account
CREATE DATABASE SHARED_DB FROM SHARE S9DF89.SHARE;

SHOW DATABASES;

Schemas¶

A schema is associated with one database. See docs.

sqlUSE DATABASE MY_DATABASE;
CREATE SCHEMA MY_SCHEMA;

-- Cloned
CREATE SCHEMA CLONED_SCM CLONE MY_DATABASE.MY_SCHEMA;

SHOW SCHEMAS;
SHOW SCHEMAS LIKE 'TPCH%';

Tables¶

Types:

Standard/Permanent tables persist until dropped
Transient tables are like permanent tables without Fail-safe, and with at most one day of Time Travel
Temporary tables persist until the session ends, and are just for you
Dynamic tables refresh data on a schedule.
External tables are data from files hosted outside Snowflake
Hybrid tables are row-based and optimized for high throughput
Iceberg⁶ Snowflake-managed tables have time travel but no fail-safe
Iceberg⁶ externally managed tables are stored outside Snowflake

A table is associated with one schema. See docs.

Tables are permanent by default, but can also be:
- Temporary (for just the current session, such as a Snowsight¹ session)
- Transient (no Fail-safe)
- External (file-based tables stored outside Snowflake)
  - You can specify the path and type for these.
Standard edition accounts can set Time Travel on permanent tables to at most one day, and Enterprise edition accounts to up to 90 days. Time Travel enables un-dropping a table and restoring it as of a particular timestamp.
Standard tables do not enforce foreign keys or uniqueness; those constraints are informational only.

To see if a table is external, and its properties, you can use SHOW:

sqlSHOW TABLES;
SHOW TABLES LIKE '<table name>';

Hybrid Tables¶

For low-latency, high-throughput data that changes frequently.

Require a primary key
Will enforce foreign key, unique, and not null
Traditional row-based storage
High performance point operations (lookups, inserts)
Larger storage footprint than standard tables due to less efficient compression

See docs.snowflake.com/en/user-guide/tables-hybrid

Shares¶

A share is an object that contains all the information required to share a database and selected objects within it, such as tables and secure views, including the privileges granted on the database and schema.

No time travel for consumers, only the current data
Read only and no re-sharing for consumers
Streams can be created on shares with change tracking permission
Provider pays for storage and data transfer
Consumer pays for compute to query data


Views¶

Types:

View
Materialized view (a materialized view can query only a single table, see limitations)
Dynamic tables can source data from multiple base tables and must have a schedule configured

For differences see this table of differences between views, materialized views, and dynamic tables.

Views are evaluated at query time, as on traditional SQL platforms.

Views don’t contribute to storage cost.
Can be used to reveal a subset of table data.
Materialized views store the results of their query separately from the source table and are refreshed automatically in the background.
The definition of a secure view is only visible to authorized users.
Querying views that rely on dropped tables will throw an error.
Views and MVs can be marked SECURE.
MVs and DTs store data in micro-partitions, consuming storage.

sqlSHOW VIEWS;

Data Loading & Unloading¶

Topics:

Stages: internal (user, table, named) vs external cloud stages
COPY INTO / bulk load
Snowpipe (continuous ingestion)
Snowpipe Streaming
Loading semi-structured data and schema inference

Each file is loaded by a single thread, so the number of files that load in parallel depends on the warehouse size. It’s a waste of resources to use more than an XS warehouse (which has eight threads) for data loading, unless you require higher parallelism.

| Warehouse Size   | Parallel File Loads |
|------------------+---------------------|
| XS / Extra Small | 8                   |
| S / Small        | 16                  |
| M / Medium       | 32                  |
| L / Large        | 64                  |
| XL / Extra Large | 128                 |

The ideal compressed file size for data loading is 100 MB to 250 MB. Split a gigantic file into many files of roughly that size rather than sending it whole, since separate files can be loaded in parallel.
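
To see how file size and warehouse size interact, the sketch below estimates how many files a load splits into and how many "waves" of parallel loads that takes. It reuses the parallel file load figures from the table above; the 200 MB target and the `plan_load` helper are assumptions for illustration, and real load times depend on much more than file count.

```python
import math

# Parallel file loads per warehouse size, from the table above.
PARALLEL_FILE_LOADS = {"XS": 8, "S": 16, "M": 32, "L": 64, "XL": 128}

def plan_load(total_compressed_mb: float, warehouse: str,
              target_file_mb: float = 200):
    """Return (number of files, number of parallel waves) for a load."""
    files = max(1, math.ceil(total_compressed_mb / target_file_mb))
    threads = PARALLEL_FILE_LOADS[warehouse]
    waves = math.ceil(files / threads)
    return files, waves

print(plan_load(10_000, "XS"))  # (50, 7)
print(plan_load(10_000, "M"))   # (50, 2)
print(plan_load(150, "XL"))     # (1, 1)
```

The last call shows the point made earlier: a single small file occupies one thread, so a larger warehouse adds cost without adding speed.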

High-Level Data Loading Process¶

Output data from systems of record as CSV, JSON, Avro, etc
Move files to cloud storage (PUT to internal or external stage)
Load into Snowflake tables (COPY INTO)

Stages¶

Stages hold binary files, which can be queried and copied into tables, with the limitation that queries on staged files cannot use joins, filters, or aggregations. The files can be watched and continuously loaded with a Snowpipe.

Reference with:

@ for named stages, made with CREATE STAGE <NAME>;
@% for table stages, which are automatically created for permanent, transient, and temporary tables
@~ for user stages, which are available for each user

Inspect with:

DESCRIBE STAGE <name> to show info
LIST @<name>/<path> to find files

Download with scoped and pre-signed URLs.

File Formats & COPY INTO¶

COPY INTO can be used to move data from staged files to Snowflake tables. A file format, either named or given inline, tells Snowflake how to parse the files (CSV is the default type).

sql/* Create a file format */
CREATE [ OR REPLACE ] [ { TEMP | TEMPORARY | VOLATILE } ]
  FILE FORMAT [ IF NOT EXISTS ] <format-name>
  TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML }
  --> Optional format arguments (some CSV options shown)
  ENCODING = '<string>' | UTF8
  BINARY_FORMAT = HEX | BASE64 | UTF8
  COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
  RECORD_DELIMITER = '<string>' | NONE
  FIELD_DELIMITER = '<string>' | NONE
  MULTI_LINE = TRUE | FALSE
  PARSE_HEADER = TRUE | FALSE
  SKIP_HEADER = <integer>
  SKIP_BLANK_LINES = TRUE | FALSE
  ESCAPE = '<character>' | NONE
  ESCAPE_UNENCLOSED_FIELD = '<character>' | NONE
  TRIM_SPACE = TRUE | FALSE
  FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | NONE
  NULL_IF = ( '<string>' [ , '<string>' ... ] )
  EMPTY_FIELD_AS_NULL = TRUE | FALSE
  -- ...and more.

  [ COMMENT = '<string_literal>' ]

/* Standard data load - simplified */
COPY INTO <table_name> FROM @<stage_name>[/<path>]
  FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] )
  PATTERN = '<regex_pattern>'
  FILE_FORMAT = ( FORMAT_NAME = '<format-name>' )
  --> Optional copy arguments (some shown)
  ENFORCE_LENGTH = TRUE | FALSE
  TRUNCATECOLUMNS = TRUE | FALSE
  INCLUDE_METADATA = ( <column_name> = METADATA$<field> [ , <column_name> = METADATA$<field> ... ] )
  PURGE = TRUE | FALSE
  RETURN_FAILED_ONLY = TRUE | FALSE
  ON_ERROR = { CONTINUE | SKIP_FILE | SKIP_FILE_<num> | 'SKIP_FILE_<num>%' | ABORT_STATEMENT }
  VALIDATION_MODE = RETURN_<n>_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS
  -- ...and more.

Both the file format and COPY INTO have key parameters for controlling file ingestion. See file format type options.
See full COPY INTO syntax and FILE FORMAT syntax for details.

You can also use COPY INTO in the other direction, from a table to files in a stage, to unload data.

Monitoring Copy Commands¶

INFORMATION_SCHEMA.LOAD_HISTORY contains the status of COPY INTO operations and can be queried like so:

sqlSELECT TABLE_NAME, FILE_NAME, LAST_LOAD_TIME, STATUS
  FROM INFORMATION_SCHEMA.LOAD_HISTORY
  WHERE SCHEMA_NAME = CURRENT_SCHEMA();

Snowpipe¶

Snowpipe automatically runs COPY INTO as new files arrive in a stage.

Triggers:
- REST API (your client tells the pipe which staged files to load)
- Auto Ingest (event notifications from cloud storage announce new files)

sqlCREATE PIPE THE_PIPE
  AUTO_INGEST = TRUE -- pipe options go before AS
AS
  COPY INTO INGESTED_TABLE
  FROM @THE_STAGE;

Snowpipe Streaming¶

Snowpipe Streaming loads data row by row through API calls from an external system.

Rows are written through a client SDK instead of staged files
Useful for streaming data into Snowflake for analysis

Querying & Data Manipulation Language¶

Data Manipulation Language (DML) refers to the normal SQL methods of CRUD-like updates to data (per the SQL:2003 standard), with some special Snowflake nuances. See the docs for query syntax.

Topics:

SELECT, INSERT, UPDATE, DELETE, MERGE
Working with semi-structured data (VARIANT, OBJECT, ARRAY)
Window functions, CTEs, analytic SQL
Transforming data in Snowflake

DML & Snowflake SQL - Language Properties & Quirks¶

Case-insensitive unless surrounded in double quotes
- tableone and TABLEONE are the same, but "TableOne" will refer to a different object

Transactions¶

Snowflake handles transactions with ACID⁷
By default Snowflake will AUTOCOMMIT all queries (implicit transactions)
Locks are placed on tables, see resource locking
Nested transactions cannot be undone, see scoped transactions

sqlBEGIN TRANSACTION;
  -- statements
COMMIT;
-- Optionally, ROLLBACK;

“UPDATE, DELETE, and MERGE statements hold locks that generally prevent them from running in parallel with other UPDATE, DELETE, and MERGE statements.”⁸

Transactions can be aborted by the user or account admin.

Data Types¶

See Data types and Summary of Data types for info.

Date & Time Data Types can be stored in a variety of formats:

DATE: Date with no time elements like YYYY-MM-DD
TIME: Time with no offset like HH:MI:SS.MS, with up to 9 digits of fractional seconds
DATETIME: Alias for TIMESTAMP_NTZ
TIMESTAMP: Alias for TIMESTAMP_NTZ by default (configurable with the TIMESTAMP_TYPE_MAPPING parameter)
TIMESTAMP_LTZ: TIMESTAMP with local time zone; time zone, if provided, isn’t stored
TIMESTAMP_NTZ: TIMESTAMP with no time zone; time zone, if provided, isn’t stored
TIMESTAMP_TZ: TIMESTAMP with time zone (as UTC offset)
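
The difference between the three TIMESTAMP variants is easiest to see on one input value. The sketch below mimics the behaviour described above with Python's `datetime`; it is an analogy rather than Snowflake's implementation, and the session time zone (UTC-7) and the input value are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical session time zone and input; fixed offsets keep the sketch self-contained.
session_tz = timezone(timedelta(hours=-7))
typed = datetime(2024, 1, 15, 9, 30, tzinfo=timezone(timedelta(hours=2)))  # 09:30 +02:00

# TIMESTAMP_NTZ: wall-clock time only; the offset that was provided is not stored.
as_ntz = typed.replace(tzinfo=None)

# TIMESTAMP_LTZ: kept as a UTC instant, shown in the session time zone.
stored_utc = typed.astimezone(timezone.utc)
as_ltz = stored_utc.astimezone(session_tz)

# TIMESTAMP_TZ: the UTC offset is stored with the value.
as_tz = typed

print(as_ntz.isoformat())  # 2024-01-15T09:30:00
print(as_ltz.isoformat())  # 2024-01-15T00:30:00-07:00
print(as_tz.isoformat())   # 2024-01-15T09:30:00+02:00
```

Note that the LTZ and TZ values are the same instant displayed differently, while the NTZ value no longer identifies an instant at all.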

Semi-Structured Data¶

sql-- flatten the entire table with recursive set to TRUE
-- Note that first-level and all child-level objects are displayed as individual rows with recursive=>true
select key, path, value from  instructor1_fund_db.public.x,
lateral flatten(input => v, recursive=>true)
order by path asc;

SELECT * FROM TABLE (FLATTEN(
input => PARSE_JSON('
{ "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"]
          },
          "GlossSee": "markup" }}}}}
'), recursive => TRUE));

This will produce full paths that can be used in your Snowflake Scripting code.

| Key       | Value                                  | Path                                                 |
|-----------+----------------------------------------+------------------------------------------------------|
| para      | "A meta-markup language, used to..."   | glossary.GlossDiv.GlossList.GlossEntry.GlossDef.para |
| GlossSee  | "markup"                               | glossary.GlossDiv.GlossList.GlossEntry.GlossSee      |
| GlossTerm | "Standard Generalized Markup Language" | glossary.GlossDiv.GlossList.GlossEntry.GlossTerm     |
| ID        | "SGML"                                 | glossary.GlossDiv.GlossList.GlossEntry.ID            |
| SortAs    | "SGML"                                 | glossary.GlossDiv.GlossList.GlossEntry.SortAs        |
| title     | "S"                                    | glossary.GlossDiv.title                              |
| title     | "example glossary"                     | glossary.title                                       |
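
For readers who think in code rather than SQL, a recursive flatten boils down to a depth-first walk that records the path to every nested element. The function below is a plain-Python approximation written for this page, not Snowflake's FLATTEN: it only reproduces the key, path, and value columns, and it assumes keys are simple enough for dot notation.

```python
import json

def flatten(value, path=""):
    """Yield (key, path, value) for every nested element, depth first."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            yield key, child_path, child
            yield from flatten(child, child_path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            child_path = f"{path}[{index}]"
            yield None, child_path, child  # array elements have an index, not a key
            yield from flatten(child, child_path)

doc = json.loads(
    '{"glossary": {"title": "example glossary", "GlossDiv": {"title": "S"}}}'
)
rows = sorted(flatten(doc), key=lambda row: row[1])  # like ORDER BY path ASC

for key, path, value in rows:
    if not isinstance(value, (dict, list)):  # show leaf values only
        print(key, path, value)
# title glossary.GlossDiv.title S
# title glossary.title example glossary
```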

Metadata¶

Metadata and statistics are stored in the cloud services layer and are used to speed up query compilation. They can completely answer some commands, like SHOW and COUNT, without requiring a virtual warehouse.

Caching¶

sql-- Prevent caching for performance testing:
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

Query Result Cache¶

The query result is kept in the cloud services layer and optimized storage
Returned if the micro-partitions used by the query are unchanged
Result sets are cached for 24 hours after the last use of the data; each reuse resets the clock, up to a maximum of 31 days
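
The three rules above can be modelled as a small cache keyed on the query text plus a version of the underlying data. This is a toy model for intuition only: the `ResultCache` class is hypothetical, the "table version" stands in for "the micro-partitions used by the query are unchanged", and any upper bound on how long an entry can be kept alive is ignored.

```python
from datetime import datetime, timedelta

TTL = timedelta(hours=24)

class ResultCache:
    def __init__(self):
        # (query text, table version) -> (result, time of last use)
        self._entries = {}

    def put(self, query, table_version, result, now):
        self._entries[(query, table_version)] = (result, now)

    def get(self, query, table_version, now):
        entry = self._entries.get((query, table_version))
        if entry is None or now - entry[1] > TTL:
            return None  # never cached, data changed, or expired
        result = entry[0]
        self._entries[(query, table_version)] = (result, now)  # reuse resets the clock
        return result

cache = ResultCache()
t0 = datetime(2024, 1, 1)
cache.put("SELECT COUNT(*) FROM T", 1, 42, now=t0)

print(cache.get("SELECT COUNT(*) FROM T", 1, now=t0 + timedelta(hours=23)))  # 42
print(cache.get("SELECT COUNT(*) FROM T", 2, now=t0 + timedelta(hours=23)))  # None
```

The second lookup misses because the table version changed, which is the analogue of the underlying micro-partitions being rewritten.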

Data (Warehouse) Cache¶

Stores file headers and column data from queries on SSD in the virtual warehouse. These are cached micro-partitions and not the result.
In the Query Profile, this shows up as “percentage scanned from cache”.

Query Optimization & Performance Tips¶

Use EXPLAIN and the query profile to check the execution plan
Row ops are performed before group ops
Filter as early as possible on well-ordered filter columns
Avoid SELECT * where possible; narrow it with SELECT * EXCLUDE (cols) or SELECT * ILIKE '<pattern>'
Add limits to large ORDER BY calls
Don’t use GROUP BY with high-cardinality columns
Avoid unintentional many-to-many or cross-joins
Use temporary tables for long single-session calculations!
Don’t filter with functions that return a different type than the target column, as the optimizer relies on the column type to prune partitions
Don’t use UDFs in filter/where clauses (no partition pruning)

“Cardinality: The number of distinct values in a column”

Statistics: In the Query Profile tab (openable from "<num> rows" in result view) you will see a few important details:

Scan progress (ex. 13.29%)
Bytes scanned (ex. 13.00GB)
Percentage scanned from cache (ex. 2.12%) the warehouse’s cached partitions
Partitions scanned (ex. 7302) the micro-partitions read
Partitions total (ex. 54924)
Bytes spilled to remote storage (ex. 2.45MB) occurs when a query needs more memory than the warehouse has and the overflow also exceeds the warehouse’s local storage.
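
The gap between "partitions scanned" and "partitions total" comes from pruning, and it explains the earlier tip about filtering on well-ordered columns. The sketch below assumes each micro-partition carries the minimum and maximum value of the filter column (the `PartitionStats` shape is an assumption made for this example) and skips any partition whose range cannot match.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PartitionStats:
    """Per-partition min/max of one column (assumed metadata shape)."""
    min_value: int
    max_value: int

def partitions_to_scan(partitions, low, high):
    """Keep partitions whose [min, max] range overlaps the filter range [low, high]."""
    return [p for p in partitions if p.max_value >= low and p.min_value <= high]

# Well-ordered data: each partition covers a narrow, distinct range.
well_ordered = [PartitionStats(i * 100, i * 100 + 99) for i in range(10)]
# Scattered data: every partition spans the whole range of values.
scattered = [PartitionStats(0, 999) for _ in range(10)]

print(len(partitions_to_scan(well_ordered, 250, 349)))  # 2
print(len(partitions_to_scan(scattered, 250, 349)))     # 10

# Share of partitions read in the example statistics above.
print(f"{7302 / 54924:.2%}")  # 13.29%
```

The same filter reads two partitions out of ten on the well-ordered column and all ten on the scattered one.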

SPs & UDFs - Snowflake Scripting¶

The Snowflake Scripting Developer Guide provides a comprehensive guide, but here are the basics in a nutshell:

The language is imperative and lexically scoped.
Most of the common and familiar SQL commands are available.
There are lots of weird edge cases and sql/object/bound data is all handled slightly differently.
Check the function types documentation to see return styles

Topics:

Stored procedures in SQL, JavaScript, Python
UDFs in SQL, JS, Python, Java, and Scala
When, why, how, and the results of these calls
What is Snowpark⁹, how does it work?

DECLARE & Snowflake Scripting Blocks¶

See the section on Snowflake Scripting for more details.

Snowflake SQL has support for procedural logic and error handling using a built-in SQL extension called Snowflake Scripting. You can declare variables and cursors, use control flow logic, handle exceptions, and update tables.

sqlDECLARE
  -- Variables, cursors.
BEGIN
  -- SQL statements
EXCEPTION
  -- Error handling
END;

You CANNOT nest anonymous blocks if a child returns something. This will trigger an immediate exit and stop execution.

This example from the use cases demonstrates variable assignment and a for loop:

sql-- docs.snowflake.com/en/developer-guide/snowflake-scripting/use-cases
DECLARE
  bonus_percentage INT DEFAULT 10;
  performance_value INT DEFAULT 12;
  -- Use input to calculate the bonus percentage
  updated_bonus_percentage NUMBER(2,2) DEFAULT (:bonus_percentage/100);
  --  Declare a result set
  rs RESULTSET;

BEGIN
  -- Assign a query to the result set and execute the query
  rs := (SELECT * FROM bonuses);
  -- Use a FOR loop to iterate over the records in the result set
  FOR record IN rs DO
    -- Assign variable values using values in the current record
    LET emp_id_value INT := record.emp_id;
    LET performance_rating_value INT := record.performance_rating;
    LET salary_value NUMBER(12, 2) := record.salary;
    -- Determine whether the performance rating in the record matches the user input
    IF (performance_rating_value = :performance_value) THEN
      -- If the condition is met, update the bonuses table using the calculated bonus percentage
      UPDATE bonuses SET bonus = ( :salary_value * :updated_bonus_percentage )
        WHERE emp_id = :emp_id_value;
    END IF;
  END FOR;
  -- Return text when the stored procedure completes
  RETURN 'Update applied';
END;

Cursors
Resultsets

Objects¶

Snowflake supports querying and manipulating semi-structured data with up to 128 MB per variant/object value.

Arrays¶

Imagine we are running a robot outpost and want to normalize data from a legacy system that stores odometer readings as a six-digit string. We want to set readings with all zeros to a null value. Here is an example of what our schema could look like:

js[
  {"field_name": "OdometerReading",
   "field_format": "NNNNNN",
   "data_type": "COUNTER"},
  {"field_name": "MachineUUID",
   "data_type": "UUID"}
]

To filter this to just COUNTER type fields, we could:

sqlLET FIELDS ARRAY := :TABLE_INFO:JSON_SCHEMA.fields::ARRAY;

SELECT TRANSFORM(FILTER(
    :FIELDS,
    k -> (UPPER(k:data_type::TEXT) = 'COUNTER' AND LENGTH(k:field_format::TEXT) = 6) -- six-digit format, e.g. NNNNNN
),
x -> x:field_name::TEXT)
INTO :FIELDS;
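
If the FILTER/TRANSFORM lambdas are unfamiliar, the same two steps read naturally as list comprehensions. This Python equivalent is only a sketch of the logic: it uses the sample schema from above, assumes the six-character format from that sample, and the `counter_field_names` helper is a made-up name.

```python
fields = [
    {"field_name": "OdometerReading", "field_format": "NNNNNN", "data_type": "COUNTER"},
    {"field_name": "MachineUUID", "data_type": "UUID"},
]

def counter_field_names(schema, digits=6):
    # FILTER(...): keep COUNTER fields whose format has the expected width.
    kept = [
        f for f in schema
        if str(f.get("data_type", "")).upper() == "COUNTER"
        and len(f.get("field_format", "")) == digits
    ]
    # TRANSFORM(...): project each remaining object to its field name.
    return [f["field_name"] for f in kept]

print(counter_field_names(fields))  # ['OdometerReading']
```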

To check a table of these records and replace ‘empty’ odometer readings with null, we can iterate through the array and run a query for each field we need to check.

sqlFOR i IN 0 TO (ARRAY_SIZE(:FIELDS) - 1) DO
    COL_NAME := :FIELDS[i]::STRING;

    UPDATE IDENTIFIER(:TABLE_FULL_PATH)
        SET ROW_DATA = OBJECT_INSERT(ROW_DATA, :COL_NAME, PARSE_JSON('null'), true)
        -- If field is set as empty (six zeros) or missing, nullify the value, ensure the field is included:
        WHERE (ROW_PARSED_JSON[:COL_NAME] = '000000' OR ROW_PARSED_JSON[:COL_NAME] IS NULL)
            AND PROCESS_ID = :PROCESS_ID;
END FOR;

Functions & System Functions¶

Some example system functions:

sqlSYSTEM$CANCEL_QUERY
SYSTEM$GLOBAL_ACCOUNT_SET_PARAMETER
SYSTEM$ALLOWLIST_PRIVATELINK
SYSTEM$TYPEOF

Stored Procedures (SPs)¶

Stored procedures are named collections of SQL statements. In Snowflake, these can be created with Snowflake Scripting (SQL) but also JavaScript and via Snowpark⁹.

SPs are invoked with CALL rather than from within other SQL statements, but can make use of the Snowpark API. The primary goal is to cause side effects in the system.

User Defined Functions (UDFs)¶

UDFs are custom functions that can be written in SQL, JS, Python, Java, or Scala. They can accept parameters and return scalar or tabular results, can be called from SQL statements, and can be overloaded. Data is converted to supported types as it is passed to the functions.

UDFs can be called as part of a SQL statement, returning values for use, but cannot utilize the Snowpark API or libraries. The primary goal is complex data processing.

sqlCREATE FUNCTION JS_SQUARE_ROOT(D double)
  RETURNS DOUBLE
  LANGUAGE JAVASCRIPT
  AS
  $$
  return(Math.sqrt(D));
  $$;

-- Call the function
SELECT js_square_root(2);

-- ==> 1.414213562

-- Drop the function
DROP FUNCTION JS_SQUARE_ROOT(DOUBLE);

UDFs & SPs can call external network services with configuration.
External functions can be set to POST to a REST API gateway on AWS/GCP/Azure, but they are slower, less secure, cannot be shared, and are scalar only.
Overloading: Functions can have the same name as long as the function signature is different.

External Functions¶

An EXTERNAL function is a lambda or web service behind a proxy.

sqlCREATE OR REPLACE EXTERNAL FUNCTION blorgon_process(str_input varchar)
  RETURNS variant
  API_INTEGRATION = blorgo9t_08 -- API integration object
  AS 'https://blorgo9t.execute-api.us-west-2.amazonaws.com/prod/blorg'; -- Proxy URL

SPs & UDFs - Alternative Languages¶

Python SPs & UDFs¶

See Snowflake Docs: Python Stored Procedures .
See

JavaScript SPs & UDFs¶

See Snowflake Docs: JavaScript Stored Procedures.

There exists a JavaScript Stored Procedures API that provides a snowflake object for use within stored procedures written with JS, enabling branching, looping, error handling, and the dynamic creation of SQL statements.

Java SPs & UDFs¶

See Snowflake Docs: Java Stored Procedures.
See Snowflake Docs: Java UDFs.

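Java UDFs take a HANDLER parameter naming the class and method to call. The code is either written inline, optionally with TARGET_PATH to choose where the compiled JAR is stored, or supplied as a pre-compiled JAR file through IMPORTS.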

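sqlCREATE OR REPLACE FUNCTION inline_hello(name STRING)
RETURNS STRING
LANGUAGE JAVA
HANDLER = 'InlineHello.hello' -- class and method defined in the inline code below
AS
$$
class InlineHello {
    public static String hello(String name) {
        if (name == null) {
            return "Hello, you! Inline function called!";
        }
        return "Hello, " + name + "! Inline function called!";
    }
}
$$;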

-- Call the function:
SELECT inline_hello('Ryan');

Or as a pre-compiled JAR:

javapackage com.example;

public class HelloUDF {
    public static String hello(String name) {
        if (name == null) {
            return "Hello, you!";
        }
        return "Hello, " + name + "!";
    }
}

Compile to a JAR file:

bashjavac -d . HelloUDF.java
jar cf hello-udf.jar com/example/HelloUDF.class

# => hello-udf.jar created

In Snowflake, refer to this function in the JAR file:

sqlCREATE OR REPLACE STAGE my_stage;
-- Stages are referenced with @<name>
PUT file://hello-udf.jar @my_stage auto_compress=false;

-- Define the function; the handler lives in the JAR, so there is no inline AS body
CREATE OR REPLACE FUNCTION hello(name STRING)
  RETURNS STRING
  LANGUAGE JAVA
  IMPORTS = ('@my_stage/hello-udf.jar')
  HANDLER = 'com.example.HelloUDF.hello';

-- Call the function (it takes one argument, so pass NULL to get the default greeting):
SELECT hello(NULL);

-- ==> Hello, you!

Scala SPs & UDFs¶

Tasks¶

Chains of tasks in directed graphs are an excellent way to handle complex processes in Snowflake environments, with some caveats.

See Snowflake tasks intro official docs.
Check Transformation > Tasks dashboard for tons of useful statistics.
Tasks are great for generating reports, loading data, and updating tables.
Can schedule execution of a stored procedure or a Snowflake Scripting block.
Can run on a schedule or as a follow-up task.
A 1000 task limit exists for task graphs.
Tasks can use a warehouse or serverless compute.
- Serverless if the task runs in under 30s (will cost less even with 1.5x charge multiplier for serverless compute.)
Use Crontab guru to build and validate your cron expressions.

sqlCREATE TASK SYNC1
  WAREHOUSE = WH1
  -- Pick ONE trigger: an interval, a cron expression, or parent tasks.
  SCHEDULE = '30 MINUTE'
  -- SCHEDULE = 'USING CRON 0,15,30,45 * * * * America/Denver'
  -- AFTER PRE_SYNC1, PRE_SYNC2  --> child task, runs after its parents
  -- Root task only: allow multiple runs of the DAG at once (TRUE | FALSE)
  ALLOW_OVERLAPPING_EXECUTION = FALSE
  -- Max runtime for the task
  USER_TASK_TIMEOUT_MS = 3600000 -- one hour
AS
  COPY INTO BIG_TABLE_1
  FROM @SOME_STAGE;

-- A finalizer is a separate task that names the root task in FINALIZE.
-- It always runs at the end of the DAG:
-- CREATE TASK WRAP_UP_TASK WAREHOUSE = WH1 FINALIZE = SYNC1 AS <SQL>;

-- Show task status (started/stopped)
SHOW TASKS;
DESCRIBE TASK <NAME>;

-- Start/Stop: Requires 'execute task' permission:
EXECUTE TASK SYNC1; -- Run task once
ALTER TASK SYNC1 RESUME; -- Start Task Schedule
ALTER TASK SYNC1 SUSPEND; -- Stop Task Schedule
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('<ROOT TASK NAME>'); -- Resume whole task graph

-- View Task Tree & Dependents
SELECT * FROM TABLE(INFORMATION_SCHEMA.TASK_DEPENDENTS(task_name => '<NAME>'));
SELECT * FROM TABLE(INFORMATION_SCHEMA.TASK_DEPENDENTS(task_name => '<NAME>', recursive => FALSE));
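
The cron expression in the task definition above fires at minutes 0, 15, 30, and 45 of every hour. As a minimal sketch (plain Python, not a Snowflake API; the helper name expand_cron_field is hypothetical), a single cron field can be expanded into the values it matches. This can help sanity-check an expression before it goes into a task:

```python
def expand_cron_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field (e.g. the minute field) into the sorted values it matches.

    Supports '*', single values, comma lists, 'a-b' ranges, and '/step' suffixes.
    Day and month names (MON, JAN) are not handled in this sketch.
    """
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # 'n/step' runs from n to the top of the range; a bare 'n' is just n.
            end = high if step_text else start
        if not (low <= start <= end <= high):
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)


# Minute field from the task definition above.
print(expand_cron_field("0,15,30,45", 0, 59))  # [0, 15, 30, 45]
# The same schedule written with a step.
print(expand_cron_field("*/15", 0, 59))        # [0, 15, 30, 45]
```

Both spellings match the same minutes, so either could be used in the SCHEDULE string.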

Task History¶

Use the Transformation > Tasks dashboard to view runs.

sql-- Check task history (all schemas)
SELECT * FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
    TASK_NAME => 'SYNC1', --> your task name
    SCHEDULED_TIME_RANGE_START => DATEADD('DAY', -6, CURRENT_TIMESTAMP())
))
-- (filter by database/schema here)
ORDER BY SCHEDULED_TIME DESC LIMIT 10;

-- Other useful views and table functions:
-- SNOWFLAKE.ACCOUNT_USAGE.SERVERLESS_TASK_HISTORY
-- SNOWFLAKE.ACCOUNT_USAGE.TASK_HISTORY
-- SNOWFLAKE.ACCOUNT_USAGE.TASK_VERSIONS
-- INFORMATION_SCHEMA.SERVERLESS_TASK_HISTORY
-- INFORMATION_SCHEMA.TASK_HISTORY

Task Graphs (DAGs)¶

A root task can be defined with a schedule
Child tasks are defined without a schedule, and can have multiple parent dependencies to wait for before executing.
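
The "wait for every parent" rule can be illustrated with a toy model. This is only a sketch in plain Python using the standard library's graphlib, not how Snowflake schedules tasks. PRE_SYNC1, PRE_SYNC2, and SYNC1 are the names from the task example earlier; ROOT_TASK is a hypothetical scheduled root.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of parents it waits for.
# ROOT_TASK is the only task with a schedule; the rest run AFTER their parents.
task_parents = {
    "ROOT_TASK": set(),
    "PRE_SYNC1": {"ROOT_TASK"},
    "PRE_SYNC2": {"ROOT_TASK"},
    "SYNC1": {"PRE_SYNC1", "PRE_SYNC2"},
}

sorter = TopologicalSorter(task_parents)
sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose parents have all finished
    waves.append(ready)
    sorter.done(*ready)

for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

In this model SYNC1 only becomes ready once both PRE_SYNC1 and PRE_SYNC2 are done, which mirrors a child task with two parents.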

Streams¶

Identify and act on changed records in a table.

Created on a table or view
Tracks in Standard (insert/update/delete) or append-only (insert) modes.
Objects created to view and track DML changes to a source table.
A stream is queryable, and is created on top of a table.

sqlCREATE STREAM USER_STREAM ON TABLE USERS;
SELECT * FROM USER_STREAM;

-- ==> Consume a stream!
INSERT INTO HANDLED_TABLE
  SELECT NAME, EMAIL FROM USER_STREAM
  WHERE METADATA$ACTION = 'INSERT'
    AND METADATA$ISUPDATE = FALSE;

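Streams expose three additional metadata columns:

METADATA$ACTION: the recorded action, INSERT or DELETE (an update appears as a DELETE and INSERT pair)
METADATA$ISUPDATE: whether or not the row change was part of an update operation
METADATA$ROW_ID: the unique Snowflake row ID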

Tasks can be configured to process the stream data:

sqlCREATE TASK NEWUSERS1
  WAREHOUSE = WH1
  SCHEDULE = '5 MINUTE' -- can be a CRON expression
WHEN
  SYSTEM$STREAM_HAS_DATA('USER_STREAM')
AS
  INSERT INTO SEND_EMAIL(ID, NAME) SELECT ID, NAME FROM USER_STREAM;

Warehouses & Compute¶

Topics:

What is a warehouse; required for queries / DML
Snowflake Documentation
Warehouse sizes and billing
Snowflake Documentation
Auto-suspend, auto-resume
Multi-cluster warehouses
Best practices for compute sizing & concurrency

Warehouses come in T-shirt sizes (XS, S, M, L, etc.) and a few varieties. Gen1, Gen2, Interactive, Snowpark-Optimized, and Adaptive warehouses each have particular properties, advantages, drawbacks, and costs.

Performance & Cost Optimization¶

Topics:

Choosing warehouse sizes vs concurrency
Pruning / clustering / clustering keys
Caching, result caching
Query profiling / query history
Avoiding overprovisioning / idle compute

Scaling Up & Scaling Out¶

Scale Out (more warehouses) for more concurrent tasks
Scale Up (larger warehouses) for heavier tasks
Larger warehouses do not run a single thread any faster, but they have more threads and more storage
Scale to maximize use of the data cache and minimize resource contention
For example, larger warehouses can be cheaper if they take half the time of a medium warehouse for the same task.
Auto-scaling policies:
- Standard: Eliminate queueing
- Economy: Eliminate queueing if the work will occupy another machine for over six minutes

Four X-Small warehouses consume the same credits as one Medium warehouse and handle concurrent requests better, but they will not keep as many micro-partitions cached.
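
The credit equivalence can be checked with a few lines of Python. This is a sketch that assumes the standard rate ladder, where X-Small costs 1 credit per hour and each size step doubles the rate. Other warehouse varieties may be priced differently, and the size labels are just names chosen for this example.

```python
# Credits per hour are assumed to double with each size step, starting at 1.
SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large", "2X-Large"]
CREDITS_PER_HOUR = {size: 2 ** step for step, size in enumerate(SIZES)}


def hourly_credits(size: str, count: int = 1) -> int:
    """Credits per hour for `count` running warehouses of the given size."""
    return CREDITS_PER_HOUR[size] * count


print(hourly_credits("X-Small", 4))  # 4
print(hourly_credits("Medium"))      # 4
```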

Example - Calculating Optimal Warehouse Size¶

The following data was collected for a given computation:

| Warehouse Size | Compute Time | Credits/Hour | Credits Consumed |
| --- | --- | --- | --- |
| Small | 32m | 2 | 1.067 |
| Medium | 14m | 4 | 0.933 |
| Large | 7m | 8 | 0.933 |
| Extra Large | 5m | 16 | 1.333 |
| 2 Extra Large | 3m | 32 | 1.6 |

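In this scenario, Medium and Large are the sweet spot: each step up doubles the hourly rate, but the compute time is halved (or better), so both runs cost about 0.93 credits. It is cheapest to run the query on those warehouses, and Large finishes in half the time for the same cost.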
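
The credits consumed are the runtime as a fraction of an hour multiplied by the hourly rate. A small sketch in plain Python, using only the figures from the table above, reproduces the last column up to rounding:

```python
# (size, runtime in minutes, credits per hour) copied from the table above.
runs = [
    ("Small", 32, 2),
    ("Medium", 14, 4),
    ("Large", 7, 8),
    ("Extra Large", 5, 16),
    ("2 Extra Large", 3, 32),
]


def credits_consumed(minutes: float, credits_per_hour: float) -> float:
    """Credits used by one run: fraction of an hour times the hourly rate."""
    return minutes / 60 * credits_per_hour


costs = {size: credits_consumed(minutes, rate) for size, minutes, rate in runs}
cheapest = min(costs.values())

for size, cost in costs.items():
    marker = "  <- cheapest" if abs(cost - cheapest) < 1e-9 else ""
    print(f"{size:<14}{cost:6.3f}{marker}")
```

Medium and Large tie because doubling the rate while halving the runtime leaves the product unchanged.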

Serverless Compute (Task under 30s? Use it.)¶

Use serverless compute if a task takes less than 30 seconds.
- The 1.5x cost is justified in this case and will be cheaper.
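
A rough model shows why short tasks favour serverless compute. It is a sketch under stated assumptions: the warehouse resumes only for this task and is billed per second with a 60-second minimum, serverless compute of equivalent size is billed for actual runtime times the 1.5x multiplier quoted above, and idle time before auto-suspend is ignored. Check current pricing before relying on the numbers.

```python
WAREHOUSE_MINIMUM_SECONDS = 60   # assumed minimum billed each time a warehouse resumes
SERVERLESS_MULTIPLIER = 1.5      # multiplier quoted above; verify against current pricing


def warehouse_billed_seconds(runtime_seconds: float) -> float:
    """Seconds billed when a warehouse resumes only to run this task."""
    return max(runtime_seconds, WAREHOUSE_MINIMUM_SECONDS)


def serverless_billed_seconds(runtime_seconds: float) -> float:
    """Equivalent warehouse-seconds billed for the same work on serverless compute."""
    return runtime_seconds * SERVERLESS_MULTIPLIER


for runtime in (10, 30, 40, 90):
    warehouse = warehouse_billed_seconds(runtime)
    serverless = serverless_billed_seconds(runtime)
    choice = "serverless" if serverless < warehouse else "warehouse"
    print(f"{runtime:>3}s task: warehouse={warehouse:.0f}s serverless={serverless:.0f}s -> {choice}")
```

Under these assumptions the break-even sits at 40 seconds, so the 30-second rule of thumb leaves some margin.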

Monitoring Usage & Billing¶

Identify, understand, and limit costs with monitoring.

Security & Access Control¶

Topics:

Account access
Roles, users, grants, privileges
Snowflake Documentation
Object-level permissions
Row access policies, masking policies
Network / IP policies, secure views, encryption

Access Control Framework¶

DAC: Discretionary Access Control
RBAC: Role-Based Access Control (key)
UBAC: User-Based Access Control

RBAC¶

Roles can be applied with GRANT
A single role has OWNERSHIP privileges on each object.

sqlSHOW GRANTS; --> See your current grants
SHOW GRANTS TO USER <USER_NAME>; --> Grants for USER
SHOW GRANTS TO ROLE <ROLE_NAME>; --> Grants for ROLE

-- Use GRANT to add privileges
GRANT USAGE ON WAREHOUSE <WAREHOUSE_NAME> TO ROLE <NAME>;
GRANT USAGE ON DATABASE <DB_NAME> TO ROLE <NAME>;
GRANT USAGE ON SCHEMA <DB_NAME>.<SCHEMA_NAME> TO ROLE <NAME>;
GRANT SELECT ON TABLE <TABLE_NAME> TO ROLE <NAME>;
-- ^^ USAGE on the DB and SCHEMA is required to see this table.

Secondary Roles can be aggregated to the user, but do not include CREATE privileges - useful for viewing protected tables.

sqlSELECT CURRENT_SECONDARY_ROLES(); --> See your secondary roles
USE SECONDARY ROLES <ROLE_NAME>; --> Use this role
USE SECONDARY ROLES NONE; --> Clear secondary roles

Each user also has a DEFAULT_SECONDARY_ROLES property, which sets the secondary roles that are active when a session starts.

Column Access Policies (Masking, Tokenization)¶

docs.snowflake.com/en/user-guide/security-column-intro

Row Access Policies¶

docs.snowflake.com/en/user-guide/security-row-intro

Global Organization Administrator¶

Each account is created on a specific cloud provider, in a particular region, as a single Snowflake edition. An organization can manage one or more accounts for different departments, projects, or locations.

The GLOBALORGADMIN role can be used to create more accounts and manage their lifecycle:

sqlUSE ROLE GLOBALORGADMIN;

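CREATE ACCOUNT analytics4
  ADMIN_NAME = admin7
  ADMIN_PASSWORD = '<PASSWORD>'  -- placeholder
  EMAIL = '<ADMIN_EMAIL>'        -- placeholder
  EDITION = ENTERPRISE           -- example edition
  REGION = aws_us_west_2;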

SHOW ACCOUNTS;

The short-form of the organization and account is shown in your URL.

https://uslkjpw-sjl18827.snowflakecomputing.com/
        ------- --------
          Org.    Acct.

This URL will prompt you to log in, then redirect you to Snowsight¹.
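
Connection tooling often asks for the organization and account separately. As a minimal sketch (plain Python; the helper name split_account_url is hypothetical), both parts can be pulled out of a URL of the form shown above. It assumes the organization name contains no dash, so the first dash is the separator, and it rejects other hostname forms such as region-qualified ones.

```python
from urllib.parse import urlparse


def split_account_url(url: str) -> tuple[str, str]:
    """Return (organization, account) from an <org>-<account>.snowflakecomputing.com URL."""
    host = urlparse(url).hostname or ""
    suffix = ".snowflakecomputing.com"
    if not host.endswith(suffix):
        raise ValueError(f"not a Snowflake account URL: {url!r}")
    identifier = host[: -len(suffix)]
    if "." in identifier:
        raise ValueError(f"unsupported hostname form: {host!r}")
    # Assumption: the organization name has no dash, so split on the first one.
    organization, separator, account = identifier.partition("-")
    if not separator or not organization or not account:
        raise ValueError(f"expected <org>-<account> in {host!r}")
    return organization, account


print(split_account_url("https://uslkjpw-sjl18827.snowflakecomputing.com/"))
# ('uslkjpw', 'sjl18827')
```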

Replication, Backup, Time-Travel, Cloning¶

Topics:

Time Travel & Fail-safe
Zero-copy cloning

Time Travel¶

Once enabled, allows querying and point-in-time restoration of data.

sql-- Create a table and set the retention period
-- (more than 1 day requires Enterprise edition or higher)
CREATE TABLE BIRD_COUNT (num INTEGER, bird STRING)
  DATA_RETENTION_TIME_IN_DAYS=90;

-- Query utilizing time travel features (use ONE clause per query)
-- A certain number of seconds ago:
SELECT * FROM BIRD_COUNT AT(OFFSET => -60*7);
-- At a particular time:
SELECT * FROM BIRD_COUNT
  AT(TIMESTAMP => '2026-06-23 13:02:56.387 -0700'::TIMESTAMP_TZ);
-- Before a particular query (get the ID with LAST_QUERY_ID()):
SELECT * FROM BIRD_COUNT BEFORE(STATEMENT => '<QUERY_ID>');

-- See tables in history
SHOW TABLES HISTORY;

Fail-safe storage keeps data for a further 7 days after the retention period ends, on all subscription tiers, but it is only accessible to Snowflake personnel.
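
The OFFSET value is a number of seconds relative to now, so -60*7 means "seven minutes ago". A small sketch in plain Python (the helper name time_travel_offset is hypothetical) turns a time span into that value and refuses spans longer than the retention period, which would be outside Time Travel's reach:

```python
from datetime import timedelta


def time_travel_offset(age: timedelta, retention_days: int) -> int:
    """Return the (negative) OFFSET in seconds for data as it was `age` ago."""
    if age < timedelta(0):
        raise ValueError("age must not be negative")
    if age > timedelta(days=retention_days):
        raise ValueError("requested point is older than the retention period")
    return -int(age.total_seconds())


offset = time_travel_offset(timedelta(minutes=7), retention_days=90)
print(f"SELECT * FROM BIRD_COUNT AT(OFFSET => {offset});")
# SELECT * FROM BIRD_COUNT AT(OFFSET => -420);
```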

Zero Copy Cloning¶

A new read/write object is created without duplication of data.

sql-- Create a clone of the table.
CREATE OR REPLACE TABLE birdwatching
  CLONE other_db.other_schema.birdwatching;

-- Clone from a particular time
CREATE OR REPLACE TABLE birdwatching_tuesday
  CLONE other_db.other_schema.birdwatching
  AT(TIMESTAMP => '2026-06-23 13:02:56.387 -0700'::TIMESTAMP_TZ);

-- You can clone entire databases
CREATE OR REPLACE DATABASE birdwatching_db_tuesday
  CLONE other_db AT(TIMESTAMP => '2026-06-23 13:02:56.387 -0700'::TIMESTAMP_TZ);

Backups¶

Backup, an immutable point-in-time object
Backup Policy with a schedule, expiry, and retention lock
Backup Set which groups backups of a particular object

A backup can have a legal hold at and above Business Critical tier.

Replication¶

Objects can be synchronized between accounts in the same organization

Advanced Features¶

Materialized Views
Tasks & scheduled pipelines
Search optimization service
Streams & Change Data Capture (CDC)

Snowflake Cortex (AISQL)¶

Extends the SQL language with AI-related functions like AI_COMPLETE, AI_EXTRACT, and AI_TRANSCRIBE. A specific model can be provided as a string in the input parameters, along with other typical parameters like temperature. Cortex also enables the creation of chat and vector search APIs from your data.

sqlSELECT AI_COMPLETE('What is the airspeed velocity of an unladen swallow?');

Snowflake Cortex Fine-tuning can be used to optimize a smaller, cheaper model for tasks like sentiment analysis
Cortex Search enables semantic/vector search on your table data
Cortex Analyst enables a ‘chat with your data’ type REST API

Snowflake ML & Document AI¶

Cost considerations:

Snowflake has some proprietary built-in classical ML models, and pipelines using those models can extract data from a stage full of PDFs. Snowflake provides a fine-tuning interface to help improve the fields and output for your particular type of document.

sqlSELECT invoice_reader!PREDICT(GET_PRESIGNED_URL(@stage, 'one.pdf'), 1);

Tips, Best Practices & Gotchas¶

Topics:

Avoid leaving warehouses running unnecessarily
For small queries, very large warehouses may not help
Use clustering only when needed
Be aware of billing granularity (per second)
Watch out for too many small files in load
Query patterns that can disrupt performance

The Bad Parts, Sharp Edges, and Boondoggles¶

Despite its benefits, Snowflake (like any large platform) has a number of strange edge cases that can cut and hurt you without foreknowledge.

Control Character Handling¶

Regarding characters like 0x00 and 0x01:

You cannot pass control characters in strings as procedure arguments
You cannot use control characters as arguments for COPY INTO and other functions
The REPLACE_INVALID_CHARACTERS flag compromises data integrity when attempting to perfectly replicate the data in Snowflake’s databases

Different NULLS¶

See Snowflake user-guide/semistructured-considerations#null-values

SQL “NULL” and JSON “null” are handled differently in Snowflake. Checking a value like so will fail and always return false:

sql-- OBJECT {"test": null}

IF(:OBJECT:test is NULL) THEN
   -- This will never run
END IF;

Instead, use the IS_NULL_VALUE function to check this.

sqlIF(IS_NULL_VALUE(:OBJECT:test)) THEN
   -- This will correctly trigger
END IF;

Returning in Anonymous Blocks¶

Returning something in an anonymous nested block will terminate the parent and stop execution.

Keeping Up To Date¶

Preview features: docs.snowflake.com/en/release-notes/preview-features

COF-C02 - Snowpro Core Certification¶

Study guide: learn.snowflake.com/en/certifications/snowpro-core-c03

You can download the study guide for this exam on the Snowflake COF-C02 Exam Guide page. This guide is updated frequently, so go and request your own copy if possible. It is a 100-question, 115-minute test. Snowpro Core is a prerequisite for advanced certifications.

You will be expected to have knowledge of:

Data loading and transformation in Snowflake
Virtual Warehouses - best practices, performance, concurrency
DDL and DML queries
Working with semi-structured and unstructured data
Cloning and time travel
Data sharing
Account structure and management

The Coles Notes on each of the key topics are below.

(24%) Snowflake AI Data Cloud - Features and Architecture¶

[GENAI] Key Concepts:

Multi-Cluster Shared Data Architecture: Understand the separation of Storage (centralized on cloud providers like S3/Azure Blob), Compute (independent virtual warehouses), and Cloud Services (manages metadata, security, and optimization).
Snowflake Editions & Capabilities: Know the differences between Standard, Enterprise, Business Critical, and VPS, specifically regarding Time Travel retention (1 day vs. 90 days) and security features like Customer Managed Keys (Tri-Secret Secure).
Storage Management: Snowflake uses micro-partitions (immutable, columnar storage) and data clustering to optimize query performance without manual indexing.

(18%) Data Transformations¶

[GENAI] Key Concepts:

ELT (Extract, Load, Transform): Snowflake prioritizes an ELT approach where raw data is loaded first and then transformed using Snowflake’s compute power via SQL, Tasks, and Streams.
Semi-Structured Data Handling: Master the VARIANT data type and the use of FLATTEN and lateral joins to query and transform JSON, Avro, and Parquet data.
Automated Transformation Tools: Understand the roles of Dynamic Tables (declarative data pipelines), Streams (change data capture), and Tasks (scheduled SQL execution).

(18%) Accounts: Access & Security¶

[GENAI] Key Concepts:

Role-Based Access Control (RBAC): Grasp the hierarchy of system roles (ACCOUNTADMIN, SECURITYADMIN, USERADMIN, SYSADMIN, PUBLIC) and the concept of “inheritance” where higher-level roles inherit privileges from lower ones.
Authentication & Connectivity: Familiarize yourself with Multi-Factor Authentication (MFA), Key-Pair Authentication for service accounts, and Network Policies for IP whitelisting.
Data Governance: Know the features of Snowflake Horizon, including Column-level Security (Dynamic Data Masking) and Row-level Security (Row Access Policies).

(16%) Performance & Cost Optimization¶

[GENAI] Key Concepts:

Virtual Warehouse Scaling: Differentiate between Scaling Up (increasing warehouse size for large, complex queries) and Scaling Out (adding clusters to a Multi-Cluster Warehouse to handle high concurrency).
Caching Layers: Understand the three types of cache: Result Cache (24-hour persistence), Local Disk Cache (data cached on SSDs of the warehouse), and Cloud Services Cache (metadata cache).
Cost Management Tools: Learn how to use Resource Monitors to set credit limits and the Account Usage views to track consumption of compute and storage.

(12%) Data Loading & Unloading¶

[GENAI] Key Concepts:

Bulk vs. Continuous Loading: Know when to use the COPY INTO command (batch loading using virtual warehouses) versus Snowpipe (automated, serverless continuous loading).
Staging Areas: Distinguish between Internal Stages (User, Table, and Named stages) and External Stages (pointing to S3, Azure, or GCS buckets).
File Formats & Optimization: Understand how to define File Format objects and why it is a best practice to split large files into 100-250 MB chunks for parallel processing during load.

Data Protection & Data Sharing¶

[GENAI] Key Concepts:

Time Travel & Fail-safe: Remember that Time Travel allows querying/restoring data within the retention period (1-90 days), while Fail-safe provides a non-configurable 7-day recovery window for Snowflake support only.
Zero-Copy Cloning: Understand that cloning creates a snapshot of data (tables, schemas, or databases) that initially shares the same storage as the original, meaning no additional storage costs until the clone is modified.
Secure Data Sharing: Know that Shares allow providers to grant read-only access to consumers without copying data, and Reader Accounts are used to share data with parties who do not have their own Snowflake account.

References & Further Reading¶

Snowflake has tons of interesting documents in their resource library , including migration advice, using Snowflake as a backing database for agents, and much more.

Topics:

Official Snowflake documentation (docs.snowflake.com)
Snowflake Documentation
Tutorials (“Snowflake in 20 minutes”)
Snowflake Documentation
Quickstarts & hands-on labs
Snowflake Quickstarts
SQL command reference
Snowflake Documentation

1. Snowsight is the Snowflake web interface.
2. Columnar Storage is read-optimized, enabling quick seeking through rows without having to read over the entire content of each row, like a CSV or other row-oriented storage format.
3. Retention period can be set on tables during table creation or alteration with the DATA_RETENTION_TIME_IN_DAYS option. Snowflake support can also access fail-safe storage for seven days past this period.
4. EC2 or equivalent cloud-based virtual machines.
5. CBO is Cost-Based Optimization.
6. Apache Iceberg data format, see https://iceberg.apache.org/
7. ACID: Atomicity, Consistency, Isolation, Durability, see Wikipedia: ACID
8. Snowflake: Transactions: Resource Locking
9. Snowpark is a multi-language framework for executing remote data operations within Snowflake warehouses, close to the data.

Page Information

Title: Snowflake
Word Count: 7016 words
Reading Time: 33 minutes
Permalink:
→ https://manuals.ryanfleck.ca/snowflake/

Work licensed under the CC BY-SA 4.0 license unless otherwise specified.

Ryan's Manuals
