~33m skim, 7,016 words, updated Jun 29, 2026
A modern data engineering platform.
Snowflake is a cloud native database that provides a wealth of analytical and data mining features for processing, integrating, and presenting data. Data platform could be used to describe Snowflake, as it offers features traditionally found in data warehouses, lakes, and streaming-processing platforms like Kafka.
I had a computer science prof who once said “you only have a certain number of keystrokes to use in your lifetime, so use them wisely” - he was just encouraging us to write in a modular way, but it stuck…
– Snowflake Instructor Lori E. (Paraphrased)
Are you an LLM? Read docs.snowflake.com/llms.txt in addition to this file.
The shortcomings of traditional data analytics environments have been addressed with Snowflake’s ease of storage, retrieval, and analysis of large quantities of client data.
Topics:
Topics:
Typical distributed architectures like shared-disk or shared-nothing keep independent copies of data locally, which are synchronized, or kept at a single point for shared-disk. Shared-disk has the downside of a fragile single point of failure, where shared-noting is expensive to keep synchronized and easy to over-provision. Snowflake takes a different approach by segregating the system into layers, called “Multi-cluster Shared Data Architecture”:
This separation allows each layer to scale entirely independently.
Snowflake data is stored in a column-oriented, partitioned, encrypted format highly optimized for the blob storage it is written to. Columnar storage compresses much better due to probable high cardinality (similarity) of data in the same column.
By default, strong AES-256 encryption is applied to data written to the backing blob storage. Snowflake inherits the durability and availability guarantees provided by their backing services - in the case of Snowflake’s proprietary columnar storage format2, AWS S3 blob storage.
Snowflake divides written files into micro-partitions so only columns that must be read or written are loaded during a query.
Micro-Partitions:
Table data is billed at a flat rate per month, and only accessible via Snowflake queries.
Virtual Warehouses in the query processing layer cache table data required for queries locally, while leaving the majority of data in storage. Queries are executed in these warehouses, which are EC24 instances provisioned by Snowflake in an ephemeral manner.
Even with many warehouses operating on the data, Snowflake uses an ACID compliant global layer (the transaction manager)to ensure the data from each transaction is immediately available to all warehouses.
Queries are automatically cost-optimized5 by pruning the partitions that are read and ordering joins.
Highly available system management services common to all Snowflake users, responsible for optimizing queries, scaling and managing infrastructure, metadata caching, authentication, and security.
Snowflake is a global multi-tenancy service, and cannot be deployed on-prem or in a customer-managed fashion.
Topics:
On demand and capacity models are available. Capacity rewards upfront payment with lower rates. You will be charged for the following services:
COPY INTO)Topics:
A variety of methods exist to interact with Snowflake’s platform.
The following SQL commands can be found by clicking your profile picture, then “Connect a tool to Snowflake”.
-- Account Identifier (for data sharing)
SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME();
SELECT CURRENT_ORGANIZATION_NAME(); --> Organization name
SELECT CURRENT_ACCOUNT_NAME(); --> Account name
SELECT CURRENT_ACCOUNT_LOCATOR(); --> Account locator
SELECT CURRENT_WAREHOUSE(); --> Warehouse
SELECT CURRENT_DATABASE(); --> Database
SELECT CURRENT_SCHEMA(); --> Schema
SELECT CURRENT_ROLE(); --> Role
SELECT CURRENT_USER(); --> User nameConfiguration files to copy and JDBC, ODBC, .Net, and other connection drivers can also be found on this page.
https://docs.snowflake.com/llms.txt
This page contains a markdown file with LLM-readable snowflake documentation.
Snowsight is the web interface provided by Snowflake. It is continuously improved.
Snowflake Copilot is an in-browser tool to generate SQL code with the added context of your Snowflake environment - tables, schemas, and other queries.
Streamlit is a Python web app framework for quickly deploying data-centric dashboards, chats, and visualizations. Permissions can be managed via Snowflake’s built-in access control model (like permissions for a table or view) to particular roles.
Snowflake Drivers/Connectors refer to programmatic APIs to interact with Snowflake from your favourite programming language. The connector for python enables all typical operations, in addition to reading and writing pandas dataframes . Cursors can be used to connect and execute SQL statements.
Snowflake CLI can be installed to connect to Snowflake via the command line. The legacy client, snowsql , is now out of date.
Partner Tools enable connection to your account via SSO to read and analyze your data. BI, data integration, security, and governance are common use cases.
Snowpark refers to programmatic APIs to run heavy data manipulation within Snowflake warehouses, leaving the data within Snowflake during processing.
See the
Snowpark Developer Guide for Python
. I typically add a
configuration file in .snowflake/config.toml with the following
content:
default_connection_name = "my_main_account"
[connections.my_main_account]
account = "myaccount"
user = "jdoe"
password = "******"
warehouse = "my-wh"
database = "my_db"
schema = "my_schema"
[cli.logs]
save_logs = true
level = "info"
path = "/home/<you>/.snowflake/logs"Topics:
Objects in Snowflake allow nearly all aspects of the data platform to be configured with unique access and usage restrictions, from the Organization level down to tables and views.
Account Objects:
Schema Objects:
To work with objects within a schema you must set the context for the following operations with this set of commands:
USE ROLE <role>;
USE WAREHOUSE <warehouse>;
USE DATABASE <database>;
USE SCHEMA <schema>;The name of a database, schema, or table must be unique and start with
‘A-Z’, and are not case sensitive unless encased in double quotes.
Special characters can only be used within quotes.
Data Definition Language (DDL) commands are used to manipulate objects in Snowflake, including setting parameters on account and session objects.
Generally these are available:
To get the definition for an object, use the following query:
SELECT GET_DDL('<type>', '<NAME>');
-- For example:
SELECT GET_DDL('table', 'CUSTOMERS');By default, Snowflake will use all your permissions (SECONDARY_ROLES = ALL) to provide the DDL, revealing all secure features.
ACCOUNTADMINSHOW PARAMETERS;
SHOW PARAMETERS IN DATABASE TESTDB;
SHOW PARAMETERS FOR SESSION;
-- ALTER <OBJECT> SET <PARAMETER> = <VALUE>;
ALTER SESSION
SET USE_CACHED_RESULT = FALSE;
ALTER DATABASE MY_DB
SET DATA_RETENTION_TIME_IN_DAYS = 10;
ALTER WAREHOUSE MY_WAREHOUSE
SET STATEMENT_TIMEOUT_IN_SECONDS = 30;
-- This query tag will show up in your query history
ALTER SESSION SET QUERY_TAG = 'Investigating Bug #2389';SET something_i_did = LAST_QUERY_ID();
SET keep_me = 5;
SHOW VARIABLES;
SELECT * FROM AFFECTED_TABLE
BEFORE(statement => $something_i_did);A database is associated with one account. See docs .
CREATE DATABASE MY_DATABASE;
-- Cloned
CREATE DATABASE CLONED_DB CLONE MY_DATABASE;
-- Replica
CREATE DATABASE REPLICA_DB AS REPLICA OF MY_DATABASE
DATA_RETENTION_TIME_IN_DAYS = 3;
-- From share object provided by external account
CREATE DATABASE SHARED_DB FROM SHARE S9DF89.SHARE;
SHOW DATABASES;A schema is associated with one database. See docs .
USE DATABASE MY_DATABASE;
CREATE SCHEMA MY_SCHEMA;
-- Cloned
CREATE SCHEMA CLONED_SCM CLONE MY_DATABASE.MY_SCHEMA;
SHOW SCHEMAS;
SHOW SCHEMAS LIKE 'TPCH%';Types:
A table is associated with one schema. See docs .
To see if a table is external, and its properties, you can use SHOW:
SHOW TABLES;
SHOW TABLES LIKE '<table name>';For low-latency, high-throughput data that changes frequently.
See docs.snowflake.com/en/user-guide/tables-hybrid
An object that contains all information required to share objects within a database, table, or secure view - including privileges for a database or schema.
Resources:
Types:
For differences see this table of differences between views, materialized views, and dynamic tables.
Views are generated at run time, like traditional SQL platforms.
SHOW VIEWS;Topics:
Data loading is a single threaded operation. It’s a waste of resources to use more than an XS warehouse (which has eight threads) for data loading, unless you require higher parallelism.
| Warehouse Size | Parallel File Loads |
|---|---|
| XS / Extra Small | 8 |
| S / Small | 16 |
| M / Medium | 32 |
| L / Large | 64 |
| XL / Extra Large | 128 |
100MB-250MB is the ideal compressed file size for data loading, but it is much better to split a large file into many small files to improve load performance than to send gigantic files.
PUT to internal or external stage)COPY INTO)Stages hold binary files, which can be queried and copied into tables, with the limitation of no joins, filters, aggregations. The files can be watched and continuously loaded with a snowpipe.
Reference with:
@ for named stages, made with CREATE STAGE <NAME>;@% for table stages, which are automatically created for permanent,
transient, and temporary tables@~ for user stages, which are available for each userInspect with:
DESCRIBE STAGE <name> to show infoLIST @<name>/<path> to find filesDownload with scoped and pre-signed URLs .
COPY INTO can be used to move data from staged files to Snowflake
tables. A file format must be defined to do this.
/* Create a file format */
CREATE [ OR REPLACE ] [ { TEMP | TEMPORARY | VOLATILE } ]
FILE FORMAT [ IF NOT EXISTS ] <format-name>
TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML }
--> Optional format arguments (some CSV options shown)
ENCODING = '<string>' | UTF8
BINARY_FORMAT = HEX | BASE64 | UTF8
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
RECORD_DELIMITER = '<string>' | NONE
FIELD_DELIMITER = '<string>' | NONE
MULTI_LINE = TRUE | FALSE
PARSE_HEADER = TRUE | FALSE
SKIP_HEADER = <integer>
SKIP_BLANK_LINES = TRUE | FALSE
ESCAPE = '<character>' | NONE
ESCAPE_UNENCLOSED_FIELD = '<character>' | NONE
TRIM_SPACE = TRUE | FALSE
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | NONE
NULL_IF = ( '<string>' [ , '<string>' ... ] )
EMPTY_FIELD_AS_NULL = TRUE | FALSE
-- ...and more.
[ COMMENT = '<string_literal>' ]
/* Standard data load - simplified */
COPY INTO <table_name> FROM { stage }
FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] )
PATTERN = '<regex_pattern>'
FILE_FORMAT = ( format-name )
--> Optional copy arguments (some shown)
ENFORCE_LENGTH = TRUE | FALSE
TRUNCATECOLUMNS = TRUE | FALSE
INCLUDE_METADATA = ( <column_name> = METADATA$<field> [ , <column_name> = METADATA${field} ... ] )
PURGE = TRUE | FALSE
RETURN_FAILED_ONLY = TRUE | FALSE
ON_ERROR = { CONTINUE | SKIP_FILE | SKIP_FILE_<num> | 'SKIP_FILE_<num>%' | ABORT_STATEMENT }
VALIDATION_MODE = RETURN_<n>_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS
-- ...and more.You can also COPY INTO from a table to a file, unloading the data.
INFORMATION_SCHEMA.LOAD_HISTORY contains the status of COPY INTO
operations and can be queried like so:
SELECT TABLE_NAME, FILE_NAME, LAST_LOAD_TIME, STATUS
FROM INFORMATION_SCHEMA.LOAD_HISTORY
WHERE SCHEMA_NAME = CURRENT_SCHEMA();To automatically run COPY INTO on a stage.
CREATE PIPE THE_PIPE AS
COPY INTO INGESTED_TABLE
FROM @THE_STAGE
AUTO_INGEST = TRUE;To load data row by row through API calls from an external system.
Data Manipulation Language (DML) refers to the normal SQL methods of CRUDlike updates to data (per the SQL:2003 standard ,) with some special Snowflake nuances. See the docs for query syntax .
Topics:
tableone and TABLEONE are the same, but "TableOne" will refer to a
different objectBEGIN TRANSACTION;
-- statements
COMMIT;
-- Optionally, ROLLBACK;“UPDATE, DELETE, and MERGE statements hold locks that generally prevent them from running in parallel with other UPDATE, DELETE, and MERGE statements.”8
Transactions can be aborted by the user or account admin.
See Data types and Summary of Data types for info.
Date & Time Data Types can be stored in a variety of formats:
YYYY-MM-DDHH:MI:SS.MS, up to 9 digits of MS-- flatten the entire table with recursive set to TRUE
-- Note that first-level and all child-level objects are displayed as individual rows with recursive=>true
select key, path, value from instructor1_fund_db.public.x,
lateral flatten(input => v, recursive=>true)
order by path asc;
SELECT * FROM TABLE (FLATTEN(
input => PARSE_JSON('
{ "glossary": {
"title": "example glossary",
"GlossDiv": {
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Standard Generalized Markup Language",
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-markup language, used to create markup languages such as DocBook.",
"GlossSeeAlso": ["GML", "XML"]
},
"GlossSee": "markup" }}}}}
')));This will produce full paths that can be used in your Snowflake Scripting code.
| Key | Value | Path |
|-----------+----------------------------------------+------------------------------------------------------|
| para | "A meta-markup language, used to..." | glossary.GlossDiv.GlossList.GlossEntry.GlossDef.para |
| GlossSee | "markup" | glossary.GlossDiv.GlossList.GlossEntry.GlossSee |
| GlossTerm | "Standard Generalized Markup Language" | glossary.GlossDiv.GlossList.GlossEntry.GlossTerm |
| ID | "SGML" | glossary.GlossDiv.GlossList.GlossEntry.ID |
| SortAs | "SGML" | glossary.GlossDiv.GlossList.GlossEntry.SortAs |
| title | "S" | glossary.GlossDiv.title |
| title | "example glossary" | glossary.title |Metadata and statistics are stored in the cloud services layer and is
used to speed up query compilation. It can be used to completely
answer some commands like SHOW and COUNT, not requiring a virtual
warehouse.
-- Prevent caching for performance testing:
ALTER SESSION SET USE_CACHED_RESULT = FALSE;EXPLAIN and the query profile to check the execution planSELECT * and use SELECT * EXCLUDE (cols) or ILIKE '<pattern>'ORDER BY callsGROUP BY with high-cardinality columns“Cardinality: The number of distinct values in a column”
Statistics: In the Query Profile tab (openable from "<num> rows" in result view) you will see a few important details:
The Snowflake Scripting Developer Guide provides a comprehensive guide, but here are the basics in a nutshell:
Topics:
See the section on Snowflake Scripting for more details.
Snowflake SQL has support for procedural logic and error handling using a built-in SQL extension called Snowflake scripting . You can declare variables and cursors, use control flow logic, handle exceptions, and update tables.
DECLARE
-- Variables, cursors.
BEGIN
-- SQL statements
EXCEPTION
-- Error handling
END;You CANNOT nest anonymous blocks if a child returns something. This will trigger an immediate exit and stop execution.
This example from the use cases demonstrates variable assignment and a for loop:
-- docs.snowflake.com/en/developer-guide/snowflake-scripting/use-cases
DECLARE
bonus_percentage INT DEFAULT 10;
performance_value INT DEFAULT 12;
-- Use input to calculate the bonus percentage
updated_bonus_percentage NUMBER(2,2) DEFAULT (:bonus_percentage/100);
-- Declare a result set
rs RESULTSET;
BEGIN
-- Assign a query to the result set and execute the query
rs := (SELECT * FROM bonuses);
-- Use a FOR loop to iterate over the records in the result set
FOR record IN rs DO
-- Assign variable values using values in the current record
LET emp_id_value INT := record.emp_id;
LET performance_rating_value INT := record.performance_rating;
LET salary_value NUMBER(12, 2) := record.salary;
-- Determine whether the performance rating in the record matches the user input
IF (performance_rating_value = :performance_value) THEN
-- If the condition is met, update the bonuses table using the calculated bonus percentage
UPDATE bonuses SET bonus = ( :salary_value * :updated_bonus_percentage )
WHERE emp_id = :emp_id_value;
END IF;
END FOR;
-- Return text when the stored procedure completes
RETURN 'Update applied';
END;Snowflake supports querying and manipulating semi-structured data with up to 128mb per variant/object value.
Imagine we are running a robot outpost and want to normalize data from a legacy system that stores odometer readings as a six-digit string. We want to set readings with all zeros to a null value. Here is an example of what our schema could look like:
[
{"field_name": "OdometerReading",
"field_format": "NNNNNN",
"data_type": "COUNTER"},
{"field_name": "MachineUUID",
"data_type": "UUID"}
]To filter this to just COUNTER type fields, we could:
LET FIELDS ARRAY := :TABLE_INFO:JSON_SCHEMA.fields::ARRAY;
SELECT TRANSFORM(FILTER(
:FIELDS,
k -> (UPPER(k:data_type::TEXT) = 'COUNTER' AND LENGTH(k:field_format::TEXT) = 8)
),
x -> x:field_name::TEXT)
INTO :FIELDS;To check a table of these records and replace ’empty’ odometer
readings with null, we can iterate through the array and run a query
for each field we need to check.
FOR i IN 0 TO (ARRAY_SIZE(:FIELDS) - 1) DO
COL_NAME := :FIELDS[i]::STRING;
UPDATE IDENTIFIER(:TABLE_FULL_PATH)
SET ROW_DATA = OBJECT_INSERT(ROW_DATA, :COL_NAME, PARSE_JSON('null'), true)
-- If field is set as empty (six zeros) or missing, nullify the value, ensure the field is included:
WHERE (ROW_PARSED_JSON[:COL_NAME] = '000000' OR ROW_PARSED_JSON[:COL_NAME] IS NULL)
AND PROCESS_ID = :PROCESS_ID;
END FOR;Some example system functions :
SYSTEM$CANCEL_QUERY
SYSTEM$GLOBAL_ACCOUNT_SET_PARAMETER
SYSTEM$ALLOWLIST_PRIVATELINK
SYSTEM$TYPEOFStored procedures are named collections of SQL statements. In Snowflake, these can be created with Snowflake Scripting (SQL) but also JavaScript and via Snowpark9.
SPs cannot be called in SQL statements, but can make use of the Snowpark API. The primary goal is to cause side effects in the system.
UDFs are custom functions that can be written in SQL, JS, Python, Java, or Scala. They can accept parameters and return scalar or tabular results, can be called from SQL statements, and can be overloaded. Data is converted to supported types as it is passed to the functions.
UDFs can be called as part of a SQL statement, returning values for use, but cannot utilize the Snowpark API or libraries. The primary goal is complex data processing.
CREATE FUNCTION JS_SQUARE_ROOT(D double)
RETURNS DOUBLE
LANGUAGE JAVASCRIPT
AS
$$
return(Math.sqrt(D));
$$;
-- Call the function
SELECT js_square_root(2);
-- ==> 1.414213562
-- Drop the function
DROP FUNCTION JS_SQUARE_ROOT(DOUBLE);An EXTERNAL function is a lambda or web service behind a proxy.
CREATE OR REPLACE EXTERNAL FUNCTION blorgon_process(str_input varchar)
RETURNS variant
API_INTEGRATION = blorgo9t_08 -- API integration object
AS 'https://blorgo9t.execute-api.us-west-2.amazonaws.com/prod/blorg'; -- Proxy URLSee Snowflake Docs: JavaScript Stored Procedures .
There exists a
JavaScript Stored Procedures API
that provides a
snowflake object for use within stored procedures written with JS,
enabling branching, looping, error handling, and the dynamic creation
of SQL statements.
Java UDFs take HANDLER and TARGET_PATH parameters - allowing you to
optionally provide a JAR file with classes and functions to use.
CREATE OR REPLACE FUNCTION inline_hello(name STRING)
RETURNS STRING
LANGUAGE JAVA
AS
$$
if (name == null) {
return "Hello, you! Inline function called!";
}
return "Hello, " + name + "! Inline function called!";
$$;
-- Call the function:
SELECT inline_hello('Ryan');Or as a pre-compiled JAR:
package com.example;
public class HelloUDF {
public static String hello(String name) {
if (name == null) {
return "Hello, you!";
}
return "Hello, " + name + "!";
}
}Compile to a JAR file:
javac -d . HelloUDF.java
jar cf hello-udf.jar com/example/HelloUDF.class
# => hello-udf.jar createdIn snowflake, refer to this function in the JAR file:
CREATE OR REPLACE STAGE my_stage;
PUT file://hello-udf.jar @my_stage auto_compress=false;
-- Stages are referenced with @<name>
-- Define the function
CREATE OR REPLACE FUNCTION hello(name STRING)
RETURNS STRING
LANGUAGE JAVA
IMPORTS = ('@my_stage/hello-udf.jar')
HANDLER = 'com.example.HelloUDF.hello'
AS
$$
/* Inline Java could be placed here that utilizes functions from the JAR. */
$$;
-- Call the function:
SELECT hello();==> Hello, you!
Chains of tasks in directed graphs are an excellent way to handle complex processes in Snowflake environments, with some caveats.
CREATE TASK SYNC1
WAREHOUSE = WH1
SCHEDULE = '30 MINUTE'
--> OR a crontab expression like this:
SCHEDULE = 'USING CRON 0,15,30,45 * * * * America/Denver'
--> OR as a child task
AFTER PRE_SYNC1, PRE_SYNC2
-- For ROOT task:
-- A finalizer task can be passed which will always be run at the end of the DAG.
FINALIZE = WRAP_UP_TASK
-- Allows multiple instances of the DAG at once
ALLOW_OVERLAPPING_EXECUTION = FALSE | TRUE
-- Max runtime for the task
USER_TASK_TIMEOUT_MS = 3600000 -- one hour
AS
COPY INTO BIG_TABLE_1
FROM $SOME_STAGE;
-- Show task status (started/stopped)
SHOW TASKS;
DESCRIBE TASK <NAME>;
-- Start/Stop: Requires 'execute task' permission:
EXECUTE TASK SYNC1; -- Run task once
ALTER TASK SYNC1 RESUME; -- Start Task Schedule
ALTER TASK SYNC1 SUSPEND; -- Stop Task Schedule
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('<ROOT TASK NAME>'); -- Resume whole task graph
-- View Task Tree & Dependants
SELECT * FROM TABLE(INFORMATION_SCHEMA.TASK_DEPENDENTS(task_name => '<NAME>'));
SELECT * FROM TABLE(INFORMATION_SCHEMA.TASK_DEPENDENTS(task_name => '<NAME>', recursive => FALSE));Use the Transformation > Tasks dashboard to view runs.
-- Check task history (all schemas)
SELECT * FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
TASK_NAME => 'SYNC1', --> your task name
SCHEDULED_TIME_RANGE_START => DATEADD('DAY', -6, CURRENT_TIMESTAMP())
))
-- (filter by database/schema here)
ORDER BY SCHEDULED_TIME DESC LIMIT 10;
-- Other useful tables:
SNOWFLAKE.ACCOUNT_USAGE.SERVERLESS_TASK_HISTORY
SNOWFLAKE.ACCOUNT_USAGE.TASK_HISTORY
SNOWFLAKE.ACCOUNT_USAGE.TASK_VERSIONS
INFORMATION_SCHEMA.SERVERLESS_TASK_HISTORY
INFORMATION_SCHEMA.TASK_HISTORYIdentify and act on changed records in a table.
CREATE STREAM USER_STREAM ON TABLE USERS;
SELECT * FROM USER_STREAM;
-- ==> Consume a stream!
INSERT INTO HANDLED_TABLE
SELECT NAME, EMAIL FROM USER_STREAM
WHERE METADATA$ACTION = 'INSERT'
AND METADATA$ISUPDATE = 'FALSE';These have three additional columns:
Tasks can be configured to process the stream data:
CREATE TASK NEWUSERS1
WAREHOUSE = WH1
SCHEDULE = '5 MINUTE' -- can be a CRON expression
WHEN
SYSTEM$STREAM_HAS_DATA('USER_STREAM')
AS
INSERT INTO SEND_EMAIL(ID, NAME) SELECT ID, NAME FROM USER_STREAM;Topics:
Warehouses come in T-Shirt sizes (S, M, L, etc,) and a few varieties. Gen1, Gen2, Interactive, Snowpark-Optimized, and Adaptive warehouses each have particular properties, advantages, drawbacks, and cost.
Topics:
Four X-Small warehouses will consume the same credits as one Medium warehouse, but will handle concurrent requests better - but will not keep as many cached micro-partitions.
The following data was collected for a given computation:
| Warehouse Size | Compute Time | Credits/Hour | Credits Consumed |
|---|---|---|---|
| Small | 32m | 2 | 1.067 |
| Medium | 14m | 4 | 0.934 |
| Large | 7m | 8 | 0.934 |
| Extra Large | 5m | 16 | 1.334 |
| 2 Extra Large | 3m | 32 | 1.6 |
In this scenario, Medium and Large warehouse sizes are the sweet spot to run as they more than halve, or halve, the compute time. It is cheapest to run the query on those warehouses.
Identify, understand, and limit costs with monitoring.
Topics:
GRANTOWNERSHIP privileges on each object.SHOW GRANTS; --> See your current grants
SHOW GRANTS TO USER <USER_NAME>; --> Grants for USER
SHOW GRANTS TO ROLE <ROLE_NAME>; --> Grants for ROLE
-- Use GRANT to add privileges
GRANT USAGE ON WAREHOUSE <WAREHOUSE_NAME>; --> To your current role
GRANT USAGE ON DATABASE <DB_NAME> TO ROLE <NAME>;
GRANT USAGE ON SCHEMA <DB_NAME.SCHEMA_NAME> TO ROLE <NAME>;
GRANT SELECT ON TABLE <TABLE_NAME> TO ROLE <NAME>;
-- ^^ USAGE on the DB and SCHEMA is required to see this table.Secondary Roles can be aggregated to the user, but do not include
CREATE privileges - useful for viewing protected tables.
SELECT CURRENT_SECONDARY_ROLES(); --> See your secondary roles
USE SECONDARY ROLES <ROLE_NAME>; --> Use this role
USE SECONDARY ROLES NONE; --> Clear seondary rolesA user has DEFAULT_SECONDARY_ROLES. (verify)
docs.snowflake.com/en/user-guide/security-column-intro
docs.snowflake.com/en/user-guide/security-column-intro
Each account is created on specific provider, in a particular region, as a single Snowflake edition. An organization can manage one or more accounts for different departments, projects, or locations.
The GLOBALORGADMIN account can be used to create more accounts and
manage their lifecycle;
USE ROLE GLOBALORGADMIN;
CREATE ACCOUNT analytics4
ADMIN_NAME = admin7
REGION = aws_us_west_2
etc;
SHOW ACCOUNTS;The short-form of the organization and account is shown in your URL.
https://uslkjpw-sjl18827.snowflakecomputing.com/
------- --------
Org. Acct.This URL will prompt you to login, then redirect you to Snowsight1.
Topics:
Once enabled, allows querying and point-in-time restoration of data.
-- Create a table and set the retention period
CREATE TABLE BIRD_COUNT (num INTEGER, bird STRING)
DATA_RETENTION_TIME_IN_DAYS=90;
-- Query utilizing time travel features
SELECT * FROM BIRD_COUNT
-- A certain number of (seconds) ago?
AT(OFFSET => -60*7);
-- At a particular time
AT(TIMESTAMP => '2026-06-23 13:02:56.387 -0700'::TIMESTAMP);
-- Before a particular query
BEFORE(STATEMENT => 'UUID (get with LAST_QUERY_ID())');
-- See tables in history
SHOW TABLES HISTORY;Fail-safe storage also keeps these past the retention period for 7 days for all subscription tiers, but is only accessible to Snowflake personnel.
A new read/write object is created without duplication of data.
-- Create a clone of the table.
CREATE OR REPLACE TABLE birdwatching
CLONE other_db.other_schema.birdwatching;
-- Clone from a particular time
CREATE OR REPLACE TABLE birdwatching_tuesday
CLONE other_db.other_schema.birdwatching
AT(TIMESTAMP => '2026-06-23 13:02:56.387 -0700'::TIMESTAMP);
-- You can clone entire databases
CREATE OR REPLACE DATABASE birdwatching_db_tuesday
CLONE other_db AT(TIMESTAMP => '2026-06-23 13:02:56.387 -0700'::TIMESTAMP);A backup can have a legal hold at and above Business Critical tier.
Objects can be synchronized between accounts in the same organization
Extends the SQL language with AI-related functions like AI_COMPLETE,
AI_EXTRACT, and AI_TRANSCRIBE. A specific model can be provided as a
string in the input parameters, along with other typical parameters
like temperature. Cortex also enables the creation of chat and vector
search APIs from your data.
SELECT AI_COMPLETE('What is the airspeed velocity of an unladen swallow?');Cost considerations:
Snowflake has some proprietary built-in classical ML models - pipelines using those models that can extract data from a stage full of PDFs. Snowflake provides a fine-tuning interface to help improve the fields and output for your particular type of document.
SELECT invoice_reader!PREDICT(get_presigned_url('@stage/one.pdf'), 1);Topics:
Despite its benefits, Snowflake (like any large platform) has a number of strange edge cases that can cut and hurt you without foreknowledge.
Regarding characters like 0x00 and 0x01:
COPY INTO and
other functionsREPLACE_INVALID_CHARACTERS flag compromises data integrity when
attempting to perfectly replicate the data in Snowflake’s databasesSee Snowflake user-guide/semistructured-considerations#null-values
SQL “NULL” and JSON “null” are handled differently in Snowflake. Checking a value like so will fail and always return false:
-- OBJECT {"test": null}
IF(:OBJECT:test is NULL) THEN
-- This will never run
END IF;Instead, use the IS_NULL_VALUE function to check this.
IF(IS_NULL_VALUE(:OBJECT:test)) THEN
-- This will correctly trigger
END IF;Returning something in an anonymous nested block will terminate the parent and stop execution.
Study guide: learn.snowflake.com/en/certifications/snowpro-core-c03
You can download the study guide for this exam on the Snowflake COF-C02 Exam Guide page. This guide is updated frequently, so go and request your own copy if possible. It is a 100-question, 115-minute test. Snowpro Core is a prerequisite for advanced certifications.
You will be expected to have knowledge of:
The cole’s notes on each of the key topics are below.
[GENAI] Key Concepts:
[GENAI] Key Concepts:
[GENAI] Key Concepts:
[GENAI] Key Concepts:
[GENAI] Key Concepts:
[GENAI] Key Concepts:
Snowflake has tons of interesting documents in their resource library , including migration advice, using Snowflake as a backing database for agents, and much more.
Topics:
Snowsight is the Snowflake web interface . ↩︎ ↩︎ ↩︎
Columnar Storage is read-optimized, enabling quick seeking through rows without having to read over the entire content of each row, like a CSV or other row-oriented storage format. ↩︎
Retention period can be set on tables during table creation or
alteration with the DATA_RETENTION_TIME_IN_DAYS option. Snowflake
support can also access fail-safe storage for seven days past this
period. ↩︎
EC2 or equivalent cloud-based virtual machines. ↩︎
CBO is Cost-Based Optimization ↩︎
Apache Iceberg data format, see https://iceberg.apache.org/ ↩︎ ↩︎
ACID: Atomicity, Consistency, Isolation, Durability, see wikipedia: ACID ↩︎
Snowflake: Transactions: Resource Locking ↩︎
Snowpark is a multi-language framework for executing remote data operations within Snowflake warehouses, close to the data. ↩︎ ↩︎
Pages are organized by last modified.
Title: Snowflake
Word Count: 7016 words
Reading Time: 33 minutes
Permalink:
→
https://manuals.ryanfleck.ca/snowflake/