~17m skim, 3,517 words, updated Dec 17, 2025
A modern data engineering platform.
Snowflake is a cloud-native database that provides a wealth of analytical and data-mining features for processing, integrating, and presenting data. "Data platform" could also describe Snowflake, as it offers features traditionally found in data warehouses, data lakes, and stream-processing platforms like Kafka.
Snowflake addresses the shortcomings of traditional data analytics environments by making it easy to store, retrieve, and analyze large quantities of client data.
Topics:
Typical distributed architectures are shared-disk, where data is kept at a single shared point, or shared-nothing, where each node keeps an independent local copy that must be synchronized. Shared-disk has the downside of a fragile single point of failure, while shared-nothing is expensive to keep synchronized and easy to over-provision. Snowflake takes a different approach by segregating the system into layers, called the “Multi-cluster Shared Data Architecture”:
This separation allows each layer to scale entirely independently.
Snowflake data is stored in a column-oriented, partitioned, encrypted format highly optimized for the blob storage it is written to.
By default, strong AES-256 encryption is applied to data written to the backing blob storage. Snowflake inherits the durability and availability guarantees of its backing services - in the case of Snowflake’s proprietary columnar storage format2, AWS S3 blob storage.
Snowflake divides written files into micro-partitions so only columns that must be read or written are loaded during a query.
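As a sketch of how this plays out in practice (the table and column names here are hypothetical, not from the original), a clustering key can be defined to influence how rows are grouped into micro-partitions, improving pruning for queries that filter on those columns:

```sql
-- Define a clustering key on a hypothetical events table.
ALTER TABLE events CLUSTER BY (event_date, region);

-- Filters on the clustering columns let Snowflake skip
-- micro-partitions whose metadata ranges cannot match:
SELECT COUNT(*) FROM events
WHERE event_date = '2025-01-01' AND region = 'us-west';
```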
Table data is billed at a flat rate per month, and only accessible via Snowflake queries.
Virtual Warehouses in the query processing layer cache table data required for queries locally, while leaving the majority of data in storage. Queries are executed in these warehouses, which are EC23 instances provisioned by Snowflake in an ephemeral manner.
Small - 6XL warehouse sizes (t-shirt sizes) are available.
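For example (the warehouse name is my own, not from the original), a warehouse can be created at one size and resized later; AUTO_SUSPEND and AUTO_RESUME help avoid paying for idle compute:

```sql
CREATE WAREHOUSE my_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60    -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE;  -- resume automatically when queried

-- Resize without recreating the warehouse:
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'MEDIUM';
```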
Even with many warehouses operating on the data, Snowflake uses an ACID compliant global layer (the transaction manager) to ensure the data from each transaction is immediately available to all warehouses.
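A minimal sketch of an explicit multi-statement transaction (the accounts table is hypothetical); either both updates commit or neither is visible to any warehouse:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- both changes become visible atomically
```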
Highly available system management services common to all Snowflake users, responsible for optimizing queries, scaling and managing infrastructure, metadata caching, authentication, and security.
Snowflake is a global, multi-tenant service.
Topics:
On-demand and capacity models are available; the capacity model rewards upfront payment.
Costs:
Topics:
A variety of methods exist to interact with Snowflake’s platform.
Snowsight is the web interface provided by Snowflake. It is continuously improved.
Snowflake Copilot is an in-browser tool to generate SQL code with the added context of your Snowflake environment - tables, schemas, and other queries.
Streamlit is a Python web app framework for quickly deploying data-centric dashboards, chats, and visualizations. Permissions can be managed via Snowflake’s built-in access control model (like permissions for a table or view) to particular roles.
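As a sketch (the app and role names are hypothetical), access to a Streamlit app can be granted like any other schema object:

```sql
GRANT USAGE ON STREAMLIT my_db.my_schema.my_dashboard TO ROLE analyst;
```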
Snowflake Drivers/Connectors refer to programmatic APIs to interact with Snowflake from your favourite programming language. The connector for Python enables all typical operations, in addition to reading and writing pandas dataframes. Cursors can be used to connect and execute SQL statements.
Snowflake CLI can be installed to connect to Snowflake via the command line. The legacy client, snowsql, is now deprecated.
Partner Tools enable connection to your account via SSO to read and analyze your data. BI, data integration, security, and governance are common use cases.
Snowpark refers to programmatic APIs to run heavy data manipulation within Snowflake warehouses, leaving the data within Snowflake during processing.
See the Snowpark Developer Guide for Python. I typically add a configuration file at .snowflake/config.toml with the following content:
default_connection_name = "my_main_account"
[connections.my_main_account]
account = "myaccount"
user = "jdoe"
password = "******"
warehouse = "my-wh"
database = "my_db"
schema = "my_schema"
[cli.logs]
save_logs = true
level = "info"
path = "/home/<you>/.snowflake/logs"
Topics:
Objects in Snowflake allow nearly all aspects of the data platform to be configured with unique access and usage restrictions, from the Organization level down to tables and views.
Account Objects:
Schema Objects:
To work with objects within a schema you must set the context for the following operations with this set of commands:
USE ROLE <role>;
USE WAREHOUSE <warehouse>;
USE DATABASE <database>;
USE SCHEMA <schema>;
The name of a database, schema, or table must be unique, must start with a letter (A-Z), and is case-insensitive unless enclosed in double quotes. Special characters can only be used within quoted identifiers.
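A short illustration of these identifier rules (the table names are hypothetical):

```sql
-- Unquoted identifiers are stored upper-case and matched
-- case-insensitively:
CREATE TABLE my_table (id INT);  -- stored as MY_TABLE
SELECT * FROM My_Table;          -- resolves to MY_TABLE

-- Quoted identifiers are case-sensitive and may contain
-- special characters:
CREATE TABLE "My Table!" (id INT);
SELECT * FROM "My Table!";       -- exact quoted match required
```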
Data Definition Language (DDL) commands are used to manipulate objects in Snowflake, including setting parameters on account and session objects.
Generally these are available:
To get the definition for an object, use the following query:
SELECT get_ddl('<type>', '<NAME>');
-- For example:
SELECT get_ddl('table', 'CUSTOMERS');
By default, Snowflake will use all your permissions (SECONDARY_ROLES = ALL) to provide the DDL, revealing all secure features.
A database is associated with one account. See docs.
CREATE DATABASE MY_DATABASE;
-- Cloned
CREATE DATABASE CLONED_DB CLONE MY_DATABASE;
-- Replica
CREATE DATABASE REPLICA_DB AS REPLICA OF MY_DATABASE
DATA_RETENTION_TIME_IN_DAYS = 3;
-- From share object provided by external account
CREATE DATABASE SHARED_DB FROM SHARE S9DF89.SHARE;
SHOW DATABASES;
A schema is associated with one database. See docs.
USE DATABASE MY_DATABASE;
CREATE SCHEMA MY_SCHEMA;
-- Cloned
CREATE SCHEMA CLONED_SCM CLONE MY_DATABASE.MY_SCHEMA;
SHOW SCHEMAS;
SHOW SCHEMAS LIKE 'TPCH%';
A table is associated with one schema. See docs.
To see if a table is external, and its properties, you can use SHOW:
SHOW TABLES;
SHOW TABLES LIKE '<table name>';
SHOW VIEWS;
Topics:
Data Manipulation Language (DML) refers to the normal SQL methods of CRUD-like updates to data, with some special Snowflake nuances.
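One nuance worth knowing is MERGE, which performs an upsert in a single statement (the table and column names here are hypothetical):

```sql
MERGE INTO customers AS t
USING staged_customers AS s
  ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET t.name = s.name
WHEN NOT MATCHED THEN
  INSERT (id, name) VALUES (s.id, s.name);
```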
Topics:
Snowflake SQL has support for procedural logic and error handling using a built-in SQL extension called Snowflake Scripting. You can declare variables and cursors, use control-flow logic, handle exceptions, and update tables.
DECLARE
-- Variables, cursors.
BEGIN
-- SQL statements
EXCEPTION
-- Error handling
END;
This example from the use cases demonstrates variable assignment and a FOR loop:
-- docs.snowflake.com/en/developer-guide/snowflake-scripting/use-cases
CREATE OR REPLACE PROCEDURE apply_bonus(bonus_percentage INT, performance_value INT)
RETURNS TEXT
LANGUAGE SQL
AS
DECLARE
-- Use input to calculate the bonus percentage
updated_bonus_percentage NUMBER(2,2) DEFAULT (:bonus_percentage/100);
-- Declare a result set
rs RESULTSET;
BEGIN
-- Assign a query to the result set and execute the query
rs := (SELECT * FROM bonuses);
-- Use a FOR loop to iterate over the records in the result set
FOR record IN rs DO
-- Assign variable values using values in the current record
LET emp_id_value INT := record.emp_id;
LET performance_rating_value INT := record.performance_rating;
LET salary_value NUMBER(12, 2) := record.salary;
-- Determine whether the performance rating in the record matches the user input
IF (performance_rating_value = :performance_value) THEN
-- If the condition is met, update the bonuses table using the calculated bonus percentage
UPDATE bonuses SET bonus = ( :salary_value * :updated_bonus_percentage )
WHERE emp_id = :emp_id_value;
END IF;
END FOR;
-- Return text when the stored procedure completes
RETURN 'Update applied';
END;
Topics:
Stored procedures are named collections of SQL statements. In Snowflake, these can be created with Snowflake Scripting (SQL) but also JavaScript and via Snowpark4.
Stored procedures cannot be called as part of a SQL expression (they are invoked with CALL), but can make use of the Snowpark API. Their primary goal is to cause side effects in the system.
There exists a JavaScript Stored Procedures API that provides a snowflake object for use within stored procedures written in JS, enabling branching, looping, error handling, and the dynamic creation of SQL statements.
UDFs are custom functions that can be written in SQL, JS, Python, Java, or Scala. They can accept parameters and return scalar or tabular results, can be called from SQL statements, and can be overloaded. Data is converted to supported types as it is passed to the functions.
UDFs can be called as part of a SQL statement, returning values for use, but cannot utilize the Snowpark API or libraries. The primary goal is complex data processing.
CREATE FUNCTION JS_SQUARE_ROOT(D double)
RETURNS DOUBLE
LANGUAGE JAVASCRIPT
AS
$$
return(Math.sqrt(D));
$$;
-- Call the function
SELECT js_square_root(2);
-- ==> 1.414213562
-- Drop the function
DROP FUNCTION JS_SQUARE_ROOT(DOUBLE);
See Snowflake Docs: Java UDFs. Java UDFs take HANDLER and TARGET_PATH parameters, allowing you to optionally provide a JAR file with classes and functions to use.
CREATE OR REPLACE FUNCTION inline_hello(name STRING)
RETURNS STRING
LANGUAGE JAVA
AS
$$
if (name == null) {
return "Hello, you! Inline function called!";
}
return "Hello, " + name + "! Inline function called!";
$$;
-- Call the function:
SELECT inline_hello('Ryan');
Or as a pre-compiled JAR:
package com.example;
public class HelloUDF {
public static String hello(String name) {
if (name == null) {
return "Hello, you!";
}
return "Hello, " + name + "!";
}
}
Compile to a JAR file:
javac -d . HelloUDF.java
jar cf hello-udf.jar com/example/HelloUDF.class
# => hello-udf.jar created
In Snowflake, refer to this function in the JAR file:
CREATE OR REPLACE STAGE my_stage;
PUT file://hello-udf.jar @my_stage auto_compress=false;
-- Define the function
CREATE OR REPLACE FUNCTION hello(name STRING)
RETURNS STRING
LANGUAGE JAVA
IMPORTS = ('@my_stage/hello-udf.jar')
HANDLER = 'com.example.HelloUDF.hello'
AS
$$
/* Inline Java could be placed here that utilizes functions from the JAR. */
$$;
-- Call the function:
SELECT hello(NULL);
==> Hello, you!
An EXTERNAL function is a lambda or web service behind a proxy.
CREATE OR REPLACE EXTERNAL FUNCTION blorgon_process(str_input varchar)
RETURNS variant
API_INTEGRATION = blorgo9t_08 -- API integration object
AS 'https://blorgo9t.execute-api.us-west-2.amazonaws.com/prod/blorg'; -- Proxy URL
Topics:
Each account is created on a specific provider, in a particular region, as a single Snowflake edition. An organization can manage one or more accounts for different departments, projects, or locations.
The GLOBALORGADMIN role can be used to create more accounts and manage their lifecycle:
USE ROLE GLOBALORGADMIN;
CREATE ACCOUNT analytics4
ADMIN_NAME = admin7
REGION = aws_us_west_2
etc;
SHOW ACCOUNTS;
The short-form of the organization and account is shown in your URL.
https://uslkjpw-sjl18827.snowflakecomputing.com/
------- --------
Org. Acct.
This URL will prompt you to log in, then redirect you to Snowsight1.
Topics:
CREATE TASK SYNC1
WAREHOUSE = WH1
SCHEDULE = '30 MINUTE' -- can be a CRON expression
AS
COPY INTO BIG_TABLE_1
FROM $SOME_STAGE;
-- Start Task Schedule:
-- Requires 'execute task' permission
ALTER TASK SYNC1 RESUME;
-- Stop Task Schedule:
ALTER TASK SYNC1 SUSPEND;
CREATE STREAM USER_STREAM ON TABLE USERS;
SELECT * FROM USER_STREAM;
These have three additional metadata columns: METADATA$ACTION, METADATA$ISUPDATE, and METADATA$ROW_ID.
Tasks can be configured to process the stream data:
CREATE TASK NEWUSERS1
WAREHOUSE = WH1
SCHEDULE = '5 MINUTE' -- can be a CRON expression
WHEN
SYSTEM$STREAM_HAS_DATA('USER_STREAM')
AS
INSERT INTO SEND_EMAIL(ID, NAME) SELECT ID, NAME FROM USER_STREAM;
Cortex extends the SQL language with AI-related functions like AI_COMPLETE, AI_EXTRACT, and AI_TRANSCRIBE. A specific model can be provided as a string in the input parameters, along with other typical parameters like temperature. Cortex also enables the creation of chat and vector search APIs from your data.
SELECT AI_COMPLETE('What is the airspeed velocity of an unladen swallow?');
Snowflake also has some proprietary built-in classical ML models, and pipelines using those models can extract data from a stage full of PDFs. Snowflake provides a fine-tuning interface to help improve the fields and output for your particular type of document.
SELECT invoice_reader!PREDICT(get_presigned_url('@stage/one.pdf'), 1);
Topics:
You can download the study guide for this exam on the Snowflake COF-C02 Exam Guide page. This guide is updated frequently, so go and request your own copy if possible. It is a 100-question, 115-minute test. You will be expected to have knowledge of (taken from the guide):
The Coles Notes on each of the key topics are below.
[GENAI] Key Concepts:
Snowflake has tons of interesting documents in their resource library, including migration advice, using Snowflake as a backing database for agents, and much more.
Topics:
Snowsight is the Snowflake web interface.
Columnar storage is read-optimized, enabling quick scans over selected columns without having to read the entire content of each row, unlike a CSV or other row-oriented storage format.
EC2 or equivalent cloud-based virtual machines.
Snowpark is a multi-language framework for executing remote data operations within Snowflake warehouses, close to the data.
Title: Snowflake
Permalink: https://manuals.ryanfleck.ca/snowflake/