Snowflake

A modern data engineering platform.

Snowflake is a cloud-native database that provides a wealth of analytical and data-mining features for processing, integrating, and presenting data. "Data platform" also describes Snowflake well, as it offers features traditionally found in data warehouses, data lakes, and stream-processing platforms like Kafka.

Why Snowflake?

Snowflake addresses the shortcomings of traditional data analytics environments by making it easy to store, retrieve, and analyze large quantities of client data.


Key Concepts & Architecture


Multi-Cluster Shared Data Architecture

Typical distributed architectures are shared-nothing, where each node keeps an independent local copy of the data that must be synchronized, or shared-disk, where data is kept at a single point. Shared-disk has the downside of a fragile single point of failure, while shared-nothing is expensive to keep synchronized and easy to over-provision. Snowflake takes a different approach by segregating the system into layers, called the "Multi-Cluster Shared Data Architecture":

  1. Data Storage
  2. Query Processing (Virtual Warehouses)
  3. Cloud Services

This separation allows each layer to scale entirely independently.

Data Storage Layer

Snowflake data is stored in a column-oriented, partitioned, encrypted format highly optimized for the blob storage it is written to.

By default, strong AES-256 encryption is applied to data written to the backing blob storage. Snowflake inherits the durability and availability guarantees provided by the backing service - in the case of Snowflake's proprietary columnar storage format [2], AWS S3 blob storage.

Snowflake divides written files into micro-partitions and stores columns separately within them, so only the partitions and columns a query actually touches are read.

Table data is billed at a flat rate per terabyte per month, and is only accessible via Snowflake queries.

Query Processing Layer

Virtual Warehouses in the query processing layer cache the table data required for queries locally, while leaving the majority of data in storage. Queries are executed in these warehouses, which are ephemeral EC2 [3] instances provisioned by Snowflake.

Warehouses come in t-shirt sizes from X-Small through 6X-Large.
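
For example, a warehouse can be created at a given size with auto-suspend and auto-resume configured; a minimal sketch with illustrative names:

sql
CREATE WAREHOUSE MY_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60    -- suspend after 60 idle seconds
  AUTO_RESUME = TRUE;  -- wake automatically on the next query

-- Resize at any time:
ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'LARGE';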

Even with many warehouses operating on the data, Snowflake uses an ACID-compliant global layer (the transaction manager) to ensure the data from each transaction is immediately available to all warehouses.

Global Services Layer

Highly available system management services common to all Snowflake users, responsible for optimizing queries, scaling and managing infrastructure, metadata caching, authentication, and security.

Snowflake is a global, multi-tenant service.

Editions & Pricing


Editions

  1. Standard is the entry-level edition with the full core feature set.
  2. Enterprise adds database failover, multi-cluster warehouses, and additional data protection and encryption features.
  3. Business Critical adds enhanced security and support for compliance regimes such as HIPAA and PCI DSS.
  4. Virtual Private Snowflake (VPS) runs in a completely isolated environment, separate from the global Snowflake service.

Billing

On-demand and capacity billing models are available; capacity pricing rewards upfront commitment with discounted rates.

Costs:

  1. Snowflake Credits: consumed by compute - virtual warehouse runtime, serverless features, and cloud services overage.
  2. Dollars & Cents: storage and data transfer, billed directly in currency at flat rates.
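
For a rough sense of scale: credit consumption doubles with each warehouse size step (X-Small = 1 credit/hour, Small = 2, Medium = 4, and so on), so a Medium warehouse running for 10 hours consumes 40 credits - at an illustrative on-demand rate of $3 per credit, that is $120.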

Integration and Connectors


A variety of methods exist to interact with Snowflake’s platform.

Snowsight

Snowsight is the web interface provided by Snowflake, and it receives frequent updates.

Snowflake Copilot is an in-browser tool to generate SQL code with the added context of your Snowflake environment - tables, schemas, and other queries.

Streamlit Apps

Streamlit is a Python web-app framework for quickly deploying data-centric dashboards, chats, and visualizations. App permissions are managed with Snowflake's built-in access control model, granted to particular roles just like permissions on a table or view.
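
For example, access might be granted to a role like any other schema object (app and role names hypothetical):

sql
GRANT USAGE ON STREAMLIT MY_DB.MY_SCHEMA.MY_DASHBOARD TO ROLE ANALYST;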

==> https://streamlit.io/

Snowflake Drivers & Connectors

Snowflake Drivers/Connectors are programmatic APIs for interacting with Snowflake from your favourite programming language. The Python connector enables all typical operations, in addition to reading and writing pandas DataFrames. Cursors are used to connect and execute SQL statements.

Snowflake CLI

Snowflake CLI can be installed to connect to Snowflake from the command line. The legacy client, snowsql, is deprecated.

Partner Tools

Partner Tools enable connection to your account via SSO to read and analyze your data. BI, data integration, security, and governance are common use cases.

Snowpark

Snowpark refers to programmatic APIs to run heavy data manipulation within Snowflake warehouses, leaving the data within Snowflake during processing.

See the Snowpark Developer Guide for Python. I typically add a configuration file at .snowflake/config.toml with the following content:

toml
default_connection_name = "my_main_account"

[connections.my_main_account]
account = "myaccount"
user = "jdoe"
password = "******"
warehouse = "my-wh"
database = "my_db"
schema = "my_schema"

[cli.logs]
save_logs = true
level = "info"
path = "/home/<you>/.snowflake/logs"

Snowflake Objects & DDL Commands


Objects in Snowflake allow nearly all aspects of the data platform to be configured with unique access and usage restrictions, from the Organization level down to tables and views.

Account Objects: users, roles, warehouses, databases, resource monitors, and integrations.

Schema Objects: tables, views, stages, file formats, sequences, UDFs, stored procedures, streams, tasks, and pipes.

To work with objects within a schema, first set the context for subsequent operations with this set of commands:

sql
USE ROLE <role>;
USE WAREHOUSE <warehouse>;
USE DATABASE <database>;
USE SCHEMA <schema>;

Object Naming Rules

The name of a database, schema, or table must be unique within its parent, must start with a letter or underscore, and is not case-sensitive unless enclosed in double quotes. Special characters can only be used within quotes.
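
For example (illustrative names), quoting changes how an identifier is stored and matched:

sql
CREATE TABLE my_table (id INT);      -- unquoted: stored and matched as MY_TABLE
CREATE TABLE "my Table!" (id INT);   -- quoted: case and special characters preserved

SELECT * FROM MY_TABLE;              -- resolves to my_table
SELECT * FROM "my Table!";           -- exact quotes required every time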

General DDL Commands

Data Definition Language (DDL) commands are used to manipulate objects in Snowflake, including setting parameters on account and session objects.

Generally these are available: CREATE, ALTER, DROP, UNDROP, SHOW, DESCRIBE, USE, and COMMENT.

To get the definition for an object, use the following query:

sql
SELECT get_ddl('<type>', '<NAME>');

-- For example:
SELECT get_ddl('table', 'CUSTOMERS');

By default, Snowflake uses all of your granted roles (SECONDARY_ROLES = ALL) when providing the DDL, revealing all secure features.
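
To limit the definition to what your primary role can see, secondary roles can be disabled for the session first:

sql
USE SECONDARY ROLES NONE;
SELECT get_ddl('table', 'CUSTOMERS');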

Databases

A database is associated with one account. See docs.

sql
CREATE DATABASE MY_DATABASE;

-- Cloned
CREATE DATABASE CLONED_DB CLONE MY_DATABASE;

-- Replica (the source must be fully qualified as <org>.<account>.<db>; names illustrative)
CREATE DATABASE REPLICA_DB
  AS REPLICA OF MYORG.ACCOUNT1.MY_DATABASE
  DATA_RETENTION_TIME_IN_DAYS = 3;

-- From share object provided by external account
CREATE DATABASE SHARED_DB FROM SHARE S9DF89.SHARE;

SHOW DATABASES;

Schemas

A schema is associated with one database. See docs.

sql
USE DATABASE MY_DATABASE;
CREATE SCHEMA MY_SCHEMA;

-- Cloned
CREATE SCHEMA CLONED_SCM CLONE MY_DATABASE.MY_SCHEMA;

SHOW SCHEMAS;
SHOW SCHEMAS LIKE 'TPCH%';

Tables

A table is associated with one schema. See docs.

To see a table's properties, including whether it is external, use SHOW:

sql
SHOW TABLES;
SHOW TABLES LIKE '<table name>';

Views

sql
SHOW VIEWS;
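
Views are defined over a query; a SECURE view additionally hides its definition from non-owners. A minimal sketch with illustrative names:

sql
CREATE VIEW ACTIVE_USERS AS
  SELECT ID, NAME FROM USERS WHERE ACTIVE = TRUE;

-- SECURE hides the definition and disables some optimizations
CREATE SECURE VIEW ACTIVE_USERS_S AS
  SELECT ID, NAME FROM USERS WHERE ACTIVE = TRUE;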

Data Loading & Unloading

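The core commands are COPY INTO <table> to load staged files and COPY INTO <location> to unload query results. A minimal sketch, assuming an existing stage and table with illustrative names:

sql
-- Load staged CSV files into a table
COPY INTO MY_TABLE
  FROM @MY_STAGE/incoming/
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Unload query results back to the stage as Parquet
COPY INTO @MY_STAGE/export/
  FROM (SELECT * FROM MY_TABLE)
  FILE_FORMAT = (TYPE = PARQUET);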

Querying & Data Manipulation Language

Data Manipulation Language (DML) refers to the normal SQL methods of CRUD-like updates to data, with some special Snowflake nuances.


DECLARE & Snowflake Scripting

Snowflake SQL supports procedural logic and error handling through a built-in SQL extension called Snowflake Scripting. You can declare variables and cursors, use control-flow logic, handle exceptions, and update tables.

sql
DECLARE
  -- Variables, cursors.
BEGIN
  -- SQL statements
EXCEPTION
  -- Error handling
END;

This example from the use cases demonstrates variable assignment and a for loop:

sql
-- docs.snowflake.com/en/developer-guide/snowflake-scripting/use-cases

CREATE OR REPLACE PROCEDURE apply_bonus(bonus_percentage INT, performance_value INT)
  RETURNS TEXT
  LANGUAGE SQL
AS

DECLARE
  -- Use input to calculate the bonus percentage
  updated_bonus_percentage NUMBER(2,2) DEFAULT (:bonus_percentage/100);
  --  Declare a result set
  rs RESULTSET;

BEGIN
  -- Assign a query to the result set and execute the query
  rs := (SELECT * FROM bonuses);
  -- Use a FOR loop to iterate over the records in the result set
  FOR record IN rs DO
    -- Assign variable values using values in the current record
    LET emp_id_value INT := record.emp_id;
    LET performance_rating_value INT := record.performance_rating;
    LET salary_value NUMBER(12, 2) := record.salary;
    -- Determine whether the performance rating in the record matches the user input
    IF (performance_rating_value = :performance_value) THEN
      -- If the condition is met, update the bonuses table using the calculated bonus percentage
      UPDATE bonuses SET bonus = ( :salary_value * :updated_bonus_percentage )
        WHERE emp_id = :emp_id_value;
    END IF;
  END FOR;
  -- Return text when the stored procedure completes
  RETURN 'Update applied';
END;

Functions & Procedures


Stored Procedures (SPs)

Stored procedures are named collections of SQL statements. In Snowflake, these can be created with Snowflake Scripting (SQL), but also in JavaScript and via Snowpark [4].

SPs cannot be called within SQL statements, but can make use of the Snowpark API. Their primary purpose is to cause side effects in the system.
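
For example, the apply_bonus procedure defined earlier is invoked as a standalone statement:

sql
CALL apply_bonus(10, 4);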

JavaScript Stored Procedures

There exists a JavaScript Stored Procedures API that provides a snowflake object for use within stored procedures written with JS, enabling branching, looping, error handling, and the dynamic creation of SQL statements.
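
A minimal sketch of that API, assuming a CUSTOMERS table exists:

sql
CREATE OR REPLACE PROCEDURE COUNT_CUSTOMERS()
  RETURNS FLOAT
  LANGUAGE JAVASCRIPT
  AS
  $$
    // The implicit `snowflake` object creates and executes SQL statements
    var stmt = snowflake.createStatement({ sqlText: "SELECT COUNT(*) FROM CUSTOMERS" });
    var rs = stmt.execute();
    rs.next(); // advance to the first (and only) row
    return rs.getColumnValue(1);
  $$;

CALL COUNT_CUSTOMERS();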

User Defined Functions (UDFs)

UDFs are custom functions that can be written in SQL, JS, Python, Java, or Scala. They can accept parameters and return scalar or tabular results, can be called from SQL statements, and can be overloaded. Data is converted to supported types as it is passed to the functions.

UDFs can be called as part of a SQL statement, returning values for use, but cannot utilize the Snowpark API or libraries. The primary goal is complex data processing.

sql
CREATE FUNCTION JS_SQUARE_ROOT(D double)
  RETURNS DOUBLE
  LANGUAGE JAVASCRIPT
  AS
  $$
  return(Math.sqrt(D));
  $$;

-- Call the function
SELECT js_square_root(2);

-- ==> 1.414213562

-- Drop the function
DROP FUNCTION JS_SQUARE_ROOT(DOUBLE);

Java Functions

See Snowflake Docs: Java UDFs.

Java UDFs take a HANDLER parameter naming the class and method to invoke; the optional IMPORTS and TARGET_PATH parameters let you provide a pre-built JAR file, or control where the compiled one is written.

sql
CREATE OR REPLACE FUNCTION inline_hello(name STRING)
RETURNS STRING
LANGUAGE JAVA
HANDLER = 'Hello.greet'
AS
$$
class Hello {
    public static String greet(String name) {
        if (name == null) {
            return "Hello, you! Inline function called!";
        }
        return "Hello, " + name + "! Inline function called!";
    }
}
$$;

-- Call the function:
SELECT inline_hello('Ryan');

Or as a pre-compiled JAR:

java
package com.example;

public class HelloUDF {
    public static String hello(String name) {
        if (name == null) {
            return "Hello, you!";
        }
        return "Hello, " + name + "!";
    }
}

Compile to a JAR file:

bash
javac -d . HelloUDF.java
jar cf hello-udf.jar com/example/HelloUDF.class

# => hello-udf.jar created

In Snowflake, refer to this function in the JAR file:

sql
CREATE OR REPLACE STAGE my_stage;
PUT file://hello-udf.jar @my_stage auto_compress=false;

-- Define the function; with a staged JAR, HANDLER points into it and no AS body is needed
CREATE OR REPLACE FUNCTION hello(name STRING)
  RETURNS STRING
  LANGUAGE JAVA
  IMPORTS = ('@my_stage/hello-udf.jar')
  HANDLER = 'com.example.HelloUDF.hello';

-- Call the function:
SELECT hello(NULL);

-- ==> Hello, you!

External Functions

An EXTERNAL function calls out to a remote service, such as an AWS Lambda or other web service, sitting behind a proxy (an API integration).

sql
CREATE OR REPLACE EXTERNAL FUNCTION blorgon_process(str_input varchar)
  RETURNS variant
  API_INTEGRATION = blorgo9t_08 -- API integration object
  AS 'https://blorgo9t.execute-api.us-west-2.amazonaws.com/prod/blorg'; -- Proxy URL

Warehouses & Compute


Performance & Cost Optimization


Security & Access Control


Each account is created on a specific provider, in a particular region, as a single Snowflake edition. An organization can manage one or more accounts for different departments, projects, or locations.

The GLOBALORGADMIN role can be used to create more accounts and manage their lifecycle:

sql
USE ROLE GLOBALORGADMIN;

CREATE ACCOUNT analytics4
  ADMIN_NAME = admin7
  ADMIN_PASSWORD = '******'
  EMAIL = 'admin7@example.com'
  EDITION = ENTERPRISE
  REGION = aws_us_west_2;

SHOW ACCOUNTS;

The short-form of the organization and account is shown in your URL.

https://uslkjpw-sjl18827.snowflakecomputing.com/
        ------- --------
          Org.    Acct.

This URL will prompt you to log in, then redirect you to Snowsight [1].

Advanced Features


Tasks

sql
CREATE TASK SYNC1
  WAREHOUSE = WH1
  SCHEDULE = '30 MINUTE' -- can be a CRON expression
AS
  COPY INTO BIG_TABLE_1
FROM @SOME_STAGE;

-- Start Task Schedule:
-- Requires 'execute task' permission
ALTER TASK SYNC1 RESUME;

-- Stop Task Schedule:
ALTER TASK SYNC1 SUSPEND;

Task Graphs (DAGs)
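
A graph is built by giving tasks predecessors with AFTER instead of a SCHEDULE; the child below runs only once SYNC1 completes (table names illustrative):

sql
CREATE TASK SYNC1_SUMMARY
  WAREHOUSE = WH1
  AFTER SYNC1
AS
  INSERT INTO LOAD_LOG SELECT CURRENT_TIMESTAMP, COUNT(*) FROM BIG_TABLE_1;

-- Children must be resumed as well for the graph to run
ALTER TASK SYNC1_SUMMARY RESUME;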

Streams

sql
CREATE STREAM USER_STREAM ON TABLE USERS;
SELECT * FROM USER_STREAM;

These have three additional metadata columns: METADATA$ACTION, METADATA$ISUPDATE, and METADATA$ROW_ID.

Tasks can be configured to process the stream data:

sql
CREATE TASK NEWUSERS1
  WAREHOUSE = WH1
  SCHEDULE = '5 MINUTE' -- can be a CRON expression
WHEN
  SYSTEM$STREAM_HAS_DATA('USER_STREAM')
AS
  INSERT INTO SEND_EMAIL(ID, NAME) SELECT ID, NAME FROM USER_STREAM;

Snowflake Cortex (AISQL)

Cortex extends the SQL language with AI-related functions like AI_COMPLETE, AI_EXTRACT, and AI_TRANSCRIBE. A specific model can be provided as a string in the input parameters, along with other typical parameters like temperature. Cortex also enables the creation of chat and vector-search APIs over your data.

sql
SELECT AI_COMPLETE('What is the airspeed velocity of an unladen swallow?');

Snowflake ML & Document AI

Snowflake has proprietary built-in classical ML models; Document AI pipelines use those models to extract data from a stage full of PDFs. Snowflake provides a fine-tuning interface to help improve the extracted fields and output for your particular type of document.

sql
SELECT invoice_reader!PREDICT(GET_PRESIGNED_URL(@stage, 'one.pdf'), 1);

Tips, Best Practices & Gotchas


COF-C02 - SnowPro Core Certification

You can download the study guide for this exam on the Snowflake COF-C02 Exam Guide page. The guide is updated frequently, so request a current copy if possible. It is a 100-question, 115-minute test. You will be expected to have knowledge of (taken from the guide):

  1. Data loading and transformation in Snowflake
  2. Virtual Warehouses - best practices, performance, concurrency
  3. DDL and DML queries
  4. Working with semi-structured and unstructured data
  5. Cloning and time travel
  6. Data sharing
  7. Account structure and management

The Coles Notes on each of the key topics are below.

(24%) Snowflake AI Data Cloud - Features and Architecture


(18%) Data Transformations


(18%) Accounts: Access & Security


(16%) Performance & Cost Optimization


(12%) Data Loading & Unloading


(12%) Data Protection and Sharing


References & Further Reading

Snowflake has tons of interesting documents in their resource library, including migration advice, using Snowflake as a backing database for agents, and much more.



  1. Snowsight is the Snowflake web interface.

  2. Columnar storage is read-optimized, enabling quick scans over selected columns without reading the entire content of each row, as a CSV or other row-oriented storage format would require.

  3. EC2 or equivalent cloud-based virtual machines.

  4. Snowpark is a multi-language framework for executing remote data operations within Snowflake warehouses, close to the data.


