A stream is an object in Snowflake that records DML (data manipulation language) changes: the inserts, updates, and deletes made to a source table.
Instead of storing the data itself, a stream stores metadata about those changes, so downstream processes can work with only what changed.
When a stream is created, it takes a logical snapshot of the source table at that moment in time and records an offset, a bookmark for the last change it has processed. The stream tracks DML activity relative to this offset, so it knows where to continue next time without missing or reprocessing anything.
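A minimal sketch of creating and querying a stream; the source table and column names (raw_orders, order_id, amount) are assumptions for illustration:

CREATE OR REPLACE STREAM orders_stream ON TABLE raw_orders;

-- Each row the stream returns describes a change via metadata columns.
SELECT order_id, amount, METADATA$ACTION, METADATA$ISUPDATE, METADATA$ROW_ID
FROM orders_stream;

METADATA$ACTION reports INSERT or DELETE; an update shows up as a delete plus an insert, both with METADATA$ISUPDATE set to TRUE.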
Using streams provides several key benefits:
- They don't require triggers or external tools to perform Change Data Capture (CDC)
- Instead of reprocessing entire tables, pipelines can operate only on data that changed. This reduces compute cost and accelerates processing.
- Streams can trigger downstream tasks, automating your pipeline without fixed schedules (see the sketch below)
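As a minimal sketch of that last point, a task can poll the stream and run only when there is something to process; the warehouse name, schedule, and table names here are assumptions:

CREATE OR REPLACE TASK process_orders
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
  INSERT INTO orders_processed
  SELECT order_id, amount
  FROM orders_stream;

ALTER TASK process_orders RESUME;  -- tasks are created suspended

The WHEN clause skips the run entirely (and the compute with it) whenever the stream has no new changes.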
One thing to note:
When you run a DML statement using the stream (like INSERT INTO ... SELECT FROM stream, or MERGE INTO ... USING stream), the stream marks those changes as processed by advancing its offset. This means the next time you read from the stream, those same changes won’t appear again. Nothing is deleted from the underlying table; only the stream’s “bookmark” moves forward.
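To make that concrete, a quick sketch reusing the illustrative orders_stream and tables from above:

SELECT * FROM orders_stream;                 -- a plain read does NOT advance the offset

INSERT INTO orders_processed                 -- consuming the stream in DML does,
SELECT order_id, amount FROM orders_stream;  -- once the transaction commits

SELECT * FROM orders_stream;                 -- now returns no rows until new changes arrive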
Types of streams:
Standard Streams
- Capture inserts, updates, and deletes
- Provide the net outcome of all changes
- Work on tables, directory tables, and views
Great for general-purpose CDC.
Append-Only Streams
- Capture only inserts
- Available on tables, directory tables, and views
Ideal for log-style data where updates/deletes aren’t meaningful.
Insert-Only Streams (External Tables Only)
- Capture inserts
- Designed specifically for external table environments (e.g., object storage)
Useful when working with data lakes.
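A minimal sketch of how each type is declared; the source table names are assumptions:

CREATE OR REPLACE STREAM std_stream    ON TABLE raw_orders;                            -- standard
CREATE OR REPLACE STREAM append_stream ON TABLE raw_events APPEND_ONLY = TRUE;         -- append-only
CREATE OR REPLACE STREAM ext_stream    ON EXTERNAL TABLE ext_logs INSERT_ONLY = TRUE;  -- insert-only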
Another benefit of streams is that they allow our pipelines to be self-adaptive when dealing with semi-structured data that changes over time.
Example Scenario
Initially, the JSON files you receive look like this:
{ "device_id": "thermo123", "temperature": 72, "humidity": 45 }
Later, a new record arrives that looks like this:
{ "device_id": "thermo123", "temperature": 73, "humidity": 44, "battery_level": 85 }
The stream captures this as a new change, including the new battery_level field.
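Assuming the raw JSON lands in a VARIANT column (called payload here, on a landing table raw_readings with a stream readings_stream over it; all names are illustrative), the new field simply rides along in the change row:

INSERT INTO raw_readings (payload)
SELECT PARSE_JSON('{ "device_id": "thermo123", "temperature": 73, "humidity": 44, "battery_level": 85 }');

SELECT payload:battery_level::NUMBER AS battery_level
FROM readings_stream
WHERE METADATA$ACTION = 'INSERT';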
In a traditional pipeline that simply truncates and reloads the target table, new columns like battery_level might be missed. By leveraging the stream, we can make our table self-adapting using Dynamic Column Detection (a sketch follows the list below):
- Read new changes from the stream
  - Only incremental changes are processed, including new fields.
- Compare incoming fields to the target schema
  - Detect any new columns that aren’t in the processed table yet.
- Automatically alter the table or write to a flexible staging table
  - For new columns, run ALTER TABLE processed_table ADD COLUMN ..., or temporarily store the data in a VARIANT/JSON column for flexible processing.
- Merge the data into the processed table
  - Insert or merge the updated rows safely, ensuring the new fields are preserved.
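Here is a minimal sketch of those steps, reusing the illustrative raw_readings / readings_stream / processed_readings names from above and assuming the detection step has decided battery_level needs to be added; column names and types are hypothetical:

-- Detect payload keys that the processed table doesn't have yet.
SELECT DISTINCT f.key
FROM readings_stream r,
     LATERAL FLATTEN(input => r.payload) f
WHERE UPPER(f.key) NOT IN (
    SELECT column_name
    FROM information_schema.columns
    WHERE table_name = 'PROCESSED_READINGS'
);

-- Add the newly detected column.
ALTER TABLE processed_readings ADD COLUMN battery_level NUMBER;

-- Merge the changed rows, preserving the new field.
MERGE INTO processed_readings t
USING (
    SELECT
        payload:device_id::STRING     AS device_id,
        payload:temperature::NUMBER   AS temperature,
        payload:humidity::NUMBER      AS humidity,
        payload:battery_level::NUMBER AS battery_level
    FROM readings_stream
    WHERE METADATA$ACTION = 'INSERT'
) s
ON t.device_id = s.device_id
WHEN MATCHED THEN UPDATE SET
    t.temperature   = s.temperature,
    t.humidity      = s.humidity,
    t.battery_level = s.battery_level
WHEN NOT MATCHED THEN INSERT (device_id, temperature, humidity, battery_level)
    VALUES (s.device_id, s.temperature, s.humidity, s.battery_level);

In practice the column list in the MERGE would be generated from the detection query rather than hard-coded; the point is that the stream delivers only the changed rows, new fields included.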
With this approach, the pipeline adapts automatically to schema evolution. As devices or data sources introduce new fields, the tables and downstream analytics update without manual intervention. Streams make this possible because they deliver only the new or updated rows along with their structure, providing the foundation for real-time self-adapting pipelines.
