Data Structuring DIM and Fact Tables

In this blog, I’ll walk through how I took a raw dataset on AI’s impact on student life, broke it into a star schema, and connected it to a user story.

I began with a flat Excel dataset that looked something like this:

Student_ID
Age
Major
Primary_AI_Tool
Task_Frequency_Daily
Main_Usage_Case
GPA_Baseline
GPA_Post_AI
Time_Saved_Hours_Weekly
AI_Ethics_Concern
Career_Confidence_Score

At first glance, this looks analysis-ready, but it’s actually not ideal for scalable analysis.

The problem with flat data:

Repeated values (e.g., Major, AI Tool)
Harder to maintain consistency
Not optimized for tools like Tableau
Limited flexibility for future questions

Step 1: Defining the User Story

To make this feel real, I framed it around a stakeholder:

As a university program director, I want to understand how students are using AI tools and how that impacts their academic performance, time efficiency, and career confidence, so that I can guide policies, resources, and training around responsible and effective AI usage.

Once you define the user story, everything else becomes clearer:

What tables you need
What metrics matter
What relationships are important

Step 2: Breaking the Data into a Star Schema

I started by splitting the dataset into a fact table and multiple dimension tables.

Fact Table: Fact_Student_AI_Usage

This is the center of everything, where the measurable data lives.

Granularity: One row per student (future: per student per semester)

Columns include:

Student_ID
AI_Tool_ID
Usage_ID
Perception_ID
Major_ID
GPA_Baseline
GPA_Post_AI
Time_Saved_Hours_Weekly
Task_Frequency_Daily

This is where all the metrics sit.

Dimension Tables

These store descriptive attributes and help avoid duplication.

1. Dim_AI_Tool

AI_Tool_ID (HASH key)
Primary_AI_Tool (e.g., ChatGPT, Claude, Gemini)

2. Dim_Usage

Usage_ID (HASH key)
Main_Usage_Case (e.g., Essay Writing, Debugging)

3. Dim_Major

Major_ID (HASH key)
Major (e.g., Biology, Data Science)

4. Dim_Perception

Perception_ID (HASH key)
AI_Ethics_Concern (Low, Medium, High)

Each dimension table connects back to the fact table using IDs:

Dim_AI_Tool : AI_Tool_ID

Dim_Usage: Usage_ID

Dim_Major: Major_ID

Dim_Perception: Perception_ID

Thinking Ahead (Future State)

One thing I started thinking about (and would definitely implement next):

Tracking this data over time (semester-by-semester)

That would change the granularity to:

Student + Semester

And allow analysis like:

How GPA changes over time with AI usage
Whether students switch majors
How behavior evolves semester to semester

This is where adding a Date/Semester dimension becomes powerful.

Final Thoughts

The biggest takeaway for me:

Taking the time to:

Create a proper user story
Break data into dimensions
Define a fact table

Making everything downstream (User Story, SQL, Tableau, dashboards) way easier and more meaningful.

Author:

Vivek Patel

View Profile