Data Structuring DIM and Fact Tables

In this blog, I’ll walk through how I took a raw dataset on AI’s impact on student life, broke it into a star schema, and connected it to a user story.

I began with a flat Excel dataset that looked something like this:

  • Student_ID
  • Age
  • Major
  • Primary_AI_Tool
  • Task_Frequency_Daily
  • Main_Usage_Case
  • GPA_Baseline
  • GPA_Post_AI
  • Time_Saved_Hours_Weekly
  • AI_Ethics_Concern
  • Career_Confidence_Score

At first glance, this looks analysis-ready, but it’s actually not ideal for scalable analysis.

The problem with flat data:

  • Repeated values (e.g., Major, AI Tool)
  • Harder to maintain consistency
  • Not optimized for tools like Tableau
  • Limited flexibility for future questions

Step 1: Defining the User Story

To make this feel real, I framed it around a stakeholder:

As a university program director, I want to understand how students are using AI tools and how that impacts their academic performance, time efficiency, and career confidence, so that I can guide policies, resources, and training around responsible and effective AI usage.

Once you define the user story, everything else becomes clearer:

  • What tables you need
  • What metrics matter
  • What relationships are important

Step 2: Breaking the Data into a Star Schema

I started by splitting the dataset into a fact table and multiple dimension tables.

Fact Table: Fact_Student_AI_Usage

This is the center of everything, where the measurable data lives.

Granularity: One row per student (future: per student per semester)

Columns include:

  • Student_ID
  • AI_Tool_ID
  • Usage_ID
  • Perception_ID
  • Major_ID
  • GPA_Baseline
  • GPA_Post_AI
  • Time_Saved_Hours_Weekly
  • Task_Frequency_Daily

This is where all the metrics sit.

Dimension Tables

These store descriptive attributes and help avoid duplication.

1. Dim_AI_Tool

  • AI_Tool_ID (HASH key)
  • Primary_AI_Tool (e.g., ChatGPT, Claude, Gemini)

2. Dim_Usage

  • Usage_ID (HASH key)
  • Main_Usage_Case (e.g., Essay Writing, Debugging)

3. Dim_Major

  • Major_ID (HASH key)
  • Major (e.g., Biology, Data Science)

4. Dim_Perception

  • Perception_ID (HASH key)
  • AI_Ethics_Concern (Low, Medium, High)

Each dimension table connects back to the fact table using IDs:

Dim_AI_Tool : AI_Tool_ID

Dim_Usage: Usage_ID

Dim_Major: Major_ID

Dim_Perception: Perception_ID

Thinking Ahead (Future State)

One thing I started thinking about (and would definitely implement next):

Tracking this data over time (semester-by-semester)

That would change the granularity to:

  • Student + Semester

And allow analysis like:

  • How GPA changes over time with AI usage
  • Whether students switch majors
  • How behavior evolves semester to semester

This is where adding a Date/Semester dimension becomes powerful.

Final Thoughts

The biggest takeaway for me:

Taking the time to:

  • Create a proper user story
  • Break data into dimensions
  • Define a fact table

Making everything downstream (User Story, SQL, Tableau, dashboards) way easier and more meaningful.

Author:
Vivek Patel
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2026 The Information Lab