Keeping a tidy dbt project

Starting your first dbt project can feel daunting. To make the process smoother, and to make handing the project over to other users easier, we need a standardised way of keeping the project tidy and easy to follow. That way we can spend our time understanding the logic, rather than waste hours wondering what a model name really means.

When working on a dbt project, most of the time you will be creating models that transform and clean tables into the desired state. Models are .sql files, and they live in the models folder of your project.

Modularity is the degree to which a system's components can be separated and recombined. In dbt, this means specific models can be reused in multiple contexts, which prevents unnecessary repetition of the same SQL statements. For example, we can create two separate models to clean up an orders table and a customers table, then build a third model that refers to the clean orders and customers models to produce a final customers dimension table.
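To make the idea concrete, here is a minimal sketch of the same pattern using plain SQLite views from Python. It is not dbt itself: in a dbt project each view would be its own .sql model, and the later model would point at the earlier ones with dbt's ref function rather than by bare name. The raw table names, columns and rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

conn.executescript("""
    -- Hypothetical raw tables standing in for data already in the warehouse.
    CREATE TABLE raw_customers (id INTEGER, name TEXT);
    CREATE TABLE raw_orders (id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO raw_customers VALUES (1, ' alice '), (2, 'BOB');
    INSERT INTO raw_orders VALUES (10, 1, 20.0), (11, 1, 5.5), (12, 2, 7.0);

    -- One cleaning step per raw table, written once.
    CREATE VIEW stg_customers AS
        SELECT id AS customer_id, lower(trim(name)) AS customer_name
        FROM raw_customers;

    CREATE VIEW stg_orders AS
        SELECT id AS order_id, customer_id, amount
        FROM raw_orders;

    -- The final model reuses both cleaned views instead of repeating their SQL.
    CREATE VIEW dim_customers AS
        SELECT c.customer_id,
               c.customer_name,
               count(o.order_id) AS order_count,
               sum(o.amount) AS total_amount
        FROM stg_customers AS c
        LEFT JOIN stg_orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.customer_name;
""")

dim_customers = conn.execute(
    "SELECT customer_id, customer_name, order_count, total_amount "
    "FROM dim_customers ORDER BY customer_id"
).fetchall()
print(dim_customers)  # [(1, 'alice', 2, 25.5), (2, 'bob', 1, 7.0)]
```

If the cleaning rule for customer names changes, only stg_customers needs editing, and every model built on top of it picks up the change.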

Naming conventions:

To make your project easy to understand for anyone who may have to pick up your work, there are some standardised naming conventions that are typically used. They are described below.

Sources (abbreviated src): the raw data, held in tables that have already been loaded into the warehouse.

Staging (abbreviated stg): models built directly from sources. Each staging model has a one-to-one relationship with its source table. They are usually used for simple transformations that clean and standardise the data before further, more complex transformation.

Intermediate (abbreviated int): any models that sit in between staging models and the final fact and dimension tables. These are built on staging models instead of on source tables, so that they can take advantage of the data cleaning already done in the staging models.

Fact (abbreviated fct): data that represents something that has happened or will happen. For example, transactions, orders, votes etc.

Dimension (abbreviated dim): data that represents physical things such as people, employees, store locations etc.

Model files that use these naming conventions typically follow this format: abb_tablename.sql. For example, if I had a fact table of orders, I would name it fct_orders.sql.
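A convention like this is easy to check automatically. The sketch below is one possible way to do it in Python, not a dbt feature: the function name model_layer is hypothetical, and the rule that the table name must be lowercase snake_case is an extra assumption on top of the abb_tablename.sql format.

```python
import re

# Prefixes taken from the naming conventions described above.
LAYER_PREFIXES = {
    "src": "source",
    "stg": "staging",
    "int": "intermediate",
    "fct": "fact",
    "dim": "dimension",
}

# abb_tablename.sql: a known prefix, an underscore, then the table name.
# Restricting the table name to lowercase snake_case is an assumption of this sketch.
MODEL_NAME = re.compile(r"(src|stg|int|fct|dim)_[a-z0-9_]+\.sql")


def model_layer(filename):
    """Return the layer a model file name implies, or None if it breaks the convention."""
    match = MODEL_NAME.fullmatch(filename)
    return LAYER_PREFIXES[match.group(1)] if match else None


print(model_layer("fct_orders.sql"))  # fact
print(model_layer("orders.sql"))      # None
```

A check like this could be run over the file names in the models folder to flag anything that does not follow the convention.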

Folders: When creating a project, a set of folders will automatically be produced. We can use these folders to organise our project. When using dbt run, you are then able to run the models in certain folders only. For example, dbt run -s staging will run all models that are inside the models/staging folder.

A good starting point for organisation would be to have a marts folder and a staging folder. The marts folder can contain all intermediate, fact and dimension models. Using subfolders to separate these models by business function will organise the marts folder even further. The staging folder can be used to hold all staging models and source configurations. Once again, further subfolders will improve organisation. In this case, it is preferable to separate the models by the data source they draw on.
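The layout described above can be written down as a small rule: the prefix decides the top-level folder, and a group name decides the subfolder. Here is a rough Python sketch of that rule. The function model_path and the group names shop_db and finance are hypothetical, chosen only to illustrate the idea.

```python
from pathlib import PurePosixPath

STAGING_PREFIX = "stg"
MARTS_PREFIXES = {"int", "fct", "dim"}


def model_path(model_name, group):
    """Suggest where a model file lives in the layout described above.

    group is the data source for staging models and the business
    function for marts models (both are hypothetical names here).
    """
    prefix = model_name.split("_", 1)[0]
    if prefix == STAGING_PREFIX:
        top_folder = "staging"
    elif prefix in MARTS_PREFIXES:
        top_folder = "marts"
    else:
        raise ValueError(f"unrecognised prefix in model name: {model_name!r}")
    return PurePosixPath("models", top_folder, group, f"{model_name}.sql")


print(model_path("stg_orders", "shop_db"))  # models/staging/shop_db/stg_orders.sql
print(model_path("fct_orders", "finance"))  # models/marts/finance/fct_orders.sql
```

Keeping the rule this simple means anyone picking up the project can guess where a model lives from its name alone.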

Image: an example of what your folder organisation could look like.

If you are interested in learning more about how to use dbt, and want greater detail on project organisation, I have provided a link to a free dbt course that covers the content of this blog plus much more!

Author:

Bethan Donovan
