SQL vs dbt

When beginning my journey with DBT, I started to get confused when we should use this especially as it seemed quite similar to SQL. Especially in the early stages, it seems like more effort to use DBT for simple query's such as selecting a table when snowflake seems to do the job.

However, once I began some more complicated SQL using CTEs and joining tables together, I began to see how it broke down the complex transformations.

DBT and SQL are both useful however they work best when used in different contexts and to their advantage. This blog will go into the advantages of DBT and when is best to use this.

Modularisation

DBT allows users to break down complex data transformation into smaller and reusable components. These are called models. Each model performs a specific transformation on the data and therefore, can be reused in other parts of the worklow. You would simply reference this model in another workflow and it saves the hassle of looking back over a huge amount of SQL trying to figure out what it does or where you went wrong.

SQL: regarding complex queries, SQL will often result in large scripts. CTEs do help to break up large pieces of code however this could still be confusing.

Data Testing

DBT has the ability to define tests on your data models. You can write tests to ensure there are no duplicates in your data as well as any nulls. This can also ensure data quality.

SQL: doesn't have built-in mechanisms for data validation or testing. SQL requires users to manually write queries to check the data integrity.

Documentation

DBT has a docs feature that generates a web page that shows models and their relationships. This helps with other users understanding their work or passing this on between colleagues.

SQL: queries don't include automatic documentation, users would have to manually leave comments between the code. This can make SQL quite hard to digest without these comments.

Model Management

In DBT, models can depend on other models. DBT automatically handles the order in which models should be run based on these dependencies. This automatic management makes it easier to work with complex data pipelines as the user doesn't need to worry about the execution order.

SQL: typically requires users to manage dependencies manually, which can become confusing as number of transformations grow.

SQL Optimisation

DBT complies your SQL queries to optimise them for execution in your target database. It can also handle materialisations (e.g. tables, views etc.) to make your queries more efficient.

SQL: the user is responsible for optimising their queries and also managing the creation of tables and views.

Conclusion

DBT can add a layer of ease for complex transformations. While SQL is essential for querying and transforming data, DBT can provide better documentation and a more visual aspect than working with SQL alone.

Author:

Priya Kondola

View Profile