Messy dbt projects slow down your entire data team and make collaboration a nightmare. Clean, consistent dbt naming conventions and coding standards transform chaotic data pipelines into well-organized, maintainable systems that scale with your business.
This guide is for data engineers, analytics engineers, and data teams who want to build professional dbt projects that don’t fall apart as they grow. You’ll learn practical standards that real teams use to keep their data transformation workflows running smoothly.
We’ll cover essential dbt naming conventions that make your models instantly recognizable, plus file organization standards that help your team find and update code without wasting time digging through folders. You’ll also discover code formatting rules and documentation standards that turn your dbt project into a valuable business asset instead of technical debt.
Essential dbt Naming Conventions for Scalable Data Projects
Model naming patterns that improve team collaboration
Effective dbt naming conventions start with clear, predictable model names that every team member can understand at a glance. The most successful data teams adopt a standardized format that immediately communicates a model’s purpose and position within the data pipeline.
A proven approach combines business domain, data granularity, and processing stage into each model name. For example, `finance_daily_revenue_summary` tells the story: it belongs to the finance domain, contains daily-level data, focuses on revenue metrics, and represents summarized information. This pattern eliminates guesswork and reduces onboarding time for new team members.
Avoid generic names like `model_1` or `temp_table` that provide zero context. Instead, choose descriptive names that reflect the actual business concepts: `customer_lifetime_value`, `product_performance_metrics`, or `sales_territory_assignments`. These names create self-documenting code that stays meaningful even months later.
Team collaboration improves dramatically when everyone follows the same naming logic. Establish naming conventions early in your dbt project and document them clearly. Consider creating a naming reference guide that includes approved abbreviations, forbidden words, and domain-specific terminology that aligns with your organization’s business language.
Layer-specific prefixes for organized data architecture
Scalable data projects require clear architectural boundaries, and layer-specific prefixes make these boundaries visible throughout your dbt codebase. The most common approach divides models into staging, intermediate, and mart layers, each with distinct prefixes that signal their role in the transformation pipeline.
Staging models typically use the `stg_` prefix and focus on basic cleaning and standardization of raw source data. These models handle data type conversions, column renaming, and simple filtering without complex business logic. Examples include `stg_salesforce_accounts` or `stg_stripe_payments`.
Intermediate models carry the `int_` prefix and contain reusable business logic that multiple downstream models need. These models perform joins, aggregations, and calculations that serve as building blocks for final outputs. Think `int_customer_subscription_history` or `int_product_category_mapping`.
Mart models use prefixes like `mart_`, or `dim_`/`fact_` for dimensional modeling approaches. These represent the final, business-ready datasets that analysts and stakeholders consume. Examples include `mart_monthly_sales_performance` or `dim_customer_segments`.
| Layer | Prefix | Purpose | Example |
|---|---|---|---|
| Staging | `stg_` | Raw data cleaning | `stg_hubspot_contacts` |
| Intermediate | `int_` | Reusable logic | `int_customer_metrics` |
| Marts | `mart_` | Business-ready data | `mart_executive_dashboard` |
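To ground the layer prefixes in code, here is a minimal sketch of a staging model. The `salesforce` source, its `accounts` table, and the column names are hypothetical placeholders; a real model would reference whatever is declared in your own sources file.

```sql
-- models/staging/salesforce/stg_salesforce_accounts.sql
-- Staging layer: rename, cast, and lightly filter raw source data, no business logic
WITH source AS (

    SELECT * FROM {{ source('salesforce', 'accounts') }}

),

renamed AS (

    SELECT
        id AS account_id,
        name AS account_name,
        CAST(annual_revenue AS numeric) AS annual_revenue,
        NOT is_deleted AS is_active,
        created_date AS created_at
    FROM source

)

SELECT * FROM renamed
```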
Descriptive suffixes that clarify model purpose
Strategic use of suffixes transforms model names from simple labels into powerful communication tools. Well-chosen suffixes immediately communicate a model’s intended use, data granularity, and update frequency without requiring users to examine the actual code or documentation.
Common suffixes include `_summary` for aggregated data, `_detail` for row-level information, and `_snapshot` for point-in-time captures. Time-based suffixes like `_daily`, `_weekly`, or `_monthly` signal the expected refresh cadence and data grain. Purpose-driven suffixes such as `_metrics`, `_dimensions`, or `_events` categorize models by their analytical function.
Consider these examples: `sales_performance_monthly_summary` clearly indicates aggregated sales data at monthly intervals, while `customer_transactions_detail` suggests granular, transaction-level information. The suffix `_latest` works well for models that maintain current state, like `inventory_levels_latest` or `customer_preferences_latest`.
Consistency across your dbt project organization prevents confusion and supports automated processes. Document your suffix conventions and include guidelines for when to use each one. This systematic approach helps teams locate the right models quickly and understand their characteristics before diving into the code.
Consistent variable and column naming strategies
Column naming consistency across your dbt models creates a unified data experience that reduces cognitive load for analysts and downstream consumers. Establish clear rules for column naming that everyone follows religiously, treating this as non-negotiable infrastructure for your modern data transformation workflows.
Primary keys should follow a standard pattern like `[table_name]_id` or `[table_name]_key`. This makes joins obvious and prevents confusion when working with multiple models. Foreign keys mirror this pattern with names like `customer_id` or `product_key` that immediately signal their relationship to other tables.
Date and timestamp columns benefit from descriptive prefixes or suffixes that clarify their meaning. Use patterns like `created_at`, `updated_at`, `effective_date`, or `expiration_date` rather than generic names like `date` or `timestamp`. This specificity prevents errors and makes SQL more readable.
Boolean columns work best with clear `is_` or `has_` prefixes: `is_active`, `has_subscription`, `is_deleted`. These names make the data type obvious and create self-explanatory conditional logic in your models.
Avoid abbreviated column names that save a few characters but cost hours in comprehension. Choose `customer_acquisition_cost` over `cac`, and `monthly_recurring_revenue` over `mrr`. Your future self and your teammates will thank you for the clarity.
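Pulled together, these conventions produce column lists that read almost like documentation. The sketch below is illustrative only; the upstream `stg_` and `int_` models it references are hypothetical.

```sql
-- models/marts/core/dim_customers.sql
-- Keys end in _id, timestamps end in _at, booleans use is_/has_, metrics are spelled out
SELECT
    customers.customer_id,
    customers.created_at,
    customers.updated_at,
    customers.is_active,
    subscriptions.subscription_id IS NOT NULL AS has_subscription,
    orders.lifetime_revenue AS customer_lifetime_value
FROM {{ ref('stg_app_customers') }} AS customers
LEFT JOIN {{ ref('int_customer_subscriptions') }} AS subscriptions
    ON customers.customer_id = subscriptions.customer_id
LEFT JOIN {{ ref('int_customer_order_totals') }} AS orders
    ON customers.customer_id = orders.customer_id
```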
File Organization Standards That Boost Development Efficiency
Directory Structure Best Practices for Large dbt Projects
Building a robust dbt project organization requires thinking like an architect. Start with a logical top-level structure that separates concerns clearly. Your main models directory should contain subdirectories that reflect different business domains or data layers rather than technical groupings.
Create separate folders for staging, intermediate, and mart models. Staging models (`staging/`) house raw data transformations from source systems. Intermediate models (`intermediate/`) contain business logic and calculations. Mart models (`marts/`) represent final tables ready for business consumption.
Within each layer, organize by business domain. For example, under `marts/`, create folders like `finance/`, `marketing/`, and `operations/`. This dbt project organization approach scales naturally as your data warehouse grows and new teams join the project.
Consider this proven structure:
models/
├── staging/
│   ├── salesforce/
│   ├── hubspot/
│   └── stripe/
├── intermediate/
│   ├── customer/
│   ├── revenue/
│   └── product/
└── marts/
    ├── finance/
    ├── marketing/
    └── core/
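This folder layout also makes layer-level configuration straightforward. The excerpt below is one way to set defaults per layer in `dbt_project.yml`; the project name, custom schemas, and materialization choices are assumptions to adapt, not requirements.

```yaml
# dbt_project.yml (excerpt): defaults that mirror the folder structure
models:
  my_project:                  # replace with your actual project name
    staging:
      +materialized: view
      +schema: staging
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
      finance:
        +schema: finance
      marketing:
        +schema: marketing
```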
Grouping Related Models Using Strategic Folder Hierarchies
Smart folder hierarchies prevent your dbt file structure from becoming a chaotic mess. Group models by their business purpose first, then by their technical dependencies. This creates intuitive pathways for team members to find and understand model relationships.
Use consistent prefixes within each folder to indicate model purpose. Staging models should start with `stg_`, intermediate models with `int_`, and marts with the business domain abbreviation. This approach to dbt naming conventions makes dependencies crystal clear at a glance.
Build folder hierarchies that mirror your company’s organizational structure. If your business has distinct product lines, create separate folders for each. Marketing teams should easily locate customer acquisition models, while finance teams find revenue recognition models in their dedicated space.
| Model Type | Prefix | Example | Purpose |
|---|---|---|---|
| Staging | `stg_` | `stg_salesforce_accounts` | Raw data cleanup |
| Intermediate | `int_` | `int_customer_metrics` | Business logic |
| Marts | Domain prefix | `fin_revenue_daily` | Final consumption |
Avoid deeply nested folder structures that require excessive clicking to navigate. Three levels deep typically provides enough organization without becoming cumbersome.
Documentation File Placement for Maximum Accessibility
Strategic placement of documentation files transforms them from afterthoughts into powerful development tools. Position schema files alongside their corresponding models to maintain tight coupling between code and documentation. This proximity encourages developers to update documentation during model changes.
Create a central `docs/` directory for business glossaries, project overviews, and architectural decisions. This becomes your single source of truth for project-wide documentation that transcends individual models. Place README files at each directory level to explain the purpose and contents of that folder.
Use descriptive filenames that match your dbt documentation standards. A file named `customer_metrics_schema.yml` immediately indicates its purpose and scope. Avoid generic names like `schema.yml` in favor of specific, searchable filenames.
Your documentation structure should support different audiences. Technical documentation belongs near the code, while business-focused documentation deserves prominent placement in easily discoverable locations. Consider creating separate folders for different stakeholder groups:
- `docs/business/` – Business logic explanations and data definitions
- `docs/technical/` – Implementation details and architectural decisions
- `docs/onboarding/` – Getting started guides for new team members
Place data quality tests alongside their corresponding models to make testing requirements visible during development. This proximity helps maintain data quality standards as part of the natural development workflow rather than as an afterthought.
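As a rough sketch of this colocation pattern, a properties file can sit next to its model and carry both the documentation and the tests. The model and column names below are placeholders.

```yaml
# models/marts/finance/customer_metrics_schema.yml
# Lives next to customer_metrics.sql so docs and tests update alongside the model
version: 2

models:
  - name: customer_metrics
    description: "Customer-level revenue and engagement metrics for finance reporting."
    columns:
      - name: customer_id
        description: "Primary key; one row per customer."
        tests:
          - not_null
          - unique
      - name: monthly_recurring_revenue
        description: "Sum of active subscription fees, in USD."
        tests:
          - not_null
```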
Code Formatting Rules That Enhance Readability
SQL Style Guidelines for Consistent Query Structure
Writing clean, readable SQL in your dbt models starts with establishing clear structural patterns. The most widely adopted approach follows a consistent order: SELECT clauses first, followed by FROM, joins, WHERE conditions, GROUP BY, HAVING, and ORDER BY statements. This predictable flow helps team members quickly scan and understand query logic.
When structuring SELECT statements, place each column on its own line with trailing commas. This approach makes code reviews easier and reduces merge conflicts when adding or removing columns. For complex transformations, wrap calculated fields in meaningful aliases that clearly indicate their purpose.
Table aliases should be short but descriptive – use meaningful abbreviations rather than random letters. For example, use `customers c` instead of `customers a`. When joining multiple tables, maintain consistent alias patterns throughout your project to build muscle memory for your development team.
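A short sketch of these structural rules, using hypothetical staging models:

```sql
-- One column per line, trailing commas, meaningful aliases, predictable clause order
SELECT
    c.customer_id,
    c.customer_name,
    o.order_id,
    o.order_total
FROM {{ ref('stg_app_customers') }} AS c
INNER JOIN {{ ref('stg_app_orders') }} AS o
    ON c.customer_id = o.customer_id
WHERE o.order_status = 'completed'
ORDER BY o.order_total DESC
```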
Indentation and Spacing Standards Across Team Members
Consistent indentation transforms chaotic SQL into readable, maintainable code. Use four spaces for each indentation level – this provides clear visual hierarchy without making lines too long. Avoid mixing tabs and spaces, as this creates formatting inconsistencies across different editors and team members.
Align related elements vertically when possible. Line up column names in SELECT statements, and indent JOIN conditions to match the indentation of the JOIN keyword. This vertical alignment creates visual blocks that make complex queries easier to parse.
| Element | Indentation Rule | Example |
|---|---|---|
| Main clauses | No indentation | SELECT, FROM, WHERE |
| Column lists | 4 spaces | Column names under SELECT |
| JOIN conditions | 4 spaces | ON conditions under JOIN |
| Subqueries | 8 spaces | Nested SELECT statements |
Add blank lines between major query sections to create visual breathing room. Separate CTEs (Common Table Expressions) with single blank lines, and use double blank lines before the final SELECT statement in models with multiple CTEs.
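The sketch below applies these spacing rules to a model with multiple CTEs; the referenced staging models are hypothetical.

```sql
-- Four-space indentation, aligned JOIN conditions, blank lines between CTEs
WITH orders AS (

    SELECT
        order_id,
        customer_id,
        order_total
    FROM {{ ref('stg_app_orders') }}

),

customers AS (

    SELECT
        customer_id,
        customer_name
    FROM {{ ref('stg_app_customers') }}

)


SELECT
    customers.customer_id,
    customers.customer_name,
    SUM(orders.order_total) AS total_order_value
FROM customers
LEFT JOIN orders
    ON customers.customer_id = orders.customer_id
GROUP BY 1, 2
```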
Comment Formatting That Adds Meaningful Context
Strategic commenting transforms your dbt models from code into documentation. Use SQL comments (`--`) for single-line explanations and block comments (`/* */`) for multi-line descriptions. Place comments above the code they describe, not inline, to maintain clean formatting.
Focus comments on the “why” rather than the “what.” Instead of commenting “-- Select customer ID,” write “-- Filter to active customers only to exclude churned accounts.” This context helps future developers understand business logic and decision-making.
For complex business rules or unusual data transformations, include brief explanations of the logic. When dealing with data quality issues or edge cases, document these scenarios clearly. Your future self and teammates will appreciate understanding why certain filters or transformations exist.
Add header comments to models explaining their purpose, dependencies, and any important assumptions. This documentation becomes especially valuable in large projects where models build upon each other in complex ways.
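Here is a rough illustration of both comment styles in a single model; the model and column names are invented for the example.

```sql
/*
    Model: mart_customer_churn
    Purpose: Flags customers at risk of churn for the retention team.
    Depends on: int_customer_activity
    Assumption: trial users are excluded so churn rates reflect paying customers only.
*/

SELECT
    customer_id,
    last_order_at,
    days_since_last_order
FROM {{ ref('int_customer_activity') }}
-- Exclude trial users to keep churn rates comparable across months
WHERE is_trial_user = false
```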
Line Length Limits for Optimal Code Review Experience
Setting reasonable line length limits significantly improves code review efficiency and readability across different screen sizes. The sweet spot for SQL code sits between 80 and 100 characters per line. This length works well on most monitors and allows for side-by-side code comparisons during reviews.
When lines exceed this limit, break them at logical points. For long SELECT lists, place each column on its own line. For complex WHERE conditions, break after AND/OR operators. Long function calls should break after commas, with parameters aligned vertically.
-- Good: Readable line breaks
SELECT
    customer_id,
    first_name,
    last_name,
    email_address
FROM {{ ref('raw_customers') }}
WHERE subscription_status = 'active'
    AND created_at >= '2023-01-01'
    AND email_verified = true
Use your IDE’s line length indicators to maintain consistency. Most modern editors can display vertical rulers at specific character counts, making it easy to spot overly long lines while writing code. This proactive approach prevents lengthy refactoring sessions later in the development process.
Version Control Integration Strategies for dbt Projects
Git branching workflows optimized for data transformation
Data teams need branching strategies that reflect the unique challenges of dbt version control. Unlike traditional software development, data transformations often require testing against production-like datasets and coordinating changes across multiple interconnected models.
The feature branch workflow works exceptionally well for dbt development. Create dedicated branches for each model or feature set you’re developing. This approach lets team members work independently on different data models without stepping on each other’s toes. When building a new customer segmentation model, branch off from main, develop your transformations, test thoroughly, then merge back.
For larger organizations, consider the Git Flow model with some modifications. Maintain a main branch for production-ready code and a develop branch for integration testing. Feature branches merge into develop first, allowing comprehensive testing of model interactions before promoting to main. This extra layer proves valuable when dealing with complex dbt project organization involving hundreds of interconnected models.
Environment-specific branching adds another dimension to consider. Some teams maintain separate branches for different data environments (dev, staging, production), though this can create merge complexity. A better approach often involves using dbt’s built-in target configurations while keeping code unified across a single main branch.
Release branches become particularly important for scalable data projects with scheduled deployment windows. Create a release branch when preparing for production deployment, allowing final testing and bug fixes without blocking ongoing feature development.
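A minimal sketch of the feature branch flow, assuming a `dev` target exists in your profiles and using made-up branch and model names:

```bash
# Branch off main for a single model or feature set
git checkout main
git pull origin main
git checkout -b feature/customer-segmentation-model

# Develop, then build and test just the new model against the dev target
dbt run --select customer_segmentation --target dev
dbt test --select customer_segmentation --target dev

# Commit and push, then open a pull request into main (or develop, under Git Flow)
git add models/marts/marketing/customer_segmentation.sql
git commit -m "feat(marts): add customer segmentation model"
git push origin feature/customer-segmentation-model
```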
Commit message conventions for tracking model changes
Effective commit messages transform your git history into a readable changelog of data transformation evolution. Data transformation best practices demand clear communication about what changed and why, especially when model modifications affect downstream dependencies.
Follow the conventional commits specification adapted for dbt projects. Start with a type prefix that immediately communicates the nature of your change:
| Prefix | Usage | Example |
|---|---|---|
| `feat:` | New models or major enhancements | `feat: add customer lifetime value model` |
| `fix:` | Bug fixes in existing transformations | `fix: correct revenue calculation in mart_sales` |
| `refactor:` | Code improvements without logic changes | `refactor: extract common date logic into macro` |
| `docs:` | Documentation updates | `docs: add business context to user engagement models` |
| `test:` | Adding or modifying data tests | `test: add unique constraint to customer_id` |
Include the scope in parentheses to specify which area of your dbt project the change affects. This becomes invaluable when searching through commit history for specific model changes. Examples include `feat(staging): add new source table` or `fix(marts): resolve duplicate records in customer summary`.
Write commit messages that explain the business impact, not just the technical change. Instead of “update SQL,” write “fix: correct churn calculation to exclude trial users.” This approach helps both technical and business stakeholders understand the evolution of your data models.
For breaking changes that affect downstream consumers, use the `BREAKING CHANGE:` footer in your commit message. This signals to other team members that they need to review and potentially update dependent models or reports.
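Putting the pieces together, a full commit message with a scope and a breaking-change footer might look like this (the model and details are invented for illustration):

```text
fix(marts): correct churn calculation to exclude trial users

Trial accounts were inflating monthly churn figures. The calculation
now filters on is_trial_user before aggregating.

BREAKING CHANGE: mart_customer_churn no longer includes trial users;
downstream retention dashboards should be re-validated.
```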
Pull request templates that ensure code quality
Pull request templates standardize the review process and ensure consistency across your dbt coding standards. A well-designed template guides contributors through essential checks while providing reviewers with the context they need to evaluate changes effectively.
Create templates that address the unique aspects of data transformation work. Include sections for data impact assessment, performance considerations, and backward compatibility. Your template should prompt contributors to explain not just what changed, but how the changes affect data quality, processing time, and downstream dependencies.
Essential sections for dbt pull request templates include:
- **Change Summary**: Brief description of what models were added, modified, or removed
- **Business Context**: Why these changes were necessary and what problem they solve
- **Data Impact**: Which tables/views are affected and potential breaking changes
- **Testing Performed**: Results from dbt test runs and any manual validation
- **Performance Impact**: Query execution time changes for modified models
Include checkboxes for common quality gates. Require confirmation that tests pass, documentation is updated, and schema changes are backward compatible where possible. This creates a consistent review process regardless of who’s submitting the change.
Pre-populate sections with helpful reminders. For example, remind contributors to run `dbt docs generate` if they’ve added new models or updated descriptions. Include links to your team’s dbt documentation standards and coding guidelines for easy reference.
Consider different templates for different types of changes. A template for adding new staging models might focus heavily on source data validation, while a template for mart model changes emphasizes business logic verification and downstream impact assessment.
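As a starting point, a template along these lines could live at `.github/pull_request_template.md` (the path follows GitHub's convention; the sections and checks are a sketch to adapt):

```markdown
## Change Summary
<!-- Models added, modified, or removed -->

## Business Context
<!-- Why this change is needed and what problem it solves -->

## Data Impact
<!-- Affected tables/views, downstream dependencies, breaking changes -->

## Testing Performed
- [ ] dbt test passes for all modified models
- [ ] Manual spot checks against known-good values

## Performance Impact
<!-- Runtime changes for modified models -->

## Checklist
- [ ] Documentation updated and `dbt docs generate` run for new or changed models
- [ ] Schema changes are backward compatible, or breaking changes are called out above
```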
Merge conflict resolution in collaborative environments
Merge conflicts in dbt projects present unique challenges compared to traditional software development. Model dependencies create cascading effects where resolving conflicts in one file might require updates across multiple related models.
Schema conflicts represent the most common merge challenge in collaborative dbt environments. When multiple team members modify the same base model, conflicts often emerge in column selections, transformations, or joins. The key lies in understanding the business logic behind each change rather than mechanically choosing one version over another.
Establish clear conflict resolution protocols that prioritize data integrity. When facing conflicts in critical business logic, involve domain experts who understand the underlying data relationships. Don’t let technical team members make arbitrary decisions about revenue calculations or customer segmentation rules without business context.
Use dbt’s lineage capabilities to understand conflict impact. Before resolving any merge conflict, run `dbt docs generate` on both branches to visualize how the changes affect downstream models. This helps identify which resolution approach minimizes disruption across your data pipeline.
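A quick, command-line-only way to compare downstream impact is to list descendants of the conflicted model on each branch (the branch and model names below are illustrative):

```bash
# List everything downstream of the conflicted model on the feature branch
git checkout feature/revenue-refactor
dbt ls --select int_customer_revenue+

# Repeat on main and compare the two lists to see which resolution touches fewer models
git checkout main
dbt ls --select int_customer_revenue+
```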
Create a conflict resolution checklist that teams can follow consistently. Include steps like verifying test coverage, checking for breaking changes in downstream models, and validating that the merged result produces expected business outcomes. Document common conflict patterns and their preferred resolutions to speed up future merge processes.
For complex conflicts involving multiple interconnected models, consider using feature flags or gradual rollout strategies. Instead of forcing an immediate resolution, introduce changes incrementally while maintaining parallel versions during transition periods. This approach reduces risk while allowing thorough validation of the merged solution.
Documentation Standards That Drive Business Value
Model Description Templates for Stakeholder Clarity
Creating standardized model descriptions transforms dbt documentation from technical jargon into business-friendly explanations. A well-structured template helps data engineers communicate value while ensuring non-technical stakeholders can quickly understand what each model does and why it matters.
Start with a concise summary that answers “what” and “why” in plain English. Follow with the business purpose, data sources, key transformations, and refresh frequency. This template approach creates consistency across your entire dbt project organization while reducing the cognitive load for both developers and business users.
# Example model description template
models:
  - name: customer_lifetime_value
    description: |
      **Business Purpose**: Calculates total revenue potential per customer
      **Data Sources**: Orders, customers, product catalog
      **Key Logic**: 12-month rolling revenue + predicted future value
      **Refresh**: Daily at 6 AM EST
      **Owner**: Analytics Team
The template should include ownership information, making it clear who maintains the model and where to direct questions. This prevents the common scenario where stakeholders spend hours hunting down the right person to explain data discrepancies.
Column-Level Documentation That Prevents Data Confusion
Column documentation serves as your first line of defense against misinterpretation. Every column should include its definition, possible values, and any business rules that govern its calculation. This level of detail prevents the frustration that comes when analysts discover their “revenue” differs from finance’s “revenue” calculation.
Focus on edge cases and assumptions that might not be obvious. Document null value handling, date ranges, and any filtering applied during transformation. For calculated fields, explain the logic in business terms rather than SQL syntax.
| Documentation Element | Purpose | Example |
|---|---|---|
| Business Definition | Clarifies meaning | “Monthly recurring revenue from active subscriptions” |
| Data Type & Format | Sets expectations | “DECIMAL(10,2) representing USD amounts” |
| Null Value Meaning | Prevents misinterpretation | “NULL indicates customer has no subscription history” |
| Valid Range | Identifies anomalies | “Values typically range from $10 to $10,000” |
Column-level dbt documentation standards become especially critical when dealing with dimensional data or metrics that have specific business definitions. Create a glossary of common terms and reference it consistently across models.
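A rough sketch of what those elements look like in a properties file; the model, columns, and values are placeholders:

```yaml
version: 2

models:
  - name: mart_subscription_revenue
    columns:
      - name: monthly_recurring_revenue
        description: >
          Monthly recurring revenue from active subscriptions, in USD
          (DECIMAL(10,2)). NULL means the customer has no subscription
          history. Values typically range from $10 to $10,000.
      - name: subscription_status
        description: "Current subscription state."
        tests:
          - accepted_values:
              values: ['active', 'paused', 'cancelled']
```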
Business Logic Explanations That Enable Self-Service Analytics
Transform your dbt models into self-documenting assets by explaining the “why” behind complex business logic. When analysts can understand the reasoning behind transformations, they make better decisions about using the data and feel confident building on top of your work.
Break down complex calculations into digestible steps. Instead of showing a massive CASE statement, explain each condition and why it exists. Document the business scenarios that drive different logic branches, making it easier for others to modify or extend the model later.
# Business logic documentation example
description: |
  Customer segmentation based on purchase behavior:
  **High Value**: >$1000 annual spend + >5 orders
  **Medium Value**: $500-$1000 spend OR 3-5 orders
  **Low Value**: <$500 spend AND <3 orders
  **Special Cases**:
  - New customers (<90 days) always marked as "New"
  - Enterprise accounts override normal segmentation
  - Churned customers maintain last active segment
Include decision trees, business rules, and the rationale behind threshold values. This context helps analysts understand when the logic might need adjustment and empowers them to make informed decisions about data usage. Good business logic documentation turns your dbt project into a knowledge base that grows more valuable over time.
Document any assumptions about data quality, seasonal adjustments, or business rule exceptions. This transparency builds trust and helps users understand the limitations and appropriate use cases for each model.
Establishing consistent naming conventions, file organization, and code formatting standards creates the foundation for any successful dbt project. When your team follows clear guidelines for naming models, organizing directories, and structuring code, everyone can quickly understand and contribute to the project. These practices become even more valuable as your data team grows and new members join the project.
The real magic happens when you combine these standards with solid version control practices and comprehensive documentation. Your future self will thank you for writing clear model descriptions and maintaining consistent code structure. Start implementing these best practices on your next dbt project, even if it’s just you working on it right now. Good habits formed early will pay dividends when your data transformation needs inevitably grow more complex.