:fishing_pole_and_fish: List of `pre-commit` hooks to ensure the quality of your `dbt` projects.

BETA NOTICE: This tool is still BETA and may have some bugs, so please be forgiving!

Goal

Quickly ensure the quality of your dbt projects.

dbt is awesome, but as the number of models, sources, and macros grows, it becomes challenging to maintain quality. People often forget to update columns in schema files, add descriptions, or write tests. On top of that, with a growing number of objects dbt slows down, users stop running models and tests (because they want to ship features quickly), and the demands on reviewers increase.

If this is the case, pre-commit-dbt is here to help you!

List of pre-commit-dbt hooks

💡 Click on hook name to view the details.

Model checks:

Script checks:

Source checks:

Modifiers:

dbt commands:


If you have an idea for a new hook or you have found a bug, let us know!

Install

For detailed installation and usage instructions, see the pre-commit.com site.

pip install pre-commit

Setup

  1. Create a file named .pre-commit-config.yaml in your dbt root folder.
  2. Add the list of hooks you want to run before every commit, e.g.:
repos:
- repo: https://github.com/offbi/pre-commit-dbt
  rev: v0.1.1
  hooks:
  - id: check-script-semicolon
  - id: check-script-has-no-table-name
  - id: dbt-test
  - id: dbt-docs-generate
  - id: check-model-has-all-columns
    name: Check columns - core
    files: ^models/core
  - id: check-model-has-all-columns
    name: Check columns - mart
    files: ^models/mart
  - id: check-model-columns-have-desc
    files: ^models/mart
  3. Optionally, run pre-commit install to set up the git hook scripts. With this, pre-commit will run automatically on git commit! You can also run pre-commit run manually after you stage all the files you want to check, or pre-commit run --all-files to run the hooks against all files (not only the staged ones).

Run as a GitHub Action

Unfortunately, you cannot natively use pre-commit-dbt if you are using dbt Cloud. But you can run the checks after you push changes to GitHub.

To do that, create a file .github/workflows/pre-commit.yml:

name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
    - uses: pre-commit/action@v2.0.0

To run the hooks only on changed files:

name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
    - id: file_changes
      uses: trilom/file-changes-action@v1.2.4
      with:
        output: ' '
    - uses: pre-commit/action@v2.0.0
      with:
        extra_args: --files ${{ steps.file_changes.outputs.files}}

To be able to run modifiers, you need to use a private repository and change your .github/workflows/pre-commit.yml to:

name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
      with:
        fetch-depth: 0
    - uses: actions/setup-python@v2
    - id: file_changes
      uses: trilom/file-changes-action@v1.2.4
      with:
        output: ' '
    - uses: pre-commit/action@v2.0.0
      with:
        extra_args: --files ${{ steps.file_changes.outputs.files}}
        token: ${{ secrets.GITHUB_TOKEN }}

For more information about pre-commit/action, visit https://github.com/pre-commit/action.

Comments
  • add argument support to dbt commands (dbt clean & dbt deps)

    This enhancement will allow the dbt-clean and dbt-deps hooks to take arguments for global and cmd flags. It addresses issue 39 by allowing the project-dir flag to be added to the dbt clean and dbt deps commands. Example:

    - id: dbt-clean
      args: ["--cmd-flags", "++project-dir", "./transform/dbt"]
    opened by Aostojic 8
  • Docker image pull down image action fails

    Describe the bug: When hooking into the Docker file, I get an invalid reference format error when the build runs the step that pulls the action image.

    Pull down action image 'offbi:pre-commit-dbt:v1.0.0'
    /usr/bin/docker pull offbi:pre-commit-dbt:v1.0.0
    invalid reference format
    Warning: Docker pull failed with exit code 1, back off 3.275 seconds before retry.

    To Reproduce: Steps to reproduce the behavior:

    1. Add a new file .github/workflows/pre-commit-dbt.yml:
    name: pre-commit
    
    on:
      pull_request:
        branches: [master]
    
    jobs:
      pre-commit:
        runs-on: ubuntu-latest
        steps:
        - uses: actions/checkout@v2
        - uses: actions/setup-python@v2
        - id: file_changes
          uses: trilom/file-changes-action@v1.2.4
          with:
            output: ' '
        - uses: offbi/pre-commit-dbt@v1.0.0
          env:
            DB_PASSWORD: ${{ secrets.SuperSecret }}
          with:
            args: run --files ${{ steps.file_changes.outputs.files}}
    
    2. Add the following code to an existing (or new) .pre-commit-config.yaml:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: v1.0.0
      hooks:
      - id: check-script-has-no-table-name
        files: ^dbt
    
    3. Commit the changes and create a pull request to trigger the test build.

    Expected behavior: The build test completes successfully.

    Version: v1.0.0 (dbt version 0.21.0)

    Additional context: I think it's odd that the yml file clearly shows offbi/pre-commit-dbt@v1.0.0 for the step action, but when the docker pull command is passed, the syntax is changed to offbi:pre-commit-dbt:v1.0.0 (notice the colons in place of the slash & @). Nowhere in the documentation does it list how to avoid this. Also note: the "test" completes successfully if the following block is removed (all other steps run correctly):

        - uses: offbi/pre-commit-dbt@v1.0.0
          env:
            DB_PASSWORD: ${{ secrets.SuperSecret }}
          with:
            args: run --files ${{ steps.file_changes.outputs.files}}
    

    So it seems that it's this specific docker image causing the issue.

    bug 
    opened by grace-cityblock 6
  • add prj_root argument to dbt commands

    This enhancement enables users to benefit from the dbt-command hooks whenever their dbt project is not at the root level of the repo. It works by providing an additional argument to the dbt commands. Example:

    hooks:
    - id: dbt-docs-generate
      files: ^projects/
      args: ["--prj-root", "./common/dbt"]
    - id: dbt-deps
      files: ^projects/
      args: ["--prj-root", "./common/dbt"]

    In this example, the root of the dbt project is located inside the folder './common/dbt'.

    Addresses issue https://github.com/offbi/pre-commit-dbt/issues/39

    opened by joaobernardopa 5
  • Unable to load manifest file ([Errno 2] No such file or directory: 'target/manifest.json')

    Describe the bug: Any hook that tries to read target/manifest.json results in Unable to load manifest file ([Errno 2] No such file or directory: 'target/manifest.json') if the dbt project directory (which contains target/) is not the git project root.

    To Reproduce: Steps to reproduce the behavior:

    1. Create the following project structure
    my_project/
    ├── .git/
    └── dbt_project/
         ├── dbt_project.yml
         ├── target/
         ├── models/
         └── ...
    
    2. Run any hook that requires the manifest.

    Expected behavior: I guess the current behavior is expected, but there could be an option to specify the dbt project directory.

    Workaround: cp -r dbt_project/target target

    Version: v1.0.0

    bug 
    opened by stumelius 5
  • Cannot use the action in a workflow

    Describe the bug: Hi there, first of all thanks for the effort in developing this library. We are trying to include this step in our workflow, but we are facing an issue related to the Docker image: it cannot be pulled.

    [Screenshot: Docker pull error, 2021-04-23]

    Might the issue be related to the actual image defined in action.yaml?

    To Reproduce: Steps to reproduce the behavior:

    1. Follow either the README example or the snippet reported in the marketplace.

    Example configuration:

      - id: dbt_checks
        uses: offbi/pre-commit-dbt@v1.0.0
        env:
          [ ... ]
        with:
          args: run --files ${{ steps.file_changes.outputs.files }}
    

    Version: v1.0.0

    bug 
    opened by Sh1n 5
  • check-column-desc-are-same fails with a Python error

    Describe the bug: When trying to use the check-column-desc-are-same hook, I'm getting the following Python error. Other hooks I've tried so far are working:

    Check column descriptions are same.......................................Failed
    - hook id: check-column-desc-are-same
    - exit code: 1
    
    Traceback (most recent call last):
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/bin/check-column-desc-are-same", line 8, in <module>
        sys.exit(main())
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 80, in main
        return check_column_desc(paths=args.filenames, ignore=args.ignore)
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 55, in check_column_desc
        grouped = get_grouped(paths, ignore)
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 48, in get_grouped
        sorted(columns, key=lambda x: x.column_name), lambda x: x.column_name
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 29, in get_all_columns
        for item in schemas:
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/utils.py", line 134, in get_model_schemas
        model_name = model.get("name")
    AttributeError: 'str' object has no attribute 'get'
    

    Version: v0.1.1, Python 3.7.9

    bug 
    opened by MartinGuindon 5
  • check-script-has-no-table-name is failing when using lateral flatten

    Describe the bug: See the updated description of the bug.

    ~~The check-script-has-no-table-name pre-commit hook is confusing CTEs with tables, and fails with code like this:~~

    with source as (
    
        select * from {{ source('stripe', 'payments') }}
    
    ),
    
    renamed as (
    
        select
            id as payment_id,
            order_id,
            payment_method,
    
            --`amount` is currently stored in cents, so we convert it to dollars
            amount / 100 as amount
    
        from source
    
    )
    
    select * from renamed
    

    ~~It reports that "source" and "renamed" are tables even though they are not, even though it looks the same from a code perspective.~~

    ~~I think this hook should perhaps fail only at the presence of schema.table or database.schema.table references, unless we can make this hook smarter by being aware of the CTEs defined in the model.~~

    Version: v0.1.1

    bug 
    opened by MartinGuindon 5
  • Fix Github Action docker reference

    According to the GitHub Actions documentation, we should reference the Docker image directly from the project: https://docs.github.com/pt/actions/creating-actions/creating-a-docker-container-action#creating-an-action-metadata-file

    This fixes: https://github.com/offbi/pre-commit-dbt/issues/26

    opened by tlfbrito 3
  • Add check-model-name-contract hook

    There's now a check-model-name-contract hook (similar to check-column-name-contract) that is used to ensure that models follow a naming convention.

    To-do before merge

    • [x] Write unit tests for the hook
    opened by stumelius 3
  • check-model-has-properties-file fails on macro with a valid properties yml

    Describe the bug: When running the check-model-has-properties-file test against a macro, the test fails with the following error:

    Check the model has properties file......................................Failed
    - hook id: check-model-has-properties-file
    - exit code: 1
    
    macros/grant_select_on_schemas.sql: does not have model properties defined in any .yml file.
    

    The .pre-commit-config.yaml includes the rule:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: 607cb07a1918442f5963662a9aa19da8984931e6
      hooks:
      - id: check-model-has-properties-file
    

    And the macro has the following .yml file (the filename is the same as the macro name and is stored within the macros folder):

    
    macros:
      - name: grant_select_on_schemas
        description: "Grants privileges to groups after dbt run"
        docs:
          show: false
    

    I hope we can get this fixed soon, as this is a really useful test to include.

    bug 
    opened by andrewlee-trouva 3
  • wip: Added hook: check_model_has_tests_by_group

    This PR adds a hook that checks if a model has a sufficient number of tests pulled out of a group of acceptable tests, e.g. this model has 1 of unique, unique_where, or unique_threshold.

    opened by jtalmi 3
  • `check_macro_arguments_have_desc` hook fails to parse arguments

    Describe the bug

    check_macro_arguments_have_desc hook raises the following error even though the content of the files is ok. Other hooks are working correctly, including check_macro_has_description.

    Traceback (most recent call last):
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/bin/check-macro-arguments-have-desc", line 8, in <module>
        sys.exit(main())
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/lib/python3.10/site-packages/pre_commit_dbt/check_macro_arguments_have_desc.py", line 90, in main
        status_code, _ = check_argument_desc(paths=args.filenames, manifest=manifest)
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/lib/python3.10/site-packages/pre_commit_dbt/check_macro_arguments_have_desc.py", line 52, in check_argument_desc
        for key, value in item.macro.get("arguments", {}).items()
    AttributeError: 'list' object has no attribute 'items'
    

    To Reproduce

    Steps to reproduce the behavior using getdbt examples:

    1. macros/schema.yml
    version: 2
    
    macros:
      - name: cents_to_dollars
        description: A macro to convert cents to dollars
        arguments:
          - name: column_name
            type: string
            description: The name of the column you want to convert
          - name: precision
            type: integer
            description: Number of decimal places. Defaults to 2.
    
    2. macros/cents_to_dollars.sql
    {% macro cents_to_dollars(column_name, precision=3) %}
        COALESCE (TRUNC(CAST({{ column_name }}/100 AS numeric), {{ precision }}), 0)
    {% endmacro %}
    
    3. Execute the following command after dbt deps, dbt compile and dbt docs generate: pre-commit run check-macro-arguments-have-desc --files macros/cents_to_dollars.sql

    Expected behavior: The hook should pass successfully.

    Version: commit 34a2341234675d7a6b61766b2c33bdd5c33d090b (current latest commit)

    Additional context: It looks like the problem is here: for key, value in item.macro.get("arguments", {}).items(). The hook assumes the arguments key contains a dictionary, while it is actually a list of dictionaries (a short sketch follows this issue). https://github.com/offbi/pre-commit-dbt/blob/main/pre_commit_dbt/check_macro_arguments_have_desc.py#L52

    bug 
    opened by hacherix 0
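
    A minimal standalone sketch of handling arguments as the list of dictionaries dbt produces (hypothetical, not the project's actual patch; the sample data mirrors the schema.yml shown above):

    macro_arguments = [
        {"name": "column_name", "type": "string",
         "description": "The name of the column you want to convert"},
        {"name": "precision", "type": "integer",
         "description": "Number of decimal places. Defaults to 2."},
    ]

    # `arguments` is a list of dicts, so iterate it directly instead of calling
    # .items() on it; collect the argument names that are missing a description.
    missing = {arg.get("name") for arg in macro_arguments if not arg.get("description")}
    print(missing or "all arguments documented")
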
  • check-model-parents-and-childs for zero child check never runs

    Attempting to use the check-model-parents-and-childs hook to ensure that our data consumption layer models do not have any children does not fail when models DO have children.

    Steps to reproduce the behavior:

    1. dbt project with a parent and child models
    2. Add check-model-parents-and-childs with --max-child-cnt of zero
      - id: check-model-parents-and-childs
        name: Check for child models in data consumption layers
        # manifest.json required
        args: ["--manifest","./pipelines/target/manifest.json","--max-child-cnt","0","--"]
        files: models/self_service/
    

    Expected outcome:

    Model that has a child is raised as failure

    Actual outcome:

    Hook passes

    Version:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: 34a2341234675d7a6b61766b2c33bdd5c33d090b
    

    Additional info:

    The offending code appears to check for the default (None) of --max-child-cnt in a way that returns false for a zero value, i.e.

     if req_cnt and req_operator(real_value, req_cnt):
         status_code = 1
         print(
    ...
    

    Maybe it needs to explicitly check for None? (A short illustration follows this issue.)

     if req_cnt is not None and req_operator(real_value, req_cnt):
         status_code = 1
         print(
     ...
    
    bug 
    opened by PeteCorbettWS 0
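
    A small standalone illustration of why the zero value slips through the truthiness check in the issue above (the comparison operator is assumed to be operator.gt here; the hook's actual operator may differ):

    import operator

    req_cnt = 0                    # --max-child-cnt 0
    req_operator = operator.gt     # assumed: "has more children than allowed?"
    real_value = 3                 # the model actually has 3 children

    # Buggy guard: 0 is falsy, so the whole check is skipped and the hook passes.
    if req_cnt and req_operator(real_value, req_cnt):
        print("hook would fail")          # never reached

    # Explicit None check: the comparison runs and the violation is caught.
    if req_cnt is not None and req_operator(real_value, req_cnt):
        print("hook fails as expected")   # printed
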
  • `check-script-has-no-table-name` fails incorrectly due to `EXTRACT` function

    Describe the bug: Using an EXTRACT date function causes the column reference to be recognized as a table reference.

    Check the script has not table name......................................Failed
    - hook id: check-script-has-no-table-name
    - exit code: 1
    
    models/example.sql: does not use source() or ref() macros for tables:
    - order_date
    

    To Reproduce: Based on the jaffle shop dbt example, create a model with the following content:

    SELECT
        *,
        EXTRACT(YEAR FROM ORDER_DATE) as ORDER_YEAR
    FROM source('jaffle_shop', 'orders')
    

    Expected behavior: Using an EXTRACT date function should not make check-script-has-no-table-name fail.

    Version: v1.0.0

    Additional context: Links to the EXTRACT function documentation for different warehouses:

    bug 
    opened by nasseredine 0
  • Feature Request: `check-model-has-column`

    Describe the feature you'd like: Add a hook which asserts that a column with a given name and type exists in a model.

    Additional context: We generally require that every model has an audit timestamp column called _updated_at, and it would be nice to be able to enforce that (a rough sketch follows this issue).

    enhancement 
    opened by huptonbirdsall 0
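
    A rough standalone sketch of the kind of check being requested, scanning dbt's target/manifest.json for models whose documented columns lack a required column (hypothetical, not an existing hook; the column name and manifest path are assumptions):

    import json
    from pathlib import Path

    REQUIRED_COLUMN = "_updated_at"  # the audit timestamp column from the request above

    manifest = json.loads(Path("target/manifest.json").read_text())
    for node_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue
        # `columns` holds the columns documented in the model's properties file.
        if REQUIRED_COLUMN not in node.get("columns", {}):
            print(f"{node.get('name', node_id)}: missing required column {REQUIRED_COLUMN}")

    Note that this only inspects the columns documented in the properties files (as recorded in the manifest), not the actual columns in the warehouse.
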
  • `Check model name contract` hook problem with version

    When using the following pre-commit config file:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: v1.0.0
      hooks:
      - id: check-model-name-contract
        args: [--pattern, "(rep__).*"]
        files: models/reporting
    

    I get the following response: [ERROR] check-model-name-contract is not present in repository https://github.com/offbi/pre-commit-dbt. Typo? Perhaps it is introduced in a newer version? Often pre-commit autoupdate fixes this.

    Even if I run the autoupdate command, the error message is the same. Am I missing something?

    bug 
    opened by papost 2