dbt_helper
A basic CLI to autogenerate dbt data test configurations.
Attributes
Classes
| DbtColumn | Define yaml structure of a dbt column. |
| DbtTable | Define yaml structure of a dbt table. |
| DbtSource | Define basic dbt yml structure to add a pudl table as a dbt source. |
| DbtSchema | Define basic structure of a dbt models yaml file. |
| TableUpdateArgs | Define a single class to collect the args for all table update commands. |
Functions
| schema_has_removals_or_modifications | Check if the DeepDiff includes any removals or modifications. |
| _log_schema_diff | Print old and new YAML, and summary of schema changes. |
| _schema_diff_summary | Return all changes in a DeepDiff between two schemas as a string. |
| get_data_source | Return data source for a table or 'output' if there's more than one source. |
| _get_model_path | |
| _get_row_count_csv_path | |
| _get_existing_row_counts | |
| _calculate_row_counts | |
| _combine_row_counts | |
| _write_row_counts | |
| update_row_counts | Generate updated row counts per partition and write to csv file within dbt project. |
| update_table_schema | Generate and write out a schema.yaml file defining a new or updated table. |
| _log_update_result | |
| update_tables | Add or update dbt schema configs and row count expectations for PUDL tables. |
| validate | Validate a selection of dbt nodes. |
| dbt_helper | Script for auto-generating dbt configuration and migrating existing tests. |
Module Contents
- class dbt_helper.DbtColumn(/, **data: Any)
Bases: pydantic.BaseModel
Define yaml structure of a dbt column.
- class dbt_helper.DbtTable(/, **data: Any)
Bases: pydantic.BaseModel
Define yaml structure of a dbt table.
- add_column_tests(column_tests: dict[str, list]) → DbtSource
Add data tests to columns in dbt config.
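Adding column tests can be illustrated with a minimal sketch that uses standard-library dataclasses as hypothetical stand-ins for the pydantic models. It assumes tests arrive as a mapping from column name to a list of dbt test definitions and are appended to any tests the column already has. The names Column, Table and data_tests are assumptions modeled on dbt's schema.yml keys, not the module's actual definitions, and rejecting unknown column names is this sketch's own choice.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for DbtColumn."""

    name: str
    data_tests: tuple = ()


@dataclass(frozen=True)
class Table:
    """Hypothetical stand-in for DbtTable."""

    name: str
    columns: tuple[Column, ...] = ()

    def add_column_tests(self, column_tests: dict[str, list]) -> Table:
        """Return a copy of the table with tests appended to the named columns."""
        unknown = set(column_tests) - {col.name for col in self.columns}
        if unknown:
            # Assumption: failing loudly on a misspelled column beats ignoring it.
            raise ValueError(f"Unknown columns: {sorted(unknown)}")
        new_columns = tuple(
            replace(col, data_tests=col.data_tests + tuple(column_tests.get(col.name, [])))
            for col in self.columns
        )
        return replace(self, columns=new_columns)


table = Table("example_table", (Column("plant_id"), Column("report_year")))
updated = table.add_column_tests({"plant_id": ["not_null", "unique"]})
print(updated.columns[0].data_tests)  # ('not_null', 'unique')
```

Returning a modified copy rather than mutating in place keeps the original configuration available, which is convenient when old and new schemas are later compared.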
- class dbt_helper.DbtSource(/, **data: Any)
Bases: pydantic.BaseModel
Define basic dbt yml structure to add a pudl table as a dbt source.
- class dbt_helper.DbtSchema(/, **data: Any)
Bases: pydantic.BaseModel
Define basic structure of a dbt models yaml file.
- add_source_tests(source_tests: list, model_name: str | None = None) → DbtSchema
Add data tests to source in dbt config.
- add_column_tests(column_tests: dict[list], model_name: str | None = None) → DbtSchema
Add data tests to columns in dbt config.
- classmethod from_table_name(table_name: str, partition_column: str) → DbtSchema
Construct configuration defining table from PUDL metadata.
- classmethod from_yaml(schema_path: pathlib.Path) → DbtSchema
Load a DbtSchema object from a YAML file.
- classmethod to_yaml(schema_path: pathlib.Path)
Write DbtSchema object to YAML file.
- dbt_helper.schema_has_removals_or_modifications(diff: deepdiff.DeepDiff) → bool
Check if the DeepDiff includes any removals or modifications.
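DeepDiff groups its report by change category (for example dictionary_item_added, dictionary_item_removed, iterable_item_removed, values_changed and type_changes), and a DeepDiff result can be read like a dict keyed by those categories. A minimal sketch of the check follows. Which categories the real function treats as removals or modifications is an assumption, and the sketch accepts any mapping so it can be tried without deepdiff installed.

```python
from collections.abc import Mapping

# Assumed set of DeepDiff report categories that count as removals or modifications.
# Pure additions (dictionary_item_added, iterable_item_added, ...) are excluded.
REMOVAL_OR_MODIFICATION_KEYS = frozenset(
    {
        "dictionary_item_removed",
        "iterable_item_removed",
        "attribute_removed",
        "set_item_removed",
        "values_changed",
        "type_changes",
    }
)


def has_removals_or_modifications(diff: Mapping) -> bool:
    """Return True if any removal or modification category is non-empty."""
    return any(diff.get(key) for key in REMOVAL_OR_MODIFICATION_KEYS)


print(has_removals_or_modifications({"dictionary_item_added": ["root['new_col']"]}))  # False
print(has_removals_or_modifications(
    {"values_changed": {"root['description']": {"old_value": "a", "new_value": "b"}}}
))  # True
```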
- dbt_helper._log_schema_diff(diff: deepdiff.DeepDiff, old_schema: DbtSchema, new_schema: DbtSchema)
Print old and new YAML, and summary of schema changes.
- dbt_helper._schema_diff_summary(diff: deepdiff.DeepDiff) → str
Return all changes in a DeepDiff between two schemas as a string.
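A summary of this kind can be produced by walking each DeepDiff category and listing the paths it reports. This sketch assumes a plain-text layout (category name, then one indented path per line), which may not match the module's real output, and it again accepts any mapping shaped like a DeepDiff result.

```python
from collections.abc import Mapping


def schema_diff_summary(diff: Mapping) -> str:
    """List each change category followed by the paths it affects, one per line."""
    lines = []
    for category in sorted(diff):
        lines.append(f"{category}:")
        # Some categories hold a collection of paths, others a dict keyed by path;
        # iterating over either one yields the path strings.
        for path in sorted(diff[category]):
            lines.append(f"  {path}")
    return "\n".join(lines)


example = {
    "values_changed": {"root['description']": {"old_value": "a", "new_value": "b"}},
    "dictionary_item_added": ["root['columns'][3]"],
}
print(schema_diff_summary(example))
```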
- dbt_helper.get_data_source(table_name: str) → str
Return data source for a table or ‘output’ if there’s more than one source.
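The rule get_data_source applies can be sketched as follows. The real function takes a table name and consults PUDL metadata; here the table's list of sources is passed in directly, which is a hypothetical simplification. Treating a table with no sources as an error is also an assumption, and eia860/eia923 are used only as example values.

```python
from __future__ import annotations


def data_source_for(sources: list[str]) -> str:
    """Return the single data source, or 'output' when several sources contribute."""
    unique_sources = set(sources)
    if not unique_sources:
        raise ValueError("Table has no data sources.")  # assumed behavior
    if len(unique_sources) == 1:
        return next(iter(unique_sources))
    return "output"


print(data_source_for(["eia860"]))  # eia860
print(data_source_for(["eia860", "eia923"]))  # output
```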
- dbt_helper._get_model_path(table_name: str, data_source: str) → pathlib.Path
- dbt_helper._get_row_count_csv_path(target: str = 'etl-full') → pathlib.Path
- dbt_helper._get_existing_row_counts(target: str = 'etl-full') → pandas.DataFrame
- dbt_helper._calculate_row_counts(table_name: str, partition_column: str = 'report_year') → pandas.DataFrame
- dbt_helper._combine_row_counts(existing: pandas.DataFrame, new: pandas.DataFrame) → pandas.DataFrame
- dbt_helper._write_row_counts(row_counts: pandas.DataFrame, target: str = 'etl-full')
- dbt_helper.update_row_counts(table_name: str, partition_column: str = 'report_year', target: str = 'etl-full', clobber: bool = False, update: bool = False) → UpdateResult
Generate updated row counts per partition and write to csv file within dbt project.
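The row-count steps listed above (calculate counts per partition, combine them with the existing expectations, write the CSV) can be sketched with the standard library. This assumes a long-format CSV with hypothetical columns table_name, partition and row_count, and that new counts for a table replace all of that table's old rows; the module itself works with pandas DataFrames, and its real file layout and merge rule may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV layout: one row per (table, partition) pair.
FIELDS = ["table_name", "partition", "row_count"]


def calculate_row_counts(table_name, rows, partition_column="report_year"):
    """Count rows for each distinct value of the partition column."""
    counts = Counter(str(row[partition_column]) for row in rows)
    return [
        {"table_name": table_name, "partition": partition, "row_count": count}
        for partition, count in sorted(counts.items())
    ]


def combine_row_counts(existing, new):
    """Replace all existing rows for the tables in `new` with the new counts."""
    updated_tables = {row["table_name"] for row in new}
    kept = [row for row in existing if row["table_name"] not in updated_tables]
    return sorted(kept + new, key=lambda row: (row["table_name"], row["partition"]))


def write_row_counts(row_counts, stream):
    """Write the combined counts as CSV to any text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(row_counts)


rows = [{"report_year": 2020}, {"report_year": 2020}, {"report_year": 2021}]
existing = [
    {"table_name": "example_table", "partition": "2019", "row_count": 5},
    {"table_name": "other_table", "partition": "2020", "row_count": 7},
]
combined = combine_row_counts(existing, calculate_row_counts("example_table", rows))
buffer = io.StringIO()
write_row_counts(combined, buffer)
print(buffer.getvalue())
```

In a real run the rows would come from the materialized table and the stream would be the row-count CSV inside the dbt project; the partition column defaults to report_year, matching the documented signature.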
- dbt_helper.update_table_schema(table_name: str, data_source: str, partition_column: str = 'report_year', clobber: bool = False, update: bool = False) → UpdateResult
Generate and write out a schema.yaml file defining a new or updated table.
- dbt_helper._log_update_result(result: UpdateResult)
- class dbt_helper.TableUpdateArgs
Define a single class to collect the args for all table update commands.
- dbt_helper.update_tables(tables: list[str], target: str, clobber: bool, update: bool, schema: bool, row_counts: bool)
Add or update dbt schema configs and row count expectations for PUDL tables.
The tables argument can be a single table name, a list of table names, or ‘all’. If ‘all’, the script will update configurations for all PUDL tables.
If --clobber is set, existing configurations for tables will be overwritten. If --update is set, existing configurations for tables will be updated only if this does not result in deletions.
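The table selection and flag rules above can be expressed as two small functions. This is a sketch, not the module's code: UpdateResult here is a hypothetical two-field stand-in for the real type, would_delete represents whatever diff check the script actually runs (its helper schema_has_removals_or_modifications suggests modifications count too), and the behavior for a missing configuration and for both flags at once are assumptions.

```python
from __future__ import annotations

from typing import NamedTuple


class UpdateResult(NamedTuple):
    """Hypothetical result type; the module's real UpdateResult may differ."""

    success: bool
    message: str


def resolve_tables(tables: list[str], all_tables: list[str]) -> list[str]:
    """Expand the special value 'all' into every known table name."""
    if tables == ["all"]:
        return sorted(all_tables)
    return list(tables)


def decide_update(exists: bool, would_delete: bool, clobber: bool, update: bool) -> UpdateResult:
    """Apply the documented --clobber / --update rules to one table's configuration."""
    if not exists:
        # Assumption: a missing configuration is always created.
        return UpdateResult(True, "Creating new configuration.")
    if clobber:
        # Assumption: --clobber takes precedence if both flags are given.
        return UpdateResult(True, "Overwriting existing configuration.")
    if update and not would_delete:
        return UpdateResult(True, "Updating existing configuration.")
    if update:
        return UpdateResult(False, "Update would delete existing entries; use --clobber.")
    return UpdateResult(False, "Configuration exists; pass --clobber or --update.")


print(resolve_tables(["all"], ["table_b", "table_a"]))  # ['table_a', 'table_b']
print(decide_update(exists=True, would_delete=True, clobber=False, update=True))
```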
- dbt_helper.validate(select: str = '*', exclude: str | None = None, target: str = 'etl-full') → None
Validate a selection of dbt nodes.
Wraps the dbt build command so we can annotate the result with the actual data that was returned from the test query.
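Because validate wraps dbt build, its parameters map naturally onto dbt's --select, --exclude and --target options. The sketch below shows only that mapping, by shelling out to the dbt CLI. The module's real wrapper may invoke dbt differently (dbt also provides a programmatic dbtRunner), and the annotation of test results with query data is not reproduced; the selector values in the example are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess


def build_dbt_command(
    select: str = "*", exclude: str | None = None, target: str = "etl-full"
) -> list[str]:
    """Assemble a `dbt build` invocation from validate()'s documented parameters."""
    command = ["dbt", "build", "--select", select, "--target", target]
    if exclude is not None:
        command += ["--exclude", exclude]
    return command


def run_dbt_build(**kwargs) -> int:
    """Run dbt build and return its exit code (requires dbt on PATH)."""
    command = build_dbt_command(**kwargs)
    print("Running:", shlex.join(command))
    return subprocess.run(command, check=False).returncode


print(shlex.join(build_dbt_command(select="example_model", exclude="tag:slow")))
# dbt build --select example_model --target etl-full --exclude tag:slow
```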
- dbt_helper.dbt_helper()
Script for auto-generating dbt configuration and migrating existing tests.
This CLI currently provides the following sub-commands:
  - update-tables: creates or updates a dbt table (model) schema.yml file under the dbt/models directory. These configuration files tell dbt about the structure of the table and which data tests are specified for it. By default it also adds a (required) row count test. The command can also generate or update the expected row counts for existing tables, assuming they have been materialized to Parquet files in your $PUDL_OUT directory.
  - validate: runs validation tests for a selection of dbt nodes.
Run dbt_helper {command} --help for detailed usage of each command.
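For orientation, the two sub-commands can be mocked up with argparse. This is a hypothetical sketch: the real CLI may be built with a different framework, and apart from --clobber and --update (named in the update_tables documentation), the flag names are assumptions derived from the documented function parameters.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented update-tables and validate commands."""
    parser = argparse.ArgumentParser(prog="dbt_helper")
    subcommands = parser.add_subparsers(dest="command", required=True)

    update = subcommands.add_parser("update-tables", help="Create or update dbt table configs.")
    update.add_argument("tables", nargs="+", help="One or more table names, or 'all'.")
    update.add_argument("--target", default="etl-full")  # hypothetical flag name
    update.add_argument("--clobber", action="store_true", help="Overwrite existing configs.")
    update.add_argument("--update", action="store_true", help="Update unless it would delete.")
    update.add_argument("--schema", action="store_true")  # hypothetical flag name
    update.add_argument("--row-counts", action="store_true")  # hypothetical flag name

    validate = subcommands.add_parser("validate", help="Run dbt tests for selected nodes.")
    validate.add_argument("--select", default="*")  # hypothetical flag name
    validate.add_argument("--exclude", default=None)  # hypothetical flag name
    validate.add_argument("--target", default="etl-full")  # hypothetical flag name
    return parser


args = build_parser().parse_args(["update-tables", "table_a", "table_b", "--clobber"])
print(args.command, args.tables, args.clobber)
```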