pudl.dbt_schema
Define dbt schema types and merging logic.
We generate dbt schema.yml files by translating our metadata into schema.yml format, then applying human-sourced patches to the auto-generated schemas.
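Classes
DbtColumn | Define yaml structure of a dbt column.
DbtTable | Define yaml structure of a dbt table.
DbtSource | Define basic dbt yml structure to add a pudl table as a dbt source.
DbtSchema | Define basic structure of a dbt models yaml file.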
Functions
_prettier_yaml_dumps | Dump YAML to string that Prettier likes.
merge_schema | Merge two DbtSchemas by applying human-schema as a patch on top of machine-schema.
merge_by_name | Perform a generic merge of two lists of dbt elements, matching by name.
merge_sources_by_name | Match machine/human sources by name, then merge them.
merge_source | Merge two DbtSources by applying human-source as a patch on top of machine-source.
merge_tables_by_name | Match machine/human tables by name, then merge them.
merge_table | Merge two DbtTables by applying human-table as a patch on top of machine-table.
merge_columns_by_name | Match machine/human columns by name, then merge them.
merge_column | Merge two DbtColumns by applying human-column as a patch on top of machine-column.
Module Contents
- pudl.dbt_schema._prettier_yaml_dumps(yaml_contents: dict[str, Any]) → str
Dump YAML to string that Prettier likes.
- class pudl.dbt_schema.DbtColumn(/, **data: Any)
Bases: pydantic.BaseModel
Define yaml structure of a dbt column.
- class pudl.dbt_schema.DbtTable(/, **data: Any)
Bases: pydantic.BaseModel
Define yaml structure of a dbt table.
- class pudl.dbt_schema.DbtSource(/, **data: Any)
Bases: pydantic.BaseModel
Define basic dbt yml structure to add a pudl table as a dbt source.
- class pudl.dbt_schema.DbtSchema(/, **data: Any)
Bases: pydantic.BaseModel
Define basic structure of a dbt models yaml file.
- classmethod from_table_name(table_name: str) → DbtSchema
Construct configuration defining table from PUDL metadata.
- classmethod from_yaml(schema_path: pathlib.Path) → DbtSchema
Load a DbtSchema object from a YAML file.
- to_yaml(schema_path: pathlib.Path)
Write DbtSchema object to YAML file.
- validate_humanity()
Make sure the human schema matches expectations.
We expect every human override on a source table to be either a table-level data test or a column-level data test. The ‘name’ field is also allowed, so that human tables and columns can be matched with machine ones.
We do not have any expectations about model definitions since those are human-only.
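The expectation above can be illustrated with a minimal sketch on plain dictionaries. This is not the PUDL implementation: the function name, the `data_tests`, `tables` and `columns` keys, and the error-message format are all assumptions made for illustration.

```python
# Hypothetical allow-lists: only the name and data tests may be overridden.
ALLOWED_TABLE_KEYS = {"name", "data_tests", "columns"}
ALLOWED_COLUMN_KEYS = {"name", "data_tests"}


def find_unexpected_overrides(human_sources: list[dict]) -> list[str]:
    """Return a message for every human override that is not a data test."""
    problems = []
    for source in human_sources:
        for table in source.get("tables", []):
            extra = set(table) - ALLOWED_TABLE_KEYS
            if extra:
                problems.append(f"table {table['name']}: {sorted(extra)}")
            for column in table.get("columns", []):
                extra = set(column) - ALLOWED_COLUMN_KEYS
                if extra:
                    problems.append(
                        f"column {table['name']}.{column['name']}: {sorted(extra)}"
                    )
    return problems


# A human patch that (incorrectly) tries to override a column description.
human_sources = [
    {
        "name": "pudl",
        "tables": [
            {
                "name": "t",
                "columns": [
                    {"name": "id", "description": "x", "data_tests": ["unique"]}
                ],
            }
        ],
    }
]
problems = find_unexpected_overrides(human_sources)
```

Here `problems` contains one entry, for the `description` override on column `t.id`; a patch containing only names and data tests yields an empty list.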
- pudl.dbt_schema.merge_schema(machine_schema: DbtSchema, human_schema: DbtSchema) → DbtSchema
Merge two DbtSchemas by applying human-schema as a patch on top of machine-schema.
Empty merged sources will be stored in the DbtSchema model as None to avoid serializing them.
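The idea of storing empty sources as None can be sketched as follows. The helper names and the dictionary layout are hypothetical, and dropping None values by hand stands in for whatever serialization option the real model uses.

```python
from typing import Optional


def none_if_empty(merged_sources: list) -> Optional[list]:
    """Turn an empty list into None so it can be skipped on output."""
    return merged_sources or None


def drop_none(mapping: dict) -> dict:
    """Stand-in for a serializer that omits None-valued fields."""
    return {key: value for key, value in mapping.items() if value is not None}


# Assumed layout of a schema with models but no merged sources.
schema = {
    "version": 2,
    "sources": none_if_empty([]),
    "models": [{"name": "some_model"}],
}
serialized = drop_none(schema)
```

With this approach `serialized` has no `sources` key at all, rather than an empty `sources: []` entry.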
- pudl.dbt_schema.merge_by_name(machine_elements: list, human_elements: list, merger: collections.abc.Callable, element_factory: collections.abc.Callable) → list
Perform a generic merge of two lists of dbt elements, matching by name.
- Parameters:
machine_elements – can be an empty list.
human_elements – can be an empty list.
merger – callable that takes two elements of the same dbt type (source, table, column) and returns a new element that is the merged version.
element_factory – callable that takes an element name and returns an empty instance; used, for example, when the human element doesn’t exist.
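A minimal sketch of this kind of name-based merge, using plain dictionaries in place of the dbt models. It is an illustration, not the PUDL source: the function name, the dict-based elements, and the output ordering (machine names first, then human-only names) are assumptions.

```python
from collections.abc import Callable


def merge_by_name_sketch(
    machine_elements: list[dict],
    human_elements: list[dict],
    merger: Callable[[dict, dict], dict],
    element_factory: Callable[[str], dict],
) -> list[dict]:
    """Pair elements by their "name" key and merge each pair."""
    machine_by_name = {element["name"]: element for element in machine_elements}
    human_by_name = {element["name"]: element for element in human_elements}
    # Assumed ordering: machine names first, then names only humans defined.
    all_names = list(machine_by_name) + [
        name for name in human_by_name if name not in machine_by_name
    ]
    merged = []
    for name in all_names:
        # Fall back to an empty element when one side lacks this name.
        machine = (
            machine_by_name[name] if name in machine_by_name else element_factory(name)
        )
        human = human_by_name[name] if name in human_by_name else element_factory(name)
        merged.append(merger(machine, human))
    return merged


def merge_tests(machine: dict, human: dict) -> dict:
    """Hypothetical merger: keep the name, concatenate the data tests."""
    return {
        "name": machine["name"],
        "data_tests": machine.get("data_tests", []) + human.get("data_tests", []),
    }


result = merge_by_name_sketch(
    [{"name": "a", "data_tests": ["not_null"]}],
    [{"name": "a", "data_tests": ["unique"]}, {"name": "b", "data_tests": ["not_null"]}],
    merger=merge_tests,
    element_factory=lambda name: {"name": name},
)
```

In this example `a` exists on both sides and gets both tests, while `b` exists only on the human side and is merged against an empty element produced by the factory.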
- pudl.dbt_schema.merge_sources_by_name(machine_sources: list[DbtSource], human_sources: list[DbtSource]) → list[DbtSource]
Match machine/human sources by name, then merge them.
- pudl.dbt_schema.merge_source(machine_source: DbtSource, human_source: DbtSource) → DbtSource
Merge two DbtSources by applying human-source as a patch on top of machine-source.
Returns a deep copy of the machine source to avoid aliasing, with its tables replaced by the merge of the machine and human sources’ tables.
- pudl.dbt_schema.merge_tables_by_name(machine_tables: list[DbtTable], human_tables: list[DbtTable]) → list[DbtTable]
Match machine/human tables by name, then merge them.
- pudl.dbt_schema.merge_table(machine_table: DbtTable, human_table: DbtTable) → DbtTable
Merge two DbtTables by applying human-table as a patch on top of machine-table.
Returns a deep copy of the machine table to avoid aliasing, with its columns and table-level data tests replaced by the merge of the corresponding machine and human data.
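A two-level merge of this kind might look like the sketch below, again on plain dictionaries. The key names, the concatenation policy for tests, the handling of human-only columns, and the test name `hypothetical_row_count_check` are assumptions, not details taken from PUDL.

```python
import copy


def merge_table_sketch(machine_table: dict, human_table: dict) -> dict:
    """Patch human data tests onto a deep copy of the machine table."""
    merged = copy.deepcopy(machine_table)
    # Table-level data tests: machine tests followed by human tests.
    merged["data_tests"] = merged.get("data_tests", []) + copy.deepcopy(
        human_table.get("data_tests", [])
    )
    human_columns = {column["name"]: column for column in human_table.get("columns", [])}
    for column in merged.get("columns", []):
        # Match by name; an unmatched machine column gets no extra tests.
        human_column = human_columns.pop(column["name"], {})
        column["data_tests"] = column.get("data_tests", []) + copy.deepcopy(
            human_column.get("data_tests", [])
        )
    # Assumption: columns only the human defined are appended at the end.
    merged.setdefault("columns", []).extend(copy.deepcopy(list(human_columns.values())))
    return merged


machine = {"name": "t", "columns": [{"name": "id", "description": "Primary key."}]}
human = {
    "name": "t",
    "data_tests": [{"hypothetical_row_count_check": {}}],
    "columns": [{"name": "id", "data_tests": ["unique"]}],
}
merged = merge_table_sketch(machine, human)
```

Because the result is built from a deep copy, `machine` is left untouched: its `id` column still has no `data_tests` key after the merge.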
- pudl.dbt_schema.merge_columns_by_name(machine_columns: list[DbtColumn], human_columns: list[DbtColumn]) → list[DbtColumn]
Match machine/human columns by name, then merge them.
- pudl.dbt_schema.merge_column(machine_column: DbtColumn, human_column: DbtColumn) → DbtColumn
Merge two DbtColumns by applying human-column as a patch on top of machine-column.
Returns a deep copy of the machine column to avoid aliasing, with its data tests replaced by the merge of the machine and human columns’ data tests.
Does not update any other attributes (descriptions, etc.).
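The behavior described above can be sketched with a small stand-in class. `ColumnSketch` and its fields are hypothetical (the real DbtColumn is a pydantic model whose fields are not listed here), and skipping duplicate tests is an assumed policy.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ColumnSketch:
    """Hypothetical stand-in for DbtColumn."""

    name: str
    description: str = ""
    data_tests: list = field(default_factory=list)


def merge_column_sketch(machine: ColumnSketch, human: ColumnSketch) -> ColumnSketch:
    """Return a copy of the machine column with human data tests added."""
    merged = copy.deepcopy(machine)
    # Assumed policy: append human tests that the machine column lacks.
    merged.data_tests = merged.data_tests + [
        copy.deepcopy(test) for test in human.data_tests if test not in merged.data_tests
    ]
    return merged


machine_column = ColumnSketch(name="id", description="Primary key.", data_tests=["not_null"])
human_column = ColumnSketch(name="id", description="ignored", data_tests=["not_null", "unique"])
merged_column = merge_column_sketch(machine_column, human_column)
```

The merged column keeps the machine description and carries both tests, while `machine_column.data_tests` still holds only `not_null`, which is the aliasing problem the deep copy is meant to prevent.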