pudl.validate.dbt#

Wrap dbt invocations so we can add custom behavior.

Attributes#

logger

Classes#

`NodeContext`	Associate a node's name with information describing what went wrong.
`BuildResult`	Combine overall result with any useful failure context.

Functions#

`_preserve_logging_propagation`()	Restore logging propagation settings after a dbt invocation.
`install_dbt_deps`(→ dbt.cli.main.dbtRunner)	Ensure dbt package dependencies are installed in the project directory.
`__get_failed_nodes`(...)	Get test node output from tests that failed.
`__get_quantile_contexts`(→ list[NodeContext])	Run debug_quantile_constraints macro for failed quantile constraints.
`__get_compiled_sql_contexts`(→ list[NodeContext])	Run the compiled SQL against duckdb to get failure contexts.
`build_with_context`(→ BuildResult)	Run the dbt build and get failure information back.
`dagster_to_dbt_selection`(→ str)	Translate dagster asset selection to dbt node selection.

Module Contents#

pudl.validate.dbt.logger[source]#

pudl.validate.dbt._preserve_logging_propagation()[source]#

Restore logging propagation settings after a dbt invocation.

Invoking dbt via dbtRunner triggers Dagster’s logging initialization, which resets logging.getLogger("dagster").propagate to False. This context manager saves and restores the setting so callers don’t experience unexpected side effects on the global logging configuration.
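The save-and-restore pattern described above can be sketched with `contextlib.contextmanager`. This is a minimal illustration under stated assumptions, not the module's actual implementation: the helper name `preserve_propagation` and its `logger_name` parameter are hypothetical, and the side effect of a dbt invocation is simulated by flipping the flag by hand.

```python
import logging
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def preserve_propagation(logger_name: str = "dagster") -> Iterator[None]:
    """Save a logger's ``propagate`` flag and restore it on exit (hypothetical helper)."""
    target = logging.getLogger(logger_name)
    saved = target.propagate
    try:
        yield
    finally:
        # Restore the saved value even if the wrapped code raised.
        target.propagate = saved


# Simulate a call that changes the flag as a side effect.
dagster_logger = logging.getLogger("dagster")
dagster_logger.propagate = True
with preserve_propagation():
    dagster_logger.propagate = False  # stand-in for the dbt invocation's side effect
propagate_after = dagster_logger.propagate
```

Restoring in a `finally` clause means the caller's logging configuration is put back whether or not the wrapped call succeeds.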

class pudl.validate.dbt.NodeContext[source]#

Bases: NamedTuple

Associate a node’s name with information describing what went wrong.

name: str[source]#

context: str[source]#

pretty_print()[source]#

Nice output for logging to stdout.

class pudl.validate.dbt.BuildResult[source]#

Bases: NamedTuple

Combine overall result with any useful failure context.

success: bool[source]#

failure_contexts: list[NodeContext][source]#

format_failure_contexts() → str[source]#

Nice legible output for logs.
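To illustrate how the two named tuples fit together, here is a minimal sketch that reuses the documented field names. The method bodies are assumptions made for illustration (in particular, `pretty_print` returning a string and the exact layout of the output); the real formatting may differ.

```python
from typing import NamedTuple


class NodeContext(NamedTuple):
    """Stand-in with the documented fields; the formatting below is assumed."""

    name: str
    context: str

    def pretty_print(self) -> str:
        # Hypothetical layout: node name on one line, context underneath.
        return f"{self.name}:\n{self.context}"


class BuildResult(NamedTuple):
    """Stand-in with the documented fields; the formatting below is assumed."""

    success: bool
    failure_contexts: list[NodeContext]

    def format_failure_contexts(self) -> str:
        # Join each node's report with a blank line between entries.
        return "\n\n".join(ctx.pretty_print() for ctx in self.failure_contexts)


# Node names and contexts here are made up for the example.
result = BuildResult(
    success=False,
    failure_contexts=[
        NodeContext(name="test_a", context="3 rows failed"),
        NodeContext(name="test_b", context="1 row failed"),
    ],
)
report = result.format_failure_contexts()
```

A caller would typically check `result.success` first and only log `report` when the build failed.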

pudl.validate.dbt.install_dbt_deps(dbt: dbt.cli.main.dbtRunner | None = None) → dbt.cli.main.dbtRunner[source]#

Ensure dbt package dependencies are installed in the project directory.

pudl.validate.dbt.__get_failed_nodes(results: dbt.artifacts.schemas.run.RunExecutionResult) → list[dbt.contracts.graph.nodes.GenericTestNode][source]#

Get test node output from tests that failed.

pudl.validate.dbt.__get_quantile_contexts(nodes: list[dbt.contracts.graph.nodes.GenericTestNode], dbt: dbt.cli.main.dbtRunner, dbt_dir: pathlib.Path) → list[NodeContext][source]#

Run debug_quantile_constraints macro for failed quantile constraints.

This is a little tricky because the macro output is just logged to stdout, and not stored in the dbt.invoke result. So, for each node, we:

1. redirect stdout
2. run the macro based on node information
3. parse stdout to get the context

Also, if a node has multiple parents, we don’t know which table to pass to debug_quantile_constraints, so we skip it.
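The capture-and-parse steps above can be sketched with the standard library's `contextlib.redirect_stdout`. Everything dbt-specific is replaced by stand-ins here: `fake_macro`, its output format, and the `capture_context` helper are hypothetical, and the sketch only shows the general technique of capturing text that a callee prints rather than returns.

```python
from __future__ import annotations

import io
from contextlib import redirect_stdout


def fake_macro(table: str) -> None:
    """Hypothetical stand-in for a macro whose output is only logged to stdout."""
    print("unrelated log line")
    print(f"table={table} quantile=0.5 bound=10 value=12")


def capture_context(parents: list[str]) -> str | None:
    """Capture the macro's stdout for a node with exactly one parent table."""
    if len(parents) != 1:
        # With several parents we cannot tell which table to inspect, so skip.
        return None
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        fake_macro(parents[0])
    # Keep only the lines that carry the context we care about.
    relevant = [
        line for line in buffer.getvalue().splitlines() if line.startswith("table=")
    ]
    return "\n".join(relevant)


context = capture_context(["my_table"])
skipped = capture_context(["table_a", "table_b"])
```

Note that `redirect_stdout` only captures writes that go through Python's `sys.stdout`; output written directly to the underlying file descriptor would need a different approach.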

pudl.validate.dbt.__get_compiled_sql_contexts(nodes: list[dbt.contracts.graph.nodes.GenericTestNode]) → list[NodeContext][source]#

Run the compiled SQL against duckdb to get failure contexts.

pudl.validate.dbt.build_with_context(node_selection: str, dbt_target: str, node_exclusion: str | None = None) → BuildResult[source]#

Run the dbt build and get failure information back.

1. run the dbt build using our selection, returning test failures
2. split the test failures by type: for most, we just run the compiled SQL, but other tests, such as the weighted quantile tests, need extra handling
3. get contexts for the various test failure types
4. print out the test failure context
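The "split by type" step can be pictured as a simple partition of the failed tests. The sketch below is an assumption-heavy illustration: it works on plain test-name strings and treats any name containing `"quantile"` as a quantile test, which is a made-up rule and not necessarily how the module tells the two kinds apart.

```python
def split_failures(test_names: list[str]) -> tuple[list[str], list[str]]:
    """Partition failed test names into (quantile tests, everything else).

    The substring check is a hypothetical classification rule for this sketch.
    """
    quantile_tests: list[str] = []
    other_tests: list[str] = []
    for name in test_names:
        if "quantile" in name:
            quantile_tests.append(name)
        else:
            other_tests.append(name)
    return quantile_tests, other_tests


# Test names are invented for the example.
quantile_tests, other_tests = split_failures(
    ["weighted_quantile_check_a", "not_null_b", "unique_c"]
)
```

Each group would then be handed to its own context-gathering routine, and the results concatenated into a single list of failure contexts.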

pudl.validate.dbt.dagster_to_dbt_selection(selection: str, defs: dagster.Definitions, manifest=None) → str[source]#

Translate dagster asset selection to dbt node selection.

We use the dbt manifest to determine which sources are defined in dbt so that we can map them to dagster assets. So, we need to generate a fresh dbt manifest via dbt parse whenever we run this function.

1. turn asset selection into asset keys
2. turn asset keys into node names
3. turn node names into selection string
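The last two steps can be sketched as follows. This is not the module's implementation: the helper `to_dbt_selection`, the flat `asset_names` list, and the `source_tables` mapping (table name to dbt source name, imagined as having been read from the manifest) are all hypothetical. The sketch relies only on two pieces of dbt selection syntax: sources are addressed as `source:<source_name>.<table_name>`, and space-separated selectors are combined as a union.

```python
def to_dbt_selection(asset_names: list[str], source_tables: dict[str, str]) -> str:
    """Build a dbt selection string from asset names (hypothetical helper).

    ``source_tables`` maps a table name to the dbt source that defines it;
    names missing from the mapping are passed through as plain selectors.
    """
    selectors: list[str] = []
    for name in asset_names:
        if name in source_tables:
            # dbt addresses sources as source:<source_name>.<table_name>.
            selectors.append(f"source:{source_tables[name]}.{name}")
        else:
            selectors.append(name)
    # Space-separated selectors are treated by dbt as a union.
    return " ".join(selectors)


# Asset and source names are invented for the example.
selection = to_dbt_selection(
    ["table_one", "table_two"],
    {"table_two": "my_source"},
)
```

The resulting string is the kind of value that could be passed as the `node_selection` argument of `build_with_context`.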