pudl.validate.dbt#

Wrap dbt invocations so we can add custom behavior.

Attributes#

Classes#

NodeContext

Associate a node's name with information describing what went wrong.

BuildResult

Combine overall result with any useful failure context.

Functions#

_preserve_logging_propagation()

Restore logging propagation settings after a dbt invocation.

install_dbt_deps(→ dbt.cli.main.dbtRunner)

Ensure dbt package dependencies are installed in the project directory.

__get_failed_nodes(...)

Get test node output from tests that failed.

__get_quantile_contexts(→ list[NodeContext])

Run debug_quantile_constraints macro for failed quantile constraints.

__get_compiled_sql_contexts(→ list[NodeContext])

Run the compiled SQL against duckdb to get failure contexts.

build_with_context(→ BuildResult)

Run the dbt build and return failure information.

dagster_to_dbt_selection(→ str)

Translate dagster asset selection to dbt node selection.

Module Contents#

pudl.validate.dbt.logger[source]#
pudl.validate.dbt._preserve_logging_propagation()[source]#

Restore logging propagation settings after a dbt invocation.

Invoking dbt via dbtRunner triggers Dagster’s logging initialization, which resets logging.getLogger("dagster").propagate to False. This context manager saves and restores the setting so callers don’t experience unexpected side effects on the global logging configuration.
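The save-and-restore pattern described here can be sketched with the standard library alone. This is a minimal illustration, not the module's actual implementation: the name `preserve_propagation` and its `logger_name` parameter are hypothetical, and the dbt invocation is simulated by flipping the flag by hand.

```python
import logging
from contextlib import contextmanager


@contextmanager
def preserve_propagation(logger_name: str = "dagster"):
    """Save a logger's ``propagate`` flag and restore it on exit.

    Hypothetical stand-in for the private helper documented above.
    """
    target = logging.getLogger(logger_name)
    saved = target.propagate
    try:
        yield
    finally:
        # Restore the flag even if the wrapped code raises.
        target.propagate = saved


dagster_logger = logging.getLogger("dagster")
dagster_logger.propagate = True

with preserve_propagation():
    # Simulate the side effect that a dbt invocation is described as having.
    dagster_logger.propagate = False

restored = dagster_logger.propagate
```

Restoring inside `finally` means the caller's logging configuration is put back even when the wrapped invocation fails.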

class pudl.validate.dbt.NodeContext[source]#

Bases: NamedTuple

Associate a node’s name with information describing what went wrong.

name: str[source]#
context: str[source]#
pretty_print()[source]#

Nice output for logging to stdout.

class pudl.validate.dbt.BuildResult[source]#

Bases: NamedTuple

Combine overall result with any useful failure context.

success: bool[source]#
failure_contexts: list[NodeContext][source]#
format_failure_contexts() → str[source]#

Nice legible output for logs.
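As a rough picture of how these two classes fit together, the sketch below redefines them locally with the documented fields. The method bodies, the `str` return type of `pretty_print`, and the separator string are assumptions made for illustration; the real formatting may differ.

```python
from typing import NamedTuple


class NodeContext(NamedTuple):
    name: str
    context: str

    def pretty_print(self) -> str:
        # Assumed layout: node name, blank line, then the failure context.
        return f"{self.name}:\n\n{self.context}"


class BuildResult(NamedTuple):
    success: bool
    failure_contexts: list[NodeContext]

    def format_failure_contexts(self) -> str:
        # Assumed separator between per-node reports.
        return "\n=====\n".join(
            ctx.pretty_print() for ctx in self.failure_contexts
        )


result = BuildResult(
    success=False,
    failure_contexts=[
        NodeContext("test_a", "3 rows failed"),
        NodeContext("test_b", "1 row failed"),
    ],
)
report = result.format_failure_contexts()
```

Because both classes are NamedTuples, a result can be unpacked directly, for example `success, contexts = result`.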

pudl.validate.dbt.install_dbt_deps(dbt: dbt.cli.main.dbtRunner | None = None) → dbt.cli.main.dbtRunner[source]#

Ensure dbt package dependencies are installed in the project directory.

pudl.validate.dbt.__get_failed_nodes(results: dbt.artifacts.schemas.run.RunExecutionResult) → list[dbt.contracts.graph.nodes.GenericTestNode][source]#

Get test node output from tests that failed.

pudl.validate.dbt.__get_quantile_contexts(nodes: list[dbt.contracts.graph.nodes.GenericTestNode], dbt: dbt.cli.main.dbtRunner, dbt_dir: pathlib.Path) → list[NodeContext][source]#

Run debug_quantile_constraints macro for failed quantile constraints.

This is a little tricky because the macro output is just logged to stdout, and not stored in the dbt.invoke result. So, for each node, we:

  • redirect stdout

  • run the macro based on node information

  • parse stdout to get the context

Also, if a node has multiple parents, we don’t know which table to pass into debug_quantile_constraints, so we just skip it.
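The redirect, run, and parse steps above can be sketched with `contextlib.redirect_stdout`. Everything here is a stand-in: `fake_macro` replaces the real dbt macro invocation, the `quantile=` line format is invented for the example, and nodes are represented as a plain mapping from test name to parent tables.

```python
import io
from contextlib import redirect_stdout


def fake_macro(table: str) -> None:
    """Hypothetical stand-in for a macro run that only prints its output."""
    print(f"Running macro for {table}")
    print("quantile=0.5 actual=12.3 expected_max=10.0")


def capture_context(table: str) -> str:
    """Redirect stdout, run the macro, then parse the captured text."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        fake_macro(table)
    lines = buffer.getvalue().splitlines()
    # Keep only the lines that carry the diagnostic (assumed format).
    return "\n".join(line for line in lines if line.startswith("quantile="))


def contexts_for(nodes: dict[str, list[str]]) -> dict[str, str]:
    """Map each test name to its context, skipping ambiguous nodes."""
    found = {}
    for name, parents in nodes.items():
        if len(parents) != 1:
            # With several parents there is no single table to pass in.
            continue
        found[name] = capture_context(parents[0])
    return found


contexts = contexts_for(
    {
        "test_one_parent": ["table_a"],
        "test_two_parents": ["table_a", "table_b"],
    }
)
```

Only `test_one_parent` ends up in `contexts`; the node with two parents is skipped, mirroring the behavior described above.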

pudl.validate.dbt.__get_compiled_sql_contexts(nodes: list[dbt.contracts.graph.nodes.GenericTestNode]) → list[NodeContext][source]#

Run the compiled SQL against duckdb to get failure contexts.

pudl.validate.dbt.build_with_context(node_selection: str, dbt_target: str, node_exclusion: str | None = None) → BuildResult[source]#

Run the dbt build and return failure information.

  • run the DBT build using our selection, returning test failures

  • split the test failures by type - for most, we will just run the compiled SQL, but other tests such as the weighted quantile tests need extra handling

  • get contexts for various test failure types

  • print out test failure context
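The "split the test failures by type" step can be pictured as a simple partition. The sketch below is illustrative only: `FailedTest` is a hypothetical stand-in for a dbt test node, and matching on the substring `quantile` in the test name is an assumed convention, not necessarily how the module tells the types apart.

```python
from typing import NamedTuple


class FailedTest(NamedTuple):
    """Hypothetical stand-in for a failed dbt test node."""

    name: str
    compiled_sql: str


def is_quantile_test(test: FailedTest) -> bool:
    # Assumed convention: quantile tests carry "quantile" in their name.
    return "quantile" in test.name


def split_failures(
    failures: list[FailedTest],
) -> tuple[list[FailedTest], list[FailedTest]]:
    """Partition failures into quantile tests and everything else."""
    quantile = [t for t in failures if is_quantile_test(t)]
    other = [t for t in failures if not is_quantile_test(t)]
    return quantile, other


failures = [
    FailedTest("not_null_table_a_id", "select * from table_a where id is null"),
    FailedTest("expect_quantile_table_b_value", "select 1"),
]
quantile_failures, sql_failures = split_failures(failures)
```

Each group would then go to its own handler: the compiled SQL is rerun for the general case, and the quantile tests get the extra handling mentioned above.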

pudl.validate.dbt.dagster_to_dbt_selection(selection: str, defs: dagster.Definitions, manifest=None) → str[source]#

Translate dagster asset selection to dbt node selection.

We use the dbt manifest to determine which sources are defined in dbt so that we can map them to dagster assets. So, we need to generate a fresh dbt manifest via dbt parse whenever we run this function.

  • turn asset selection into asset keys

  • turn asset keys into node names

  • turn node names into selection string