pudl.etl.asset_checks

Programmatically defined Dagster asset checks for PUDL.

We primarily use Dagster asset checks to validate the schemas of PUDL tables. The asset check factory asset_check_from_schema(), defined below, uses Pandera to build dataframe schemas programmatically from the PUDL metadata.

For data validation we rely almost entirely on dbt data tests.

Functions

_collect_asset_metadata(→ dict[str, Any])

Collect basic metadata about the asset.

_collect_dtype_metadata(→ dict[str, Any])

Collect comprehensive column and data type information for comparison.

_collect_geometry_metadata(→ dict[str, Any])

Collect GeoPandas-specific metadata.

_process_schema_errors(→ dict[str, Any])

Process Pandera schema errors into structured metadata.

asset_check_from_schema(...)

Create a Dagster asset check based on the resource schema, if defined.

Module Contents

pudl.etl.asset_checks._collect_asset_metadata(asset_value) → dict[str, Any]

Collect basic metadata about the asset.

pudl.etl.asset_checks._collect_dtype_metadata(asset_value, resource: pudl.metadata.classes.Resource) → dict[str, Any]

Collect comprehensive column and data type information for comparison.
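
One plausible shape for such a comparison is sketched below using pandas only. The function dtype_report() and the keys of its result are assumptions made for illustration. The real function takes a pudl.metadata.classes.Resource and may report different information.

```python
import pandas as pd


def dtype_report(df, expected):
    """Compare a dataframe's columns and dtypes with an expected mapping.

    `expected` maps column names to dtype strings. It is a hypothetical
    simplification of what a Resource would provide.
    """
    actual = {col: str(dtype) for col, dtype in df.dtypes.items()}
    return {
        "missing_columns": sorted(set(expected) - set(actual)),
        "extra_columns": sorted(set(actual) - set(expected)),
        "dtype_mismatches": {
            col: {"expected": expected[col], "actual": actual[col]}
            for col in expected
            if col in actual and actual[col] != expected[col]
        },
    }


df = pd.DataFrame(
    {
        "plant_id": pd.Series([1, 2], dtype="int64"),
        "capacity_mw": pd.Series([1, 2], dtype="int64"),
    }
)
report = dtype_report(
    df, {"plant_id": "int64", "capacity_mw": "float64", "state": "string"}
)
```

In this example the report flags state as missing and capacity_mw as having dtype int64 where float64 was expected.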

pudl.etl.asset_checks._collect_geometry_metadata(asset_value) → dict[str, Any]

Collect GeoPandas-specific metadata.

pudl.etl.asset_checks._process_schema_errors(schema_errors: pandera.errors.SchemaErrors) → dict[str, Any]

Process Pandera schema errors into structured metadata.

pudl.etl.asset_checks.asset_check_from_schema(asset_key: dagster.AssetKey, package: pudl.metadata.classes.Package) → dagster.AssetChecksDefinition | None

Create a Dagster asset check based on the resource schema, if defined.