pudl.etl.asset_checks
Programmatically defined Dagster asset checks for PUDL.
We primarily use Dagster asset checks to validate the schemas of PUDL tables. We use
Pandera to programmatically define dataframe schemas based on the PUDL metadata with the
asset check factory asset_check_from_schema() defined below.
For data validation, we rely almost entirely on dbt data tests.
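The factory idea described above can be illustrated with a minimal, library-free sketch. It does not use Dagster or Pandera and is not PUDL's implementation: `CheckResult`, `SCHEMAS`, and `check_from_schema` are hypothetical names that stand in for a Dagster check result, the PUDL metadata, and the factory. The sketch only shows the general shape: build a check from a declared schema, and return `None` when no schema is defined.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class CheckResult:
    """Hypothetical stand-in for a check result: pass/fail plus metadata."""

    passed: bool
    metadata: dict[str, Any] = field(default_factory=dict)


# Hypothetical stand-in for table metadata: table name -> {column: Python type}.
SCHEMAS: dict[str, dict[str, type]] = {
    "plants": {"plant_id": int, "plant_name": str},
}

Rows = list[dict[str, Any]]


def check_from_schema(table: str) -> Optional[Callable[[Rows], CheckResult]]:
    """Return a schema check for ``table``, or None if no schema is defined."""
    schema = SCHEMAS.get(table)
    if schema is None:
        return None

    def check(rows: Rows) -> CheckResult:
        # Collect every problem instead of stopping at the first one.
        errors: list[str] = []
        for i, row in enumerate(rows):
            for column, expected in schema.items():
                if column not in row:
                    errors.append(f"row {i}: missing column {column!r}")
                elif not isinstance(row[column], expected):
                    errors.append(f"row {i}: {column!r} is not {expected.__name__}")
        return CheckResult(passed=not errors, metadata={"errors": errors})

    return check
```

Calling `check_from_schema("plants")` returns a function that can be applied to a list of row dictionaries, while a table without a schema entry yields `None`, so the caller can skip it.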
Functions
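| _collect_asset_metadata(asset_value) | Collect basic metadata about the asset. |
| _collect_dtype_metadata(asset_value, resource) | Collect comprehensive column and data type information for comparison. |
| _collect_geometry_metadata(asset_value) | Collect GeoPandas-specific metadata. |
| _process_schema_errors(schema_errors) | Process Pandera schema errors into structured metadata. |
| asset_check_from_schema(asset_key, package) | Create a dagster asset check based on the resource schema, if defined. |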
Module Contents
- pudl.etl.asset_checks._collect_asset_metadata(asset_value) → dict[str, Any]
Collect basic metadata about the asset.
- pudl.etl.asset_checks._collect_dtype_metadata(asset_value, resource: pudl.metadata.classes.Resource) → dict[str, Any]
Collect comprehensive column and data type information for comparison.
- pudl.etl.asset_checks._collect_geometry_metadata(asset_value) → dict[str, Any]
Collect GeoPandas-specific metadata.
- pudl.etl.asset_checks._process_schema_errors(schema_errors: pandera.errors.SchemaErrors) → dict[str, Any]
Process Pandera schema errors into structured metadata.
- pudl.etl.asset_checks.asset_check_from_schema(asset_key: dagster.AssetKey, package: pudl.metadata.classes.Package) → dagster.AssetChecksDefinition | None
Create a dagster asset check based on the resource schema, if defined.
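As a rough illustration of the kind of column and data type comparison that _collect_dtype_metadata() is described as supporting, the sketch below compares two plain mappings of column name to dtype string. It is an assumption-laden example, not the module's code: `compare_dtypes` and the keys of the returned dictionary are hypothetical, and the real function takes an asset value and a Resource rather than dictionaries.

```python
from typing import Any


def compare_dtypes(actual: dict[str, str], expected: dict[str, str]) -> dict[str, Any]:
    """Summarize how observed column dtypes differ from the expected ones.

    Both arguments map column names to dtype names (hypothetical inputs).
    """
    # Columns the schema expects but the data lacks, and vice versa.
    missing = sorted(set(expected) - set(actual))
    extra = sorted(set(actual) - set(expected))
    # Shared columns whose dtype names disagree.
    mismatched = {
        col: {"expected": expected[col], "actual": actual[col]}
        for col in sorted(set(expected) & set(actual))
        if expected[col] != actual[col]
    }
    return {
        "missing_columns": missing,
        "extra_columns": extra,
        "mismatched_dtypes": mismatched,
    }
```

Returning a plain dictionary keeps the result easy to attach to a check result as metadata, which is the role the return type `dict[str, Any]` of the helpers above suggests.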