pudl.scripts
Command line interface scripts for PUDL.
All PUDL console scripts (CLI entry points) should be defined in this subpackage. Each script should live in its own module and expose a single Click command or group as its public entry point.
Guidelines for contributors and agents

- One module per script: name the module after the command it provides (e.g. pudl_datastore.py → pudl_datastore).
- Always name the entry point main: the Click command or group must be called main in every module. The registered console script name (e.g. pudl_datastore) comes from pyproject.toml, not from the Python function name. This keeps entry point wiring uniform and avoids name collisions between the CLI name and the Python symbol.
- Thin wrappers only: the module should contain the Click option/argument declarations and just enough glue code to call into the real implementation living elsewhere in the pudl package. Avoid putting substantive business logic here.
- Register the entry point: after adding a new module, add a corresponding line under [project.scripts] in pyproject.toml, following the pattern:

  ```toml
  my_command = "pudl.scripts.my_command:main"
  ```

- CLI-only helpers are fine here: small helpers that exist solely to support the CLI (e.g. Click callbacks, output-formatting utilities) may live in the same module as the command they serve. General-purpose helpers belong in the module that owns the underlying functionality, not here.
- Decoupling aliases: if the Click command is imported by name from other modules (e.g. tests), keep the existing name as an alias after defining main:

  ```python
  main = <click-decorated function>
  legacy_name = main  # imported by some_other_module
  ```

- Always support -h and --help: every Click command must set context_settings={"help_option_names": ["-h", "--help"]} so users can get help with either flag without waiting for heavy imports to finish.
- Use one process-exit boundary: command bodies should return integer status codes (0 for success) and should not call sys.exit() directly. Reserve sys.exit(main()) for the module launcher under if __name__ == "__main__":.
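A minimal module laid out according to these guidelines might look like the sketch below. The module name, the --name option, run_my_command (a stand-in for an implementation that would really live elsewhere in pudl), and legacy_name are all hypothetical. The launcher guard is shown commented out so the sketch can be executed inline.

```python
"""Hypothetical pudl/scripts/my_command.py laid out per the guidelines above."""

import click
from click.testing import CliRunner  # only needed for the usage example at the bottom


def run_my_command(name: str) -> None:
    """Stand-in for the real implementation, which would live elsewhere in pudl."""
    click.echo(f"Processing {name}")


@click.command(context_settings={"help_option_names": ["-h", "--help"]})
@click.option("--name", default="eia860", show_default=True, help="Dataset to process.")
def main(name: str) -> int:
    """Parse options, delegate to the implementation, and return an exit status."""
    run_my_command(name)
    return 0


legacy_name = main  # hypothetical alias for a module that imports the old name


# The real module would end with the single process-exit boundary; it is left
# commented out here so the sketch can be executed inline:
# if __name__ == "__main__":
#     sys.exit(main())

# Usage: invoke the command in-process, as a test might.
result = CliRunner().invoke(main, ["-h"])
print(result.exit_code)  # 0
```

Calling main(["--name", "ferc1"], standalone_mode=False) returns the callback's value (0 here) instead of exiting the process, which is one way to assert on the status code in tests.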
Keeping --help fast: defer heavy imports inside main()
pudl/__init__.py eagerly imports the entire pudl package tree (analysis,
extract, metadata, transform, …), which adds roughly 7-8 seconds to any
import pudl.* that appears at module level. Click processes --help and
-h before calling the decorated function body, so placing heavy imports
inside main() is the correct approach, and the code is structured this way
throughout the scripts subpackage. However, this does not yet deliver fast
help output. Importing any pudl.scripts module, which the console script
must do before Click can run, first executes pudl/__init__.py, so the full
package tree is still loaded. The improvement will only take effect once
pudl/__init__.py is thinned to avoid importing the full package tree at
startup (tracked as a separate task).
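The parent-package behaviour behind that limitation is standard Python import semantics. Importing a submodule such as pudl.scripts.pudl_datastore executes pudl/__init__.py and pudl/scripts/__init__.py first. The sketch below demonstrates this with a throwaway package written to a temporary directory. demo_pkg and its files are hypothetical stand-ins for pudl.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_leaf_and_report(root: Path) -> dict[str, bool]:
    """Build a throwaway package under root, import its leaf module, report what ran."""
    # Hypothetical layout mirroring pudl/__init__.py, pudl/scripts/__init__.py, and a script.
    scripts_dir = root / "demo_pkg" / "scripts"
    scripts_dir.mkdir(parents=True)
    (root / "demo_pkg" / "__init__.py").write_text("EAGER_INIT_RAN = True\n")
    (scripts_dir / "__init__.py").write_text("")
    (scripts_dir / "cmd.py").write_text("def main():\n    return 0\n")

    # Forget any earlier import so the package is genuinely re-executed.
    for name in [m for m in sys.modules if m == "demo_pkg" or m.startswith("demo_pkg.")]:
        del sys.modules[name]
    importlib.invalidate_caches()

    sys.path.insert(0, str(root))
    try:
        # Roughly what a console-script wrapper does: import the target module.
        importlib.import_module("demo_pkg.scripts.cmd")
    finally:
        sys.path.remove(str(root))

    return {
        "parent_init_ran": getattr(sys.modules["demo_pkg"], "EAGER_INIT_RAN", False),
        "subpackage_loaded": "demo_pkg.scripts" in sys.modules,
    }


with tempfile.TemporaryDirectory() as tmp:
    print(import_leaf_and_report(Path(tmp)))
    # {'parent_init_ran': True, 'subpackage_loaded': True}
```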
Pattern to follow:

```python
import click  # lightweight, so importing at module level is fine


@click.command(context_settings={"help_option_names": ["-h", "--help"]})
@click.option(...)
def main(...):
    # Deferred to keep --help fast; see pudl/scripts/__init__.py for rationale.
    import pudl  # noqa: PLC0415
    from pudl.some.module import SomeClass  # noqa: PLC0415
    ...
```
The # noqa: PLC0415 comment suppresses Ruff's PLC0415 (import-outside-top-level)
rule, its port of pylint's import-outside-toplevel check, for each deferred
import. Add it to every import inside a function body.
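To see why deferring works, it helps to record whether the command body runs at all. In the sketch below the command and its --year option are hypothetical, and json stands in for a heavy pudl import. click.testing.CliRunner invokes the command in-process. The -h and --help options are handled eagerly while Click parses arguments, so the body never runs for them.

```python
import click
from click.testing import CliRunner

# Records each time the command body runs (a proxy for "deferred imports executed").
BODY_CALLS: list[str] = []


@click.command(context_settings={"help_option_names": ["-h", "--help"]})
@click.option("--year", type=int, default=2020, show_default=True)
def main(year: int) -> int:
    """Hypothetical command that follows the deferred-import pattern."""
    # Stand-in for a heavy import such as `import pudl`.
    import json  # noqa: PLC0415

    BODY_CALLS.append(json.dumps({"year": year}))
    return 0


runner = CliRunner()
for help_flag in ("-h", "--help"):
    result = runner.invoke(main, [help_flag])
    assert result.exit_code == 0 and "Usage:" in result.output
print(BODY_CALLS)  # [] because Click printed help and exited before calling the body

runner.invoke(main, ["--year", "2022"])
print(BODY_CALLS)  # ['{"year": 2022}']
```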
If a module-level constant must be computed from a pudl import (e.g. for a
click.Choice list), prefer lightweight class-level introspection over
constructing full objects. For example, use sorted(SomeSettings.model_fields)
rather than SomeClass().get_known_values() to avoid triggering I/O or
heavyweight initialisation at decoration time.
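The following sketches the class-level approach. It assumes the settings class is a pydantic v2 model, where model_fields is a class attribute mapping field names to their definitions. SomeSettings, its fields, and the dataset argument are hypothetical and do not reflect PUDL's real settings classes.

```python
import click
from pydantic import BaseModel


class SomeSettings(BaseModel):
    """Hypothetical settings model; PUDL's real settings classes differ."""

    eia860: bool = False
    ferc1: bool = False
    epacems: bool = False


# model_fields is read from the class itself (pydantic v2), so no instance is
# constructed and no validation or I/O runs while the module is imported.
DATASET_CHOICES = sorted(SomeSettings.model_fields)


@click.command(context_settings={"help_option_names": ["-h", "--help"]})
@click.argument("dataset", type=click.Choice(DATASET_CHOICES))
def main(dataset: str) -> int:
    """Hypothetical command whose argument choices come from the settings fields."""
    click.echo(f"Selected {dataset}")
    return 0


print(DATASET_CHOICES)  # ['eia860', 'epacems', 'ferc1']
```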
Submodules
- pudl.scripts.auto_match_utilities
- pudl.scripts.dbt_helper
- pudl.scripts.deploy
- pudl.scripts.dghome
- pudl.scripts.generate_pudl_duckdb
- pudl.scripts.metadata_to_rst
- pudl.scripts.pudl_check_fks
- pudl.scripts.pudl_datastore
- pudl.scripts.pudl_null_cols
- pudl.scripts.pudl_service_territories
- pudl.scripts.resource_description
- pudl.scripts.update_zenodo_dois
- pudl.scripts.zenodo_data_release