pudl.extract.eia930

Extract EIA Form 930 data from CSVs.

EIA Form 930 is reported in half-year increments. Each half-year has three separate pages, which are stored as separate CSVs: “balance”, “interchange”, and “subregion.” See https://docs.catalyst.coop/pudl/en/latest/data_sources/eia930.html for more information.

We extract these CSVs into DuckDB, rename the columns according to the column map, and write the concatenated pages to Parquet.

Attributes

logger

Functions

`raw_eia930__balance`(→ pudl.helpers.ParquetData)	Raw balance page.
`raw_eia930__interchange`(→ pudl.helpers.ParquetData)	Raw interchange page.
`raw_eia930__subregion`(→ pudl.helpers.ParquetData)	Raw subregion page - only exists after 2018h2.
`extract_page`(→ pudl.helpers.ParquetData)	Pull data for a page across many half-years into a Parquet file.
`extract_half_year_page`(→ str)	Extract data from a single CSV.

Module Contents

pudl.extract.eia930.logger

pudl.extract.eia930.raw_eia930__balance(context) → pudl.helpers.ParquetData: Raw balance page.

pudl.extract.eia930.raw_eia930__interchange(context) → pudl.helpers.ParquetData: Raw interchange page.

pudl.extract.eia930.raw_eia930__subregion(context) → pudl.helpers.ParquetData: Raw subregion page - only exists after 2018h2.

pudl.extract.eia930.extract_page(datastore: pudl.workspace.datastore.Datastore, page: str, half_years: list[str]) → pudl.helpers.ParquetData

Pull data for a page across many half-years into a Parquet file.

Beyond reading each half-year, this concatenates the half-years and expands the schema to cover every column seen in any of them.

Parameters:

datastore – the Datastore we use to actually access the raw data.
page – the name of the page we’re extracting.
half_years – the list of half-year segments we’re extracting.

Returns:

ParquetData pointing to the Parquet file with the raw table data.
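
The schema expansion described above can be illustrated with a minimal plain-Python sketch. It is not the module's implementation, which does this work in DuckDB (DuckDB's `UNION ALL BY NAME` has a similar effect; whether this module uses it is not stated here). The function name `concat_with_expanded_schema` and the column names are hypothetical.

```python
def concat_with_expanded_schema(pages):
    """Concatenate lists of row dicts, filling columns a page lacks with None."""
    # Collect column names in first-seen order so the output schema is stable.
    columns = []
    for rows in pages:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Re-emit every row against the full column list; missing values become None.
    combined = [
        {name: row.get(name) for name in columns}
        for rows in pages
        for row in rows
    ]
    return columns, combined


# Hypothetical data: the later half-year reports one extra column.
early = [{"respondent": "AAA", "demand_mw": 100}]
late = [{"respondent": "AAA", "demand_mw": 110, "net_generation_mw": 95}]

columns, combined = concat_with_expanded_schema([early, late])
```

Rows from the earlier half-year keep their values and carry `None` in the column that only appears later, so every row in the result shares one schema.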

pudl.extract.eia930.extract_half_year_page(con: duckdb.DuckDBPyConnection, datastore: pudl.workspace.datastore.Datastore, half_year: str, page: str) → str

Extract data from a single CSV.

Reads the CSV with DuckDB for speed and lower memory use. To avoid loading the whole CSV into memory, we extract it directly to a temporary directory.

Parameters:

con – DuckDB connection.
datastore – the Datastore we use to actually access the input data.
half_year – the half-year we’re reading in.
page – the name of the page we’re reading.

Returns:

view_name – name of the DuckDB view that represents the read and renamed CSV.

Return type:

str