pudl.extract.eia930
Extract EIA Form 930 data from CSVs.
EIA Form 930 is reported in half-year increments. Each half-year has three separate pages, which are stored as separate CSVs: “balance”, “interchange”, and “subregion.” See https://docs.catalyst.coop/pudl/en/latest/data_sources/eia930.html for more information.
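The data is partitioned by half-year, and the subregion page is documented below as only existing after 2018h2. Any code that works with these partitions therefore has to order and filter half-year labels chronologically. The following is a minimal sketch that does this with `(year, half)` tuples instead of raw strings. It makes three assumptions. First, the `YYYYhN` label format is inferred from the "2018h2" spelling in this module's docstrings; the actual partition identifiers may be spelled differently. Second, both helper names are hypothetical. Third, treating the cutoff half-year as included is our reading, not something the docs state.

```python
import re

# Assumed label format, inferred from the "2018h2" spelling in this module's docs.
_HALF_YEAR = re.compile(r"(\d{4})h([12])")


def half_year_key(label: str) -> tuple[int, int]:
    """Convert a label such as '2018h2' into a sortable (year, half) tuple."""
    match = _HALF_YEAR.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized half-year label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def half_years_from(half_years: list[str], first: str) -> list[str]:
    """Return the labels at or after `first` (inclusive, by assumption), oldest first."""
    cutoff = half_year_key(first)
    kept = [hy for hy in half_years if half_year_key(hy) >= cutoff]
    return sorted(kept, key=half_year_key)
```

For example, `half_years_from(["2019h1", "2018h1", "2018h2"], "2018h2")` returns `["2018h2", "2019h1"]`. Comparing parsed tuples rather than strings makes the ordering independent of label spelling quirks, and malformed labels fail loudly.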
We read these CSVs into DuckDB, rename the columns according to the column map, concatenate each page across half-years, and write the result to Parquet.
Attributes
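Functions
| raw_eia930__balance(context) | Raw balance page. |
| raw_eia930__interchange(context) | Raw interchange page. |
| raw_eia930__subregion(context) | Raw subregion page; only exists after 2018h2. |
| extract_page(datastore, page, half_years) | Pull data for a page across many half-years into a Parquet file. |
| extract_half_year_page(con, datastore, half_year, page) | Extract data from a single CSV. |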
Module Contents
- pudl.extract.eia930.raw_eia930__balance(context) → pudl.helpers.ParquetData
Raw balance page.
- pudl.extract.eia930.raw_eia930__interchange(context) → pudl.helpers.ParquetData
Raw interchange page.
- pudl.extract.eia930.raw_eia930__subregion(context) → pudl.helpers.ParquetData
Raw subregion page; only exists after 2018h2.
- pudl.extract.eia930.extract_page(datastore: pudl.workspace.datastore.Datastore, page: str, half_years: list[str]) → pudl.helpers.ParquetData
Pull data for a page across many half-years into a Parquet file.
This reads each half-year, concatenates the results, and expands the schema so that it includes every column seen in any half-year.
- Parameters:
datastore – the Datastore used to access the raw data.
page – the name of the page we’re extracting.
half_years – the half-year segments to extract.
- Returns:
ParquetData pointing to the Parquet file containing the raw table data.
- pudl.extract.eia930.extract_half_year_page(con: duckdb.DuckDBPyConnection, datastore: pudl.workspace.datastore.Datastore, half_year: str, page: str) → str
Extract data from a single CSV.
Reads the CSV with DuckDB for speed and lower memory use. To avoid holding the whole CSV in memory, we extract it directly to a temporary directory.
- Parameters:
con – DuckDB connection.
datastore – the Datastore used to access the input data.
half_year – the half-year to read.
page – the name of the page we’re reading.
- Returns:
name of the DuckDB view that represents the read and renamed CSV.
- Return type:
str
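The temporary-directory approach described for `extract_half_year_page` can be sketched with the standard library. The sketch assumes that a half-year's CSVs arrive inside a ZIP archive, which the docs here do not state, and all function names are hypothetical. It stops at reading the header row, which is the point where the real function would hand the file to DuckDB and return a view name. The key idea is that `shutil.copyfileobj` streams the archive member to disk in chunks, so the full CSV never has to sit in memory.

```python
import csv
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_member_to_dir(archive: zipfile.ZipFile, member: str, out_dir: Path) -> Path:
    """Stream one archive member to out_dir without loading it fully into memory."""
    # Drop any directories in the member name and write the file directly into out_dir.
    target = out_dir / Path(member).name
    with archive.open(member) as src, target.open("wb") as dst:
        shutil.copyfileobj(src, dst)  # copies in fixed-size chunks
    return target


def read_header(csv_path: Path) -> list[str]:
    """Read just the header row; the data rows stay on disk."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def header_via_tempdir(zip_path: Path, member: str) -> list[str]:
    """Extract `member` into a throwaway directory and return its header row."""
    with zipfile.ZipFile(zip_path) as archive, tempfile.TemporaryDirectory() as tmp:
        csv_path = extract_member_to_dir(archive, member, Path(tmp))
        # A DuckDB read of csv_path would go here; the directory is deleted on exit.
        return read_header(csv_path)
```

In this sketch, the temporary directory only lives for the duration of the call. A real implementation that returns a DuckDB view over the file would need to keep the directory alive for as long as the view is queried.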