pudl.workspace.resource_cache

Implementations of datastore resource caches.

Attributes

Classes

PudlResourceKey

Uniquely identifies a specific resource.

AbstractCache

Defines the interface for the generic resource caching layer.

UPathCache

Implements file cache using UPath for unified access to multiple storage backends.

LayeredCache

Implements multi-layered system of caches.

Module Contents

pudl.workspace.resource_cache.logger[source]
class pudl.workspace.resource_cache.PudlResourceKey[source]

Bases: NamedTuple

Uniquely identifies a specific resource.

dataset: str[source]
doi: str[source]
name: str[source]
__repr__() str[source]

Returns string representation of PudlResourceKey.

get_local_path() pathlib.Path[source]

Returns (relative) path that should be used when caching this resource.
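A minimal sketch of how a key of this shape can map to a relative cache path. The class name `ResourceKey`, the directory layout, the DOI sanitization, and the example values are all assumptions made for illustration; the real `get_local_path()` may build the path differently.

```python
import re
from pathlib import Path
from typing import NamedTuple


class ResourceKey(NamedTuple):
    """Hypothetical stand-in for PudlResourceKey with the same three fields."""

    dataset: str
    doi: str
    name: str

    def get_local_path(self) -> Path:
        # Assumed layout: <dataset>/<doi with "/" replaced by "-">/<name>.
        # Replacing "/" keeps the DOI from creating an extra directory level.
        doi_dirname = re.sub("/", "-", self.doi)
        return Path(self.dataset) / doi_dirname / self.name


# Placeholder values, not a real dataset or DOI.
key = ResourceKey("example_dataset", "10.0000/placeholder.1", "data.zip")
local_path = key.get_local_path()
```

Because the key is a `NamedTuple`, it is hashable and compares by value, so it can also serve directly as a dictionary key.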

class pudl.workspace.resource_cache.AbstractCache(read_only: bool = False)[source]

Bases: abc.ABC

Defines the interface for the generic resource caching layer.

_read_only = False[source]
is_read_only() bool[source]

Returns True if the cache is read-only and should not be modified.

abstractmethod get(resource: PudlResourceKey) bytes[source]

Retrieves the content of the given resource or raises KeyError.

abstractmethod add(resource: PudlResourceKey, content: bytes) None[source]

Adds resource to the cache and sets the content.

abstractmethod delete(resource: PudlResourceKey) None[source]

Removes the resource from cache.

abstractmethod contains(resource: PudlResourceKey) bool[source]

Returns True if the resource is present in the cache.
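To illustrate the contract above, here is a sketch of an in-memory implementation. It does not import PUDL: `Key` and `CacheInterface` are hypothetical mirrors of `PudlResourceKey` and `AbstractCache`, and raising `RuntimeError` on writes to a read-only cache is an assumption borrowed from the behavior documented for `UPathCache`.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical stand-in for PudlResourceKey."""

    dataset: str
    doi: str
    name: str


class CacheInterface(ABC):
    """Hypothetical mirror of the methods documented for AbstractCache."""

    def __init__(self, read_only: bool = False):
        self._read_only = read_only

    def is_read_only(self) -> bool:
        return self._read_only

    @abstractmethod
    def get(self, resource: Key) -> bytes: ...

    @abstractmethod
    def add(self, resource: Key, content: bytes) -> None: ...

    @abstractmethod
    def delete(self, resource: Key) -> None: ...

    @abstractmethod
    def contains(self, resource: Key) -> bool: ...


class InMemoryCache(CacheInterface):
    """Illustrative dict-backed cache; not part of the documented module."""

    def __init__(self, read_only: bool = False):
        super().__init__(read_only=read_only)
        self._store: dict[Key, bytes] = {}

    def get(self, resource: Key) -> bytes:
        # A plain dict lookup already raises KeyError for missing resources.
        return self._store[resource]

    def add(self, resource: Key, content: bytes) -> None:
        if self.is_read_only():
            # Assumption: refuse writes loudly rather than skipping them.
            raise RuntimeError("cache is read-only")
        self._store[resource] = content

    def delete(self, resource: Key) -> None:
        if self.is_read_only():
            raise RuntimeError("cache is read-only")
        self._store.pop(resource, None)

    def contains(self, resource: Key) -> bool:
        return resource in self._store
```

A caller would typically check `contains()` first, or wrap `get()` in a `try`/`except KeyError` block.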

class pudl.workspace.resource_cache.UPathCache(storage_upath: upath.UPath, **kwargs: Any)[source]

Bases: AbstractCache

Implements file cache using UPath for unified access to multiple storage backends.

This cache uses universal_pathlib’s UPath to provide a unified interface for accessing data stored in S3, GCS, or local filesystems. It handles backend-specific authentication and credential management internally.

Requires UPath objects with explicit protocols:
supported_protocols: set[str][source]
_protocol[source]
_storage_options[source]
_base_path[source]
__repr__() str[source]

Returns string representation of UPathCache.

_setup_credentials() dict[str, Any][source]

Set up backend-specific credentials and storage options.

This should be the only place where backend-specific logic is required.

Returns:

Dictionary of storage options to pass to UPath

_resource_path(resource: PudlResourceKey) upath.UPath[source]

Get the UPath for a given resource.

Parameters:

resource – The resource to get the path for

Returns:

UPath object pointing to the resource location

is_anonymous() bool[source]

Returns True if the cache is using anonymous access (no credentials).

get(resource: PudlResourceKey) bytes[source]

Retrieves value associated with given resource.

Parameters:

resource – The resource to retrieve

Returns:

The content of the resource as bytes

Raises:
  • KeyError – if the resource doesn’t exist

  • Exception – for other storage backend errors

add(resource: PudlResourceKey, content: bytes)[source]

Adds (or updates) resource to the cache with given content.

Parameters:
  • resource – The resource to add

  • content – The content to store

Raises:

RuntimeError – if cache is read-only or credentials are insufficient

delete(resource: PudlResourceKey)[source]

Deletes resource from the cache.

Parameters:

resource – The resource to delete

Raises:

RuntimeError – if cache is read-only or credentials are insufficient

contains(resource: PudlResourceKey) bool[source]

Returns True if resource is present in the cache.

Parameters:

resource – The resource to check

Returns:

True if the resource exists, False otherwise
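The get/add/delete/contains behavior described above can be sketched for the local-filesystem case alone. This example deliberately uses the standard library's `pathlib.Path` in place of `upath.UPath` and omits credentials entirely; `Key`, `LocalDirCache`, and the path layout are hypothetical and are not the `UPathCache` implementation.

```python
import tempfile
from pathlib import Path
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical stand-in for PudlResourceKey."""

    dataset: str
    doi: str
    name: str

    def get_local_path(self) -> Path:
        # Assumed layout; the real relative path may differ.
        return Path(self.dataset) / self.doi.replace("/", "-") / self.name


class LocalDirCache:
    """Illustrative file cache rooted at a local directory."""

    def __init__(self, base_path: Path, read_only: bool = False):
        self._base_path = base_path
        self._read_only = read_only

    def _resource_path(self, resource: Key) -> Path:
        return self._base_path / resource.get_local_path()

    def get(self, resource: Key) -> bytes:
        try:
            return self._resource_path(resource).read_bytes()
        except FileNotFoundError:
            # Translate the filesystem error into the documented KeyError.
            raise KeyError(f"{resource} not found in cache") from None

    def add(self, resource: Key, content: bytes) -> None:
        if self._read_only:
            raise RuntimeError("cache is read-only")
        path = self._resource_path(resource)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(content)  # overwrites, so add also updates

    def delete(self, resource: Key) -> None:
        if self._read_only:
            raise RuntimeError("cache is read-only")
        self._resource_path(resource).unlink(missing_ok=True)

    def contains(self, resource: Key) -> bool:
        return self._resource_path(resource).is_file()


with tempfile.TemporaryDirectory() as tmp:
    cache = LocalDirCache(Path(tmp))
    key = Key("example_dataset", "10.0000/placeholder", "file.csv")
    cache.add(key, b"hello")
    round_trip = cache.get(key)
    cache.delete(key)
    present_after_delete = cache.contains(key)
    try:
        cache.get(key)
        missing_raises_key_error = False
    except KeyError:
        missing_raises_key_error = True
```

Keeping all path construction in one helper (`_resource_path` here) means the four public methods stay identical regardless of where the base path points.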

class pudl.workspace.resource_cache.LayeredCache(*caches: AbstractCache, **kwargs: Any)[source]

Bases: AbstractCache

Implements multi-layered system of caches.

Faster local caches can be placed in front of more remote or expensive caches, which serve as a fallback when content is missing from the closer layers.

Only the closest layer is written to (add, delete), while all remaining layers are only read from (get).

_caches: list[AbstractCache] = [][source]
add_cache_layer(cache: AbstractCache)[source]

Adds caching layer.

The new layer has lower priority than all existing layers.

num_layers()[source]

Returns number of caching layers that are in this LayeredCache.

get(resource: PudlResourceKey) bytes[source]

Returns content of a given resource.

When a resource is found in a distant cache layer, it is automatically populated into all closer (higher-priority) cache layers that are writable. This ensures optimal cache performance for subsequent accesses.
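The lookup-and-populate behavior described above can be sketched as follows. `DictLayer` and `layered_get` are hypothetical names used only for this illustration, layers are ordered from closest to most distant, and raising `KeyError` when no layer holds the resource is assumed from the `AbstractCache.get` contract.

```python
class DictLayer:
    """Hypothetical cache layer backed by a plain dict."""

    def __init__(self, read_only: bool = False):
        self.read_only = read_only
        self.store: dict[str, bytes] = {}


def layered_get(layers: list[DictLayer], key: str) -> bytes:
    """Return content from the first layer that has it, back-filling closer layers."""
    for index, layer in enumerate(layers):
        if key in layer.store:
            content = layer.store[key]
            # Copy the content into every closer layer that accepts writes.
            for closer in layers[:index]:
                if not closer.read_only:
                    closer.store[key] = content
            return content
    raise KeyError(key)


local = DictLayer()                  # closest, writable
mirror = DictLayer(read_only=True)   # middle, read-only
remote = DictLayer(read_only=True)   # most distant, read-only
remote.store["a"] = b"payload"

result = layered_get([local, mirror, remote], "a")
```

After the call, `local` holds a copy of the content while the read-only `mirror` is left untouched, so a second lookup is served by the closest layer.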

add(resource: PudlResourceKey, content)[source]

Adds (or replaces) resource into the cache with given content.

delete(resource: PudlResourceKey)[source]

Removes resource from the cache if the cache is not in read-only mode.

contains(resource: PudlResourceKey) bool[source]

Returns True if resource is present in the cache.

is_optimally_cached(resource: PudlResourceKey) bool[source]

Return True if resource is contained in the closest write-enabled layer.
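One way to read this check, sketched with hypothetical names (`DictLayer`, `is_optimally_cached` as a free function): find the first layer that accepts writes and ask only that layer. Returning `False` when no writable layer exists is an assumption, not documented behavior.

```python
class DictLayer:
    """Hypothetical cache layer backed by a plain dict."""

    def __init__(self, read_only: bool = False):
        self.read_only = read_only
        self.store: dict[str, bytes] = {}


def is_optimally_cached(layers: list[DictLayer], key: str) -> bool:
    """Check only the closest write-enabled layer for the key."""
    for layer in layers:
        if layer.read_only:
            continue  # read-only layers cannot be the write target
        return key in layer.store
    # Assumption: with no writable layer, nothing counts as optimally cached.
    return False


frozen = DictLayer(read_only=True)
local = DictLayer()
remote = DictLayer(read_only=True)
remote.store["a"] = b"payload"

before = is_optimally_cached([frozen, local, remote], "a")
local.store["a"] = b"payload"
after = is_optimally_cached([frozen, local, remote], "a")
only_read_only = is_optimally_cached([frozen, remote], "a")
```

A resource that exists only in a distant or read-only layer is still retrievable, but it is not optimally cached until the closest writable layer holds a copy.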