pudl.workspace.resource_cache¶
Implementations of datastore resource caches.
Classes¶
PudlResourceKey | Uniquely identifies a specific resource.
AbstractCache | Defines the interface for the generic resource caching layer.
UPathCache | Implements file cache using UPath for unified access to multiple storage backends.
LayeredCache | Implements multi-layered system of caches.
Module Contents¶
- class pudl.workspace.resource_cache.PudlResourceKey[source]¶
Bases: NamedTuple
Uniquely identifies a specific resource.
- get_local_path() pathlib.Path[source]¶
Returns (relative) path that should be used when caching this resource.
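As a rough illustration of how a NamedTuple key can map to a relative cache path, here is a minimal sketch. The class `ExampleResourceKey` and its fields `dataset`, `version`, and `name` are assumptions made for this example; the actual fields of PudlResourceKey are not listed on this page, and the path layout shown is only one plausible choice.

```python
from pathlib import Path
from typing import NamedTuple


class ExampleResourceKey(NamedTuple):
    """Hypothetical stand-in for PudlResourceKey; all fields are assumed."""

    dataset: str  # assumed field: name of the dataset
    version: str  # assumed field: version or archive identifier
    name: str  # assumed field: file name of the resource

    def get_local_path(self) -> Path:
        """Return a relative path under which this resource could be cached."""
        # Replace "/" so the version identifier stays a single directory name.
        return Path(self.dataset) / self.version.replace("/", "-") / self.name


key = ExampleResourceKey("example_dataset", "v1/2024", "data.zip")
local_path = key.get_local_path()
```

Because a NamedTuple is hashable and compares by value, two keys with the same fields refer to the same cached resource.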
- class pudl.workspace.resource_cache.AbstractCache(read_only: bool = False)[source]¶
Bases: abc.ABC
Defines the interface for the generic resource caching layer.
- abstractmethod get(resource: PudlResourceKey) bytes[source]¶
Retrieves the content of the given resource or raises KeyError.
- abstractmethod add(resource: PudlResourceKey, content: bytes) None[source]¶
Adds resource to the cache and sets the content.
- abstractmethod delete(resource: PudlResourceKey) None[source]¶
Removes the resource from cache.
- abstractmethod contains(resource: PudlResourceKey) bool[source]¶
Returns True if the resource is present in the cache.
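To show what a concrete implementation of this interface might look like, the sketch below defines a dictionary-backed cache. `Key`, `CacheInterface`, and `InMemoryCache` are hypothetical stand-ins written for this example so that it runs without PUDL installed; `CacheInterface` only mirrors the four abstract methods documented above. Storing the flag as a public `read_only` attribute and raising RuntimeError on writes to a read-only cache are assumptions.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical resource key used only in this sketch."""

    name: str


class CacheInterface(ABC):
    """Stand-in mirroring the four abstract methods documented above."""

    def __init__(self, read_only: bool = False):
        self.read_only = read_only  # assumed attribute name

    @abstractmethod
    def get(self, resource: Key) -> bytes: ...

    @abstractmethod
    def add(self, resource: Key, content: bytes) -> None: ...

    @abstractmethod
    def delete(self, resource: Key) -> None: ...

    @abstractmethod
    def contains(self, resource: Key) -> bool: ...


class InMemoryCache(CacheInterface):
    """Dictionary-backed cache, handy for tests."""

    def __init__(self, read_only: bool = False):
        super().__init__(read_only=read_only)
        self._store: dict[Key, bytes] = {}

    def get(self, resource: Key) -> bytes:
        # A plain dict lookup raises KeyError when the resource is missing.
        return self._store[resource]

    def add(self, resource: Key, content: bytes) -> None:
        if self.read_only:
            raise RuntimeError("cache is read-only")  # assumed behavior
        self._store[resource] = content

    def delete(self, resource: Key) -> None:
        if self.read_only:
            raise RuntimeError("cache is read-only")  # assumed behavior
        self._store.pop(resource, None)

    def contains(self, resource: Key) -> bool:
        return resource in self._store


cache = InMemoryCache()
cache.add(Key("a"), b"payload")
fetched = cache.get(Key("a"))
```

Any class that implements these four methods can be used wherever a cache layer is expected.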
- class pudl.workspace.resource_cache.UPathCache(storage_upath: upath.UPath, **kwargs: Any)[source]¶
Bases: AbstractCache
Implements file cache using UPath for unified access to multiple storage backends.
This cache uses universal_pathlib’s UPath to provide a unified interface for accessing data stored in S3, GCS, or local filesystems. It handles backend-specific authentication and credential management internally.
- Requires UPath objects with explicit protocols:
s3://bucket-name/path/prefix
gs://bucket-name/path/prefix
- _setup_credentials() dict[str, Any][source]¶
Set up backend-specific credentials and storage options.
This should be the only place where backend-specific logic is required.
- Returns:
Dictionary of storage options to pass to UPath
- _resource_path(resource: PudlResourceKey) upath.UPath[source]¶
Get the UPath for a given resource.
- Parameters:
resource – The resource to get the path for
- Returns:
UPath object pointing to the resource location
- get(resource: PudlResourceKey) bytes[source]¶
Retrieves value associated with given resource.
- add(resource: PudlResourceKey, content: bytes)[source]¶
Adds (or updates) resource to the cache with given content.
- Parameters:
resource – The resource to add
content – The content to store
- Raises:
RuntimeError – if cache is read-only or credentials are insufficient
- delete(resource: PudlResourceKey)[source]¶
Deletes resource from the cache.
- Parameters:
resource – The resource to delete
- Raises:
RuntimeError – if cache is read-only or credentials are insufficient
- contains(resource: PudlResourceKey) bool[source]¶
Returns True if resource is present in the cache.
- Parameters:
resource – The resource to check
- Returns:
True if the resource exists, False otherwise
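The path-based behavior described above can be sketched with the standard library alone. This example uses `pathlib.Path` in a temporary directory as a stand-in for a UPath, since UPath follows the pathlib interface. `Key` and `PathCache` are hypothetical names, credential setup is omitted entirely, and raising KeyError for a missing resource is an assumption taken from the AbstractCache.get description rather than from UPathCache itself.

```python
import tempfile
from pathlib import Path
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical resource key used only in this sketch."""

    dataset: str
    name: str

    def get_local_path(self) -> Path:
        return Path(self.dataset) / self.name


class PathCache:
    """File cache rooted at a path; a simplified stand-in for UPathCache."""

    def __init__(self, root: Path, read_only: bool = False):
        self._root = root
        self._read_only = read_only

    def _resource_path(self, resource: Key) -> Path:
        # Resolve the key's relative path against the storage root.
        return self._root / resource.get_local_path()

    def get(self, resource: Key) -> bytes:
        path = self._resource_path(resource)
        if not path.exists():
            raise KeyError(resource)  # assumed, per the abstract interface
        return path.read_bytes()

    def add(self, resource: Key, content: bytes) -> None:
        if self._read_only:
            raise RuntimeError("cache is read-only")
        path = self._resource_path(resource)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(content)

    def delete(self, resource: Key) -> None:
        if self._read_only:
            raise RuntimeError("cache is read-only")
        self._resource_path(resource).unlink(missing_ok=True)

    def contains(self, resource: Key) -> bool:
        return self._resource_path(resource).exists()


with tempfile.TemporaryDirectory() as tmp:
    cache = PathCache(Path(tmp))
    key = Key("example_dataset", "data.csv")
    cache.add(key, b"a,b\n1,2\n")
    roundtrip = cache.get(key)
    present_before = cache.contains(key)
    cache.delete(key)
    present_after = cache.contains(key)
```

Keeping all path resolution in `_resource_path` means the read and write methods do not need to know which storage backend sits behind the root path.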
- class pudl.workspace.resource_cache.LayeredCache(*caches: AbstractCache, **kwargs: Any)[source]¶
Bases: AbstractCache
Implements multi-layered system of caches.
Layering lets faster local caches fall back to more remote or expensive caches when content is missing locally.
Only the closest layer is written to (add, delete); all remaining layers are only read from (get).
- _caches: list[AbstractCache] = [][source]¶
- add_cache_layer(cache: AbstractCache)[source]¶
Adds a caching layer.
The new layer has lower priority than all existing layers.
- get(resource: PudlResourceKey) bytes[source]¶
Returns content of a given resource.
When a resource is found in a distant cache layer, it is automatically populated into all closer (higher-priority) cache layers that are writable. This ensures optimal cache performance for subsequent accesses.
- add(resource: PudlResourceKey, content)[source]¶
Adds (or replaces) resource into the cache with given content.
- delete(resource: PudlResourceKey)[source]¶
Removes the resource from the cache unless the cache is in read_only mode.
- contains(resource: PudlResourceKey) bool[source]¶
Returns True if resource is present in the cache.
- is_optimally_cached(resource: PudlResourceKey) bool[source]¶
Return True if resource is contained in the closest write-enabled layer.
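The lookup and back-fill behavior described for get can be sketched as follows. `DictCache` and `LayeredSketch` are hypothetical classes written for this example, plain strings stand in for resource keys, and the details (such as silently skipping writes to read-only layers) are assumptions rather than the library's actual implementation.

```python
class DictCache:
    """Hypothetical dictionary-backed layer used only in this sketch."""

    def __init__(self, read_only: bool = False):
        self.read_only = read_only
        self.store: dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        return self.store[key]

    def add(self, key: str, content: bytes) -> None:
        if not self.read_only:  # assumed: read-only layers ignore writes
            self.store[key] = content

    def contains(self, key: str) -> bool:
        return key in self.store


class LayeredSketch:
    """Simplified layered cache; the first layer is the closest one."""

    def __init__(self, *caches: DictCache):
        self._caches = list(caches)

    def add_cache_layer(self, cache: DictCache) -> None:
        # New layers are appended, so they have the lowest priority.
        self._caches.append(cache)

    def get(self, key: str) -> bytes:
        for index, cache in enumerate(self._caches):
            if cache.contains(key):
                content = cache.get(key)
                # Back-fill every closer, writable layer for faster reuse.
                for closer in self._caches[:index]:
                    if not closer.read_only:
                        closer.add(key, content)
                return content
        raise KeyError(key)

    def contains(self, key: str) -> bool:
        return any(cache.contains(key) for cache in self._caches)

    def is_optimally_cached(self, key: str) -> bool:
        # Check only the closest layer that accepts writes.
        for cache in self._caches:
            if not cache.read_only:
                return cache.contains(key)
        return False


local = DictCache()
remote = DictCache(read_only=True)
remote.store["resource"] = b"payload"  # seed the distant layer directly

layered = LayeredSketch(local)
layered.add_cache_layer(remote)

optimal_before = layered.is_optimally_cached("resource")
content = layered.get("resource")
optimal_after = layered.is_optimally_cached("resource")
```

In this sketch the first get pays the cost of the distant layer once; later reads are served by the local layer.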