Welcome to caboodle’s documentation!

class caboodle.artifacts.Artifact(key: str, content: object, deserialize=False)

Bases: object

Represents an artifact that can be passed between steps in a distributed workflow. In general, an artifact can be any object, but additional metadata is attached to standardize the serialization / deserialization process. An artifact has a key, which is its name, and content, which is the actual data to be stored. The key is used to refer to the artifact in the storage system.


alias of builtins.object

deserialize(path: Union[str, Type[io.BufferedIOBase]])

Loads the artifact from a given file path or binary buffer.

class caboodle.artifacts.AvroArtifact(key: str, content: object, deserialize=False)

Bases: caboodle.artifacts.Artifact

Serializes an Avro object to file.


alias of builtins.object

class caboodle.artifacts.BinaryArtifact(key: str, content: object, deserialize=False)

Bases: caboodle.artifacts.Artifact

Serializes binary content directly to a file.


alias of builtins.bytes

deserialize(path_or_buffer: Union[str, Type[io.BufferedIOBase]])

Loads the artifact from a given file path or binary buffer.

class caboodle.artifacts.PickleArtifact(key: str, content: object, deserialize=False)

Bases: caboodle.artifacts.Artifact

Represents a pickled object as an artifact.


alias of builtins.object

deserialize(path_or_buffer: Union[str, Type[io.BufferedIOBase]])

Loads the artifact from a given file path or binary buffer.
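The round trip that a pickled artifact goes through can be sketched with the standard library alone. This assumes PickleArtifact relies on Python's standard pickle module, which the class name suggests but this page does not state; the variable names are illustrative and no caboodle code is used.

```python
import io
import pickle

# Illustrative content; any picklable Python object would do.
content = {"model": "baseline", "weights": [0.1, 0.2, 0.3]}

# Serialize the object into an in-memory binary buffer.
buffer = io.BytesIO()
pickle.dump(content, buffer)

# Rewind to the start of the buffer before reading, then load the object back.
buffer.seek(0)
restored = pickle.load(buffer)
```

Unpickling can execute arbitrary code, so pickled artifacts should only be loaded from trusted storage.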

class caboodle.artifacts.get_buffer(path_or_buffer: Union[str, Type[io.BufferedIOBase]], direction='read')

Bases: object

Given an object of type PathOrBuffer, returns a BytesIO buffer, either by opening the file or by returning the original argument if it is already a BytesIO. The direction can be set to ‘read’ or ‘write’ to specify how the file should be opened.
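The path-or-buffer pattern described above can be sketched with the standard library. The helper name open_path_or_buffer is hypothetical and this is not caboodle's implementation; it only illustrates the behavior the description implies: a string is opened as a binary file in the requested direction, and an existing buffer is passed through untouched.

```python
import io
import os
import tempfile
from typing import Union

PathOrBuffer = Union[str, io.BufferedIOBase]


def open_path_or_buffer(path_or_buffer: PathOrBuffer, direction: str = "read"):
    """Hypothetical stand-in for the behavior described for get_buffer."""
    if direction not in ("read", "write"):
        raise ValueError("direction must be 'read' or 'write'")
    if isinstance(path_or_buffer, str):
        # A string is treated as a file path and opened in binary mode.
        return open(path_or_buffer, "rb" if direction == "read" else "wb")
    # Anything else is assumed to already be a binary buffer.
    return path_or_buffer


# An in-memory buffer is returned unchanged.
memory_buffer = io.BytesIO(b"payload")
same_buffer = open_path_or_buffer(memory_buffer)

# A path is opened for writing, then reopened for reading.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "artifact.bin")
    with open_path_or_buffer(file_path, direction="write") as handle:
        handle.write(b"payload")
    with open_path_or_buffer(file_path) as handle:
        round_tripped = handle.read()
```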

class caboodle.coffer.Coffer

Bases: object

Represents multiple artifacts stored in a single location (GCS bucket, etc.) as the output of, or input to, a pipeline step on Argo / Kubeflow.

download(local_path: str)

Downloads the Artifacts in the coffer to a local path.

save_artifacts(path: str, artifacts: List[caboodle.artifacts.Artifact]) → str

Serializes a list of artifacts to local disk under a folder at the given path. Returns the name of the randomly seeded subfolder containing the saved artifacts.
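A minimal sketch of the save step, assuming the artifacts have already been turned into bytes. The helper save_serialized and the use of uuid for the subfolder name are assumptions made for illustration; how caboodle actually names the subfolder is not documented here.

```python
import os
import tempfile
import uuid
from typing import Dict


def save_serialized(path: str, serialized: Dict[str, bytes]) -> str:
    """Hypothetical helper: write each payload under a randomly named subfolder."""
    # Random name chosen for this sketch; the real naming scheme is not documented.
    subfolder = uuid.uuid4().hex
    target_dir = os.path.join(path, subfolder)
    os.makedirs(target_dir)
    for key, payload in serialized.items():
        with open(os.path.join(target_dir, key), "wb") as handle:
            handle.write(payload)
    return subfolder


with tempfile.TemporaryDirectory() as base_dir:
    subfolder_name = save_serialized(base_dir, {"a.bin": b"\x00\x01", "b.bin": b"\x02"})
    saved_files = sorted(os.listdir(os.path.join(base_dir, subfolder_name)))
```

Returning only the subfolder name, rather than the full path, mirrors the return value described above.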

serialize_artifacts(artifacts: List[caboodle.artifacts.Artifact]) → Dict[str, bytes]

Serializes a list of artifacts and returns a dictionary mapping their keys to their binary representations.
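The key-to-bytes mapping described above can be sketched as follows. SketchArtifact and serialize_all are hypothetical stand-ins, and pickle is assumed as the serialization format purely for illustration; each real artifact type presumably uses its own format.

```python
import pickle
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SketchArtifact:
    """Hypothetical stand-in for an artifact: a key plus arbitrary content."""
    key: str
    content: object


def serialize_all(artifacts: List[SketchArtifact]) -> Dict[str, bytes]:
    # Map each artifact key to the bytes of its content (pickle is assumed here).
    return {artifact.key: pickle.dumps(artifact.content) for artifact in artifacts}


serialized = serialize_all([
    SketchArtifact("labels", ["cat", "dog"]),
    SketchArtifact("threshold", 0.5),
])
```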

upload(artifacts: List[caboodle.artifacts.Artifact])

Uploads the artifacts provided to the coffer.

class caboodle.coffer.DebugCoffer

Bases: caboodle.coffer.Coffer

This coffer saves artifacts in memory and is useful for testing.

download() → List[Type[caboodle.artifacts.Artifact]]

Downloads the artifacts in the coffer and returns them as a list.

upload(artifacts: List[Type[caboodle.artifacts.Artifact]])

Uploads the artifacts provided to the coffer.
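The idea of an in-memory coffer can be sketched without any storage backend. InMemoryStore is a hypothetical stand-in, not the DebugCoffer implementation, and artifacts are modelled as plain (key, content) pairs to keep the example self-contained.

```python
from typing import Dict, List, Tuple

# Each artifact is modelled as a (key, content) pair for this sketch.
ArtifactPair = Tuple[str, object]


class InMemoryStore:
    """Hypothetical stand-in for a coffer that keeps artifacts in memory."""

    def __init__(self) -> None:
        self._items: Dict[str, object] = {}

    def upload(self, artifacts: List[ArtifactPair]) -> None:
        # Later uploads with the same key overwrite earlier ones (an assumption).
        for key, content in artifacts:
            self._items[key] = content

    def download(self) -> List[ArtifactPair]:
        # Return the stored pairs in insertion order.
        return list(self._items.items())


store = InMemoryStore()
store.upload([("metrics", {"accuracy": 0.9}), ("labels", ["cat", "dog"])])
downloaded = store.download()
```

A store like this lets a pipeline step be exercised in a unit test without credentials or network access.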

class caboodle.coffer.GCSCoffer(gcs_path, storage_client=None)

Bases: caboodle.coffer.Coffer

Represents multiple artifacts stored in a folder in a GCS bucket.

download() → List[Type[caboodle.artifacts.Artifact]]

Downloads the artifacts in the coffer and returns them as a list.

get(artifact_name) → Type[caboodle.artifacts.Artifact]

Returns an artifact by name.

upload(artifacts: List[Type[caboodle.artifacts.Artifact]])

Uploads the artifacts provided to the coffer.


Returns the artifact type to use for a given filename.

caboodle.gcs.check_for_files(gcs_path: str, artifact_names: list, storage_client=None)

Checks to see if the specified file names are present in the gcs directory.

caboodle.gcs.download_file_to_memory(bucket_name: str, file_name: str, buffer_type: str = None, storage_client=None)

Downloads a file hosted in a bucket into a buffer.

caboodle.gcs.download_folder_to_path(bucket_name: str, folder: str, path: str, suffix: str = None, storage_client=None)

Downloads a folder hosted in a bucket to the chosen path.


Instantiates a storage client by reading the environment variable GOOGLE_APPLICATION_CREDENTIALS.

caboodle.gcs.parse_gcs_path(gcs_path: str) → Tuple[str, str]

Parses a gcs path string of the form gs://{bucket-name}/{path} into bucket and path components.
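The split described above can be sketched in a few lines. split_gcs_path is a hypothetical equivalent written for illustration; the error raised for a missing gs:// prefix and the empty path returned for a bare bucket are assumptions, not documented behavior of parse_gcs_path.

```python
from typing import Tuple


def split_gcs_path(gcs_path: str) -> Tuple[str, str]:
    """Hypothetical equivalent of the parsing described above."""
    prefix = "gs://"
    if not gcs_path.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix!r}: {gcs_path!r}")
    # Everything up to the first slash is the bucket; the rest is the path.
    bucket, _, path = gcs_path[len(prefix):].partition("/")
    return bucket, path


bucket_name, object_path = split_gcs_path("gs://example-bucket/outputs/step-1")
```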

caboodle.gcs.upload_all(path: str, bucket_name: str, folder_name: str, verbose: bool = True, replace: bool = True, use_filepaths: bool = True, storage_client=None)

This uploads all files under the given path. If path is a directory, this function will traverse it; if path points to a file, only that file will be uploaded. This uses the Google Cloud storage client referred to by the environment variable GOOGLE_APPLICATION_CREDENTIALS.

Args:

path: Path to upload from. When uploading, the directory names will be stripped except for the last one.

bucket_name: Name of bucket to use.

folder_name: Name of folder to upload under.

verbose (default True): Whether or not to print info about upload.

replace (default True): If False, then all files that already exist in the bucket will not be uploaded.

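The traversal and naming rule described above can be sketched locally, without touching GCS. plan_object_names is a hypothetical helper, and the resulting names reflect one reading of "the directory names will be stripped except for the last one": everything above the last directory of path is dropped, and the remainder is placed under folder_name. The actual object names produced by upload_all may differ.

```python
import os
import tempfile
from typing import List


def plan_object_names(path: str, folder_name: str) -> List[str]:
    """Hypothetical helper: list the object names an upload of `path` might produce."""
    path = os.path.normpath(path)
    if os.path.isfile(path):
        # A single file is placed directly under the target folder.
        return [f"{folder_name}/{os.path.basename(path)}"]
    # Everything above the last directory component is dropped.
    parent = os.path.dirname(path)
    names = []
    for root, _dirs, files in os.walk(path):
        for file_name in files:
            relative = os.path.relpath(os.path.join(root, file_name), parent)
            names.append(f"{folder_name}/{relative.replace(os.sep, '/')}")
    return sorted(names)


with tempfile.TemporaryDirectory() as tmp_dir:
    outputs = os.path.join(tmp_dir, "outputs")
    os.makedirs(os.path.join(outputs, "plots"))
    for relative_file in ("metrics.json", os.path.join("plots", "loss.png")):
        with open(os.path.join(outputs, relative_file), "wb") as handle:
            handle.write(b"")
    planned = plan_object_names(outputs, "run-42")
    single = plan_object_names(os.path.join(outputs, "metrics.json"), "run-42")
```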
caboodle.gcs.upload_string(string: str, bucket_name: str, path: str, verbose: bool = True, replace: bool = True, storage_client=None)

Uploads the contents of string to a GCS bucket at the given path.
