Welcome to caboodle’s documentation!

class caboodle.artifacts.Artifact(key: str, content: object, deserialize=False)

Bases: object

Represents an artifact that can be passed between steps in a distributed workflow. In general, an artifact can be any object, but we add additional metadata to standardize the serialization / deserialization process. An artifact has a key, which is its name, and content, which is the actual data to be stored. The key is used to refer to the artifact in the storage system.

artifact_type

alias of builtins.object

deserialize(path: Union[str, Type[io.BufferedIOBase]])

Loads the artifact from a given file path or buffer.

class caboodle.artifacts.AvroArtifact(key: str, content: object, deserialize=False)

Bases: caboodle.artifacts.Artifact

Serializes an Avro object to file.

artifact_type

alias of builtins.object

class caboodle.artifacts.BinaryArtifact(key: str, content: object, deserialize=False)

Bases: caboodle.artifacts.Artifact

Serializes binary data directly to a file.

artifact_type

alias of builtins.bytes

deserialize(path_or_buffer: Union[str, Type[io.BufferedIOBase]])

Loads the artifact from a given file path or buffer.

class caboodle.artifacts.PickleArtifact(key: str, content: object, deserialize=False)

Bases: caboodle.artifacts.Artifact

Represents a pickled object as an artifact.

artifact_type

alias of builtins.object

deserialize(path_or_buffer: Union[str, Type[io.BufferedIOBase]])

Loads the artifact from a given file path or buffer.

class caboodle.artifacts.get_buffer(path_or_buffer: Union[str, Type[io.BufferedIOBase]], direction='read')

Bases: object

Given an object of type PathOrBuffer, returns a BytesIO buffer, either by opening the file or by returning the original argument if it is already a BytesIO. The direction can be set to ‘read’ or ‘write’ to specify how the file should be opened.
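The path-or-buffer pattern described above can be sketched with the standard library alone. This is a minimal illustration, not caboodle's implementation: the name `open_buffer` is hypothetical, and leaving a caller-supplied buffer open on exit is an assumption.

```python
import io
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, Union

PathOrBuffer = Union[str, io.BufferedIOBase]


@contextmanager
def open_buffer(
    path_or_buffer: PathOrBuffer, direction: str = "read"
) -> Iterator[io.BufferedIOBase]:
    """Yield a binary buffer for a file path or an existing buffer (hypothetical helper)."""
    if direction not in ("read", "write"):
        raise ValueError("direction must be 'read' or 'write'")
    if isinstance(path_or_buffer, str):
        # A string is treated as a file path and opened in binary mode.
        mode = "rb" if direction == "read" else "wb"
        with open(path_or_buffer, mode) as handle:
            yield handle
    else:
        # Assumption: an existing buffer is handed back unchanged and is
        # left open so the caller keeps ownership of it.
        yield path_or_buffer


# Round trip through a real file.
with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "artifact.bin")
    with open_buffer(file_path, "write") as out:
        out.write(b"payload")
    with open_buffer(file_path, "read") as src:
        from_path = src.read()

# The same call works with an in-memory buffer.
memory = io.BytesIO(b"payload")
with open_buffer(memory) as src:
    from_memory = src.read()
```

Accepting both forms lets the same serialization code run against local files and against in-memory data downloaded from a bucket.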

class caboodle.coffer.Coffer

Bases: object

Represents multiple artifacts stored in a single location (GCS bucket, etc.) as the output of or input to a pipeline step on Argo / Kubeflow.

download(local_path: str)

Downloads the Artifacts in the coffer to a local path.

save_artifacts(path: str, artifacts: List[caboodle.artifacts.Artifact]) → str

Serializes a list of artifacts to local disk in a folder at the given path. Returns the name of the randomly seeded subfolder containing the saved artifacts.
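As a rough illustration of saving into a randomly named subfolder, the sketch below writes already-serialized payloads to disk. The function `save_to_random_subfolder` is hypothetical, and using `uuid4` for the subfolder name is an assumption; the source does not say how caboodle generates the name.

```python
import os
import tempfile
import uuid
from typing import Dict


def save_to_random_subfolder(path: str, payloads: Dict[str, bytes]) -> str:
    """Write each payload under path/<random subfolder>/<key>; return the subfolder name."""
    subfolder = uuid.uuid4().hex  # assumption: any unique name would do
    target = os.path.join(path, subfolder)
    os.makedirs(target)
    for key, data in payloads.items():
        with open(os.path.join(target, key), "wb") as handle:
            handle.write(data)
    return subfolder


with tempfile.TemporaryDirectory() as root:
    name = save_to_random_subfolder(root, {"model.pkl": b"\x80\x04", "notes.txt": b"hi"})
    saved = sorted(os.listdir(os.path.join(root, name)))
```

Returning only the subfolder name, rather than the full path, mirrors the return value described above.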

serialize_artifacts(artifacts: List[caboodle.artifacts.Artifact]) → Dict[str, bytes]

Serializes a list of artifacts and returns a dictionary mapping their keys to their binary representations.
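The key-to-bytes mapping can be pictured with a small stand-in. In this sketch `SketchArtifact` and `serialize_all` are hypothetical names, and pickle is assumed as the serialization format for every artifact; in caboodle each artifact class presumably controls its own format.

```python
import pickle
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SketchArtifact:
    """Hypothetical stand-in holding only a key and its content."""

    key: str
    content: Any


def serialize_all(artifacts: List[SketchArtifact]) -> Dict[str, bytes]:
    """Map each artifact key to the pickled bytes of its content."""
    return {artifact.key: pickle.dumps(artifact.content) for artifact in artifacts}


serialized = serialize_all(
    [SketchArtifact("weights", [0.1, 0.2]), SketchArtifact("label", "cat")]
)
```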

upload(artifacts: List[caboodle.artifacts.Artifact])

Uploads the artifacts provided to the coffer.

class caboodle.coffer.DebugCoffer

Bases: caboodle.coffer.Coffer

This coffer saves artifacts in memory and is useful for testing.

download() → List[Type[caboodle.artifacts.Artifact]]

Downloads the Artifacts in the coffer to a local path.

upload(artifacts: List[Type[caboodle.artifacts.Artifact]])

Uploads the artifacts provided to the coffer.
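To show why an in-memory coffer is convenient in tests, here is a minimal sketch of the idea. `InMemoryCoffer` and `StoredItem` are hypothetical names, and overwriting an existing key on upload is an assumption rather than documented DebugCoffer behavior.

```python
from typing import Any, Dict, List, NamedTuple


class StoredItem(NamedTuple):
    """Hypothetical stand-in for an artifact: a key plus its content."""

    key: str
    content: Any


class InMemoryCoffer:
    """Keeps uploaded items in a dict instead of a remote bucket."""

    def __init__(self) -> None:
        self._store: Dict[str, StoredItem] = {}

    def upload(self, artifacts: List[StoredItem]) -> None:
        for artifact in artifacts:
            # Assumption: a later upload with the same key replaces the earlier one.
            self._store[artifact.key] = artifact

    def download(self) -> List[StoredItem]:
        # Items come back in the order their keys were first stored.
        return list(self._store.values())


coffer = InMemoryCoffer()
coffer.upload([StoredItem("config", {"lr": 0.01}), StoredItem("vocab", ["a", "b"])])
downloaded = coffer.download()
```

A test can upload from one step and download in the next without credentials or network access.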

class caboodle.coffer.GCSCoffer(gcs_path, storage_client=None)

Bases: caboodle.coffer.Coffer

Represents multiple artifacts stored in a folder in a GCS bucket.

download() → List[Type[caboodle.artifacts.Artifact]]

Downloads the Artifacts in the coffer to a local path.

get(artifact_name) → Type[caboodle.artifacts.Artifact]

Returns an artifact by name.

upload(artifacts: List[Type[caboodle.artifacts.Artifact]])

Uploads the artifacts provided to the coffer.

caboodle.coffer.infer_type(name)

Returns the artifact type to use for a given filename.
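One plausible way to pick a type from a filename is a suffix lookup. The sketch below is an assumption throughout: the suffix table, the fallback to a binary type, and the name `guess_kind` are all hypothetical, since the source does not state which filenames map to which artifact class.

```python
import os

# Hypothetical suffix table; caboodle's actual rules are not documented here.
SUFFIX_TO_KIND = {".pkl": "pickle", ".pickle": "pickle", ".avro": "avro"}


def guess_kind(name: str) -> str:
    """Return a label for the artifact kind, falling back to 'binary'."""
    _, suffix = os.path.splitext(name)
    return SUFFIX_TO_KIND.get(suffix.lower(), "binary")
```

For example, `guess_kind("events.avro")` yields `"avro"`, and a name without a known suffix falls back to `"binary"`.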

caboodle.gcs.check_for_files(gcs_path: str, artifact_names: list, storage_client=None)

Checks whether the specified file names are present in the GCS directory.

caboodle.gcs.download_file_to_memory(bucket_name: str, file_name: str, buffer_type: str = None, storage_client=None)

Downloads a file hosted in a bucket into a buffer.

caboodle.gcs.download_folder_to_path(bucket_name: str, folder: str, path: str, suffix: str = None, storage_client=None)

Downloads a folder hosted in a bucket to the chosen path.

caboodle.gcs.get_storage_client()

Instantiates a storage client by reading the environment variable GOOGLE_APPLICATION_CREDENTIALS.

caboodle.gcs.parse_gcs_path(gcs_path: str) → Tuple[str, str]

Parses a GCS path string of the form gs://{bucket-name}/{path} into bucket and path components.
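The split described above can be sketched as follows. `split_gcs_path` is a hypothetical name, and raising `ValueError` for a malformed path is an assumption; the source does not say how caboodle handles invalid input.

```python
from typing import Tuple


def split_gcs_path(gcs_path: str) -> Tuple[str, str]:
    """Split 'gs://{bucket-name}/{path}' into (bucket, path)."""
    prefix = "gs://"
    if not gcs_path.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix!r}: {gcs_path!r}")
    # Everything up to the first '/' after the prefix is the bucket name.
    bucket, _, path = gcs_path[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {gcs_path!r}")
    return bucket, path


bucket, path = split_gcs_path("gs://my-bucket/runs/2021/output")
```

Here `bucket` is `"my-bucket"` and `path` is `"runs/2021/output"`.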

caboodle.gcs.upload_all(path: str, bucket_name: str, folder_name: str, verbose: bool = True, replace: bool = True, use_filepaths: bool = True, storage_client=None)

This uploads all files under the given path. If path is a directory, this function will traverse it; if path points to a file, only that file will be uploaded. This uses the Google Cloud Storage client referred to by the environment variable GOOGLE_APPLICATION_CREDENTIALS.

Args:

    path: Path to upload from. When uploading, the directory names will be stripped except for the last one.
    bucket_name: Name of bucket to use.
    folder_name: Name of folder to upload under.
    verbose (default True): Whether or not to print info about upload.
    replace (default True): If False, then all files that already exist in the bucket will not be uploaded.
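The traversal and naming rules can be pictured without touching GCS by computing only the destination names. This sketch is an interpretation, not caboodle's code: `plan_uploads` and its `existing` parameter are hypothetical, and the layout `folder_name/<last directory>/<relative path>` is an assumption drawn from the description of `path`.

```python
import os
import tempfile
from typing import AbstractSet, List, Tuple


def plan_uploads(
    path: str,
    folder_name: str,
    replace: bool = True,
    existing: AbstractSet[str] = frozenset(),
) -> List[Tuple[str, str]]:
    """Return (local file, destination name) pairs, sorted by destination name."""
    path = os.path.normpath(path)
    if os.path.isfile(path):
        # A single file goes directly under the folder.
        candidates = [(path, f"{folder_name}/{os.path.basename(path)}")]
    else:
        # Assumption: only the last directory name of `path` is kept.
        last_dir = os.path.basename(path)
        candidates = []
        for root, _dirs, files in os.walk(path):
            for file_name in files:
                local = os.path.join(root, file_name)
                relative = os.path.relpath(local, path).replace(os.sep, "/")
                candidates.append((local, f"{folder_name}/{last_dir}/{relative}"))
    if not replace:
        # With replace=False, names already present in the bucket are skipped.
        candidates = [(loc, name) for loc, name in candidates if name not in existing]
    return sorted(candidates, key=lambda pair: pair[1])


with tempfile.TemporaryDirectory() as tmp:
    run_dir = os.path.join(tmp, "run1")
    os.makedirs(os.path.join(run_dir, "logs"))
    for rel in ("model.pkl", os.path.join("logs", "train.txt")):
        with open(os.path.join(run_dir, rel), "wb") as handle:
            handle.write(b"x")
    all_names = [name for _loc, name in plan_uploads(run_dir, "experiments")]
    kept_names = [
        name
        for _loc, name in plan_uploads(
            run_dir, "experiments", replace=False,
            existing={"experiments/run1/model.pkl"},
        )
    ]
```

Separating the planning step from the actual upload makes the naming and the effect of `replace` easy to check locally.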

caboodle.gcs.upload_string(string: str, bucket_name: str, path: str, verbose: bool = True, replace: bool = True, storage_client=None)

Uploads the contents of string to a GCS bucket at the given path.
