Welcome to caboodle’s documentation!
-
class caboodle.artifacts.Artifact(key: str, content: object, deserialize=False)
Bases: object
Represents an artifact which can be passed between steps in a distributed workflow. In general, an artifact can be any object, but we add additional metadata to standardize the serialization / deserialization process. An artifact has a key, which is its name, and content, which is the actual data to be stored. The key is used to refer to the artifact in the storage system.
-
artifact_type
alias of builtins.object
-
deserialize(path: Union[str, Type[io.BufferedIOBase]])
Loads the artifact from a given file path.
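The key / content pairing described for Artifact can be illustrated with a small stand-in class. This is only a sketch: `SketchArtifact`, its `serialize` method, and `from_bytes` are hypothetical names, and the use of pickle as the wire format is an assumption, not caboodle's confirmed behavior.

```python
import pickle


class SketchArtifact:
    """Hypothetical stand-in for an artifact: a key (its name) plus content."""

    def __init__(self, key: str, content: object):
        self.key = key          # name used to refer to the artifact in storage
        self.content = content  # the actual data to be stored

    def serialize(self) -> bytes:
        # Assumption: pickle is used here purely for illustration.
        return pickle.dumps(self.content)

    @classmethod
    def from_bytes(cls, key: str, data: bytes) -> "SketchArtifact":
        # Rebuild an artifact from its key and serialized content.
        return cls(key, pickle.loads(data))


original = SketchArtifact("metrics", {"accuracy": 0.9})
restored = SketchArtifact.from_bytes(original.key, original.serialize())
```

The round trip keeps the key separate from the serialized bytes, which is what lets a storage system look an artifact up by name without deserializing it first.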
-
-
class caboodle.artifacts.AvroArtifact(key: str, content: object, deserialize=False)
Bases: caboodle.artifacts.Artifact
Serializes an Avro object to file.
-
artifact_type
alias of builtins.object
-
-
class caboodle.artifacts.BinaryArtifact(key: str, content: object, deserialize=False)
Bases: caboodle.artifacts.Artifact
Serializes binary data directly to file.
-
artifact_type
alias of builtins.bytes
-
deserialize(path_or_buffer: Union[str, Type[io.BufferedIOBase]])
Loads the artifact from a given file path.
-
-
class caboodle.artifacts.PickleArtifact(key: str, content: object, deserialize=False)
Bases: caboodle.artifacts.Artifact
Represents a pickled object as an artifact.
-
artifact_type
alias of builtins.object
-
deserialize(path_or_buffer: Union[str, Type[io.BufferedIOBase]])
Loads the artifact from a given file path.
-
-
class caboodle.artifacts.get_buffer(path_or_buffer: Union[str, Type[io.BufferedIOBase]], direction='read')
Bases: object
Given an object of type PathOrBuffer, returns a BytesIO buffer either by opening the file or returning the original argument if it is already a BytesIO. Additionally, the direction can be set to ‘read’ or ‘write’ to specify how the file should be opened.
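The path-or-buffer pattern described for get_buffer can be sketched as a context manager. `open_buffer` is a hypothetical name, and the sketch assumes that a string path is opened in binary mode while any other argument is passed through untouched; caboodle's actual implementation may differ.

```python
import io
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def open_buffer(path_or_buffer, direction="read"):
    """Hypothetical helper: yield a binary file object for a path or a buffer."""
    if isinstance(path_or_buffer, str):
        # A string is treated as a file path and opened in binary mode.
        mode = "rb" if direction == "read" else "wb"
        with open(path_or_buffer, mode) as handle:
            yield handle
    else:
        # Anything else is assumed to already be a buffer; the caller owns it.
        yield path_or_buffer


# Usage with a file path: write, then read back.
with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "example.bin")
    with open_buffer(file_path, direction="write") as out:
        out.write(b"hello")
    with open_buffer(file_path) as src:
        from_file = src.read()

# Usage with an in-memory buffer: it is passed through unchanged.
memory = io.BytesIO(b"world")
with open_buffer(memory) as src:
    from_memory = src.read()
```

Accepting either form lets the same deserialization code run against local files and against bytes already downloaded into memory.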
-
class caboodle.coffer.Coffer
Bases: object
Represents multiple artifacts stored in a single location (GCS bucket, etc.) as the output of, or input to, a pipeline step on Argo / Kubeflow.
-
download(local_path: str)
Downloads the artifacts in the coffer to a local path.
-
save_artifacts(path: str, artifacts: List[caboodle.artifacts.Artifact]) → str
Serializes a list of artifacts to local disk in a folder at the given path. Returns the name of the randomly seeded subfolder containing the saved artifacts.
-
serialize_artifacts(artifacts: List[caboodle.artifacts.Artifact]) → Dict[str, bytes]
Serializes a list of artifacts and returns a dictionary mapping their keys to their binary representations.
-
upload(artifacts: List[caboodle.artifacts.Artifact])
Uploads the artifacts provided to the coffer.
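The two serialization steps described for Coffer (mapping keys to bytes, then writing them under a randomly named subfolder) can be sketched as follows. `serialize_pairs` and `save_pairs` are hypothetical names operating on plain (key, content) tuples; pickle as the format and a UUID as the subfolder name are both assumptions made for illustration.

```python
import os
import pickle
import tempfile
import uuid
from typing import Dict, List, Tuple

Pair = Tuple[str, object]


def serialize_pairs(pairs: List[Pair]) -> Dict[str, bytes]:
    """Map each key to the binary representation of its content."""
    # Assumption: pickle stands in for whatever format each artifact type uses.
    return {key: pickle.dumps(content) for key, content in pairs}


def save_pairs(path: str, pairs: List[Pair]) -> str:
    """Write each serialized item under a fresh subfolder and return its name."""
    subfolder = uuid.uuid4().hex  # assumption: a UUID is random enough here
    target = os.path.join(path, subfolder)
    os.makedirs(target)
    for key, data in serialize_pairs(pairs).items():
        with open(os.path.join(target, key), "wb") as handle:
            handle.write(data)
    return subfolder


pairs = [("model.pkl", {"weights": [1, 2, 3]}), ("labels.pkl", ["a", "b"])]
with tempfile.TemporaryDirectory() as tmp:
    name = save_pairs(tmp, pairs)
    saved_files = sorted(os.listdir(os.path.join(tmp, name)))
```

Writing into a fresh subfolder on every call keeps repeated saves to the same path from overwriting one another.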
-
-
class caboodle.coffer.DebugCoffer
Bases: caboodle.coffer.Coffer
This coffer saves artifacts in memory and is useful for testing.
-
download() → List[Type[caboodle.artifacts.Artifact]]
Downloads the artifacts in the coffer to a local path.
-
upload(artifacts: List[Type[caboodle.artifacts.Artifact]])
Uploads the artifacts provided to the coffer.
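The idea of an in-memory coffer for tests can be sketched with a dictionary-backed store. `InMemoryStore` is a hypothetical class that works on plain (key, content) tuples rather than caboodle artifacts, so it shows the pattern only and not DebugCoffer's actual internals.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, object]


class InMemoryStore:
    """Hypothetical test double: keeps uploaded items in a dict, not in GCS."""

    def __init__(self) -> None:
        self._items: Dict[str, object] = {}

    def upload(self, items: List[Pair]) -> None:
        # Later uploads with the same key replace earlier ones.
        for key, content in items:
            self._items[key] = content

    def download(self) -> List[Pair]:
        # Items come back in the order their keys were first uploaded.
        return list(self._items.items())


store = InMemoryStore()
store.upload([("config", {"lr": 0.01}), ("rows", [1, 2, 3])])
downloaded = store.download()
```

Because nothing touches the network or the filesystem, a pipeline step can be exercised in a unit test by swapping a store like this in for the real one.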
-
-
class caboodle.coffer.GCSCoffer(gcs_path, storage_client=None)
Bases: caboodle.coffer.Coffer
Represents multiple artifacts stored in a folder in a GCS bucket.
-
download() → List[Type[caboodle.artifacts.Artifact]]
Downloads the artifacts in the coffer to a local path.
-
get(artifact_name) → Type[caboodle.artifacts.Artifact]
Returns an artifact by name.
-
upload(artifacts: List[Type[caboodle.artifacts.Artifact]])
Uploads the artifacts provided to the coffer.
-
-
caboodle.coffer.infer_type(name)
Returns the artifact type to use for a given filename.
-
caboodle.gcs.check_for_files(gcs_path: str, artifact_names: list, storage_client=None)
Checks whether the specified file names are present in the GCS directory.
-
caboodle.gcs.download_file_to_memory(bucket_name: str, file_name: str, buffer_type: str = None, storage_client=None)
Downloads a file hosted in a bucket into a buffer.
-
caboodle.gcs.download_folder_to_path(bucket_name: str, folder: str, path: str, suffix: str = None, storage_client=None)
Downloads a folder hosted in a bucket to the chosen path.
-
caboodle.gcs.get_storage_client()
Instantiates a storage client by reading the environment variable GOOGLE_APPLICATION_CREDENTIALS.
-
caboodle.gcs.parse_gcs_path(gcs_path: str) → Tuple[str, str]
Parses a GCS path string of the form gs://{bucket-name}/{path} into bucket and path components.
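Splitting a gs://{bucket-name}/{path} string into its two components can be sketched in a few lines. `split_gcs_path` is a hypothetical re-implementation written for illustration; in particular, raising ValueError on malformed input is an assumption, since the documentation does not say how parse_gcs_path handles it.

```python
from typing import Tuple


def split_gcs_path(gcs_path: str) -> Tuple[str, str]:
    """Hypothetical parser: split 'gs://bucket/path' into (bucket, path)."""
    prefix = "gs://"
    if not gcs_path.startswith(prefix):
        # Assumption: malformed input is rejected with ValueError.
        raise ValueError(f"expected a path starting with {prefix!r}: {gcs_path!r}")
    # Everything up to the first '/' is the bucket; the rest is the object path.
    bucket, _, path = gcs_path[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {gcs_path!r}")
    return bucket, path


result = split_gcs_path("gs://my-bucket/runs/2020/model.pkl")
bucket_only = split_gcs_path("gs://my-bucket")
```

Here `result` is `("my-bucket", "runs/2020/model.pkl")`, and a path with no object component yields an empty string as the second element.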
-
caboodle.gcs.upload_all(path: str, bucket_name: str, folder_name: str, verbose: bool = True, replace: bool = True, use_filepaths: bool = True, storage_client=None)
Uploads all files under the given path. If path is a directory, this function will traverse it; if path points to a file, only that file will be uploaded. This uses the Google Cloud storage client referred to by the environment variable GOOGLE_APPLICATION_CREDENTIALS.
Args:
    path: Path to upload from. When uploading, the directory names will be stripped except for the last one.
    bucket_name: Name of bucket to use.
    folder_name: Name of folder to upload under.
    verbose (default True): Whether or not to print info about upload.
    replace (default True): If False, files that already exist in the bucket will not be uploaded.
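One possible reading of "the directory names will be stripped except for the last one" is that destination names keep only the final directory of the local path, placed under folder_name. The sketch below computes destination names under that assumption without contacting GCS; `plan_uploads` is a hypothetical helper, and the real function's naming (including the effect of use_filepaths, which is not documented here) may differ.

```python
import os
import posixpath
import tempfile
from typing import Dict


def plan_uploads(path: str, folder_name: str) -> Dict[str, str]:
    """Hypothetical planner: map local files to assumed destination names."""
    path = os.path.abspath(path)
    if os.path.isfile(path):
        # A single file is placed directly under the destination folder.
        return {path: posixpath.join(folder_name, os.path.basename(path))}
    # Keep only the last directory of `path` by making names relative to its parent.
    parent = os.path.dirname(path)
    plan = {}
    for root, _dirs, files in os.walk(path):
        for file_name in files:
            local = os.path.join(root, file_name)
            relative = os.path.relpath(local, parent)
            # Object names use forward slashes regardless of the local OS.
            plan[local] = posixpath.join(folder_name, *relative.split(os.sep))
    return plan


with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "data")
    os.makedirs(os.path.join(data_dir, "nested"))
    for rel in ("a.txt", os.path.join("nested", "b.txt")):
        with open(os.path.join(data_dir, rel), "w") as handle:
            handle.write("x")
    destinations = sorted(plan_uploads(data_dir, "uploads").values())
```

Under this reading, the temporary directory prefix disappears from the destination names and only `data/...` survives beneath `uploads`.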
-
caboodle.gcs.upload_string(string: str, bucket_name: str, path: str, verbose: bool = True, replace: bool = True, storage_client=None)
Uploads the contents of string to a GCS bucket at the given path.