Welcome to caboodle’s documentation!
-
class
caboodle.artifacts.
Artifact
(key: str, content: object, deserialize=False)¶ Bases:
object
Represents an artifact that can be passed between steps in a distributed workflow. In general, an artifact can be any object, but we add metadata to standardize the serialization / deserialization process. An artifact has a key, which is its name, and content, which is the actual data to be stored. The key is used to refer to the artifact in the storage system.
-
artifact_type
¶ alias of
builtins.object
-
deserialize
(path: Union[str, Type[io.BufferedIOBase]])¶ Loads the artifact from a given file path.
-
-
class
caboodle.artifacts.
AvroArtifact
(key: str, content: object, deserialize=False)¶ Bases:
caboodle.artifacts.Artifact
Serializes an Avro object to file.
-
artifact_type
¶ alias of
builtins.object
-
-
class
caboodle.artifacts.
BinaryArtifact
(key: str, content: object, deserialize=False)¶ Bases:
caboodle.artifacts.Artifact
Serializes binary directly to file.
-
artifact_type
¶ alias of
builtins.bytes
-
deserialize
(path_or_buffer: Union[str, Type[io.BufferedIOBase]])¶ Loads the artifact from a given file path or buffer.
-
-
class
caboodle.artifacts.
PickleArtifact
(key: str, content: object, deserialize=False)¶ Bases:
caboodle.artifacts.Artifact
Represents a pickled object as an artifact.
-
artifact_type
¶ alias of
builtins.object
-
deserialize
(path_or_buffer: Union[str, Type[io.BufferedIOBase]])¶ Loads the artifact from a given file path or buffer.
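A pickled artifact round trip can be illustrated with the standard library alone. The sketch below is not caboodle's implementation: `PickledItem` and its `serialize` method are hypothetical stand-ins, and only the idea of pairing a key with pickled content comes from the description above.

```python
import io
import pickle


class PickledItem:
    """Hypothetical stand-in for a pickled artifact: a key plus arbitrary content."""

    def __init__(self, key: str, content: object) -> None:
        self.key = key
        self.content = content

    def serialize(self, buffer: io.BufferedIOBase) -> None:
        # Write the content (not the key) to the buffer as a pickle.
        pickle.dump(self.content, buffer)

    @classmethod
    def deserialize(cls, key: str, buffer: io.BufferedIOBase) -> "PickledItem":
        # Rebuild the item from a buffer positioned at the start of the pickle.
        return cls(key, pickle.load(buffer))


buffer = io.BytesIO()
PickledItem("scores", {"a": 1}).serialize(buffer)
buffer.seek(0)  # rewind before reading back
restored = PickledItem.deserialize("scores", buffer)
```

Pickle should only be used to load data from trusted sources, since unpickling can execute arbitrary code.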
-
-
class
caboodle.artifacts.
get_buffer
(path_or_buffer: Union[str, Type[io.BufferedIOBase]], direction='read')¶ Bases:
object
Given an object of type PathOrBuffer, returns a BytesIO buffer either by opening the file or returning the original argument if it is already a BytesIO. Additionally, the direction can be set to ‘read’ or ‘write’ to specify how the file should be opened.
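The path-or-buffer behaviour described here can be sketched as a small context manager. This is a minimal illustration under assumptions, not the library's code: the name `open_path_or_buffer` is hypothetical, and the choice to leave caller-supplied buffers open is a design assumption.

```python
import io
import os
import tempfile
from contextlib import contextmanager
from typing import Union

PathOrBuffer = Union[str, io.BufferedIOBase]


@contextmanager
def open_path_or_buffer(path_or_buffer: PathOrBuffer, direction: str = "read"):
    """Hypothetical helper: yield a binary buffer for a path or an existing buffer."""
    if direction not in ("read", "write"):
        raise ValueError("direction must be 'read' or 'write'")
    if isinstance(path_or_buffer, str):
        mode = "rb" if direction == "read" else "wb"
        # A path is opened here and closed when the with-block exits.
        with open(path_or_buffer, mode) as handle:
            yield handle
    else:
        # Already a buffer: return it unchanged and leave closing to the caller.
        yield path_or_buffer


# Usage with a file path.
with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "example.bin")
    with open_path_or_buffer(file_path, "write") as out:
        out.write(b"hello")
    with open_path_or_buffer(file_path) as src:
        from_file = src.read()

# Usage with an in-memory buffer.
memory = io.BytesIO(b"abc")
with open_path_or_buffer(memory) as src:
    from_memory = src.read()
```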
-
class
caboodle.coffer.
Coffer
¶ Bases:
object
Represents multiple artifacts stored in a single location (a GCS bucket, etc.) as the output of, or the input to, a pipeline step on Argo / Kubeflow.
-
download
(local_path: str)¶ Downloads the Artifacts in the coffer to a local path.
-
save_artifacts
(path: str, artifacts: List[caboodle.artifacts.Artifact]) → str¶ Serializes a list of artifacts to local disk in a subfolder of the given path. Returns the name of the randomly seeded subfolder containing the saved artifacts.
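One way to picture the "random subfolder" behaviour is the sketch below. It is an assumption-laden illustration: `save_blobs` is a hypothetical name, it takes already-serialized bytes rather than Artifact objects, and it uses a UUID for the subfolder name, which may differ from how caboodle generates it.

```python
import os
import tempfile
import uuid
from typing import Dict


def save_blobs(path: str, blobs: Dict[str, bytes]) -> str:
    """Hypothetical helper: write each blob under path/<random subfolder>/<key>."""
    subfolder = uuid.uuid4().hex  # assumed naming scheme, 32 hex characters
    target = os.path.join(path, subfolder)
    os.makedirs(target)
    for key, data in blobs.items():
        # Keys are assumed to be plain file names without path separators.
        with open(os.path.join(target, key), "wb") as handle:
            handle.write(data)
    return subfolder


with tempfile.TemporaryDirectory() as root:
    name = save_blobs(root, {"a.bin": b"\x00\x01"})
    with open(os.path.join(root, name, "a.bin"), "rb") as handle:
        saved = handle.read()
```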
-
serialize_artifacts
(artifacts: List[caboodle.artifacts.Artifact]) → Dict[str, bytes]¶ Serializes a list of artifacts and returns a dictionary mapping their keys to their binary representations.
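The key-to-bytes mapping can be sketched as follows. `SketchArtifact` and `serialize_all` are hypothetical names, and pickle is used purely for illustration; in caboodle each artifact type presumably supplies its own serialization.

```python
import pickle
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SketchArtifact:
    """Hypothetical stand-in for an artifact: a key plus its content."""
    key: str
    content: object


def serialize_all(artifacts: List[SketchArtifact]) -> Dict[str, bytes]:
    # Map each artifact's key to a binary representation of its content.
    return {artifact.key: pickle.dumps(artifact.content) for artifact in artifacts}


blobs = serialize_all([
    SketchArtifact("params", {"lr": 0.1}),
    SketchArtifact("labels", [0, 1, 1]),
])
```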
-
upload
(artifacts: List[caboodle.artifacts.Artifact])¶ Uploads the artifacts provided to the coffer.
-
-
class
caboodle.coffer.
DebugCoffer
¶ Bases:
caboodle.coffer.Coffer
This coffer saves artifacts in memory and is useful for testing.
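The idea of an in-memory coffer for tests can be sketched without any storage backend. `InMemoryCoffer` below is a hypothetical stand-in, not DebugCoffer itself, and it represents artifacts as plain `(key, content)` tuples to stay self-contained.

```python
from typing import Dict, List, Tuple

KeyedContent = Tuple[str, object]


class InMemoryCoffer:
    """Hypothetical test double: keeps uploaded items in a dictionary."""

    def __init__(self) -> None:
        self._store: Dict[str, object] = {}

    def upload(self, artifacts: List[KeyedContent]) -> None:
        # Later uploads with the same key overwrite earlier ones.
        for key, content in artifacts:
            self._store[key] = content

    def download(self) -> List[KeyedContent]:
        # Return everything uploaded so far, in insertion order.
        return list(self._store.items())


coffer = InMemoryCoffer()
coffer.upload([("params", {"lr": 0.1}), ("labels", [0, 1])])
coffer.upload([("labels", [1, 1])])
items = coffer.download()
```

A double like this lets a pipeline step be exercised in a unit test without network access or credentials.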
-
download
() → List[Type[caboodle.artifacts.Artifact]]¶ Downloads the Artifacts in the coffer to a local path.
-
upload
(artifacts: List[Type[caboodle.artifacts.Artifact]])¶ Uploads the artifacts provided to the coffer.
-
-
class
caboodle.coffer.
GCSCoffer
(gcs_path, storage_client=None)¶ Bases:
caboodle.coffer.Coffer
Represents multiple artifacts stored in a folder in a GCS bucket.
-
download
() → List[Type[caboodle.artifacts.Artifact]]¶ Downloads the Artifacts in the coffer to a local path.
-
get
(artifact_name) → Type[caboodle.artifacts.Artifact]¶ Returns an artifact by name.
-
upload
(artifacts: List[Type[caboodle.artifacts.Artifact]])¶ Uploads the artifacts provided to the coffer.
-
-
caboodle.coffer.
infer_type
(name)¶ Returns the artifact type to use for a given filename.
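Filename-based type inference is typically a lookup on the file extension. The sketch below is only an illustration: the extension table and the fallback to a binary type are assumptions, since the mapping caboodle actually uses is not documented here, and `guess_kind` is a hypothetical name.

```python
import os

# Assumed extension table, for illustration only.
EXTENSION_TO_KIND = {
    ".pkl": "pickle",
    ".pickle": "pickle",
    ".avro": "avro",
}


def guess_kind(name: str) -> str:
    """Hypothetical helper: pick an artifact kind from a filename's extension."""
    _, extension = os.path.splitext(name)
    # Unknown or missing extensions fall back to raw binary (an assumption).
    return EXTENSION_TO_KIND.get(extension.lower(), "binary")
```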
-
caboodle.gcs.
check_for_files
(gcs_path: str, artifact_names: list, storage_client=None)¶ Checks whether the specified file names are present in the GCS directory.
-
caboodle.gcs.
download_file_to_memory
(bucket_name: str, file_name: str, buffer_type: str = None, storage_client=None)¶ Downloads a file hosted in a bucket into a buffer.
-
caboodle.gcs.
download_folder_to_path
(bucket_name: str, folder: str, path: str, suffix: str = None, storage_client=None)¶ Downloads a folder hosted in a bucket to the chosen path.
-
caboodle.gcs.
get_storage_client
()¶ Instantiates a storage client by reading the environment variable GOOGLE_APPLICATION_CREDENTIALS.
-
caboodle.gcs.
parse_gcs_path
(gcs_path: str) → Tuple[str, str]¶ Parses a GCS path string of the form gs://{bucket-name}/{path} into bucket and path components.
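Splitting a `gs://{bucket-name}/{path}` string needs only string operations. The sketch below shows one way to do it; `split_gcs_path` is a hypothetical name, and its error handling for malformed input is an assumption rather than documented behaviour.

```python
from typing import Tuple


def split_gcs_path(gcs_path: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'gs://bucket/some/path' into (bucket, path)."""
    prefix = "gs://"
    if not gcs_path.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix!r}: {gcs_path!r}")
    # Everything up to the first slash is the bucket; the rest is the object path.
    bucket, _, path = gcs_path[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {gcs_path!r}")
    return bucket, path


bucket, path = split_gcs_path("gs://my-bucket/runs/2021/model.pkl")
```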
-
caboodle.gcs.
upload_all
(path: str, bucket_name: str, folder_name: str, verbose: bool = True, replace: bool = True, use_filepaths: bool = True, storage_client=None)¶ This uploads all files under the given path. If path is a directory, this function will traverse it; if path points to a file, only that file will be uploaded. This uses the Google Cloud storage client referred to by the environment variable GOOGLE_APPLICATION_CREDENTIALS.
Args:
path: Path to upload from. When uploading, the directory names will be stripped except for the last one.
bucket_name: Name of bucket to use.
folder_name: Name of folder to upload under.
verbose (default True): Whether or not to print info about upload.
replace (default True): If False, then all files that already exist in the bucket will not be uploaded.
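The rule that directory names are stripped except for the last one can be read as: destination names are computed relative to the parent of `path`. The sketch below illustrates that reading only. `plan_upload_names` is a hypothetical helper that performs no upload, and the exact naming scheme used by `upload_all` is an assumption.

```python
import os
import tempfile
from typing import Dict


def plan_upload_names(path: str, folder_name: str) -> Dict[str, str]:
    """Hypothetical helper: map local files under path to destination names."""
    path = os.path.abspath(path)
    if os.path.isfile(path):
        # A single file keeps only its base name.
        return {path: f"{folder_name}/{os.path.basename(path)}"}
    parent = os.path.dirname(path)
    plan: Dict[str, str] = {}
    for root, _dirs, files in os.walk(path):
        for name in files:
            local = os.path.join(root, name)
            # Keep the last directory of path, drop everything above it.
            relative = os.path.relpath(local, parent).replace(os.sep, "/")
            plan[local] = f"{folder_name}/{relative}"
    return plan


with tempfile.TemporaryDirectory() as base:
    data_dir = os.path.join(base, "deep", "nested", "data")
    os.makedirs(os.path.join(data_dir, "sub"))
    for relative_name in ("b.txt", os.path.join("sub", "a.txt")):
        with open(os.path.join(data_dir, relative_name), "w") as handle:
            handle.write("x")
    destinations = sorted(plan_upload_names(data_dir, "out").values())
```

Here the leading `deep/nested` directories are dropped, while `data` and anything below it are kept under the target folder.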
-
caboodle.gcs.
upload_string
(string: str, bucket_name: str, path: str, verbose: bool = True, replace: bool = True, storage_client=None)¶ Uploads the contents of string to a GCS bucket at the given path.