Artifacts

Use W&B Artifacts to track and version any serialized data as the inputs and outputs of your W&B Runs. For example, a model training run might take a dataset as input and produce a trained model as output. In addition to logging hyperparameters and metadata to a run, you can use an artifact to log the dataset used to train the model as an input and the resulting model checkpoints as outputs. This way you can always answer the question “what version of my dataset was this model trained on?”

In summary, with W&B Artifacts, you can:

The diagram above demonstrates how you can use artifacts throughout your entire ML workflow, as inputs and outputs of runs.
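As a rough illustration of the input and output relationship described above, the following Python sketch marks a dataset artifact as a run's input and logs a model file as the run's output. The function name `train_with_lineage`, the artifact names `my_data` and `my_model`, and the file name `model.pt` are assumptions made for this example. `run` is expected to be the object returned by `wandb.init()`, and the dataset artifact is assumed to already exist in the project.

```python
from pathlib import Path


def train_with_lineage(run, dataset_ref="my_data:latest", model_file="model.pt"):
    """Record a dataset artifact as the run's input and a model file as its output.

    `run` is assumed to be the object returned by wandb.init().
    The artifact and file names are hypothetical placeholders.
    """
    # Declaring the artifact as used marks it as an input of this run.
    dataset = run.use_artifact(dataset_ref)
    data_dir = Path(dataset.download())  # local copy of the artifact's files

    # Placeholder for real training: write a stand-in checkpoint file.
    model_path = Path(model_file)
    model_path.write_bytes(b"placeholder weights")

    # Logging a file path creates a new artifact version as an output of this run.
    run.log_artifact(str(model_path), name="my_model", type="model")
    return data_dir, model_path


# Usage, assuming wandb is installed and you are logged in:
#   import wandb
#   run = wandb.init(project="artifacts-example", job_type="train")
#   train_with_lineage(run)
#   run.finish()
```

Because the run both uses the dataset artifact and logs the model artifact, the model version can later be traced back to the dataset version it was trained on.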

How it works

Create an artifact in four steps, one line of code each:

  1. Create a W&B run.
  2. Create an artifact object with the wandb.Artifact API.
  3. Add one or more files, such as a model file or dataset, to your artifact object.
  4. Log your artifact to W&B.

import wandb

run = wandb.init(project="artifacts-example", job_type="add-dataset")
artifact = wandb.Artifact(name="my_data", type="dataset")
artifact.add_file(local_path="./dataset.h5")  # Add the dataset file to the artifact
run.log_artifact(artifact)  # Logs the artifact version "my_data:v0"
Tip: The preceding code snippet, and the Colab linked on this page, show how to track files by uploading them to W&B. See the track external files page for information on how to add references to files or directories that are stored in external object storage (for example, in an Amazon S3 bucket).
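For comparison with uploading, tracking by reference might look roughly like the sketch below. It relies on the `add_reference` method of `wandb.Artifact`. The helper name `log_reference_artifact` and the bucket URI in the usage comment are hypothetical, and access to the bucket (credentials, and typically an S3 client library) is assumed to be configured separately.

```python
def log_reference_artifact(run, artifact, uri):
    """Add a reference to externally stored files, then log the artifact.

    `run` is assumed to come from wandb.init() and `artifact` from
    wandb.Artifact(...). The helper itself is illustrative, not part of wandb.
    """
    # add_reference records where the files live rather than uploading their contents.
    artifact.add_reference(uri)
    run.log_artifact(artifact)
    return artifact


# Usage, assuming wandb is installed and the (hypothetical) bucket is reachable:
#   import wandb
#   run = wandb.init(project="artifacts-example", job_type="add-dataset")
#   artifact = wandb.Artifact(name="my_data", type="dataset")
#   log_reference_artifact(run, artifact, "s3://my-bucket/datasets/my_data")
#   run.finish()
```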

How to get started

Depending on your use case, explore the following resources to get started with W&B Artifacts:
