Serverless SFT - Weights & Biases Documentation

Use Serverless SFT to fine-tune LLMs with supervised learning on curated datasets. Serverless SFT is in public preview. W&B provisions the training infrastructure (on CoreWeave) for you and gives you full flexibility to set up your environment. You get instant access to a managed training cluster that auto-scales to handle your training workloads. Serverless SFT is ideal for tasks such as:

Distillation: Transferring knowledge from a larger, more capable model into a smaller, faster one.
Teaching output style and format: Training a model to follow specific response formats, tone, or structure.
Warmup before RL: Pre-training a model with supervised examples before applying reinforcement learning for further refinement.

Serverless SFT trains low-rank adapters (LoRAs) to specialize a model for your specific task. W&B automatically stores the LoRAs you train as artifacts in your account. You can also save them locally or to a third party for backup. Serverless Inference also automatically hosts models that you train through Serverless SFT. To begin training a model with Serverless SFT, see the ART Serverless SFT docs.

Why Serverless SFT

Supervised fine-tuning (SFT) is a training technique where a model learns from curated input-output examples. Serverless SFT on W&B provides the following advantages:

Lower training costs: Serverless SFT multiplexes shared infrastructure across many users, skips the setup process for each job, and scales your GPU costs down to zero when you aren’t actively training. This reduces training costs.
Faster training time: Serverless SFT immediately provisions training infrastructure when you need it. This speeds up your training jobs and lets you iterate faster.
Automatic deployment: Serverless SFT automatically deploys every checkpoint you train, so you don’t need to manually set up hosting infrastructure. You can access and test trained models immediately in local, staging, or production environments.

How Serverless SFT uses W&B services

Serverless SFT uses a combination of the following W&B components to operate:

Inference: To run your models.
Models: To track performance metrics during the LoRA adapter’s training.
Artifacts: To store and version the LoRA adapters.
Weave (Optional): To gain observability into how the model responds at each step of the training loop.

Serverless SFT is in public preview. During the preview, W&B charges you only for inference usage and artifact storage. W&B doesn’t charge for adapter training during the preview period.

Use your trained models

How to use Serverless SFT

⌘I

Documentation Index

​Why Serverless SFT

​How Serverless SFT uses W&B services

Why Serverless SFT

How Serverless SFT uses W&B services