W&B Skills are reusable instruction sets that teach coding agents how to use W&B effectively. Instead of manually guiding your agent through W&B APIs and best practices, install Skills so that the agent can work with experiment tracking, tracing, evaluations, and monitoring on its own. Skills work with several major coding agents, including:
- Claude Code
- Codex
- Cursor
- GitHub Copilot
- Gemini CLI
W&B Skills capabilities
Skills covers both the W&B Models SDK (training runs, metrics, artifacts, sweeps) and the Weave SDK (traces, evaluations, scorers). It includes helper libraries, reference docs, and data analysis patterns so your agent can handle the following workflows.
| Workflow | Capabilities |
|---|---|
| Model training | |
| Agent building | |
Prerequisites
Skills requires the following:
- Node.js (for the npx command).
- A W&B API key. Create one at wandb.ai/authorize and then set it as an environment variable, replacing [YOUR-API-KEY] with your API key.
- Optional: Set your W&B project name as a WANDB_PROJECT environment variable. This lets your agent target the correct W&B project without you specifying it each time.
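As a quick sanity check before installing, the prerequisites above can be verified with a short script. The following is a minimal Python sketch, not part of W&B Skills: the function name check_prerequisites is hypothetical, and it assumes the API key is stored in WANDB_API_KEY, the variable name the W&B SDK reads.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return (problems, notes) for the prerequisites listed above.

    problems: blocking issues (missing npx, missing API key).
    notes: non-blocking hints (the optional project variable).
    """
    env = os.environ if env is None else env
    problems, notes = [], []

    # npx ships with Node.js, so a missing npx usually means Node.js is absent.
    if which("npx") is None:
        problems.append("npx not found on PATH; install Node.js")

    # Assumption: the API key lives in WANDB_API_KEY.
    if not env.get("WANDB_API_KEY"):
        problems.append("WANDB_API_KEY is not set")

    # WANDB_PROJECT is optional, so it only produces a note.
    if not env.get("WANDB_PROJECT"):
        notes.append("WANDB_PROJECT is not set (optional)")

    return problems, notes
```

Calling check_prerequisites() with no arguments inspects the current environment. The env and which parameters exist only so that the check can be exercised without touching the real environment.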
Install W&B Skills
Choose a global installation to make Skills available to all your projects, or a project-specific installation to scope Skills to a single project.
- To install W&B Skills globally for all your projects, use the --global flag.
- To install W&B Skills for a single project, omit the --global flag.
- To install W&B Skills for a specific agent, use the --agent flag.
For details on the --agent and --skill options, see the skills CLI documentation.
After installation completes, your agent has access to W&B Skills and is ready to handle W&B-related tasks.
Use W&B Skills
Once installed, you can ask the agent to perform W&B-related tasks for your project. The following example prompts demonstrate some of the tasks your agent can do with W&B Skills:
- “Log training metrics for my PyTorch model to W&B.”
- “Analyze the loss curves for my last 10 runs and identify the best performing configuration.”
- “Trace my LangChain agent and log the results to Weave.”
- “Run an evaluation on my agent using the test dataset and summarize the results.”
- “Find the failure modes in my last evaluation and classify them.”
- “Compare the configs of run A and run B and show me the differences.”
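To make the last prompt concrete, the comparison an agent performs amounts to a dictionary diff over the two run configs. The sketch below is illustrative only: diff_configs and the sample configs are hypothetical, and a real agent would fetch the configs from W&B rather than hard-code them.

```python
MISSING = "<missing>"  # placeholder shown when a key is absent from one config


def diff_configs(config_a, config_b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    diff = {}
    for key in sorted(set(config_a) | set(config_b)):
        value_a = config_a.get(key, MISSING)
        value_b = config_b.get(key, MISSING)
        if value_a != value_b:
            diff[key] = (value_a, value_b)
    return diff


# Hypothetical configs standing in for "run A" and "run B".
run_a = {"lr": 0.001, "batch_size": 32, "optimizer": "adam"}
run_b = {"lr": 0.0003, "batch_size": 32, "optimizer": "adam", "warmup_steps": 500}

for key, (a, b) in diff_configs(run_a, run_b).items():
    print(f"{key}: {a} -> {b}")
```

Keys that match in both configs are omitted, so the output lists only the settings that changed between the two runs.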
Usage tips
Skills performs better with specific queries than with broad, open-ended questions. The following table compares recommended prompts with prompts that are too vague.
| Recommended | Not recommended |
|---|---|
| “What is the final validation loss for my last 5 runs?” | “How is my model doing?” |
| “Summarize the token usage across my last 10 traces.” | “Show me all my traces.” |
| “Compare the configs of run A and run B.” | “What are my best runs?” |
| “What eval had the highest F1 score?” | “How are my evaluations going?” |