This page describes the pricing, usage limits, and account restrictions that apply to Serverless Inference. Use this information to plan your usage and avoid unexpected charges or interruptions. Review it before you send production traffic, especially if you manage billing or operate at higher concurrency.
If you have questions about pricing, limits, or your account that this page doesn’t answer, contact Support to discuss your requirements.

Pricing

For detailed model pricing information, visit Serverless Inference pricing.

Purchase more credits

Serverless Inference credits come with Free, Pro, and Academic plans for a limited time. Enterprise availability may vary. When credits run out:
  • Free accounts must activate pay-as-you-go inference on the Billing tab or upgrade to a paid plan to continue using Serverless Inference.
  • W&B bills Pro plan users for overages monthly, based on model-specific pricing.
  • Enterprise accounts should contact their account executive.

Account tiers and default usage caps

Each account tier has a default spending cap to help manage costs and prevent unexpected charges. W&B requires prepayment for paid Inference access. The following table shows the default cap for each tier and how to change it.
| Account tier | Default cap | How to change limit |
| --- | --- | --- |
| Free | $100/month | Upgrade to Pro or Enterprise |
| Pro | $6,000/month | Contact your account executive or Support for manual review |
| Enterprise | $700,000/year | Contact your account executive or Support for manual review |
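
As a rough planning aid, the default caps above can be encoded in a small lookup to check how much headroom remains in the current period. This is only a sketch: the `DEFAULT_CAPS` mapping and the `remaining_budget` helper are hypothetical names, not part of any W&B API, and the spend figure has to come from your own billing records. Note that the Enterprise cap is annual while the Free and Pro caps are monthly.

```python
from decimal import Decimal

# Default caps copied from the table above, in US dollars.
# The mapping and helper below are illustrative, not a W&B API.
DEFAULT_CAPS = {
    "free": (Decimal("100"), "month"),
    "pro": (Decimal("6000"), "month"),
    "enterprise": (Decimal("700000"), "year"),
}


def remaining_budget(tier, spent_this_period):
    """Return (dollars left under the default cap, cap period) for a tier.

    `spent_this_period` is your own record of spend in the current
    month (Free, Pro) or year (Enterprise). The result never goes below zero.
    """
    cap, period = DEFAULT_CAPS[tier.lower()]
    remaining = cap - Decimal(str(spent_this_period))
    return max(remaining, Decimal("0")), period


if __name__ == "__main__":
    left, period = remaining_budget("Pro", "4500.50")
    print(f"Pro: ${left} left this {period}")
```

If your account has a custom cap negotiated with your account executive or Support, replace the default value for your tier before relying on the result.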

Concurrency limits

Concurrency limits protect service quality by capping how many requests a project or user can have in flight at once. W&B applies concurrency limits per W&B project and per user. For example, if you have three projects in a team, each project has its own concurrency limit. If you exceed the limit, the API returns a 429 Concurrency limit reached for requests response. To fix this error, reduce the number of concurrent requests. If your use case requires higher limits, contact Support to discuss your requirements.
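
One way to stay under a concurrency limit is to cap in-flight requests on the client and back off when a 429 comes back. The sketch below illustrates that pattern with `asyncio`. It makes several assumptions: `send` is a hypothetical async callable that you supply and that returns a `(status_code, body)` tuple, and `max_in_flight`, `max_retries`, and `base_delay` are illustrative values, since this page does not state the actual limits.

```python
import asyncio
import random


class ConcurrencyLimitError(Exception):
    """Raised when the API keeps returning 429 after all retries."""


async def call_with_backoff(send, payload, semaphore, max_retries=5, base_delay=0.5):
    """Send one request, holding a semaphore slot and retrying on HTTP 429.

    `send` is a hypothetical async callable returning (status_code, body);
    wrap your own HTTP client call in it.
    """
    for attempt in range(max_retries + 1):
        async with semaphore:  # caps how many requests are in flight at once
            status, body = await send(payload)
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Exponential backoff with jitter; sleep outside the semaphore
            # so a waiting request does not occupy a slot.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            await asyncio.sleep(delay)
    raise ConcurrencyLimitError(f"still receiving 429 after {max_retries} retries")


async def run_all(send, payloads, max_in_flight=4, **retry_kwargs):
    """Run every payload through `send` with at most `max_in_flight` at a time."""
    semaphore = asyncio.Semaphore(max_in_flight)
    tasks = [call_with_backoff(send, p, semaphore, **retry_kwargs) for p in payloads]
    return await asyncio.gather(*tasks)
```

Because limits apply per project and per user, a client-side cap like this only helps if every process that shares the same project and user credentials respects a combined budget.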

Geographic restrictions

The Inference service is only available from supported geographic locations. For more information, see the Terms of Service.

Next steps

Now that you understand pricing, caps, and concurrency limits, continue to set up your account: