A 429 error with the message “Concurrency limit reached for requests” means you’re sending too many concurrent requests to the Serverless Inference API. This page explains why the error occurs and how to resolve it so your requests succeed.

Why this happens

Serverless Inference enforces concurrency limits to maintain fair usage and service stability. When the number of simultaneous requests from your account exceeds the allowed limit, additional requests are rejected with a 429 status code.

What you can do

To resolve the error, choose one or both of the following approaches based on your workload and plan.
  • Reduce concurrent requests to stay within your current limit:
    • Implement request queuing or throttling in your application.
    • Use exponential backoff when retrying requests that were rejected with a 429, so that retries do not immediately exceed the limit again.
  • Increase your limits if your workload requires more capacity. Review your plan’s concurrency limits and upgrade if needed.
For more information, see Usage information and limits.
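
As a minimal sketch of the throttling and backoff approaches above, the following Python example caps the number of in-flight requests with a semaphore and retries rejected requests with exponential backoff plus jitter. Everything named here is an assumption for illustration: `send` is a hypothetical placeholder for whatever async function your application uses to call the API, `RateLimited` is a hypothetical exception that `send` is expected to raise on a 429 response, and the values of `MAX_CONCURRENCY`, `MAX_RETRIES`, and `BASE_DELAY` are arbitrary and should be tuned to your plan's actual limit.

```python
import asyncio
import random

# Illustrative values only; set MAX_CONCURRENCY at or below your plan's limit.
MAX_CONCURRENCY = 4
MAX_RETRIES = 5
BASE_DELAY = 0.5  # seconds; the backoff ceiling doubles after each 429


class RateLimited(Exception):
    """Hypothetical exception: raised by the caller-supplied send() on HTTP 429."""


async def call_with_backoff(send, payload, semaphore,
                            max_retries=MAX_RETRIES, base_delay=BASE_DELAY):
    """Run send(payload) under a concurrency cap, retrying 429s with backoff."""
    for attempt in range(max_retries + 1):
        try:
            # Hold a semaphore slot only while the request is in flight,
            # so a sleeping retry does not block other requests.
            async with semaphore:
                return await send(payload)
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the final retry
            # Exponential backoff with full jitter: wait a random time
            # between 0 and base_delay * 2^attempt seconds.
            await asyncio.sleep(random.uniform(0, base_delay * 2 ** attempt))


async def run_all(send, payloads, max_concurrency=MAX_CONCURRENCY, **kwargs):
    """Send every payload, never exceeding max_concurrency requests at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [call_with_backoff(send, p, semaphore, **kwargs) for p in payloads]
    return await asyncio.gather(*tasks)
```

To use the sketch, wrap your own request function so that it raises `RateLimited` when the response status is 429, then call `asyncio.run(run_all(your_send, payloads))`. The jitter spreads retries out so that many rejected requests do not all retry at the same instant.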