# Rate limiting
Configure token bucket rate limits on MCPServer resources to control how many tool invocations users can make. Rate limiting prevents individual users from monopolizing shared servers and protects downstream services from traffic spikes.
ToolHive supports two scopes of rate limiting:
- Shared limits cap total requests across all users.
- Per-user limits cap requests independently for each authenticated user.
Both scopes can be applied at the server level and overridden per tool. A request must pass all applicable limits to proceed.
Before you begin, ensure you have:
- A Kubernetes cluster with the ToolHive Operator installed
- Redis deployed in your cluster — rate limiting stores token bucket counters in Redis (see Redis Sentinel session storage for deployment instructions)
- For per-user limits: authentication enabled on the MCPServer (`oidcConfig`, `oidcConfigRef`, or `externalAuthConfigRef`)
## How rate limiting works
Rate limits use a token bucket algorithm. Each bucket has a capacity (`maxTokens`) and a refill period (`refillPeriod`). The bucket starts full, and each `tools/call` request consumes one token. When the bucket is empty, requests are rejected until tokens refill. The refill rate is `maxTokens / refillPeriod` tokens per second.
Only `tools/call` requests are rate-limited. Lifecycle methods (`initialize`, `ping`) and discovery methods (`tools/list`, `prompts/list`) pass through unconditionally.
When a request is rejected, the proxy returns:

- HTTP 429 with a `Retry-After` header (seconds until a token is available)
- A JSON-RPC error with code `-32029` and `retryAfterSeconds` in the error data
If Redis is unreachable, rate limiting fails open and all requests are allowed through.
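For illustration, a rejected `tools/call` might produce a JSON-RPC error shaped like the following. The `code` and `retryAfterSeconds` fields match the behavior described above; the `message` text and other fields are assumptions:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "error": {
    "code": -32029,
    "message": "rate limit exceeded",
    "data": {
      "retryAfterSeconds": 12
    }
  }
}
```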
## Configure shared rate limits
Shared limits apply a single token bucket across all users. Use them to cap total throughput to protect downstream services.
```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    shared:
      maxTokens: 1000
      refillPeriod: 1m0s
```
This allows 1,000 total `tools/call` requests per minute across all users.
## Configure per-user rate limits
Per-user limits give each authenticated user their own independent token bucket. This prevents a single user from consuming the entire server capacity.
Per-user limits require authentication to be enabled. The proxy identifies users by the `sub` claim in their JWT.
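For example, a user presenting a token with the following (illustrative) decoded payload would get a bucket keyed on `user-1234`:

```json
{
  "iss": "https://my-idp.example.com",
  "aud": "my-audience",
  "sub": "user-1234",
  "exp": 1767225600
}
```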
```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfig:
    type: inline
    inline:
      issuer: https://my-idp.example.com
      audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
```
This allows each user 100 `tools/call` requests per minute independently.
## Combine shared and per-user limits
You can configure both scopes together. A request must pass all applicable limits. This lets you set a per-user ceiling while also capping total server throughput.
```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfig:
    type: inline
    inline:
      issuer: https://my-idp.example.com
      audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    shared:
      maxTokens: 1000
      refillPeriod: 1m0s
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
```
## Add per-tool overrides
Individual tools can have tighter limits than the server default. Per-tool limits are enforced in addition to server-level limits.
```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: weather-server
spec:
  image: ghcr.io/stackloklabs/weather-mcp/server
  transport: streamable-http
  oidcConfig:
    type: inline
    inline:
      issuer: https://my-idp.example.com
      audience: my-audience
  sessionStorage:
    provider: redis
    address: <YOUR_REDIS_ADDRESS>
  rateLimiting:
    perUser:
      maxTokens: 100
      refillPeriod: 1m0s
    tools:
      - name: expensive_search
        perUser:
          maxTokens: 10
          refillPeriod: 1m0s
      - name: shared_resource
        shared:
          maxTokens: 50
          refillPeriod: 1m0s
```
In this example:

- Each user can make 100 total tool calls per minute.
- Each user can make at most 10 `expensive_search` calls per minute (and those calls also count toward the 100-call server-level limit).
- All users combined can make 50 `shared_resource` calls per minute.
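The way these buckets combine can be sketched as follows: each call must pass every applicable bucket, so an `expensive_search` call draws from both its tool bucket and the server-level per-user bucket. This is a simplified model with hypothetical names, not ToolHive's code, and it assumes a rejected call consumes no tokens:

```go
package main

import "fmt"

// limit models one bucket's remaining tokens within a single refill window.
type limit struct {
	name      string
	remaining int
}

// allow succeeds only if every applicable bucket has a token, then
// consumes one token from each; a rejected call consumes nothing.
func allow(limits ...*limit) bool {
	for _, l := range limits {
		if l.remaining == 0 {
			return false
		}
	}
	for _, l := range limits {
		l.remaining--
	}
	return true
}

func main() {
	serverPerUser := &limit{"server per-user", 100}
	expensiveSearch := &limit{"expensive_search per-user", 10}

	// The 11th expensive_search call in a window fails on the tool bucket
	// even though 90 tokens remain in the server-level bucket.
	for i := 1; i <= 11; i++ {
		if !allow(serverPerUser, expensiveSearch) {
			fmt.Printf("call %d rejected (server tokens left: %d)\n", i, serverPerUser.remaining)
		}
	}
}
```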
## Next steps

- Token exchange to configure token exchange for upstream service authentication
- CRD reference for complete field definitions