Tool Resilience
A tool can fail repeatedly when:
- an MCP server is down
- an external API is rate-limiting
- a network dependency is unstable
- a tool endpoint is timing out
Spectra can wrap tools with a circuit breaker so unhealthy tools stop being called for a while instead of failing over and over.
Note: This page covers resilience for tool execution. For resilience on LLM calls, see Retry & Timeout and Provider Fallback.
Enable tool resilience
```csharp
services.AddSpectra(builder =>
{
    builder.AddToolResilience(opts =>
    {
        opts.FailureThreshold = 3;
        opts.CooldownPeriod = TimeSpan.FromSeconds(30);
    });
});
```
When enabled, Spectra wraps each registered tool with a resilience decorator that tracks failures and manages circuit state.
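To make the decorator idea concrete, here is a minimal sketch of a wrapper that exposes the same interface as the tool it wraps and counts consecutive failures. `ISketchTool`, `DelegateTool`, and `FailureTrackingTool` are hypothetical names invented for this illustration; they are not Spectra's actual types, and the real decorator also handles cooldown and half-open probing.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical tool abstraction for this sketch; not Spectra's interface.
public interface ISketchTool
{
    Task<string> InvokeAsync(string input);
}

// Adapts a delegate to the sketch interface so examples stay short.
public sealed class DelegateTool : ISketchTool
{
    private readonly Func<string, Task<string>> _run;

    public DelegateTool(Func<string, Task<string>> run) => _run = run;

    public Task<string> InvokeAsync(string input) => _run(input);
}

// Decorator: same interface as the wrapped tool, plus failure tracking.
// Not thread-safe; a production version would need synchronization.
public sealed class FailureTrackingTool : ISketchTool
{
    private readonly ISketchTool _inner;
    private readonly int _failureThreshold;
    private int _consecutiveFailures;

    public FailureTrackingTool(ISketchTool inner, int failureThreshold)
    {
        _inner = inner;
        _failureThreshold = failureThreshold;
    }

    // True once the threshold is reached. This sketch never closes again;
    // cooldown and probing are deliberately left out.
    public bool IsOpen => _consecutiveFailures >= _failureThreshold;

    public async Task<string> InvokeAsync(string input)
    {
        if (IsOpen)
        {
            // Reject immediately without executing the wrapped tool.
            throw new InvalidOperationException("Circuit is open; call rejected.");
        }

        try
        {
            string result = await _inner.InvokeAsync(input);
            _consecutiveFailures = 0; // any success resets the streak
            return result;
        }
        catch
        {
            _consecutiveFailures++;
            throw;
        }
    }
}
```

Because the wrapper implements the same interface as the inner tool, callers do not need to know whether a tool is wrapped.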
How the circuit breaker works
```mermaid
flowchart LR
    A[Closed] -->|N consecutive failures| B[Open]
    B -->|Cooldown expires| C[HalfOpen]
    C -->|Probe succeeds| A
    C -->|Probe fails| B
```
States
| State | Behavior |
|---|---|
| Closed | Normal operation. Tool calls run normally |
| Open | Calls are rejected immediately without executing the tool |
| HalfOpen | A small number of probe calls are allowed to test recovery |
A simple mental model:
- Closed = healthy
- Open = stop calling it
- HalfOpen = test whether it recovered
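The three states and their transitions can be written as a small pure function. This is only a sketch of the state diagram, not Spectra's implementation: the `CircuitEvent` names are invented here, and `ProbeSucceeded` stands for the point at which enough probes have succeeded to close the circuit.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

// Illustrative event names for this sketch; not part of Spectra's API.
public enum CircuitEvent
{
    FailureThresholdReached,
    CooldownExpired,
    ProbeSucceeded,
    ProbeFailed
}

public static class CircuitTransitions
{
    // Returns the state that follows `state` when `evt` occurs.
    public static CircuitState Next(CircuitState state, CircuitEvent evt) =>
        (state, evt) switch
        {
            (CircuitState.Closed, CircuitEvent.FailureThresholdReached) => CircuitState.Open,
            (CircuitState.Open, CircuitEvent.CooldownExpired) => CircuitState.HalfOpen,
            (CircuitState.HalfOpen, CircuitEvent.ProbeSucceeded) => CircuitState.Closed,
            (CircuitState.HalfOpen, CircuitEvent.ProbeFailed) => CircuitState.Open,
            _ => state // any other combination leaves the state unchanged
        };
}
```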
Configuration
```csharp
builder.AddToolResilience(opts =>
{
    opts.FailureThreshold = 5;
    opts.CooldownPeriod = TimeSpan.FromSeconds(60);
    opts.HalfOpenMaxAttempts = 1;
    opts.SuccessThresholdToClose = 1;
});
```
Options
| Option | Default | Description |
|---|---|---|
| FailureThreshold | 5 | Consecutive failures before opening the circuit |
| CooldownPeriod | 60s | How long the circuit stays open before probing again |
| HalfOpenMaxAttempts | 1 | Number of probe calls allowed in half-open state |
| SuccessThresholdToClose | 1 | Successful probe calls required to close the circuit |
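To see how the four options interact, the following sketch implements a small breaker driven by the same four values. It is an illustration under stated assumptions, not Spectra's code: `CircuitBreakerSketch`, `TryAcquire`, `RecordSuccess`, and `RecordFailure` are hypothetical names, the clock is injected so the cooldown can be simulated, and the sketch assumes `SuccessThresholdToClose` is not larger than `HalfOpenMaxAttempts`.

```csharp
using System;

public enum SketchCircuitState { Closed, Open, HalfOpen }

// Hypothetical breaker for illustration only; not thread-safe.
public sealed class CircuitBreakerSketch
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _cooldownPeriod;
    private readonly int _halfOpenMaxAttempts;
    private readonly int _successThresholdToClose;
    private readonly Func<DateTimeOffset> _now; // injectable clock

    private int _consecutiveFailures;
    private int _probeAttempts;
    private int _probeSuccesses;
    private DateTimeOffset _openedAt;

    public SketchCircuitState State { get; private set; } = SketchCircuitState.Closed;

    public CircuitBreakerSketch(int failureThreshold, TimeSpan cooldownPeriod,
        int halfOpenMaxAttempts, int successThresholdToClose, Func<DateTimeOffset> now)
    {
        _failureThreshold = failureThreshold;
        _cooldownPeriod = cooldownPeriod;
        _halfOpenMaxAttempts = halfOpenMaxAttempts;
        _successThresholdToClose = successThresholdToClose;
        _now = now;
    }

    // Returns true if a call may execute right now.
    public bool TryAcquire()
    {
        // CooldownPeriod: once it has elapsed, start probing.
        if (State == SketchCircuitState.Open && _now() - _openedAt >= _cooldownPeriod)
        {
            State = SketchCircuitState.HalfOpen;
            _probeAttempts = 0;
            _probeSuccesses = 0;
        }

        if (State == SketchCircuitState.Closed) return true;
        if (State == SketchCircuitState.Open) return false;

        // HalfOpenMaxAttempts: cap the number of probe calls.
        if (_probeAttempts >= _halfOpenMaxAttempts) return false;
        _probeAttempts++;
        return true;
    }

    public void RecordSuccess()
    {
        if (State == SketchCircuitState.HalfOpen)
        {
            // SuccessThresholdToClose: enough good probes close the circuit.
            _probeSuccesses++;
            if (_probeSuccesses >= _successThresholdToClose)
            {
                State = SketchCircuitState.Closed;
                _consecutiveFailures = 0;
            }
            return;
        }
        _consecutiveFailures = 0; // a success breaks the failure streak
    }

    public void RecordFailure()
    {
        if (State == SketchCircuitState.HalfOpen) { Trip(); return; }

        // FailureThreshold: consecutive failures before opening.
        _consecutiveFailures++;
        if (State == SketchCircuitState.Closed && _consecutiveFailures >= _failureThreshold)
        {
            Trip();
        }
    }

    private void Trip()
    {
        State = SketchCircuitState.Open;
        _openedAt = _now();
        _consecutiveFailures = 0;
    }
}
```

A caller would ask `TryAcquire()` before each tool call and report the outcome with `RecordSuccess()` or `RecordFailure()` afterwards.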
Fallback tools
You can also map one tool to another fallback tool.
```csharp
builder.AddToolResilience(opts =>
{
    opts.FailureThreshold = 3;
    opts.FallbackTools["mcp:weather-api:get_forecast"] = "mcp:backup-weather:get_forecast";
    opts.FallbackTools["external_search"] = "local_search";
});
```
When the primary tool's circuit is open, Spectra routes the call to the fallback tool instead.
This is useful when you have:
- a backup MCP server
- a local replacement for a remote API
- a degraded but still usable alternative
The agent does not need to change how it calls the tool.
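The routing decision can be pictured as a name lookup that only applies while the primary's circuit is open. The helper below is a hypothetical sketch, not Spectra's API: `FallbackRouting.TryResolve` and the `isCircuitOpen` callback are invented for illustration, and the sketch does not consider whether the fallback's own circuit is open.

```csharp
using System;
using System.Collections.Generic;

public static class FallbackRouting
{
    // Decides which tool name should actually run.
    // Returns false when the call should be rejected (primary open, no fallback).
    public static bool TryResolve(
        string requestedTool,
        IReadOnlyDictionary<string, string> fallbackTools,
        Func<string, bool> isCircuitOpen,
        out string target)
    {
        if (!isCircuitOpen(requestedTool))
        {
            target = requestedTool; // primary is healthy: no rerouting
            return true;
        }

        if (fallbackTools.TryGetValue(requestedTool, out var fallback))
        {
            target = fallback; // primary is open: use the mapped fallback
            return true;
        }

        target = requestedTool; // primary is open and nothing is mapped
        return false;
    }
}
```

The agent keeps asking for the original tool name; only the resolved target changes.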
What happens during failure
With tool resilience enabled, the normal flow is:
- the tool fails repeatedly
- the circuit opens after the configured threshold
- future calls are rejected immediately
- after the cooldown, Spectra allows probe calls
- if the probe calls succeed, the circuit closes
- if a probe call fails, the circuit opens again and waits for another cooldown
If a fallback tool is configured, Spectra routes calls to it while the primary tool's circuit is open.
Events
Circuit state changes emit events through IEventSink.
Use these to:
- detect unhealthy tools
- alert on repeated failures
- monitor fallback usage
- track recovery over time
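As a rough illustration of the first and last use cases, the sketch below keeps a per-tool view of circuit changes. Everything in it is an assumption made for the example: the `IEventSink` contract and the event payload are not shown on this page, so `CircuitChangeSketch` and `ToolHealthMonitor` are hypothetical stand-ins that you would adapt to whatever Spectra actually delivers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event shape; the real payload from IEventSink may differ.
public sealed class CircuitChangeSketch
{
    public CircuitChangeSketch(string toolName, string newState)
    {
        ToolName = toolName;
        NewState = newState;
    }

    public string ToolName { get; }
    public string NewState { get; } // "Closed", "Open", or "HalfOpen"
}

// Tracks which tools are currently unhealthy and how often each has opened.
public sealed class ToolHealthMonitor
{
    private readonly Dictionary<string, int> _openCounts = new Dictionary<string, int>();
    private readonly HashSet<string> _unhealthy = new HashSet<string>();

    public void Handle(CircuitChangeSketch evt)
    {
        if (evt.NewState == "Open")
        {
            _unhealthy.Add(evt.ToolName);
            _openCounts[evt.ToolName] = OpenCount(evt.ToolName) + 1;
        }
        else if (evt.NewState == "Closed")
        {
            _unhealthy.Remove(evt.ToolName); // recovered
        }
        // HalfOpen: still treated as unhealthy until the circuit closes.
    }

    public int OpenCount(string toolName) =>
        _openCounts.TryGetValue(toolName, out var count) ? count : 0;

    public bool IsUnhealthy(string toolName) => _unhealthy.Contains(toolName);
}
```

An alerting rule could then fire when `OpenCount` for a tool passes a limit you choose.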
A practical mental model
Tool resilience answers one question:
Should Spectra keep calling this tool right now?
- if yes, execute it
- if no, reject it or use a fallback
That is the core behavior.
What's next?
- Tools Overview: Learn how tools are defined and registered.
- MCP Integration: Connect external MCP tool servers.
- Retry & Timeout: Add resilience around LLM provider calls.