Retry & Timeout

LLM calls fail in real systems.

Common failures include:

rate limits
transient server errors
network errors
requests that take too long

Spectra handles this with ResilientLlmClient, which adds:

retries for transient failures
per-attempt timeouts
backoff between attempts

Permanent failures, such as bad requests or authentication errors, fail fast without being retried.

What it does

ResilientLlmClient wraps an ILlmClient and retries only when the failure looks temporary.

That means:

retry on rate limits and transient server problems
retry on timeouts and network errors
do not retry bad requests or auth failures

Every attempt, including the initial one, gets its own timeout window.
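As a rough illustration of that behavior, here is a minimal C# sketch of a retry loop with per-attempt timeouts. It is not Spectra's implementation: RetrySketch, ExecuteAsync and its parameters are hypothetical names. It assumes the wrapped call can be expressed as a Func<CancellationToken, Task<T>>, and that the classification rule and delay schedule are passed in.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch, not Spectra's implementation: retries transient failures,
// gives every attempt a fresh timeout window, and waits between attempts.
public static class RetrySketch
{
    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> attempt,   // one call to the wrapped client
        int maxRetries,                              // retries after the initial attempt
        TimeSpan perAttemptTimeout,                  // Timeout.InfiniteTimeSpan disables it
        Func<Exception, bool> isTransient,           // "temporary enough to try again?"
        Func<int, TimeSpan> delayBeforeRetry,        // 1-based retry number -> delay
        CancellationToken cancellationToken = default)
    {
        for (int retry = 0; ; retry++)
        {
            // A new linked source per attempt, so the timeout window starts over each time.
            using var attemptCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
            attemptCts.CancelAfter(perAttemptTimeout);

            try
            {
                return await attempt(attemptCts.Token).ConfigureAwait(false);
            }
            catch (OperationCanceledException) when (
                attemptCts.IsCancellationRequested &&
                !cancellationToken.IsCancellationRequested &&
                retry < maxRetries)
            {
                // The per-attempt timeout fired (not the caller's token): retry.
            }
            catch (Exception ex) when (
                ex is not OperationCanceledException &&
                retry < maxRetries &&
                isTransient(ex))
            {
                // Transient failure with retries left: retry.
            }

            // Any other failure has already propagated out of the loop (fail fast).
            await Task.Delay(delayBeforeRetry(retry + 1), cancellationToken).ConfigureAwait(false);
        }
    }
}
```

A fresh linked CancellationTokenSource per attempt is what makes the timeout apply per attempt rather than to the whole sequence, while the caller's own token still cancels everything, including the waits between attempts.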

Configuration

var options = new LlmResilienceOptions
{
    MaxRetries = 3,
    BaseDelay = TimeSpan.FromSeconds(1),
    MaxDelay = TimeSpan.FromSeconds(30),
    Timeout = TimeSpan.FromSeconds(60),
    UseExponentialBackoff = true,
    RetryableStatusCodes = new HashSet<int>
    {
        429,
        500,
        502,
        503,
        504
    }
};

Options reference

Option	Default	Description
`MaxRetries`	`3`	Number of retry attempts after the initial failure
`BaseDelay`	`1s`	Starting delay between retries
`MaxDelay`	`30s`	Maximum delay between retries
`Timeout`	`60s`	Per-attempt timeout
`UseExponentialBackoff`	`true`	Doubles delay on each retry, with jitter
`RetryableStatusCodes`	`429, 500, 502, 503, 504`	HTTP status codes treated as transient

Set MaxRetries = 0 to disable retries.

Set Timeout = Timeout.InfiniteTimeSpan (System.Threading.Timeout) to disable per-attempt timeouts.

Backoff behavior

With the default settings:

BaseDelay = 1s
exponential backoff enabled

the retry timing looks roughly like this:

Attempt	Delay before attempt
1	none
2	about `1.0 – 1.25s`
3	about `2.0 – 2.5s`
4	about `4.0 – 5.0s`

Spectra adds random jitter to each delay so that workflows failing at the same time do not all retry at the same moment.

That helps prevent retry storms under load.
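The delay schedule in the table can be reproduced with a small helper like the one below. This is a sketch under assumptions: jitter is modeled as 0 to 25% on top of the exponential delay (to match the table), the MaxDelay cap is applied after jitter, and the non-exponential case is assumed to use a constant BaseDelay. BackoffSketch and DelayBeforeRetry are hypothetical names.

```csharp
using System;

// Hypothetical sketch of exponential backoff with jitter. The 0-25% jitter range
// is an assumption chosen to match the table above; Spectra's formula may differ.
public static class BackoffSketch
{
    public static TimeSpan DelayBeforeRetry(
        int retry,                    // 1 = first retry (attempt 2), 2 = second retry, ...
        TimeSpan baseDelay,
        TimeSpan maxDelay,
        bool useExponentialBackoff,
        Random random)
    {
        if (retry < 1) throw new ArgumentOutOfRangeException(nameof(retry));

        // Exponential: base * 2^(retry - 1), i.e. 1s, 2s, 4s, ...
        // Otherwise a constant base delay (assumption).
        double baseMs = useExponentialBackoff
            ? baseDelay.TotalMilliseconds * Math.Pow(2, retry - 1)
            : baseDelay.TotalMilliseconds;

        // Add 0-25% random jitter so concurrent callers spread out their retries.
        double withJitter = baseMs * (1.0 + 0.25 * random.NextDouble());

        // Never exceed MaxDelay, even after jitter.
        return TimeSpan.FromMilliseconds(Math.Min(withJitter, maxDelay.TotalMilliseconds));
    }
}
```

For example, DelayBeforeRetry(3, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(30), true, new Random()) falls between 4 and 5 seconds, matching the row for attempt 4. From the tenth retry onward the exponential term exceeds 30 seconds, so the cap takes over.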

What gets retried

Failure type	Retried?
HTTP `429`	Yes
HTTP `500`, `502`, `503`, `504`	Yes
Per-attempt timeout	Yes
`HttpRequestException`	Yes
HTTP `400`	No
HTTP `401` / `403`	No
Other failures	Only if configured as retryable (for HTTP status codes, via `RetryableStatusCodes`)

A simple rule:

temporary problem → retry
bad request or auth problem → fail fast
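The classification rule can be sketched as a single predicate. This assumes failures surface as HttpRequestException (with StatusCode set when a response was received, available since .NET 5) or as TimeoutException for per-attempt timeouts. How Spectra represents these internally is not covered here, and TransientFailureSketch is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;

// Hypothetical sketch of the "temporary vs permanent" rule, not Spectra's code.
public static class TransientFailureSketch
{
    // Same set as the default RetryableStatusCodes shown in the configuration example.
    public static readonly IReadOnlySet<int> DefaultRetryableStatusCodes =
        new HashSet<int> { 429, 500, 502, 503, 504 };

    public static bool IsTransient(Exception ex, IReadOnlySet<int> retryableStatusCodes) => ex switch
    {
        // A response was received: decide purely by status code (400, 401, 403 fail fast).
        HttpRequestException { StatusCode: HttpStatusCode code } => retryableStatusCodes.Contains((int)code),
        // No status code: the request never got a response (DNS failure, connection reset, ...).
        HttpRequestException => true,
        // Assumed representation of a per-attempt timeout.
        TimeoutException => true,
        // Anything else fails fast.
        _ => false,
    };
}
```

Note that 501 (Not Implemented) is not in the default set, so it fails fast like any other unlisted status code.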

How timeout works

The timeout applies per attempt, not to the full retry sequence.

So with:

Timeout = 60s
MaxRetries = 3

you could have up to four attempts total:

1 initial attempt
3 retries

Each one can run for up to 60 seconds before timing out.

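That keeps each attempt's limit predictable and easy to tune. It also means one call can take longer than Timeout overall: the limit resets on every attempt, and the backoff delays come on top.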
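To see how large the total can get with the default options, here is the worst case as a quick calculation. It assumes every attempt runs to the full 60-second timeout and every backoff wait hits the upper end of the ranges in the backoff table (1.25s, 2.5s and 5s). TimeBudgetSketch is a hypothetical helper.

```csharp
using System;

// Back-of-the-envelope upper bound on the wall-clock time of one call.
public static class TimeBudgetSketch
{
    public static TimeSpan WorstCase(int maxRetries, TimeSpan perAttemptTimeout, TimeSpan[] maxDelays)
    {
        if (maxDelays.Length != maxRetries)
            throw new ArgumentException("Expected one delay per retry.", nameof(maxDelays));

        // (MaxRetries + 1) attempts, each allowed the full per-attempt timeout...
        TimeSpan total = perAttemptTimeout * (maxRetries + 1);

        // ...plus the longest possible wait before each retry.
        foreach (var delay in maxDelays)
            total += delay;

        return total;
    }
}
```

With MaxRetries = 3 and Timeout = 60s this comes to 4 × 60s + 8.75s = 248.75s, a little over four minutes for a single call.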

Advanced usage

Most applications use the resilient client as part of Spectra's normal provider/client composition.

If you need to compose it manually, wrap the raw client yourself:

var raw = provider.CreateClient(agent);
var resilient = new ResilientLlmClient(raw, options);

This is mainly useful for advanced customization or testing.
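Wrapping a client like this is commonly done with the decorator pattern, where the wrapper implements the same interface as the client it wraps (assumed here for Spectra, not confirmed). The sketch below shows that shape together with a test double. ICompletionClient is a hypothetical stand-in for Spectra's ILlmClient, whose members are not shown here, and RetryingClient only retries TimeoutException, skipping backoff and per-attempt timeouts to keep it short.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an LLM client interface; Spectra's ILlmClient has its own members.
public interface ICompletionClient
{
    Task<string> CompleteAsync(string prompt, CancellationToken ct);
}

// Test double: fails with a transient error a fixed number of times, then succeeds.
public sealed class FlakyClient : ICompletionClient
{
    private int _failuresLeft;
    public int Calls { get; private set; }

    public FlakyClient(int failures) => _failuresLeft = failures;

    public Task<string> CompleteAsync(string prompt, CancellationToken ct)
    {
        Calls++;
        if (_failuresLeft-- > 0)
            throw new TimeoutException("simulated transient failure");
        return Task.FromResult($"echo: {prompt}");
    }
}

// Decorator: same interface as the inner client, with retries added around it.
public sealed class RetryingClient : ICompletionClient
{
    private readonly ICompletionClient _inner;
    private readonly int _maxRetries;

    public RetryingClient(ICompletionClient inner, int maxRetries) =>
        (_inner, _maxRetries) = (inner, maxRetries);

    public async Task<string> CompleteAsync(string prompt, CancellationToken ct)
    {
        for (int retry = 0; ; retry++)
        {
            try { return await _inner.CompleteAsync(prompt, ct); }
            catch (TimeoutException) when (retry < _maxRetries) { /* transient: try again */ }
        }
    }
}
```

Because the wrapper and the wrapped client share an interface, a test can swap in a fake such as FlakyClient and assert how many calls were made, without a real provider.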

A practical mental model

ResilientLlmClient answers one question:

Was this failure temporary enough to try again?

if yes, retry with delay
if no, stop immediately

That is the core behavior.

What's next?

Provider Fallback

Route failures across alternative providers or models.


Caching

Avoid repeated LLM calls for the same request.
