Skip to content

Provider Fallback

Fallback lets Spectra route one logical LLM request across multiple providers or models.

Use it when you want:

  • a backup provider if the primary fails
  • load distribution across providers
  • gradual rollout to a new model
  • quality checks before accepting a fallback response

A fallback policy defines:

  • which providers are in the group
  • which routing strategy to use
  • which quality gate to apply

When to use fallback

Fallback is useful in three common situations:

Goal Example
Resilience Try Anthropic if OpenAI fails
Traffic distribution Spread requests across several providers
Migration or experimentation Send part of traffic to a new model

Define a fallback policy

A fallback policy is a named provider group with a routing strategy.

builder.AddFallbackPolicy("resilient",
    strategy: FallbackStrategy.Failover,
    entries: new[]
    {
        new FallbackProviderEntry { Provider = "openai", Model = "gpt-4o" },
        new FallbackProviderEntry { Provider = "anthropic", Model = "claude-sonnet-4-20250514" },
        new FallbackProviderEntry { Provider = "ollama", Model = "llama3" }
    },
    defaultQualityGate: new MinLengthQualityGate(50));

In this example:

  • Spectra tries providers using the selected strategy
  • if one fails, it moves to the next candidate
  • if a response fails the quality gate, it is rejected and Spectra continues

Choose a strategy

Strategy How it works Best for
Failover Try providers in order until one succeeds Clear primary/backup preference
RoundRobin Rotate which provider is tried first Even distribution
Weighted Choose the starting provider probabilistically by weight Gradual rollout or cost tuning
Split Bucket requests deterministically by percentage Predictable traffic splits

Failover

FallbackStrategy.Failover

Spectra tries providers in order.

Example:

  • OpenAI
  • then Anthropic if OpenAI fails
  • then Ollama if Anthropic fails

Use this when you have a clear preferred provider and only want backups when needed.

Round robin

FallbackStrategy.RoundRobin

Each new request starts with the next provider in the list.

Example:

  • request 1 starts with OpenAI
  • request 2 starts with Anthropic
  • request 3 starts with Ollama

Use this when you want simple load spreading across providers.

Weighted

FallbackStrategy.Weighted

The starting provider is selected probabilistically from the configured weights.

Example:

  • OpenAI weight 70
  • Anthropic weight 30

Over time, about 70% of requests start with OpenAI and 30% with Anthropic.

Use this for gradual migration or cost-based routing.

Split

FallbackStrategy.Split

Requests are assigned to providers deterministically by configured percentages.

Example:

  • OpenAI handles requests 1–70
  • Anthropic handles requests 71–100
  • then the cycle repeats

Use this when you want predictable traffic buckets instead of probabilistic selection.


Provider entries

Each provider in the policy can define its own routing and quality settings.

new FallbackProviderEntry
{
    Provider = "openai",
    Model = "gpt-4o",
    Weight = 70,
    QualityGate = new MinLengthQualityGate(100),
    MaxRequestsPerMinute = 500
}

Entry fields

Field Purpose
Provider Registered provider name
Model Model to use for this entry
Weight Used by Weighted and Split
QualityGate Per-entry quality validation
MaxRequestsPerMinute Skip this entry when its rate budget is exhausted

Quality gates

A quality gate checks whether a provider response is good enough before Spectra accepts it.

This is useful when fallback chains include weaker or cheaper models and you want to avoid silently serving poor results.

IQualityGate

public interface IQualityGate
{
    QualityGateResult Evaluate(LlmResponse response);
}

Built-in gates

MinLengthQualityGate

Rejects responses shorter than a minimum length.

new MinLengthQualityGate(minimumLength: 50)

Useful for rejecting empty, truncated, or clearly incomplete responses.

CompositeQualityGate

Combines multiple gates. All must pass.

new CompositeQualityGate(
    new MinLengthQualityGate(50),
    new MyCustomFormatGate()
)

Custom quality gate

public class JsonFormatGate : IQualityGate
{
    public QualityGateResult Evaluate(LlmResponse response)
    {
        try
        {
            JsonDocument.Parse(response.Content);
            return QualityGateResult.Pass();
        }
        catch
        {
            return QualityGateResult.Fail("Response is not valid JSON.");
        }
    }
}

Use a custom gate when your application needs a specific output format or domain-level validation.


What happens when a provider fails

When a provider attempt fails, Spectra tries the next provider based on the policy strategy.

When a response is returned but fails the quality gate, Spectra also rejects it and moves on.

If every provider fails or is rejected, the fallback client fails the request.


Events

Fallback behavior emits events for observability.

Event When
FallbackTriggeredEvent Spectra moves from one provider to the next
QualityGateRejectedEvent A response is rejected by a quality gate
FallbackExhaustedEvent All providers in the chain have been exhausted

These are useful for logs, dashboards, and reliability monitoring.


Full example

services.AddSpectra(builder =>
{
    builder.AddOpenAi(c => { c.ApiKey = openAiKey; });
    builder.AddAnthropic(c => { c.ApiKey = anthropicKey; });
    builder.AddOllama(c => { c.Model = "llama3"; });

    builder.AddFallbackPolicy("production",
        strategy: FallbackStrategy.Failover,
        entries: new[]
        {
            new FallbackProviderEntry
            {
                Provider = "openai",
                Model = "gpt-4o",
                MaxRequestsPerMinute = 500
            },
            new FallbackProviderEntry
            {
                Provider = "anthropic",
                Model = "claude-sonnet-4-20250514",
                MaxRequestsPerMinute = 300
            },
            new FallbackProviderEntry
            {
                Provider = "ollama",
                Model = "llama3"
            }
        },
        defaultQualityGate: new MinLengthQualityGate(20));
});

This setup gives you:

  • OpenAI as the preferred provider
  • Anthropic as the first backup
  • Ollama as a final fallback
  • basic response quality validation

A simple mental model

Fallback answers one question:

If this provider should not serve the response, who should try next?

That "should not" can mean:

  • the provider failed
  • the provider hit a rate limit
  • the response was not good enough

What's next?

  • Retry & Timeout

Retry transient failures before moving on to another provider.

Retry

  • Caching

Avoid duplicate LLM requests when the same input repeats.

Caching