Checkpointing
A checkpoint is a saved snapshot of workflow execution.
It lets Spectra resume from where a workflow stopped instead of starting again from the beginning.
This is useful when a workflow:
- crashes
- pauses for human input
- waits for a session message
- fails late in the pipeline
- needs to be resumed or inspected later
Why checkpoints matter
Without checkpoints, a workflow that stops near the end usually has to restart from the beginning.
With checkpoints, Spectra can resume from the latest saved point.
That helps with:
- crash recovery — resume after process restarts
- interrupts — pause for approval or human input
- sessions — keep long-running conversations alive
- debugging — inspect saved execution state
- time travel — resume or fork from an earlier point
See Time Travel for branching from old checkpoints.
Enable checkpointing
services.AddSpectra(builder =>
{
    // In-memory store (data is lost when the process stops)
    builder.AddInMemoryCheckpoints();
    // File-based store (checkpoints written as JSON files on disk)
    builder.AddFileCheckpoints("./checkpoints");
});
Once a checkpoint store is registered, the runner saves checkpoints automatically based on the configured settings.
When checkpoints are saved
By default, Spectra saves a checkpoint after every node.
You can change that behavior:
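builder.AddInMemoryCheckpoints(opts =>
{
    // Save after every node (the default)
    opts.Frequency = CheckpointFrequency.EveryNode;
    // Also save when a run fails, is interrupted, or starts awaiting input
    opts.CheckpointOnFailure = true;
    opts.CheckpointOnInterrupt = true;
    opts.CheckpointOnAwaitingInput = true;
});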
| Frequency | Behavior |
|---|---|
| EveryNode | Save after every step. Safest and most complete |
| StatusChangeOnly | Save only when run status changes |
| Disabled | Do not save automatically |
For most workflows, EveryNode is the safest default.
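The frequency table above can be read as a small decision rule. The following is a minimal sketch of that rule, not Spectra's implementation: the `CheckpointFrequency` values mirror the option names above, while `RunStatus`, `CheckpointPolicy`, and `ShouldSave` are hypothetical names introduced for illustration. The `CheckpointOn...` flags are not modeled here.

```csharp
using System;

// Hypothetical stand-ins that mirror the names used in this page;
// they are illustrative types, not Spectra's own.
public enum CheckpointFrequency { EveryNode, StatusChangeOnly, Disabled }
public enum RunStatus { InProgress, Completed, Failed, Interrupted, AwaitingInput }

public static class CheckpointPolicy
{
    // Decides whether a checkpoint should be written after a node finishes,
    // given the run status before and after that node.
    public static bool ShouldSave(
        CheckpointFrequency frequency,
        RunStatus previous,
        RunStatus current) => frequency switch
    {
        CheckpointFrequency.EveryNode => true,
        CheckpointFrequency.StatusChangeOnly => previous != current,
        CheckpointFrequency.Disabled => false,
        _ => throw new ArgumentOutOfRangeException(nameof(frequency)),
    };
}
```

Under this reading, `StatusChangeOnly` writes far fewer checkpoints, at the cost of a coarser resume point after a crash.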
Checkpoint status
Each checkpoint records the current execution status.
| Status | Meaning | How to continue |
|---|---|---|
| InProgress | Workflow was running or paused mid-run | ResumeAsync(...) |
| Completed | Workflow finished successfully | Cannot resume; fork instead |
| Failed | A step failed | Inspect, fix, then fork or rerun |
| Interrupted | Waiting for human input | ResumeWithResponseAsync(...) |
| AwaitingInput | Waiting for a session message | SendMessageAsync(...) |
This status is what tells the runner how a saved execution can continue.
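One way to picture the status table is as a lookup from status to the next action. This is only a sketch: the enum name `CheckpointStatus` and the `ContinuationGuide` helper are assumptions made for illustration, and the returned strings simply repeat the method names listed in the table rather than calling any real API.

```csharp
using System;

// Hypothetical enum; the member names follow the status table above.
public enum CheckpointStatus { InProgress, Completed, Failed, Interrupted, AwaitingInput }

public static class ContinuationGuide
{
    // Maps a checkpoint status to the action the table recommends.
    public static string HowToContinue(CheckpointStatus status) => status switch
    {
        CheckpointStatus.InProgress => "ResumeAsync",
        CheckpointStatus.Interrupted => "ResumeWithResponseAsync",
        CheckpointStatus.AwaitingInput => "SendMessageAsync",
        CheckpointStatus.Completed => "fork",
        CheckpointStatus.Failed => "inspect, fix, then fork or rerun",
        _ => throw new ArgumentOutOfRangeException(nameof(status)),
    };
}
```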
Resume a workflow
To continue from the latest checkpoint, call ResumeAsync(...) for the run.
The runner loads the latest checkpoint for that run, restores workflow state, and continues from the saved next node.
If the run is already completed, resume does not continue it. In that case, use forking.
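The resume behavior described above (pick the latest checkpoint for the run, then continue from its saved next node, unless the run already completed) can be sketched in a few lines. Everything here is hypothetical: `SketchCheckpoint`, `ResumeSketch`, and `FindResumePoint` are illustrative names, and the real runner also restores the full workflow state, which this sketch leaves out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down checkpoint shape used only for this sketch.
public record SketchCheckpoint(string RunId, int Index, string? NextNodeId, bool IsCompleted);

public static class ResumeSketch
{
    // Returns the node to continue from, or null when there is nothing to
    // resume (no checkpoint for the run, or the run already completed).
    public static string? FindResumePoint(IEnumerable<SketchCheckpoint> saved, string runId)
    {
        // The latest checkpoint is the one with the highest index in the run.
        var latest = saved
            .Where(c => c.RunId == runId)
            .OrderByDescending(c => c.Index)
            .FirstOrDefault();

        if (latest is null || latest.IsCompleted)
        {
            return null;
        }

        return latest.NextNodeId;
    }
}
```

A caller would treat a `null` result as "fork or start a new run" rather than as an error.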
What gets saved
A checkpoint stores everything needed to continue execution.
| Field | Purpose |
|---|---|
| RunId / WorkflowId | Identifies the run and workflow |
| State | Full WorkflowState snapshot |
| LastCompletedNodeId | The node that just finished |
| NextNodeId | The next node to execute |
| StepsCompleted | Number of completed steps |
| Index | Checkpoint number in the run |
| Status | Current lifecycle state |
| PendingInterrupt | Interrupt request waiting for a response |
| ParentRunId | Source run if this run was forked |
| TenantId / UserId | Identity context from the run |
At a practical level, this means Spectra saves both:
- where the workflow was
- what the workflow knew at that moment
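To make the field list concrete, here is a rough sketch of what such a record could look like. The property names come from the table above, but every type is an assumption (for example, a dictionary stands in for the `WorkflowState` snapshot and `object?` for the pending interrupt), and `CheckpointSketch` and `SketchStatus` are hypothetical names, not Spectra's actual types.

```csharp
using System.Collections.Generic;

// Hypothetical status enum; member names follow the status table in this page.
public enum SketchStatus { InProgress, Completed, Failed, Interrupted, AwaitingInput }

// Illustrative checkpoint shape. All types are assumptions.
public sealed record CheckpointSketch(
    string RunId,
    string WorkflowId,
    IReadOnlyDictionary<string, object?> State, // what the workflow knew
    string? LastCompletedNodeId,                // where the workflow was
    string? NextNodeId,                         // where it goes next
    int StepsCompleted,
    int Index,
    SketchStatus Status,
    object? PendingInterrupt,
    string? ParentRunId,
    string? TenantId,
    string? UserId);
```

Modeling the checkpoint as an immutable record fits the idea of a snapshot: a later checkpoint is a new value with a higher `Index`, not a mutation of an earlier one.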
Retention and cleanup
You can control how many checkpoints are kept and how long they stay around.
builder.AddFileCheckpoints("./checkpoints", opts =>
{
    // Keep at most 50 checkpoints
    opts.MaxCheckpointCount = 50;
    // Keep checkpoints for up to 7 days
    opts.RetentionPeriod = TimeSpan.FromDays(7);
});
You can also purge a run manually.
Use retention settings to prevent old checkpoint history from growing without bound.
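As an illustration of how a count limit and an age limit might combine, the sketch below prunes a list of checkpoints so that only entries inside the retention window survive, and of those only the newest ones up to the maximum count. How Spectra actually combines the two settings is not stated here, so treat this as one plausible interpretation. `StoredCheckpoint`, `RetentionSketch`, and `Prune` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal checkpoint metadata used for pruning decisions.
public record StoredCheckpoint(int Index, DateTimeOffset SavedAt);

public static class RetentionSketch
{
    // Keeps checkpoints that are within the retention period and among the
    // newest maxCount entries. Returns survivors in ascending index order.
    public static List<StoredCheckpoint> Prune(
        IEnumerable<StoredCheckpoint> checkpoints,
        int maxCount,
        TimeSpan retentionPeriod,
        DateTimeOffset now)
    {
        var cutoff = now - retentionPeriod;

        return checkpoints
            .Where(c => c.SavedAt >= cutoff)   // drop entries older than the window
            .OrderByDescending(c => c.Index)   // newest first
            .Take(maxCount)                    // enforce the count limit
            .OrderBy(c => c.Index)             // restore chronological order
            .ToList();
    }
}
```

Passing `now` in as a parameter, instead of reading the clock inside the method, keeps the pruning rule easy to test.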
Built-in stores
In-memory
InMemoryCheckpointStore is fast and simple.
Use it for:
- development
- tests
- local experiments
Data is lost when the process stops.
File-based
FileCheckpointStore writes checkpoints as JSON files on disk.
Use it for:
- local persistence
- development with restart safety
- simple self-hosted setups
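For intuition about what "checkpoints as JSON files on disk" involves, here is a minimal sketch using `System.Text.Json`. It is not how `FileCheckpointStore` is implemented: the directory layout (one folder per run, one zero-padded file per checkpoint index), the `FileCheckpoint` record, and the `FileStoreSketch` helper are all assumptions made for this example.

```csharp
using System.IO;
using System.Text.Json;

// Hypothetical, trimmed-down checkpoint used only for this sketch.
public record FileCheckpoint(string RunId, int Index, string? NextNodeId);

public static class FileStoreSketch
{
    // Writes the checkpoint to <root>/<runId>/<index>.json and returns the path.
    public static string Save(string root, FileCheckpoint checkpoint)
    {
        var runDirectory = Path.Combine(root, checkpoint.RunId);
        Directory.CreateDirectory(runDirectory);

        // Zero-padded names keep files in checkpoint order when sorted by name.
        var path = Path.Combine(runDirectory, $"{checkpoint.Index:D6}.json");
        File.WriteAllText(path, JsonSerializer.Serialize(checkpoint));
        return path;
    }

    // Reads a checkpoint back from a JSON file written by Save.
    public static FileCheckpoint? Load(string path) =>
        JsonSerializer.Deserialize<FileCheckpoint>(File.ReadAllText(path));
}
```

Because each checkpoint is a plain JSON file, it can be opened in any editor, which is handy for the debugging use case mentioned earlier.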
Production stores
For production, implement ICheckpointStore with your own backing store such as:
- Postgres
- Redis
- Cosmos DB
- DynamoDB
See Build Your Own Checkpoint Store.
A simple mental model
A checkpoint is just:
- saved workflow state
- saved position in the graph
- saved status for how execution should continue
That is why Spectra can resume instead of restart.
What's next?
- Time Travel: resume or fork from earlier checkpoints.
- Interrupts: see how interrupted workflows pause and resume.
- Custom Checkpoint Store: implement your own production-ready checkpoint backend.