Picture this: You’re building an e-commerce order processing system. A customer places an order, your system needs to validate payment, check inventory, wait for warehouse confirmation, send notifications, and handle potential failures at each step. The entire process might take hours or even days, involving human approvals and external API calls.

Traditional serverless functions hit a wall here. AWS Lambda functions timeout after 15 minutes maximum. You can’t just “wait” for hours without burning through your budget or hitting service limits. You need something that can pause, persist state, and resume exactly where it left off.
Enter AWS Lambda Durable Functions – a game-changing approach that transforms how we build long-running, stateful workflows in the serverless world.
Announced at AWS re:Invent in December 2025, AWS Lambda Durable Functions solve the fundamental challenge of building resilient, long-running workflows without managing infrastructure. They allow your Lambda functions to pause execution, save state automatically, and resume processing when needed – all while maintaining the serverless promise of paying only for what you use.

What Are AWS Lambda Durable Functions?
AWS Lambda Durable Functions are regular Lambda functions enhanced with automatic state management capabilities. They can pause execution, checkpoint their state, and resume from exactly where they left off – even after hours, days, or up to a year later.
The Core Concept
Think of durable functions as Lambda functions with a “pause and resume” superpower:
- Regular Lambda: Executes, completes, and terminates within 15 minutes
- Durable Lambda: Can pause mid-execution, save state, and continue later
When your function encounters a wait condition (like waiting for an external callback or a scheduled delay), Lambda automatically:
- Checkpoints the current state and variables
- Stops execution (no compute charges)
- Resumes execution when the wait condition is met
- Restores all context and continues seamlessly
Key Architectural Benefits
No Infrastructure Management: No need to set up databases, queues, or state machines. Lambda handles all state persistence automatically.
Cost Efficiency: Pay only for actual execution time, not waiting time. A workflow that computes for 5 minutes but waits for 24 hours is billed for the 5 minutes of compute, not for the 24 hours of waiting.
Automatic Scaling: Lambda’s built-in scaling applies to durable functions, handling thousands of concurrent workflows without configuration.
Built-in Reliability: Automatic checkpointing ensures workflows survive failures, restarts, and service interruptions.
How Durable Functions Work: The Replay Model
Understanding the replay model is crucial for effectively using durable functions. It’s the secret sauce that makes everything work reliably.
The Replay Execution Pattern
When a durable function resumes, Lambda doesn’t just continue from where it paused. Instead, it replays the handler from the beginning, but any step that already completed returns its checkpointed result instead of running again:
import { withDurableExecution } from '@aws/durable-execution-sdk-js';

export const handler = withDurableExecution(
  async (event, context) => {
    // Step 1: Executes on the first invocation; on replay the saved result is returned
    const order = await context.step('create-order', async () => {
      return createOrder(event.items); // Only executes once
    });
    // Step 2: Wait operation (causes pause on first execution)
    await context.wait({ hours: 24 });
    // Step 3: Only executes after 24-hour wait completes
    const notification = await context.step('send-reminder', async () => {
      return sendReminderEmail(order.customerId);
    });
    return { orderId: order.id, status: 'completed' };
  }
);
Execution Flow Breakdown
First Invocation (Day 1):
- create-order step executes, creates order, result checkpointed
- wait operation triggers 24-hour pause
- Function execution stops, state saved
Resume Invocation (Day 2):
- Function replays from the beginning
- create-order step returns checkpointed result (doesn’t re-execute)
- wait operation recognizes 24 hours have passed, continues immediately
- send-reminder step executes for the first time
- Function completes and returns result
Why Replay Works
The replay model ensures deterministic execution:
- Operations wrapped in context.step() execute exactly once
- All subsequent replays return the same checkpointed result
- Function state remains consistent across pauses and resumes
- No race conditions or duplicate operations
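To make the replay idea concrete, here is a minimal in-memory sketch. It is not the AWS SDK: `createRuntime`, `PauseSignal`, `completeWait`, and the named `wait` are hypothetical stand-ins, and a real runtime persists checkpoints durably rather than in a `Map`. The point is only to show that every invocation starts from the top while completed steps are served from their checkpoints.

```javascript
// Toy stand-in for a durable runtime: NOT the AWS SDK, just the replay idea.
class PauseSignal extends Error {}

function createRuntime() {
  const checkpoints = new Map();   // step name -> saved result
  const finishedWaits = new Set(); // wait names whose timer has fired
  const executed = [];             // step bodies that really ran, in order

  const context = {
    async step(name, fn) {
      if (checkpoints.has(name)) return checkpoints.get(name); // replay: reuse result
      const result = await fn();
      executed.push(name);
      checkpoints.set(name, result);
      return result;
    },
    async wait(name) {
      if (!finishedWaits.has(name)) throw new PauseSignal(name); // suspend invocation
    },
  };

  return {
    executed,
    completeWait: (name) => finishedWaits.add(name),
    // Every invocation runs the handler from its first line.
    async invoke(handler, event) {
      try {
        return { status: 'completed', result: await handler(event, context) };
      } catch (err) {
        if (err instanceof PauseSignal) return { status: 'paused' };
        throw err;
      }
    },
  };
}

async function orderHandler(event, context) {
  const order = await context.step('create-order', async () => ({ id: 'o-1', items: event.items }));
  await context.wait('reminder-delay');
  await context.step('send-reminder', async () => `reminded ${order.id}`);
  return { orderId: order.id, status: 'completed' };
}

async function demo() {
  const runtime = createRuntime();
  const first = await runtime.invoke(orderHandler, { items: ['book'] });
  runtime.completeWait('reminder-delay'); // simulate the 24 hours elapsing
  const second = await runtime.invoke(orderHandler, { items: ['book'] });
  return { first, second, executed: runtime.executed };
}
```

Calling `demo()` runs the handler twice, yet each step body appears only once in `executed`, which is the behavior the bullets above describe.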
Core Capabilities and Features
1. Extended Execution Times
Traditional Lambda Limit: 15 minutes maximum
Durable Functions: Up to 1 year total workflow duration
Individual invocations still respect the 15-minute limit, but the workflow continues across multiple invocations seamlessly.
2. Automatic State Checkpointing
const processLargeDataset = withDurableExecution(
  async (event, context) => {
    const chunks = splitDataIntoChunks(event.data);
    const results = [];
    for (let i = 0; i < chunks.length; i++) {
      // Each chunk processing is checkpointed
      const result = await context.step(`process-chunk-${i}`, async () => {
        return processChunk(chunks[i]);
      });
      results.push(result);
      // Optional: Add small delays to prevent timeouts
      if (i % 10 === 0) {
        await context.wait({ seconds: 1 });
      }
    }
    return combineResults(results);
  }
);
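If processing fails at chunk 47 out of 100, the next invocation replays the handler, but every chunk before it returns its checkpointed result immediately, so real work resumes at chunk 47 rather than at the beginning.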
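The recovery behavior can be sketched without any AWS dependency. In the snippet below, `step`, `checkpoints`, `runCounts`, and `failOnceAt` are hypothetical names for an in-memory simulation; the real service stores checkpoints for you and decides when to re-invoke.

```javascript
// Hypothetical in-memory checkpoint store; a real runtime persists this for you.
const checkpoints = new Map();
const runCounts = [];   // runCounts[i] = times chunk i was really processed
let failOnceAt = 3;     // simulate a crash the first time chunk 3 runs

async function step(name, fn) {
  if (checkpoints.has(name)) return checkpoints.get(name); // already done: skip work
  const result = await fn();
  checkpoints.set(name, result);
  return result;
}

async function processAll(chunks) {
  const results = [];
  for (let i = 0; i < chunks.length; i++) {
    results.push(await step(`process-chunk-${i}`, async () => {
      if (i === failOnceAt) {
        failOnceAt = -1; // fail only once
        throw new Error(`crash at chunk ${i}`);
      }
      runCounts[i] = (runCounts[i] || 0) + 1;
      return chunks[i] * 2;
    }));
  }
  return results;
}

async function runWithOneRetry(chunks) {
  try {
    return await processAll(chunks);
  } catch {
    return processAll(chunks); // second invocation replays from the top
  }
}
```

After `runWithOneRetry([1, 2, 3, 4, 5])`, every entry in `runCounts` is 1: the chunks finished before the crash were not processed again.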
3. Built-in Retry Logic
const reliableApiCall = withDurableExecution(
  async (event, context) => {
    const result = await context.step('call-external-api',
      async () => {
        return callExternalAPI(event.endpoint, event.data);
      },
      {
        retryPolicy: {
          maxAttempts: 5,
          backoffCoefficient: 2.0,
          initialInterval: { seconds: 1 },
          maximumInterval: { seconds: 60 }
        }
      }
    );
    return result;
  }
);
Lambda automatically handles:
- Exponential backoff between retries
- Maximum retry attempts
- Jitter to prevent thundering herd problems
- Persistent retry state across function invocations
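To see what a policy like the one above implies in practice, the following sketch computes the delay schedule by hand. The function and parameter names (`backoffDelays`, `initialSeconds`, `maximumSeconds`, `withFullJitter`) are illustrative assumptions, not SDK calls, and the jitter strategy shown (full jitter) is one common choice rather than a statement about what Lambda does internally.

```javascript
// Delays (in seconds) before attempts 2..maxAttempts, without jitter.
function backoffDelays({ maxAttempts, backoffCoefficient, initialSeconds, maximumSeconds }) {
  const delays = [];
  for (let retry = 0; retry < maxAttempts - 1; retry++) {
    const raw = initialSeconds * Math.pow(backoffCoefficient, retry);
    delays.push(Math.min(raw, maximumSeconds)); // cap at the maximum interval
  }
  return delays;
}

// Full jitter: pick a random delay between 0 and the capped value.
// The random source is injectable so the function can be tested.
function withFullJitter(delaySeconds, random = Math.random) {
  return random() * delaySeconds;
}

const schedule = backoffDelays({
  maxAttempts: 5,
  backoffCoefficient: 2.0,
  initialSeconds: 1,
  maximumSeconds: 60,
});
// schedule is [1, 2, 4, 8]: four waits separate five attempts
```

With more attempts the cap starts to matter: nine attempts under the same settings give delays of 1, 2, 4, 8, 16, 32, 60, and 60 seconds.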
4. Callback and Event Waiting
const approvalWorkflow = withDurableExecution(
  async (event, context) => {
    // Submit for approval
    const approvalRequest = await context.step('submit-approval', async () => {
      return submitForApproval(event.requestData);
    });
    // Wait for human approval (could take days)
    const approvalResult = await context.waitForCallback({
      callbackId: approvalRequest.id,
      timeout: { days: 7 }
    });
    if (approvalResult.approved) {
      return await context.step('process-approved', async () => {
        return processApprovedRequest(event.requestData);
      });
    } else {
      return { status: 'rejected', reason: approvalResult.reason };
    }
  }
);
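The handshake behind a callback wait can be illustrated with a small in-memory registry. Everything here (`pendingCallbacks`, `waitForCallback`, `completeCallback`) is a hypothetical stand-in: the real service persists the pending wait and suspends the function instead of holding a promise in memory, but the shape of the interaction is the same. One side registers interest under an ID, and an external system later completes that ID with a payload or lets it time out.

```javascript
// In-memory stand-in for callback delivery; names here are illustrative only.
const pendingCallbacks = new Map(); // callbackId -> { resolve, timer }

function waitForCallback(callbackId, timeoutMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      pendingCallbacks.delete(callbackId);
      reject(new Error(`callback ${callbackId} timed out`));
    }, timeoutMs);
    pendingCallbacks.set(callbackId, { resolve, timer });
  });
}

// Called by the external system (for example an approval UI).
function completeCallback(callbackId, payload) {
  const pending = pendingCallbacks.get(callbackId);
  if (!pending) return false; // unknown, expired, or already completed
  clearTimeout(pending.timer);
  pendingCallbacks.delete(callbackId);
  pending.resolve(payload);
  return true;
}

async function approvalDemo() {
  const waiting = waitForCallback('approval-42', 1000);
  completeCallback('approval-42', { approved: true }); // the approver responds
  const result = await waiting;
  return result.approved ? 'processed' : 'rejected';
}
```

Completing an ID that nobody is waiting on returns `false`, which is a reasonable place to surface late or duplicate approvals.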
5. Parallel Execution and Coordination
const parallelProcessing = withDurableExecution(
  async (event, context) => {
    // Start multiple operations in parallel
    const tasks = event.items.map((item, index) =>
      context.step(`process-item-${index}`, async () => {
        return processItem(item);
      })
    );
    // Wait for all to complete
    const results = await Promise.all(tasks);
    // Aggregate results
    return await context.step('aggregate-results', async () => {
      return aggregateResults(results);
    });
  }
);
Real-World Use Cases and Implementation Patterns
1. E-Commerce Order Processing Pipeline
A comprehensive order processing workflow that handles payment validation, inventory checks, warehouse coordination, and customer notifications:
const orderProcessingWorkflow = withDurableExecution(
  async (event, context) => {
    const order = event.order;
    // Step 1: Validate payment
    const paymentResult = await context.step('validate-payment', async () => {
      return validatePayment(order.paymentInfo);
    });
    if (!paymentResult.valid) {
      return { status: 'failed', reason: 'Payment validation failed' };
    }
    // Step 2: Check inventory
    const inventoryCheck = await context.step('check-inventory', async () => {
      return checkInventoryAvailability(order.items);
    });
    if (!inventoryCheck.available) {
      // Wait for restocking notification
      await context.waitForCallback({
        callbackId: `restock-${order.id}`,
        timeout: { days: 30 }
      });
    }
    // Step 3: Reserve inventory
    await context.step('reserve-inventory', async () => {
      return reserveInventory(order.items);
    });
    // Step 4: Wait for warehouse confirmation
    const warehouseConfirmation = await context.waitForCallback({
      callbackId: `warehouse-${order.id}`,
      timeout: { hours: 48 }
    });
    // Step 5: Process shipping
    const shippingResult = await context.step('process-shipping', async () => {
      return processShipping(order, warehouseConfirmation);
    });
    // Step 6: Send confirmation email
    await context.step('send-confirmation', async () => {
      return sendOrderConfirmation(order.customerId, shippingResult);
    });
    return {
      orderId: order.id,
      status: 'completed',
      trackingNumber: shippingResult.trackingNumber
    };
  }
);
2. Data Processing Pipeline with Checkpoints
Process large datasets in batches with automatic checkpointing and recovery:
const dataProcessingPipeline = withDurableExecution(
  async (event, context) => {
    const { datasetId, processingConfig } = event;
    // Step 1: Extract data
    const rawData = await context.step('extract-data', async () => {
      return extractDataFromSource(datasetId);
    });
    // Step 2: Transform in batches
    const batches = chunkArray(rawData, processingConfig.batchSize);
    const transformedBatches = [];
    for (let i = 0; i < batches.length; i++) {
      const transformedBatch = await context.step(`transform-batch-${i}`, async () => {
        return transformBatch(batches[i], processingConfig.transformRules);
      });
      transformedBatches.push(transformedBatch);
      // Checkpoint after every 10th batch, so slice(-10) holds exactly the
      // 10 batches completed since the previous checkpoint
      if ((i + 1) % 10 === 0) {
        await context.step(`checkpoint-${i}`, async () => {
          return saveCheckpoint(datasetId, i, transformedBatches.slice(-10));
        });
      }
      // Small delay to prevent timeout
      await context.wait({ milliseconds: 100 });
    }
    // Step 3: Load to destination
    const loadResult = await context.step('load-data', async () => {
      return loadDataToDestination(transformedBatches.flat(), processingConfig.destination);
    });
    return {
      datasetId,
      recordsProcessed: transformedBatches.flat().length,
      status: 'completed'
    };
  }
);
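The pipeline above relies on a `chunkArray` helper that the article does not define. One possible implementation, offered as an assumption about what the author intended, splits an array into consecutive slices of at most `size` items:

```javascript
// Hypothetical helper: split `items` into consecutive batches of at most `size`.
function chunkArray(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size)); // last chunk may be shorter
  }
  return chunks;
}

const batches = chunkArray([1, 2, 3, 4, 5], 2);
// batches is [[1, 2], [3, 4], [5]]
```

Because the split depends only on its inputs, it is safe to run outside a step: every replay produces the same batches as long as the extracted data comes from a checkpointed step.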
3. Multi-Service Saga Pattern
Coordinate distributed transactions with automatic compensation on failure:
const bookingSagaWorkflow = withDurableExecution(
  async (event, context) => {
    const { userId, flightId, hotelId, carId } = event.booking;
    const compensations = [];
    try {
      // Step 1: Book flight
      const flightBooking = await context.step('book-flight', async () => {
        return bookFlight(userId, flightId);
      });
      compensations.push(() => cancelFlight(flightBooking.id));
      // Step 2: Book hotel
      const hotelBooking = await context.step('book-hotel', async () => {
        return bookHotel(userId, hotelId, flightBooking.dates);
      });
      compensations.push(() => cancelHotel(hotelBooking.id));
      // Step 3: Book car
      const carBooking = await context.step('book-car', async () => {
        return bookCar(userId, carId, flightBooking.dates);
      });
      compensations.push(() => cancelCar(carBooking.id));
      // Step 4: Process payment
      const paymentResult = await context.step('process-payment', async () => {
        const totalAmount = flightBooking.amount + hotelBooking.amount + carBooking.amount;
        return processPayment(userId, totalAmount);
      });
      // Generate the ID inside a step so a replay returns the same value
      const bookingId = await context.step('generate-booking-id', async () => {
        return `booking-${Date.now()}`;
      });
      return {
        bookingId,
        status: 'confirmed',
        flight: flightBooking,
        hotel: hotelBooking,
        car: carBooking,
        payment: paymentResult
      };
    } catch (error) {
      // Compensate in reverse order
      await context.step('compensate-bookings', async () => {
        for (let i = compensations.length - 1; i >= 0; i--) {
          try {
            await compensations[i]();
          } catch (compensationError) {
            console.error('Compensation failed:', compensationError);
          }
        }
      });
      throw error;
    }
  }
);
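Stripped of the booking details, the saga pattern used above reduces to a small loop. The sketch below is plain JavaScript with no durable runtime involved, and `runSaga` and its `{ action, compensate }` step shape are illustrative names rather than anything from the SDK. It shows the two rules that matter: only steps that succeeded are compensated, and compensation runs most recent first.

```javascript
// Each saga step pairs an action with the compensation that undoes it.
async function runSaga(steps) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.action();
      completed.push(step); // record only after the action succeeded
    }
    return { status: 'confirmed' };
  } catch (error) {
    // Undo only what actually succeeded, most recent first.
    for (const step of completed.reverse()) {
      await step.compensate();
    }
    return { status: 'rolled-back', reason: error.message };
  }
}

async function sagaDemo() {
  const log = [];
  const outcome = await runSaga([
    { action: async () => log.push('book-flight'), compensate: async () => log.push('cancel-flight') },
    { action: async () => log.push('book-hotel'), compensate: async () => log.push('cancel-hotel') },
    {
      action: async () => { throw new Error('no cars available'); },
      compensate: async () => log.push('cancel-car'),
    },
  ]);
  return { outcome, log };
}
```

In `sagaDemo()` the car booking fails, so the log ends with `cancel-hotel` followed by `cancel-flight`, and `cancel-car` never runs because that booking was never made.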
Testing Durable Functions: Local Development and CI/CD
Local Testing with the Test Runner
The Durable Execution SDK includes a powerful local testing framework that simulates the entire durable execution environment:
import { LocalDurableTestRunner } from '@aws/durable-execution-sdk-js-testing';
import { orderProcessingWorkflow } from './order-workflow.js';

describe('Order Processing Workflow', () => {
  let testRunner;
  beforeEach(() => {
    testRunner = new LocalDurableTestRunner({
      handlerFunction: orderProcessingWorkflow,
    });
  });
  test('should complete successful order processing', async () => {
    const mockEvent = {
      order: {
        id: 'order-123',
        items: [{ id: 'item-1', quantity: 2 }],
        paymentInfo: { cardToken: 'valid-token' },
        customerId: 'customer-456'
      }
    };
    // Mock external service calls
    testRunner.mockStep('validate-payment', { valid: true });
    testRunner.mockStep('check-inventory', { available: true });
    testRunner.mockStep('reserve-inventory', { reserved: true });
    testRunner.mockCallback('warehouse-order-123', { confirmed: true });
    const execution = await testRunner.run(mockEvent);
    expect(execution.getStatus()).toBe('SUCCEEDED');
    expect(execution.getResult()).toMatchObject({
      orderId: 'order-123',
      status: 'completed'
    });
  });
});
Deployment and Configuration
AWS SAM Template Configuration
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  OrderProcessingWorkflow:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/workflows/order-processing/
      Handler: index.handler
      Runtime: nodejs22.x
      DurableConfig:
        ExecutionTimeout: 2592000 # 30 days in seconds
        RetentionPeriodInDays: 90
      Environment:
        Variables:
          PAYMENT_SERVICE_URL: !Ref PaymentServiceUrl
          INVENTORY_SERVICE_URL: !Ref InventoryServiceUrl
      Events:
        OrderCreated:
          Type: EventBridgeRule
          Properties:
            Pattern:
              source: ["ecommerce.orders"]
              detail-type: ["Order Created"]
    Metadata:
      BuildMethod: esbuild
      BuildProperties:
        EntryPoints:
          - index.ts
        Target: es2022
Best Practices and Performance Optimization
1. Designing Deterministic Operations
// ❌ Bad: Non-deterministic operations
const badWorkflow = withDurableExecution(
  async (event, context) => {
    // This will cause issues during replay
    const timestamp = Date.now(); // Different on each replay
    const randomId = Math.random(); // Different on each replay
    await context.step('process-data', async () => {
      return processData(timestamp, randomId);
    });
  }
);

// ✅ Good: Deterministic operations
const goodWorkflow = withDurableExecution(
  async (event, context) => {
    // Generate non-deterministic values inside steps
    const metadata = await context.step('generate-metadata', async () => {
      return {
        timestamp: Date.now(),
        randomId: Math.random(),
        uuid: generateUUID()
      };
    });
    await context.step('process-data', async () => {
      return processData(metadata.timestamp, metadata.randomId);
    });
  }
);
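The difference between the two workflows above is easy to observe with a toy replay harness. This is a sketch under stated assumptions: `createReplayer` is a hypothetical in-memory stand-in for step checkpointing, and a fake clock replaces `Date.now()` so the output is repeatable.

```javascript
// Minimal replay harness (hypothetical; not the AWS SDK).
function createReplayer() {
  const checkpoints = new Map();
  return async function step(name, fn) {
    if (!checkpoints.has(name)) checkpoints.set(name, await fn()); // run once
    return checkpoints.get(name);                                  // then reuse
  };
}

let clock = 1000;                 // fake clock so the demo is repeatable
const now = () => (clock += 500); // advances on every call, like real time

async function workflow(step) {
  const outside = now(); // re-evaluated on each replay
  const inside = await step('generate-metadata', async () => now()); // captured once
  return { outside, inside };
}

async function determinismDemo() {
  const step = createReplayer();
  const firstRun = await workflow(step);
  const replay = await workflow(step);
  return { firstRun, replay };
}
```

Running `determinismDemo()` yields `inside: 2000` in both runs, while `outside` moves from 1500 to 2500. Any value computed outside a step can change between the original run and a replay, which is why such values belong inside steps.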
2. Optimizing Step Granularity
Balance between too fine-grained (excessive overhead) and too coarse-grained (loss of checkpointing benefits):
// ✅ Optimal granularity: Batch related operations
const optimizedGranularity = withDurableExecution(
  async (event, context) => {
    const batchSize = 50;
    const totalItems = 1000;
    for (let batch = 0; batch < totalItems / batchSize; batch++) {
      await context.step(`process-batch-${batch}`, async () => {
        const startIndex = batch * batchSize;
        const endIndex = Math.min(startIndex + batchSize, totalItems);
        const results = [];
        for (let i = startIndex; i < endIndex; i++) {
          results.push(processSimpleItem(i));
        }
        return results;
      });
    }
  }
);
3. Error Handling and Compensation
const robustWorkflow = withDurableExecution(
  async (event, context) => {
    const compensationActions = [];
    try {
      // Step 1: Create resource
      const resource = await context.step('create-resource',
        async () => {
          return createResource(event.resourceConfig);
        },
        {
          retryPolicy: {
            maxAttempts: 3,
            backoffCoefficient: 2.0,
            initialInterval: { seconds: 1 }
          }
        }
      );
      // Register the compensation outside the step body: on replay the step
      // returns its checkpointed result without re-running the callback, so a
      // push made inside the callback would be lost.
      compensationActions.push(() => deleteResource(resource.id));
      // Later steps go here; if one throws, the catch block undoes the resource.
      return { resourceId: resource.id, status: 'active' };
    } catch (error) {
      // Execute compensation actions in reverse order
      await context.step('compensate', async () => {
        for (let i = compensationActions.length - 1; i >= 0; i--) {
          try {
            await compensationActions[i]();
          } catch (compensationError) {
            console.error('Compensation failed:', compensationError);
          }
        }
      });
      throw error;
    }
  }
);
Cost Optimization and Scaling Considerations
Understanding Durable Functions Pricing
Compute Costs: Pay only for actual execution time, not waiting time
- Standard Lambda pricing applies during active execution
- No charges during waits, callbacks, or paused states
Storage Costs: Minimal charges for state persistence
- DynamoDB storage for checkpoints and state
- Typically $0.25 per GB-month
Request Costs: Standard Lambda invocation pricing
- Each resume counts as a new invocation
- Batch operations to minimize invocations
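A rough estimate makes the compute and request lines above tangible. The rates in this sketch are the standard on-demand Lambda prices for x86 in us-east-1 as best known at the time of writing, and they are assumptions to verify against the current pricing page. The calculation deliberately ignores storage and any durable-specific charges, and `estimateComputeCost` is an illustrative helper, not an AWS API.

```javascript
// Illustrative on-demand rates (x86, us-east-1); verify before relying on them.
const PRICE_PER_GB_SECOND = 0.0000166667;
const PRICE_PER_REQUEST = 0.20 / 1_000_000;

function estimateComputeCost({ activeSeconds, memoryMb, invocations }) {
  const gbSeconds = activeSeconds * (memoryMb / 1024);
  return gbSeconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST;
}

// 5 minutes of active work at 1 GB, split across 2 invocations around a 24-hour wait.
const pausedWorkflow = estimateComputeCost({
  activeSeconds: 300,
  memoryMb: 1024,
  invocations: 2,
});

// Hypothetical comparison: the same memory billed for the full 24 hours and 5 minutes.
const billedWhileWaiting = estimateComputeCost({
  activeSeconds: 300 + 24 * 3600,
  memoryMb: 1024,
  invocations: 1,
});
```

Under these assumptions the paused workflow costs about half a cent in compute, while paying for the waiting time would cost roughly $1.45, a difference of more than two orders of magnitude.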
Cost Optimization Strategies
// ✅ Cost-effective: Batched operations
const costEffectivePattern = withDurableExecution(
  async (event, context) => {
    const batchSize = 100;
    for (let batch = 0; batch < 10; batch++) {
      await context.step(`batch-${batch}`, async () => {
        const operations = [];
        for (let i = 0; i < batchSize; i++) {
          operations.push(smallOperation(batch * batchSize + i));
        }
        return Promise.all(operations);
      });
      // Single wait per batch instead of per operation
      await context.wait({ seconds: 10 });
    }
  }
);
Migration Strategies and Adoption Patterns
Migrating from AWS Step Functions
Durable Functions offer a code-first alternative to Step Functions' JSON-based state machines:
// Before: Step Functions state machine (JSON configuration)
{
  "Comment": "Order processing workflow",
  "StartAt": "ValidatePayment",
  "States": {
    "ValidatePayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-payment",
      "Next": "CheckInventory"
    },
    "CheckInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:check-inventory",
      "Next": "ProcessOrder"
    },
    "ProcessOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
      "End": true
    }
  }
}

// After: Durable Functions (JavaScript code)
const orderWorkflow = withDurableExecution(
  async (event, context) => {
    const paymentResult = await context.step('validate-payment', async () => {
      return validatePayment(event.paymentInfo);
    });
    const inventoryResult = await context.step('check-inventory', async () => {
      return checkInventory(event.items);
    });
    return await context.step('process-order', async () => {
      return processOrder(paymentResult, inventoryResult);
    });
  }
);
Gradual Migration Approach
- Phase 1: Identify simple, linear workflows
- Phase 2: Migrate workflows with minimal external dependencies
- Phase 3: Convert complex workflows with callbacks and parallel execution
- Phase 4: Optimize and consolidate related workflows
Conclusion: The Future of Serverless Workflows
AWS Lambda Durable Functions represent a paradigm shift in how we build long-running, stateful workflows in the cloud. They eliminate the complexity of managing state, retries, and coordination while maintaining the serverless promise of paying only for what you use.
Key Takeaways
Simplicity: Write workflows as straightforward async code instead of complex state machine configurations.
Reliability: Automatic checkpointing and replay ensure workflows survive failures and continue exactly where they left off.
Cost Efficiency: Pay only for execution time, not waiting time, making long-running workflows economically viable.
Scalability: Leverage Lambda's automatic scaling to handle thousands of concurrent workflows without infrastructure management.
Developer Experience: Local testing, familiar programming patterns, and comprehensive tooling make development and debugging straightforward.
When to Choose Durable Functions
Perfect for:
- Multi-step workflows with waits or callbacks
- Processes requiring human approval
- Data pipelines with checkpointing needs
- Saga patterns and distributed transactions
- Event-driven workflows with external dependencies
Consider alternatives for:
- Simple, fast operations (regular Lambda)
- Complex branching logic (Step Functions might be clearer)
- Workflows requiring visual design tools
- High-frequency, low-latency operations
Getting Started Today
- Start small: Begin with a simple workflow to understand the patterns
- Test locally: Use the testing framework for rapid development
- Monitor closely: Set up proper observability from day one
- Optimize gradually: Focus on correctness first, then optimize for cost and performance
- Plan for scale: Design with batching and efficient checkpointing in mind
AWS Lambda Durable Functions are more than just a new feature – they're a new way of thinking about serverless workflows. By combining the simplicity of code with the power of automatic state management, they open up possibilities that were previously complex or expensive to implement.
The future of serverless is not just about functions that scale to zero, but functions that can pause, persist, and resume – giving us the best of both worlds: the simplicity of serverless with the power of long-running processes.
