@@ -27,7 +27,7 @@ For standard Lambda development, see [aws-lambda skill](../aws-lambda/). For SAM
- **Thread safety**, **concurrency model**, **code review checklist**, **Powertools compatibility**, or **multi-concurrency readiness** -> see [references/thread-safety.md](references/thread-safety.md)
- **Before/after code examples**, **runtime-specific migration** (Node.js, Python, Java, .NET), or **connection pooling** -> see [references/migration-patterns.md](references/migration-patterns.md)
- **IAM roles**, **VPC setup**, **CLI commands**, **SAM template**, **CDK example**, or **scheduled scaling setup (EventBridge Scheduler)** -> see [references/infrastructure-setup.md](references/infrastructure-setup.md) and [scripts/setup-lmi.sh](scripts/setup-lmi.sh)
- **Errors**, **throttling**, **debugging**, or **stuck deployments** -> see [references/troubleshooting.md](references/troubleshooting.md)
- **Errors**, **throttling**, **debugging**, **stuck deployments**, **tuning configuration**, or **adjusting after deployment** -> see [references/troubleshooting.md](references/troubleshooting.md)

## Quick Decision: Is LMI Right for This Workload?

@@ -55,6 +55,38 @@ Gather these signals before recommending:
6. **Concurrency readiness**: Thread safety (Node.js/Java/.NET)? Shared `/tmp` paths? Per-invocation DB connections?
7. **VPC**: Already in a VPC? Private resource access needed?

#### Deriving LMI Configuration from Metrics

If Lambda Insights is enabled on the function, use the metrics below to calculate a starting configuration. If it is not enabled, suggest adding it to gather accurate workload data, but proceed only with the user's explicit confirmation, because the Insights layer may affect function performance or cold start times.

To check if Lambda Insights is enabled, look for a LambdaInsightsExtension layer on the function. To add it, find the latest layer ARN for your region from the [Lambda Insights documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights-extension-versionsx.html) and attach the `CloudWatchLambdaInsightsExecutionRolePolicy` managed policy to the function's execution role.
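
The layer check described above can be sketched as a small helper. This is a minimal sketch, assuming the input dictionary has the shape of a Lambda function configuration (for example, the response of boto3's `get_function_configuration`), where each attached layer appears under `Layers` with an `Arn` key. The function name `has_lambda_insights` and the sample ARN (account ID, region, version) are hypothetical.

```python
def has_lambda_insights(function_config: dict) -> bool:
    """Return True if any attached layer ARN names the Lambda Insights extension."""
    layers = function_config.get("Layers") or []
    return any("LambdaInsightsExtension" in layer.get("Arn", "") for layer in layers)


# Illustrative configuration fragment; the ARN below is made up for the example.
config = {
    "FunctionName": "example-fn",
    "Layers": [
        {"Arn": "arn:aws:lambda:us-east-1:123456789012:layer:LambdaInsightsExtension:1"}
    ],
}
print(has_lambda_insights(config))
```

Matching on the substring rather than the full ARN keeps the check independent of region, account, and layer version.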

**Target max concurrency** (from `cpu_total_time` and `Duration`):

```
PerExecutionEnvironmentMaxConcurrency = floor((0.5 × Duration) / cpu_total_time)
```

This targets 50% CPU utilization at full concurrency, leaving headroom for scaling.
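
As a worked illustration of the formula above, the following sketch computes the target concurrency from average `Duration` and `cpu_total_time`. It assumes both values are in the same unit (milliseconds here). The function name, the `cpu_target` parameter, and the clamp to a minimum of 1 are assumptions added for the example, not part of the original formula.

```python
import math


def target_max_concurrency(duration_ms: float, cpu_total_time_ms: float,
                           cpu_target: float = 0.5) -> int:
    """floor((cpu_target * Duration) / cpu_total_time), clamped to at least 1."""
    if duration_ms <= 0 or cpu_total_time_ms <= 0:
        raise ValueError("duration and cpu_total_time must be positive")
    raw = math.floor((cpu_target * duration_ms) / cpu_total_time_ms)
    # Assumption: a CPU-bound function can floor to 0, so fall back to 1.
    return max(1, raw)


# I/O-heavy: 200 ms wall time, 10 ms CPU time per invocation.
print(target_max_concurrency(200, 10))   # 10
# CPU-bound: 100 ms wall time, 90 ms CPU time per invocation.
print(target_max_concurrency(100, 90))   # 1
```

The two sample calls show why I/O-heavy functions benefit most: the less CPU time an invocation uses relative to its duration, the more invocations can share one execution environment.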

**Memory allocation** (from `memory_utilization` and current memory):

```
MemorySize = max(2048, MaxConcurrency × (memory_utilization / 100) × current_allocated_memory)
```

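Here `MaxConcurrency` is the `PerExecutionEnvironmentMaxConcurrency` value computed above, and memory is in MB. The estimate is deliberately high because it assumes concurrent invocations share no base memory, so it is a safe starting point.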
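
The memory formula can be sketched the same way. This is an illustrative helper, assuming memory values are in MB and `memory_utilization` is a percentage (0 to 100). The function and parameter names are hypothetical, and the result is not rounded to a ratio multiple, which may still be needed before deploying.

```python
import math


def starting_memory_mb(max_concurrency: int, memory_utilization_pct: float,
                       current_allocated_mb: int, floor_mb: int = 2048) -> int:
    """max(2048, MaxConcurrency * (memory_utilization / 100) * current memory)."""
    estimate = max_concurrency * (memory_utilization_pct / 100) * current_allocated_mb
    return max(floor_mb, math.ceil(estimate))


# 16 concurrent invocations, each using about 50% of a 1024 MB function.
print(starting_memory_mb(16, 50, 1024))  # 8192
# A small footprint falls back to the 2048 MB floor.
print(starting_memory_mb(2, 25, 512))    # 2048
```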

**Minimum execution environments** (from baseline `ConcurrentExecutions`):

```
MinExecutionEnvironments = max(3, ceil(baseline_concurrent_executions × 2 / MaxConcurrency))
```

The factor of 2 targets 50% concurrency utilization at baseline traffic, leaving headroom for bursts. The result is never lower than 3 execution environments.
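
A short sketch of the minimum-environments formula, with hypothetical function and parameter names. It assumes `baseline_concurrent_executions` is a typical (not peak) value of the `ConcurrentExecutions` metric.

```python
import math


def min_execution_environments(baseline_concurrent_executions: float,
                               max_concurrency: int, floor_envs: int = 3) -> int:
    """max(3, ceil(baseline * 2 / MaxConcurrency))."""
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    needed = math.ceil(baseline_concurrent_executions * 2 / max_concurrency)
    return max(floor_envs, needed)


# Baseline of 100 concurrent executions at 10 per environment.
print(min_execution_environments(100, 10))  # 20
# Low traffic falls back to the floor of 3.
print(min_execution_environments(5, 10))    # 3
```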

**Without Lambda Insights:** Start with the runtime's default max concurrency, 2 GB memory, and MinExecutionEnvironments = 3. Adjust during testing.

### Step 2: Build the Cost Comparison

REQUIRED: Present a cost comparison before recommending LMI. Compare at minimum:
@@ -72,7 +104,7 @@ For discount analysis (Savings Plans, Reserved Instances), refer users to the [A

**Instance families** (~450 types): C-series (compute, .xlarge+), M-series (general, .large+), R-series (memory, .large+). ARM (Graviton) for best price-performance.

**Memory-to-vCPU ratios**: 2:1 (compute), 4:1 (general, default), 8:1 (memory). Min 2 GB, max 32 GB.
**Memory-to-vCPU ratios**: 2:1 (default, CPU-bound work), 4:1 (general/mixed workloads), 8:1 (memory-heavy or Python apps). Min 2 GB, max 32 GB.

**Multi-concurrency defaults/vCPU**: Node.js 64, Java 32, .NET 32, Python 16.

@@ -108,16 +140,16 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for
### Step 6: Validate and Cut Over

1. Deploy to a non-production environment first
2. Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate
3. Gradual traffic shift with weighted aliases (10% → 50% → 100%)
2. Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate. If you observe low CPU utilization or ongoing throttles, see [references/troubleshooting.md](references/troubleshooting.md) for metric-specific adjustment guidance.
3. Shift traffic to the LMI function (note: weighted alias shifting between LMI and non-LMI functions is not currently supported)
4. Compare costs after 1-2 weeks of production data
5. Decommission standard Lambda once stable

## Best Practices

### Configuration

- Do: Start with 4:1 ratio and runtime default concurrency
- Do: Start with 2:1 ratio and runtime default concurrency
- Do: Use ARM (Graviton) unless x86 dependencies exist
- Do: Let Lambda choose instance types unless specific hardware needed
- Do: Set MaxVCpuCount to control cost ceiling
@@ -128,7 +160,7 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for

- Do: Start with I/O-heavy functions (benefit most from multi-concurrency; CPU-bound functions compete for same CPU)
- Do: Review code for concurrency safety before attaching to capacity provider (thread safety for Node.js/Java/.NET; `/tmp` and memory for Python)
- Do: Use weighted aliases for gradual traffic shift
- Do: Plan traffic shifting strategy based on your invocation source (weighted alias shifting between LMI and non-LMI functions is not currently supported)
- Do: Include request IDs in all log statements
- Do: Initialize DB pools and SDK clients outside the handler
- Do: Estimate total `/tmp` usage under max concurrency
@@ -5,17 +5,17 @@
- **CPU-intensive** (encoding, ML, compression) → C-series, 2:1 ratio, concurrency=1/vCPU
- **Memory-intensive** (caching, large datasets) → R-series, 8:1 ratio
- **Network-intensive** (streaming, data transfer) → Use AllowedInstanceTypes for n-suffix types, 4:1 ratio
- **General/balanced** (web APIs, microservices) → M-series, 4:1 ratio, default concurrency
- **General/balanced** (web APIs, microservices) → M-series, 2:1 ratio (default), default concurrency

Architecture: ARM (Graviton, g-suffix) for price-performance. x86 (i=Intel, a=AMD) when dependencies require it.

## Memory-to-vCPU Ratios

| Ratio | Profile | When to use | Memory examples |
| ----- | ------- | -------------------------- | --------------------- |
| 2:1 | Compute | CPU-bound work | 2GB/1vCPU, 4GB/2vCPU |
| 4:1 | General | Most workloads (default) | 4GB/1vCPU, 8GB/2vCPU |
| 8:1 | Memory | Caching, data, Python apps | 8GB/1vCPU, 16GB/2vCPU |
| Ratio | Profile | When to use | Memory examples |
| ----- | ------- | -------------------------------- | --------------------- |
| 2:1 | Compute | CPU-bound work (default) | 2GB/1vCPU, 4GB/2vCPU |
| 4:1 | General | Mixed CPU/memory-heavy workloads | 4GB/1vCPU, 8GB/2vCPU |
| 8:1 | Memory | Memory-heavy or Python apps | 8GB/1vCPU, 16GB/2vCPU |

Min: 2 GB / 1 vCPU. Max: 32 GB. Memory must align with ratio multiples.
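
The limits above can be expressed as a validation helper. This is a minimal sketch under one stated assumption: "align with ratio multiples" is read as "memory in GB must be a whole multiple of the ratio", which matches the table examples. The function name and constants are hypothetical.

```python
ALLOWED_RATIOS = (2, 4, 8)  # memory GB per vCPU, from the table above
MIN_MEMORY_GB = 2
MAX_MEMORY_GB = 32


def vcpus_for(memory_gb: int, ratio: int) -> int:
    """Return the vCPU count for a memory size and ratio, or raise if invalid."""
    if ratio not in ALLOWED_RATIOS:
        raise ValueError(f"ratio must be one of {ALLOWED_RATIOS}")
    if not MIN_MEMORY_GB <= memory_gb <= MAX_MEMORY_GB:
        raise ValueError("memory must be between 2 GB and 32 GB")
    if memory_gb % ratio != 0:
        # Assumed meaning of "align with ratio multiples".
        raise ValueError(f"{memory_gb} GB is not a multiple of the {ratio}:1 ratio")
    return memory_gb // ratio


print(vcpus_for(4, 2))   # 2 vCPUs at 2:1
print(vcpus_for(8, 4))   # 2 vCPUs at 4:1
print(vcpus_for(16, 8))  # 2 vCPUs at 8:1
```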

@@ -1,5 +1,35 @@
# LMI Troubleshooting

## Testing Phase: Monitor and Adjust

After deploying your LMI function with a test workload, check these metrics and adjust:

**Duration increased vs. existing function:**
- This indicates the concurrency estimates used during setup may be off. Investigate by:
  - Checking ExecutionEnvironmentCPUUtilization and ExecutionEnvironmentMemoryUtilization for saturation
  - Reducing PerExecutionEnvironmentMaxConcurrency to see if duration improves
  - Reviewing instance types: switching to larger or more powerful instances may help if resources are constrained
- If reducing concurrency doesn't help, check the throttle metrics below

**Low ExecutionEnvironmentCPUUtilization (below 10%):**
- Increase PerExecutionEnvironmentMaxConcurrency to improve utilization
- Or lower MemorySize to reduce vCPUs per execution environment
- If memory utilization is high, increase the ExecutionEnvironmentMemoryGiBPerVCpu ratio instead of lowering MemorySize

**Ongoing CPUThrottles:**
- Switch capacity provider to Manual scaling mode with a lower CPU utilization target (e.g., 25%)

**Ongoing MemoryThrottles:**
- Increase MemorySize
- To maintain the same vCPU count, adjust ratio proportionally (e.g., 4GB/2:1 → 8GB/4:1 keeps 2 vCPUs)
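
The proportional adjustment in the last bullet can be checked with a few lines of arithmetic. This sketch assumes the allowed ratios are 2, 4, and 8 GB per vCPU, as listed in the instance-selection reference. The function name is hypothetical.

```python
def ratio_to_keep_vcpus(current_memory_gb: int, current_ratio: int,
                        new_memory_gb: int) -> int:
    """Return the memory-to-vCPU ratio that keeps the vCPU count unchanged."""
    vcpus, rem = divmod(current_memory_gb, current_ratio)
    if rem or vcpus < 1:
        raise ValueError("current memory is not a multiple of the current ratio")
    new_ratio, rem = divmod(new_memory_gb, vcpus)
    if rem or new_ratio not in (2, 4, 8):
        raise ValueError("no allowed ratio keeps the same vCPU count")
    return new_ratio


# The example from the bullet: 4 GB at 2:1 (2 vCPUs) grows to 8 GB.
print(ratio_to_keep_vcpus(4, 2, 8))  # 4
```

If no allowed ratio fits, the helper raises instead of guessing, which signals that the vCPU count has to change along with the memory size.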

**Ongoing DiskThrottles:**
- Reduce per-invocation /tmp usage or reduce PerExecutionEnvironmentMaxConcurrency

**Ongoing ConcurrencyThrottles:**
- Increase PerExecutionEnvironmentMaxConcurrency (if CPU and memory have headroom)
- Check if MaxExecutionEnvironments or MaxVCpuCount is capping scale-out
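
The checks in this section can be summarized as a lookup from observed metrics to suggested adjustments. This is an illustrative sketch, not a tool that ships with LMI: the function name is hypothetical, the input is a plain dictionary keyed by the metric names used above, and treating any throttle count above zero as "ongoing" is an assumption.

```python
def suggest_adjustments(metrics: dict) -> list[str]:
    """Map observed LMI metrics to the adjustments listed in this section."""
    suggestions = []

    cpu = metrics.get("ExecutionEnvironmentCPUUtilization")
    if cpu is not None and cpu < 10:  # the 10% threshold comes from the text above
        suggestions.append(
            "Increase PerExecutionEnvironmentMaxConcurrency or lower MemorySize"
        )

    throttle_advice = {
        "CPUThrottles": "Use Manual scaling mode with a lower CPU utilization target",
        "MemoryThrottles": "Increase MemorySize",
        "DiskThrottles": "Reduce /tmp usage or PerExecutionEnvironmentMaxConcurrency",
        "ConcurrencyThrottles": "Increase PerExecutionEnvironmentMaxConcurrency "
                                "or check MaxExecutionEnvironments and MaxVCpuCount",
    }
    for name, advice in throttle_advice.items():
        # Assumption: any nonzero count in the observation window is "ongoing".
        if metrics.get(name, 0) > 0:
            suggestions.append(advice)

    return suggestions


for line in suggest_adjustments(
    {"ExecutionEnvironmentCPUUtilization": 40, "MemoryThrottles": 3}
):
    print(line)
```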

## Common Issues

| Issue | Cause | Resolution |