From 7d6eeee97d9846474e6a5163cc80ff877452f3f5 Mon Sep 17 00:00:00 2001
From: Craig Phillips
Date: Wed, 24 Jun 2026 23:51:44 +0000
Subject: [PATCH] Add migration assessment formulas and fix LMI configuration defaults

- Add formulas for deriving MaxConcurrency, MemorySize, and MinExecutionEnvironments from Lambda Insights metrics (cpu_total_time, memory_utilization, ConcurrentExecutions)
- Add fallback guidance when Lambda Insights is not available
- Add Lambda Insights enablement instructions
- Fix default memory-to-vCPU ratio from 4:1 to 2:1 (matches public docs and CDK defaults)
- Fix weighted alias traffic shifting claim (not currently supported for LMI)
- Add testing phase troubleshooting guidance (CPUThrottles, MemoryThrottles, DiskThrottles, ConcurrencyThrottles, low CPU utilization)
- Update Best Practices to align with corrected defaults
---
 .../aws-lambda-managed-instances/SKILL.md     | 44 ++++++++++++++++---
 .../references/configuration-guide.md         | 12 ++---
 .../references/troubleshooting.md             | 30 +++++++++++++
 3 files changed, 74 insertions(+), 12 deletions(-)

diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md
index 22d675e9..2e6659ab 100644
--- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md
+++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md
@@ -27,7 +27,7 @@ For standard Lambda development, see [aws-lambda skill](../aws-lambda/). For SAM
 - **Thread safety**, **concurrency model**, **code review checklist**, **Powertools compatibility**, or **multi-concurrency readiness** -> see [references/thread-safety.md](references/thread-safety.md)
 - **Before/after code examples**, **runtime-specific migration** (Node.js, Python, Java, .NET), or **connection pooling** -> see [references/migration-patterns.md](references/migration-patterns.md)
 - **IAM roles**, **VPC setup**, **CLI commands**, **SAM template**, **CDK example**, or **scheduled scaling setup (EventBridge Scheduler)** -> see [references/infrastructure-setup.md](references/infrastructure-setup.md) and [scripts/setup-lmi.sh](scripts/setup-lmi.sh)
-- **Errors**, **throttling**, **debugging**, or **stuck deployments** -> see [references/troubleshooting.md](references/troubleshooting.md)
+- **Errors**, **throttling**, **debugging**, **stuck deployments**, **tuning configuration**, or **adjusting after deployment** -> see [references/troubleshooting.md](references/troubleshooting.md)
 
 ## Quick Decision: Is LMI Right for This Workload?
 
@@ -55,6 +55,38 @@ Gather these signals before recommending:
 6. **Concurrency readiness**: Thread safety (Node.js/Java/.NET)? Shared `/tmp` paths? Per-invocation DB connections?
 7. **VPC**: Already in a VPC? Private resource access needed?
 
+#### Deriving LMI Configuration from Metrics
+
+If Lambda Insights is enabled on the function, use these metrics to calculate your starting configuration. If Lambda Insights is not enabled, suggest adding it to gather accurate workload data — but only proceed with the user's explicit confirmation, as adding the Insights layer may affect function performance or cold start times.
+
+To check if Lambda Insights is enabled, look for a LambdaInsightsExtension layer on the function. To add it, find the latest layer ARN for your region from the [Lambda Insights documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights-extension-versionsx.html) and attach the `CloudWatchLambdaInsightsExecutionRolePolicy` managed policy to the function's execution role.
+
+**Target max concurrency** (from `cpu_total_time` and `Duration`):
+
+```
+PerExecutionEnvironmentMaxConcurrency = floor((0.5 × Duration) / cpu_total_time)
+```
+
+This targets 50% CPU utilization at full concurrency, leaving headroom for scaling.
+
+**Memory allocation** (from `memory_utilization` and current memory):
+
+```
+MemorySize = max(2048, MaxConcurrency × (memory_utilization / 100) × current_allocated_memory)
+```
+
+This overestimates (assumes no shared base memory) but provides a safe starting point.
+
+**Minimum execution environments** (from baseline `ConcurrentExecutions`):
+
+```
+MinExecutionEnvironments = max(3, ceil(baseline_concurrent_executions × 2 / MaxConcurrency))
+```
+
+Targets 50% concurrency utilization to leave headroom for traffic bursts.
+
+**Without Lambda Insights:** Start with the runtime's default max concurrency, 2 GB memory, and MinExecutionEnvironments = 3. Adjust during testing.
+
 ### Step 2: Build the Cost Comparison
 
 REQUIRED: Present a cost comparison before recommending LMI. Compare at minimum:
@@ -72,7 +104,7 @@ For discount analysis (Savings Plans, Reserved Instances), refer users to the [A
 
 **Instance families** (~450 types): C-series (compute, .xlarge+), M-series (general, .large+), R-series (memory, .large+). ARM (Graviton) for best price-performance.
 
-**Memory-to-vCPU ratios**: 2:1 (compute), 4:1 (general, default), 8:1 (memory). Min 2 GB, max 32 GB.
+**Memory-to-vCPU ratios**: 2:1 (default, CPU-bound work), 4:1 (general/mixed workloads), 8:1 (memory-heavy or Python apps). Min 2 GB, max 32 GB.
 
 **Multi-concurrency defaults/vCPU**: Node.js 64, Java 32, .NET 32, Python 16.
 
@@ -108,8 +140,8 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for
 ### Step 6: Validate and Cut Over
 
 1. Deploy to a non-production environment first
-2. Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate
-3. Gradual traffic shift with weighted aliases (10% → 50% → 100%)
+2. Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate. If you observe low CPU utilization or ongoing throttles, see [references/troubleshooting.md](references/troubleshooting.md) for metric-specific adjustment guidance.
+3. Shift traffic to the LMI function (note: weighted alias shifting between LMI and non-LMI functions is not currently supported)
 4. Compare costs after 1-2 weeks of production data
 5. Decommission standard Lambda once stable
 
@@ -117,7 +149,7 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for
 
 ### Configuration
 
-- Do: Start with 4:1 ratio and runtime default concurrency
+- Do: Start with 2:1 ratio and runtime default concurrency
 - Do: Use ARM (Graviton) unless x86 dependencies exist
 - Do: Let Lambda choose instance types unless specific hardware needed
 - Do: Set MaxVCpuCount to control cost ceiling
@@ -128,7 +160,7 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for
 
 - Do: Start with I/O-heavy functions (benefit most from multi-concurrency; CPU-bound functions compete for same CPU)
 - Do: Review code for concurrency safety before attaching to capacity provider (thread safety for Node.js/Java/.NET; `/tmp` and memory for Python)
-- Do: Use weighted aliases for gradual traffic shift
+- Do: Plan traffic shifting strategy based on your invocation source (weighted alias shifting between LMI and non-LMI functions is not currently supported)
 - Do: Include request IDs in all log statements
 - Do: Initialize DB pools and SDK clients outside the handler
 - Do: Estimate total `/tmp` usage under max concurrency
diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md
index dcff8362..8e66a5be 100644
--- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md
+++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md
@@ -5,17 +5,17 @@
 - **CPU-intensive** (encoding, ML, compression) → C-series, 2:1 ratio, concurrency=1/vCPU
 - **Memory-intensive** (caching, large datasets) → R-series, 8:1 ratio
 - **Network-intensive** (streaming, data transfer) → Use AllowedInstanceTypes for n-suffix types, 4:1 ratio
-- **General/balanced** (web APIs, microservices) → M-series, 4:1 ratio, default concurrency
+- **General/balanced** (web APIs, microservices) → M-series, 2:1 ratio (default), default concurrency
 
 Architecture: ARM (Graviton, g-suffix) for price-performance. x86 (i=Intel, a=AMD) when dependencies require it.
 
 ## Memory-to-vCPU Ratios
 
-| Ratio | Profile | When to use                | Memory examples       |
-| ----- | ------- | -------------------------- | --------------------- |
-| 2:1   | Compute | CPU-bound work             | 2GB/1vCPU, 4GB/2vCPU  |
-| 4:1   | General | Most workloads (default)   | 4GB/1vCPU, 8GB/2vCPU  |
-| 8:1   | Memory  | Caching, data, Python apps | 8GB/1vCPU, 16GB/2vCPU |
+| Ratio | Profile | When to use                      | Memory examples       |
+| ----- | ------- | -------------------------------- | --------------------- |
+| 2:1   | Compute | CPU-bound work (default)         | 2GB/1vCPU, 4GB/2vCPU  |
+| 4:1   | General | Mixed CPU/memory-heavy workloads | 4GB/1vCPU, 8GB/2vCPU  |
+| 8:1   | Memory  | Memory-heavy or Python apps      | 8GB/1vCPU, 16GB/2vCPU |
 
 Min: 2 GB / 1 vCPU. Max: 32 GB. Memory must align with ratio multiples.
 
diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md
index d1055c7b..67f10be5 100644
--- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md
+++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md
@@ -1,5 +1,35 @@
 # LMI Troubleshooting
 
+## Testing Phase: Monitor and Adjust
+
+After deploying your LMI function with a test workload, check these metrics and adjust:
+
+**Duration increased vs. existing function:**
+- This indicates the concurrency estimations used during setup may be off. Investigate by:
+  - Checking ExecutionEnvironmentCPUUtilization and ExecutionEnvironmentMemoryUtilization for saturation
+  - Reducing PerExecutionEnvironmentMaxConcurrency to see if duration improves
+  - Reviewing instance types — switching to larger or more powerful instances may help if resources are constrained
+- If reducing concurrency doesn't help, check throttle metrics below
+
+**Low ExecutionEnvironmentCPUUtilization (below 10%):**
+- Increase PerExecutionEnvironmentMaxConcurrency to improve utilization
+- Or lower MemorySize to reduce vCPUs per execution environment
+- If memory utilization is also high, increase ExecutionEnvironmentMemoryGiBPerVCpu ratio instead
+
+**Ongoing CPUThrottles:**
+- Switch capacity provider to Manual scaling mode with a lower CPU utilization target (e.g., 25%)
+
+**Ongoing MemoryThrottles:**
+- Increase MemorySize
+- To maintain the same vCPU count, adjust ratio proportionally (e.g., 4GB/2:1 → 8GB/4:1 keeps 2 vCPUs)
+
+**Ongoing DiskThrottles:**
+- Reduce per-invocation /tmp usage or reduce PerExecutionEnvironmentMaxConcurrency
+
+**Ongoing ConcurrencyThrottles:**
+- Increase PerExecutionEnvironmentMaxConcurrency (if CPU and memory have headroom)
+- Check if MaxExecutionEnvironments or MaxVCpuCount is capping scale-out
+
 ## Common Issues
 
 | Issue | Cause | Resolution |