sbbhimji · pcrai-aws · Jun 24, 2026
diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md
@@ -27,7 +27,7 @@ For standard Lambda development, see [aws-lambda skill](../aws-lambda/). For SAM
 - **Thread safety**, **concurrency model**, **code review checklist**, **Powertools compatibility**, or **multi-concurrency readiness** -> see [references/thread-safety.md](references/thread-safety.md)
 - **Before/after code examples**, **runtime-specific migration** (Node.js, Python, Java, .NET), or **connection pooling** -> see [references/migration-patterns.md](references/migration-patterns.md)
 - **IAM roles**, **VPC setup**, **CLI commands**, **SAM template**, **CDK example**, or **scheduled scaling setup (EventBridge Scheduler)** -> see [references/infrastructure-setup.md](references/infrastructure-setup.md) and [scripts/setup-lmi.sh](scripts/setup-lmi.sh)
-- **Errors**, **throttling**, **debugging**, or **stuck deployments** -> see [references/troubleshooting.md](references/troubleshooting.md)
+- **Errors**, **throttling**, **debugging**, **stuck deployments**, **tuning configuration**, or **adjusting after deployment** -> see [references/troubleshooting.md](references/troubleshooting.md)
 
 ## Quick Decision: Is LMI Right for This Workload?
 
@@ -55,6 +55,38 @@ Gather these signals before recommending:
 6. **Concurrency readiness**: Thread safety (Node.js/Java/.NET)? Shared `/tmp` paths? Per-invocation DB connections?
 7. **VPC**: Already in a VPC? Private resource access needed?
 
+#### Deriving LMI Configuration from Metrics
+
+If Lambda Insights is enabled on the function, use the metrics below to calculate a starting configuration. If it is not enabled, suggest adding it to gather accurate workload data, but proceed only with the user's explicit confirmation, because adding the Insights layer may affect function performance or cold start times.
+
+To check whether Lambda Insights is enabled, look for a `LambdaInsightsExtension` layer on the function. To add it, find the latest layer ARN for your region in the [Lambda Insights documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights-extension-versionsx.html), add that layer to the function, and attach the `CloudWatchLambdaInsightsExecutionRolePolicy` managed policy to the function's execution role.
+
+**Target max concurrency** (from `cpu_total_time` and `Duration`):
+
+```
+PerExecutionEnvironmentMaxConcurrency = floor((0.5 × Duration) / cpu_total_time)
+```
+
+This targets 50% CPU utilization at full concurrency, leaving headroom for scaling.
+
+**Memory allocation** (from `memory_utilization` and current memory):
+
+```
+MemorySize = max(2048, MaxConcurrency × (memory_utilization / 100) × current_allocated_memory)
+```
+
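+Here `MaxConcurrency` is the `PerExecutionEnvironmentMaxConcurrency` value calculated above. The result overestimates because it assumes each concurrent invocation needs the full observed footprint, with no base memory shared between invocations, but it provides a safe starting point.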
+
+**Minimum execution environments** (from baseline `ConcurrentExecutions`):
+
+```
+MinExecutionEnvironments = max(3, ceil(baseline_concurrent_executions × 2 / MaxConcurrency))
+```
+
+This targets 50% concurrency utilization, leaving headroom for traffic bursts.
+
+**Without Lambda Insights:** Start with the runtime's default max concurrency, 2 GB memory, and MinExecutionEnvironments = 3. Adjust during testing.
+
 ### Step 2: Build the Cost Comparison
 
 REQUIRED: Present a cost comparison before recommending LMI. Compare at minimum:
@@ -72,7 +104,7 @@ For discount analysis (Savings Plans, Reserved Instances), refer users to the [A
 
 **Instance families** (~450 types): C-series (compute, .xlarge+), M-series (general, .large+), R-series (memory, .large+). ARM (Graviton) for best price-performance.
 
-**Memory-to-vCPU ratios**: 2:1 (compute), 4:1 (general, default), 8:1 (memory). Min 2 GB, max 32 GB.
+**Memory-to-vCPU ratios**: 2:1 (default, CPU-bound work), 4:1 (general/mixed workloads), 8:1 (memory-heavy or Python apps). Min 2 GB, max 32 GB.
 
 **Multi-concurrency defaults/vCPU**: Node.js 64, Java 32, .NET 32, Python 16.
 
@@ -108,16 +140,16 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for
 ### Step 6: Validate and Cut Over
 
 1. Deploy to a non-production environment first
-2. Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate
-3. Gradual traffic shift with weighted aliases (10% → 50% → 100%)
+2. Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate. If you observe low CPU utilization or ongoing throttles, see [references/troubleshooting.md](references/troubleshooting.md) for metric-specific adjustment guidance.
+3. Shift traffic to the LMI function (note: weighted alias shifting between LMI and non-LMI functions is not currently supported)
 4. Compare costs after 1-2 weeks of production data
 5. Decommission standard Lambda once stable
 
 ## Best Practices
 
 ### Configuration
 
-- Do: Start with 4:1 ratio and runtime default concurrency
+- Do: Start with 2:1 ratio and runtime default concurrency
 - Do: Use ARM (Graviton) unless x86 dependencies exist
 - Do: Let Lambda choose instance types unless specific hardware needed
 - Do: Set MaxVCpuCount to control cost ceiling
@@ -128,7 +160,7 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for
 
 - Do: Start with I/O-heavy functions (benefit most from multi-concurrency; CPU-bound functions compete for same CPU)
 - Do: Review code for concurrency safety before attaching to capacity provider (thread safety for Node.js/Java/.NET; `/tmp` and memory for Python)
-- Do: Use weighted aliases for gradual traffic shift
+- Do: Plan traffic shifting strategy based on your invocation source (weighted alias shifting between LMI and non-LMI functions is not currently supported)
 - Do: Include request IDs in all log statements
 - Do: Initialize DB pools and SDK clients outside the handler
 - Do: Estimate total `/tmp` usage under max concurrency

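The three formulas added under "Deriving LMI Configuration from Metrics" can be checked with a short calculation. The following is a minimal sketch, not part of the PR: the function name `derive_lmi_config` and its parameter names are hypothetical, `Duration` and `cpu_total_time` are assumed to be in the same time unit, and memory is assumed to be in MB. Clamping the concurrency to at least 1 is an added assumption to avoid dividing by zero for CPU-bound functions. The sketch does not round memory to a ratio multiple or enforce the 32 GB maximum.

```python
import math


def derive_lmi_config(duration_ms, cpu_total_time_ms, memory_utilization_pct,
                      current_memory_mb, baseline_concurrent_executions):
    """Apply the three starting-point formulas from the SKILL.md section."""
    # Target 50% CPU utilization at full concurrency.
    max_concurrency = math.floor((0.5 * duration_ms) / cpu_total_time_ms)
    # Assumption (not in the PR): never go below 1 concurrent invocation.
    max_concurrency = max(1, max_concurrency)

    # Assume every concurrent invocation needs the full observed footprint.
    per_invocation_mb = (memory_utilization_pct / 100) * current_memory_mb
    memory_size_mb = max(2048, math.ceil(max_concurrency * per_invocation_mb))

    # Target 50% concurrency utilization, with a floor of 3 environments.
    min_envs = max(
        3, math.ceil(baseline_concurrent_executions * 2 / max_concurrency)
    )

    return {
        "PerExecutionEnvironmentMaxConcurrency": max_concurrency,
        "MemorySize": memory_size_mb,
        "MinExecutionEnvironments": min_envs,
    }


if __name__ == "__main__":
    # I/O-heavy example: 200 ms duration, 10 ms of CPU time per invocation.
    print(derive_lmi_config(200, 10, 50, 1024, 40))
```

For the I/O-heavy example, the formulas give a max concurrency of 10, a memory size of 5120 MB, and 8 minimum execution environments. A CPU-bound function (for instance 80 ms of CPU in a 100 ms invocation) falls back to the floors: concurrency 1, 2048 MB, and 3 environments.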
diff --git a/...erverless/skills/aws-lambda-managed-instances/references/configuration-guide.md b/...erverless/skills/aws-lambda-managed-instances/references/configuration-guide.md
@@ -5,17 +5,17 @@
 - **CPU-intensive** (encoding, ML, compression) → C-series, 2:1 ratio, concurrency=1/vCPU
 - **Memory-intensive** (caching, large datasets) → R-series, 8:1 ratio
 - **Network-intensive** (streaming, data transfer) → Use AllowedInstanceTypes for n-suffix types, 4:1 ratio
-- **General/balanced** (web APIs, microservices) → M-series, 4:1 ratio, default concurrency
+- **General/balanced** (web APIs, microservices) → M-series, 2:1 ratio (default), default concurrency
 
 Architecture: ARM (Graviton, g-suffix) for price-performance. x86 (i=Intel, a=AMD) when dependencies require it.
 
 ## Memory-to-vCPU Ratios
 
-| Ratio | Profile | When to use                | Memory examples       |
-| ----- | ------- | -------------------------- | --------------------- |
-| 2:1   | Compute | CPU-bound work             | 2GB/1vCPU, 4GB/2vCPU  |
-| 4:1   | General | Most workloads (default)   | 4GB/1vCPU, 8GB/2vCPU  |
-| 8:1   | Memory  | Caching, data, Python apps | 8GB/1vCPU, 16GB/2vCPU |
+| Ratio | Profile | When to use                      | Memory examples       |
+| ----- | ------- | -------------------------------- | --------------------- |
+| 2:1   | Compute | CPU-bound work (default)         | 2GB/1vCPU, 4GB/2vCPU  |
+| 4:1   | General | Mixed CPU/memory-heavy workloads | 4GB/1vCPU, 8GB/2vCPU  |
+| 8:1   | Memory  | Memory-heavy or Python apps      | 8GB/1vCPU, 16GB/2vCPU |
 
 Min: 2 GB / 1 vCPU. Max: 32 GB. Memory must align with ratio multiples.
 

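The ratio table in configuration-guide.md implies a simple relationship: vCPUs = memory (GB) / ratio, with memory between 2 GB and 32 GB and aligned to a multiple of the ratio. The following is a minimal sketch of that arithmetic, assuming only the constraints stated in the table. The helper name `vcpus_for` is hypothetical, and the service may enforce additional limits not shown here.

```python
ALLOWED_RATIOS = (2, 4, 8)  # memory-to-vCPU ratios listed in the table
MIN_MEMORY_GB = 2
MAX_MEMORY_GB = 32


def vcpus_for(memory_gb, ratio):
    """Return the vCPU count implied by a memory size and a ratio."""
    if ratio not in ALLOWED_RATIOS:
        raise ValueError(f"ratio must be one of {ALLOWED_RATIOS}")
    if not MIN_MEMORY_GB <= memory_gb <= MAX_MEMORY_GB:
        raise ValueError("memory must be between 2 GB and 32 GB")
    if memory_gb % ratio != 0:
        raise ValueError("memory must be a multiple of the ratio")
    return memory_gb // ratio


if __name__ == "__main__":
    # Doubling memory while doubling the ratio keeps the vCPU count fixed.
    print(vcpus_for(4, 2), vcpus_for(8, 4))
```

This is the same arithmetic as the troubleshooting guidance for memory throttles: moving from 4 GB at 2:1 to 8 GB at 4:1 doubles memory while keeping 2 vCPUs.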
diff --git a/...ws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md b/...ws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md
@@ -1,5 +1,35 @@
 # LMI Troubleshooting
 
+## Testing Phase: Monitor and Adjust
+
+After deploying your LMI function with a test workload, check these metrics and adjust:
+
+**Duration increased vs. existing function:**
+- This suggests the concurrency estimates used during setup may be off. Investigate by:
+  - Checking ExecutionEnvironmentCPUUtilization and ExecutionEnvironmentMemoryUtilization for saturation
+  - Reducing PerExecutionEnvironmentMaxConcurrency to see if duration improves
+  - Reviewing instance types — switching to larger or more powerful instances may help if resources are constrained
+- If reducing concurrency doesn't help, check throttle metrics below
+
+**Low ExecutionEnvironmentCPUUtilization (below 10%):**
+- Increase PerExecutionEnvironmentMaxConcurrency to improve utilization
+- Or lower MemorySize to reduce vCPUs per execution environment
+- If memory utilization is also high, increase the ExecutionEnvironmentMemoryGiBPerVCpu ratio instead
+
+**Ongoing CPUThrottles:**
+- Switch capacity provider to Manual scaling mode with a lower CPU utilization target (e.g., 25%)
+
+**Ongoing MemoryThrottles:**
+- Increase MemorySize
+- To maintain the same vCPU count, adjust ratio proportionally (e.g., 4GB/2:1 → 8GB/4:1 keeps 2 vCPUs)
+
+**Ongoing DiskThrottles:**
+- Reduce per-invocation /tmp usage or reduce PerExecutionEnvironmentMaxConcurrency
+
+**Ongoing ConcurrencyThrottles:**
+- Increase PerExecutionEnvironmentMaxConcurrency (if CPU and memory have headroom)
+- Check if MaxExecutionEnvironments or MaxVCpuCount is capping scale-out
+
 ## Common Issues
 
 | Issue                          | Cause                                            | Resolution                                                                          |