[AzureBackup] Fix telemetry misclassification, improve error UX, add command-boundary validation#2880
Conversation
- Update argument-hint to cover both telemetry and customer reports - Align icmdataro cluster name to icmdataro.centralus (matches query syntax) - Add field casing note to kql-customer-queries.md with cross-ref to kql-queries.md
…command-boundary validation - DppBackupOperations: InvalidOperationException -> KeyNotFoundException for policy-not-found fallback (fixes telemetry misclassification as MCP Bug instead of Customer error) - BackupStatusCommand: Improve 403 error message with RBAC hint (backupStatus/action permission, Backup Reader role, vault_get+protecteditem_get alternative) - VaultGetCommand: Add 403 error handler with RBAC guidance - GovernanceFindUnprotectedCommand: Add GetErrorMessage override with 403 RBAC hint - PolicyCreateValidator: Reject unknown --workload-type at command boundary with actionable list of supported types (fixes microsoft#2727) - ProtectedItemProtectCommand: Add --datasource-type validator checking both RSV and DPP registries at command boundary (fixes microsoft#2727) Telemetry context: Jun 9-15 analysis showed 16 MCP Tool Bugs. After PR microsoft#2805 (merged), the remaining InvalidOperationException in DppBackupOperations was the last misclassified throw. External customer failures (Antares Capital 6x 403 on backup_status, SIEMENS mcp-soc-scanner ValidationError) drove the error UX improvements.
There was a problem hiding this comment.
Pull request overview
This PR updates the Azure Backup MCP tool to (1) reduce telemetry “MCP Tool Bug” misclassification by using more appropriate exception types, (2) provide more actionable customer-facing authorization errors, and (3) add command-boundary validation for workload/datasource tokens to prevent service-layer ArgumentException surfacing as 500s.
Changes:
- Reclassify DPP policy “not found / not parseable” fallback from
InvalidOperationExceptiontoKeyNotFoundExceptionfor more accurate telemetry. - Improve 403 (Forbidden) UX for
backup status,vault get, andgovernance find-unprotected. - Add command-boundary validation for
--workload-type(policy create validator) and--datasource-type(protecteditem protect).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/Azure.Mcp.Tools.AzureBackup/src/Services/Policy/PolicyCreateValidator.cs | Reject unknown --workload-type early with an actionable 400-style validation issue. |
| tools/Azure.Mcp.Tools.AzureBackup/src/Services/DppBackupOperations.cs | Throw KeyNotFoundException for policy-not-found fallback to improve telemetry classification. |
| tools/Azure.Mcp.Tools.AzureBackup/src/Commands/Vault/VaultGetCommand.cs | Add 403 handler with RBAC guidance. |
| tools/Azure.Mcp.Tools.AzureBackup/src/Commands/ProtectedItem/ProtectedItemProtectCommand.cs | Add --datasource-type validator at the command boundary. |
| tools/Azure.Mcp.Tools.AzureBackup/src/Commands/Governance/GovernanceFindUnprotectedCommand.cs | Add command-specific error messaging (ArgumentException + 403 guidance). |
| tools/Azure.Mcp.Tools.AzureBackup/src/Commands/Backup/BackupStatusCommand.cs | Improve 403 guidance with required permission and alternate workflow. |
| tools/Azure.Mcp.Tools.AzureBackup/skills/azurebackup-telemetry-report/SKILL.md | Broaden skill “argument-hint” wording and clarify C360 cluster reference. |
| tools/Azure.Mcp.Tools.AzureBackup/skills/azurebackup-telemetry-report/references/kql-customer-queries.md | Add note about KQL bag key casing normalization and troubleshooting. |
| command.Validators.Add(commandResult => | ||
| { | ||
| if (commandResult.HasOptionResult(AzureBackupOptionDefinitions.DatasourceType.Name)) | ||
| { | ||
| var value = commandResult.GetValue<string>(AzureBackupOptionDefinitions.DatasourceType.Name); | ||
| if (!string.IsNullOrEmpty(value) && | ||
| Services.RsvDatasourceRegistry.Resolve(value) is null && | ||
| !Services.DppDatasourceRegistry.AllProfiles.Any(p => | ||
| p.FriendlyName.Equals(value, StringComparison.OrdinalIgnoreCase) || | ||
| p.Aliases.Any(a => a.Equals(value, StringComparison.OrdinalIgnoreCase)))) | ||
| { | ||
| commandResult.AddError( | ||
| $"Unknown datasource type '{value}'. " + | ||
| $"RSV types: {string.Join(", ", Services.RsvDatasourceRegistry.KnownTypeNames)}. " + | ||
| $"DPP types: {string.Join(", ", Services.DppDatasourceRegistry.KnownTypeNames)}."); | ||
| } | ||
| } | ||
| }); |
| command.Validators.Add(commandResult => | ||
| { | ||
| if (commandResult.HasOptionResult(AzureBackupOptionDefinitions.DatasourceType.Name)) | ||
| { | ||
| var value = commandResult.GetValue<string>(AzureBackupOptionDefinitions.DatasourceType.Name); |
| RequestFailedException reqEx when reqEx.Status == (int)HttpStatusCode.Forbidden => | ||
| "Authorization failed listing vaults. Ensure you have 'Reader' or 'Backup Reader' role at subscription scope.", |
| if (family == WorkloadFamily.Unknown && !string.IsNullOrWhiteSpace(workload)) | ||
| { | ||
| issues.Add(new PolicyValidationIssue( | ||
| $"--{AzureBackupOptionDefinitions.WorkloadTypeName}", | ||
| $"Unknown workload type '{workload}'. Supported types: " + |
… tests - ProtectedItemProtectCommand: Remove Services. prefix, add ArmResourceType and TryAutoDetect checks, use IsNullOrWhiteSpace to catch whitespace-only inputs - VaultGetCommand: Change 'listing vaults' to 'accessing vault information' for neutral phrasing across single/list paths - Add ProtectedItemProtectCommandTests: RejectsUnknownDatasourceType (garbage, SQL injection, whitespace) and AcceptsValidDatasourceType (RSV types, DPP types, ARM resource types, auto-detect base types) - Add PolicyCreateValidatorTests: Validate_UnknownWorkloadType_RejectsWithActionableMessage
| command.Validators.Add(commandResult => | ||
| { | ||
| OptionResult? optionResult = commandResult.GetResult(AzureBackupOptionDefinitions.DatasourceType); | ||
| if (optionResult is null || optionResult.Implicit) |
There was a problem hiding this comment.
The Implicit doesn't really do much here as the option was optional without a default value. So nullness checking is already handling implicit checking.
| return; | ||
| } | ||
|
|
||
| var value = optionResult.Tokens.LastOrDefault()?.Value ?? string.Empty; |
There was a problem hiding this comment.
Given the type is string? option handling implicitly configures arity to ZeroOrOne, LastOrDefault() here doesn't make much sense.
| DppDatasourceRegistry.TryAutoDetect(normalizedValue) is null && | ||
| !DppDatasourceRegistry.AllProfiles.Any(p => | ||
| p.FriendlyName.Equals(normalizedValue, StringComparison.OrdinalIgnoreCase) || | ||
| p.ArmResourceType.Equals(normalizedValue, StringComparison.OrdinalIgnoreCase) || | ||
| p.Aliases.Any(a => a.Equals(normalizedValue, StringComparison.OrdinalIgnoreCase)))) |
There was a problem hiding this comment.
Why doesn't DppDatasourceRegistry.Resolve get used here?
| commandResult.AddError( | ||
| $"Unknown datasource type '{value}'. " + | ||
| $"RSV types: {string.Join(", ", RsvDatasourceRegistry.KnownTypeNames)}. " + | ||
| $"DPP types: {string.Join(", ", DppDatasourceRegistry.KnownTypeNames)}."); |
There was a problem hiding this comment.
DppDatasourceRegistry.KnownTypeNames doesn't include ArmResourceType in KnownTypeNames but we use it to determine if it's valid. Should make those consistent.
| if (string.IsNullOrEmpty(normalizedValue) || | ||
| RsvDatasourceRegistry.Resolve(normalizedValue) is null && | ||
| DppDatasourceRegistry.TryAutoDetect(normalizedValue) is null && | ||
| !DppDatasourceRegistry.AllProfiles.Any(p => | ||
| p.FriendlyName.Equals(normalizedValue, StringComparison.OrdinalIgnoreCase) || | ||
| p.ArmResourceType.Equals(normalizedValue, StringComparison.OrdinalIgnoreCase) || | ||
| p.Aliases.Any(a => a.Equals(normalizedValue, StringComparison.OrdinalIgnoreCase)))) |
There was a problem hiding this comment.
We should take into account ProtectedItemProtectOptions.VaultType when validating. DPP or RSV can be passed and we'll match on either but then fail later on after this spot.
Summary
Fix telemetry misclassification, improve customer-facing error messages, and add command-boundary validation for
--workload-typeand--datasource-typeparameters.Changes (6 files, 49 insertions, 3 deletions)
P1 - Telemetry Misclassification Fix
InvalidOperationExceptiontoKeyNotFoundExceptionfor policy-not-found fallback inGetPolicyAsync. This was the last remainingInvalidOperationExceptionthrow in the Azure Backup toolset. Eliminates false MCP Bug counts in telemetry - the error is now correctly classified as a Customer error (404 resource not found).P2 - Error UX Improvements (driven by Jun 9-15 telemetry)
Microsoft.RecoveryServices/locations/backupStatus/actionpermission, suggestsBackup Readerrole at subscription scope, and recommendsvault_get + protecteditem_getas an alternative workflow. Directly addresses Antares Capital 6x 403 pattern.GetErrorMessageoverride - handlesArgumentException, 403 with RBAC hint, and generalRequestFailedException. Previously all errors fell through to the base class default message.P2 - Command-Boundary Validation (fixes #2727)
--workload-typevalues at the command boundary with an actionable list of all supported types (VM, SQL, SAPHANA, SAPASE, AzureFileShare, AzureDisk, AzureBlob, AKS, ElasticSAN, PostgreSQLFlexible, ADLS, CosmosDB).--datasource-typethat checks both RSV (RsvDatasourceRegistry) and DPP (DppDatasourceRegistry) registries at command boundary.Telemetry Context
Jun 9-15, 2026 analysis (334 calls, 188 failures, 16 MCP Tool Bugs):
InvalidOperationException422 fromListVaultsAsyncTask.WhenAll)InvalidOperationExceptionin DPP policy operationsInvalidOperationExceptionthrows remain in Azure Backup toolset (except the legitimate both-fail fallback)Invoking Livetests
Copilot submitted PRs are not trustworthy by default. Users with
writeaccess to the repo need to validate the contents of this PR before leaving a comment with the text/azp run mcp - pullrequest - live. This will trigger the necessary livetest workflows to complete required validation.