diff --git a/TSG/ARM/README.md b/TSG/ARM/README.md
new file mode 100644
index 00000000..dfae9e58
--- /dev/null
+++ b/TSG/ARM/README.md
@@ -0,0 +1,21 @@
+# ARM Telemetry
+
+ARM telemetry for a deployment, update, or even status can be retrieved with a query like:
+
+```KQL
+cluster("https://armprodgbl.eastus.kusto.windows.net").database('ARMProd').Unionizer('Deployments', 'DeploymentOperations')
+| where providerNamespace contains "AzureStackHCI"
+| take 50
+```
+
+The query above targets deployments, but a number of [other databases](https://eng.ms/docs/cloud-ai-platform/azure-core/azure-cloud-native-and-management-platform/control-plane-bburns/azure-resource-reporting/azure-resource-reporting/dataconsumeronboarding/armdata/kustov2/overview_prod) can also be accessed. The databases listed on that page, and their tables, are:
+
+| Database | Tables |
+|----------|--------|
+| Requests | EventServiceEntries, HttpIncomingRequests, HttpOutgoingRequests |
+| Deployments | DeploymentOperations, Deployments, PreflightEvents |
+| Traces | Errors, Traces |
+| Providers | ProviderErrors, ProviderTraces |
+| Jobs | JobDefinitions, JobDispatchingErrors, JobErrors, JobExecutionStatus, JobHistory, JobOperations, JobStatus, JobThrottles, JobTrace |
+| Storage | Compactions, Diagnostics, RedisOperations, RegionalStoreAdminLogs, RegionalStoreAdminTraces, RegionalStoreConfigurationLogs, RegionalStoreGarnetServerLogs, RegionalStoreGarnetServerTraces, RegionalStoreHealthCheckLogs, RegionalStoreJobEngineLogs, RegionalStoreServerLogs, RegionalStoreServerTraces, RegionalStoreService, StorageOperations, StorageRequests |
+| General | APIValidationErrors, APIValidationTraces, AppPerfCounters, ArmHttpOutgoingRequests, CapacityErrors, CapacityTraces, ClientErrors, ClientRequests, ClientTelemetry, ClientTraces, DispatcherErrors, DispatcherEvents, DispatcherTraces, IISHttpErrors, IISLogs, ManifestRegistrations, MarketplaceErrors, MarketplaceTraces, PolicyServiceDebug, PolicyServiceError, PolicyServiceWarning, ResourceDeletions, ResourceGroupDeletions, Service, SubscriptionProvisioningRequests2, SysPerfCounters, WindowsEvents |
diff --git a/TSG/Lifecycle/README.md b/TSG/Lifecycle/README.md
index 28378d77..735b8b28 100644
--- a/TSG/Lifecycle/README.md
+++ b/TSG/Lifecycle/README.md
@@ -1,3 +1,14 @@
 # Infra Lifecycle Operations
 
-* [Add node, repair node fails with Type 'AddAsZHostToDomain' of Role 'BareMetal' raised an exception after cluster upgrade fail when upgraded from <=2311](./Add-node-repair-node-fails-with-Type-AddAsZHostToDomain-of-Role-BareMetal-raised-an-exception.md)
\ No newline at end of file
+* [Add node, repair node fails with Type 'AddAsZHostToDomain' of Role 'BareMetal' raised an exception after cluster upgrade fail when upgraded from <=2311](./Add-node-repair-node-fails-with-Type-AddAsZHostToDomain-of-Role-BareMetal-raised-an-exception.md)
+
+The status of any Arc node can be determined from its Census records. A sample query, which returns the 20 most recent records for a node:
+
+```KQL
+cluster("https://aeoprodtelemetry.eastus.kusto.windows.net").database("Telemetry").Census
+| where AEODeviceARMResourceUri =~ '<ARM resource URI>'
+| where AEOClusterNodeName =~ "<node name>"
+| order by PreciseTimeStamp desc
+| take 20
+```
+Regular census reports indicate a healthy node. The `EventString` column provides detailed information about the various components of the reporting Arc node.