
Add azure.ai.training extension #8130

Open
saanikaguptamicrosoft wants to merge 28 commits into Azure:main from saanikaguptamicrosoft:foundry-training-dev

Conversation

@saanikaguptamicrosoft
Collaborator

No description provided.

achauhan-scc and others added 28 commits March 18, 2026 12:31
* adding design details for command job CLI

* adding more details

* adding dedup details

* adding api details

* adding execution plan

* adding draft version of custom training commands
…to show

- Make job name optional in YAML; auto-generate {adj}_{noun}_{suffix} (matching AML SDK)
- Fix buildProjectEndpoint to use services.ai.azure.com (not cognitiveservices.azure.com)
- Rename 'job get' to 'job show' to match models/finetune extensions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
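
The endpoint host fix in the commit above can be illustrated with a minimal sketch. The function name `buildProjectEndpoint` comes from the commit message, but its signature, the `account` and `project` parameters, and the `/api/projects/` path are assumptions for illustration, not taken from the PR.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildProjectEndpoint is a hypothetical stand-in for the helper named in the
// commit message. Only the host suffix (services.ai.azure.com rather than
// cognitiveservices.azure.com) is stated in the source; the path layout and
// parameters are assumed.
func buildProjectEndpoint(account, project string) string {
	u := url.URL{
		Scheme: "https",
		Host:   account + ".services.ai.azure.com",
		Path:   "/api/projects/" + project,
	}
	return u.String()
}

func main() {
	fmt.Println(buildProjectEndpoint("my-account", "my-project"))
}
```

Building the URL through `net/url` rather than string concatenation keeps any unusual characters in the project name escaped.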
* adding design details for command job CLI

* adding more details

* adding dedup details

* adding api details

* adding execution plan

* adding draft version of custom training commands

* integrating with API
…rt (Azure#7203)

- Add --skip-token flag for pagination with next-page UX message
- Add --tag and --properties flags for server-side filtering
- Add --include-archived flag for listViewType control
- Add SystemData (createdBy, createdAt) to job list output
- Update doDataPlane() to support variadic query params

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
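
As a rough illustration of the variadic query-parameter support mentioned above, the sketch below appends optional parameters to a request URL. `QueryParam`, `buildURL`, and the parameter names `skipToken`, `listViewType`, and `tag` are hypothetical; the actual `doDataPlane()` signature is not shown in this PR excerpt.

```go
package main

import (
	"fmt"
	"net/url"
)

// QueryParam is a hypothetical key/value pair passed variadically.
type QueryParam struct {
	Key, Value string
}

// buildURL appends every non-empty parameter to base. Callers that do not
// need filtering or pagination simply pass no params.
func buildURL(base string, params ...QueryParam) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for _, p := range params {
		if p.Value == "" {
			continue // unset flags are skipped rather than sent as empty values
		}
		q.Set(p.Key, p.Value)
	}
	u.RawQuery = q.Encode() // Encode sorts keys, so the output is deterministic
	return u.String(), nil
}

func main() {
	u, err := buildURL("https://example.invalid/jobs",
		QueryParam{"skipToken", "abc"},
		QueryParam{"listViewType", "All"},
		QueryParam{"tag", ""},
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```

A variadic parameter keeps existing call sites unchanged while letting `job list` pass whichever filters the user supplied.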
… code, and input resolution (Azure#7205)

- Rename job create command to job submit for consistency with finetune extension
- Add resolver interfaces: ComputeResolver, CodeResolver, InputResolver
- Add JobResolver orchestrator that resolves all references in JobDefinition
- Wire resolver into submit flow before buildJobResource()
- Stub implementations guide users to provide full ARM IDs / remote URIs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Custom training (Azure#7180)

* adding design details for command job CLI

* adding more details

* adding dedup details

* adding api details

* adding execution plan

* adding draft version of custom training commands

* integrating with API

* adding -e -s override

* fixing asset resolution

* custom training: enhance job show, fix asset resolution, add full resource config support

- Enhanced job show with rich output: run history, metrics, artifacts, timing, compute info
- Added client APIs for run history, metrics, and artifacts endpoints
- Fixed dataset version field: json:dataType -> json:type
- Fixed input/output mode mapping: ro_mount -> ReadOnlyMount, rw_mount -> ReadWriteMount
- Added full resource config support: instanceType, shmSize, dockerArgs, properties
- Added ResourceDefinition YAML struct with AISuperComputer properties pass-through
- Backward compatible: flat instance_count still works

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
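
The input/output mode mapping fix listed above can be sketched as a small lookup. Only the two pairs named in the commit message are included; the function name `mapMode` and the error wording are hypothetical.

```go
package main

import "fmt"

// modeMap holds only the two mappings stated in the commit message.
var modeMap = map[string]string{
	"ro_mount": "ReadOnlyMount",
	"rw_mount": "ReadWriteMount",
}

// mapMode translates a YAML mode string into the API enum value and rejects
// anything it does not recognize instead of passing it through unchanged.
func mapMode(yamlMode string) (string, error) {
	if apiMode, ok := modeMap[yamlMode]; ok {
		return apiMode, nil
	}
	return "", fmt.Errorf("unsupported mode %q", yamlMode)
}

func main() {
	for _, m := range []string{"ro_mount", "rw_mount", "download"} {
		apiMode, err := mapMode(m)
		fmt.Println(m, "->", apiMode, err)
	}
}
```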

* custom training: add spinner progress to job show command

Show animated spinner with progress text while fetching job details.
Updates text as each parallel fetch (run history, metrics, artifacts)
completes, showing remaining items until all data is loaded.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
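
The progress behavior described in this commit (text updated as each parallel fetch completes) could be approximated as follows. This is a sketch under assumptions: `fetchAll` and its callback are hypothetical names, and plain prints stand in for the spinner used by the extension.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type result struct {
	name string
	err  error
}

// fetchAll runs every fetcher in its own goroutine and, after each one
// finishes, reports the sorted names of the fetches still pending.
func fetchAll(fetchers map[string]func() error, onProgress func(remaining []string)) map[string]error {
	results := make(chan result, len(fetchers))
	pending := make(map[string]bool, len(fetchers))
	for name, fetch := range fetchers {
		pending[name] = true
		go func(name string, fetch func() error) {
			results <- result{name, fetch()}
		}(name, fetch)
	}

	errs := make(map[string]error)
	for i := 0; i < len(fetchers); i++ {
		r := <-results
		delete(pending, r.name)
		if r.err != nil {
			errs[r.name] = r.err
		}
		remaining := make([]string, 0, len(pending))
		for n := range pending {
			remaining = append(remaining, n)
		}
		sort.Strings(remaining)
		onProgress(remaining) // called on the caller's goroutine, so no locking is needed
	}
	return errs
}

func main() {
	noop := func() error { return nil }
	errs := fetchAll(
		map[string]func() error{"run history": noop, "metrics": noop, "artifacts": noop},
		func(remaining []string) {
			if len(remaining) == 0 {
				fmt.Println("All job details loaded")
				return
			}
			fmt.Println("Fetching:", strings.Join(remaining, ", "))
		},
	)
	fmt.Println("failed fetches:", len(errs))
}
```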

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 11, 2026 11:35
Contributor

Copilot AI left a comment

Pull request overview

Adds a new first-party azd extension (azure.ai.training) that provides CLI commands and supporting client/service layers for Azure AI Foundry training jobs (submit, list, stream logs, download artifacts, and SSH connectivity), plus environment validation/initialization helpers.

Changes:

  • Introduces the azure.ai.training Go extension module (build/test scripts, metadata, and initial versioning).
  • Adds a Foundry/AML data-plane client + models, and service-layer helpers for uploads (azcopy), streaming logs, artifact downloads, and SSH proxy tunneling.
  • Updates CODEOWNERS to include the new extension path.

Reviewed changes

Copilot reviewed 75 out of 76 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| cli/azd/extensions/azure.ai.training/version.txt | Initial extension version. |
| cli/azd/extensions/azure.ai.training/extension.yaml | Extension manifest (id/namespace/usage/examples). |
| cli/azd/extensions/azure.ai.training/go.mod | New Go module for the extension and its dependencies. |
| cli/azd/extensions/azure.ai.training/main.go | Extension entrypoint wiring Cobra root command. |
| cli/azd/extensions/azure.ai.training/ci-build.ps1 | CI build script for the extension. |
| cli/azd/extensions/azure.ai.training/ci-test.ps1 | CI unit test runner script for the extension. |
| cli/azd/extensions/azure.ai.training/build.sh | Local multi-platform build script. |
| cli/azd/extensions/azure.ai.training/build.ps1 | Local multi-platform build script (PowerShell). |
| cli/azd/extensions/azure.ai.training/pkg/models/common.go | Shared API model types (inputs/outputs/resources/errors/table flattening). |
| cli/azd/extensions/azure.ai.training/pkg/models/job.go | Job resource and command job properties model. |
| cli/azd/extensions/azure.ai.training/pkg/models/history.go | Run history and log-file SAS URI response models. |
| cli/azd/extensions/azure.ai.training/pkg/models/metrics.go | Metrics list/full request/response models. |
| cli/azd/extensions/azure.ai.training/pkg/models/serviceinstance.go | AML history serviceinstances response models. |
| cli/azd/extensions/azure.ai.training/pkg/models/artifact.go | Job artifact list/contentinfo response models. |
| cli/azd/extensions/azure.ai.training/pkg/models/dataset.go | Dataset upload/pending upload + dataset version models. |
| cli/azd/extensions/azure.ai.training/pkg/models/download.go | Model/dataset credential and AML artifact download models. |
| cli/azd/extensions/azure.ai.training/pkg/client/client.go | Core HTTP client (auth, versioning, error handling) for Foundry APIs. |
| cli/azd/extensions/azure.ai.training/pkg/client/jobs.go | Job CRUD operations. |
| cli/azd/extensions/azure.ai.training/pkg/client/metrics.go | Metrics list/full operations. |
| cli/azd/extensions/azure.ai.training/pkg/client/history.go | Run history + tracking-endpoint parsing. |
| cli/azd/extensions/azure.ai.training/pkg/client/history_test.go | Unit tests for tracking endpoint parsing. |
| cli/azd/extensions/azure.ai.training/pkg/client/serviceinstances.go | AML history serviceinstances calls (typed + raw JSON). |
| cli/azd/extensions/azure.ai.training/pkg/client/artifacts.go | Artifact listing + content/contentinfo operations. |
| cli/azd/extensions/azure.ai.training/pkg/client/download.go | Model/dataset credential calls + AML history artifact operations. |
| cli/azd/extensions/azure.ai.training/pkg/client/datasets.go | Dataset version CRUD + startPendingUpload operations. |
| cli/azd/extensions/azure.ai.training/pkg/client/blob.go | Direct SAS URI blob content fetch helper (bounded read). |
| cli/azd/extensions/azure.ai.training/internal/utils/yaml_parser.go | YAML job definition structs and parsing/path resolution helpers. |
| cli/azd/extensions/azure.ai.training/internal/utils/output.go | Table/JSON output formatting utilities. |
| cli/azd/extensions/azure.ai.training/internal/utils/job_name_generator.go | Auto job-name generation helper. |
| cli/azd/extensions/azure.ai.training/internal/utils/environment.go | Reads azd environment values and defines extension env keys. |
| cli/azd/extensions/azure.ai.training/internal/utils/uami.go | UAMI presence detection + user messaging helpers. |
| cli/azd/extensions/azure.ai.training/internal/utils/uami_test.go | Unit tests for UAMI helpers. |
| cli/azd/extensions/azure.ai.training/internal/service/hash.go | Directory hashing + version truncation for dedup uploads. |
| cli/azd/extensions/azure.ai.training/internal/service/upload_service.go | Dataset upload flow (pending upload → azcopy → confirm) with dedup logic. |
| cli/azd/extensions/azure.ai.training/internal/service/input_resolver.go | Upload-and-resolve for YAML input directories (dedup + collision fallback). |
| cli/azd/extensions/azure.ai.training/internal/service/code_resolver.go | Upload-and-resolve for YAML code directory (dedup + collision fallback). |
| cli/azd/extensions/azure.ai.training/internal/service/compute_resolver.go | Stub compute name → ARM ID resolver. |
| cli/azd/extensions/azure.ai.training/internal/service/resolver.go | Orchestrates compute/code/input resolution for job definitions. |
| cli/azd/extensions/azure.ai.training/internal/service/stream_service.go | Log polling/streaming implementation using tracking endpoint + SAS log files. |
| cli/azd/extensions/azure.ai.training/internal/service/stream_service_test.go | Unit tests for polling interval, filtering, and endpoint extraction. |
| cli/azd/extensions/azure.ai.training/internal/download/download.go | Parallel artifact downloader with retries + path traversal protection. |
| cli/azd/extensions/azure.ai.training/internal/download/download_test.go | Unit tests for retryability and safe path joining. |
| cli/azd/extensions/azure.ai.training/internal/azcopy/installer.go | Secure-ish azcopy downloader/installer with host allowlist + size limits. |
| cli/azd/extensions/azure.ai.training/internal/cmd/root.go | Cobra root command + global flags. |
| cli/azd/extensions/azure.ai.training/internal/cmd/version.go | version command (build-time populated fields). |
| cli/azd/extensions/azure.ai.training/internal/cmd/metadata.go | Hidden metadata generator for extension framework. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job.go | job command group + shared env validation pre-run. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_list.go | job list implementation + pagination hinting. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_submit.go | job submit implementation (parse/validate/resolve/upload/submit). |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_validate.go | Offline YAML validation command (job validate). |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_stream.go | job stream command using StreamService. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_delete.go | job delete implementation with interactive confirmation. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_cancel.go | job cancel implementation. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_ssh_proxy.go | Hidden ProxyCommand WebSocket tunnel for SSH. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_show_services.go | job show-services JSON output shaping for AML serviceinstances. |
| cli/azd/extensions/azure.ai.training/internal/cmd/validation.go | Environment validation/implicit init + flag override logic. |
| cli/azd/extensions/azure.ai.training/internal/cmd/uami.go | Lazy/cached UAMI gating for job submit. |
| cli/azd/extensions/azure.ai.training/internal/cmd/validation_test.go | Unit tests for project endpoint parsing + env-name sanitization. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_ssh_proxy_test.go | Unit tests for ws/wss tunnel URL building. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_show_services_test.go | Unit tests for serviceinstances JSON transformation helpers. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_download_test.go | Unit tests for download-mode selection and tracking endpoint extraction. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_connect_ssh_test.go | Unit tests for SSH proxy endpoint resolution/pattern validation. |
| cli/azd/extensions/azure.ai.training/internal/cmd/init_template_test.go | Unit tests for init scaffolding helpers (URL detection, copy helpers). |
| .github/CODEOWNERS | Adds code owner coverage for the new extension path. |
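
To make the "directory hashing + version truncation" entry for hash.go more concrete, here is a minimal sketch of one way to derive a content-based version from a directory. The choice of SHA-256, the function names `hashDir` and `truncateVersion`, and the truncation length are all assumptions; the PR's actual algorithm is not shown in this excerpt.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// hashDir folds every regular file's relative path and contents into one
// SHA-256 digest. WalkDir visits entries in lexical order, so the digest is
// stable for an unchanged tree.
func hashDir(root string) (string, error) {
	h := sha256.New()
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks, and special files
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		// Slash-separated paths keep the digest identical across platforms.
		fmt.Fprintf(h, "%s\x00", filepath.ToSlash(rel))
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		_, err = io.Copy(h, f)
		return err
	})
	if err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// truncateVersion shortens a digest to at most n characters for use as a
// version label (n is an illustrative choice).
func truncateVersion(hash string, n int) string {
	if len(hash) <= n {
		return hash
	}
	return hash[:n]
}

func main() {
	digest, err := hashDir(".")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("version:", truncateVersion(digest, 12))
}
```

With a content-derived version, resubmitting an unchanged directory maps to the same version and the upload can be skipped.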

Comment on lines +75 to +80
if existing != nil {
	// Version record exists: check the sentinel tag to verify upload completed
	storedHash, hasTag := existing.Tags["contentHash"]

	if !hasTag || storedHash == "" {
		// Zombie: POST created the version but azcopy/PATCH never completed.
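
The sentinel-tag check in the hunk above can be expressed as a small classification function. The `contentHash` tag name and the zombie condition come from the hunk; the `DatasetVersion` type, the state names, and the handling of a hash mismatch as a collision are assumptions for illustration.

```go
package main

import "fmt"

// DatasetVersion is a hypothetical reduction of the version record to its tags.
type DatasetVersion struct {
	Tags map[string]string
}

type uploadState int

const (
	needsUpload uploadState = iota // no version record exists yet
	zombie                         // record exists but the upload never completed
	reusable                       // record exists and matches the local content
	collision                      // record exists with different content
)

// classify decides what to do with an existing version record, using the
// contentHash tag as the marker that the upload was confirmed.
func classify(existing *DatasetVersion, localHash string) uploadState {
	if existing == nil {
		return needsUpload
	}
	storedHash, hasTag := existing.Tags["contentHash"] // reading a nil map is safe
	if !hasTag || storedHash == "" {
		return zombie
	}
	if storedHash == localHash {
		return reusable
	}
	return collision
}

func main() {
	fmt.Println(classify(nil, "abc") == needsUpload)
	fmt.Println(classify(&DatasetVersion{}, "abc") == zombie)
	fmt.Println(classify(&DatasetVersion{Tags: map[string]string{"contentHash": "abc"}}, "abc") == reusable)
}
```

Writing the tag only after the upload is confirmed is what lets a later run distinguish a finished version from a half-created one.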
Comment on lines +183 to +200
// TestSafeJoin_OSAbsolutePath documents that an OS-absolute path supplied as
// relPath is *not* treated as an escape: filepath.Join strips the leading
// separator / drive letter on both POSIX and Windows, so the result is safely
// re-rooted under destDir. We assert the resolved path stays inside dest.
func TestSafeJoin_OSAbsolutePath(t *testing.T) {
	dest := t.TempDir()
	other := t.TempDir() // a different absolute path
	abs, err := filepath.Abs(other)
	require.NoError(t, err)

	got, err := safeJoin(dest, abs)
	require.NoError(t, err)
	absDest, _ := filepath.Abs(dest)
	absGot, _ := filepath.Abs(got)
	assert.True(t,
		strings.HasPrefix(absGot, absDest+string(filepath.Separator)) || absGot == absDest,
		"expected %q to be re-rooted under %q", absGot, absDest,
	)
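
For readers without the implementation at hand, a path-traversal guard of the kind this test exercises might look like the sketch below. The name `safeJoin` and its two-argument shape come from the test; the body is an assumption and may differ from the PR's version.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins relPath onto destDir and rejects any result that would land
// outside destDir (for example via ".." segments).
func safeJoin(destDir, relPath string) (string, error) {
	joined := filepath.Join(destDir, relPath) // Join also cleans the result
	rel, err := filepath.Rel(destDir, joined)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes destination %q", relPath, destDir)
	}
	return joined, nil
}

func main() {
	for _, p := range []string{"logs/out.txt", "../secrets.txt", "a/../../b"} {
		got, err := safeJoin("downloads", p)
		fmt.Printf("%-16s -> %q, err=%v\n", p, got, err)
	}
}
```

Checking the relative path from the destination, rather than comparing string prefixes of the joined path, avoids accepting a sibling directory such as `downloads-evil`.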
Comment on lines +72 to +83
// GenerateJobName generates a human-friendly job name in the format {adjective}_{noun}_{suffix}.
// This matches the Azure ML SDK name generation pattern.
func GenerateJobName() string {
	adj := allowedAdjectives[rand.Intn(len(allowedAdjectives))]
	noun := allowedNouns[rand.Intn(len(allowedNouns))]

	suffix := make([]byte, suffixLength)
	for i := range suffix {
		suffix[i] = allowedChars[rand.Intn(len(allowedChars))]
	}

	return strings.Join([]string{adj, noun, string(suffix)}, "_")
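
The hunk above omits the word lists and constants it depends on. A self-contained version, runnable on its own, could look like this; the word lists, character set, and suffix length are placeholder values chosen for illustration, not the ones in the PR.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// Placeholder values: the PR's actual lists and suffix length are not shown.
var (
	allowedAdjectives = []string{"brave", "calm", "eager"}
	allowedNouns      = []string{"river", "falcon", "maple"}
)

const (
	allowedChars = "abcdefghijklmnopqrstuvwxyz0123456789"
	suffixLength = 10
)

// GenerateJobName returns a name of the form {adjective}_{noun}_{suffix}.
func GenerateJobName() string {
	adj := allowedAdjectives[rand.Intn(len(allowedAdjectives))]
	noun := allowedNouns[rand.Intn(len(allowedNouns))]

	suffix := make([]byte, suffixLength)
	for i := range suffix {
		suffix[i] = allowedChars[rand.Intn(len(allowedChars))]
	}

	return strings.Join([]string{adj, noun, string(suffix)}, "_")
}

func main() {
	fmt.Println(GenerateJobName())
}
```

Since Go 1.20 the top-level `math/rand` functions are seeded randomly at startup, so repeated runs produce different names without an explicit seed; on older toolchains the sequence would repeat unless seeded.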
@@ -0,0 +1,92 @@
module azure.ai.training

go 1.25
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.20.0
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.13.1
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/cognitiveservices/armcognitiveservices v1.8.0
github.com/azure/azure-dev/cli/azd v0.0.0-20260122173819-89795b295491

3 participants