Add azure.ai.training extension #8130
Open
saanikaguptamicrosoft wants to merge 28 commits into
* adding design details for command job CLI
* adding more details
* adding dedup details
* adding api details
* adding execution plan
* adding draft version of custom training commands
…to show
- Make job name optional in YAML; auto-generate {adj}_{noun}_{suffix} (matching AML SDK)
- Fix buildProjectEndpoint to use services.ai.azure.com (not cognitiveservices.azure.com)
- Rename 'job get' to 'job show' to match models/finetune extensions
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* adding design details for command job CLI
* adding more details
* adding dedup details
* adding api details
* adding execution plan
* adding draft version of custom training commands
* integrating with API
…rt (Azure#7203)
- Add --skip-token flag for pagination with next-page UX message
- Add --tag and --properties flags for server-side filtering
- Add --include-archived flag for listViewType control
- Add SystemData (createdBy, createdAt) to job list output
- Update doDataPlane() to support variadic query params
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
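The variadic query-parameter support described for doDataPlane() might look roughly like the sketch below. The `queryParam` type, the `buildListURL` helper, the `skipToken` key, the `All` value, and the example URL are all assumptions for illustration; the real signature of doDataPlane() is not visible in this view.

```go
package main

import (
	"fmt"
	"net/url"
)

// queryParam is a hypothetical key/value pair used only in this sketch.
type queryParam struct {
	Key   string
	Value string
}

// buildListURL appends optional query parameters to a base URL.
// Parameters with an empty value are skipped, so flags the user
// did not set never reach the server.
func buildListURL(base string, params ...queryParam) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parse base URL: %w", err)
	}
	q := u.Query() // keeps parameters already present on the base URL
	for _, p := range params {
		if p.Value == "" {
			continue
		}
		q.Set(p.Key, p.Value)
	}
	u.RawQuery = q.Encode() // Encode sorts parameters by key
	return u.String(), nil
}

func main() {
	u, err := buildListURL(
		"https://example.invalid/jobs?api-version=placeholder",
		queryParam{Key: "skipToken", Value: "abc123"},
		queryParam{Key: "listViewType", Value: "All"},
		queryParam{Key: "tag", Value: ""}, // unset flag, dropped
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

A variadic tail keeps existing call sites unchanged, since callers that pass no extra parameters compile as before.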
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… code, and input resolution (Azure#7205)
- Rename job create command to job submit for consistency with finetune extension
- Add resolver interfaces: ComputeResolver, CodeResolver, InputResolver
- Add JobResolver orchestrator that resolves all references in JobDefinition
- Wire resolver into submit flow before buildJobResource()
- Stub implementations guide users to provide full ARM IDs / remote URIs
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Custom training (Azure#7180)
* adding design details for command job CLI
* adding more details
* adding dedup details
* adding api details
* adding execution plan
* adding draft version of custom training commands
* integrating with API
* adding -e -s override
* fixing asset resolution
* custom training: enhance job show, fix asset resolution, add full resource config support
  - Enhanced job show with rich output: run history, metrics, artifacts, timing, compute info
  - Added client APIs for run history, metrics, and artifacts endpoints
  - Fixed dataset version field: json:dataType -> json:type
  - Fixed input/output mode mapping: ro_mount -> ReadOnlyMount, rw_mount -> ReadWriteMount
  - Added full resource config support: instanceType, shmSize, dockerArgs, properties
  - Added ResourceDefinition YAML struct with AISuperComputer properties pass-through
  - Backward compatible: flat instance_count still works
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* custom training: add spinner progress to job show command
  Show animated spinner with progress text while fetching job details. Updates text as each parallel fetch (run history, metrics, artifacts) completes, showing remaining items until all data is loaded.
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
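The input/output mode mapping fix mentioned in the commit above can be illustrated with a small lookup. Only the two pairs named in the commit message (ro_mount, rw_mount) are included; the `modeMap` and `mapMode` names are hypothetical, and the extension may support more modes than this sketch shows.

```go
package main

import "fmt"

// modeMap translates YAML mode names to API mode names.
// It contains only the two pairs named in the commit message.
var modeMap = map[string]string{
	"ro_mount": "ReadOnlyMount",
	"rw_mount": "ReadWriteMount",
}

// mapMode returns the API mode for a YAML mode, or an error for
// values this sketch does not know about.
func mapMode(yamlMode string) (string, error) {
	if apiMode, ok := modeMap[yamlMode]; ok {
		return apiMode, nil
	}
	return "", fmt.Errorf("unsupported mode %q", yamlMode)
}

func main() {
	for _, m := range []string{"ro_mount", "rw_mount", "bogus"} {
		apiMode, err := mapMode(m)
		fmt.Printf("%s -> %q (err: %v)\n", m, apiMode, err)
	}
}
```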
…zure#7892) This reverts commit 5216202.
…oint flags over stored env values (Azure#8093)
Contributor
Pull request overview
Adds a new first-party azd extension (azure.ai.training) that provides CLI commands and supporting client/service layers for Azure AI Foundry training jobs (submit, list, stream logs, download artifacts, and SSH connectivity), plus environment validation/initialization helpers.
Changes:
- Introduces the azure.ai.training Go extension module (build/test scripts, metadata, and initial versioning).
- Adds a Foundry/AML data-plane client + models, and service-layer helpers for uploads (azcopy), streaming logs, artifact downloads, and SSH proxy tunneling.
- Updates CODEOWNERS to include the new extension path.
Reviewed changes
Copilot reviewed 75 out of 76 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| cli/azd/extensions/azure.ai.training/version.txt | Initial extension version. |
| cli/azd/extensions/azure.ai.training/extension.yaml | Extension manifest (id/namespace/usage/examples). |
| cli/azd/extensions/azure.ai.training/go.mod | New Go module for the extension and its dependencies. |
| cli/azd/extensions/azure.ai.training/main.go | Extension entrypoint wiring Cobra root command. |
| cli/azd/extensions/azure.ai.training/ci-build.ps1 | CI build script for the extension. |
| cli/azd/extensions/azure.ai.training/ci-test.ps1 | CI unit test runner script for the extension. |
| cli/azd/extensions/azure.ai.training/build.sh | Local multi-platform build script. |
| cli/azd/extensions/azure.ai.training/build.ps1 | Local multi-platform build script (PowerShell). |
| cli/azd/extensions/azure.ai.training/pkg/models/common.go | Shared API model types (inputs/outputs/resources/errors/table flattening). |
| cli/azd/extensions/azure.ai.training/pkg/models/job.go | Job resource and command job properties model. |
| cli/azd/extensions/azure.ai.training/pkg/models/history.go | Run history and log-file SAS URI response models. |
| cli/azd/extensions/azure.ai.training/pkg/models/metrics.go | Metrics list/full request/response models. |
| cli/azd/extensions/azure.ai.training/pkg/models/serviceinstance.go | AML history serviceinstances response models. |
| cli/azd/extensions/azure.ai.training/pkg/models/artifact.go | Job artifact list/contentinfo response models. |
| cli/azd/extensions/azure.ai.training/pkg/models/dataset.go | Dataset upload/pending upload + dataset version models. |
| cli/azd/extensions/azure.ai.training/pkg/models/download.go | Model/dataset credential and AML artifact download models. |
| cli/azd/extensions/azure.ai.training/pkg/client/client.go | Core HTTP client (auth, versioning, error handling) for Foundry APIs. |
| cli/azd/extensions/azure.ai.training/pkg/client/jobs.go | Job CRUD operations. |
| cli/azd/extensions/azure.ai.training/pkg/client/metrics.go | Metrics list/full operations. |
| cli/azd/extensions/azure.ai.training/pkg/client/history.go | Run history + tracking-endpoint parsing. |
| cli/azd/extensions/azure.ai.training/pkg/client/history_test.go | Unit tests for tracking endpoint parsing. |
| cli/azd/extensions/azure.ai.training/pkg/client/serviceinstances.go | AML history serviceinstances calls (typed + raw JSON). |
| cli/azd/extensions/azure.ai.training/pkg/client/artifacts.go | Artifact listing + content/contentinfo operations. |
| cli/azd/extensions/azure.ai.training/pkg/client/download.go | Model/dataset credential calls + AML history artifact operations. |
| cli/azd/extensions/azure.ai.training/pkg/client/datasets.go | Dataset version CRUD + startPendingUpload operations. |
| cli/azd/extensions/azure.ai.training/pkg/client/blob.go | Direct SAS URI blob content fetch helper (bounded read). |
| cli/azd/extensions/azure.ai.training/internal/utils/yaml_parser.go | YAML job definition structs and parsing/path resolution helpers. |
| cli/azd/extensions/azure.ai.training/internal/utils/output.go | Table/JSON output formatting utilities. |
| cli/azd/extensions/azure.ai.training/internal/utils/job_name_generator.go | Auto job-name generation helper. |
| cli/azd/extensions/azure.ai.training/internal/utils/environment.go | Reads azd environment values and defines extension env keys. |
| cli/azd/extensions/azure.ai.training/internal/utils/uami.go | UAMI presence detection + user messaging helpers. |
| cli/azd/extensions/azure.ai.training/internal/utils/uami_test.go | Unit tests for UAMI helpers. |
| cli/azd/extensions/azure.ai.training/internal/service/hash.go | Directory hashing + version truncation for dedup uploads. |
| cli/azd/extensions/azure.ai.training/internal/service/upload_service.go | Dataset upload flow (pending upload → azcopy → confirm) with dedup logic. |
| cli/azd/extensions/azure.ai.training/internal/service/input_resolver.go | Upload-and-resolve for YAML input directories (dedup + collision fallback). |
| cli/azd/extensions/azure.ai.training/internal/service/code_resolver.go | Upload-and-resolve for YAML code directory (dedup + collision fallback). |
| cli/azd/extensions/azure.ai.training/internal/service/compute_resolver.go | Stub compute name → ARM ID resolver. |
| cli/azd/extensions/azure.ai.training/internal/service/resolver.go | Orchestrates compute/code/input resolution for job definitions. |
| cli/azd/extensions/azure.ai.training/internal/service/stream_service.go | Log polling/streaming implementation using tracking endpoint + SAS log files. |
| cli/azd/extensions/azure.ai.training/internal/service/stream_service_test.go | Unit tests for polling interval, filtering, and endpoint extraction. |
| cli/azd/extensions/azure.ai.training/internal/download/download.go | Parallel artifact downloader with retries + path traversal protection. |
| cli/azd/extensions/azure.ai.training/internal/download/download_test.go | Unit tests for retryability and safe path joining. |
| cli/azd/extensions/azure.ai.training/internal/azcopy/installer.go | Secure-ish azcopy downloader/installer with host allowlist + size limits. |
| cli/azd/extensions/azure.ai.training/internal/cmd/root.go | Cobra root command + global flags. |
| cli/azd/extensions/azure.ai.training/internal/cmd/version.go | version command (build-time populated fields). |
| cli/azd/extensions/azure.ai.training/internal/cmd/metadata.go | Hidden metadata generator for extension framework. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job.go | job command group + shared env validation pre-run. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_list.go | job list implementation + pagination hinting. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_submit.go | job submit implementation (parse/validate/resolve/upload/submit). |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_validate.go | Offline YAML validation command (job validate). |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_stream.go | job stream command using StreamService. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_delete.go | job delete implementation with interactive confirmation. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_cancel.go | job cancel implementation. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_ssh_proxy.go | Hidden ProxyCommand WebSocket tunnel for SSH. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_show_services.go | job show-services JSON output shaping for AML serviceinstances. |
| cli/azd/extensions/azure.ai.training/internal/cmd/validation.go | Environment validation/implicit init + flag override logic. |
| cli/azd/extensions/azure.ai.training/internal/cmd/uami.go | Lazy/cached UAMI gating for job submit. |
| cli/azd/extensions/azure.ai.training/internal/cmd/validation_test.go | Unit tests for project endpoint parsing + env-name sanitization. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_ssh_proxy_test.go | Unit tests for ws/wss tunnel URL building. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_show_services_test.go | Unit tests for serviceinstances JSON transformation helpers. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_download_test.go | Unit tests for download-mode selection and tracking endpoint extraction. |
| cli/azd/extensions/azure.ai.training/internal/cmd/job_connect_ssh_test.go | Unit tests for SSH proxy endpoint resolution/pattern validation. |
| cli/azd/extensions/azure.ai.training/internal/cmd/init_template_test.go | Unit tests for init scaffolding helpers (URL detection, copy helpers). |
| .github/CODEOWNERS | Adds code owner coverage for the new extension path. |
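The table lists `internal/service/hash.go` as "directory hashing + version truncation for dedup uploads". As a rough illustration of that idea, and not the file's actual contents, a deterministic directory digest could be computed as below. The `hashDir` and `truncateVersion` names, the choice of SHA-256, and the 16-character length are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// hashDir returns a hex SHA-256 digest over the relative paths and
// contents of all regular files under root. filepath.WalkDir visits
// entries in lexical order, so the same tree yields the same digest.
func hashDir(root string) (string, error) {
	h := sha256.New()
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks, and other special files
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		// Hash the slash-normalized relative path, then a NUL byte,
		// then the file content, then another NUL byte.
		fmt.Fprintf(h, "%s\x00", filepath.ToSlash(rel))
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		if _, err := io.Copy(h, f); err != nil {
			return err
		}
		h.Write([]byte{0})
		return nil
	})
	if err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// truncateVersion shortens a digest to at most n characters so it can
// serve as a compact version label.
func truncateVersion(digest string, n int) string {
	if len(digest) <= n {
		return digest
	}
	return digest[:n]
}

func main() {
	digest, err := hashDir(".")
	if err != nil {
		fmt.Println("hash failed:", err)
		return
	}
	fmt.Println(truncateVersion(digest, 16))
}
```

Truncating the digest raises the chance that two different trees share a version label, which would explain why the resolvers in the table mention a collision fallback.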
Comment on lines +75 to +80

```go
if existing != nil {
	// Version record exists — check the sentinel tag to verify upload completed
	storedHash, hasTag := existing.Tags["contentHash"]

	if !hasTag || storedHash == "" {
		// Zombie: POST created the version but azcopy/PATCH never completed.
```
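The sentinel-tag check in the excerpt above can be read as a small decision table. The sketch below spells that table out under assumptions: the `datasetVersion` type, the `action` names, and the handling chosen for each case are hypothetical, since the excerpt ends before showing what the real code does for a zombie or a hash mismatch.

```go
package main

import "fmt"

// datasetVersion is a stand-in for the version record returned by the service.
type datasetVersion struct {
	Tags map[string]string
}

type action int

const (
	actionUpload    action = iota // no record exists: create the version and upload
	actionReupload                // zombie record: upload never completed
	actionReuse                   // record exists and content matches
	actionCollision               // record exists but holds different content
)

var actionNames = []string{"upload", "reupload", "reuse", "collision"}

// decide classifies an existing version record against the local content hash.
func decide(existing *datasetVersion, localHash string) action {
	if existing == nil {
		return actionUpload
	}
	// Reading from a nil map is safe in Go and yields the zero value.
	storedHash, hasTag := existing.Tags["contentHash"]
	if !hasTag || storedHash == "" {
		return actionReupload
	}
	if storedHash == localHash {
		return actionReuse
	}
	return actionCollision
}

func main() {
	cases := []*datasetVersion{
		nil,
		{},
		{Tags: map[string]string{"contentHash": "abc"}},
		{Tags: map[string]string{"contentHash": "zzz"}},
	}
	for _, c := range cases {
		fmt.Println(actionNames[decide(c, "abc")])
	}
}
```

Writing the tag only after the upload is confirmed is what makes it usable as a completion marker: a record without it cannot be mistaken for a finished upload.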
Comment on lines +183 to +200

```go
// TestSafeJoin_OSAbsolutePath documents that an OS-absolute path supplied as
// relPath is *not* treated as an escape: filepath.Join strips the leading
// separator / drive letter on both POSIX and Windows, so the result is safely
// re-rooted under destDir. We assert the resolved path stays inside dest.
func TestSafeJoin_OSAbsolutePath(t *testing.T) {
	dest := t.TempDir()
	other := t.TempDir() // a different absolute path
	abs, err := filepath.Abs(other)
	require.NoError(t, err)

	got, err := safeJoin(dest, abs)
	require.NoError(t, err)
	absDest, _ := filepath.Abs(dest)
	absGot, _ := filepath.Abs(got)
	assert.True(t,
		strings.HasPrefix(absGot, absDest+string(filepath.Separator)) || absGot == absDest,
		"expected %q to be re-rooted under %q", absGot, absDest,
	)
```
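The body of safeJoin is not part of this excerpt. One common way to write such a helper, shown here as a sketch under that caveat, is to join the paths and then check with filepath.Rel that the result did not climb out of the destination. The error messages and the sample paths in `main` are illustrative.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins relPath onto destDir and rejects results that would
// land outside destDir. Illustrative sketch, not the extension's code.
func safeJoin(destDir, relPath string) (string, error) {
	joined := filepath.Join(destDir, relPath) // Join also cleans ".." segments
	rel, err := filepath.Rel(destDir, joined)
	if err != nil {
		return "", fmt.Errorf("resolve %q against %q: %w", relPath, destDir, err)
	}
	// A relative path that is ".." or starts with "../" points above destDir.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes destination %q", relPath, destDir)
	}
	return joined, nil
}

func main() {
	for _, p := range []string{"logs/out.txt", "../secret", "a/../../b"} {
		got, err := safeJoin("dest", p)
		fmt.Printf("%q -> %q, err=%v\n", p, got, err)
	}
}
```

Comparing the cleaned relative path, rather than testing the joined string for a `destDir` prefix, avoids accepting a sibling directory such as `dest-other` whose name merely starts with `dest`.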
Comment on lines +72 to +83

```go
// GenerateJobName generates a human-friendly job name in the format {adjective}_{noun}_{suffix}.
// This matches the Azure ML SDK name generation pattern.
func GenerateJobName() string {
	adj := allowedAdjectives[rand.Intn(len(allowedAdjectives))]
	noun := allowedNouns[rand.Intn(len(allowedNouns))]

	suffix := make([]byte, suffixLength)
	for i := range suffix {
		suffix[i] = allowedChars[rand.Intn(len(allowedChars))]
	}

	return strings.Join([]string{adj, noun, string(suffix)}, "_")
```
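The excerpt above depends on package-level values that are not shown. A self-contained version, with placeholder word lists, alphabet, and suffix length that are assumptions rather than the extension's real values, might look like this:

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// Placeholder values for illustration only; the real lists, alphabet,
// and suffix length live elsewhere in the extension.
var (
	allowedAdjectives = []string{"brave", "calm", "eager"}
	allowedNouns      = []string{"river", "falcon", "maple"}
)

const (
	allowedChars = "abcdefghijklmnopqrstuvwxyz0123456789"
	suffixLength = 10
)

// generateJobName builds a name of the form {adjective}_{noun}_{suffix}.
func generateJobName() string {
	adj := allowedAdjectives[rand.Intn(len(allowedAdjectives))]
	noun := allowedNouns[rand.Intn(len(allowedNouns))]

	suffix := make([]byte, suffixLength)
	for i := range suffix {
		suffix[i] = allowedChars[rand.Intn(len(allowedChars))]
	}

	return strings.Join([]string{adj, noun, string(suffix)}, "_")
}

func main() {
	fmt.Println(generateJobName())
}
```

Since Go 1.20 the global math/rand source is seeded randomly at startup, so names differ between runs without an explicit seed. math/rand is not cryptographically secure, which is acceptable here only because the name is a readable label and not a secret.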
```
@@ -0,0 +1,92 @@
module azure.ai.training

go 1.25
```

```
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.20.0
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.13.1
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/cognitiveservices/armcognitiveservices v1.8.0
github.com/azure/azure-dev/cli/azd v0.0.0-20260122173819-89795b295491
```
No description provided.