Skip to content

feat(tier-1): foundation — feature flags, smart tags, KV/ACR security hardening#150

Open
pgabriel-01 wants to merge 3 commits into
Azure:mainfrom
pgabriel-01:feat/tier-1-foundation
Open

feat(tier-1): foundation — feature flags, smart tags, KV/ACR security hardening#150
pgabriel-01 wants to merge 3 commits into
Azure:mainfrom
pgabriel-01:feat/tier-1-foundation

Conversation

@pgabriel-01
Copy link
Copy Markdown
Contributor

Summary

Tier 1 of a phased enterprise-hardening initiative for the MLOps v2 project template. All changes are feature-flagged off by default; existing dev/test deployments behave identically. Opt-in via config-infra-{dev,prod}.yml.

Scope (3 commits)

1. Feature flags, smart tags, KV/ACR security hardening (ff5f6f5)

infrastructure/bicep/main.bicep

  • New params: enableMonitoring, enableContainerRegistry, enableComputeCluster, kvEnablePurgeProtection, kvSoftDeleteRetentionDays, tagCostCenter, tagManagedBy
  • Conditional modules; non-null assertion (appi!.outputs.X, cr!.outputs.X) avoids BCP318
  • Project tag now derived from prefix (was hard-coded)

infrastructure/bicep/modules/key_vault.bicep

  • enableRbacAuthorization: true (Azure recommendation; access policies deprecated)
  • Configurable soft-delete retention
  • Conditional purge protection (default on for prod, off for dev)
  • API bumped to @2024-04-01-preview

infrastructure/bicep/modules/container_registry.bicep

  • adminUserEnabled: false (security baseline)

infrastructure/bicep/modules/aml_workspace.bicep

  • Handles optional appinsightid / crid via ternaries for the conditional modules

infrastructure/bicep/pipelines/bicep-ado-deploy-infra.yml

  • Fixes wrong bicep template path (./infrastructure/main.bicep./infrastructure/bicep/main.bicep) in 3 places
  • Wires new feature flags + tag parameters

config-infra-{dev,prod}.yml — adds enable_container_registry: true

2. Model comparison settings for taxi regression (0f87407)

  • New classical/aml-cli-v2/mlops/azureml/train/model_settings.json with threshold/baseline configuration; consumed by the model evaluation step

3. Pipeline resilience — debug skip flags (cf6b931)

  • ADO deploy-model-training-pipeline.yml + GHA deploy-model-training-pipeline-classical.yml
  • Adds optional skip flags so individual training/eval/register steps can be bypassed during pipeline debugging without commenting out YAML

Diff

10 files, +154/-36. Bicep builds clean (no warnings, no BCP errors).

Compatibility

  • All new params default to current behavior; no breaking changes for existing deployments
  • Tested locally (Builds 9, 11, 14, 15, 16 on internal ADO org)

Checklist

  • Bicep az bicep build clean
  • Pipeline YAML linted
  • No breaking changes (feature-flagged)
  • Maintainer review

Adds optional-by-default feature flags so users can opt out of
Application Insights, Container Registry, or the AML Compute Cluster
without forking the bicep template. Strengthens Key Vault and ACR
defaults and supports cost/governance reporting via tags.

Bicep:
- enableMonitoring, enableContainerRegistry, enableComputeCluster flags
- Conditional module deployment (App Insights, ACR, compute cluster)
- aml_workspace.bicep handles optional appinsightid/crid via empty-string
  sentinels, passing null to the workspace properties when omitted
- Key Vault: enableRbacAuthorization, configurable softDelete/purgeProtection,
  conditional purge protection (cannot be unset once enabled)
- ACR: adminUserEnabled set to false (use AcrPull RBAC instead)
- Smart tags: CostCenter, ManagedBy params surfaced through tags object;
  Project tag now uses the project prefix instead of static 'mlops-v2'
- Fix path: LintBicepCode and deployment commands now correctly reference
  infrastructure/bicep/main.bicep (was infrastructure/main.bicep)

ADO pipeline (bicep-ado-deploy-infra.yml):
- Wire feature flags and tag params from config-infra-*.yml through
  validate and deploy stages

Configs:
- enable_container_registry variable added to config-infra-{dev,prod}.yml
  (other variables already present from prior modernization PRs)
- Create model_settings.json with weighted metric thresholds (RMSE, R2, Spearman)
- Champion/challenger pattern from enterprise AI Factory repo
- Supports promote_on_all_metrics toggle and minimum sample check
- ADO: Add skipEnvironmentRegistration, skipComputeCreation, skipDataRegistration parameters
- ADO: Wrap register-environment, create-compute, register-data with conditional execution
- GH: Add workflow_dispatch inputs for skip_environment/data/compute
- GH: Add if conditions on register-environment, register-dataset, create-compute jobs
- GH: Add cancelled/failure guard on run-model-training-pipeline for skipped deps
- Enterprise reference: 12 debug_disable_* flags pattern
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant