Add per-job hf_token/hf_org override for fine-tuning jobs #61

Open
timf34 wants to merge 2 commits into main from tim/hf-override-per-job



@timf34 timf34 commented Apr 23, 2026

TL;DR: lets you override the worker's HF_TOKEN with your own personal HF token when uploading fine-tuned models.

Claude's description:
Fine-tuning jobs previously read HF_TOKEN from the worker pod's env and used the org-level hf_org for the {org_id} slot in finetuned_model_id. In shared OpenWeights orgs this meant one user could not route their uploads to their own HF namespace without changing org-wide secrets (affecting other users' in-flight jobs).

Add optional hf_token and hf_org fields to TrainingConfig. When set:

  • The worker's push_model uses cfg.hf_token for all four upload calls (push_to_hub_merged, push_to_hub, tokenizer.push_to_hub, HfApi).
  • The client's FineTuning.create uses hf_org for the {org_id} template slot.

The base model download still uses the pod-env HF_TOKEN, so gated-model access (Llama, etc.) keeps working. Defaults are unchanged — jobs without the override behave exactly as before, so existing users' flows are unaffected.
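The override and fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `TrainingConfig` here is a stand-in dataclass showing only the two new fields, and `resolve_upload_token` and `resolve_model_id` are hypothetical helper names.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingConfig:
    """Stand-in for the real TrainingConfig; only the fields relevant here."""
    finetuned_model_id: str = "{org_id}/my-model"
    hf_token: Optional[str] = None  # per-job override; None means "use the pod env"
    hf_org: Optional[str] = None    # per-job override; None means "use the org-level hf_org"


def resolve_upload_token(cfg: TrainingConfig) -> Optional[str]:
    # Hypothetical helper: uploads prefer the per-job token and fall back
    # to the HF_TOKEN already present in the worker pod's environment.
    return cfg.hf_token or os.environ.get("HF_TOKEN")


def resolve_model_id(cfg: TrainingConfig, org_level_hf_org: str) -> str:
    # Hypothetical helper: fill the {org_id} slot with the per-job org if set,
    # otherwise with the org-level value, as before this change.
    org = cfg.hf_org or org_level_hf_org
    return cfg.finetuned_model_id.replace("{org_id}", org)
```

With neither field set, both helpers return the same values a job would have used before the change, which is the "defaults are unchanged" property.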

compute_id now shallow-copies validated_params before filtering. The existing filter mutates its input to exclude default-valued fields from the content hash; without the copy, popping hf_token/hf_org for the hash also stripped them from the stored job params (so the worker would never see the override).
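To make the mutation issue concrete, here is a small sketch under stated assumptions. The real filter's rules are not shown in this description, so `HASH_EXCLUDED`, `_filter_in_place`, and the two `compute_id_*` variants are hypothetical; only the shape of the bug (an in-place filter run on the same dict that is later stored) follows the text.

```python
import hashlib
import json

# Assumption: these keys are dropped before hashing so the job ID does not
# depend on which token or org a user supplied.
HASH_EXCLUDED = ("hf_token", "hf_org")


def _filter_in_place(params: dict) -> dict:
    # Mutates its argument, like the existing filter described above.
    for key in HASH_EXCLUDED:
        params.pop(key, None)
    return params


def _digest(params: dict) -> str:
    payload = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


def compute_id_mutating(validated_params: dict) -> str:
    # Old behaviour: the caller's dict loses hf_token/hf_org as a side effect.
    return _digest(_filter_in_place(validated_params))


def compute_id_copying(validated_params: dict) -> str:
    # New behaviour: filter a shallow copy, leave the stored params intact.
    return _digest(_filter_in_place(dict(validated_params)))
```

Both variants produce the same ID; the difference is only whether the dict that gets stored as job params still contains the override afterwards. A shallow copy is enough here because the filter only removes top-level keys.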


timf34 commented Apr 23, 2026

Probably not something to merge into main; it's meant as a quick workaround. That said, it could be worth including similar functionality behind a flag.

End-to-end pattern for N-job sweeps via the SDK: dataset x hyperparam
matrices, idempotent submission via content-hashed job IDs, persisted
manifests, dry-run validation, and downstream inference + download.
Linked from cookbook/README.md alongside the custom-job entry.
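A rough sketch of the sweep pattern described above, assuming nothing about the actual SDK or cookbook code: `content_job_id`, `build_manifest`, and `submit_sweep` are hypothetical names, and `submit` is a caller-supplied callback standing in for the real job-creation call.

```python
import hashlib
import itertools
import json


def content_job_id(params: dict) -> str:
    # Same params always hash to the same ID, which is what makes
    # resubmission idempotent.
    payload = json.dumps(params, sort_keys=True).encode()
    return "job-" + hashlib.sha256(payload).hexdigest()[:12]


def build_manifest(datasets, grid):
    # Cross every dataset with every hyperparameter combination.
    names = sorted(grid)
    manifest = {}
    for dataset in datasets:
        for values in itertools.product(*(grid[n] for n in names)):
            params = {"dataset": dataset, **dict(zip(names, values))}
            manifest[content_job_id(params)] = params
    return manifest


def submit_sweep(manifest, known_ids, submit, dry_run=False):
    # Returns the IDs that are (or would be) newly submitted.
    new_ids = [jid for jid in manifest if jid not in known_ids]
    if not dry_run:
        for jid in new_ids:
            submit(jid, manifest[jid])
            known_ids.add(jid)
    return new_ids
```

The manifest can be written to disk with `json.dump` so that a later run can reload it and drive inference and downloads for the same job IDs.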

uv.lock refreshed to current resolver state.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>