OCPBUGS-87841: gather: redact pull-secret auth tokens from machineconfigs.json#10660
Conversation
The bootstrap gather collects machineconfigs.json via `oc get machineconfigs -o json`, which includes MachineConfig resources containing Ignition file definitions for /var/lib/kubelet/config.json. The pull-secret JSON is URL-encoded inside data: URIs and contains real base64 registry auth credentials and Kubernetes service account JWTs. Add a sed post-processing step after resource collection to replace all URL-encoded "auth" values with REDACTED before the log bundle is assembled.
|
@patrickdillon: This pull request references Jira Issue OCPBUGS-87841, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
📝 WalkthroughWalkthroughThe bootstrap gather script now post-processes ChangesBootstrap artifact redaction
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (13 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@patrickdillon: This pull request references Jira Issue OCPBUGS-87841, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@data/data/bootstrap/files/usr/local/bin/bootstrap-cluster-gather.sh`:
- Around line 77-78: The redaction pattern in bootstrap-cluster-gather.sh only
matches uppercase percent-escapes, so lowercase-encoded auth values can bypass
the filter. Update the sed expression in the redaction block to treat
percent-encoding case-insensitively so both uppercase and lowercase variants of
the auth field are replaced with REDACTED, using the existing redaction step in
the script as the location to adjust.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 5b260459-1c32-47bf-9baa-220a69a34013
📒 Files selected for processing (1)
data/data/bootstrap/files/usr/local/bin/bootstrap-cluster-gather.sh
| sed -E -i \ | ||
| 's/%22auth%22%3A%22([A-Za-z0-9_-]|%3D|%2B|%2F)*%22/%22auth%22%3A%22REDACTED%22/g' \ |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Handle lowercase percent-escapes in the redaction regex.
Line 78 only matches uppercase %2B/%2F/%3D and %3A. Percent-encoding is case-insensitive, so lowercase-escaped auth values will bypass this filter and land unredacted in the final log bundle.
Suggested fix
- sed -E -i \
- 's/%22auth%22%3A%22([A-Za-z0-9_-]|%3D|%2B|%2F)*%22/%22auth%22%3A%22REDACTED%22/g' \
+ sed -E -i \
+ 's/%22auth%22%3[Aa]%22([A-Za-z0-9_-]|%3[Dd]|%2[Bb]|%2[Ff])*%22/%22auth%22%3A%22REDACTED%22/g' \
"${ARTIFACTS_TEMP}/resources/machineconfigs.json"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| sed -E -i \ | |
| 's/%22auth%22%3A%22([A-Za-z0-9_-]|%3D|%2B|%2F)*%22/%22auth%22%3A%22REDACTED%22/g' \ | |
| sed -E -i \ | |
| 's/%22auth%22%3[Aa]%22([A-Za-z0-9_-]|%3[Dd]|%2[Bb]|%2[Ff])*%22/%22auth%22%3A%22REDACTED%22/g' \ |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@data/data/bootstrap/files/usr/local/bin/bootstrap-cluster-gather.sh` around
lines 77 - 78, The redaction pattern in bootstrap-cluster-gather.sh only matches
uppercase percent-escapes, so lowercase-encoded auth values can bypass the
filter. Update the sed expression in the redaction block to treat
percent-encoding case-insensitively so both uppercase and lowercase variants of
the auth field are replaced with REDACTED, using the existing redaction step in
the script as the location to adjust.
There was a problem hiding this comment.
According to RFC 3986:
The uppercase hexadecimal digits 'A' through 'F' are equivalent to the lowercase digits 'a' through 'f', respectively. If two URIs differ only in the case of hexadecimal digits used in percent-encoded octets, they are equivalent. For consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings.
So, using uppercase hexadecimal here is sufficient.
There was a problem hiding this comment.
@tthvo Good point on RFC 3986 equivalence, but I don’t think that makes uppercase-only matching sufficient here.
The redaction runs on the raw JSON text in resources/machineconfigs.json; it does not normalize percent-encoding first. So if a data: URI reaches this file with lowercase escapes, the current sed pattern will miss it and the secret will remain in the gather bundle.
For a security redaction path, I’d avoid depending on producers following the uppercase recommendation unless there’s a strict upstream guarantee that MachineConfig Ignition data: URIs are always canonicalized before being stored. I don’t see that guarantee in the code here, so I’d still recommend making the regex case-insensitive for the percent-escapes.
There was a problem hiding this comment.
hmm fair point, should does not mean must. Better cover them all, I guess.
There was a problem hiding this comment.
@tthvo Agreed — for a redaction path, matching both uppercase and lowercase escapes is the safer choice.
|
/retest-required |
|
/cc @zaneb |
tthvo
left a comment
There was a problem hiding this comment.
/approve
The log-bundle is no longer redacted 🤩
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tthvo The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
|
@zaneb: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@patrickdillon: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@patrickdillon: Jira Issue OCPBUGS-87841: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-87841 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
The bootstrap gather collects machineconfigs.json via
oc get machineconfigs -o json, which includes MachineConfig resources containing Ignition file definitions for /var/lib/kubelet/config.json. The pull-secret JSON is URL-encoded inside data: URIs and contains real base64 registry auth credentials and Kubernetes service account JWTs.Add a sed post-processing step after resource collection to replace all URL-encoded "auth" values with REDACTED before the log bundle is assembled.
Summary by CodeRabbit