Update quay ARO Azure e2e tests to quay 3.17 on OCP 4.21#80771
Update quay ARO Azure e2e tests to quay 3.17 on OCP 4.21#80771LiZhang19817 wants to merge 4 commits into
Conversation
Replace quay 3.12/3.13/3.14 ARO periodic jobs with a single quay 3.17 job targeting OCP 4.21, using FBC catalog source. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WalkthroughThe PR updates Quay ARO OCP test infrastructure to target OCP 4.21, replaces OCP 4.19 cron jobs with new Quay 3.17/3.18 test jobs on the Azure QE profile, improves QBO test script error handling and environment discovery logic, and makes Azure SAS token expiration dynamic via Terraform functions. ChangesQuay ARO OCP Test Infrastructure Update
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested labels
Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error)
✅ Passed checks (14 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Replace hardcoded expired Azure SAS token dates (2024-07-25) with dynamic timestamp()/timeadd() calls, and add CSO_SOURCE/QBO_SOURCE env vars set to fbc-operator-catalog for the quay 3.17 ARO job. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: LiZhang19817 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/quay-tests/deploy-quay-azure/quay-tests-deploy-quay-azure-commands.sh`:
- Around line 81-82: The start variable in the SAS token configuration is set
directly to timestamp() without any time buffer, which can cause clock skew
failures when the token is immediately used. Modify the start assignment to use
the timeadd() function to backdate the start time by 15 minutes, following the
same pattern used for the expiry calculation. This will provide a safety margin
to prevent "not yet valid" authentication errors due to clock skew between the
Terraform provider and Azure Storage service.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 299ff082-8dca-4196-bce8-33c6b3297904
📒 Files selected for processing (2)
ci-operator/config/quay/quay-tests/quay-quay-tests-master__quay-aro-ocp.yamlci-operator/step-registry/quay-tests/deploy-quay-azure/quay-tests-deploy-quay-azure-commands.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- ci-operator/config/quay/quay-tests/quay-quay-tests-master__quay-aro-ocp.yaml
| start = timestamp() | ||
| expiry = timeadd(timestamp(), "8760h") |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Azure Storage SAS best practice: should SAS start time be set in the past to avoid clock skew (for example 10-15 minutes)?
💡 Result:
Yes, it is a recommended best practice to set the SAS start time in the past to account for clock skew [1][2]. Because different machines (the client generating the token and the Azure service validating it) may have slightly different current times, setting a SAS start time to the exact current time can lead to intermittent authentication failures (typically 403 errors) during the first few minutes of the token's life [1][3][4]. Official Microsoft guidance recommends two main approaches to avoid this: 1. Set the start time to be at least 15 minutes in the past [1][2]. This creates a buffer that ensures the token is treated as valid immediately, even if there is a discrepancy between clocks [1][5]. 2. Omit the start time entirely [1][6][2]. When the start time is not specified, Azure Storage defaults the start time to the current time, making the SAS valid immediately in all cases without requiring a manual buffer [1][6][4]. The same principle applies to expiration times; you should be aware that clock skew can cause a token to be rejected if it appears expired due to time differences between systems [1][7][3]. Ensuring that systems generating tokens have their clocks synchronized (e.g., via NTP) is also a fundamental best practice for preventing these issues [8].
Citations:
- 1: https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview
- 2: https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/storage/common/storage-sas-overview.md
- 3: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-sas
- 4: https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/event-hubs/authorize-access-shared-access-signature.md
- 5: http://www.allenconway.net/2023/11/dealing-with-time-skew-and-sas-azure.html
- 6: https://learn.microsoft.com/en-us/rest/api/storageservices/create-service-sas
- 7: https://github.com/MicrosoftDocs/SupportArticles-docs/blob/main/support/azure/azure-storage/blobs/authentication/storage-troubleshoot-403-errors.md
- 8: https://www.webfactoryltd.com/blog/azure-blob-sas-tokens-expiring-instantly-with-403-authenticationfailed-and-the-clock-drift-token-scope-fix-that-restored-uploads/
🏁 Script executed:
cat -n ci-operator/step-registry/quay-tests/deploy-quay-azure/quay-tests-deploy-quay-azure-commands.sh | sed -n '75,200p'Repository: openshift/release
Length of output: 4104
Backdate SAS token start time to prevent clock skew failures.
Lines 81–82 set start = timestamp() without any buffer, but the SAS token is used immediately afterward (line 194). This can cause intermittent "not yet valid" authentication failures due to clock skew between the Terraform provider and Azure Storage service. Per Microsoft best practices, backdate the start time by 15 minutes to stabilize provisioning.
Suggested patch
- start = timestamp()
- expiry = timeadd(timestamp(), "8760h")
+ start = timeadd(timestamp(), "-15m")
+ expiry = timeadd(timestamp(), "8760h")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| start = timestamp() | |
| expiry = timeadd(timestamp(), "8760h") | |
| start = timeadd(timestamp(), "-15m") | |
| expiry = timeadd(timestamp(), "8760h") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/quay-tests/deploy-quay-azure/quay-tests-deploy-quay-azure-commands.sh`
around lines 81 - 82, The start variable in the SAS token configuration is set
directly to timestamp() without any time buffer, which can cause clock skew
failures when the token is immediately used. Modify the start assignment to use
the timeadd() function to backdate the start time by 15 minutes, following the
same pattern used for the expiry calculation. This will provide a safety margin
to prevent "not yet valid" authentication errors due to clock skew between the
Terraform provider and Azure Storage service.
Re-sort env vars alphabetically after make update. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/pj-rehearse periodic-ci-quay-quay-tests-master-quay-aro-ocp-quay-e2e-tests-quay317-aro-ocp421 |
|
@LiZhang19817: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
- Replace pipe chains with jsonpath queries in qbo-qe-test to fix exit 141 (SIGPIPE) from head closing pipes under set -o pipefail - Wrap QBO test body in subshell so failures don't block the subsequent CSO test step - Update ARO job cron schedules to yearly and add quay 3.18 job Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
[REHEARSALNOTIFIER]
A total of 44 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
/pj-rehearse periodic-ci-quay-quay-tests-master-quay-aro-ocp-quay-e2e-tests-quay318-aro |
|
@LiZhang19817: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
ci-operator/step-registry/quay-tests/qbo-qe-test/quay-tests-qbo-qe-test-commands.sh (1)
46-48: 💤 Low valueConsider validating resource existence before using jsonpath results.
The jsonpath queries return empty strings if no resources match, rather than failing. If
quay_nsis empty, subsequent commands will fail with potentially confusing errors (e.g.,-n ""on line 47).Since the deployment step should ensure Quay exists before this runs, this is a minor robustness concern.
🛡️ Optional: Add existence validation
-quay_ns=$(oc get quayregistry --all-namespaces -o jsonpath='{.items[0].metadata.namespace}') -quay_registry=$(oc get quayregistry -n "$quay_ns" -o jsonpath='{.items[0].metadata.name}') -quay_app_pod=$(oc -n "$quay_ns" get pods -l quay-component=quay-app -o jsonpath='{.items[0].metadata.name}') +quay_ns=$(oc get quayregistry --all-namespaces -o jsonpath='{.items[0].metadata.namespace}') +if [[ -z "$quay_ns" ]]; then + echo "ERROR: No QuayRegistry found in cluster" + exit 1 +fi +quay_registry=$(oc get quayregistry -n "$quay_ns" -o jsonpath='{.items[0].metadata.name}') +quay_app_pod=$(oc -n "$quay_ns" get pods -l quay-component=quay-app -o jsonpath='{.items[0].metadata.name}') +if [[ -z "$quay_app_pod" ]]; then + echo "ERROR: No quay-app pod found in namespace $quay_ns" + exit 1 +fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/quay-tests/qbo-qe-test/quay-tests-qbo-qe-test-commands.sh` around lines 46 - 48, Add validation checks after each of the three jsonpath queries to ensure the results are not empty before proceeding. After the oc commands that set quay_ns, quay_registry, and quay_app_pod variables, add conditional checks (using if statements with -z test) that verify each variable contains a value, and if not, log a descriptive error message and exit the script. This prevents confusing errors downstream when empty values are passed to subsequent commands using these variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/quay-tests/qbo-qe-test/quay-tests-qbo-qe-test-commands.sh`:
- Around line 526-532: The script forces exit 0 after a QBO test failure, which
masks the failure from job dashboards and logs. To address this while still
allowing subsequent steps to run, modify the code block where QBO_RC is checked
to write a failure indicator to the shared directory (${SHARED_DIR}) when QBO_RC
is non-zero, so downstream steps or workflow aggregation can detect and properly
report the failure. This way the failure is prominently recorded for failure
visibility without blocking subsequent test steps from executing.
---
Nitpick comments:
In
`@ci-operator/step-registry/quay-tests/qbo-qe-test/quay-tests-qbo-qe-test-commands.sh`:
- Around line 46-48: Add validation checks after each of the three jsonpath
queries to ensure the results are not empty before proceeding. After the oc
commands that set quay_ns, quay_registry, and quay_app_pod variables, add
conditional checks (using if statements with -z test) that verify each variable
contains a value, and if not, log a descriptive error message and exit the
script. This prevents confusing errors downstream when empty values are passed
to subsequent commands using these variables.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 05da284d-6036-41fc-b4ea-70c842b47fd3
⛔ Files ignored due to path filters (1)
ci-operator/jobs/quay/quay-tests/quay-quay-tests-master-periodics.yamlis excluded by!ci-operator/jobs/**
📒 Files selected for processing (2)
ci-operator/config/quay/quay-tests/quay-quay-tests-master__quay-aro-ocp.yamlci-operator/step-registry/quay-tests/qbo-qe-test/quay-tests-qbo-qe-test-commands.sh
| ) || QBO_RC=$? | ||
|
|
||
| if [ $QBO_RC -ne 0 ]; then | ||
| echo "!!! QBO test failed with exit code $QBO_RC, but continuing to allow subsequent steps to run" | ||
| fi | ||
|
|
||
| exit 0 |
There was a problem hiding this comment.
Test failures are masked - consider impact on failure visibility.
The forced exit 0 allows the CSO test step to run, but means QBO test failures won't fail the overall job. The failure is logged, but:
- In lengthy CI logs, the message may be missed
- The ARO workflow (per context) lacks firewatch configuration that could parse logs for failures
- Job dashboards will show this step as "passed" even when tests failed
If this trade-off is acceptable for this use case, consider adding a more prominent failure indicator (e.g., writing failure status to ${SHARED_DIR} for downstream consumption or aggregation).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/quay-tests/qbo-qe-test/quay-tests-qbo-qe-test-commands.sh`
around lines 526 - 532, The script forces exit 0 after a QBO test failure, which
masks the failure from job dashboards and logs. To address this while still
allowing subsequent steps to run, modify the code block where QBO_RC is checked
to write a failure indicator to the shared directory (${SHARED_DIR}) when QBO_RC
is non-zero, so downstream steps or workflow aggregation can detect and properly
report the failure. This way the failure is prominently recorded for failure
visibility without blocking subsequent test steps from executing.
|
@LiZhang19817: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Test plan
make update🤖 Generated with Claude Code
Summary by CodeRabbit
This PR updates OpenShift CI’s Quay ARO periodic e2e testing configuration in
openshift/releaseto drop outdated Quay-on-ARO coverage and standardize the environment on OCP 4.21 with FBC (file-based catalog) catalog source.ARO periodic jobs refreshed (Quay 3.12/3.13/3.14 → Quay 3.17/3.18):
quay312/313/314variants targeting OCP 4.19).quay-e2e-tests-quay317-aro-ocp421quay-e2e-tests-quay318-aro-ocp421azure-qecluster profile and are scheduled yearly viacron: 0 12 1 1 *(Jan 1 at 12:00).FBC / catalog-source wiring for Quay tests:
fbc-operator-catalog:CSO_SOURCE,QBO_SOURCE,QUAY_OPERATOR_SOURCEref: quay-tests-enable-quay-catalogsourcetest step so the Quay catalog source is enabled as part of the job.OCP 4.21 alignment:
base_imagestags forcliandupi-installer→4.21build_root.image_stream_tag.tag→ corresponding 4.21 release tagreleases.latest.candidate.version→"4.21"Azure deployment step hardening (SAS validity window):
start = timestamp()expiry = timeadd(timestamp(), "8760h")QBO QE test script reliability fixes:
pipefail, initializingQBO_RC, running the main QBO body in a subshell, capturing its exit code, and continuing subsequent steps even if QBO fails.jsonpathselectors instead of parsingocoutput.Generation + verification:
make updateand confirming the generated periodic job(s) match the Quay 3.17 / OCP 4.21 / FBC setup.periodic-ci-quay-quay-tests-master-quay-aro-ocp-quay-e2e-tests-quay317-aro-ocp421main.