[WIP] OCPBUGS-87249: on AWS, select associated IPv6 CIDR block for egress IP subnet #228
jechen0648 wants to merge 1 commit into
…ess IP subnet

On dualstack AWS clusters, a subnet's Ipv6CidrBlockAssociationSet can contain multiple entries when an IPv6 CIDR block has been replaced (e.g., a stale 'disassociated' entry followed by the current 'associated' one). The previous code unconditionally picked [0], which could return the wrong CIDR or no CIDR at all, causing the egress-ipconfig annotation to lack the IPv6 subnet. OVN-Kubernetes would then have no node eligible to host the IPv6 egress IP.

Fix by iterating through all associations and selecting the first one in 'associated' state.

Signed-off-by: Jean Chen <jechen@redhat.com>
@jechen0648: This pull request references Jira Issue OCPBUGS-87249, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: jechen0648
On dualstack IPv4-primary AWS clusters, egressIPv6 addresses are never assigned to egress nodes even though the nodes report non-zero IPv6 capacity in their cloud.network.openshift.io/egress-ipconfig annotation.
The root cause is in getSubnet (pkg/cloudprovider/aws.go). AWS represents an IPv6 CIDR block association on a subnet through Ipv6CidrBlockAssociationSet, and each entry carries a state (associating, associated, disassociated, etc.). A subnet can accumulate multiple entries over its lifetime — for example, after an IPv6 CIDR block is replaced, the old entry stays in the list as disassociated ahead of the new associated entry.
The previous code unconditionally picked Ipv6CidrBlockAssociationSet[0] without checking its state. If the first entry is stale (disassociated), one of two things goes wrong:
Wrong CIDR returned — the old, now-disassociated IPv6 CIDR is used as the node's IPv6 subnet in the annotation. OVN-Kubernetes stores this and then cannot find a node whose subnet contains the egress IPv6 address (which is from the current, correct CIDR), so it never assigns the EgressIP.
No CIDR returned — if the first entry has an empty Ipv6CidrBlock, the condition fails and v6Subnet stays nil. The annotation lacks an IPv6 subnet entirely and OVN-Kubernetes concludes the node has no IPv6 egress capability.
In both cases, no CloudPrivateIPConfig is ever created for the IPv6 EgressIP, so CNCC is never asked to assign it.
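The "wrong CIDR" failure mode can be illustrated with a plain containment check. The sketch below is not OVN-Kubernetes code; the helper name `subnetContains` and the documentation-range addresses are assumptions chosen for illustration. It only shows why an egress IP drawn from the current CIDR can never match a node annotated with the stale one.

```go
package main

import (
	"fmt"
	"net"
)

// subnetContains reports whether ip falls inside cidr.
// Hypothetical helper for illustration only.
func subnetContains(cidr, ip string) (bool, error) {
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, err
	}
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return false, fmt.Errorf("invalid IP %q", ip)
	}
	return ipNet.Contains(parsed), nil
}

func main() {
	// Assumed example values from the IPv6 documentation range.
	staleCIDR := "2001:db8:0:1::/64"   // old, disassociated block
	currentCIDR := "2001:db8:0:2::/64" // block currently associated
	egressIP := "2001:db8:0:2::10"     // egress IP taken from the current block

	inStale, _ := subnetContains(staleCIDR, egressIP)
	inCurrent, _ := subnetContains(currentCIDR, egressIP)
	fmt.Println("matches stale subnet:", inStale)
	fmt.Println("matches current subnet:", inCurrent)
}
```

With the stale CIDR in the annotation, the check fails for every node, which is consistent with the EgressIP never being assigned.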
Fix
Iterate through all entries in Ipv6CidrBlockAssociationSet and select the first one whose state is associated.
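A minimal sketch of that selection logic follows. It uses a local stand-in type instead of the AWS SDK types, so `ipv6Association`, its field names, and `firstAssociatedCIDR` are assumptions and not the code in this PR. Skipping entries with an empty CIDR is also an assumption added for safety.

```go
package main

import "fmt"

// ipv6Association is a hypothetical stand-in for one entry of a subnet's
// Ipv6CidrBlockAssociationSet. Field names are illustrative.
type ipv6Association struct {
	CIDR  string // may be empty
	State string // "associating", "associated", "disassociated", ...
}

// firstAssociatedCIDR returns the CIDR of the first entry whose state is
// "associated" and whose CIDR is non-empty. The boolean is false when no
// such entry exists.
func firstAssociatedCIDR(set []ipv6Association) (string, bool) {
	for _, a := range set {
		if a.State == "associated" && a.CIDR != "" {
			return a.CIDR, true
		}
	}
	return "", false
}

func main() {
	// A stale entry sits ahead of the current one, as described above.
	set := []ipv6Association{
		{CIDR: "2001:db8:0:1::/64", State: "disassociated"},
		{CIDR: "2001:db8:0:2::/64", State: "associated"},
	}
	cidr, ok := firstAssociatedCIDR(set)
	fmt.Println(cidr, ok)
}
```

Indexing `[0]` on the same input would have returned the disassociated block; the loop skips it regardless of its position in the list.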
Two minor cleanups are included in the same change:
The empty-subnet guard is tightened from len > 1 to len != 1 so that a DescribeSubnets response returning zero subnets is also caught.
The IPv4 parsing block is cleaned up to avoid shadowing the outer subnet variable (_, v4Net, err instead of _, subnet, err).
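The two cleanups can be sketched together as below. The `awsSubnet` type and the `parseSingleSubnet` function are hypothetical stand-ins and do not reproduce the actual getSubnet signature. The sketch shows the shape of the `len != 1` guard and a parse that binds its result to `v4Net` and leaves the outer `subnet` variable untouched.

```go
package main

import (
	"fmt"
	"net"
)

// awsSubnet is a hypothetical stand-in for one DescribeSubnets result entry.
type awsSubnet struct {
	ID       string
	IPv4CIDR string
}

// parseSingleSubnet expects exactly one subnet and returns its parsed IPv4
// network, or nil if the subnet has no IPv4 CIDR.
func parseSingleSubnet(subnets []awsSubnet) (*net.IPNet, error) {
	// len != 1 rejects both zero results and more than one result.
	if len(subnets) != 1 {
		return nil, fmt.Errorf("expected exactly one subnet, got %d", len(subnets))
	}
	subnet := subnets[0]

	var v4Subnet *net.IPNet
	if subnet.IPv4CIDR != "" {
		// v4Net is a distinct name, so the outer subnet is not shadowed.
		_, v4Net, err := net.ParseCIDR(subnet.IPv4CIDR)
		if err != nil {
			return nil, fmt.Errorf("subnet %s: %w", subnet.ID, err)
		}
		v4Subnet = v4Net
	}
	return v4Subnet, nil
}

func main() {
	_, err := parseSingleSubnet(nil)
	fmt.Println("zero subnets:", err)

	v4, err := parseSingleSubnet([]awsSubnet{{ID: "subnet-example", IPv4CIDR: "10.0.0.0/24"}})
	fmt.Println("one subnet:", v4, err)
}
```

With the earlier `len > 1` form, an empty response would pass the guard and the following index into the slice would be out of range.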
Testing
Related
A companion fix in openshift/ovn-kubernetes ensures OVN-Kubernetes re-evaluates unassigned EgressIPs when the cloud.network.openshift.io/egress-ipconfig annotation changes, so that any cluster where the annotation was previously incorrect (e.g., missing IPv6 subnet) gets healed automatically once CNCC updates it.