ls-apis should use package-manifest.toml to figure out which version of related repos to use #10220
davepacheco wants to merge 10 commits
| Arc::into_inner(omicron).expect("no more Omicron Arc references"), | ||
| ); | ||
| // To load Dendrite, we need to look something up in Maghemite (loaded |
This appears to have been totally superfluous after #7907, which added "dendrite" to the block above instead.
| [[intra_deployment_unit_only_edges]] | ||
| server = "lldpd" | ||
| client = "gateway-client" | ||
| note = """ | ||
| lldpd defaults to localhost for gateway (main.rs:194), and the SMF start | ||
| script doesn't override it. | ||
| """ | ||
| permalinks = [ | ||
| "https://github.com/oxidecomputer/lldp/blob/d22509dfdb051321b859e924948605115691b93c/lldpd/src/main.rs#L148-L154", | ||
| "https://github.com/oxidecomputer/lldp/blob/d22509dfdb051321b859e924948605115691b93c/lldpd/misc/svc-lldpd", | ||
| ] | ||
Interestingly, I added this block as part of the PR that introduced IDU-only metadata, specifically as a result of a merge:
#9707 (comment)
What I think happened here is that:
- When I started working on "ls-apis needs to detect cycles in dependency unit graph" (#9707), lldpd didn't depend on MGS.
- While working on it, "enable lldp to be aware of what switch it is managing" (lldp#41) landed in the `lldp` repo, adding a dependency from lldpd on MGS. This happened around February 26.
- There was no immediate impact on Omicron:
  - To this day, package-manifest.toml in Omicron points at an lldp commit from October 2025.
  - lldpd-client is a different story. As described in "pin lldp client" (#10361), Omicron's Cargo.toml only refers to lldpd-client coming from the lldp repo's `main` branch, without a specific commit. However, at this point, Cargo.lock remained pinned to an earlier commit.
- Around March 2, "pull in dendrite PR 220" (#9898) landed, which updated Omicron's Cargo.lock so that `lldpd-client` now came from the `lldp` commit where lldpd has a dependency on MGS. package-manifest.toml was not updated. This is basically what introduced the API version mismatch that resulted in "pin lldp client" (#10361).
- When I sync'd up with that change in #9707, I dug into this dependency, looked at lldpd `main`, and added this block to the API manifest. I didn't notice the mismatch within Omicron (which is an argument for this PR). I believe this block is actually correct -- it just doesn't belong on Omicron `main` yet. It will belong here once we update lldp in package-manifest.toml.
In summary: due to a combination of #10361 and the ls-apis bug that I'm fixing here, ls-apis prematurely picked up the lldp -> MGS dependency and I prematurely added this block. With this bug fixed, ls-apis no longer identifies this dependency, and the rule has to go because it's now superfluous.
| live-tests-macros = { path = "live-tests/macros" } | ||
| lldpd_client = { git = "https://github.com/oxidecomputer/lldp", package = "lldpd-client" } | ||
| lldp_protocol = { git = "https://github.com/oxidecomputer/lldp", package = "protocol" } | ||
| lldpd_client = { git = "https://github.com/oxidecomputer/lldp", rev = "61479b6922f9112fbe1e722414d2b8055212cb12", package = "lldpd-client" } |
This is basically rolling back lldpd-client, but I believe it's correct. See #10361.
sunshowers left a comment
Thanks for doing this -- overall, looks great. Just have a few questions and comments.
| fn find_repo_commit( | ||
| package_manifest: &omicron_zone_package::config::Config, | ||
| repo_name: &str, | ||
| ) -> Result<String> { |
Worth a newtype around a string for a commit hash?
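A minimal sketch of what such a newtype might look like. The name `CommitHash`, its methods, and the validation rule (a full 40-hex-digit SHA-1 hash) are assumptions made for illustration; they are not taken from the PR.

```rust
use std::fmt;

/// Hypothetical newtype for a full git commit hash.
/// Assumption: hashes are 40-hex-digit SHA-1 values.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct CommitHash(String);

impl CommitHash {
    /// Checks the shape of `s` and normalizes it to lowercase.
    pub fn parse(s: &str) -> Result<Self, String> {
        if s.len() == 40 && s.bytes().all(|b| b.is_ascii_hexdigit()) {
            Ok(CommitHash(s.to_ascii_lowercase()))
        } else {
            Err(format!("not a full git commit hash: {s:?}"))
        }
    }

    /// Borrows the normalized hash as a string slice.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl fmt::Display for CommitHash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

With something like this, `find_repo_commit` could return `Result<CommitHash>` and a branch name such as `main` would be rejected at parse time instead of being compared against Cargo sources later.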
| // This is cheesy, but it works okay for now and fails safely. | ||
| if source.repr.contains(expected_commit) { | ||
| found_pkg = Some(pkginfo); | ||
| break; | ||
| } |
Heh, guppy would be able to handle this reliably using ExternalSource, but I think this is okay for now. Thoughts on matching against a more precise ends_with("#<hash>"), though? Or maybe using rsplit_once('#')?
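The `rsplit_once('#')` idea could be sketched roughly as follows. This assumes that Cargo's source string for a git dependency ends in `#<resolved commit>`; the helper name `source_matches_commit` and the example strings are hypothetical.

```rust
/// Returns true only if the text after the last '#' in the source string is
/// exactly the expected commit. Unlike `contains`, this does not match a
/// hash that merely appears elsewhere in the URL (for example, in a branch
/// name).
fn source_matches_commit(source_repr: &str, expected_commit: &str) -> bool {
    match source_repr.rsplit_once('#') {
        Some((_, resolved)) => resolved == expected_commit,
        None => false,
    }
}
```

For example, a source whose branch name happens to embed the hash but which resolves to a different commit would pass the `contains` check and fail this one.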
| eprintln!( | ||
| "warn: looking up {pkgid:?}: looking for git commit \ | ||
| {expected_commit} (based on package-manifest.toml), found \ | ||
| source {source:?}" | ||
| ); | ||
| eprintln!( | ||
| "If another version of package {pkgname:?} is found corresponding \ | ||
| with this commit, then it may be suspicious to have multiple version \ | ||
| of this package, but it will not break this tool." | ||
| ); | ||
| eprintln!( | ||
| "If not, there's a mismatch between commits in package-manifest.toml \ | ||
| and Cargo.toml or there is a bug in this tool." | ||
| ); |
Under what circumstances would this be hit? I'm specifically wondering if you're going to hit this if you have a [patch] section in your local copy of the workspace Cargo.toml.
I was a little bit concerned the eprintln!s are nondeterministic depending on iteration order, but it looks like we consistently use BTreeMaps internally so that isn't an issue. But I'm still concerned that we're dependent on lexicographic order of keys here, so that if the matching key is first we don't print out this warning, while if the matching key comes later, we do. Maybe this isn't a huge deal though, if the cases where we'll hit this are rare? What do you think?
| let Some(source) = &pkginfo.source else { | ||
| eprintln!( | ||
| "warn: looking up {pkgid:?}: unexpectedly found source `None`" | ||
| ); | ||
| continue; | ||
| }; |
I believe, going by this reference, that source: None means either a path dependency or a workspace package. But that does mean that (I believe) we'll run into this if you [patch] one of the git dependencies here with a path dependency. Definitely worth testing, at least.
What would we want to happen in that case?
I'm not sure tbh -- but I assume people do patch in some dependencies while working locally. Probably worth doing something reasonable there (or failing if that isn't possible).
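One way to make the `[patch]` case explicit, whatever policy is eventually chosen, is to classify the source before deciding what to do. The sketch below is illustrative only: `SourceKind` and `classify_source` are hypothetical names, and it assumes git sources start with `git+` and end with `#<commit>`.

```rust
/// Hypothetical classification of a package's `source` as reported by Cargo.
#[derive(Debug, PartialEq, Eq)]
enum SourceKind<'a> {
    /// `source` is `None`: a path dependency or a workspace member, which is
    /// also what a git dependency looks like once patched with a local path.
    LocalPath,
    /// A git dependency resolved to a specific commit.
    Git { commit: &'a str },
    /// Anything else, such as a registry.
    Other(&'a str),
}

fn classify_source(source_repr: Option<&str>) -> SourceKind<'_> {
    match source_repr {
        None => SourceKind::LocalPath,
        Some(repr) if repr.starts_with("git+") => match repr.rsplit_once('#') {
            Some((_, commit)) => SourceKind::Git { commit },
            None => SourceKind::Other(repr),
        },
        Some(repr) => SourceKind::Other(repr),
    }
}
```

The caller could then choose, per variant, between warning, accepting the local copy, or failing outright.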
| // In order to assemble this metadata, Cargo already has a clone of most | ||
| // of the other workspaces that we care about. We'll use those clones | ||
| // rather than manage our own. | ||
| // | ||
| // To find each of these other repos, we'll need to look up a package | ||
| // that comes from each of these workspaces and look at where its local | ||
| // manifest file is. | ||
| // | ||
| // Loading each workspace involves running `cargo metadata`, which is | ||
| // pretty I/O intensive. Latency benefits significantly from | ||
| // parallelizing, though we have to respect the dependencies. We can't | ||
| // look up a package in "maghemite" before we've loaded Maghemite. |
Does this comment need to be updated? I think "we have to respect the dependencies" is slightly out of date now thanks to the "To load Dendrite, we need to look something up in Maghemite" block you removed below.
| let handles: Vec<_> = RELATED_REPOS | ||
| .iter() | ||
| .map(|repo_config| { | ||
| let RelatedRepoConfig { | ||
| repo_name, | ||
| expected_pkg_name, | ||
| extra_cargo_features, | ||
| } = repo_config; | ||
| let mine = omicron.clone(); | ||
| let my_ignored = ignored_non_clients.clone(); | ||
| // unwrap(): we loaded a commit for each repo in the loop above | ||
| let expected_commit = | ||
| (*related_repo_commits.get(repo_name).unwrap()).clone(); |
nit: related_repo_commits could also store the &RelatedRepoConfig, and then you could iterate on that rather than doing a map lookup.
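A rough sketch of that shape, using a pared-down stand-in for `RelatedRepoConfig` and a hypothetical `resolve_commits` helper. The closure parameter stands in for whatever actually looks up the commit (for example `find_repo_commit`); none of this is the PR's code.

```rust
/// Pared-down stand-in for the `RelatedRepoConfig` in the diff.
#[derive(Debug)]
struct RelatedRepoConfig {
    repo_name: &'static str,
    expected_pkg_name: &'static str,
}

/// Resolves a commit for every repo up front and keeps each config next to
/// its commit, so later code can iterate over the pairs instead of doing a
/// map lookup followed by `unwrap()`.
fn resolve_commits<'a>(
    repos: &'a [RelatedRepoConfig],
    find_commit: impl Fn(&str) -> Result<String, String>,
) -> Result<Vec<(&'a RelatedRepoConfig, String)>, String> {
    repos
        .iter()
        .map(|repo| find_commit(repo.repo_name).map(|commit| (repo, commit)))
        .collect()
}
```

Because the pairs are built in one pass, a missing commit surfaces as an error from `resolve_commits` rather than as a panic in the spawning loop.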
| // It's possible to have more than one non-workspace package with a given | ||
| // name. For example, Omicron references `dpd-client` in multiple ways: | ||
| // from Nexus and through lldpd-client. So which version do we want? Well, |
I found the example here to be a little confusing. I think what this is trying to say is that dpd-client can get pulled in in multiple different ways, through multiple different git dependencies (correct me if I'm wrong?)
| ); | ||
| eprintln!( | ||
| "If another version of package {pkgname:?} is found corresponding \ | ||
| with this commit, then it may be suspicious to have multiple version \ |
nit:
| with this commit, then it may be suspicious to have multiple version \ | |
| with this commit, then it may be suspicious to have multiple versions \ |
| extra_cargo_features: Some(CargoOpt::SomeFeatures(vec![ | ||
| String::from("omicron-build"), | ||
| ])), |
nit, take it or leave it: if you switch this to be a &'static [&'static str] you could avoid LazyLock
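A sketch of how that could look. The struct here is a simplified stand-in, the second table entry is invented for illustration, and `owned_features` is a hypothetical helper whose result would be wrapped in `CargoOpt::SomeFeatures` at the call site.

```rust
/// Simplified stand-in for `RelatedRepoConfig`, with features held as a
/// static slice so the whole table can be a plain `static`.
struct RelatedRepoConfig {
    repo_name: &'static str,
    expected_pkg_name: &'static str,
    extra_cargo_features: &'static [&'static str],
}

// No `LazyLock` needed: every field is a compile-time constant.
static RELATED_REPOS: &[RelatedRepoConfig] = &[
    RelatedRepoConfig {
        repo_name: "dendrite",
        expected_pkg_name: "dpd-client",
        extra_cargo_features: &[],
    },
    // Hypothetical entry showing a repo that needs an extra feature.
    RelatedRepoConfig {
        repo_name: "example-repo",
        expected_pkg_name: "example-client",
        extra_cargo_features: &["omicron-build"],
    },
];

/// Converts the static slice into owned strings only when they are needed.
fn owned_features(config: &RelatedRepoConfig) -> Option<Vec<String>> {
    if config.extra_cargo_features.is_empty() {
        None
    } else {
        Some(config.extra_cargo_features.iter().map(|f| f.to_string()).collect())
    }
}
```

The trade-off is that the `String` allocation moves from initialization time to each use, which is likely negligible next to running `cargo metadata`.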
(depends on #10217)
This change causes `ls-apis` to parse `package-manifest.toml` to figure out what commits of related repos (like Crucible, Dendrite, etc.) will actually be deployed (from the current Omicron workspace). It then uses this information to choose the correct clone of the repo to use for its analysis.

One other change I made here was to tie `lldpd-client` and its `protocol` package to the version that's deployed in package-manifest.toml. This ought to fix #10361. (A previous version of this PR updated package-manifest.toml instead, but I opted for the smaller change here.)

Background

`ls-apis` needs access to checked-out repos for Omicron as well as related components like Dendrite, LLDP, Crucible, Propolis, etc. It wants the versions of these repos that get deployed on real systems (based on the Omicron workspace that it's running in), since the goal is to analyze the runtime API dependencies between these components. It could create its own clones of these repos, but instead, it leverages the fact that just running `cargo metadata` in Omicron requires having downloaded copies of all of these repos already. How does `ls-apis` find these copies? It uses Cargo to locate a package that's known to be in that repo. Generally, it picks the package of a client that Omicron already uses from that repo, like `dpd-client` to find Dendrite.

But it's not quite so simple: Omicron can reference multiple versions of a given repo. More specifically: Omicron may reference `dpd-client` from multiple versions of Dendrite. This happens with `dpd-client` specifically:

This is almost certainly not great. But it shouldn't cause `ls-apis` to break. Right now if this happens, `ls-apis` picks one of these arbitrarily, which can cause it to analyze the wrong version of our software and draw wrong conclusions. This is the real cause of #10214.

Again: we want `ls-apis` to be looking at the version of these things that gets deployed. How can it know which one it is? The authoritative version is the one in package-manifest.toml. Hence the solution here: parse that file, find the commit being used there, and choose the version of the package that corresponds to that commit.

Other notes

This is still a little cheesy in a few ways:

but I think it's a meaningful improvement.
One other note: this will break in the future if:
`dpd-client` paths above is deliberately fixed to an old version for upgrade-related reasons. This works out fine though because there's another reference to `dpd-client` that is the right version.
`ls-apis` needs to analyze both?
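To illustrate the selection step described under Background (several candidate packages with the same name, one of which matches the deployed commit), here is a minimal sketch. `Candidate` and `pick_deployed` are hypothetical names, the source strings are assumed to end in `#<commit>`, and the short hashes used in any example input are placeholders; this is not the PR's implementation.

```rust
/// Hypothetical, pared-down view of one package reported by `cargo metadata`.
#[derive(Debug, PartialEq, Eq)]
struct Candidate<'a> {
    id: &'a str,
    /// `None` for path dependencies and workspace members.
    source_repr: Option<&'a str>,
}

/// Picks the candidate whose git source resolves to the commit named in
/// package-manifest.toml, or `None` if no candidate matches.
fn pick_deployed<'a>(
    candidates: &'a [Candidate<'a>],
    expected_commit: &str,
) -> Option<&'a Candidate<'a>> {
    candidates.iter().find(|c| {
        c.source_repr
            .and_then(|repr| repr.rsplit_once('#'))
            .map_or(false, |(_, commit)| commit == expected_commit)
    })
}
```

Returning `None` rather than falling back to an arbitrary candidate keeps the failure visible when package-manifest.toml and Cargo.lock disagree.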