Skip to content

ls-apis should use package-manifest.toml to figure out which version of related repos to use#10220

Open
davepacheco wants to merge 10 commits into
mainfrom
dap/ls-apis-slightly-less-brittle
Open

ls-apis should use package-manifest.toml to figure out which version of related repos to use#10220
davepacheco wants to merge 10 commits into
mainfrom
dap/ls-apis-slightly-less-brittle

Conversation

@davepacheco
Copy link
Copy Markdown
Collaborator

@davepacheco davepacheco commented Apr 3, 2026

(depends on #10217)

This change causes ls-apis to parse package-manifest.toml to figure out what commits of related repos (like Crucible, Dendrite, etc.) will actually be deployed (from the current Omicron workspace). It then uses this information to choose the correct clone of the repo to use for its analysis.

One other change I made here was to tie lldpd-client and its protocol package to the version that's deployed in package-manifest.toml. This ought to fix #10361. (A previous version of this PR updated package-manifest.toml instead, but I opted for the smaller change here.)

Background

ls-apis needs access to checked-out repos for Omicron as well as related components like Dendrite, LLDP, Crucible, Propolis, etc. It wants the versions of these repos that get deployed on real systems (based on the Omicron workspace that it's running in), since the goal is to analyze the runtime API dependencies between these components. It could create its own clones of these repos, but instead, it leverages the fact that just running cargo metadata in Omicron requires having downloaded copies of all of these repos already. How does ls-apis find these copies? It uses Cargo to locate a package that's known to be in that repo. Generally, it picks the package of a client that Omicron already from that repo, like dpd-client to find Dendrite.

But it's not quite so simple: Omicron can reference multiple versions of a given repo. More specifically: Omicron may reference dpd-client from multiple versions of Dendrite. This happens with dpd-client specifically:

$ cargo tree -e normal -i dpd-client
error: There are multiple `dpd-client` packages in your project, and the specification `dpd-client` is ambiguous.
Please re-run this command with one of the following specifications:
  git+https://github.com/oxidecomputer/dendrite?branch=main#dpd-client@0.1.0
  git+https://github.com/oxidecomputer/dendrite?rev=44a949c9bedf4fcd4d280337fa1965b4293c88d1#dpd-client@0.1.0
  git+https://github.com/oxidecomputer/dendrite?rev=cc8e02a0800034c431c8cf96b889ea638da3d194#dpd-client@0.1.0
$ cargo tree -e normal -i git+https://github.com/oxidecomputer/dendrite?branch=main#dpd-client@0.1.0
dpd-client v0.1.0 (https://github.com/oxidecomputer/dendrite?branch=main#f20f786e)
└── lldpd-common v0.1.0 (https://github.com/oxidecomputer/lldp?rev=c3305fd1a7ea7aba31f3834757a6b931e4f59fe6#c3305fd1)
    └── lldpd-client v0.1.0 (https://github.com/oxidecomputer/lldp?rev=c3305fd1a7ea7aba31f3834757a6b931e4f59fe6#c3305fd1)
        └── omicron-nexus v0.1.0 (/home/dap/omicron-review/nexus)
            └── omicron-dev v0.1.0 (/home/dap/omicron-review/dev-tools/omicron-dev)
$ cargo tree -e normal -i git+https://github.com/oxidecomputer/dendrite?rev=44a949c9bedf4fcd4d280337fa1965b4293c88d1#dpd-client@0.1.0
dpd-client v0.1.0 (https://github.com/oxidecomputer/dendrite?rev=44a949c9bedf4fcd4d280337fa1965b4293c88d1#44a949c9)
├── nexus-test-utils v0.1.0 (/home/dap/omicron-review/nexus/test-utils)
│   └── omicron-dev v0.1.0 (/home/dap/omicron-review/dev-tools/omicron-dev)
├── omicron-nexus v0.1.0 (/home/dap/omicron-review/nexus)
│   └── omicron-dev v0.1.0 (/home/dap/omicron-review/dev-tools/omicron-dev)
├── wicket-common v0.1.0 (/home/dap/omicron-review/wicket-common)
│   ├── wicket v0.1.0 (/home/dap/omicron-review/wicket)
│   │   └── wicket-dbg v0.1.0 (/home/dap/omicron-review/wicket-dbg)
│   ├── wicketd v0.1.0 (/home/dap/omicron-review/wicketd)
│   ├── wicketd-api v0.1.0 (/home/dap/omicron-review/wicketd-api)
│   │   ├── omicron-dropshot-apis v0.1.0 (/home/dap/omicron-review/dev-tools/dropshot-apis)
│   │   └── wicketd v0.1.0 (/home/dap/omicron-review/wicketd)
│   └── wicketd-client v0.1.0 (/home/dap/omicron-review/clients/wicketd-client)
│       ├── wicket v0.1.0 (/home/dap/omicron-review/wicket) (*)
│       └── wicketd v0.1.0 (/home/dap/omicron-review/wicketd)
└── wicketd v0.1.0 (/home/dap/omicron-review/wicketd)
$ cargo tree -e normal -i git+https://github.com/oxidecomputer/dendrite?rev=cc8e02a0800034c431c8cf96b889ea638da3d194#dpd-client@0.1.0
dpd-client v0.1.0 (https://github.com/oxidecomputer/dendrite?rev=cc8e02a0800034c431c8cf96b889ea638da3d194#cc8e02a0)
└── omicron-sled-agent v0.1.0 (/home/dap/omicron-review/sled-agent)
    ├── end-to-end-tests v0.1.0 (/home/dap/omicron-review/end-to-end-tests)
    └── nexus-test-utils v0.1.0 (/home/dap/omicron-review/nexus/test-utils)
        └── omicron-dev v0.1.0 (/home/dap/omicron-review/dev-tools/omicron-dev)

This is almost certainly not great. But it shouldn't cause ls-apis to break. Right now if this happens, ls-apis picks one of these arbitrarily, which can cause it to analyze the wrong version of our software and draw wrong conclusions. This is the real cause of #10214.

Again: we want ls-apis to be looking at the version of these things that gets deployed. How can it know which one it is? The authoritative version is the one in package-manifest.toml. Hence the solution here: parse that file, find the commit being used there, and choose the version of the package that corresponds to that commit.

Other notes

This is still a little cheesy in a few ways:

  • to determine if it's the right commit, we do a string comparison on the "source", which is basically a git URL
  • it's still using this sort of goofy heuristic (find a known package referenced by Omicron but contained in the other repo) -- just to be able to re-use Cargo's clone of the repo. It's pretty nice to not have to manage a separate set of clones (and make sure they're correct, have no local changes, etc.), but it's kind of a weird assumption to make and it would break if we ever had a repo we cared about where Omicron doesn't reference one of its packages.

but I think it's a meaningful improvement.

One other note: this will break in the future if:

  • Omicron has no reference at all to a package in the other repo (the case above -- this would already have broken before)
  • Omicron has no reference to a package in the other repo at the same Git commit as the one in package-manifest.toml. This would be unlikely, since we usually move all of our deps forward at the same time. But there's a notable exception today: one of the dpd-client paths above is deliberately fixed to an old version for upgrade-related reasons. This works out fine though because there's another reference to dpd-client that is the right version.
  • package-manifest.toml references the same repo multiple times with different commits. Again, this seems unlikely, and the problem is deeper than it looks. That means we have two different versions of something deployed and ls-apis needs to analyze both?

@davepacheco davepacheco requested a review from sunshowers April 3, 2026 22:30
Arc::into_inner(omicron).expect("no more Omicron Arc references"),
);

// To load Dendrite, we need to look something up in Maghemite (loaded
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to have been totally superfluous after #7907, which added "dendrite" to the block above instead.

Comment thread Cargo.toml Outdated
Base automatically changed from dap/rm-falcon-runner to main April 3, 2026 23:09
@davepacheco davepacheco mentioned this pull request May 5, 2026
Comment on lines -741 to -752
[[intra_deployment_unit_only_edges]]
server = "lldpd"
client = "gateway-client"
note = """
lldpd defaults to localhost for gateway (main.rs:194), and the SMF start
script doesn't override it.
"""
permalinks = [
"https://github.com/oxidecomputer/lldp/blob/d22509dfdb051321b859e924948605115691b93c/lldpd/src/main.rs#L148-L154",
"https://github.com/oxidecomputer/lldp/blob/d22509dfdb051321b859e924948605115691b93c/lldpd/misc/svc-lldpd",
]

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, I added this block as part of the PR that introduced IDU-only metadata, specifically as a result of a merge:
#9707 (comment)

What I think happened here is that:

  • When I started working on ls-apis needs to detect cycles in dependency unit graph #9707, lldpd didn't depend on MGS.
  • While working on it, enable lldp to be aware of what switch it is managing lldp#41 landed in the lldp repo that added a dependency from lldpd on MGS. This happened around February 26.
  • There was no immediate impact on Omicron:
    • To this day, package-manifest.toml in Omicron points at an lldp commit from October, 2025.
    • lldpd-client is a different story. As described in pin lldp client #10361, Omicron's Cargo.toml only refers to lldpd-client coming from the lldp repo's main branch, without a specific commit. However, at this point, Cargo.lock remained pinned to an earlier commit.
  • Around March 2, pull in dendrite PR 220 #9898 landed, which updated Omicron's Cargo.lock so that lldpd-client now came from the lldp commit where lldpd has a dependency on MGS. package-manifest.toml was not updated. This is basically what introduced the API version mismatch that resulted in pin lldp client #10361.
  • When I sync'd up with that change in ls-apis needs to detect cycles in dependency unit graph #9707, I dug into this dependency, looked at lldpd main, and added this block to the API manifest. I didn't notice the mismatch within Omicron (which is an argument for this PR). I believe this block is actually correct -- it just doesn't belong on Omicron main yet. It will belong here once we update lldp in package-manifest.toml.

In summary: due to a combination of #10361 and the ls-apis bug that I'm fixing here, ls-apis prematurely picked up the lldp -> MGS dependency and I prematurely added this block. Fixing this bug, ls-apis no longer identifies this dependency, and the rule has to go because it's now superfluous.

Comment thread Cargo.toml
live-tests-macros = { path = "live-tests/macros" }
lldpd_client = { git = "https://github.com/oxidecomputer/lldp", package = "lldpd-client" }
lldp_protocol = { git = "https://github.com/oxidecomputer/lldp", package = "protocol" }
lldpd_client = { git = "https://github.com/oxidecomputer/lldp", rev = "61479b6922f9112fbe1e722414d2b8055212cb12", package = "lldpd-client" }
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically rolling back lldpd-client, but I believe it's correct. See #10361.

@davepacheco davepacheco marked this pull request as ready for review May 6, 2026 00:38
Copy link
Copy Markdown
Contributor

@sunshowers sunshowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this -- overall, looks great. Just have a few questions and comments.

Comment thread dev-tools/ls-apis/src/bin/ls-apis.rs
Comment on lines +357 to +360
fn find_repo_commit(
package_manifest: &omicron_zone_package::config::Config,
repo_name: &str,
) -> Result<String> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a newtype around a string for a commit hash?

Comment on lines +284 to +288
// This is cheesy, but it works okay for now and fails safely.
if source.repr.contains(expected_commit) {
found_pkg = Some(pkginfo);
break;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, guppy would be able to handle this reliably using ExternalSource, but I think this is okay for now. Thoughts on matching against a more precise ends_with("#<hash>"), though? Or maybe using rsplit_once('#')?

Comment on lines +290 to +303
eprintln!(
"warn: looking up {pkgid:?}: looking for git commit \
{expected_commit} (based on package-manifest.toml), found \
source {source:?}"
);
eprintln!(
"If another version of package {pkgname:?} is found corresponding \
with this commit, then it may be suspicious to have multiple version \
of this package, but it will not break this tool."
);
eprintln!(
"If not, there's a mismatch between commits in package-manifest.toml \
and Cargo.toml or there is a bug in this tool."
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what circumstances would this be hit? I'm specifically wondering if you're going to hit this if you have a [patch] section in your local copy of the workspace Cargo.toml.

I was a little bit concerned the eprintln!s are nondeterministic depending on iteration order, but it looks like we consistently use BTreeMaps internally so that isn't an issue. But I'm still concerned that we're dependent on lexicographic order of keys here, so that if the matching key is first we don't print out this warning, while if the matching key comes later, we do. Maybe this isn't a huge deal though, if the cases where we'll hit this are rare? What do you think?

Comment on lines +277 to +282
let Some(source) = &pkginfo.source else {
eprintln!(
"warn: looking up {pkgid:?}: unexpectedly found source `None`"
);
continue;
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe, going by this reference, that source: None means either a path dependency or a workspace package. But that does mean that (I believe) we'll run into this if you [patch] one of the git dependencies here with a path dependency. Definitely worth testing, at least.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would we want to happen in that case?

Copy link
Copy Markdown
Contributor

@sunshowers sunshowers May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure tbh -- but I assume people do patch in some dependencies while working locally. Probably worth doing something reasonable there (or failing if that isn't possible).

Comment on lines 109 to 120
// In order to assemble this metadata, Cargo already has a clone of most
// of the other workspaces that we care about. We'll use those clones
// rather than manage our own.
//
// To find each of these other repos, we'll need to look up a package
// that comes from each of these workspaces and look at where its local
// manifest file is.
//
// Loading each workspace involves running `cargo metadata`, which is
// pretty I/O intensive. Latency benefits significantly from
// parallelizing, though we have to respect the dependencies. We can't
// look up a package in "maghemite" before we've loaded Maghemite.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this comment need to be updated? I think "we have to respect the dependencies" is slightly out of date now thanks to the "To load Dendrite, we need to look something up in Maghemite" block you removed below.

Comment on lines +124 to +136
let handles: Vec<_> = RELATED_REPOS
.iter()
.map(|repo_config| {
let RelatedRepoConfig {
repo_name,
expected_pkg_name,
extra_cargo_features,
} = repo_config;
let mine = omicron.clone();
let my_ignored = ignored_non_clients.clone();
// unwrap(): we loaded a commit for each repo in the loop above
let expected_commit =
(*related_repo_commits.get(repo_name).unwrap()).clone();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: related_repo_commits could also store the &RelatedRepoConfig, and then you could iterate on that rather than doing a map lookup.

Comment on lines +260 to +262
// It's possible to have more than one non-workspace package with a given
// name. For example, Omicron references `dpd-client` in multiple ways:
// from Nexus and through lldpd-client. So which version do we want? Well,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the example here to be a little confusing. I think what this is trying to say is that dpd-client can get pulled in in multiple different ways, through multiple different git dependencies (correct me if I'm wrong?)

);
eprintln!(
"If another version of package {pkgname:?} is found corresponding \
with this commit, then it may be suspicious to have multiple version \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
with this commit, then it may be suspicious to have multiple version \
with this commit, then it may be suspicious to have multiple versions \

Comment on lines +47 to +49
extra_cargo_features: Some(CargoOpt::SomeFeatures(vec![
String::from("omicron-build"),
])),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, take it or leave it: if you switch this to be a &'static [&'static str] you could avoid LazyLock

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pin lldp client

2 participants