Go library that enumerates models served by a vLLM (or any OpenAI-compatible) endpoint and resolves rich metadata for each one from the HuggingFace Hub: features, license, lineage, weight quantization, and tags. Also resolves a single HuggingFace model id directly when you don't need an endpoint.
```sh
go get github.com/algonode/model-meta
```

Requires Go 1.24+.
```go
import modelmeta "github.com/algonode/model-meta"

e := &modelmeta.Enumerator{
	EndpointURL: "http://localhost:8000", // any OpenAI-compatible /v1
	APIKey:      os.Getenv("VLLM_API_KEY"),
	HFToken:     os.Getenv("HF_TOKEN"),
}

// Enumerate everything the endpoint serves.
models, err := e.Enumerate(ctx)

// Or resolve one HuggingFace model without contacting the endpoint.
m, err := e.Resolve(ctx, "meta-llama/Meta-Llama-3-8B")
```

A `resolve-model` binary ships with the module — install with:
```sh
go install github.com/algonode/model-meta/resolve-model@latest
```

It mirrors both library modes:
```sh
resolve-model                                 # default localhost:8000
resolve-model http://host:8000                # explicit vLLM endpoint
resolve-model -m meta-llama/Meta-Llama-3-8B   # single HF model, no endpoint
VLLM_API_KEY=... HF_TOKEN=... resolve-model
```

Each `Model` aggregates one set of weights and every endpoint-exposed alias
that resolves to it:
```json
{
  "root": "meta-llama/Meta-Llama-3-8B-Instruct",
  "aliases": ["default", "llama3"],
  "max_model_len": 8192,
  "owned_by": "meta-llama",
  "features": {
    "text_generation": true,
    "tool_use": true,
    "quantization": "bf16",
    "architectures": ["LlamaForCausalLM"],
    "pipeline": "text-generation"
  },
  "lineage": ["meta-llama/Meta-Llama-3-8B"],
  "tags": {
    "huggingface": ["transformers", "tool-use", "license:llama3"],
    "compliance": []
  },
  "license": { "id": "llama3" },
  "flags": {
    "compliant": false,
    "huggingface": true,
    "lineage": true,
    "quantized": false
  }
}
```

`/v1/models` entries are grouped by their `root` field. Anything whose `id` differs from `root` becomes an alias. The result is sorted by `Root`.
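A minimal sketch of that grouping, using hypothetical `entry` and `grouped` shapes (the field names here are illustrative, not the library's exported types):

```go
import "sort"

// Hypothetical shapes for illustration; the library's real types carry
// far more metadata than this.
type entry struct{ ID, Root string }

type grouped struct {
	Root    string
	Aliases []string
}

// groupByRoot collapses /v1/models entries onto their root weights: every
// id that differs from its root is recorded as an alias, sorted by Root.
func groupByRoot(entries []entry) []grouped {
	byRoot := map[string]*grouped{}
	for _, e := range entries {
		g, ok := byRoot[e.Root]
		if !ok {
			g = &grouped{Root: e.Root}
			byRoot[e.Root] = g
		}
		if e.ID != e.Root {
			g.Aliases = append(g.Aliases, e.ID)
		}
	}
	out := make([]grouped, 0, len(byRoot))
	for _, g := range byRoot {
		out = append(out, *g)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Root < out[j].Root })
	return out
}
```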
Detected with the following priority — the first source that fires wins:

1. `quantization_config.quant_method` from the API summary (NVFP4 recognized from either `quant_method == "nvfp4"` or a `format` containing `"nvfp4"`).
2. Compressed-tensors refine. When step 1 yields `"compressed-tensors"`, the API summary has truncated the real format. The library does a follow-up `GET /{id}/resolve/main/config.json` and inspects both the top-level `quantization_config.format` and per-group `config_groups.*.format` for an NVFP4 marker. Cached, with 404 short-circuit, so unaffected models pay no extra HTTP cost.
3. GGUF filename. If the repo is a GGUF repo (`library_name`, tag, or `.gguf` siblings), the canonical file is picked — id pin first (`Foo-GGUF-Q5_K_M`), then a llama.cpp preference order (`Q4_K_M` > `Q5_K_M` > `Q5_K_S` > …), then size, then lexicographic. Its tier is parsed from the filename. Multi-part shards (`-NNNNN-of-NNNNN.gguf`) are collapsed to part 1.
4. `torch_dtype` mapping: `bfloat16 → bf16`, `float16`/`half → fp16`, `float32 → fp32`, `float8* → fp8`, `float4* → fp4` (see the sketch after this list).
5. Vendor suffix on the vLLM id: `-AWQ`, `-GPTQ`, `-FP8`, `-NVFP4`, …
6. GGUF tier suffix on the id: `Q4_K_M`, `IQ3_XXS`, `BF16`, … This is the practical fallback for llama.cpp endpoints, whose `id` is typically the local filename rather than an HF path.
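A minimal sketch of the purely id-derived steps 4 and 5; the helper names are illustrative, not part of the library:

```go
import "strings"

// dtypeTier mirrors the torch_dtype mapping of step 4. Illustrative only.
func dtypeTier(dtype string) string {
	switch {
	case dtype == "bfloat16":
		return "bf16"
	case dtype == "float16", dtype == "half":
		return "fp16"
	case dtype == "float32":
		return "fp32"
	case strings.HasPrefix(dtype, "float8"):
		return "fp8"
	case strings.HasPrefix(dtype, "float4"):
		return "fp4"
	}
	return "" // unknown: fall through to the next detection step
}

// vendorSuffix checks vendor quant suffixes on the id (step 5).
func vendorSuffix(id string) string {
	for _, s := range []string{"-AWQ", "-GPTQ", "-FP8", "-NVFP4"} {
		if strings.HasSuffix(strings.ToUpper(id), s) {
			return strings.ToLower(strings.TrimPrefix(s, "-"))
		}
	}
	return ""
}
```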
Pipeline tag, HF tags, and architecture names feed boolean flags:
`TextGeneration`, `Embedding`, `Vision`, `Audio`, `ToolUse`, `Reasoning`,
`Code`. Best-effort — `false` means "not detected", not "definitely
unsupported".
`License.ID` comes from `cardData.license`, with a `license:<id>` HF tag as
fallback. `Name` and `Link` are filled when the model card declares an
`other`-style license with a custom title and URL. Set only when HF
resolution succeeded.
`Tags.HuggingFace` is the deduped union of `info.tags` and `cardData.tags`.
Set only when HF resolution succeeded.
`Tags.Compliance` is matched against a curated regex watchlist
(Uncensored, Abliterated, Dolphin, Hermes, OpenHermes, NSFW,
RP, ERP, Wizard, …). RP/ERP are word-boundary anchored so
identifiers like RPCS3/ERPNext don't trigger. Useful for routing or
policy decisions.

`Model.Tags` is omitted from the JSON entirely when both lists are empty.
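The word-boundary anchoring in sketch form, with an abbreviated, illustrative watchlist (the library's curated set is larger):

```go
import "regexp"

// Abbreviated, illustrative watchlist. The \b anchors keep RP/ERP from
// matching inside identifiers like RPCS3 or ERPNext.
var watchlist = []*regexp.Regexp{
	regexp.MustCompile(`(?i)uncensored`),
	regexp.MustCompile(`(?i)abliterated`),
	regexp.MustCompile(`(?i)\berp\b`),
	regexp.MustCompile(`(?i)\brp\b`),
}

// complianceHits returns every watchlist match found in the model name.
func complianceHits(name string) []string {
	var hits []string
	for _, re := range watchlist {
		if m := re.FindString(name); m != "" {
			hits = append(hits, m)
		}
	}
	return hits
}
```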
Walks the declared parent outward, depth-capped (default 8) and
cycle-safe. The parent at each step is read from `cardData.base_model`
first, then from authoritative HF `base_model:*` tags as a fallback
(those tags are present even when the API summary drops `cardData`).
`Lineage[0]` is the immediate parent; the last element is the deepest
declared ancestor.
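A minimal sketch of a depth-capped, cycle-safe walk; `parentOf` is a hypothetical stand-in for the `cardData.base_model` / tag lookup described above:

```go
// walkLineage is illustrative only: parentOf is a hypothetical lookup that
// returns a repo's declared base_model, or "" when none is declared.
func walkLineage(id string, parentOf func(string) string, maxDepth int) []string {
	seen := map[string]bool{id: true}
	var lineage []string
	for cur := id; len(lineage) < maxDepth; {
		parent := parentOf(cur)
		if parent == "" || seen[parent] { // end of chain, or a cycle
			break
		}
		seen[parent] = true
		lineage = append(lineage, parent)
		cur = parent
	}
	return lineage
}
```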
`Model.Ancestor` carries a fully-resolved (non-recursive) view of the
top-most upstream model we could identify:

- When `Lineage` is non-empty, `Ancestor` resolves the last entry (the deepest declared `base_model`).
- When the model is not on HF (e.g. llama.cpp ids like `qwen2.5-7b-instruct-q4_k_m`), or is on HF but declares no `base_model` and is quantized (e.g. an NVFP4 fork whose author didn't fill in the model card), the library searches HF (`/api/models?search=…&sort=downloads`) with a cleaned-up query and picks the best non-quantized candidate. Concretely:
  - The query and candidate ids are stripped of vendor/GGUF quant suffixes and fork-marker words (`uncensored`, `abliterated`, `dolphin`, `hermes`, `merge`, `dare`, `lora`, `dpo`, `ultra`, …), so HF's ranker doesn't bias toward other forks.
  - Candidates whose own tags include a quant marker (`gguf`, `compressed-tensors`, `awq`, `gptq`, `bitsandbytes`, `4-bit`, `8-bit`, `nf4`, `exl2`) are dropped — a quant of a quant isn't an upstream.
  - Two similarity gates (sketched after this list): at least 60% of normalized query tokens must appear in the candidate (query ⊆ candidate), and at least 80% of the candidate's tokens must appear in the query (candidate ⊆ query). The second gate kills sibling forks that introduce extra tokens beyond the query.

  Disable the search fallback with `Enumerator.SkipGuessParent`. Native-dtype HF models with no lineage (bf16/fp16/fp32) are treated as bases themselves and never searched, so true bases like `meta-llama/Meta-Llama-3-8B` aren't pointed at their own siblings.
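A minimal sketch of the two token-overlap gates, assuming the tokens have already been normalized and suffix-stripped; the 0.6/0.8 thresholds mirror the percentages above:

```go
// containment is illustrative only: the fraction of a's tokens found in b.
func containment(a, b []string) float64 {
	set := map[string]bool{}
	for _, t := range b {
		set[t] = true
	}
	hit := 0
	for _, t := range a {
		if set[t] {
			hit++
		}
	}
	if len(a) == 0 {
		return 0
	}
	return float64(hit) / float64(len(a))
}

// passesGates applies both gates: query ⊆ candidate and candidate ⊆ query.
func passesGates(query, candidate []string) bool {
	return containment(query, candidate) >= 0.6 &&
		containment(candidate, query) >= 0.8
}
```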
If the resolved `Ancestor` itself declares a `base_model`, the library
pivots to the tip of that chain and re-resolves — repeating up to
`MaxLineageDepth` times — so you land on the deepest reachable base
rather than stopping at the first hop. This matters mostly after a
search-guess: a guessed parent like `org/llama-3-8b-instruct` will
pivot to `org/llama-3-8b` when the latter is declared upstream.
The non-recursion guarantee: `Ancestor.Ancestor` is always `nil`. `Ancestor`
entries do walk their own `Lineage`, though, so you can still see the
ancestor's own declared upstream chain.
The endpoint-reported value wins (it reflects the configured serving limit,
e.g. `--max-model-len 32768` on a 128k-context model). If the endpoint
omits it, the library falls back to `config.max_position_embeddings` from HF.
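That precedence in sketch form, with hypothetical variable names:

```go
// Illustrative only: the endpoint's configured serving limit wins;
// the HF config value is the fallback.
maxModelLen := endpointMaxModelLen // from /v1/models, 0 when omitted
if maxModelLen == 0 {
	maxModelLen = hfConfig.MaxPositionEmbeddings
}
```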
Enumerator fields:

| Field | Purpose |
|---|---|
| `EndpointURL` | OpenAI-compatible base; `/v1` and `/v1/models` both accepted. |
| `APIKey` | Bearer token for the endpoint. |
| `HFBaseURL` | Override the Hub root (tests / private mirrors). |
| `HFToken` | Bearer token for HuggingFace. |
| `HTTPClient` | Custom client for endpoint requests (default: 30s timeout). |
| `HFHTTPClient` | Custom client for HF requests (default: 30s timeout). |
| `MaxLineageDepth` | Cap on `base_model` traversal (default 8). |
| `SkipHF` | Disable HF resolution; only id-derived signals are used. |
| `SkipGuessParent` | Disable the HF search fallback used to populate `Ancestor` when direct resolution fails. Lineage-tip `Ancestor` is unaffected. |
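A fuller configuration example built from the fields above (the values are illustrative):

```go
import (
	"net/http"
	"os"
	"time"

	modelmeta "github.com/algonode/model-meta"
)

e := &modelmeta.Enumerator{
	EndpointURL:     "http://localhost:8000/v1", // /v1 and /v1/models both accepted
	APIKey:          os.Getenv("VLLM_API_KEY"),
	HFToken:         os.Getenv("HF_TOKEN"),
	HTTPClient:      &http.Client{Timeout: 10 * time.Second}, // override the 30s default
	MaxLineageDepth: 4,    // shallower base_model walk
	SkipGuessParent: true, // never search HF to guess an Ancestor
}
```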
```sh
make test       # unit tests
make test-race  # race detector
make cover      # coverage report
make all        # fmt + vet + test
```

The repo follows a tight convention: every functional change ships in its own commit with a short rationale, and every new or modified feature has a test.