Skip to content

fix output for query cli#2017

Open
jperez999 wants to merge 4 commits into
NVIDIA:mainfrom
jperez999:clean-cli-skills
Open

fix output for query cli#2017
jperez999 wants to merge 4 commits into
NVIDIA:mainfrom
jperez999:clean-cli-skills

Conversation

@jperez999
Copy link
Copy Markdown
Collaborator

Description

cleans output of query sub command to only display source, page_number and text.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

@jperez999 jperez999 self-assigned this May 12, 2026
@jperez999 jperez999 requested review from a team as code owners May 12, 2026 00:05
@jperez999 jperez999 requested a review from edknv May 12, 2026 00:05
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR filters the output of the query CLI sub-command so that only text, source, and page_number are returned to the user, dropping internal VDB fields such as _distance. The change is isolated to query_documents in sdk_workflow.py and is covered by an updated test that distinguishes raw retriever output from the public-facing projection.

  • sdk_workflow.py: A list comprehension projects each hit to the three public fields using .get() with safe defaults, preventing KeyError on documents missing metadata.
  • test_root_cli_workflow.py: Test fixtures split into raw_hits (full VDB response) and public_hits (expected CLI output), with assertions on both the parsed JSON and the exact formatted string.

Confidence Score: 4/5

Safe to merge after fixing the line-length violation that will cause the Black/Flake8 pre-commit hook to fail.

The logic change is correct and the test covers the new filtering behaviour. The single list comprehension in sdk_workflow.py is ~130 characters and will be rejected by the enforced Black + Flake8 pre-commit hooks, blocking CI on the changed file.

nemo_retriever/src/nemo_retriever/adapters/cli/sdk_workflow.py — line 85 needs reformatting to pass Black/Flake8 checks.

Important Files Changed

Filename Overview
nemo_retriever/src/nemo_retriever/adapters/cli/sdk_workflow.py Adds a projection step in query_documents to strip VDB-internal fields (_distance, etc.) and expose only text, source, and page_number. Uses .get() with safe defaults. One list comprehension line exceeds the 120-char limit enforced by Black/Flake8.
nemo_retriever/tests/test_root_cli_workflow.py Test updated to distinguish raw_hits (full VDB response with _distance) from public_hits (filtered output), correctly asserting the CLI prints only the three public fields.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[CLI: retriever query] --> B[query_documents]
    B --> C[Retriever.query]
    C --> D[LanceDB VDB]
    D --> E["raw hits\n{text, source, page_number,\n_distance, ...}"]
    E --> F["project hits\n{text, source, page_number}"]
    F --> G[JSON output to stdout]
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
nemo_retriever/src/nemo_retriever/adapters/cli/sdk_workflow.py:85
The list comprehension on this line is ~130 characters, exceeding the project's enforced 120-character limit (Black + Flake8). The pre-commit hook will reject the file as-is, blocking CI. Black would reformat it to a multi-line style.

```suggestion
    hits = [
        {
            "text": hit.get("text", ""),
            "source": hit.get("source", ""),
            "page_number": hit.get("page_number"),
        }
        for hit in hits
    ]
```

Reviews (3): Last reviewed commit: "Merge branch 'clean-cli-skills' of https..." | Re-trigger Greptile

Comment thread nemo_retriever/src/nemo_retriever/adapters/cli/sdk_workflow.py Outdated
jperez999 and others added 3 commits May 11, 2026 20:41
retriever = Retriever(top_k=top_k, vdb_kwargs={"uri": lancedb_uri, "table_name": table_name})
return retriever.query(query)
hits = retriever.query(query)
hits = [{"text": hit.get("text", ""), "source": hit.get("source", ""), "page_number": hit.get("page_number")} for hit in hits]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The list comprehension on this line is ~130 characters, exceeding the project's enforced 120-character limit (Black + Flake8). The pre-commit hook will reject the file as-is, blocking CI. Black would reformat it to a multi-line style.

Suggested change
hits = [{"text": hit.get("text", ""), "source": hit.get("source", ""), "page_number": hit.get("page_number")} for hit in hits]
hits = [
{
"text": hit.get("text", ""),
"source": hit.get("source", ""),
"page_number": hit.get("page_number"),
}
for hit in hits
]
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/adapters/cli/sdk_workflow.py
Line: 85

Comment:
The list comprehension on this line is ~130 characters, exceeding the project's enforced 120-character limit (Black + Flake8). The pre-commit hook will reject the file as-is, blocking CI. Black would reformat it to a multi-line style.

```suggestion
    hits = [
        {
            "text": hit.get("text", ""),
            "source": hit.get("source", ""),
            "page_number": hit.get("page_number"),
        }
        for hit in hits
    ]
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Collaborator

@randerzander randerzander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns are much cleaner, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants