Integrate reusable xCRG package into ARAX by venkataseshtej · Pull Request #2772 · RTXteam/RTX

venkataseshtej · 2026-05-18T06:27:28Z

This PR integrates the reusable catrax-xcrg package into ARAX for MVP2/xCRG inferred ChemicalEntity-Gene activity/abundance queries.

Main changes

Adds catrax-xcrg as a pinned dependency.
Detects MVP2/xCRG query shape in ARAX_query_graph_interpreter.
Routes matching queries to connect(action=xcrg).
Calls the reusable catrax-xcrg package from ARAX_connect.
Uses config/env-driven Retriever URL, timeout, and TF batch size:
- ARAX_XCRG_RETRIEVER_URL
- ARAX_XCRG_TIMEOUT
- ARAX_XCRG_TF_BATCH_SIZE
Uses ARAX NGD DB path via get_curie_ngd_path().
Makes ResultTransformer safely no-op for xCRG responses because xCRG already returns final TRAPI support graph output.
Fixes ARAX final result count for responses where connect() replaces the message.
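
As a rough illustration of the query-shape detection step listed above, here is a minimal sketch. It is not the code in ARAX_query_graph_interpreter; the function names are hypothetical. It also assumes the MVP2 shape is a single inferred biolink:affects edge between a ChemicalEntity node and a Gene node, with an activity_or_abundance aspect qualifier and an increased/decreased direction qualifier.

```python
# Hypothetical sketch: decide whether a TRAPI query graph looks like an
# MVP2/xCRG query. The exact criteria used by ARAX are an assumption here.
XCRG_DIRECTIONS = {"increased", "decreased"}


def _qualifiers(edge):
    """Flatten qualifier constraints into {qualifier_type_id: qualifier_value}."""
    found = {}
    for constraint in edge.get("qualifier_constraints") or []:
        for qualifier in constraint.get("qualifier_set") or []:
            found[qualifier.get("qualifier_type_id")] = qualifier.get("qualifier_value")
    return found


def looks_like_xcrg_mvp2(query_graph):
    """Return True for a one-hop inferred ChemicalEntity/Gene 'affects' query."""
    nodes = query_graph.get("nodes") or {}
    edges = query_graph.get("edges") or {}
    if len(nodes) != 2 or len(edges) != 1:
        return False
    edge = next(iter(edges.values()))
    if edge.get("knowledge_type") != "inferred":
        return False
    if "biolink:affects" not in (edge.get("predicates") or []):
        return False
    # Collect the categories of both endpoint nodes of the single edge.
    endpoints = [nodes.get(edge.get("subject"), {}), nodes.get(edge.get("object"), {})]
    categories = {c for node in endpoints for c in (node.get("categories") or [])}
    if not {"biolink:ChemicalEntity", "biolink:Gene"} <= categories:
        return False
    quals = _qualifiers(edge)
    return (
        quals.get("biolink:object_aspect_qualifier") == "activity_or_abundance"
        and quals.get("biolink:object_direction_qualifier") in XCRG_DIRECTIONS
    )
```

A query that fails any of these checks would simply fall through to the normal ARAX handling rather than being routed to connect(action=xcrg).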

Validation

I ran one MVP2/xCRG query through ARAX locally.

The query was recognized as xCRG MVP2 and routed to:

connect(action=xcrg)

It returned real results:

status: OK
message: Normal completion with 1378 results.
xcrg_connect_flag: True

TRAPI validator result on the ARAX-produced response:

CRITICAL: 0
ERRORS: 0
WARNINGS: 0

Extra checks also passed:

missing node binding attributes: 0
missing edge binding attributes: 0
KG edges missing attributes: 0
KG edges missing sources: 0
metatype:Datetime attributes: 0
schema_version: 1.6.0
biolink_version: 4.3.2

Notes

For local testing, I used:

ARAX_XCRG_RETRIEVER_URL=https://retriever.ci.transltr.io/query

because dev.retriever.biothings.io was returning 424 Failed Dependency for this query shape.

The xCRG package does not hardcode the Retriever endpoint; it is config/env-driven.

Local NGD testing used get_curie_ngd_path() and a local symlink to the Shepherd NGD SQLite DB. In CI/prod, ARAX DB manager should provide the NGD DB at that expected path.

dkoslicki

Looks good, with just one comment about deployment level. @mohsenht can you review too since this is in your connect part of the code?

Also, moving forward, it is likely best to pin the requirements to a versioned PyPI release rather than a commit SHA.

dkoslicki · 2026-05-18T13:29:42Z

+XCRG_RETRIEVER_URL_ENV = "ARAX_XCRG_RETRIEVER_URL"
+XCRG_TIMEOUT_ENV = "ARAX_XCRG_TIMEOUT"
+XCRG_TF_BATCH_SIZE_ENV = "ARAX_XCRG_TF_BATCH_SIZE"
+DEFAULT_XCRG_RETRIEVER_URL = "https://retriever.ci.transltr.io/query"


The retriever URL should pull from whatever deployment level ARAX is currently at. For example, if the code is deployed at arax.ci.transltr.io, it should call retriever.ci.transltr.io. If deployed at arax.test.transltr.io, it should call retriever.test.transltr.io. And if at arax.transltr.io, it should call retriever.transltr.io. You should be able to pull this from the configuration file (I forget exactly which).

Thanks, that makes sense. I updated xCRG Retriever URL selection to use RTXConfiguration().maturity so ARAX staging/testing/production map to the corresponding Retriever deployment. I kept ARAX_XCRG_RETRIEVER_URL as an explicit local/debug override.

I also agree that once catrax-xcrg has a versioned PyPI release, we should replace the Git commit pin with a pinned PyPI version.

dkoslicki · 2026-05-18T16:29:19Z

Awesome! But I just noticed that all the xCRG tests are skipped due to being marked as slow in the test suite. Can you please:

Run those tests locally (including the slow-marked xCRG tests)
If those tests execute relatively quickly, remove the slow marks in the automated tests
Include new, faster tests for xCRG so we can be testing that component every time the CICD runs

saramsey · 2026-05-18T16:46:19Z

From this PR, I see that environment variables are used for xCRG configuration. From the submitter's comments, I gather that the environment variables are intended for debugging purposes, which I presume means on a dev machine or in a "test rig" type environment.

If there is a potential use-case intended for setting one or more of the new xCRG environment variables within the ARAX flask application, how would we be proposing to set them? This is just a reminder that ARAX is (at least, with the current code-base we have) started out of a System V init script, on ITRB systems and on arax.ncats.io. I suppose that init script could be hacked to override xCRG environment variables, if need be? Just wondering what the plan is here. Maybe the plan is that xCRG environment variable overriding will not ever be needed or used for the Flask server ARAX on arax.ncats.io or on ITRB systems?

venkataseshtej · 2026-05-18T17:08:37Z

Awesome! But I just noticed that all the xCRG tests are skipped due to being marked as slow in the test suite. Can you please:

Run those tests locally (including the slow-marked xCRG tests)

If those tests execute relatively quickly, remove the slow marks in the automated tests

Include new, faster tests for xCRG so we can be testing that component every time the CICD runs

Thank you, I checked this and pushed an update to the PR.

I ran the existing slow-marked xCRG tests locally with --runslow. Those tests appear to be for the old legacy ARAX_infer.py / creativeCRG path, not the new package-backed connect(action=xcrg) path. They did execute, but 4/5 failed because the old legacy embedding file is missing:

Infer/data/xCRG_data/chemical_gene_embeddings_...npz

So I did not remove the slow marks from those tests, since they are not reliable always-on CI tests for the new xCRG path.

Instead, I added fast always-on tests for the new ARAX xCRG integration path in:

code/ARAX/test/test_ARAX_xcrg_connect.py

These tests cover:

deployment-aware Retriever URL mapping,
ARAX_XCRG_RETRIEVER_URL override behavior,
MVP2 query routing to connect(action=xcrg),
non-xCRG queries not routing to connect(action=xcrg),
ARAX_connect calling run_xcrg(...) with the expected config,
ResultTransformer no-op behavior for xCRG responses.

Local validation passed:

6 passed in 1.82s
git diff --check: passed
py_compile: passed

I pushed this update to the PR branch.

saramsey · 2026-05-18T17:09:58Z

@venkataseshtej thank you for coding up some new faster xCRG unit tests. That was a great idea.

saramsey · 2026-05-18T17:21:52Z

@venkataseshtej
In the new xcrg-package-integration branch, it doesn't look like the new xCRG database files are listed in RTX/code/config_dbs.json, are they? Or are there no new database files for xCRG?
https://github.com/RTXteam/RTX/blob/xcrg-package-integration/code/config_dbs.json
In any event, in that file, there seem to be "old" xCRG files that should be updated or removed.

Have you tested this PR by running the ARAX flask application server and the "Example 3" question in it?

I think it would be a good idea, before merging. See
https://github.com/RTXteam/RTX/blob/master/notes/arax-maintenance-sop.md
for some tips on how to run the flask server locally on a dev machine.

Another option is to commandeer a dev-area on arax.ncats.io and test there using the xcrg-package-integration branch code.

Personally, I prefer to test on a developer machine by just running the flask server locally. I find it is easier than doing:

ssh arax.ncats.io
sudo docker exec -it rtx2 bash
su - rt
cd /mnt/data/orangeboard/devarea/RTX
git fetch origin
git stash
git checkout xcrg-package-integration
python3.12 code/ARAX/ARAXQuery/ARAX_database_manager.py -c
exit
# service RTX_OpenAPI_devarea stop
# service RTX_OpenAPI_devarea start
# tail -f /tmp/RTX_OpenAPI_devarea.elog
<Ctrl-C>
su - rt
cd /mnt/data/orangeboard/devarea/RTX
git checkout master
git stash pop
exit
# service RTX_OpenAPI_devarea stop
# service RTX_OpenAPI_devarea start
# tail -f /tmp/RTX_OpenAPI_devarea.elog
<Ctrl-C>

and so forth.

If you would like some help getting the ARAX Flask server working on your developer machine, @hodgesf or @bazarkua can help show you how to do it.

venkataseshtej · 2026-05-18T17:23:15Z

From this PR, I see that environment variables are used for xCRG configuration. From the submitter's comments, I gather that the environment variables are intended for debugging purposes, which I presume means on a dev machine or in a "test rig" type environment.

If there is a potential use-case intended for setting one or more of the new xCRG environment variables within the ARAX flask application, how would we be proposing to set them? This is just a reminder that ARAX is (at least, with the current code-base we have) started out of a System V init script, on ITRB systems and on arax.ncats.io. I suppose that init script could be hacked to override xCRG environment variables, if need be? Just wondering what the plan is here. Maybe the plan is that xCRG environment variable overriding will not ever be needed or used for the Flask server ARAX on arax.ncats.io or on ITRB systems?

Thanks, that is a good point. The intention is that the ARAX_XCRG_* environment variables are optional local/debug overrides only, not required deployment configuration for Flask ARAX.

For deployed ARAX, no new environment variables need to be set. The Retriever URL is now selected from RTXConfiguration().maturity, so the default behavior is deployment-aware:

staging     -> https://retriever.ci.transltr.io/query
testing     -> https://retriever.test.transltr.io/query
production  -> https://retriever.transltr.io/query
development -> https://retriever.ci.transltr.io/query

The timeout and TF batch size also have code defaults, so the System V init script should not need to be modified for xCRG.

The env vars are mainly for local testing/debugging, for example if a developer wants to temporarily point xCRG to a specific Retriever deployment or adjust timeout/batch size without changing code. If we later decide these values need to be production-tunable, I agree the cleaner approach would be to add them to the ARAX/RTX configuration rather than relying on System V init-script environment overrides.
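
The selection rule described above can be sketched as follows. This is an illustration rather than the code in the PR: the function name is hypothetical, the maturity string is assumed to come from RTXConfiguration().maturity, and falling back to the CI Retriever for an unrecognized maturity is an assumption made only for this example.

```python
import os

# Mapping taken from the table above.
RETRIEVER_URL_BY_MATURITY = {
    "staging": "https://retriever.ci.transltr.io/query",
    "testing": "https://retriever.test.transltr.io/query",
    "production": "https://retriever.transltr.io/query",
    "development": "https://retriever.ci.transltr.io/query",
}
XCRG_RETRIEVER_URL_ENV = "ARAX_XCRG_RETRIEVER_URL"


def select_retriever_url(maturity, environ=None):
    """Pick the Retriever URL: explicit env override first, then maturity."""
    environ = os.environ if environ is None else environ
    override = (environ.get(XCRG_RETRIEVER_URL_ENV) or "").strip()
    if override:
        # Local/debug override; not expected to be set on deployed ARAX.
        return override
    # Assumption: an unknown maturity falls back to the development URL.
    return RETRIEVER_URL_BY_MATURITY.get(
        maturity, RETRIEVER_URL_BY_MATURITY["development"]
    )
```

Passing the environment as a parameter keeps the function easy to unit-test without touching the real process environment.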

I can also add a short code comment to make this explicit.

venkataseshtej · 2026-05-18T17:28:58Z

@venkataseshtej have you tested this PR by running the ARAX flask application server? I think it might be a good idea, before merging. See https://github.com/RTXteam/RTX/blob/master/notes/arax-maintenance-sop.md

Thanks, good point. I have tested the PR through the local ARAXQuery().query(...) path and validated the resulting TRAPI response, but I have not yet tested it through the ARAX Flask application server.

I agree that this is worth doing before merge. I will follow the local Flask server portion of the ARAX maintenance SOP, start the Flask server from this PR branch, submit the same MVP2/xCRG query through the HTTP endpoint, and validate the returned response with the TRAPI validator. I will report the result back on the PR before merge.

saramsey · 2026-05-18T17:42:20Z

@venkataseshtej here is the most relevant section of the ARAX Maintenance SOP for this situation, I think:
https://github.com/RTXteam/RTX/blob/master/notes/arax-maintenance-sop.md#setup-of-your-local-dev-system

Unfortunately, that section has gotten a little bit out-of-date since we deployed Tier0 ARAX. But it gives the gist of how to set things up, at least.

dkoslicki · 2026-05-18T18:18:37Z

@venkataseshtej , there are no new database files for this new xCRG implementation, correct? @saramsey , do you have any guidance on removing the old xCRG database files? This new method is model-free in comparison to the old, model-based approach.

saramsey · 2026-05-18T19:10:09Z

@saramsey , do you have any guidance on removing the old xCRG database files?

If there are database files that are no longer needed, I propose that in the xcrg-package-integration branch, those database file references should be removed from:
https://github.com/RTXteam/RTX/blob/xcrg-package-integration/code/config_dbs.json

and the code corresponding to those file(s) should be removed from the ARAX_database_manager.py script (for consistency, I think that code edit should be done in the same branch as the change to the config_dbs.json file):
https://github.com/RTXteam/RTX/blob/xcrg-package-integration/code/ARAX/ARAXQuery/ARAX_database_manager.py

@bazarkua or @hodgesf can help with the edits to the ARAX_database_manager.py, if that would be useful to the PSU team.

There is also this shell script,
https://github.com/RTXteam/RTX/blob/xcrg-package-integration/code/generate-db-symlinks.sh

which isn't used in an automated way by any of our ARAX systems but it is a convenience script for managing ARAX on a dev machine. I'm happy to edit it in the xcrg-package-integration branch, with permission from @venkataseshtej . I just need to know which database files are being eliminated from config_dbs.json. @venkataseshtej maybe you can comment about which specific database files referenced in config_dbs.json are going away, from ARAX? Or, I guess, I can inspect the commit when someone removes them from config_dbs.json in the branch.

Are all three of these lines going away?

"xcrg_embeddings": "/translator/data/orangeboard/databases/KG2.10.2/chemical_gene_embeddings_v1.0.KG2.10.0_refreshedTo_KG2.10.2.npz",
"xcrg_increase_model": "/translator/data/orangeboard/databases/KG2.10.0/xcrg_increase_model_v1.0.KG2.10.0_new_version.pt",
"xcrg_decrease_model": "/translator/data/orangeboard/databases/KG2.10.0/xcrg_decrease_model_v1.0.KG2.10.0_new_version.pt"

venkataseshtej · 2026-05-18T20:55:54Z

@venkataseshtej here is the most relevant section of the ARAX Maintenance SOP for this situation, I think: https://github.com/RTXteam/RTX/blob/master/notes/arax-maintenance-sop.md#setup-of-your-local-dev-system

Unfortunately, that section has gotten a little bit out-of-date since we deployed Tier0 ARAX. But it gives the gist of how to set things up, at least.

Thanks, that makes sense.

For the new package-backed xCRG path, I do not believe there are new xCRG-specific database files that need to be added to code/config_dbs.json. The new path uses Retriever through a configured URL, uses the existing ARAX NGD DB path through get_curie_ngd_path(), and the TF list is bundled inside the catrax-xcrg package.

That said, I will audit code/config_dbs.json and grep the codebase for the old xCRG/creativeCRG references. If those old xCRG entries are only for the legacy ARAX_infer.py / creativeCRG path and are no longer needed for the new package-backed connect(action=xcrg) path, I can either remove/update them in this PR or leave that as a separate cleanup, depending on what you prefer.

I also agree on the Flask server test. So far I tested through the local ARAXQuery().query(...) path and validated the returned TRAPI response, but I have not yet tested through the ARAX Flask application server. I will follow the local Flask server portion of the maintenance SOP, run the server from this PR branch, submit the “Example 3” query through the HTTP endpoint, and validate the returned response. I will report the result back here before merge.

venkataseshtej · 2026-05-18T21:12:08Z

@venkataseshtej . I just need to know which database files are being eliminated from config_dbs.json. @venkataseshtej may

Thanks, this makes sense.

For the new package-backed connect(action=xcrg) implementation, there are no new xCRG-specific database/model files that need to be added to code/config_dbs.json.

The new path uses:

Retriever through the configured Retriever URL,
the existing ARAX NGD DB path via get_curie_ngd_path(),
the TF (transcription factors) list bundled inside the catrax-xcrg package.

So yes, these three old model-based xCRG entries are legacy only and are not used by the new package-backed xCRG path:

xcrg_embeddings
xcrg_increase_model
xcrg_decrease_model

I removed those three entries from code/config_dbs.json, removed the corresponding handling from RTXConfiguration.py and ARAX_database_manager.py, and removed the matching dev symlink entries from generate-db-symlinks.sh.

I did not remove the existing NGD DB configuration, since the new xCRG path still uses NGD through the ARAX NGD helper.

Validation:

config_dbs.json JSON syntax: OK
py_compile: OK
git diff --check: OK
fast ARAX xCRG tests: 6 passed in 1.85s
ARAXDatabaseManager check:
  xcrg_embeddings: False
  xcrg_increase_model: False
  xcrg_decrease_model: False
  curie_ngd: True

venkataseshtej · 2026-05-18T21:19:09Z

@venkataseshtej , there are no new database files for this new xCRG implementation, correct? @saramsey , do you have any guidance on removing the old xCRG database files? This new method is model-free in comparison to the old, model-based approach.

Yes, correct. The new package-backed connect(action=xcrg) implementation does not add any new xCRG-specific database/model files.

It uses:

Retriever through the configured Retriever URL,
the existing ARAX NGD DB via get_curie_ngd_path(),
and the TF list bundled inside the catrax-xcrg package.

So the old model-based xCRG files are no longer needed for this new path:

xcrg_embeddings
xcrg_increase_model
xcrg_decrease_model

I removed those legacy references from config_dbs.json, RTXConfiguration.py, ARAX_database_manager.py, and generate-db-symlinks.sh. I kept curie_ngd, since the new xCRG path still uses the existing ARAX NGD helper.

venkataseshtej · 2026-05-18T21:53:58Z

I found two separate issues and pushed follow-up fixes.

First, the CI failure happened because the Docker image used in the Python analysis job clones RTX again inside the container, and that inner clone was using default branch code rather than the PR branch. That is why CI still saw the old xCRG DB entries and tried to rsync the old chemical_gene_embeddings file even after the PR branch removed those entries. I added a small CI/Docker change so the Docker build receives the PR branch and checks it out inside the container.

Second, I found that the legacy creativeCRG.py code still references RTXConfig.xcrg_embeddings_path, RTXConfig.xcrg_increase_model_path, and RTXConfig.xcrg_decrease_model_path. To avoid breaking legacy imports/tests, I kept those as legacy fallback attributes in RTXConfiguration.py, but did not add them back to config_dbs.json, ARAX_database_manager.py, or generate-db-symlinks.sh.

So the database manager should no longer download/manage the old xCRG model files, while the old creativeCRG.py code will not immediately fail with missing config attributes. The new package-backed connect(action=xcrg) path does not use those legacy model files.

Local validation passed:

py_compile: OK
git diff --check: OK
fast xCRG tests: 6 passed
DB manager old xCRG keys: False
DB manager curie_ngd: True
legacy config attrs: True

@dkoslicki @saramsey please let me know if you would prefer the CI/Docker branch checkout fix to be split into a separate PR. I included it here because otherwise this PR’s CI was not actually testing the PR branch inside the Docker container.

saramsey · 2026-05-18T23:48:47Z

First, the CI failure happened because the Docker image used in the Python analysis job clones RTX again inside the container, and that inner clone was using default branch code rather than the PR branch.

Ah yes, this is a frustrating limitation of the CICD-Dockerfile. It has tripped me up multiple times before. Some day, we (the ARAX team) should fix it.

saramsey · 2026-05-19T23:32:53Z

@dkoslicki @saramsey please let me know if you would prefer the CI/Docker branch checkout fix to be split into a separate PR. I included it here because otherwise this PR’s CI was not actually testing the PR branch inside the Docker container.

I'm OK with including these fixes in this PR. Thank you for implementing those fixes, @venkataseshtej.

dkoslicki · 2026-05-20T17:39:08Z

@edeutsch to point arax.ncats.io/test to this branch for us to test

edeutsch · 2026-05-20T22:13:19Z

okay @dkoslicki and @chunyuma I have now deployed branch xcrg-package-integration to /test.

A naive first test seems to show it is working well:
https://arax.ncats.io/test/?r=458983

But please test and confirm if we are ready to merge into master and deploy everywhere.
Others are welcome to test, too!

hodgesf · 2026-05-20T23:15:53Z

All tests pass locally with the pytest suite and the flask server. I think this is good to go. Also, example 3 is now lightning fast, compared to before this update. Awesome job!!

venkataseshtej · 2026-05-21T01:32:14Z

@venkataseshtej
Have you tested this PR by running the ARAX flask application server and the "Example 3" question in it?

I think it would be a good idea, before merging. See https://github.com/RTXteam/RTX/blob/master/notes/arax-maintenance-sop.md for some tips on how to run the flask server locally on a dev machine.

Another option is to commandeer a dev-area on arax.ncats.io and test there using the xcrg-package-integration branch code.

Personally, I prefer to test on a developer machine by just running the flask server locally. I find it is easier than doing:

I completed the local ARAX Flask server testing from the xcrg-package-integration branch.

I tested through the HTTP endpoint:

POST http://localhost:5001/api/arax/v1.4/query

1. UI Example 3 / xCRG MVP2 increased query

status: Success
operations: connect(action=xcrg)
results: 1083
kg_nodes: 1571
kg_edges: 3107
auxiliary_graphs: 1523
schema_version: 1.6.0
biolink_version: 4.3.2

TRAPI validator:

CRITICAL: 0
ERRORS: 0
WARNINGS: 0

2. Additional xCRG decreased query

status: Success
operations: connect(action=xcrg)
results: 1378
kg_nodes: 1866
kg_edges: 5243
auxiliary_graphs: 2661
schema_version: 1.6.0
biolink_version: 4.3.2

TRAPI validator:

CRITICAL: 0
ERRORS: 0
WARNINGS: 0

Extra checks passed for both responses:

missing node binding attributes: 0
missing edge binding attributes: 0
KG edges missing attributes: 0
KG edges missing sources: 0
metatype:Datetime attributes: 0

So the local Flask HTTP server path is working for the new package-backed connect(action=xcrg) route. This now covers both the direct ARAXQuery().query(...) path and the Flask HTTP endpoint path.

edeutsch · 2026-05-21T02:12:59Z

This seems great, but I'm afraid the TRAPI validator report:

CRITICAL: 0
ERRORS: 0
WARNINGS: 0

is a red flag. This is extremely difficult to achieve and thus seems unlikely. (the validator is very fussy, so there are always warnings)

The thing is that the validator does not run on initial queries (for various reasons including it would slow things down unacceptably). The validator only runs when a previous result is recalled.

I ran the Example 3 query on /test. No errors are visible. But then I recalled it:
https://arax.ncats.io/test/?r=458992

This reveals the errors:

The biggest issue seems to be that the NCBIGene entries have all empty/null properties:

        "NCBIGene:9994": {
          "attributes": [],
          "categories": [],
          "is_set": false,
          "name": null
        },

I'm not entirely sure where these entries come from, but this is definitely invalid.

dkoslicki · 2026-05-21T02:27:30Z

I'm seeing similar issues in the nodes in the support graphs that are missing all their properties/names/etc.:

@venkataseshtej , the nodes in the support graphs should be as they are returned from retriever (i.e. all of their properties and the like preserved)

venkataseshtej · 2026-05-21T02:27:52Z

This seems great, but I'm afraid the TRAPI validator report:
CRITICAL: 0
ERRORS: 0
WARNINGS: 0
is a red flag. This is extremely difficult to achieve and thus seems unlikely. (the validator is very fussy, so there are always warnings)

The thing is that the validator does not run on initial queries (for various reasons including it would slow things down unacceptably). The validator only runs when a previous result is recalled.

I ran the Example 3 query on /test. No errors are visible. But then I recalled it: https://arax.ncats.io/test/?r=458992

This reveals the errors:

The biggest issue seems to be that the NCBIGene entries have all empty/null properties:
        "NCBIGene:9994": {
          "attributes": [],
          "categories": [],
          "is_set": false,
          "name": null
        },
I'm not entirely sure where these entries come from, but this is definitely invalid.

Thanks @edeutsch for pointing this out. I see the issue now.
It looks like some KG nodes, especially NCBIGene:* support/path nodes, are coming through with empty categories and null/empty node properties. This seems to be a final TRAPI cleanup issue in the 'catrax-xcrg' package rather than an ARAX routing issue.

I will update the package so that before returning the final response, every KG node has non-empty categories, using CURIE-prefix fallback categories where needed, e.g. NCBIGene:* -> biolink:Gene. I will also check for / prune dangling nodes during that cleanup.
I will include the 500 result limit in the same package update, then update the pinned catrax-xcrg commit in this ARAX PR and retest through the local Flask endpoint before asking for test redeployment.

dkoslicki · 2026-05-21T02:30:20Z

@venkataseshtej Sounds good (our messages were posted simultaneously), but one thing to note: don't go about it by looking up the node categories, nor using any fallback rules like CURIE prefix to infer categories. Just use the nodes as returned by retriever. Since you're getting them from retriever, just preserve and pass through all of these properties. You definitely do not want to be figuring them out yourself.

venkataseshtej · 2026-05-21T02:34:26Z

I'm seeing similar issues in the nodes in the support graphs that are missing all their properties/names/etc.:
@venkataseshtej , the nodes in the support graphs should be as they are returned from retriever (i.e. all of their properties and the like preserved)

Got it. For the support graph nodes, I should not just create fallback nodes from CURIE prefixes if Retriever already returned full node objects. The correct fix is to preserve the node records from Retriever’s knowledge graph when copying support/path edges into the final response, including their name, categories, attributes, and other properties.

I’ll update the catrax-xcrg final TRAPI builder so that support graph node IDs are hydrated from the original Retriever KG nodes first. CURIE-prefix category inference will only be used as a fallback when a referenced node is genuinely missing from the Retriever node map.

@venkataseshtej Sounds good (our messages were posted simultaneously), but one thing to note: don't go about it by looking up the node categories, nor using any fallback rules like CURIE prefix to infer categories. Just use the nodes as returned by retriever. Since you're getting them from retriever, just preserve and pass through all of these properties. You definitely do not want to be figuring them out yourself.

Thanks @dkoslicki, that clarification helps. I will avoid adding any CURIE-prefix category inference or other category lookup logic.

I will fix this by preserving the node objects exactly as returned by Retriever. So when xCRG copies support/path edges into the final KG/support graphs, it will also copy the corresponding subject/object node records from the Retriever KG, including their names, categories, attributes, and other properties.

If a support edge references a node that is not present in the Retriever KG node map, I will treat that as an incomplete support path and avoid fabricating node metadata. The final cleanup goal will be pass-through preservation from Retriever, not category reconstruction in xCRG.
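
The pass-through approach described above might look roughly like the sketch below. It is an illustration, not the catrax-xcrg implementation: the function name and the dict layout (TRAPI-style "nodes" and "edges" maps) are assumptions. Nodes are copied verbatim from the Retriever knowledge graph, and a path that references an edge or node Retriever did not return is skipped instead of being patched with inferred metadata.

```python
import copy


def copy_support_path(path_edge_ids, retriever_kg, final_kg):
    """Copy one support path into final_kg; return False if it is incomplete.

    Node and edge records are copied verbatim from the Retriever KG.
    Nothing is inferred or repaired here.
    """
    src_nodes = retriever_kg.get("nodes") or {}
    src_edges = retriever_kg.get("edges") or {}

    edges = [src_edges.get(edge_id) for edge_id in path_edge_ids]
    if any(edge is None for edge in edges):
        return False  # path references an edge Retriever did not return

    node_ids = {edge[key] for edge in edges for key in ("subject", "object")}
    if not all(node_id in src_nodes for node_id in node_ids):
        return False  # incomplete path: do not fabricate node metadata

    # Only mutate final_kg once the whole path is known to be complete.
    for edge_id, edge in zip(path_edge_ids, edges):
        final_kg["edges"].setdefault(edge_id, copy.deepcopy(edge))
    for node_id in node_ids:
        final_kg["nodes"].setdefault(node_id, copy.deepcopy(src_nodes[node_id]))
    return True
```

Checking completeness before writing anything means a rejected path leaves no partial nodes or edges behind in the final knowledge graph.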

dkoslicki · 2026-05-21T02:47:09Z

What you wrote later is correct: treat missing node properties as truly missing. Earlier in this message, you have:

CURIE-prefix category inference will only be used as a fallback when a referenced node is genuinely missing from the Retriever node map.

This is what we don't want. If retriever is missing stuff, it's missing and needs to be fixed by them. We don't want to silently be trying to fix their mistakes, but rather pass them through verbatim; that way, if someone sees missing information, we can point to the source and say "that's their problem". No fallbacks or trying to fix retriever issues.

venkataseshtej · 2026-05-21T06:08:37Z

What you wrote later is correct: treat missing node properties as truly missing. Earlier in this message, you have:

CURIE-prefix category inference will only be used as a fallback when a referenced node is genuinely missing from the Retriever node map.

This is what we don't want. If retriever is missing stuff, it's missing and needs to be fixed by them. We don't want to silently be trying to fix their mistakes, but rather pass them through verbatim; that way, if someone sees missing information, we can point to the source and say "that's their problem". No fallbacks or trying to fix retriever issues.

Follow-up update @dkoslicki: I tested the updated xCRG package locally through the ARAX Flask server using an Example 3-style xCRG query through:

POST http://127.0.0.1:5001/api/arax/v1.4/query

The fresh Flask response now looks clean:

HTTP: 200
operations: connect(action=xcrg)
results: 500
kg_nodes: 501
kg_edges: 1191
aux_graphs: 500
empty-category KG nodes: 0
empty/null placeholder KG nodes: 0
missing support_graph references: 0
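
For readers who want to reproduce checks like the last three lines, here is a minimal sketch. The actual scripts used for this validation are not shown in this thread, so the function names and exact criteria below are assumptions. The sketch also assumes that KG edges reference auxiliary graphs through an attribute with attribute_type_id biolink:support_graphs.

```python
def find_empty_category_nodes(message):
    """Return IDs of KG nodes whose 'categories' list is missing or empty."""
    nodes = (message.get("knowledge_graph") or {}).get("nodes") or {}
    return [node_id for node_id, node in nodes.items() if not node.get("categories")]


def find_placeholder_nodes(message):
    """Return IDs of KG nodes with no name, no categories, and no attributes."""
    nodes = (message.get("knowledge_graph") or {}).get("nodes") or {}
    return [
        node_id
        for node_id, node in nodes.items()
        if not node.get("name") and not node.get("categories") and not node.get("attributes")
    ]


def find_missing_support_graph_refs(message):
    """Return (edge_id, graph_id) pairs that point at absent auxiliary graphs."""
    aux_graphs = message.get("auxiliary_graphs") or {}
    edges = (message.get("knowledge_graph") or {}).get("edges") or {}
    missing = []
    for edge_id, edge in edges.items():
        for attribute in edge.get("attributes") or []:
            if attribute.get("attribute_type_id") != "biolink:support_graphs":
                continue
            for graph_id in attribute.get("value") or []:
                if graph_id not in aux_graphs:
                    missing.append((edge_id, graph_id))
    return missing
```

Each function takes the "message" object of a saved TRAPI response and returns an empty list when the corresponding check passes.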

I also validated the saved Flask response locally with reasoner-validator 6.0.1, matching the validator version that exposed the /test recall issue:

CRITICAL: 0
ERRORS: 0
WARNINGS: 0

The package now preserves Retriever-provided node metadata when available, avoids CURIE category fallback repair, avoids incomplete evidence nodes/support references, and caps xCRG results at 500.

I have pushed the updated catrax-xcrg package and updated the ARAX pin in this PR. @edeutsch could you please redeploy xcrg-package-integration to /test so we can validate the /test response before merge?

edeutsch · 2026-05-21T14:01:46Z

Thank you @venkataseshtej this is now looking very good.
I have deployed to /test and performed the Example 3 query again
https://arax.ncats.io/test/?r=460178
Validation passed, although there are 3 warnings.

The first one is an objection to this:

I did a little investigating and I suspect these warnings come from Retriever data, not xCRG data, but I'm not certain.

I think we're in good shape.

Is there anything else that needs to be done or evaluated before we merge into master ?

bazarkua · 2026-05-23T00:12:06Z

Just wanted to mention that it looks like, after commit 9e5f571, the CI Test build fails even using the PR branch:

====== 3 failed, 157 passed, 133 skipped, 1 warning in 1098.52s (0:18:18) ======

saramsey · 2026-05-26T16:16:23Z

Just recapping here my thoughts about xCRG TRAPI logging, that I shared in Slack on Friday:

As of Friday, the new xCRG module was (seemingly) not giving any details in the TRAPI message log from which we are able to discern what is going wrong. For example, we see no information about the HTTP status code returned from Retriever, or whether Retriever's response's TRAPI message itself contained an error in the TRAPI log, and finally, whether any results (and if there were results, how many) were returned from Retriever. It would also be useful if xCRG could emit to STDERR, at least in debugging mode, the TRAPI query graph that it is POSTing to Retriever, so a team member can manually curl the TRAPI message to Retriever's API at the command-line to see how it responds.

venkataseshtej · 2026-05-26T21:21:50Z

Just recapping here my thoughts about xCRG TRAPI logging, that I shared in Slack on Friday:

As of Friday, the new xCRG module was (seemingly) not giving any details in the TRAPI message log from which we are able to discern what is going wrong. For example, we see no information about the HTTP status code returned from Retriever, or whether Retriever's response's TRAPI message itself contained an error in the TRAPI log, and finally, whether any results (and if there were results, how many) were returned from Retriever. It would also be useful if xCRG could emit to STDERR, at least in debugging mode, the TRAPI query graph that it is POSTing to Retriever, so a team member can manually curl the TRAPI message to Retriever's API at the command-line to see how it responds.

Thanks @saramsey, agreed. This is feasible and I think it will make the xCRG path much easier to debug.

I have already added the first part of this in the latest catrax-xcrg package update: each Retriever call now logs the Retriever HTTP status, Retriever TRAPI status/description, returned result/node/edge counts, and Retriever log messages when the call returns zero results or a non-complete status.

I will also make sure the xCRG call label is clear in the logs, e.g. direct lookup vs TF-mediated template/batch, and that failures/non-200 responses are surfaced as ARAX/TRAPI warnings or errors rather than silently resulting in zero results.

For the exact Retriever query graph, I agree that we should expose it in debugging mode. I’ll keep the normal TRAPI log compact, but add DEBUG-level logging and/or debug artifact output for the full TRAPI query being posted to Retriever so it can be copied and tested with curl.

I will verify this in /test after the latest branch is redeployed so the ARAX TRAPI message log shows enough information to diagnose Retriever behavior directly.

venkataseshtej · 2026-05-26T21:32:47Z

Status update on the earlier xCRG changes: (05/22/26)

I pushed the updated package/ARAX fixes. The current xCRG config now calls Retriever with tiers=[0, 1] instead of only Tier 0 or only Tier 1. This keeps Tier 0 included as intended, while avoiding the previous zero-result behavior when Tier 0 alone returned no results.

I also added Retriever diagnostics in the catrax-xcrg package so the ARAX TRAPI log now reports:

- Retriever HTTP status
- Retriever TRAPI status/description
- returned result/node/edge counts
- Retriever log messages when a lookup returns zero results or a non-complete status
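
As an illustration of the four items above, here is a rough sketch of how such a summary could be derived from a Retriever response. It is not the catrax-xcrg implementation: the function name `summarize_retriever_response` and the `COMPLETE_STATUSES` set are assumptions made for this example. The response fields it reads (`status`, `description`, `logs`, `message.results`, `message.knowledge_graph`) are standard TRAPI Response fields.

```python
from typing import Optional

# Assumption: which TRAPI status strings count as "complete" is a guess for this sketch.
COMPLETE_STATUSES = {"Success", "Complete"}


def summarize_retriever_response(http_status: int, body: Optional[dict]) -> list[str]:
    """Return log lines describing one Retriever call."""
    body = body or {}
    message = body.get("message") or {}
    kg = message.get("knowledge_graph") or {}
    n_results = len(message.get("results") or [])
    n_nodes = len(kg.get("nodes") or {})
    n_edges = len(kg.get("edges") or {})
    trapi_status = body.get("status")
    lines = [
        f"Retriever HTTP status: {http_status}",
        f"Retriever TRAPI status: {trapi_status!r}; description: {body.get('description')!r}",
        f"Retriever returned {n_results} results, {n_nodes} nodes, {n_edges} edges",
    ]
    # Only forward Retriever's own log entries when something looks wrong.
    if n_results == 0 or trapi_status not in COMPLETE_STATUSES:
        for entry in body.get("logs") or []:
            lines.append(f"Retriever log [{entry.get('level')}]: {entry.get('message')}")
    return lines
```

Forwarding Retriever's log entries only for empty or non-complete responses keeps the normal TRAPI log compact.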

Local validation passed:

Example 3 live xCRG smoke against Retriever CI with tiers=[0,1]: 500 results
Second MVP2 query against Retriever CI with tiers=[0,1]: 41 results
TRAPI validator on the second response: 0 critical / 0 errors / 0 warnings
Fast ARAX xCRG tests: 6 passed
xCRG package tests: 9 passed

saramsey · 2026-05-27T21:44:44Z

@venkata SESH TEJ MATTA I have a question about the new xCRG. I would have put this question in the relevant ARAX issue, but there doesn't seem to be an obvious one (or maybe it is the very old issue #2048?), for the new xCRG, only this PR. Anyhow, here's my question. My understanding is that the new xCRG does not put any information into the ARAX UI's "Expansion Progress" screen. I also vaguely recall hearing that it was claimed (I don't recall in what context, sorry) that this is because xCRG doesn't use ARAX-expand. But, on looking at the ARAX code base, the function that updates the information in the ARAX UI's "Expansion Progress" screen is not per se in ARAX-expand (yes, the name of the screen could be understandably misconstrued to mean that it only works with ARAX-expand), but in the ARAX_response.py module's ARAXResponse class, specifically, the update_query_plan method. And since xCRG is clearly streaming results back to the ARAX UI, and since xCRG's creativeCRG.py module's creativeCRG class has an ARAXResponse object, as shown in its initializer here:

RTX/code/ARAX/ARAXQuery/Infer/scripts/creativeCRG.py

Line 239 in b31e001

def __init__(self, response: ARAXResponse, data_path: str):

I am wondering, why exactly can't xCRG update the ARAX UI's Expansion Progress screen? Can't it just call response.update_query_plan and specify the qedge_key for the query graph's edge that connects the "chemical" query node and the "gene" query node? Please forgive my ignorance. I am only asking because this has been (per my understanding) an interface contract that all ARAX modules, including xDTD, have abided by for a long time.
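
To make the suggestion concrete, here is a hedged sketch of what such calls might look like around an xCRG lookup. `StubResponse` is a stand-in written for this example so it can run on its own; the argument list `(qedge_key, kp, status, description)` and the status strings are assumptions that should be checked against the real update_query_plan method in ARAX_response.py. The `run_xcrg_with_progress` wrapper and the `lookup` callable are hypothetical.

```python
class StubResponse:
    """Stand-in for ARAXResponse; records updates instead of driving the UI."""

    def __init__(self):
        self.query_plan_updates = []

    def update_query_plan(self, qedge_key, kp, status, description):
        # Assumed signature; verify against ARAX_response.py before relying on it.
        self.query_plan_updates.append((qedge_key, kp, status, description))


def run_xcrg_with_progress(response, qedge_key, lookup):
    """Report progress for one query edge before and after calling lookup()."""
    response.update_query_plan(qedge_key, "xCRG", "Waiting", "Sending query to Retriever")
    try:
        n_results = lookup()
    except Exception as error:
        response.update_query_plan(qedge_key, "xCRG", "Error", f"Lookup failed: {error}")
        raise
    response.update_query_plan(qedge_key, "xCRG", "Done", f"Lookup returned {n_results} results")
    return n_results
```

For example, `run_xcrg_with_progress(StubResponse(), "e0", lambda: 41)` records a "Waiting" update followed by a "Done" update for the hypothetical edge key "e0".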

edeutsch · 2026-05-27T22:11:12Z

I would encourage Steve's suggestion on the query_plan updating.
Still, I'm thinking that it would be good to get the current functionality deployed first and work on this afterward, rather than holding up deployment, since Sarah seems eager for the new functionality.

saramsey · 2026-05-28T04:15:59Z

My understanding is that there is a blocking issue with the new xCRG code, in that it is not returning aux graphs. I may be mistaken, but that is the impression I am getting from the thread on Slack in #deployment.

hodgesf · 2026-05-28T22:30:08Z

Dr. Ramsey's comment above has been verified. Example 3 is failing on /test because there are TRAPI validation errors.

Integrated reusable xCRG package into ARAX

46d049d

venkataseshtej requested review from mohsenht and saramsey May 18, 2026 06:27

venkataseshtej self-assigned this May 18, 2026

dkoslicki requested changes May 18, 2026

Use ARAX maturity for xCRG Retriever URL

2e6acae

Add fast ARAX xCRG connect tests

8af5612

Remove legacy xCRG model database references

bd92deb

venkataseshtej added 2 commits May 18, 2026 17:40

Build CI Docker image from PR branch

9863dfa

Keep legacy xCRG config paths outside DB manager

e28eb98

venkataseshtej added 2 commits May 21, 2026 01:22

Updated xCRG package pin with updated filtering

5651634

Updated xCRG package pin for evidence node cleanup

368f918

venkataseshtej added 2 commits May 21, 2026 18:41

Added xCRG NGD publication support and pinned node metadata included

d76c52d

Resolve merge conflicts with master

9e5f571

Use Retriever tiers 0 and 1 for ARAX xCRG lookups

b31e001

saramsey mentioned this pull request May 27, 2026

What genes are upregulated by chemical X goes the wrong way or in some way is messed up #2048

Open
