Skip to content

Update differ to ouptut MCF files#1998

Open
vish-cs wants to merge 1 commit into
datacommonsorg:masterfrom
vish-cs:differ
Open

Update differ to ouptut MCF files#1998
vish-cs wants to merge 1 commit into
datacommonsorg:masterfrom
vish-cs:differ

Conversation

@vish-cs
Copy link
Copy Markdown
Contributor

@vish-cs vish-cs commented May 11, 2026

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the import_differ tool to generate diffs in MCF format and a consolidated JSON summary, replacing the previous CSV-based outputs. It introduces a direct runner mode for executing a Java-based differ via subprocess and updates the validation logic to consume these new formats. Feedback from the review highlights a mismatch in the glob pattern used to locate MCF diff files, a regression in defensive error handling during JSON parsing, and confusing logic regarding the runner_mode flag mapping where the local mode triggers the Java runner instead of the native Python implementation.

Comment thread tools/import_validation/runner.py Outdated
Comment thread tools/import_validation/runner.py Outdated
Comment thread tools/import_differ/import_differ.py Outdated
@vish-cs vish-cs requested a review from ajaits May 11, 2026 11:39
@vish-cs vish-cs force-pushed the differ branch 3 times, most recently from 866f95e to aca06b8 Compare May 14, 2026 10:05
Comment thread tools/import_differ/differ_utils.py
Comment thread tools/import_differ/import_differ.py
for diff_type in [
Diff.ADDED.name, Diff.DELETED.name, Diff.MODIFIED.name
]:
df_type = diff_df[diff_df[Column.diff_type.name] == diff_type]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the obs nodes are being filtered separately by type, can we write them into separate mcf files too?
That may help with other analysis and also skip the diffType property in the node.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to create 3 separate files nodes-added.mcf, nodes-deleted.mcf, and nodes-modified.mcf

Comment thread tools/import_differ/import_differ.py
Comment thread tools/import_differ/import_differ.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants