Skip to content

Revert pr 2025#2041

Open
gmechali wants to merge 2 commits into
datacommonsorg:masterfrom
gmechali:revert-pr-2025
Open

Revert pr 2025#2041
gmechali wants to merge 2 commits into
datacommonsorg:masterfrom
gmechali:revert-pr-2025

Conversation

@gmechali
Copy link
Copy Markdown
Contributor

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request simplifies SQL query generation in aggregation_utils.py by removing single-quote escaping on import_names and hardcoding the 'dc/base/' prefix and 'dc/base/GeneratedGraphs' provenance. The reviewer points out that these changes introduce critical SQL injection vulnerabilities because user-controlled input is no longer escaped. Additionally, hardcoding the base prefix breaks functionality for non-base Data Commons instances where self.is_base_dc is False.

safe_names = [name.replace("'", "''") for name in import_names]
prefix = "dc/base/" if self.is_base_dc else ""
provenances = [f"'{prefix}{name}'" for name in safe_names]
provenances = [f"'dc/base/{name}'" for name in import_names]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

Reverting this change introduces a SQL injection vulnerability because single quotes in import_names (which are sourced from user-controlled HTTP request payloads) are no longer escaped. Additionally, hardcoding the 'dc/base/' prefix breaks functionality when self.is_base_dc is False.

        safe_names = [name.replace("'", "''") for name in import_names]
        prefix = "dc/base/" if self.is_base_dc else ""
        provenances = [f"'{prefix}{name}'" for name in safe_names]
        gen_graphs_prov = 'dc/base/GeneratedGraphs' if self.is_base_dc else 'GeneratedGraphs'

safe_names = [name.replace("'", "''") for name in import_names]
prefix = "dc/base/" if self.is_base_dc else ""
provenances = [f"'{prefix}{name}'" for name in safe_names]
provenances = [f"'dc/base/{name}'" for name in import_names]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

Reverting this change introduces a SQL injection vulnerability because single quotes in import_names are no longer escaped. Additionally, hardcoding the 'dc/base/' prefix breaks functionality when self.is_base_dc is False.

        safe_names = [name.replace("'", "''") for name in import_names]
        prefix = "dc/base/" if self.is_base_dc else ""
        provenances = [f"'{prefix}{name}'" for name in safe_names]
        gen_graphs_prov = 'dc/base/GeneratedGraphs' if self.is_base_dc else 'GeneratedGraphs'

safe_names = [name.replace("'", "''") for name in import_names]
prefix = "dc/base/" if self.is_base_dc else ""
provenances = [f"'{prefix}{name}'" for name in safe_names]
provenances = [f"'dc/base/{name}'" for name in import_names]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

Reverting this change introduces a SQL injection vulnerability because single quotes in import_names are no longer escaped. Additionally, hardcoding the 'dc/base/' prefix breaks functionality when self.is_base_dc is False.

        safe_names = [name.replace("'", "''") for name in import_names]
        prefix = "dc/base/" if self.is_base_dc else ""
        provenances = [f"'{prefix}{name}'" for name in safe_names]
        gen_graphs_prov = 'dc/base/GeneratedGraphs' if self.is_base_dc else 'GeneratedGraphs'

# Format import names for the SQL IN clause
imports_str = ", ".join([f"'{name}'" for name in safe_names])
provenance_dcid_expr = "CONCAT('dc/base/', raw.import_name)" if self.is_base_dc else "raw.import_name"
imports_str = ", ".join([f"'{name}'" for name in import_names])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

Reverting this change introduces a SQL injection vulnerability because single quotes in import_names are no longer escaped. Additionally, hardcoding the 'dc/base/' prefix breaks functionality when self.is_base_dc is False.

        safe_names = [name.replace("'", "''") for name in import_names]
        imports_str = ", ".join([f"'{name}'" for name in safe_names])
        provenance_dcid_expr = "CONCAT('dc/base/', raw.import_name)" if self.is_base_dc else "raw.import_name"

'linkedContainedInPlace' as predicate,
ancestor_place as object_id,
'{gen_graphs_prov}' as provenance
'dc/base/GeneratedGraphs' as provenance
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Hardcoding 'dc/base/GeneratedGraphs' breaks support for non-base Data Commons instances where self.is_base_dc is False. Use the dynamic gen_graphs_prov variable instead.

Suggested change
'dc/base/GeneratedGraphs' as provenance
'{gen_graphs_prov}' as provenance

'linkedMemberOf' as predicate,
ancestor as object_id,
'{gen_graphs_prov}' as provenance
'dc/base/GeneratedGraphs' as provenance
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Hardcoding 'dc/base/GeneratedGraphs' breaks support for non-base Data Commons instances where self.is_base_dc is False. Use the dynamic gen_graphs_prov variable instead.

Suggested change
'dc/base/GeneratedGraphs' as provenance
'{gen_graphs_prov}' as provenance

'linkedMember' as predicate,
subject_id as object_id,
'{gen_graphs_prov}' as provenance
'dc/base/GeneratedGraphs' as provenance
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Hardcoding 'dc/base/GeneratedGraphs' breaks support for non-base Data Commons instances where self.is_base_dc is False. Use the dynamic gen_graphs_prov variable instead.

Suggested change
'dc/base/GeneratedGraphs' as provenance
'{gen_graphs_prov}' as provenance

JSON_VALUE(v, '$.key') as date_val,
SAFE_CAST(JSON_VALUE(v, '$.value') AS FLOAT64) as value_num,
{provenance_dcid_expr} as provenance_dcid,
CONCAT('dc/base/', raw.import_name) as provenance_dcid,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Hardcoding 'dc/base/' prefix breaks support for non-base Data Commons instances where self.is_base_dc is False. Use the dynamic provenance_dcid_expr variable instead.

Suggested change
CONCAT('dc/base/', raw.import_name) as provenance_dcid,
{provenance_dcid_expr} as provenance_dcid,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant