Skip to content

fix: preserve case-sensitive URL parts#2017

Closed
roian6 wants to merge 1 commit into
apify:masterfrom
roian6:fix/normalize-url-case-sensitive-2008
Closed

fix: preserve case-sensitive URL parts#2017
roian6 wants to merge 1 commit into
apify:masterfrom
roian6:fix/normalize-url-case-sensitive-2008

Conversation

@roian6

@roian6 roian6 commented Jul 3, 2026

Copy link
Copy Markdown

Description

  • Preserve path, query-name, query-value, and fragment casing in normalize_url while still relying on yarl to canonicalize scheme/host casing.
  • Update the normalization docstring so it no longer implies all URL casing is trivial.
  • Add regressions showing default unique keys no longer collide for case-distinct paths or query components.

Issues

Testing

  • uv run pytest tests/unit/_utils/test_requests.py -q
  • uv run poe lint
  • uv run poe type-check
  • git diff --check

I also attempted uv run poe unit-tests. After installing Playwright browsers, the suite reached 2152 passed, 6 skipped but still failed in tests/unit/browsers/test_stagehand_browser_plugin.py because the local environment lacks the Stagehand SEA binary (stagehand-linux-arm64). That failure appears unrelated to this URL normalization change.

Checklist

  • CI passed

Comment thread tests/unit/_utils/test_requests.py Outdated
assert output == expected_output


def test_compute_unique_key_preserves_case_sensitive_path_and_query() -> None:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please write it as a parametrized test with 1 assert and 3 sets of parameters and ids explaining a specific test case (path_key, query_value_key, query_name_key)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Updated the regression to a parametrized test with one assert and the requested ids: path_key, query_value_key, and query_name_key.

I also rebased onto the current origin/master and re-ran:

  • uv run pytest tests/unit/_utils/test_requests.py -q (20 passed)
  • uv run poe lint
  • uv run poe type-check
  • git diff origin/master --check

The Semgrep PR check is passing on the updated head 7108d4ba45093e60a1166676ec84e65a1cc654cb as well.

@roian6 roian6 force-pushed the fix/normalize-url-case-sensitive-2008 branch from a3080be to 7108d4b Compare July 3, 2026 10:13
@vdusek

vdusek commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator

Hi @roian6, thanks for the PR. However, this issue is currently milestoned for v2 because the fix would introduce breaking changes to how requests are represented in storages.

Because of that, this needs to be handled carefully as part of a major version update, along with an appropriate upgrade guide. I'm going to close this for now.

@vdusek vdusek closed this Jul 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

normalize_url lowercases the entire URL, silently deduplicating case-distinct pages

4 participants