fix lang() to match whole language subtags, not any prefix #287

Open
rootvector2 wants to merge 1 commit into apache:master from rootvector2:lang-subtag-match

@rootvector2

Contributor

Thanks for your contribution to Apache Commons!

  • Read the contribution guidelines for this project.
  • Read the ASF Generative Tooling Guidance if you use Artificial Intelligence (AI).
  • I used AI to create any part of, or all of, this pull request. Which AI tool was used to create this pull request, and to what extent did it contribute?
  • Run a successful build using the default Maven goal with mvn; that's mvn on the command line by itself.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body.

NodePointer.isLanguage and its DOMNodePointer/JDOMNodePointer overrides test the language with String.startsWith, so lang() matches any leading character prefix of the node's xml:lang (or the locale name) instead of a whole subtag. I noticed this while reading the lang() predicate path: a node with xml:lang="fr" satisfies lang('f'), and a locale of en-US satisfies lang('e') and lang('en-'). All three must be false per XPath 1.0 section 4.3, under which lang() is true only when the argument equals the language tag, ignoring case, or equals the tag once a suffix starting with '-' is removed (so lang('en') still matches en-US). Because lang() is used in predicates to select nodes, the over-match changes which nodes a filter returns for caller-supplied xml:lang data. The fix requires either an exact case-insensitive match or a tag that starts with the argument followed by '-', implemented in one shared helper used by all three implementations. The added assertions fail on the current code (lang('f') and lang('e') return true) and pass with the change.

isLanguage matched with String.startsWith, so lang('f') matched xml:lang="fr" and lang('e') matched an en-US locale; XPath 1.0 section 4.3 requires the argument to equal the tag or a subtag delimited by '-'. Apply that rule in one NodePointer helper shared by the DOM and JDOM overrides.
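
For readers who want to see the rule in isolation, here is a minimal sketch of the matching logic described above. It is not the code from this pull request: the class name LangMatchSketch and the method matchesLang are hypothetical, and it assumes the language tag has already been obtained as a hyphen-delimited string such as "en-US".

```java
import java.util.Locale;

// Hypothetical illustration of the XPath 1.0 lang() rule; not the patch itself.
class LangMatchSketch {

    // Returns true when arg equals tag (ignoring case), or when tag starts
    // with arg followed by '-' (ignoring case). A bare character prefix
    // such as "f" against "fr" does not match.
    static boolean matchesLang(String tag, String arg) {
        if (tag == null || arg == null) {
            return false;
        }
        String t = tag.toLowerCase(Locale.ROOT);
        String a = arg.toLowerCase(Locale.ROOT);
        return t.equals(a) || t.startsWith(a + "-");
    }

    public static void main(String[] args) {
        System.out.println(matchesLang("fr", "f"));       // character prefix only
        System.out.println(matchesLang("en-US", "en-"));  // trailing '-' is not a subtag
        System.out.println(matchesLang("en-US", "en"));   // whole leading subtag
        System.out.println(matchesLang("EN-us", "en"));   // case is ignored
    }
}
```

Appending '-' to the argument before the prefix test is what rejects lang('en-'): the tag would have to start with "en--". Lower-casing with Locale.ROOT keeps the comparison independent of the default locale.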