Use ConcurrentHashMap for DocumentContainer parser registries #285

Merged
garydgregory merged 1 commit into apache:master from rootvector2:documentcontainer-registry-race on Jun 12, 2026

Conversation

@rootvector2

Contributor

Thanks for your contribution to Apache Commons!

  • Read the contribution guidelines for this project.
  • Read the ASF Generative Tooling Guidance if you use Artificial Intelligence (AI).
  • I used AI to create any part of, or all of, this pull request. Which AI tool was used to create this pull request, and to what extent did it contribute?
  • Run a successful build using the default Maven goal with mvn; that's mvn on the command line by itself.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body.

DocumentContainer keeps two static registries, parsers and parserClasses, as plain HashMaps. registerXMLParser mutates them with put, and getParser (reached through getValue, then parseXML) mutates parsers with computeIfAbsent. None of this is synchronized, even though the rest of the library already uses ConcurrentHashMap for the same kind of registry (JXPathIntrospector, ValueUtils). When two threads parse a given model for the first time concurrently, or one thread registers a parser while another parses, they race on the shared map. I hit a ConcurrentModificationException thrown from HashMap.computeIfAbsent in getParser and, on other runs, a corrupted map that spun forever in resize/treeify inside registerXMLParser at 100% CPU. Switching both fields to ConcurrentHashMap brings them in line with the other registries and removes the race. The added test reproduces the problem: it fails on the current code with the ConcurrentModificationException and passes with the change.
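
For readers who want to see the pattern in isolation, here is a minimal sketch of a static registry backed by ConcurrentHashMap and hit by several threads at once. It is not the DocumentContainer code: the class ParserRegistrySketch, the Parser interface, the CREATED counter, and the hammer helper are all hypothetical names made up for illustration. Only the ConcurrentHashMap calls (put, computeIfAbsent) correspond to what the description above mentions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParserRegistrySketch {

    /** Hypothetical stand-in for a parser type; not the JXPath interface. */
    public interface Parser {
        String parse(String input);
    }

    // Shared static registry. ConcurrentHashMap makes put and
    // computeIfAbsent safe to call from many threads without extra locking.
    private static final Map<String, Parser> PARSERS = new ConcurrentHashMap<>();

    // Counts how many times the factory lambda actually ran.
    public static final AtomicInteger CREATED = new AtomicInteger();

    /** Explicit registration, analogous in spirit to a register method. */
    public static void registerParser(String model, Parser parser) {
        PARSERS.put(model, parser);
    }

    /** Lazy lookup: creates the parser on first use, at most once per key. */
    public static Parser getParser(String model) {
        return PARSERS.computeIfAbsent(model, m -> {
            CREATED.incrementAndGet();
            return input -> m + ":" + input;
        });
    }

    /** Starts all workers together so first-time lookups overlap. */
    public static int hammer(int threads, int models) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await();
                    for (int i = 0; i < models; i++) {
                        getParser("model-" + i);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return PARSERS.size();
    }

    public static void main(String[] args) {
        int size = hammer(8, 1000);
        System.out.println("registry size = " + size
                + ", parsers created = " + CREATED.get());
    }
}
```

With ConcurrentHashMap, computeIfAbsent applies the mapping function at most once per key, so each model ends up with exactly one parser however the threads interleave. Declaring PARSERS as a plain HashMap in the same sketch would make the outcome undefined under contention, which is the failure mode described above.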

The static parsers/parserClasses maps were plain HashMaps mutated by getParser and registerXMLParser without synchronization, so concurrent first-time parsing raced in computeIfAbsent/put. Match the ConcurrentHashMap convention used by the other registries.
@garydgregory garydgregory changed the title use ConcurrentHashMap for DocumentContainer parser registries Use a ConcurrentHashMap for DocumentContainer parser registries Jun 12, 2026
@garydgregory garydgregory changed the title Use a ConcurrentHashMap for DocumentContainer parser registries Use ConcurrentHashMap for DocumentContainer parser registries Jun 12, 2026
@garydgregory garydgregory merged commit e88baf6 into apache:master Jun 12, 2026
19 checks passed
@garydgregory

Member

@rootvector2
Merged, thank you.
