fix: enforce max_workers in LLMMetadataExtractor.run_async #11248

Merged
bogdankostic merged 3 commits into deepset-ai:main from etairl:fix/llm-metadata-extractor-async-semaphore on May 12, 2026
Conversation

etairl (Contributor) commented on May 4, 2026

Summary

  • LLMMetadataExtractor.run_async acquires its asyncio.Semaphore once around the outer gather(...) instead of inside each task, so max_workers has no effect and every prompt in a batch fires its LLM call simultaneously.
  • The docstring on __init__ advertises max_workers as "the maximum number of requests that should be allowed to run concurrently when using the run_async method", so the current behavior silently breaks that contract and can blow up LLM-provider rate limits / connection pools on large batches.
  • The fix moves the async with sem: into a per-task wrapper coroutine so the limit is actually enforced, and adds a regression test that verifies peak in-flight calls stay <= max_workers.

Before

sem = Semaphore(max(1, self.max_workers))
async with sem:  # acquired once by the caller, not by the gathered tasks
    results = await gather(*[self._run_async(prompt) for prompt in all_prompts])

After

sem = Semaphore(max(1, self.max_workers))

async def _bounded_run(prompt: ChatMessage | None) -> dict[str, Any]:
    async with sem:  # each task must hold a slot for the duration of its LLM call
        return await self._run_async(prompt)

results = await gather(*[_bounded_run(prompt) for prompt in all_prompts])
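
To see why the original placement was a no-op: async with sem: around gather(...) is entered once by the calling coroutine, so the gathered tasks themselves never touch the semaphore. Below is a minimal, self-contained sketch contrasting the two placements; the fake LLM call and the peak counter are illustrative stand-ins, not code from this PR.

import asyncio

in_flight = 0
peak = 0

async def fake_llm_call() -> None:
    # Stand-in for self._run_async: tracks how many callers overlap.
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulated network latency
    in_flight -= 1

async def buggy(n: int, limit: int) -> int:
    # Semaphore held once around gather: the cap is never enforced.
    global in_flight, peak
    in_flight = peak = 0
    sem = asyncio.Semaphore(limit)
    async with sem:
        await asyncio.gather(*[fake_llm_call() for _ in range(n)])
    return peak

async def fixed(n: int, limit: int) -> int:
    # Semaphore acquired inside each task: peak stays <= limit.
    global in_flight, peak
    in_flight = peak = 0
    sem = asyncio.Semaphore(limit)

    async def bounded() -> None:
        async with sem:
            await fake_llm_call()

    await asyncio.gather(*[bounded() for _ in range(n)])
    return peak

print(asyncio.run(buggy(10, 3)))  # 10 -- every call in flight at once
print(asyncio.run(fixed(10, 3)))  # 3  -- cap enforced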

Test plan

  • hatch run test:unit -k test_llm_metadata_extractor passes (includes the new test_run_async_respects_max_workers; a sketch of its approach follows below).
  • CI green.
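
The regression test works by replacing the per-prompt LLM call with a coroutine that counts overlapping invocations, then asserting the observed peak never exceeds max_workers. A rough sketch of that idea, assuming pytest-asyncio and hypothetical make_extractor / make_documents helpers (the real test constructs LLMMetadataExtractor with a mocked generator):

import asyncio

import pytest

@pytest.mark.asyncio
async def test_run_async_respects_max_workers(monkeypatch):
    extractor = make_extractor(max_workers=2)  # hypothetical helper

    in_flight = 0
    peak = 0

    async def tracked_run_async(prompt):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # give other tasks a chance to pile up if unbounded
        in_flight -= 1
        return {}  # return shape simplified for illustration

    monkeypatch.setattr(extractor, "_run_async", tracked_run_async)
    await extractor.run_async(documents=make_documents(8))  # hypothetical helper
    assert peak <= 2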

The asyncio.Semaphore intended to bound concurrent LLM calls was acquired
once around the outer gather(...) call instead of inside each task, so
max_workers had no effect in run_async and all batched LLM requests fired
simultaneously. Move the semaphore acquisition into a per-task wrapper so
the documented concurrency cap is honored.
etairl requested a review from a team as a code owner, then from bogdankostic (May 4, 2026 17:13).
vercel bot commented on May 4, 2026

@etairl is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

github-actions bot added the topic:tests and type:documentation labels on May 4, 2026
bogdankostic (Contributor) left a comment
Thanks for the PR @etairl, it already looks good overall! Please just make sure to use double backticks for inline code in the release note.

etairl (Contributor, Author) commented on May 12, 2026

Thanks for reviewing. Fixed.

github-actions bot commented on May 12, 2026

Coverage report (generated by python-coverage-comment-action) covering haystack/components/extractors/llm_metadata_extractor.py.

bogdankostic (Contributor) left a comment
Thanks @etairl! :)

bogdankostic merged commit 50b2141 into deepset-ai:main on May 12, 2026. 21 of 22 checks passed.

Labels

topic:tests, type:documentation

3 participants