Skip to content

fix(email): download all attachments instead of only the first#2079

Merged
chubes4 merged 2 commits into
mainfrom
fix/2078-email-multi-attachment
May 18, 2026
Merged

fix(email): download all attachments instead of only the first#2079
chubes4 merged 2 commits into
mainfrom
fix/2078-email-multi-attachment

Conversation

@chubes4
Copy link
Copy Markdown
Member

@chubes4 chubes4 commented May 18, 2026

Summary

Fixes #2078 — silent data loss when an IMAP message has more than one attachment.

The bug

FetchEmailAbility::fetchMessage() hard-coded \$attachments[0] when calling downloadAttachment(), so only the first attachment was ever written to disk. Every additional attachment was silently dropped:

  • No warning surfaced by the CLI
  • No ERROR or WARNING row in the logs
  • The returned metadata.attachment_count correctly reported the upstream count (2, 3, etc.) — making it look like everything worked

Reproducer (on extrachill.com production, against the email used to discover this):

wp datamachine email read 87801 --format=json
# metadata.attachment_count: 2

rm wp-content/uploads/datamachine-files/email-attachments/*
wp datamachine email fetch \
  --search='FROM "chrisgardner" SUBJECT "dani rucker"' \
  --max=5 --download-attachments
ls wp-content/uploads/datamachine-files/email-attachments/
# only 1 file written

The fix

  • Loop over all detected attachments instead of indexing [0].
  • Expose every successfully downloaded file under metadata.files so callers (CLI/JSON consumers, downstream abilities) can read past the first attachment.
  • Keep \$item['file_info'] pointing at the first attachment so the existing single-file_info-per-item contract enforced by FetchHandler::toDataPackets() continues to hold. Nothing in the generic pipeline contract changes.
  • Log a warning for each attachment whose download fails (empty body, decode failure, disk write failure) and surface the count via metadata.attachments_skipped so silent loss can no longer happen.
  • Include the resolved on-disk filename in each file_info struct so callers can distinguish multiple files inside metadata.files.

What did NOT change

  • The pipeline contract. FetchHandler::toDataPackets() still reads \$item['file_info'] as a single struct and produces one packet per item. Email continues to emit one item per message; the extra attachments simply become accessible via metadata.files instead of being lost.
  • Filename collision behavior. wp_unique_filename() already handled collisions correctly inside downloadAttachment(); the bug was never about collisions, it was about the call site only ever firing once.

Why one-item-per-email, not one-item-per-attachment

Each email has one subject, one body, and one logical content payload. Splitting into N items would duplicate the body text N times and break dedupe via item_identifier (which is the message-id). The N-attachments-on-one-item shape is consistent with how an email is modeled everywhere else in the system.

Test plan

On a live install with at least one multi-attachment email in the inbox:

# 1. Clear any prior downloads.
rm wp-content/uploads/datamachine-files/email-attachments/*

# 2. Fetch the multi-attachment email.
wp datamachine email fetch \\
  --search='FROM "<sender>" SUBJECT "<subject with multiple attachments>"' \\
  --max=1 --download-attachments --format=json

# 3. Verify metadata.files contains every attachment with distinct paths.
# 4. Verify the on-disk count matches metadata.attachment_count.
# 5. Verify metadata.attachments_skipped is 0 (or matches the count of any genuinely unreadable parts).

For a synthetic test, send yourself a 3-attachment email and walk through the same steps. All three files should land on disk; metadata.files should have length 3; file_info should equal metadata.files[0].

Logging changes

New WARNING entries get a context payload { uid, filename, part_number } so the operator can correlate them to the source email. Existing INFO/DEBUG output is unchanged.

Closes

Closes #2078

The IMAP fetch path silently dropped every attachment after the first
because downloadAttachment was hard-coded to index [0]. Email messages
with N>1 attachments lost (N-1) files with no warning logged.

Fix:
- Loop through every detected attachment and download each one.
- Expose all successfully downloaded files under metadata.files for
  callers that need access beyond the first.
- Keep file_info pointing at the first attachment so the existing
  FetchHandler pipeline contract (single file_info per item) is
  preserved.
- Log a warning for each attachment whose download fails (e.g. empty
  body, decode failure, disk write failure) and surface the count via
  metadata.attachments_skipped so silent loss is no longer possible.
- Include the on-disk filename in the file_info struct so callers can
  distinguish multiple files in metadata.files.
@homeboy-ci
Copy link
Copy Markdown
Contributor

homeboy-ci Bot commented May 18, 2026

Homeboy Results — data-machine

Lint

lint — failed

  • other — 2 finding(s)
  • Total: 2 finding(s)

ℹ️ Auto-fix: homeboy lint data-machine --path /home/runner/work/data-machine/data-machine --changed-since 26686af --fix (or homeboy refactor data-machine --path /home/runner/work/data-machine/data-machine --changed-since 26686af --from lint --write)
ℹ️ Some issues may require manual fixes
ℹ️ Full options: homeboy docs commands/lint
Deep dive: homeboy lint data-machine --changed-since 26686af

Test

test — passed

  • 915 passed
  • 3 skipped

ℹ️ Auto-fix lint issues: homeboy refactor data-machine --from lint --write
ℹ️ Collect coverage: homeboy test data-machine --coverage
ℹ️ Save test baseline: homeboy test data-machine --baseline
ℹ️ Pass args to test runner: homeboy test -- [args]
ℹ️ Full options: homeboy docs commands/test
Deep dive: homeboy test data-machine --changed-since 26686af

Audit

audit — passed

  • requested_detectors — 15 finding(s)
  • intra-method-duplication — 3 finding(s)
  • test_coverage — 1 finding(s)
  • Total: 19 finding(s)

Deep dive: homeboy audit data-machine --changed-since 26686af

Tooling versions
  • Homeboy CLI: homeboy 0.182.0+d8a02083
  • Extension: wordpress from https://github.com/Extra-Chill/homeboy-extensions
  • Extension revision: dd47f26a
  • Action: unknown@unknown

MultipleStatementAlignment warnings on the new variable declarations
introduced in the previous commit. Cosmetic; no behavior change.
@chubes4 chubes4 merged commit 8cdff5d into main May 18, 2026
4 of 5 checks passed
@chubes4 chubes4 deleted the fix/2078-email-multi-attachment branch May 18, 2026 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Email fetch with --download-attachments silently drops attachments when multiple exist

1 participant