Replace renderer's in-tree PDF parser with openpdf-core #1560

Open
andreasrosdalw wants to merge 6 commits into LibrePDF:master from andreasrosdalw:pdf-renderer-openpdf-core

@andreasrosdalw andreasrosdalw commented May 8, 2026

#1559 Replace renderer's in-tree PDF parser with openpdf-core

Wire openpdf-renderer onto openpdf-core (PdfReader / PdfContentParser) for all PDF parsing, and add OpenPdfCoreRenderer as the recommended entry point.

  • Add openpdf-core dependency to openpdf-renderer.
  • OpenPdfCoreRenderer: back getPageSize/Rotation, metadata, page text, decoded page content and content-stream operators with openpdf-core; add Path constructor, idempotent close, argument validation and closed-state checks.
  • Implement renderPage via OpenPdfCorePageRenderer, which walks the content stream with PdfContentParser and dispatches PDF operators (graphics state, paths, DeviceGray/RGB colors, text objects) to Java2D. Fix rotation=0 transform and clean up font/operand handling.
  • Deprecate the legacy in-tree parser: PDFFile, PDFPage, PDFParser and decode.PDFDecoder (kept for one release for migration).
  • Update README with migration guide, supported-operator matrix and new OpenPdfCoreRenderer examples; replace legacy quick-start snippets.
  • Tests: cover Path/file/byte[] constructors, metadata null-key, decoded page bytes, operator listing, and verify renderPage produces non-blank output; resolve fixture URL via toURI() for spaced paths.
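
As an illustration of the last bullet: a resource URL for a file under a directory with spaces carries `%20` escapes, and `URL.getPath()` keeps those escapes, whereas going through `toURI()` decodes them. A minimal sketch using only the JDK; the class name `FixturePaths` and the sample path are hypothetical and not taken from the test suite:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

final class FixturePaths {
    private FixturePaths() {
    }

    /** Converts a file: URL to a Path, decoding percent-escapes such as %20. */
    static Path toPath(URL url) throws URISyntaxException {
        return Paths.get(url.toURI());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical fixture location containing a space (escaped as %20 in the URL).
        URL url = URI.create("file:/tmp/my%20fixtures/sample.pdf").toURL();

        // The raw URL path still contains "%20", so opening it as a file name would fail.
        System.out.println(url.getPath());

        // Going through toURI() yields a path with a real space in it.
        System.out.println(toPath(url));
    }
}
```

In a test, the URL would typically come from `getClass().getResource(...)` rather than a literal; the conversion step is the same.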

Note that this is a "research" and innovation time project where I have spent time trying to improve OpenPDF using AI tools such as Claude and Copilot.

Your real name

Andreas Røsdal


codacy-production Bot commented May 8, 2026

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: complexity 149 · duplication 0


andreasrosdalw and others added 5 commits May 8, 2026 09:21
- README: convert setext H1 to ATX for consistent heading style throughout
- README: wrap long lines (105→≤100 chars, 284→≤120 chars)
- OpenPdfCorePageRenderer: expand single-line Javadoc to multi-line format
- OpenPdfCorePageRenderer: suppress PMD.SingularField on path-state fields
  (pathStartX/Y, pathCurX/Y must be instance fields to survive across
  consecutive dispatch() calls in the parsing loop)
- OpenPdfCoreRendererTest: rename methods to camelCase (no underscores)
  to satisfy JUnit 5 method naming rule [a-z][a-zA-Z0-9]*
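
To illustrate the `PMD.SingularField` suppression: each `dispatch()` call handles a single operator, so the subpath start and the current point have to live on the instance to be visible to the next call. A rough sketch, not the actual `OpenPdfCorePageRenderer` code; the class `PathStateSketch`, the `dispatch(String, double...)` signature and the accessor names are assumptions, and only `m`, `l` and `h` are handled:

```java
import java.awt.geom.Path2D;

final class PathStateSketch {
    private final Path2D.Double path = new Path2D.Double();

    // Instance fields on purpose: they carry state from one dispatch() call to the next.
    private double pathStartX;
    private double pathStartY;
    private double pathCurX;
    private double pathCurY;

    /** Handles one path-construction operator with its already-parsed operands. */
    void dispatch(String operator, double... operands) {
        switch (operator) {
            case "m": // begin a new subpath
                path.moveTo(operands[0], operands[1]);
                pathStartX = operands[0];
                pathStartY = operands[1];
                pathCurX = operands[0];
                pathCurY = operands[1];
                break;
            case "l": // straight line from the current point
                path.lineTo(operands[0], operands[1]);
                pathCurX = operands[0];
                pathCurY = operands[1];
                break;
            case "h": // close the subpath; the current point returns to its start
                path.closePath();
                pathCurX = pathStartX;
                pathCurY = pathStartY;
                break;
            default:
                // Operators outside this sketch are ignored.
                break;
        }
    }

    double currentX() {
        return pathCurX;
    }

    double currentY() {
        return pathCurY;
    }

    Path2D path() {
        return path;
    }
}
```

Calling `dispatch("m", 10, 20)`, then `dispatch("l", 30, 40)`, then `dispatch("h")` leaves the current point back at the subpath start, which only works because the fields outlive each call.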

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. q/Q now saves and restores the g2 AffineTransform alongside GState,
   so cm operators inside a q/Q pair are properly rolled back on Q.
   A separate ctmStack (Deque<AffineTransform>) mirrors stateStack.

2. decodeString no longer calls raw.getBytes() twice; the byte array
   is captured once in a local variable before passing to font.decode().

3. The catch(RuntimeException) in processContent now logs the skipped
   operator and exception at Level.FINE via java.util.logging, making
   malformed-PDF rendering failures diagnosable instead of silent.
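
Items 1 and 3 can be sketched together as follows. This is an illustration under assumptions rather than the code in this commit: `GraphicsStateSketch`, its simplified `GState` (line width only) and the `process(String, double...)` signature are hypothetical; the `Graphics2D`, `AffineTransform`, `Deque` and `java.util.logging` calls are standard JDK API.

```java
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.logging.Level;
import java.util.logging.Logger;

final class GraphicsStateSketch {
    private static final Logger LOG = Logger.getLogger(GraphicsStateSketch.class.getName());

    /** Stand-in for the renderer's graphics state; only line width is modelled here. */
    static final class GState {
        double lineWidth = 1.0;

        GState copy() {
            GState c = new GState();
            c.lineWidth = lineWidth;
            return c;
        }
    }

    private final Graphics2D g2;
    private final Deque<GState> stateStack = new ArrayDeque<>();
    private final Deque<AffineTransform> ctmStack = new ArrayDeque<>(); // mirrors stateStack
    private GState state = new GState();

    GraphicsStateSketch(Graphics2D g2) {
        this.g2 = g2;
    }

    /** Runs one operator; a failing operator is logged at FINE and skipped. */
    void process(String operator, double... operands) {
        try {
            dispatch(operator, operands);
        } catch (RuntimeException e) {
            LOG.log(Level.FINE, "Skipping operator " + operator, e);
        }
    }

    private void dispatch(String operator, double... o) {
        switch (operator) {
            case "q": // save graphics state and the current transform
                stateStack.push(state.copy());
                ctmStack.push(g2.getTransform());
                break;
            case "Q": // restore both; pop() throws on an unbalanced Q
                state = stateStack.pop();
                g2.setTransform(ctmStack.pop());
                break;
            case "cm": // concatenate the matrix [a b c d e f] onto the current transform
                g2.transform(new AffineTransform(o[0], o[1], o[2], o[3], o[4], o[5]));
                break;
            case "w": // set line width
                state.lineWidth = o[0];
                break;
            default:
                break;
        }
    }

    double lineWidth() {
        return state.lineWidth;
    }
}
```

With this shape, a `cm` issued between `q` and `Q` no longer leaks into later drawing, and an unbalanced `Q` or a `cm` with too few operands shows up in the log when the logger is set to `FINE` instead of disappearing silently.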

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes the one remaining Codacy CodeStyle minor issue (Checkstyle
NewlineAtEndOfFile / trailing empty lines) introduced in the previous
commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codacy/Checkstyle flagged README.md line 3 at 320 chars (max 120).
Split into four shorter lines; rendered Markdown is unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sonarqubecloud Bot commented May 8, 2026
