feat: Checkboxes and Radio Button Mapping for PDFs (GSoC) #407
Implemented PDF /Btn dictionary parsing in filler.py to extract the field's 'ON' appearance state and map truthy LLM outputs to it, instead of blindly appending strings. Also resolved the broken backend pipeline in main.py by initializing the base Controller instead of the removed Fill class.
Hi @marcvergees @juanalvv! I've resolved the merge conflicts in Makefile and src/main.py. The PR is now up to date with the latest main branch and ready for review/merge. It includes the fixes for #135 and #380, along with the PDF checkbox/radio mapping feature. Thanks!
marcvergees
left a comment
There's something you need to take care of. You've implemented the following: if the LLM output is something like "yes", "true", etc., and the fillable field is a radio button, then you activate it. But how do you know that the LLM is going to return something like this? There should be something in the pipeline and the dict that marks which fields are booleans. E.g. the fields dict is {"is awake": boolean}, and we tell the LLM to return a boolean, specifically True or False; that way, we can verify that the filled value is of that kind. Please fix that and we'll be able to test and merge it.
Per maintainer feedback in fireform-core#407:
- The fields dict now accepts Python types as values (e.g. {'is awake': bool})
- build_prompt() detects bool fields and explicitly instructs the LLM to return only the literal string 'True' or 'False', not fuzzy values
- add_response_to_json() strictly coerces LLM output to Python bool for bool fields, logging a warning if an unexpected value is returned
- filler.py now uses isinstance(answer, bool) instead of string matching so only a guaranteed Python True activates a checkbox/radio button
- Updated example in main.py to demonstrate the new typed fields dict
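The typed-fields idea in this commit message could look roughly like the sketch below. It is not the PR's code: bool_instruction and coerce_bool are hypothetical helper names, since the real signatures of build_prompt() and add_response_to_json() are not shown in this thread, and falling back to False on an unexpected value is an assumption (the commit only says a warning is logged).

```python
import logging

logger = logging.getLogger(__name__)


# Hypothetical helper: the prompt fragment build_prompt() might add for bool fields.
def bool_instruction(field_name: str) -> str:
    return (
        f"For the field '{field_name}', answer with exactly one word: "
        "True or False. Do not add any other text."
    )


# Hypothetical helper: the coercion add_response_to_json() might apply.
def coerce_bool(field_name: str, raw: str) -> bool:
    text = raw.strip()
    if text == "True":
        return True
    if text == "False":
        return False
    # Assumption: unexpected values are logged and treated as False.
    logger.warning("Unexpected boolean value %r for field %r", raw, field_name)
    return False


# Typed fields dict, as in the commit message example.
fields = {"is awake": bool, "name": str}

# Only fields declared as bool get the strict instruction.
prompts = {name: bool_instruction(name) for name, kind in fields.items() if kind is bool}
```

With this shape, a fuzzy answer such as "yes" never reaches the filler as a truthy value; only the exact literal 'True' becomes a Python True.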
Hi @marcvergees! Thanks for the clear feedback. I've addressed your concern in the latest commit ( The fields dict now carries Python type annotations (e.g.
Hi @marcvergees! I've updated the PR. I pulled in the latest changes from main to resolve the merge conflicts and dropped my old redundant fixes for #135 and #380 (since you guys already took care of those!). The PR is now strictly focused on the GSoC Checkbox and Radio Button mapping feature, and it incorporates your feedback regarding strict boolean typing. It's fully up to date and ready for you to take another look whenever you have a moment. Thanks!
This PR implements Checkboxes and Radio Button Mapping (Deliverable 1 from my GSoC Proposal).
Advanced PDF Form Mapping (src/filler.py)
- Previously, filler.py indiscriminately assigned the LLM string output to the Value (/V) property of all Widget types. This breaks when a PDF contains checkboxes or radio buttons (Field Type /Btn).
- When processing sorted_annots, the filler now:
  - Detects button fields (/Btn).
  - Recognizes truthy LLM outputs (yes, true, 1, x).
  - Reads the ON state identifier directly from the PDF Appearance Dictionary (annot.AP.N).
  - Sets the Value (/V) and Appearance State (/AS) to accurately check the box on the final PDF layout, while assigning /Off when falsy.

Strict Boolean Typing (per maintainer feedback)
- fields dict now accepts Python types as values (e.g. {"is awake": bool})
- build_prompt() detects bool fields and explicitly instructs the LLM to return only the literal string 'True' or 'False'.
- add_response_to_json() strictly coerces LLM output to Python bool for boolean fields.
- filler.py uses isinstance(answer, bool) to activate a checkbox/radio button.

(Note: The previous redundant fixes for #135 and #380 were dropped via a merge with upstream/main)
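The button-field handling described under Advanced PDF Form Mapping can be illustrated with a minimal sketch. It uses plain dicts as a stand-in for PDF annotation objects, so it runs without any PDF library; on_state and fill_annotation are hypothetical names, and the "/Yes" fallback is an assumption rather than something the PR states. Real PDF libraries expose their own dictionary and name objects, and radio groups often keep /V on a parent field, so the actual code in filler.py will differ.

```python
from typing import Any, Dict

Annot = Dict[str, Any]  # plain-dict stand-in for a PDF widget annotation


def on_state(annot: Annot) -> str:
    """Return the first key of the normal appearance dict (/AP /N) that is not /Off."""
    normal = annot.get("/AP", {}).get("/N", {})
    for state in normal:
        if state != "/Off":
            return state
    return "/Yes"  # assumed fallback; not every PDF uses this name


def fill_annotation(annot: Annot, answer: Any) -> None:
    """Write answer into the annotation, treating /Btn fields specially."""
    if annot.get("/FT") == "/Btn":
        # Only a real Python True checks the box, per the strict typing rule.
        checked = isinstance(answer, bool) and answer
        state = on_state(annot) if checked else "/Off"
        annot["/V"] = state   # field value
        annot["/AS"] = state  # appearance state shown by the viewer
    else:
        # Non-button fields keep the original behaviour: store the text.
        annot["/V"] = str(answer)


# A checkbox whose ON state is named /On1 rather than the common /Yes.
checkbox = {"/FT": "/Btn", "/AP": {"/N": {"/Off": None, "/On1": None}}}
fill_annotation(checkbox, True)
```

Reading the ON name from /AP /N matters because the name is chosen by whoever authored the form; hard-coding one value would leave boxes unchecked in PDFs that use a different name.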