Releases: thiswillbeyourgithub/wdoc
Release 5.1.3
What's new
PLACEHOLDER
Commits details since the last release
- [54b243d] by @thiswillbeyourgithub, 46 seconds ago:
bump version 5.1.2 -> 5.1.3
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [6cea6b0] by @thiswillbeyourgithub, 2 hours ago:
feat: fall back to WHISPER_API_KEY for whisper transcription
When neither WDOC_WHISPER_API_KEY nor OPENAI_API_KEY is set, resolve the
whisper API key from a WHISPER_API_KEY environment variable if present.
Resolution order: WDOC_WHISPER_API_KEY > OPENAI_API_KEY > WHISPER_API_KEY.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
wdoc-skill/REFERENCE.md
wdoc/docs/help.md
wdoc/utils/loaders/shared_audio.py
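The resolution order described in this commit can be sketched as follows. This is an illustrative sketch, not the code in shared_audio.py: the function name `resolve_whisper_api_key` is hypothetical, and only the three environment variable names and their order come from the commit message.

```python
import os
from typing import Mapping, Optional

# Order taken from the commit message: first non-empty variable wins.
WHISPER_KEY_ORDER = ("WDOC_WHISPER_API_KEY", "OPENAI_API_KEY", "WHISPER_API_KEY")


def resolve_whisper_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty key in resolution order, or None (hypothetical helper)."""
    for name in WHISPER_KEY_ORDER:
        value = env.get(name)
        if value:
            return value
    return None
```

Passing a plain dict instead of `os.environ` makes the fallback easy to check in isolation.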
- [79cacbd] by @thiswillbeyourgithub, 2 hours ago:
bump version 5.1.1 -> 5.1.2
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [f2d088a] by @thiswillbeyourgithub, 2 hours ago:
pin some audio libraries
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [8c25359] by @thiswillbeyourgithub, 2 hours ago:
fix: playwright.sync
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
wdoc/utils/loaders/online_media.py
Release 5.1.1
What's new
PLACEHOLDER
Commits details since the last release
- [6feb5f1] by @thiswillbeyourgithub, 32 minutes ago:
bump version 5.1.0 -> 5.1.1
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [61c040a] by @thiswillbeyourgithub, 46 minutes ago:
fix: unstructured version is now incompatible with python 3.13
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [ed81e2a] by @thiswillbeyourgithub, 7 days ago:
docs: restructure SKILL.md into the wdoc-skill/ skill directory
Split the monolithic root SKILL.md into a Claude Code skill directory
following the write-a-skill conventions:
- wdoc-skill/SKILL.md: concise orientation with frontmatter + trigger
description, quick start, the four tasks, and core mechanics (73 lines)
- wdoc-skill/REFERENCE.md: full CLI args, env vars, filetypes, loader
options, and the complete Python API tables
- wdoc-skill/EXAMPLES.md: copy-pasteable shell and Python recipes
Polished wording, added the skill frontmatter, and removed em-dashes from
the reference content. Updated every reference to the old root SKILL.md
(bump_default_models.sh, README.md, CLAUDE.md, ARCHITECTURE.md) and the
Claude Code install instructions to the new three-file layout.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
ARCHITECTURE.md
CLAUDE.md
README.md
bump_default_models.sh
wdoc-skill/EXAMPLES.md
wdoc-skill/REFERENCE.md
wdoc-skill/SKILL.md
- [af68f46] by @thiswillbeyourgithub, 9 days ago:
docs: note how self-contained adding a new filetype is
Add a compact lead-in to CLAUDE.md's "Adding Support for a New Filetype"
section describing the two routes (standalone loader vs recursive fan-out)
and citing the recent zotero/karakeep types, plus a short note in SKILL.md
after the recursive filetypes table pointing contributors at CLAUDE.md.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
CLAUDE.md
SKILL.md
- [a5473f4] by @thiswillbeyourgithub, 9 days ago:
fix(summarize): dump LLM output on empty content, fix recursion dedup
Follow-up to the previous empty-summary fix, addressing review feedback.
- Empty message content: instead of retrying (which just reproduces the
same blank when the cause is structural), raise a ValueError that dumps
every attribute of the LLM generations and llm_output. For some
reasoning models litellm leaves the message `content` empty and puts the
answer in `additional_kwargs['reasoning_content']`, so the dump reveals
when we are simply reading the wrong field
(generations[0].text == message.content).
- Recursion dedup bug: `if summary_text not in recursive_summaries` tested
membership against the dict's integer level keys, so it was always True
and forced recursion to stop after the first pass. Now compares against
recursive_summaries.values(), which also makes the "identical summary"
warning accurate.
- doc_reading_length: default to 0.0 (float) instead of 0 (int) when a
doc has no `doc_reading_time`, otherwise wdocSummary construction crashes
under beartype (WDOC_TYPECHECKING=crash).
Tests: the empty-content test now asserts the generation dump is included;
adds a recursion test proving summarization continues past the first pass
(fails as 2 != 3 with the old predicate).
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
tests/test_wdoc.py
wdoc/utils/tasks/summarize.py
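The recursion dedup bug above comes down to how `in` behaves on a dict. A minimal sketch, assuming a dict that maps integer recursion levels to summary strings as the commit describes (the two helper names are hypothetical):

```python
# Recursion level -> summary text, as described in the commit message.
recursive_summaries = {0: "summary after pass one", 1: "summary after pass two"}


def is_new_summary_buggy(summary_text: str, summaries: dict) -> bool:
    # `in` on a dict tests the keys (integers here), so a string never matches.
    return summary_text not in summaries


def is_new_summary_fixed(summary_text: str, summaries: dict) -> bool:
    # Comparing against the values checks the stored summary texts.
    return summary_text not in summaries.values()
```

With the buggy predicate, a summary identical to an earlier pass is still reported as new, which is the always-True behaviour the commit fixes.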
- [7eb0a61] by @thiswillbeyourgithub, 9 days ago:
fix(summarize): retry then raise on empty LLM completion
A reasoning model can return an empty completion (all of its budget spent
on reasoning tokens) with finish_reason "stop". `_summarize` previously
turned that into a silently empty summary, which only surfaced much later
as a confusing `assert 'monkey' in ''` in
test_summary_tim_urban_default_model.
Now each chunk generation retries once while bypassing the cache on an
empty completion, and raises an explicit ValueError if it is still empty.
Token usage from every attempt is accumulated so the cost accounting
stays correct.
Adds a basic regression test that drives _summarize with a fake LLM
returning empty content and asserts the explicit error is raised.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
tests/test_wdoc.py
wdoc/utils/tasks/summarize.py
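The retry-then-raise behaviour of this commit could look roughly like the sketch below. It is not the actual `_summarize` code: `generate_non_empty` and the `call_llm(bypass_cache)` callable, which returns a `(text, tokens_used)` pair, are assumptions made for illustration.

```python
from typing import Callable, Tuple


def generate_non_empty(
    call_llm: Callable[[bool], Tuple[str, int]],
    max_attempts: int = 2,
) -> Tuple[str, int]:
    """Retry once on an empty completion, bypassing the cache (hypothetical helper).

    Token usage from every attempt is summed so cost accounting stays correct.
    """
    total_tokens = 0
    for attempt in range(max_attempts):
        bypass_cache = attempt > 0  # only retries skip the cache
        text, tokens = call_llm(bypass_cache)
        total_tokens += tokens
        if text.strip():
            return text, total_tokens
    raise ValueError(
        f"LLM returned an empty completion after {max_attempts} attempts "
        f"({total_tokens} tokens used)"
    )
```

A fake `call_llm` that always returns an empty string is enough to drive the error path in a regression test.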
- [10c3b40] by @thiswillbeyourgithub, 9 days ago:
docs: drop em-dashes introduced in the SKILL.md sync edits
Replace the em-dashes I added (CLAUDE.md SKILL.md bullet, SKILL.md
install comments) with colons, per the project's no-em-dash convention.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
CLAUDE.md
SKILL.md
- [85289a9] by @thiswillbeyourgithub, 9 days ago:
docs(skill): document the citation_url_template argument
It was a real but undocumented CLI/constructor argument (turns page
citations into clickable links in summaries). Add it to the Summary
Arguments table and the Python constructor signature.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
SKILL.md
- [8db96a1] by @thiswillbeyourgithub, 9 days ago:
docs(skill): document youtube subtitle changes and kebab-case flags
Reflect changes since the reference was written: --youtube_language now
auto-detects the original-language (-orig) subtitle track when unset,
--yt_* flags are rewritten to --youtube_*, and CLI flags accept
kebab-case (dashes normalized to underscores).
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
SKILL.md
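The subtitle auto-detection mentioned above can be sketched as a small selection function. Everything here is an assumption for illustration: the function name, the shape of `available` (a list of track codes), and the `en`/`en-US` fallbacks, which are taken from the 5.1.0 feature list rather than from the loader code.

```python
from typing import Optional, Sequence


def pick_subtitle_language(
    available: Sequence[str],
    requested: Optional[str] = None,
    fallbacks: Sequence[str] = ("en", "en-US"),
) -> Optional[str]:
    """Pick a subtitle track code (hypothetical helper, not wdoc's API)."""
    if requested:
        # An explicit --youtube_language always wins.
        return requested
    for code in available:
        # Assumption: the original-language track carries an "-orig" suffix.
        if code.endswith("-orig"):
            return code
    for code in fallbacks:
        if code in available:
            return code
    return None
```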
- [65b619c] by @thiswillbeyourgithub, 9 days ago:
docs(skill): document the zotero and karakeep filetypes
Both were added since the reference was last written. Add them to the
Recursive Filetypes table, document their loader-specific arguments and
--path selector syntax in the DocDict section, and add shell examples.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
SKILL.md
- [f1b2693] by @thiswillbeyourgithub, 9 days ago:
docs(skill): bump reference to v5.1.0 and list modular install extras
The reference was pinned to v5.0.0; the repo is now v5.1.0. Also expand
the Installation section to document the modular extras introduced when
install_requires was split (youtube/audio/anki/office/logseq/zotero/
karakeep/...), since the base package only ships pdf + url loaders.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
SKILL.md
- [36db392] by @thiswillbeyourgithub, 9 days ago:
docs(claude): require keeping SKILL.md in sync with user-facing changes
SKILL.md is a hand-maintained comprehensive reference that drifts unless
updated by hand. Add it to the "Adding New Settings" and "Adding Support
for a New Filetype" checklists and add a dedicated "Keeping SKILL.md in
Sync" section so future changes update it.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
CLAUDE.md
- [43e7e8b] by @thiswillbeyourgithub, 10 days ago:
test(run_all_tests): validate MISTRAL_API_KEY before the suite runs
test_mistral_embeddings hardcoded "mistral/mistral-embed", so the
existing_check_provider_key "mistral/" MISTRAL_API_KEYguard in
run_all_tests.sh never matched anything and the key was never checked
up-front: a missing MISTRAL_API_KEY only surfaced late, mid-suite.
Make the embed model overridable via WDOC_TEST_MISTRAL_EMBED_MODEL
(default mistral/mistral-embed, full id incl. provider) and add it to
_ALL_TEST_MODELS so the guard now matches and crashes early.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
tests/run_all_tests.sh
tests/test_wdoc.py
- [1d4ea7a] by @thiswillbeyourgithub, 10 days ago:
test(karakeep): pad the roundtrip bookmark text past the reading-length guard
A bare marker string is too short and trips wdoc's 'total reading length
is suspiciously low' assertion (> 0.1 min). Repeat a deterministic
sentence around the unique marker so the loaded doc clears the guard
while the assertions still match on the marker.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
tests/test_karakeep.py
- [14cb323] by @thiswillbeyourgithub, 10 days ago:
Merge branch 'karakeep_source' into dev
- [3ce4a93] by @thiswillbeyourgithub, 11 days ago:
test(karakeep): make the api test self-contained via a bookmark lifecycle
Instead of depending on a pre-existing library and a KARAKEEP_TEST_SELECTOR env
var, the api test now creates its own temporary text bookmark (deterministic,
unlike a link whose html is crawled asynchronously), loads it through wdoc's
loader, asserts the round-trip, then deletes it (mirrors karakeep_python_api's
own create/delete lifecycle tests). It also asserts the live bookmark carries
the structural keys the FakeKarakeep fixtures encode, so the fast basic tests
cannot silently drift from the real schema. Drop the now-unneeded
KARAKEEP_TEST_SELECTOR guard from run_all_tests.sh.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
tests/run_all_tests.sh
tests/test_karakeep.py
- [b3add40] by @thiswillbeyourgithub, 11 days ago:
docs(karakeep): document the karakeep filetype
Add the karakeep filetype, its --path selector syntax and karakeep_* args to
help.md, usage examples to examples.md, and short mentions to README.md and
ARCHITECTURE.md. Built with the help of Claude Code.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
ARCHITECTURE.md
README.md
wdoc/docs/examples.md
wdoc/docs/help.md
- [8ac80fe] by @thiswillbeyourgithub, 11 days ago:
test(karakeep): guard api creds and run karakeep suites in run_all_tests
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
tests/run_all_tests.sh
- [5de76cb] by @thiswillbeyourgithub, 11 days ago:
test(karakeep): cover selector parsing, fan-out, resilience and caching
A fake Karakeep client drives the basic tests (no network): selector parsing,
link/text/asset fan-out, the native-source skip path, the loading_failure=warn
resilience guarantee, and the doc_loaders_cache hit. Adds an api-gated real
instance test driven by KARAKEEP_TEST_SELECTOR.
Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
tests/test_karakeep.py
- [02df505] by @thiswillbeyourgithub, 11 day...
Release 5.1.0
What's new
This release focuses on modular installation extras, CLI robustness improvements, and a sweep of bug fixes across summarization, logging, and setup.
✨ Features
- CLI: Accept kebab-case flags (`--foo-bar` → `--foo_bar`) automatically ([e9bfb80])
- CLI: Warn on every `sys.argv` mutation via `ArgvState` ([4fe38f2]); accept `--yt_*` as shorthand for `--youtube_doc_*` ([f800805])
- YouTube: Auto-detect original-language subtitle track (`-orig`), falling back to `en`/`en-US` ([0753b00])
- Prompts: Skip per-bullet citations when only one source; mention it once at the top instead ([c0097b7])
  - Exception: for YouTube/timecoded sources, use per-bullet timecodes (e.g. `[02:17:33]`) ([c0779a1])
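The kebab-case normalization in the first bullet can be illustrated with a short sketch. It is not the `ArgvState` implementation; `normalize_flags` is a hypothetical name, and the choice to leave values after `=` untouched is an assumption.

```python
from typing import List, Sequence


def normalize_flags(argv: Sequence[str]) -> List[str]:
    """Rewrite --foo-bar to --foo_bar, leaving values untouched (hypothetical helper)."""
    normalized = []
    for token in argv:
        if token.startswith("--") and len(token) > 2:
            # Only the flag name is rewritten; anything after "=" is a value.
            name, sep, value = token[2:].partition("=")
            token = "--" + name.replace("-", "_") + sep + value
        normalized.append(token)
    return normalized
```

For example, `normalize_flags(["--foo-bar", "a-b"])` rewrites the flag but keeps the dash inside the value `a-b`.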
🐛 Fixes
- Summarize: Strip `*DEEP BREATH*`-style LLM intro artifacts from all top-level bullets, not just the first ([dd09942], [c837143])
- Summarize: Fix model name in output summary ([e253d45])
- Logger: Actually remove the default DEBUG stderr handler instead of stacking a second sink on top of it ([2f3b295])
- Env: Match `--debug`/`--verbose` by exact argv token, not substring, preventing false positives from argument values ([e183729])
- Loaders: Better check for empty documents ([5ebbbd2], [586bc5c])
- YouTube: Add troubleshooting instructions on failed extraction ([397d133]); fix default language handling ([29207e8])
- Audio: Fix `WDOC_WHISPER_API_KEY` handling when `OPENAI_API_KEY` is unset ([23b6b1f])
- Setup: Guard `openparse-download` behind an `import openparse` probe ([f4445b4]); scope yt-dlp pre-release upgrade to `[youtube]` users only ([9242566])
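The difference between substring and exact-token matching in the Env fix can be shown in a few lines. Both helper names are hypothetical; the sketch only contrasts the two strategies and is not the code in wdoc's env module.

```python
from typing import Sequence


def flag_present_substring(flag: str, argv: Sequence[str]) -> bool:
    # Substring search over the joined command line: also matches argument values.
    return flag in " ".join(argv)


def flag_present_exact(flag: str, argv: Sequence[str]) -> bool:
    # Membership on the argv list compares whole tokens only.
    return flag in argv
```

A query string that merely mentions `--debug` triggers the substring version but not the exact-token one.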
♻️ Refactors
- Setup: Split `install_requires` into modular extras `[youtube]`, `[audio]`, `[anki]`, `[office]`, `[logseq]`, `[full]` ([7bb4744])
- CLI: Centralize all `sys.argv` mutations in `ArgvState` helper class ([1098157], [a22f56d])
- Logger: Move handler setup out of import side-effects into `setup_cli_logging()`, called only from `__main__.py` ([203ab6f])
🧪 Tests
- Cover `ArgvState` helpers with unit tests ([6bc897a], [a22f56d])
- Move API-key precheck from `test_wdoc.py` to `run_all_tests.sh` for faster fail ([dbf4410])
- Skip `test_parse_docx` on HTTP 429 instead of failing ([a02d684])
- Improve venv management in `run_all_tests.sh` ([b6a0dd8])
📚 Docs
- Clarify `uvx wdoc[full]` usage throughout README and `examples.md` ([0f72eaf], [7653e9a])
- Add/fix `[anki]` extra in Anki parse example ([0f72eaf])
- Improve installation instructions recommending `uvx` ([d88c461])
- Clarify how to use a cloned repository ([b8b4b5e])
Commits details since the last release
- [3e5834a] by @thiswillbeyourgithub, 2 hours ago:
bump version 5.0.1 -> 5.1.0
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [dd09942] by @thiswillbeyourgithub, 25 minutes ago:
fix(summarize): clean LLM intro artifacts on all top-level bullets
Extract the 'deep breath' / "i'll summarize" cleanup into
_strip_llm_intro_artifacts and run it on every top-level line, not just
the first one. Previously a source reference on line 1 would leave a
later deep-breath bullet untouched.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
wdoc/utils/tasks/summarize.py
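A rough sketch of cleaning "deep breath" prefixes on every top-level line is shown below. The regular expression and the function name `strip_intro_artifacts` are assumptions for illustration and are not the project's `_strip_llm_intro_artifacts`; the sketch also skips the "i'll summarize" case.

```python
import re

# Optional bullet marker, optional asterisks, "deep breath(s)", optional separator.
_INTRO = re.compile(
    r"^(?P<bullet>[-*+]\s+)?\**\s*deep\s+breaths?\s*\**\s*[-:,.]*\s*",
    re.IGNORECASE,
)


def strip_intro_artifacts(summary: str) -> str:
    """Remove 'DEEP BREATH' prefixes from top-level lines, keeping the bullet marker."""
    cleaned = []
    for line in summary.splitlines():
        # Only lines without leading whitespace count as top-level here.
        if line and not line[0].isspace():
            line = _INTRO.sub(lambda m: m.group("bullet") or "", line, count=1)
        cleaned.append(line)
    return "\n".join(cleaned)
```

Because every top-level line is checked, a source reference on the first line no longer shields a later deep-breath bullet.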
- [c2c2ae5] by @thiswillbeyourgithub, 84 minutes ago:
add done todo
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [b6a0dd8] by @thiswillbeyourgithub, 89 minutes ago:
test: improved run_all_test.sh venv management
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/run_all_tests.sh
- [a02d684] by @thiswillbeyourgithub, 2 hours ago:
test(parsing): skip test_parse_docx on HTTP 429 instead of failing
The test downloads a sample DOCX from freetestdata.com, which sometimes
returns 429 (rate limited). That is not a wdoc bug, so skip rather than
fail in that case.
tests/test_parsing.py
- [c0779a1] by @thiswillbeyourgithub, 2 hours ago:
feat(prompts): use timecodes as per-bullet source for YouTube single-source
Extends the single-source citation exception: when the unique source is a
YouTube video (or other timecoded media), don't drop citations entirely.
Mention the video source once at the top, then use each bullet's timecode
(e.g. [02:17:33]) as its precise per-bullet pointer. Applied to both the
summary prompt (Sam) and the combine prompt (Carl).
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
wdoc/utils/prompts.py
- [c0097b7] by @thiswillbeyourgithub, 2 hours ago:
feat(prompts): skip per-bullet citations when only one source
Avoids wasting tokens by repeating the same page/WDOC_ID citation on every
bullet point when all information shares a single unique source. In that
case the citation is mentioned once at the top instead. Applies to both
the summary prompt (Sam) and the combine prompt (Carl).
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
wdoc/utils/prompts.py
- [c837143] by @thiswillbeyourgithub, 2 hours ago:
fix(summarize): strip "DEEP BREATH -" style prefixes from first line
Permissive on asterisks, "breath"/"breaths", and the separator character
so variants like "- DEEP BREATH - ", "DEEP BREATHS: ", "DEEP BREATH, "
are handled while preserving the bullet marker.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
wdoc/utils/tasks/summarize.py
- [e253d45] by @thiswillbeyourgithub, 2 hours ago:
fix: model name in output summary
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/tasks/summarize.py
- [c67f564] by @thiswillbeyourgithub, 2 hours ago:
docs(setup): note nltk punkt_tab download is likely redundant
unstructured already lazily downloads punkt_tab on first tokenize call,
so the eager post-install download is probably duplicate work. Keep it
as a safety net (and to front-load the network hit at install time
instead of on the first office-document parse), but document it.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
setup.py
- [f4445b4] by @thiswillbeyourgithub, 2 hours ago:
fix(setup): only run openparse-download when openparse is installed
Guard the post-install weight download with an `import openparse` probe
so a stripped-down install (no openparse[ml] in install_requires) does
not call a missing console-script and emit a confusing error.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
setup.py
- [9242566] by @thiswillbeyourgithub, 2 hours ago:
fix(setup): scope yt-dlp pre-release upgrade to [youtube] users
yt-dlp lives in the optional [youtube] extra, but the post-install hook
was force-installing it for everyone (with `--user`, which is wrong
inside a venv and quietly drops the install outside the env). Probe for
yt_dlp first and only run the `pip install -U --pre yt-dlp` if it's
already there. This keeps yt-dlp truly optional while still letting
[youtube] users track YouTube extractor fixes that land in pre-releases.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
setup.py
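The "probe before running a post-install step" pattern used by this commit and the openparse one above can be sketched as follows. This is not the code in setup.py: `is_installed` and `post_install_commands` are hypothetical names, and the commands are only built, not executed.

```python
import importlib.util
import sys
from typing import Callable, List


def is_installed(module_name: str) -> bool:
    # find_spec returns None for a missing top-level module without importing it.
    return importlib.util.find_spec(module_name) is not None


def post_install_commands(probe: Callable[[str], bool] = is_installed) -> List[List[str]]:
    """Build the optional post-install commands (hypothetical helper)."""
    commands: List[List[str]] = []
    if probe("openparse"):
        # Only download weights when openparse is actually present.
        commands.append(["openparse-download"])
    if probe("yt_dlp"):
        # Only upgrade yt-dlp for users who already installed the [youtube] extra.
        commands.append([sys.executable, "-m", "pip", "install", "-U", "--pre", "yt-dlp"])
    return commands
```

Injecting `probe` keeps the decision logic testable without installing or uninstalling anything.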
- [eb9eba4] by @thiswillbeyourgithub, 2 hours ago:
refactor(setup): declare audioop-lts via the [audio] extra
Move the audioop-lts 3.13+ install out of the imperative post-install
hook and into the `[audio]` extra with a `python_version>='3.13'`
environment marker. audioop-lts is only needed because pydub needs it,
and pydub already lives in `[audio]`, so the conditional belongs there.
This also makes the dependency visible to non-`python setup.py install`
installers (`pip install wdoc[audio]`, uv, pipx, etc.) which never ran
the post-install hook in the first place.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
setup.py
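Declaring a conditional dependency through an environment marker might look like the fragment below. The dictionary name and the absence of version pins are assumptions; only the `[audio]` extra, pydub, audioop-lts and the `python_version>='3.13'` marker come from the commit message.

```python
# Hypothetical fragment of a setup.py extras declaration.
EXTRAS_REQUIRE = {
    "audio": [
        "pydub",
        # Installed only on Python 3.13+, where the stdlib audioop module is gone.
        "audioop-lts; python_version >= '3.13'",
    ],
}

# In setup.py this would be passed as: setup(..., extras_require=EXTRAS_REQUIRE)
```

Since the marker is evaluated by the installer, pip, uv and pipx all honour it without any post-install hook.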
- [bafb379] by @thiswillbeyourgithub, 2 hours ago:
chore(setup): drop python-magic git install from post-install
The git install existed to get the FIFO/pipe fix from upstream PR for
issue #261, used via `magic.from_buffer` on stdin bytes. That code path
is commented out in batch_file_loader.py, and the two remaining call
sites (`magic.from_file` in batch_file_loader.py and pdf.py) work fine
with the released 0.4.27 wheel on PyPI. Both call sites are already
wrapped in try/except, so python-magic stays optional at runtime.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
setup.py
- [203ab6f] by @thiswillbeyourgithub, 2 hours ago:
refactor(logger): move handler setup out of import side effects
When wdoc was imported as a library (e.g. as an open-webui tool),
wdoc/utils/logger.py mutated the global loguru logger at import time:
removing the default stderr sink and adding its own stdout/stderr/file
sinks. That clobbered the host application's loguru configuration.
Wrap the handler installation in a setup_cli_logging() function that
is called explicitly from wdoc/__main__.py. Library users get whatever
loguru handlers the host already configured (since loguru is a
singleton, wdoc's records will flow through them automatically); CLI
users get the customized colorized stdout/stderr plus the rotated
file log.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
wdoc/__init__.py
wdoc/__main__.py
wdoc/utils/logger.py
wdoc/wdoc.py
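The idea of keeping handler setup out of import time can be sketched with the standard `logging` module. This is a stand-in: wdoc uses loguru, the logger name `wdoc_sketch` is hypothetical, and only the function name `setup_cli_logging` comes from the commit message.

```python
import logging
import sys

# Importing this module creates the logger but installs no handlers.
logger = logging.getLogger("wdoc_sketch")


def setup_cli_logging(level: int = logging.INFO) -> logging.Logger:
    """Install the CLI handler once; meant to be called only from the entry point."""
    if not any(h.get_name() == "cli_sketch" for h in logger.handlers):
        handler = logging.StreamHandler(sys.stderr)
        handler.set_name("cli_sketch")
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger
```

A host application that only imports the module keeps its own logging configuration, while the CLI entry point opts in by calling the function.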
- [b8b4b5e] by @thiswillbeyourgithub, 2 hours ago:
doc: clarify how to use a cloned repository
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [dbf4410] by @thiswillbeyourgithub, 2 hours ago:
test(env): move API-key precheck from test_wdoc.py to run_all_tests.sh
Fails fast at the shell level before spinning up the venv and pytest,
rather than only when test_wdoc.py is imported.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
tests/run_all_tests.sh
tests/test_wd...
Release 5.0.1
What's new
feat
- Summary citations (dcdecc86, 6cbf025c, e5d7840b, 9d7802aa)
  - Per-chunk metadata (page, source) injected as XML into summarization input
  - LLM prompted to add `[p.N]` citations; Python fallback adds them to uncited top-level bullets
  - Multi-file summaries use `[p.N, filename.pdf]` format with shortest disambiguating path
  - New `citation_url_template` parameter turns page citations into clickable markdown links (e.g. `{source}#page={page}`)
- Query output anchor links (8f72df69): WDOC_ID citations now render as `[N](#document-N)` with HTML anchors for in-page navigation
change
- Default models switched to OpenRouter/DeepSeek (14ac41e4, be3dced3)
  - `WDOC_DEFAULT_MODEL` and `WDOC_DEFAULT_QUERY_EVAL_MODEL` now default to `openrouter/deepseek/deepseek-v4-pro` and `openrouter/deepseek/deepseek-v4-flash`
  - Routes through OpenRouter instead of calling the DeepSeek provider directly
fix
- b555a904: Coerce `int` to `float` for CLI kwargs type checking (fixes Gradio UI sending integer values for float parameters)
- 4015a3a9: Better removal of "deep breath" mentions in summarization prompts
test
- 61c8238c, 0ea3d13d, b34e2e3f: Crash early at import time when required API keys (`OPENROUTER_API_KEY`, `OPENAI_API_KEY`, `MISTRAL_API_KEY`, `WDOC_WHISPER_API_KEY`) are missing
- 1d07163c: Warn and skip instead of crash when whisper test hits a 502 error
- 8380d8ad: Skip ollama embedding test when the ollama port is unreachable
- e672e092: Better test cleanup in `run_all_tests.sh`
add
- 72819fab: `bump_default_models.sh` helper script: dry-run by default, `--apply` to write; syncs model names across docs, README, SKILL, ARCHITECTURE, and `docker/env.example`
doc
- 9d7802aa, ec09cf1e, 71560800, c400738f, 96f9ac23: CLAUDE.md and ARCHITECTURE.md updated with new settings documentation requirements, `sphinx-apidoc` command, `bump_default_models.sh` usage, and citation feature docs
- 057bb75f: Added DeepWiki badge to README
- 3f59670e: SVG updated to remove outdated default model reference
Commits details since the last release
- [0c74f20] by @thiswillbeyourgithub, 10 minutes ago:
bump version 5.0.0 -> 5.0.1
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [1d07163] by @thiswillbeyourgithub, 31 minutes ago:
test: warn instead of crash when whisper test hits 502 error
A 502 from the whisper endpoint means the upstream is unavailable, not
that the code under test failed. Skip the test in that case so the run
reports not-tested rather than a false negative.
tests/test_wdoc.py
- [b34e2e3] by @thiswillbeyourgithub, 51 minutes ago:
test: crash early if WDOC_WHISPER_API_KEY is not set
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
tests/test_wdoc.py
- [0ea3d13] by @thiswillbeyourgithub, 55 minutes ago:
test: also crash early if MISTRAL_API_KEY is missing for a mistral/ test model
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
tests/test_wdoc.py
- [61c8238] by @thiswillbeyourgithub, 7 hours ago:
test: crash early when required API key for a test model is missing
Check that OPENROUTER_API_KEY / OPENAI_API_KEY are defined when any of the
test models (or their default-model fallbacks) starts with 'openrouter/' or
'openai/'. Fails fast at import time with a clear message instead of
producing opaque auth errors deep in a test run.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
tests/test_wdoc.py
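The fail-fast check described in this commit can be sketched as a prefix-to-variable lookup. The mapping and both function names are hypothetical; the provider prefixes and key names are the ones listed in the release notes.

```python
import os
from typing import List, Mapping, Sequence

# Model-id prefix -> environment variable that must be set for it.
PROVIDER_KEYS = {
    "openrouter/": "OPENROUTER_API_KEY",
    "openai/": "OPENAI_API_KEY",
    "mistral/": "MISTRAL_API_KEY",
}


def missing_provider_keys(models: Sequence[str], env: Mapping[str, str]) -> List[str]:
    """List the API-key variables required by `models` but absent from `env`."""
    missing = []
    for prefix, key in PROVIDER_KEYS.items():
        if any(model.startswith(prefix) for model in models) and not env.get(key):
            missing.append(key)
    return missing


def check_or_crash(models: Sequence[str], env: Mapping[str, str] = os.environ) -> None:
    """Raise early with a clear message instead of failing deep in a test run."""
    missing = missing_provider_keys(models, env)
    if missing:
        raise RuntimeError("Missing API keys for test models: " + ", ".join(missing))
```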
- [8380d8a] by @thiswillbeyourgithub, 7 hours ago:
test: skip ollama embedding test when ollama port is unreachable
Probe OLLAMA_HOST (default 127.0.0.1:11434) before test_ollama_embeddings
and skip with a clear message instead of failing when ollama is not running.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
tests/test_wdoc.py
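Probing a host and port before running a test can be done with a plain TCP connection attempt. The sketch below is an assumption about how such a probe might be written; `parse_host` and `port_reachable` are hypothetical helpers, and only the `OLLAMA_HOST` default of 127.0.0.1:11434 comes from the commit message.

```python
import socket
from typing import Tuple


def parse_host(value: str = "127.0.0.1:11434") -> Tuple[str, int]:
    """Split an OLLAMA_HOST-style value into (host, port), tolerating a URL scheme."""
    value = value.split("://", 1)[-1].rstrip("/")
    host, sep, port = value.rpartition(":")
    if not sep:
        return value, 11434  # no explicit port: fall back to the default
    return host, int(port)


def port_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A test would then skip with a clear message when `port_reachable(*parse_host(...))` is false.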
- [ec09cf1] by @thiswillbeyourgithub, 10 hours ago:
doc: mention bump_default_models.sh in CLAUDE.md and ARCHITECTURE.md
CLAUDE.md gets a new section explaining when and how to run the helper.
ARCHITECTURE.md gets a short pointer next to the default-models table.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
ARCHITECTURE.md
CLAUDE.md
- [be3dced] by @thiswillbeyourgithub, 10 hours ago:
change: prefix default models with 'openrouter/'
deepseek/deepseek-v4-pro -> openrouter/deepseek/deepseek-v4-pro
deepseek/deepseek-v4-flash -> openrouter/deepseek/deepseek-v4-flash
Routes the defaults through OpenRouter rather than calling the deepseek
provider directly. Applied via bump_default_models.sh --apply.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
README.md
SKILL.md
docker/env.example
wdoc/docs/help.md
wdoc/utils/env.py
- [72819fa] by @thiswillbeyourgithub, 10 hours ago:
add: bump_default_models.sh helper
Bumpver-style script for changing WDOC_DEFAULT_MODEL and
WDOC_DEFAULT_QUERY_EVAL_MODEL: reads the current values from
wdoc/utils/env.py, replaces both the full id and its basename across
docs/README/SKILL/ARCHITECTURE, and re-syncs key=value lines in
docker/env.example. Dry-run by default; --apply to write; never commits.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
bump_default_models.sh
- [3f59670] by @thiswillbeyourgithub, 10 hours ago:
svg should not mention an outdated default model
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/svg/summary.svg
- [14ac41e] by @thiswillbeyourgithub, 10 hours ago:
change: switch default models to deepseek-v4-pro / deepseek-v4-flash
Update WDOC_DEFAULT_MODEL and WDOC_DEFAULT_QUERY_EVAL_MODEL defaults, and
align README, SKILL, ARCHITECTURE, and help docs to match.
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com
ARCHITECTURE.md
README.md
SKILL.md
docker/env.example
wdoc/docs/help.md
wdoc/utils/env.py
- [e672e09] by @thiswillbeyourgithub, 10 hours ago:
better test cleanup
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/run_all_tests.sh
- [057bb75] by @thiswillbeyourgithub, 30 hours ago:
doc: add deepwiki badge
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [c400738] by @thiswillbeyourgithub, 4 weeks ago:
doc: add sphinx-apidoc command to CLAUDE.md
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
CLAUDE.md
- [7156080] by @thiswillbeyourgithub, 4 weeks ago:
doc: add new settings documentation requirements to CLAUDE.md
Detail that new settings must be documented in help.md and examples.md,
explain env var re-read behavior, list misc.py variables to keep updated,
and add a guide for adding new filetype support.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
CLAUDE.md
- [b555a90] by @thiswillbeyourgithub, 4 weeks ago:
fix: coerce int to float for cli_kwargs type checking
The Gradio UI can send integer values (e.g. 0, 1) for float parameters
like doccheck_min_lang_prob, causing a type check failure. Now int
values are automatically coerced to float when float is the expected type.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
wdoc/wdoc.py
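The int-to-float coercion can be sketched as a small pre-processing step before type checking. The function name and the `expected_types` mapping are hypothetical; only the parameter name `doccheck_min_lang_prob` comes from the commit message.

```python
from typing import Any, Dict, Mapping


def coerce_int_to_float(kwargs: Mapping[str, Any], expected_types: Mapping[str, type]) -> Dict[str, Any]:
    """Turn int values into floats where a float is expected (hypothetical helper)."""
    coerced = {}
    for name, value in kwargs.items():
        wants_float = expected_types.get(name) is float
        # bool is a subclass of int, so it is excluded explicitly.
        if wants_float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        coerced[name] = value
    return coerced
```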
- [1b08a1e] by @thiswillbeyourgithub, 4 weeks ago:
Merge branch 'enh_citations' into dev
- [96f9ac2] by @thiswillbeyourgithub, 4 weeks ago:
added a CLAUDE.md file
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
CLAUDE.md
- [4015a3a] by @thiswillbeyourgithub, 4 weeks ago:
fix: better remover of "deep breath" mention
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/tasks/summarize.py
- [9d7802a] by @thiswillbeyourgithub, 4 weeks ago:
doc: document citation features in help, examples, README FAQ, and add tests
- Add citation_url_template docs to help.md
- Add PDF citation example to examples.md
- Add FAQ entry about source citations in README.md
- Add unit tests for source_replace anchor links and citation URL template
- Fix double-bracket bug in citation URL link generation
Developed with Claude Code.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
README.md
tests/test_wdoc.py
wdoc/docs/examples.md
wdoc/docs/help.md
wdoc/utils/tasks/summarize.py
- [e5d7840] by @thiswillbeyourgithub, 4 weeks ago:
feat: add citation_url_template parameter for clickable citation links
When set (e.g. "{source}#page={page}"), page citations like [p.42]
become markdown links p.42 in summary output.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
wdoc/utils/tasks/summarize.py
wdoc/wdoc.py
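Turning `[p.N]` citations into links with a template could be done with a regular-expression substitution, as in this sketch. It is not the code in summarize.py: `linkify_citations` is a hypothetical name and the single-source signature is an assumption; the template syntax is the one given in the commit message.

```python
import re

# Matches [p.42] but not a citation that is already followed by "(url)".
_PAGE_CITATION = re.compile(r"\[p\.(\d+)\](?!\()")


def linkify_citations(summary: str, source: str, template: str) -> str:
    """Replace [p.N] with [p.N](url) using a '{source}#page={page}' style template."""

    def to_link(match: "re.Match[str]") -> str:
        page = match.group(1)
        url = template.format(source=source, page=page)
        return f"[p.{page}]({url})"

    return _PAGE_CITATION.sub(to_link, summary)
```

The negative lookahead keeps the function idempotent, so running it twice does not nest links.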
- [dcdecc8] by @thiswillbeyourgithub, 4 weeks ago:
feat: add page citations to summaries with hybrid LLM + Python fallback
- Prompt instructs LLM to add [p.N] citations from chunk_metadata
- Python post-processing adds fallback citations to uncited top-level bullets
- Multi-file summaries use [p.N, filename.pdf] format
- Ambiguous filenames resolved with shortest distinguishing path
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
wdoc/utils/prompts.py
wdoc/utils/tasks/summarize.py
- [6cbf025] by @thiswillbeyourgithub, 4 weeks ago:
feat: inject per-chunk metadata (page, source) as XML into summarization input
Each chunk now includes its page number and source path as XML metadata
before the text content, giving the LLM context for citations.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
wdoc/utils/tasks/summarize.py
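Prefixing each chunk with XML metadata might look like the sketch below. The tag names and the function name `wrap_chunk` are assumptions (the commit only states that page and source are injected as XML before the text), so the actual format in summarize.py may differ.

```python
from xml.sax.saxutils import escape


def wrap_chunk(text: str, page: int, source: str) -> str:
    """Prefix a chunk with page/source metadata as XML (hypothetical tag names)."""
    return (
        "<chunk_metadata>\n"
        f"<page>{page}</page>\n"
        f"<source>{escape(source)}</source>\n"
        "</chunk_metadata>\n"
        f"{text}"
    )
```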
- [8f72df6] by @thiswillbeyourgithub, 4 weeks ago:
feat: replace WDOC_ID plain numbers with markdown anchor links in query output
Citations now render as [N](#document-N) instead of plain numbers,
and document sections include HTML anchors for in-page navigation.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
wdoc/utils/tasks/query.py
wdoc/wdoc.py
Release 5.0.0
What's new
Major Release: Docker Web UI, Python 3.13 Support, and Architecture Improvements
This major release introduces experimental Docker-based web interface, upgrades Python version requirements, migrates to modern LangChain modules, and includes breaking changes with license updates.
✨ Features
- Docker Web Interface [7da29d1, 214835b, 6a4715b, 3f42424, cf9d937, 8690723, 6dcf7d2, 2ab086e, 92564b2, a2403f9, a8fa5cd, 5475b8b, cdcbe71, 94912ad, 87fef83, 56ffc4c, 24a9d8c, 0ce2649, c8ab404, e6de627, 2d5816c, 12cc4a7, 4f84c0e, f339966, 756c7ca, c16d5ac, af9c7fc, d1f0fed, 9128bec, 9fb2c7e, 90de7d8]
- Add experimental Gradio-based web UI for Docker deployments
- Support for environment variable configuration via web interface
- Dynamic filetype arguments and advanced settings accordions
- Log capture and display with monospace formatting
- PWA mode enabled with two-tab interface
- Sequential processing queue and non-root user security
- Load/save embeddings options in query GUI
- Performance Improvements [770a524, a28a4be, 6caa011, 6e54b78, e3327a0]
- Avoid recomputing token length of documents with caching
- Dynamically create batches using token count and document limits
- Enable concurrency for answering documents
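Batching by token count and document limit, as mentioned in the performance bullets, can be sketched with a greedy pass. The function name, parameters and the greedy strategy are assumptions for illustration; the release notes do not describe the actual algorithm.

```python
from typing import List, Sequence


def make_batches(token_counts: Sequence[int], max_tokens: int, max_docs: int) -> List[List[int]]:
    """Group document indices so each batch respects a token budget and a size cap."""
    batches: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0
    for index, count in enumerate(token_counts):
        over_tokens = current_tokens + count > max_tokens
        over_docs = len(current) >= max_docs
        if current and (over_tokens or over_docs):
            # Close the current batch before it would exceed either limit.
            batches.append(current)
            current, current_tokens = [], 0
        current.append(index)
        current_tokens += count
    if current:
        batches.append(current)
    return batches
```

A document larger than `max_tokens` still gets a batch of its own rather than being dropped. Precomputed (cached) token counts are passed in, so lengths are not recomputed per batch.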
🔧 Refactoring & Breaking Changes
- Python Version Upgrade [126026f, 2d1f8ab, b476dba]
- Require Python 3.13+ (breaking change)
- Update to Python 3.13.5
- Add audioop-lts post-install script for Python 3.13+
- LangChain Migration [830edd4, 4eb303c, 4763b55, 908d536, e190d60, 5846ae0]
- Migrate to langchain_core and langchain_text_splitters modules
- Update imports from outdated langchain modules
- Require langchain >= 1.2.0
- Update CacheBackedEmbeddings import paths
- License Change [f30fcda, b9e8eb2]
- Switch from GPLv3 to AGPLv3 (breaking change)
- Async Operations [c412233]
- Use asyncio tqdm instead of regular tqdm for better async support
📚 Documentation
- Docker Documentation [10be9b0, 69a3ed5, 555e1d5, 09cf656, 87d6442, ba75fd0, 44d419c, def3f97]
- Add comprehensive Docker README with setup instructions
- Include Gradio interface screenshot
- Add troubleshooting for permission issues
- Integrate Docker documentation into online docs
-
Diagrams [6bbb106, abbb8ed, f54d380, 0c70508, 9842b09, 14018a6, 4576785]
- Update workflow diagrams with titles
- Add combined diagram image with all three flowcharts
- Display diagrams side by side in README
- Improve Mermaid diagram structure with subgraphs
-
General Documentation [701645c, 19a4555, 2938f36, 1a4cf2d, 019c640, a96acd9, 0a6af6d, 4273ed1, 2cf0be4, 5ea630d, b747f05, 817380c, 0acca05, 659dcf8, 082b016, 86aeda4, 5af143a, 0abd268]
- Add website link and author information
- Clarify installation instructions with uvx commands
- Remove duplicate documentation sections
- Fix image links for Read the Docs compatibility
- Mention experimental WebUI feature
🐛 Fixes
- Docker Environment [257ca2c, f1a2156, 6b0370f, cfb551c, fe6c5e1, eeaf300, e694fa1, 185bc09, 4b64e4d, 5f4890e, 5ae9c55, 84f0b5c, e681226, c592650]
  - Add WDOC_IN_DOCKER environment variable detection
  - Prevent pdb usage in Docker containers
  - Fix cache directory permissions
  - Configure build args for flexible wdoc installation
  - Simplify parse task to text format with markdown
- Python 3.11 Compatibility [c76240f, a058251, d710043, e3a4524]
  - Unpinned package versions for Open WebUI compatibility
  - Made compatible with Python 3.11 (later reverted to 3.13+ requirement)
- General Fixes [c5d77b5, b80ce97, bf001a3, 27f4810, a42a3ad, 627b66b]
  - Fix lazy loading loaders in pytest
  - Fix exception chaining for pdb compatibility
  - Fix globals arg for exec as positional argument
  - Update required audioop-lts version
  - Better assert explanations
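The Docker Environment fixes above mention detecting a WDOC_IN_DOCKER environment variable and keeping pdb out of containers. A minimal sketch of how such a guard could look follows. Only the variable name comes from the release notes; the accepted values ("1", "true", "yes") and the helper names `in_docker` and `should_open_debugger` are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def in_docker(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when WDOC_IN_DOCKER is set to a truthy value.

    The set of truthy values is an assumption, not wdoc's documented behavior.
    """
    env = os.environ if env is None else env
    return env.get("WDOC_IN_DOCKER", "").strip().lower() in {"1", "true", "yes"}


def should_open_debugger(
    debug_requested: bool, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Only allow an interactive debugger outside containers.

    A container usually has no interactive terminal, so dropping into pdb
    there would block the process instead of helping the user.
    """
    return debug_requested and not in_docker(env)
```

Passing `env` explicitly keeps the check easy to test without mutating the real process environment.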
📦 Dependencies
Commits details since the last release
- [7da29d1] by @thiswillbeyourgithub, 81 seconds ago:
bump version 4.1.2 -> 5.0.0
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [6bbb106] by @thiswillbeyourgithub, 5 minutes ago:
update the diagrams with titles
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
images/diagram_query.png
images/diagram_search.png
images/diagram_summary.png
- [abbb8ed] by @thiswillbeyourgithub, 8 minutes ago:
add title to the mermaid diagrams
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
images/diagram_query.mmd
images/diagram_search.mmd
images/diagram_summary.mmd
- [f54d380] by @thiswillbeyourgithub, 20 minutes ago:
Display the three images side by side
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [0c70508] by @thiswillbeyourgithub, 27 minutes ago:
add a single image containing all three flowcharts
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
images/all.mmd
images/all.png
- [701645c] by @thiswillbeyourgithub, 43 minutes ago:
doc: add link to your website
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docs/source/index.rst
- [1a4cf2d] by @thiswillbeyourgithub, 46 minutes ago:
fix: cleanup duplicate getting started doc
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [019c640] by @thiswillbeyourgithub, 46 minutes ago:
fix: remove duplicate documentation
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [19a4555] by @thiswillbeyourgithub, 64 minutes ago:
doc: add link to my website
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [2938f36] by @thiswillbeyourgithub, 64 minutes ago:
mention I'm a psychiatry resident now
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [555e1d5] by @thiswillbeyourgithub, 65 minutes ago:
doc: better docker documentation
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
docker/README.md
- [09cf656] by @thiswillbeyourgithub, 73 minutes ago:
fix: use relative link for docker doc
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [c412233] by @thiswillbeyourgithub, 87 minutes ago:
fix: use asyncio tqdm instead of regular tqdm
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/batch_file_loader.py
wdoc/utils/embeddings.py
wdoc/utils/filters.py
wdoc/utils/loaders/anki.py
wdoc/utils/loaders/pdf.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/summarize.py
wdoc/wdoc.py
- [e3327a0] by @thiswillbeyourgithub, 2 hours ago:
perf: forgot to enable concurrency for answering documents
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/wdoc.py
- [23bef7d] by @thiswillbeyourgithub, 2 hours ago:
add the docker documentation to the manifest.in file
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
MANIFEST.in
- [994aa1d] by @thiswillbeyourgithub, 2 hours ago:
mention that the webui is experimental
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docker/README.md
docker/gui.py
- [69a3ed5] by @thiswillbeyourgithub, 2 hours ago:
add picture of the docker ui
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docker/README.md
images/gradio_interface.png
- [32ec5bd] by @thiswillbeyourgithub, 3 hours ago:
bump dep of langchain-litellm
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [770a524] by @thiswillbeyourgithub, 12 days ago:
new: avoid recomputing the token length of documents
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/embeddings.py
wdoc/utils/misc.py
wdoc/wdoc.py
- [a28a4be] by @thiswillbeyourgithub, 12 days ago:
new: make sure get_tkn_length avoids recomputing document tokens
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [6caa011] by @thiswillbeyourgithub, 12 days ago:
refactor: dynamically create batches using token count and document limit
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) aider@aider.chat
wdoc/utils/embeddings.py
- [6e54b78] by @thiswillbeyourgithub, 12 days ago:
minor
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/embeddings.py
- [830edd4] by @thiswillbeyourgithub, 13 days ago:
fix: import retrievers in the new langchain
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
wdoc/utils/retrievers.py
- [4eb303c] by @thiswillbeyourgithub, 13 days ago:
make sure langchain is after 1.2.0
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [4763b55] by @thiswillbeyourgithub, 13 days ago:
fix import cachebackedembeddings from outdated langchain
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/embeddings.py
- [908d536] by @thiswillbeyourgithub, 13 days ago:
fix outdated import in comment
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/customs/compressed_embeddings_cacher.py
wdoc/utils/embeddings.py
- [e190d60] by @thiswillbeyourgithub, 13 ...
Release 4.1.2
What's new
This patch release fixes an optional dependency installation issue and improves hashing performance.
🐛 Fixes
- Fixed chonkie optional install dependency name (semantic, not semantics) [18a2f9d5]
⚡ Performance
- Switched from SHA256 to BLAKE3 for faster hashing [835e43dc]
  - Updated in setup.py, tests/test_parsing.py, and wdoc/utils/misc.py
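As a rough illustration of the SHA256 to BLAKE3 switch, the sketch below hashes bytes with the third-party `blake3` package when it is importable and falls back to the standard library's SHA256 otherwise. The function name `content_hash` and the fallback behavior are assumptions for this example; wdoc's own helper in wdoc/utils/misc.py may be structured differently.

```python
import hashlib

try:
    # Third-party "blake3" package (assumed installed; not part of the stdlib).
    from blake3 import blake3 as _fast_hash
except ImportError:
    _fast_hash = None


def content_hash(data: bytes) -> str:
    """Return a 64-character hex digest of data.

    Uses BLAKE3 when available, otherwise SHA256 (fallback is an assumption).
    """
    if _fast_hash is not None:
        return _fast_hash(data).hexdigest()
    return hashlib.sha256(data).hexdigest()
```

Both algorithms produce a 256-bit digest by default, but the digests themselves differ, so anything keyed on the old hashes would presumably need to be recomputed after the switch.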
Commits details since the last release
- [cb522e6] by @thiswillbeyourgithub, 10 seconds ago:
bump version 4.1.1 -> 4.1.2
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [18a2f9d] by @thiswillbeyourgithub, 30 minutes ago:
fix: chonkie optional install is called semantic not semantics
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [835e43d] by @thiswillbeyourgithub, 31 minutes ago:
perf: use blake3 instead of sha256
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
tests/test_parsing.py
wdoc/utils/misc.py
Release 4.1.1
What's new
This release focuses on integrating Chonkie for semantic chunking, improving test reliability, and code quality enhancements through comprehensive linting.
Features
- Chonkie Semantic Chunking Integration
  - Implemented ChonkieSemanticSplitter using semantic chunking with memoization ([081e81a])
  - Added transform_documents method to ChonkieSemanticSplitter ([534cc90])
  - Replaced RecursiveCharacterTextSplitter with ChonkieSemanticSplitter in summarize.py ([77f1652])
  - Added chonkie to requirements ([7234f86])
  - Merged chonkie branch into dev ([f89390a])
Fixes
- Logging & Display
- Parsing & Type Hints
Refactor
- Split batch file loader into two files ([a0420fd])
- Comprehensive ruff linter run across codebase ([d9f7eac])
- Switched from black to ruff ([2d8a51b])
- Made ruff configuration less strict ([e04fc8d])
Tests
- DDG Test Improvements
Chore
- Updated pyfiglet font ([fd49cca])
- Cleaned up completed TODO items from README ([113c008], [ea4a99b], [8656327], [7206234], [4a4f4e8])
- Minor improvements ([b873469], [ca55c13])
Commits details since the last release
- [766c373] by @thiswillbeyourgithub, 4 seconds ago:
bump version 4.1.0 -> 4.1.1
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [e1b2a87] by @thiswillbeyourgithub, 66 minutes ago:
test: finally fixed the ddg error not capturing the output
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.sh
- [a02ffbc] by @thiswillbeyourgithub, 2 hours ago:
test: dont use alias of grep
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.sh
- [66bf47c] by @thiswillbeyourgithub, 2 hours ago:
test: print output before the error message
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.sh
- [4a4f4e8] by @thiswillbeyourgithub, 2 hours ago:
todo: done the chonkie integration
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [d9c5ae9] by @thiswillbeyourgithub, 2 hours ago:
test: capture the ddg output
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.sh
- [96c5186] by @thiswillbeyourgithub, 2 hours ago:
test: better way to print the output
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.sh
- [ccdffd1] by @thiswillbeyourgithub, 5 hours ago:
test: set max ddg results to 10 because it fails too often
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.sh
tests/test_wdoc.py
- [a0420fd] by @thiswillbeyourgithub, 5 hours ago:
new: split batch file loader into two files
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/batch_file_loader.py
wdoc/utils/load_recursive.py
- [7234f86] by @thiswillbeyourgithub, 6 hours ago:
add chonkie to requirements
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [f89390a] by @thiswillbeyourgithub, 25 hours ago:
Merge branch 'chonkie' into dev
- [534cc90] by @thiswillbeyourgithub, 27 hours ago:
feat: add transform_documents method to ChonkieSemanticSplitter
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) aider@aider.chat
wdoc/utils/misc.py
- [77f1652] by @thiswillbeyourgithub, 29 hours ago:
refactor: replace RecursiveCharacterTextSplitter with ChonkieSemanticSplitter in summarize.py
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) aider@aider.chat
wdoc/utils/tasks/summarize.py
- [081e81a] by @thiswillbeyourgithub, 29 hours ago:
feat: implement ChonkieSemanticSplitter using semantic chunking with memoization
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) aider@aider.chat
wdoc/utils/misc.py
- [615828a] by @thiswillbeyourgithub, 3 days ago:
fix: typehint error for topk autoincrease
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/tasks/query.py
- [fd49cca] by @thiswillbeyourgithub, 3 days ago:
better pyfiglet font
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/wdoc.py
- [99502e7] by @thiswillbeyourgithub, 3 days ago:
fix: colors where not appearing in loguru
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/logger.py
- [83e7fb9] by @thiswillbeyourgithub, 3 days ago:
fix wrong logic for stdout color
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/logger.py
- [b873469] by @thiswillbeyourgithub, 3 days ago:
minor
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/logger.py
- [d2bca84] by @thiswillbeyourgithub, 3 days ago:
fix: allow llm to mention thinking inside it's thinking
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [a50ec42] by @thiswillbeyourgithub, 3 days ago:
fix: error message when parsing thinking
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [ca55c13] by @thiswillbeyourgithub, 4 days ago:
minor
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [113c008] by @thiswillbeyourgithub, 4 days ago:
todo: no need to mention karakeep becauseit will be a loder
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [ea4a99b] by @thiswillbeyourgithub, 4 days ago:
todo: no more need to make an llm plugin because we support pipes now
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [8656327] by @thiswillbeyourgithub, 4 days ago:
todo: remove todo to move the task to their own file because it's done
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [7206234] by @thiswillbeyourgithub, 4 days ago:
todo: remove need for using dataclass to store tasks as its done
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [d9f7eac] by @thiswillbeyourgithub, 4 days ago:
run ruff linter everywhere
scripts/AnkiFiltered/AnkiFilteredDeckCreator.py
scripts/NtfySummarizer/NtfySummarizer.py
scripts/TheFiche/TheFiche.py
tests/test_parsing.py
tests/test_vectorstores.py
tests/test_wdoc.py
wdoc/__main__.py
wdoc/utils/batch_file_loader.py
wdoc/utils/customs/binary_faiss_vectorstore.py
wdoc/utils/customs/litellm_embeddings.py
wdoc/utils/embeddings.py
wdoc/utils/env.py
wdoc/utils/filters.py
wdoc/utils/interact.py
wdoc/utils/llm.py
wdoc/utils/loaders/__init__.py
wdoc/utils/loaders/anki.py
wdoc/utils/loaders/local_audio.py
wdoc/utils/loaders/local_html.py
wdoc/utils/loaders/local_video.py
wdoc/utils/loaders/logseq_markdown.py
wdoc/utils/loaders/online_media.py
wdoc/utils/loaders/pdf.py
wdoc/utils/loaders/shared_audio.py
wdoc/utils/loaders/youtube.py
wdoc/utils/misc.py
wdoc/utils/prompts.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/parse.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/shared_query_search.py
wdoc/utils/tasks/types.py
wdoc/wdoc.py
- [e04fc8d] by @thiswillbeyourgithub, 4 days ago:
less strict ruff
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
.pre-commit-config.yaml
- [2d8a51b] by @thiswillbeyourgithub, 4 days ago:
switch from black to ruff
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
.pre-commit-config.yaml
setup.py
Release 4.1.0
What's new
This release focuses on robustness improvements, particularly around language detection, file loading, and error handling.
Features
- Task type system: Introduced dataclass-based task type storage for better type safety [7c95e3c]
- Source tag logging: Added failure count and success rate tracking to source tag logging [69dca45]
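To make the source tag logging item concrete, here is a small sketch of how per-tag failure counts and success rates could be derived from load outcomes. The function name `tally_by_source_tag` and the `(tag, succeeded)` input shape are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from typing import Iterable


def tally_by_source_tag(
    outcomes: Iterable[tuple[str, bool]]
) -> dict[str, dict[str, float]]:
    """Aggregate (source_tag, succeeded) pairs into per-tag statistics."""
    stats: dict[str, dict[str, int]] = defaultdict(
        lambda: {"total": 0, "failures": 0}
    )
    for tag, succeeded in outcomes:
        stats[tag]["total"] += 1
        if not succeeded:
            stats[tag]["failures"] += 1
    # success_rate is the fraction of documents that loaded without error.
    return {
        tag: {
            "total": s["total"],
            "failures": s["failures"],
            "success_rate": (s["total"] - s["failures"]) / s["total"],
        }
        for tag, s in stats.items()
    }
```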
Fixes
- PowerPoint loader: Fixed TypeError when loading PowerPoint files [ebfc66c]
- Anki loader: Resolved forward reference error [73924e1]
- Language detection: Fixed potential edge case issue [2d928ab]
- Infinite loop detection:
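Commit bb147b3 in this release replaces a loop counter with hash-based infinite loop detection. The general idea can be sketched as follows: fingerprint the pending work on every round and stop as soon as a fingerprint repeats, since a repeated state means no progress is being made. The names `state_fingerprint` and `resolve`, and the convention that `expand` returns `None` for a fully loaded item, are assumptions for this sketch rather than wdoc's actual code.

```python
import hashlib
from typing import Callable, Optional


def state_fingerprint(pending: list[str]) -> str:
    """Hash the ordered list of pending items into one hex digest."""
    h = hashlib.sha256()
    for item in pending:
        h.update(item.encode("utf-8"))
        h.update(b"\x00")  # separator so ["ab"] and ["a", "b"] differ
    return h.hexdigest()


def resolve(
    pending: list[str], expand: Callable[[str], Optional[list[str]]]
) -> list[str]:
    """Expand items until none are left, raising if the state repeats.

    expand(item) returns child items, or None when the item is final.
    """
    seen: set[str] = set()
    done: list[str] = []
    while pending:
        fp = state_fingerprint(pending)
        if fp in seen:
            raise RuntimeError("infinite loop detected: pending state repeated")
        seen.add(fp)
        next_round: list[str] = []
        for item in pending:
            children = expand(item)
            if children is None:
                done.append(item)
            else:
                next_round.extend(children)
        pending = next_round
    return done
```

Compared with a fixed iteration cap, this approach does not need a threshold tuned to the size of the input: a long but progressing run never trips it, while a genuinely stuck one is caught on its second identical round.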
Enhancements
- Language detection improvements:
- Batch file loader: Reduced verbosity of progress logging [d207d98]
- Testing: Improved model detection logic [5257c5a]
- Post-install: Use logger.error instead of print during installation [c0795e9]
Refactoring
- wdoc class: Added dynamic interaction_settings property [f806b98]
- Type hints: Improved type annotations across multiple modules [a94a889, 920e5d3]
Documentation
- Help text: Fixed powerpoint filetype documentation incorrectly mentioning .doc/.docx instead of .ppt/.pptx [e9b29eb]
Dependencies
- Bumped litellm to enable latest OpenRouter pricing [577e6f6]
Maintenance
- Removed debug print statement [80f7f32]
- Better warning messages [faa5d3b]
- Fixed setup.py logger usage [4a672c1]
Commits details since the last release
- [5adc87e] by @thiswillbeyourgithub, 72 seconds ago:
bump version 4.0.4 -> 4.1.0
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [0b9c6da] by @thiswillbeyourgithub, 2 hours ago:
enh: better exception catcher in language detecction
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [80f7f32] by @thiswillbeyourgithub, 3 hours ago:
remove a debug print
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders/__init__.py
- [d7589cc] by @thiswillbeyourgithub, 3 hours ago:
enh: language detector reduce debug logs
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [c0e2ce7] by @thiswillbeyourgithub, 3 hours ago:
enh: language detector
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [2d928ab] by @thiswillbeyourgithub, 3 hours ago:
fix: potential issue in edge case when detecting language
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [69dca45] by @thiswillbeyourgithub, 4 hours ago:
feat: add failure count and success rate to source tag logging
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) aider@aider.chat
wdoc/utils/batch_file_loader.py
- [f806b98] by @thiswillbeyourgithub, 4 hours ago:
refactor: add dynamic interaction_settings property to wdoc class
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) aider@aider.chat
wdoc/wdoc.py
- [a94a889] by @thiswillbeyourgithub, 4 hours ago:
minor: type hints
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/wdoc.py
- [7c95e3c] by @thiswillbeyourgithub, 4 hours ago:
new: use a dataclass to store the type of tasks
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/__main__.py
wdoc/utils/batch_file_loader.py
wdoc/utils/loaders/__init__.py
wdoc/utils/misc.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/parse.py
wdoc/utils/tasks/summarize.py
wdoc/utils/tasks/types.py
wdoc/wdoc.py
- [5257c5a] by @thiswillbeyourgithub, 5 hours ago:
better way to check for testing model
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/llm.py
wdoc/utils/misc.py
wdoc/wdoc.py
- [73924e1] by @thiswillbeyourgithub, 7 hours ago:
fix: forward reference error in anki
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders/anki.py
- [ebfc66c] by @thiswillbeyourgithub, 7 hours ago:
fix: typeerror when loading powerpoint files
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders/powerpoint.py
- [d207d98] by @thiswillbeyourgithub, 25 hours ago:
enh: reduice verbosity of something that looked like an infinite loop but was not
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/batch_file_loader.py
- [920e5d3] by @thiswillbeyourgithub, 25 hours ago:
fix: typehint in pdf
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders/pdf.py
- [bb147b3] by @thiswillbeyourgithub, 25 hours ago:
refactor: replace loop counter with hash-based infinite loop detection
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) aider@aider.chat
wdoc/utils/batch_file_loader.py
- [4a672c1] by @thiswillbeyourgithub, 25 hours ago:
fix: actually no we can't use loguru in setup.py
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [faa5d3b] by @thiswillbeyourgithub, 26 hours ago:
minor: better warning
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/batch_file_loader.py
- [fcf9ca5] by @thiswillbeyourgithub, 26 hours ago:
fix: the loop counter has to be high enough to detect infinite loop
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/batch_file_loader.py
- [e9b29eb] by @thiswillbeyourgithub, 26 hours ago:
doc: powerpoint filetype doc mentionned .doc and .docx instead of .ppt and .pptx
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/help.md
- [577e6f6] by @thiswillbeyourgithub, 5 days ago:
bump litellm, allows using the latest openrouter price by litellm
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [c0795e9] by @thiswillbeyourgithub, 6 days ago:
enh: use logger.error instead of print during the postinstall process
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
Release 4.0.2
What's new
This release focuses on bug fixes, performance improvements, and code cleanup related to docstore filtering and retriever functionality.
🐛 Fixes
- Docstore filtering improvements
- Retriever fixes
⚡ Performance
- Do not store nor serialize the unfiltered docstore ([d29d3a5], [a9a0a35])
  - Renamed filter_docstore to filter_vectorstore for clarity
✨ Features
- Added timing measurements for docstore serialization and deletion ([375c1a1])
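Timing measurements of this kind are often implemented with a small context manager around the step being measured. The sketch below shows one way to do it; the `timed` helper and the idea of collecting results in a plain dict are assumptions for illustration and are not taken from wdoc/utils/filters.py.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed(label: str, sink: dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed steps are timed too.
        sink[label] = time.perf_counter() - start
```

Usage would look like `with timed("serialize", timings): ...`, after which `timings["serialize"]` holds the elapsed seconds.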
🧹 Chores
Commits details since the last release
- [83f23dd] by @thiswillbeyourgithub, 9 seconds ago:
bump version 4.0.1 -> 4.0.2
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [a9a0a35] by @thiswillbeyourgithub, 17 minutes ago:
rename filter_docstore to filter_vectorstore
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/filters.py
wdoc/wdoc.py
- [d29d3a5] by @thiswillbeyourgithub, 18 minutes ago:
perf: do not store nor serialize the unfiltered docsstore
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/filters.py
wdoc/wdoc.py
- [cf9171d] by @thiswillbeyourgithub, 24 minutes ago:
fix: parent retriever when loading from embeddings
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/retrievers.py
- [de18c65] by @thiswillbeyourgithub, 37 minutes ago:
remove unused import
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/retrievers.py
- [39951a8] by @thiswillbeyourgithub, 37 minutes ago:
fix: typehint of retrievers in an edge case
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/retrievers.py
- [375c1a1] by @thiswillbeyourgithub, 45 minutes ago:
feat: add timing measurements for docstore serialization and deletion
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) aider@aider.chat
wdoc/utils/filters.py
- [6525464] by @thiswillbeyourgithub, 48 minutes ago:
remove unused import
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/filters.py
- [9c1d967] by @thiswillbeyourgithub, 48 minutes ago:
fix: actually the unfiltered docstore is serialized
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/filters.py
wdoc/wdoc.py
- [ee9cc6f] by @thiswillbeyourgithub, 67 minutes ago:
fix: wrong type hint for create_filter_metadata
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/filters.py
- [d96c2f3] by @thiswillbeyourgithub, 69 minutes ago:
fix: wrong type hint for filter_docstore
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/filters.py
- [1a2442d] by @thiswillbeyourgithub, 70 minutes ago:
fix: forgot to pass arguments to filter_docstore
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/wdoc.py
Release 4.0.1
What's new
This release focuses on langfuse v3 compatibility and improved error handling.
🐛 Fixes
- Langfuse v3 compatibility
- Document loading robustness
📝 Documentation
- [56866d1] Add warning for using youtube audio backend instead of whisper or deepgram
🔧 Maintenance
- [fb49e60] Bump version 4.0.0 → 4.0.1
Commits details since the last release
- [fb49e60] by @thiswillbeyourgithub, 13 seconds ago:
bump version 4.0.0 -> 4.0.1
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [07257e0] by @thiswillbeyourgithub, 3 minutes ago:
fix: use langfuse opentelemetry for v3
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [56866d1] by @thiswillbeyourgithub, 11 minutes ago:
doc: add warning for using the youtube audio backend instead of whisper or deepgram
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders/youtube.py
- [89f5132] by @thiswillbeyourgithub, 14 minutes ago:
fix: langfuse callback import changed for langfuse v3
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [3039bcf] by @thiswillbeyourgithub, 20 minutes ago:
fix: do not crash if no documents after transform_documents is ran
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders/__init__.py
- [101c7f7] by @thiswillbeyourgithub, 29 minutes ago:
add assert that docs were found
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders/__init__.py