
Releases: thiswillbeyourgithub/wdoc

Release 5.1.3

23 Jun 17:42


What's new

PLACEHOLDER

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

  • [6cea6b0] by @thiswillbeyourgithub, 2 hours ago:
    feat: fall back to WHISPER_API_KEY for whisper transcription
    When neither WDOC_WHISPER_API_KEY nor OPENAI_API_KEY is set, resolve the
    whisper API key from a WHISPER_API_KEY environment variable if present.
    Resolution order: WDOC_WHISPER_API_KEY > OPENAI_API_KEY > WHISPER_API_KEY.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

wdoc-skill/REFERENCE.md
wdoc/docs/help.md
wdoc/utils/loaders/shared_audio.py
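The key resolution order in 6cea6b0 can be sketched as a small helper. Only the three environment variable names and their priority come from the commit message; the function name resolve_whisper_api_key is hypothetical, and treating an empty value as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Priority order from the commit message; the helper itself is a hypothetical sketch.
_WHISPER_KEY_VARS = ("WDOC_WHISPER_API_KEY", "OPENAI_API_KEY", "WHISPER_API_KEY")


def resolve_whisper_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty key in priority order, or None if none is set."""
    env = os.environ if env is None else env
    for name in _WHISPER_KEY_VARS:
        value = env.get(name)
        if value:  # assumption: an empty string counts as unset
            return value
    return None
```

Passing an explicit mapping, e.g. resolve_whisper_api_key({"WHISPER_API_KEY": "..."}), makes the order easy to check without touching the real environment.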

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

setup.py

setup.py
wdoc/utils/loaders/online_media.py

Release 5.1.1

23 Jun 14:18


What's new

PLACEHOLDER

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

setup.py

  • [ed81e2a] by @thiswillbeyourgithub, 7 days ago:
    docs: restructure SKILL.md into the wdoc-skill/ skill directory
    Split the monolithic root SKILL.md into a Claude Code skill directory
    following the write-a-skill conventions:
      • wdoc-skill/SKILL.md: concise orientation with frontmatter + trigger
        description, quick start, the four tasks, and core mechanics (73 lines)
      • wdoc-skill/REFERENCE.md: full CLI args, env vars, filetypes, loader
        options, and the complete Python API tables
      • wdoc-skill/EXAMPLES.md: copy-pasteable shell and Python recipes

Polished wording, added the skill frontmatter, and removed em-dashes from
the reference content. Updated every reference to the old root SKILL.md
(bump_default_models.sh, README.md, CLAUDE.md, ARCHITECTURE.md) and the
Claude Code install instructions to the new three-file layout.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

ARCHITECTURE.md
CLAUDE.md
README.md
bump_default_models.sh
wdoc-skill/EXAMPLES.md
wdoc-skill/REFERENCE.md
wdoc-skill/SKILL.md

  • [af68f46] by @thiswillbeyourgithub, 9 days ago:
    docs: note how self-contained adding a new filetype is
    Add a compact lead-in to CLAUDE.md's "Adding Support for a New Filetype"
    section describing the two routes (standalone loader vs recursive fan-out)
    and citing the recent zotero/karakeep types, plus a short note in SKILL.md
    after the recursive filetypes table pointing contributors at CLAUDE.md.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

CLAUDE.md
SKILL.md

  • [a5473f4] by @thiswillbeyourgithub, 9 days ago:
    fix(summarize): dump LLM output on empty content, fix recursion dedup
    Follow-up to the previous empty-summary fix, addressing review feedback.

      • Empty message content: instead of retrying (which just reproduces the
        same blank when the cause is structural), raise a ValueError that dumps
        every attribute of the LLM generations and llm_output. For some
        reasoning models litellm leaves the message content empty and puts the
        answer in additional_kwargs['reasoning_content'], so the dump reveals
        when we are simply reading the wrong field
        (generations[0].text == message.content).

      • Recursion dedup bug: the check "if summary_text not in
        recursive_summaries" tested membership against the dict's integer level
        keys, so it was always True and forced recursion to stop after the first
        pass. Now compares against recursive_summaries.values(), which also makes
        the "identical summary" warning accurate.

      • doc_reading_length: default to 0.0 (float) instead of 0 (int) when a
        doc has no doc_reading_time; otherwise wdocSummary construction crashes
        under beartype (WDOC_TYPECHECKING=crash).

Tests: the empty-content test now asserts the generation dump is included;
adds a recursion test proving summarization continues past the first pass
(fails as 2 != 3 with the old predicate).

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

tests/test_wdoc.py
wdoc/utils/tasks/summarize.py
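The recursion dedup bug comes from Python's dict membership semantics: "x in d" checks the keys, not the values. A minimal illustration with made-up contents standing in for recursive_summaries (level number mapped to summary text):

```python
# Hypothetical stand-in: recursion level -> summary text.
recursive_summaries = {0: "first pass summary", 1: "second pass summary"}
summary_text = "second pass summary"

# Buggy check: `in` on a dict tests the integer keys, so a string is never found.
buggy_is_new = summary_text not in recursive_summaries

# Fixed check, as in the commit: compare against the stored summaries themselves.
fixed_is_new = summary_text not in recursive_summaries.values()

print(buggy_is_new, fixed_is_new)  # True False
```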

  • [7eb0a61] by @thiswillbeyourgithub, 9 days ago:
    fix(summarize): retry then raise on empty LLM completion
    A reasoning model can return an empty completion (all of its budget spent
    on reasoning tokens) with finish_reason "stop". _summarize previously
    turned that into a silently empty summary, which only surfaced much later
    as a confusing assert 'monkey' in '' in
    test_summary_tim_urban_default_model.

Now each chunk generation retries once while bypassing the cache on an
empty completion, and raises an explicit ValueError if it is still empty.
Token usage from every attempt is accumulated so the cost accounting
stays correct.

Adds a basic regression test that drives _summarize with a fake LLM
returning empty content and asserts the explicit error is raised.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

tests/test_wdoc.py
wdoc/utils/tasks/summarize.py
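The retry-then-raise flow from 7eb0a61 can be sketched as follows, assuming a hypothetical call_llm(prompt, use_cache) callable and a minimal Completion record; this is not the code in summarize.py. Note that a5473f4, listed above, later dropped the retry in favor of raising immediately with a dump of the LLM output.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Completion:
    """Hypothetical minimal stand-in for one LLM result."""
    content: str
    total_tokens: int


def generate_non_empty(
    call_llm: Callable[[str, bool], Completion], prompt: str
) -> tuple[str, int]:
    """Try once with the cache, retry once bypassing it, then raise.

    Returns the text and the token usage summed over every attempt.
    """
    used_tokens = 0
    for use_cache in (True, False):
        result = call_llm(prompt, use_cache)
        used_tokens += result.total_tokens  # every attempt counts toward cost
        if result.content.strip():  # assumption: whitespace-only counts as empty
            return result.content, used_tokens
    raise ValueError(f"LLM returned an empty completion twice ({used_tokens} tokens used)")
```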

  • [10c3b40] by @thiswillbeyourgithub, 9 days ago:
    docs: drop em-dashes introduced in the SKILL.md sync edits
    Replace the em-dashes I added (CLAUDE.md SKILL.md bullet, SKILL.md
    install comments) with colons, per the project's no-em-dash convention.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

CLAUDE.md
SKILL.md

  • [85289a9] by @thiswillbeyourgithub, 9 days ago:
    docs(skill): document the citation_url_template argument
    It was a real but undocumented CLI/constructor argument (turns page
    citations into clickable links in summaries). Add it to the Summary
    Arguments table and the Python constructor signature.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

SKILL.md

  • [8db96a1] by @thiswillbeyourgithub, 9 days ago:
    docs(skill): document youtube subtitle changes and kebab-case flags
    Reflect changes since the reference was written: --youtube_language now
    auto-detects the original-language (-orig) subtitle track when unset,
    --yt_* flags are rewritten to --youtube_*, and CLI flags accept
    kebab-case (dashes normalized to underscores).

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

SKILL.md
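The two flag rewrites documented in 8db96a1 can be sketched as a pure function over argv. The function name is hypothetical; the real CLI centralizes argv mutations in its ArgvState helper and warns on each one, which this sketch does not do. Only flag names are normalized, so values after "=" or in separate tokens keep their dashes.

```python
def normalize_flags(argv: list[str]) -> list[str]:
    """Hypothetical sketch: kebab-case -> snake_case, then --yt_* -> --youtube_*."""
    out = []
    for arg in argv:
        if arg.startswith("--") and len(arg) > 2:
            name, sep, value = arg[2:].partition("=")
            name = name.replace("-", "_")  # --foo-bar -> --foo_bar
            if name.startswith("yt_"):
                name = "youtube_" + name[len("yt_"):]
            arg = "--" + name + sep + value  # the value after "=" is untouched
        out.append(arg)
    return out
```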

  • [65b619c] by @thiswillbeyourgithub, 9 days ago:
    docs(skill): document the zotero and karakeep filetypes
    Both were added since the reference was last written. Add them to the
    Recursive Filetypes table, document their loader-specific arguments and
    --path selector syntax in the DocDict section, and add shell examples.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

SKILL.md

  • [f1b2693] by @thiswillbeyourgithub, 9 days ago:
    docs(skill): bump reference to v5.1.0 and list modular install extras
    The reference was pinned to v5.0.0; the repo is now v5.1.0. Also expand
    the Installation section to document the modular extras introduced when
    install_requires was split (youtube/audio/anki/office/logseq/zotero/
    karakeep/...), since the base package only ships pdf + url loaders.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

SKILL.md

  • [36db392] by @thiswillbeyourgithub, 9 days ago:
    docs(claude): require keeping SKILL.md in sync with user-facing changes
    SKILL.md is a hand-maintained comprehensive reference that drifts unless
    updated by hand. Add it to the "Adding New Settings" and "Adding Support
    for a New Filetype" checklists and add a dedicated "Keeping SKILL.md in
    Sync" section so future changes update it.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

CLAUDE.md

  • [43e7e8b] by @thiswillbeyourgithub, 10 days ago:
    test(run_all_tests): validate MISTRAL_API_KEY before the suite runs
    test_mistral_embeddings hardcoded "mistral/mistral-embed", so the
    existing _check_provider_key "mistral/" MISTRAL_API_KEY guard in
    run_all_tests.sh never matched anything and the key was never checked
    up-front: a missing MISTRAL_API_KEY only surfaced late, mid-suite.

Make the embed model overridable via WDOC_TEST_MISTRAL_EMBED_MODEL
(default mistral/mistral-embed, full id incl. provider) and add it to
_ALL_TEST_MODELS so the guard now matches and crashes early.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

tests/run_all_tests.sh
tests/test_wdoc.py
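The early key check can be re-expressed in Python to show why the model id needs its provider prefix. The real guard is the _check_provider_key shell function in run_all_tests.sh; the table and function names below are hypothetical, while the prefix and variable pairs and the WDOC_TEST_MISTRAL_EMBED_MODEL default come from these release notes.

```python
import os
from typing import Mapping, Optional

# Model-id prefix -> API key it needs (hypothetical table, pairs from the notes).
PROVIDER_KEYS = {
    "openrouter/": "OPENROUTER_API_KEY",
    "openai/": "OPENAI_API_KEY",
    "mistral/": "MISTRAL_API_KEY",
}


def missing_provider_keys(
    models: list[str], env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Return the env vars the given model ids need but that are unset."""
    env = os.environ if env is None else env
    missing: list[str] = []
    for model in models:
        for prefix, var in PROVIDER_KEYS.items():
            if model.startswith(prefix) and not env.get(var) and var not in missing:
                missing.append(var)
    return missing


# Full id incl. provider, so the "mistral/" prefix can actually match.
embed_model = os.environ.get("WDOC_TEST_MISTRAL_EMBED_MODEL", "mistral/mistral-embed")
```

A runner would call missing_provider_keys on all test models before starting pytest and exit with a message if the list is non-empty.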

  • [1d4ea7a] by @thiswillbeyourgithub, 10 days ago:
    test(karakeep): pad the roundtrip bookmark text past the reading-length guard
    A bare marker string is too short and trips wdoc's 'total reading length
    is suspiciously low' assertion (> 0.1 min). Repeat a deterministic
    sentence around the unique marker so the loaded doc clears the guard
    while the assertions still match on the marker.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

tests/test_karakeep.py

  • [14cb323] by @thiswillbeyourgithub, 10 days ago:
    Merge branch 'karakeep_source' into dev

  • [3ce4a93] by @thiswillbeyourgithub, 11 days ago:
    test(karakeep): make the api test self-contained via a bookmark lifecycle
    Instead of depending on a pre-existing library and a KARAKEEP_TEST_SELECTOR env
    var, the api test now creates its own temporary text bookmark (deterministic,
    unlike a link whose html is crawled asynchronously), loads it through wdoc's
    loader, asserts the round-trip, then deletes it (mirrors karakeep_python_api's
    own create/delete lifecycle tests). It also asserts the live bookmark carries
    the structural keys the FakeKarakeep fixtures encode, so the fast basic tests
    cannot silently drift from the real schema. Drop the now-unneeded
    KARAKEEP_TEST_SELECTOR guard from run_all_tests.sh.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

tests/run_all_tests.sh
tests/test_karakeep.py

  • [b3add40] by @thiswillbeyourgithub, 11 days ago:
    docs(karakeep): document the karakeep filetype
    Add the karakeep filetype, its --path selector syntax and karakeep_* args to
    help.md, usage examples to examples.md, and short mentions to README.md and
    ARCHITECTURE.md. Built with the help of Claude Code.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

ARCHITECTURE.md
README.md
wdoc/docs/examples.md
wdoc/docs/help.md

tests/run_all_tests.sh

  • [5de76cb] by @thiswillbeyourgithub, 11 days ago:
    test(karakeep): cover selector parsing, fan-out, resilience and caching
    A fake Karakeep client drives the basic tests (no network): selector parsing,
    link/text/asset fan-out, the native-source skip path, the loading_failure=warn
    resilience guarantee, and the doc_loaders_cache hit. Adds an api-gated real
    instance test driven by KARAKEEP_TEST_SELECTOR.

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

tests/test_karakeep.py


Release 5.1.0

15 May 14:05


What's new

This release focuses on modular installation extras, CLI robustness improvements, and a sweep of bug fixes across summarization, logging, and setup.

✨ Features

  • CLI: Accept kebab-case flags (--foo-bar becomes --foo_bar) automatically ([e9bfb80])
  • CLI: Warn on every sys.argv mutation via ArgvState ([4fe38f2]); accept --yt_* as shorthand for --youtube_doc_* ([f800805])
  • YouTube: Auto-detect original-language subtitle track (-orig), falling back to en/en-US ([0753b00])
  • Prompts: Skip per-bullet citations when only one source; mention it once at the top instead ([c0097b7])
    • Exception: for YouTube/timecoded sources, use per-bullet timecodes (e.g. [02:17:33]) ([c0779a1])
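The subtitle choice from 0753b00 can be sketched as a pure function over the available track codes. The function name is hypothetical, and so is treating an explicit language as final; the "-orig" preference and the en/en-US fallback are the ones stated above.

```python
from typing import Optional


def pick_subtitle_language(
    available: list[str], requested: Optional[str] = None
) -> Optional[str]:
    """Hypothetical sketch of choosing a subtitle track when none is requested."""
    if requested:
        return requested  # assumption: an explicit choice is used as-is
    # Prefer the original-language track, marked with an "-orig" suffix.
    for lang in available:
        if lang.endswith("-orig"):
            return lang
    for fallback in ("en", "en-US"):
        if fallback in available:
            return fallback
    return None
```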

🐛 Fixes

  • Summarize: Strip *DEEP BREATH*-style LLM intro artifacts from all top-level bullets, not just the first ([dd09942], [c837143])
  • Summarize: Fix model name in output summary ([e253d45])
  • Logger: Actually remove the default DEBUG stderr handler instead of stacking a second sink on top of it ([2f3b295])
  • Env: Match --debug/--verbose by exact argv token, not substring, preventing false positives from argument values ([e183729])
  • Loaders: Better check for empty documents ([5ebbbd2], [586bc5c])
  • YouTube: Add troubleshooting instructions on failed extraction ([397d133]); fix default language handling ([29207e8])
  • Audio: Fix WDOC_WHISPER_API_KEY handling when OPENAI_API_KEY is unset ([23b6b1f])
  • Setup: Guard openparse-download behind an import openparse probe ([f4445b4]); scope yt-dlp pre-release upgrade to [youtube] users only ([9242566])

♻️ Refactors

  • Setup: Split install_requires into modular extras [youtube], [audio], [anki], [office], [logseq], [full] ([7bb4744])
    • Move audioop-lts into [audio] extra with python_version>='3.13' marker ([eb9eba4])
    • Move py_ankiconnect into [anki] extra with requests fallback ([fe2d9c0])
    • Drop python-magic git install from post-install hook ([bafb379])
  • CLI: Centralize all sys.argv mutations in ArgvState helper class ([1098157], [a22f56d])
  • Logger: Move handler setup out of import side-effects into setup_cli_logging(), called only from __main__.py ([203ab6f])

🧪 Tests

  • Cover ArgvState helpers with unit tests ([6bc897a], [a22f56d])
  • Move API-key precheck from test_wdoc.py to run_all_tests.sh for faster fail ([dbf4410])
  • Skip test_parse_docx on HTTP 429 instead of failing ([a02d684])
  • Improve venv management in run_all_tests.sh ([b6a0dd8])

📚 Docs

  • Clarify uvx wdoc[full] usage throughout README and examples.md ([0f72eaf], [7653e9a])
  • Add/fix [anki] extra in Anki parse example ([0f72eaf])
  • Improve installation instructions recommending uvx ([d88c461])
  • Clarify how to use a cloned repository ([b8b4b5e])

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

  • [dd09942] by @thiswillbeyourgithub, 25 minutes ago:
    fix(summarize): clean LLM intro artifacts on all top-level bullets
    Extract the 'deep breath' / "i'll summarize" cleanup into
    _strip_llm_intro_artifacts and run it on every top-level line, not just
    the first one. Previously a source reference on line 1 would leave a
    later deep-breath bullet untouched.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

wdoc/utils/tasks/summarize.py

README.md

tests/run_all_tests.sh

  • [a02d684] by @thiswillbeyourgithub, 2 hours ago:
    test(parsing): skip test_parse_docx on HTTP 429 instead of failing
    The test downloads a sample DOCX from freetestdata.com, which sometimes
    returns 429 (rate limited). That is not a wdoc bug, so skip rather than
    fail in that case.

tests/test_parsing.py

  • [c0779a1] by @thiswillbeyourgithub, 2 hours ago:
    feat(prompts): use timecodes as per-bullet source for YouTube single-source
    Extends the single-source citation exception: when the unique source is a
    YouTube video (or other timecoded media), don't drop citations entirely.
    Mention the video source once at the top, then use each bullet's timecode
    (e.g. [02:17:33]) as its precise per-bullet pointer. Applied to both the
    summary prompt (Sam) and the combine prompt (Carl).

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

wdoc/utils/prompts.py

  • [c0097b7] by @thiswillbeyourgithub, 2 hours ago:
    feat(prompts): skip per-bullet citations when only one source
    Avoids wasting tokens by repeating the same page/WDOC_ID citation on every
    bullet point when all information shares a single unique source. In that
    case the citation is mentioned once at the top instead. Applies to both
    the summary prompt (Sam) and the combine prompt (Carl).

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

wdoc/utils/prompts.py

  • [c837143] by @thiswillbeyourgithub, 2 hours ago:
    fix(summarize): strip "DEEP BREATH -" style prefixes from first line
    Permissive on asterisks, "breath"/"breaths", and the separator character
    so variants like "- DEEP BREATH - ", "DEEP BREATHS: ", "DEEP BREATH, "
    are handled while preserving the bullet marker.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

wdoc/utils/tasks/summarize.py
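The cleanup described in c837143 and dd09942 can be sketched with one permissive regular expression applied to every top-level line. The pattern and the name strip_deep_breath_prefixes are illustrative assumptions, not the actual _strip_llm_intro_artifacts implementation; "top-level" is taken here to mean a line with no leading indentation.

```python
import re

# Optional bullet marker, optional asterisks, "deep breath(s)" in any case,
# optional asterisks, then a "-", ":" or "," separator (hypothetical pattern).
_DEEP_BREATH = re.compile(
    r"^(?P<bullet>[-*]\s+)?\**\s*deep\s+breaths?\s*\**\s*[-:,]\s*",
    re.IGNORECASE,
)


def strip_deep_breath_prefixes(summary: str) -> str:
    """Remove 'DEEP BREATH -' style prefixes from every top-level line."""
    cleaned = []
    for line in summary.splitlines():
        if line and not line[0].isspace():  # top-level: no indentation
            # Keep the bullet marker (if any) and drop only the artifact.
            line = _DEEP_BREATH.sub(lambda m: m.group("bullet") or "", line, count=1)
        cleaned.append(line)
    return "\n".join(cleaned)
```

Because the pattern requires a separator right after "breath(s)", an ordinary bullet such as "- Deep breathing helps" is left alone.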

wdoc/utils/tasks/summarize.py

  • [c67f564] by @thiswillbeyourgithub, 2 hours ago:
    docs(setup): note nltk punkt_tab download is likely redundant
    unstructured already lazily downloads punkt_tab on first tokenize call,
    so the eager post-install download is probably duplicate work. Keep it
    as a safety net (and to front-load the network hit at install time
    instead of on the first office-document parse), but document it.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

setup.py

  • [f4445b4] by @thiswillbeyourgithub, 2 hours ago:
    fix(setup): only run openparse-download when openparse is installed
    Guard the post-install weight download with an import openparse probe
    so a stripped-down install (no openparse[ml] in install_requires) does
    not call a missing console-script and emit a confusing error.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

setup.py

  • [9242566] by @thiswillbeyourgithub, 2 hours ago:
    fix(setup): scope yt-dlp pre-release upgrade to [youtube] users
    yt-dlp lives in the optional [youtube] extra, but the post-install hook
    was force-installing it for everyone (with --user, which is wrong
    inside a venv and quietly drops the install outside the env). Probe for
    yt_dlp first and only run the pip install -U --pre yt-dlp if it's
    already there. This keeps yt-dlp truly optional while still letting
    [youtube] users track YouTube extractor fixes that land in pre-releases.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

setup.py
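Both setup fixes (f4445b4 and 9242566) follow the same pattern: probe for an optional package without importing it, and only schedule the follow-up step when it is present. A minimal sketch, with hypothetical function names; the availability check is injectable so the logic can be tested without installing anything.

```python
import importlib.util
import sys
from typing import Callable


def module_available(name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def post_install_commands(
    available: Callable[[str], bool] = module_available,
) -> list[list[str]]:
    """Hypothetical sketch: only schedule steps whose package is installed."""
    commands = []
    if available("openparse"):
        commands.append(["openparse-download"])  # console script from f4445b4
    if available("yt_dlp"):
        # No --user: install into the interpreter running the hook (e.g. a venv).
        commands.append([sys.executable, "-m", "pip", "install", "-U", "--pre", "yt-dlp"])
    return commands
```

Each returned command would then be run with subprocess.run(cmd, check=True).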

  • [eb9eba4] by @thiswillbeyourgithub, 2 hours ago:
    refactor(setup): declare audioop-lts via the [audio] extra
    Move the audioop-lts 3.13+ install out of the imperative post-install
    hook and into the [audio] extra with a python_version>='3.13'
    environment marker. audioop-lts is only needed because pydub needs it,
    and pydub already lives in [audio], so the conditional belongs there.
    This also makes the dependency visible to installers other than
    python setup.py install (pip install wdoc[audio], uv, pipx, etc.),
    which never ran the post-install hook in the first place.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

setup.py
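A declarative extra with a PEP 508 environment marker might look like the excerpt below. Only the package names and the python_version>='3.13' marker come from these release notes; the dict name, the other extras' exact contents, and building [full] as a plain union are assumptions.

```python
# Hypothetical excerpt of setup.py; other extras ([office], [logseq], ...) omitted.
EXTRAS_REQUIRE = {
    "youtube": ["yt-dlp"],
    "audio": [
        "pydub",
        # Environment marker: pip/uv only install this on Python 3.13+.
        "audioop-lts; python_version>='3.13'",
    ],
    "anki": ["py_ankiconnect"],
}
# Assumed: [full] is the union of every other extra.
EXTRAS_REQUIRE["full"] = sorted({dep for deps in EXTRAS_REQUIRE.values() for dep in deps})
```

The dict would be passed as setup(..., extras_require=EXTRAS_REQUIRE), which is what lets pip install wdoc[audio] resolve the marker without any post-install hook.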

  • [bafb379] by @thiswillbeyourgithub, 2 hours ago:
    chore(setup): drop python-magic git install from post-install
    The git install existed to get the FIFO/pipe fix from upstream PR for
    issue #261, used via magic.from_buffer on stdin bytes. That code path
    is commented out in batch_file_loader.py, and the two remaining call
    sites (magic.from_file in batch_file_loader.py and pdf.py) work fine
    with the released 0.4.27 wheel on PyPI. Both call sites are already
    wrapped in try/except, so python-magic stays optional at runtime.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

setup.py

  • [203ab6f] by @thiswillbeyourgithub, 2 hours ago:
    refactor(logger): move handler setup out of import side effects
    When wdoc was imported as a library (e.g. as an open-webui tool),
    wdoc/utils/logger.py mutated the global loguru logger at import time:
    removing the default stderr sink and adding its own stdout/stderr/file
    sinks. That clobbered the host application's loguru configuration.

Wrap the handler installation in a setup_cli_logging() function that
is called explicitly from wdoc/__main__.py. Library users get whatever
loguru handlers the host already configured (since loguru is a
singleton, wdoc's records will flow through them automatically); CLI
users get the customized colorized stdout/stderr plus the rotated
file log.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

wdoc/__init__.py
wdoc/__main__.py
wdoc/utils/logger.py
wdoc/wdoc.py
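The split between import time and explicit setup can be illustrated with the standard library's logging module. wdoc itself uses loguru, whose logger is a global singleton; this stdlib sketch only shows the same principle, and the logger name is hypothetical.

```python
import logging
import sys

# A library module only asks for a named logger; importing it changes no global state.
logger = logging.getLogger("example_lib")  # hypothetical package name


def setup_cli_logging(level: int = logging.INFO) -> None:
    """Attach console output; meant to be called only from the CLI entry point."""
    if logger.handlers:
        return  # already configured: do not stack a second handler
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
```

A host application that imports the library and never calls setup_cli_logging() keeps full control of its own handlers.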

README.md

  • [dbf4410] by @thiswillbeyourgithub, 2 hours ago:
    test(env): move API-key precheck from test_wdoc.py to run_all_tests.sh
    Fails fast at the shell level before spinning up the venv and pytest,
    rather than only when test_wdoc.py is imported.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

tests/run_all_tests.sh
tests/test_wd...


Release 5.0.1

13 May 17:37


What's new

feat

  • Summary citations (dcdecc86, 6cbf025c, e5d7840b, 9d7802aa)
    • Per-chunk metadata (page, source) injected as XML into summarization input
    • LLM prompted to add [p.N] citations; Python fallback adds them to uncited top-level bullets
    • Multi-file summaries use [p.N, filename.pdf] format with shortest disambiguating path
    • New citation_url_template parameter turns page citations into clickable markdown links (e.g. {source}#page={page})
  • Query output anchor links (8f72df69): WDOC_ID citations now render as [N](#document-N) with HTML anchors for in-page navigation
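The citation_url_template behavior can be sketched for single-file summaries as a regex substitution. Only the [p.N] citation form and the "{source}#page={page}" example come from the notes; the function name and source argument are hypothetical. Replacing the whole bracketed citation avoids the doubled brackets fixed in 9d7802a.

```python
import re

# Single-file page citations such as "[p.42]".
_PAGE_CITATION = re.compile(r"\[p\.(\d+)\]")


def link_page_citations(
    text: str, source: str, template: str = "{source}#page={page}"
) -> str:
    """Hypothetical sketch: turn [p.N] into a markdown link built from template."""

    def to_link(m: re.Match) -> str:
        url = template.format(source=source, page=m.group(1))
        return f"[p.{m.group(1)}]({url})"

    return _PAGE_CITATION.sub(to_link, text)
```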

change

  • Default models switched to OpenRouter/DeepSeek (14ac41e4, be3dced3)
    • WDOC_DEFAULT_MODEL and WDOC_DEFAULT_QUERY_EVAL_MODEL now default to openrouter/deepseek/deepseek-v4-pro and openrouter/deepseek/deepseek-v4-flash
    • Routes through OpenRouter instead of calling the DeepSeek provider directly

fix

  • b555a904: Coerce int to float for CLI kwargs type checking (fixes Gradio UI sending integer values for float parameters)
  • 4015a3a9: Better removal of "deep breath" mentions in summarization prompts

test

  • 61c8238c, 0ea3d13d, b34e2e3f: Crash early at import time when required API keys (OPENROUTER_API_KEY, OPENAI_API_KEY, MISTRAL_API_KEY, WDOC_WHISPER_API_KEY) are missing
  • 1d07163c: Warn and skip instead of crash when whisper test hits a 502 error
  • 8380d8ad: Skip ollama embedding test when the ollama port is unreachable
  • e672e092: Better test cleanup in run_all_tests.sh

add

  • 72819fab: bump_default_models.sh helper script — dry-run by default, --apply to write; syncs model names across docs, README, SKILL, ARCHITECTURE, and docker/env.example

doc

  • 9d7802aa, ec09cf1e, 71560800, c400738f, 96f9ac23: CLAUDE.md and ARCHITECTURE.md updated with new settings documentation requirements, sphinx-apidoc command, bump_default_models.sh usage, and citation feature docs
  • 057bb75f: Added DeepWiki badge to README
  • 3f59670e: SVG updated to remove outdated default model reference

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

  • [1d07163] by @thiswillbeyourgithub, 31 minutes ago:
    test: warn instead of crash when whisper test hits 502 error
    A 502 from the whisper endpoint means the upstream is unavailable, not
    that the code under test failed. Skip the test in that case so the run
    reports not-tested rather than a false negative.

tests/test_wdoc.py

tests/test_wdoc.py

tests/test_wdoc.py

  • [61c8238] by @thiswillbeyourgithub, 7 hours ago:
    test: crash early when required API key for a test model is missing
    Check that OPENROUTER_API_KEY / OPENAI_API_KEY are defined when any of the
    test models (or their default-model fallbacks) starts with 'openrouter/' or
    'openai/'. Fails fast at import time with a clear message instead of
    producing opaque auth errors deep in a test run.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

tests/test_wdoc.py

  • [8380d8a] by @thiswillbeyourgithub, 7 hours ago:
    test: skip ollama embedding test when ollama port is unreachable
    Probe OLLAMA_HOST (default 127.0.0.1:11434) before test_ollama_embeddings
    and skip with a clear message instead of failing when ollama is not running.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

tests/test_wdoc.py
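The reachability probe from 8380d8a can be sketched with the socket module. The 127.0.0.1:11434 default comes from the commit; accepting both "host:port" and "http://host:port" values of OLLAMA_HOST is an assumption, as are the function names.

```python
import os
import socket
from typing import Mapping, Optional
from urllib.parse import urlsplit


def ollama_address(env: Optional[Mapping[str, str]] = None) -> tuple[str, int]:
    """Parse OLLAMA_HOST, defaulting to 127.0.0.1:11434."""
    env = os.environ if env is None else env
    raw = env.get("OLLAMA_HOST") or "127.0.0.1:11434"
    # Assumption: accept both "host:port" and "scheme://host:port" forms.
    parsed = urlsplit(raw if "://" in raw else f"//{raw}")
    return parsed.hostname or "127.0.0.1", parsed.port or 11434


def port_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

In a pytest test this would gate the body with pytest.skip(...) when port_reachable(*ollama_address()) is False.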

  • [ec09cf1] by @thiswillbeyourgithub, 10 hours ago:
    doc: mention bump_default_models.sh in CLAUDE.md and ARCHITECTURE.md
    CLAUDE.md gets a new section explaining when and how to run the helper.
    ARCHITECTURE.md gets a short pointer next to the default-models table.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

ARCHITECTURE.md
CLAUDE.md

  • [be3dced] by @thiswillbeyourgithub, 10 hours ago:
    change: prefix default models with 'openrouter/'
    deepseek/deepseek-v4-pro -> openrouter/deepseek/deepseek-v4-pro
    deepseek/deepseek-v4-flash -> openrouter/deepseek/deepseek-v4-flash

Routes the defaults through OpenRouter rather than calling the deepseek
provider directly. Applied via bump_default_models.sh --apply.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

README.md
SKILL.md
docker/env.example
wdoc/docs/help.md
wdoc/utils/env.py

  • [72819fa] by @thiswillbeyourgithub, 10 hours ago:
    add: bump_default_models.sh helper
    Bumpver-style script for changing WDOC_DEFAULT_MODEL and
    WDOC_DEFAULT_QUERY_EVAL_MODEL: reads the current values from
    wdoc/utils/env.py, replaces both the full id and its basename across
    docs/README/SKILL/ARCHITECTURE, and re-syncs key=value lines in
    docker/env.example. Dry-run by default; --apply to write; never commits.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

bump_default_models.sh
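The substitution at the core of bump_default_models.sh can be sketched in Python. The real helper is a shell script; these function names are hypothetical. Matching the full id and its basename in a single pass avoids rewriting text twice when the new basename contains the old one, and returning new text rather than writing files mirrors the script's dry-run default.

```python
import re


def bump_model_mentions(text: str, old_id: str, new_id: str) -> str:
    """Replace the full model id and its bare basename in one pass."""
    old_base = old_id.rsplit("/", 1)[-1]
    new_base = new_id.rsplit("/", 1)[-1]
    # Full id first in the alternation so it wins over its own basename.
    pattern = re.compile(f"{re.escape(old_id)}|{re.escape(old_base)}")
    return pattern.sub(lambda m: new_id if m.group(0) == old_id else new_base, text)


def resync_env_line(text: str, key: str, value: str) -> str:
    """Rewrite KEY=... lines (as in docker/env.example) to KEY=value."""
    return re.sub(
        rf"^{re.escape(key)}=.*$", lambda m: f"{key}={value}", text, flags=re.MULTILINE
    )
```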

wdoc/docs/svg/summary.svg

  • [14ac41e] by @thiswillbeyourgithub, 10 hours ago:
    change: switch default models to deepseek-v4-pro / deepseek-v4-flash
    Update WDOC_DEFAULT_MODEL and WDOC_DEFAULT_QUERY_EVAL_MODEL defaults, and
    align README, SKILL, ARCHITECTURE, and help docs to match.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

ARCHITECTURE.md
README.md
SKILL.md
docker/env.example
wdoc/docs/help.md
wdoc/utils/env.py

tests/run_all_tests.sh

README.md

CLAUDE.md

  • [7156080] by @thiswillbeyourgithub, 4 weeks ago:
    doc: add new settings documentation requirements to CLAUDE.md
    Detail that new settings must be documented in help.md and examples.md,
    explain env var re-read behavior, list misc.py variables to keep updated,
    and add a guide for adding new filetype support.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

CLAUDE.md

  • [b555a90] by @thiswillbeyourgithub, 4 weeks ago:
    fix: coerce int to float for cli_kwargs type checking
    The Gradio UI can send integer values (e.g. 0, 1) for float parameters
    like doccheck_min_lang_prob, causing a type check failure. Now int
    values are automatically coerced to float when float is the expected type.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

wdoc/wdoc.py
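The coercion in b555a90 amounts to widening an int to float when float is the expected type, for example when the Gradio UI sends 0 for doccheck_min_lang_prob. A minimal sketch with a hypothetical function name; excluding bool (a subclass of int) is this sketch's own choice.

```python
from typing import Any


def coerce_for_type(value: Any, expected: type) -> Any:
    """Hypothetical sketch: accept ints where a float parameter is expected."""
    # bool is a subclass of int, so exclude it explicitly.
    if expected is float and isinstance(value, int) and not isinstance(value, bool):
        return float(value)
    return value
```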

CLAUDE.md

wdoc/utils/tasks/summarize.py

  • [9d7802a] by @thiswillbeyourgithub, 4 weeks ago:
    doc: document citation features in help, examples, README FAQ, and add tests
      • Add citation_url_template docs to help.md
      • Add PDF citation example to examples.md
      • Add FAQ entry about source citations in README.md
      • Add unit tests for source_replace anchor links and citation URL template
      • Fix double-bracket bug in citation URL link generation

Developed with Claude Code.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

README.md
tests/test_wdoc.py
wdoc/docs/examples.md
wdoc/docs/help.md
wdoc/utils/tasks/summarize.py

  • [e5d7840] by @thiswillbeyourgithub, 4 weeks ago:
    feat: add citation_url_template parameter for clickable citation links
    When set (e.g. "{source}#page={page}"), page citations like [p.42]
    become clickable markdown links in summary output.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

wdoc/utils/tasks/summarize.py
wdoc/wdoc.py

  • [dcdecc8] by @thiswillbeyourgithub, 4 weeks ago:
    feat: add page citations to summaries with hybrid LLM + Python fallback
      • Prompt instructs LLM to add [p.N] citations from chunk_metadata
      • Python post-processing adds fallback citations to uncited top-level bullets
      • Multi-file summaries use [p.N, filename.pdf] format
      • Ambiguous filenames resolved with shortest distinguishing path

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

wdoc/utils/prompts.py
wdoc/utils/tasks/summarize.py
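"Shortest distinguishing path" can be read as: for each file, take the fewest trailing path components that no other file shares. A sketch under that reading; the function name is hypothetical and this is not the code in summarize.py.

```python
from pathlib import PurePosixPath


def short_labels(paths: list[str]) -> dict[str, str]:
    """Map each path to its shortest trailing suffix not shared by any other path."""
    parts = {p: PurePosixPath(p).parts for p in paths}
    labels = {}
    for path, comps in parts.items():
        for k in range(1, len(comps) + 1):
            suffix = comps[-k:]
            clash = any(
                other != path and other_comps[-k:] == suffix
                for other, other_comps in parts.items()
            )
            if not clash:
                labels[path] = str(PurePosixPath(*suffix))
                break
        else:
            labels[path] = path  # indistinguishable paths: keep the full string
    return labels
```

A multi-file citation would then read [p.3, x/report.pdf] rather than an ambiguous [p.3, report.pdf].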

  • [6cbf025] by @thiswillbeyourgithub, 4 weeks ago:
    feat: inject per-chunk metadata (page, source) as XML into summarization input
    Each chunk now includes its page number and source path as XML metadata
    before the text content, giving the LLM context for citations.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

wdoc/utils/tasks/summarize.py
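Prefixing each chunk with XML metadata might look like the sketch below. The chunk_metadata name appears in dcdecc8, but the exact element layout wdoc uses is not given here, so the structure and function name are assumptions. Escaping the source path keeps characters like "&" from breaking the XML.

```python
from typing import Optional
from xml.sax.saxutils import escape


def chunk_with_metadata(text: str, page: Optional[int], source: str) -> str:
    """Hypothetical sketch: put page and source as XML before the chunk text."""
    lines = ["<chunk_metadata>"]
    if page is not None:  # not every document type has pages
        lines.append(f"  <page>{page}</page>")
    lines.append(f"  <source>{escape(source)}</source>")
    lines.append("</chunk_metadata>")
    return "\n".join(lines) + "\n" + text
```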

  • [8f72df6] by @thiswillbeyourgithub, 4 weeks ago:
    feat: replace WDOC_ID plain numbers with markdown anchor links in query output
    Citations now render as [N](#document-N) links instead of plain numbers,
    and document sections include HTML anchors for in-page navigation.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

wdoc/utils/tasks/query.py
wdoc/wdoc.py
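In-page navigation works by pairing each [N](#document-N) link with an HTML element whose id matches. A minimal sketch; the helper names and the choice of an empty <a id=...> element are assumptions.

```python
def citation_link(n: int) -> str:
    """Markdown link pointing at the anchor of source document n."""
    return f"[{n}](#document-{n})"


def document_anchor(n: int) -> str:
    """Empty HTML anchor placed at the top of source document n's section."""
    return f'<a id="document-{n}"></a>'


answer = f"The capital of France is Paris {citation_link(1)}."
sources = f"{document_anchor(1)}\n\nDocument 1: ..."
```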

Release 5.0.0

04 Jan 15:08


What's new

Major Release: Docker Web UI, Python 3.13 Support, and Architecture Improvements

This major release introduces an experimental Docker-based web interface, upgrades the Python version requirement, migrates to modern LangChain modules, and includes breaking changes, among them a license change.

✨ Features

🔧 Refactoring & Breaking Changes

  • Python Version Upgrade [126026f, 2d1f8ab, b476dba]

    • Require Python 3.13+ (breaking change)
    • Update to Python 3.13.5
    • Add audioop-lts post-install script for Python 3.13+
  • LangChain Migration [830edd4, 4eb303c, 4763b55, 908d536, e190d60, 5846ae0]

    • Migrate to langchain_core and langchain_text_splitters modules
    • Update imports from outdated langchain modules
    • Require langchain >= 1.2.0
    • Update CacheBackedEmbeddings import paths
  • License Change [f30fcda, b9e8eb2]

    • Switch from GPLv3 to AGPLv3 (breaking change)
  • Async Operations [c412233]

    • Use asyncio tqdm instead of regular tqdm for better async support

📚 Documentation

🐛 Fixes

📦 Dependencies

  • [32ec5bd] Bump langchain-litellm dependency
  • [23bef7d] Add docker documentation to MANIFEST.in

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

images/diagram_query.png
images/diagram_search.png
images/diagram_summary.png

images/diagram_query.mmd
images/diagram_search.mmd
images/diagram_summary.mmd

README.md

images/all.mmd
images/all.png

docs/source/index.rst

README.md

README.md

README.md

README.md

README.md
docker/README.md

README.md

wdoc/utils/batch_file_loader.py
wdoc/utils/embeddings.py
wdoc/utils/filters.py
wdoc/utils/loaders/anki.py
wdoc/utils/loaders/pdf.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/summarize.py
wdoc/wdoc.py

wdoc/wdoc.py

MANIFEST.in

docker/README.md
docker/gui.py

docker/README.md
images/gradio_interface.png

setup.py

wdoc/utils/embeddings.py
wdoc/utils/misc.py
wdoc/wdoc.py

wdoc/utils/misc.py

wdoc/utils/embeddings.py

wdoc/utils/embeddings.py

setup.py
wdoc/utils/retrievers.py

setup.py

wdoc/utils/embeddings.py

wdoc/utils/customs/compressed_embeddings_cacher.py
wdoc/utils/embeddings.py


Release 4.1.2

28 Oct 09:12


What's new


This patch release fixes an optional dependency installation issue and improves hashing performance.

🐛 Fixes

  • Fixed chonkie optional install dependency name (semantic not semantics) [18a2f9d5]

⚡ Performance

  • Switched from SHA256 to BLAKE3 for faster hashing [835e43dc]
    • Updated in setup.py, tests/test_parsing.py, and wdoc/utils/misc.py

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

setup.py

setup.py
tests/test_parsing.py
wdoc/utils/misc.py

Release 4.1.1

27 Oct 18:11


What's new

This release focuses on integrating Chonkie for semantic chunking, improving test reliability, and code quality enhancements through comprehensive linting.

Features

  • Chonkie Semantic Chunking Integration
    • Implemented ChonkieSemanticSplitter using semantic chunking with memoization ([081e81a])
    • Added transform_documents method to ChonkieSemanticSplitter ([534cc90])
    • Replaced RecursiveCharacterTextSplitter with ChonkieSemanticSplitter in summarize.py ([77f1652])
    • Added chonkie to requirements ([7234f86])
    • Merged chonkie branch into dev ([f89390a])

Fixes

  • Logging & Display

    • Fixed colors not appearing in loguru ([99502e7])
    • Fixed wrong logic for stdout color ([83e7fb9])
  • Parsing & Type Hints

    • Allow LLM to mention "thinking" inside its thinking ([d2bca84])
    • Fixed error message when parsing thinking ([a50ec42])
    • Fixed typehint error for topk autoincrease ([615828a])

Refactor

  • Split batch file loader into two files ([a0420fd])
  • Comprehensive ruff linter run across codebase ([d9f7eac])
  • Switched from black to ruff ([2d8a51b])
  • Made ruff configuration less strict ([e04fc8d])

Tests

  • DDG Test Improvements
    • Finally fixed DDG error not capturing output ([e1b2a87])
    • Capture DDG output properly ([d9c5ae9])
    • Set max DDG results to 10 to reduce failures ([ccdffd1])
    • Print output before error message ([66bf47c])
    • Better way to print output ([96c5186])
    • Don't use alias of grep ([a02ffbc])

Chore

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

tests/test_cli.sh

tests/test_cli.sh

tests/test_cli.sh

README.md

tests/test_cli.sh

tests/test_cli.sh

tests/test_cli.sh
tests/test_wdoc.py

wdoc/utils/batch_file_loader.py
wdoc/utils/load_recursive.py

setup.py

wdoc/utils/misc.py

  • [77f1652] by @thiswillbeyourgithub, 29 hours ago:
    refactor: replace RecursiveCharacterTextSplitter with ChonkieSemanticSplitter in summarize.py
    Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) aider@aider.chat

wdoc/utils/tasks/summarize.py

wdoc/utils/misc.py

wdoc/utils/tasks/query.py

wdoc/wdoc.py

wdoc/utils/logger.py

wdoc/utils/logger.py

wdoc/utils/logger.py

wdoc/utils/misc.py

wdoc/utils/misc.py

README.md

README.md

README.md

README.md

README.md

scripts/AnkiFiltered/AnkiFilteredDeckCreator.py
scripts/NtfySummarizer/NtfySummarizer.py
scripts/TheFiche/TheFiche.py
tests/test_parsing.py
tests/test_vectorstores.py
tests/test_wdoc.py
wdoc/main.py
wdoc/utils/batch_file_loader.py
wdoc/utils/customs/binary_faiss_vectorstore.py
wdoc/utils/customs/litellm_embeddings.py
wdoc/utils/embeddings.py
wdoc/utils/env.py
wdoc/utils/filters.py
wdoc/utils/interact.py
wdoc/utils/llm.py
wdoc/utils/loaders/__init__.py
wdoc/utils/loaders/anki.py
wdoc/utils/loaders/local_audio.py
wdoc/utils/loaders/local_html.py
wdoc/utils/loaders/local_video.py
wdoc/utils/loaders/logseq_markdown.py
wdoc/utils/loaders/online_media.py
wdoc/utils/loaders/pdf.py
wdoc/utils/loaders/shared_audio.py
wdoc/utils/loaders/youtube.py
wdoc/utils/misc.py
wdoc/utils/prompts.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/parse.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/shared_query_search.py
wdoc/utils/tasks/types.py
wdoc/wdoc.py

.pre-commit-config.yaml

.pre-commit-config.yaml
setup.py

Release 4.1.0

21 Oct 18:25

What's new

This release focuses on robustness improvements, particularly around language detection, file loading, and error handling.

Features

  • Task type system: Introduced dataclass-based task type storage for better type safety [7c95e3c]
  • Source tag logging: Added failure count and success rate tracking to source tag logging [69dca45]
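Below is a minimal sketch of dataclass-based task type storage. It assumes the task names `query`, `search`, `summarize`, `summarize_then_query` and `parse`. The `TaskTypes` class and its fields are hypothetical and do not reproduce wdoc's actual definition:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTypes:
    """Hypothetical illustration: boolean flags derived once from the task string,
    so downstream code can test `task.query` instead of comparing strings."""
    query: bool
    search: bool
    summarize: bool
    parse: bool

    @classmethod
    def from_str(cls, task: str) -> "TaskTypes":
        known = {"query", "search", "summarize", "summarize_then_query", "parse"}
        if task not in known:
            raise ValueError(f"Unknown task {task!r}, expected one of {sorted(known)}")
        return cls(
            query=task in ("query", "summarize_then_query"),
            search=task == "search",
            summarize=task in ("summarize", "summarize_then_query"),
            parse=task == "parse",
        )


t = TaskTypes.from_str("summarize_then_query")
```

Because the dataclass is frozen, the flags cannot be changed after parsing. A misspelled task name fails at construction time rather than silently falling through a string comparison later on.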

Fixes

  • PowerPoint loader: Fixed TypeError when loading PowerPoint files [ebfc66c]
  • Anki loader: Resolved forward reference error [73924e1]
  • Language detection: Fixed potential edge case issue [2d928ab]
  • Infinite loop detection:
    • Replaced simple loop counter with hash-based detection mechanism [bb147b3]
    • Adjusted loop counter threshold [fcf9ca5]
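Hash-based loop detection stops as soon as a state repeats. A plain counter only stops once an arbitrary iteration limit is reached. The following is a minimal sketch with a hypothetical `LoopDetector` class; what counts as a "state" in wdoc is an assumption here. A counter is kept only as a secondary safety cap:

```python
import hashlib


class LoopDetector:
    """Hypothetical hash-based infinite loop guard.

    Stores a fixed-size digest of each state instead of the state itself,
    so memory stays small even when states are large strings.
    """

    def __init__(self, max_iterations: int = 1000) -> None:
        self.max_iterations = max_iterations
        self._seen: set[str] = set()

    def check(self, state: str) -> None:
        digest = hashlib.sha256(state.encode("utf-8")).hexdigest()
        if digest in self._seen:
            raise RuntimeError("Loop detected: the same state was produced twice")
        self._seen.add(digest)
        # Secondary cap in case states never repeat but never converge either.
        if len(self._seen) > self.max_iterations:
            raise RuntimeError(f"Exceeded {self.max_iterations} iterations")


# Example: a process that eventually cycles back to an earlier state.
detector = LoopDetector()
states = ["a", "b", "c", "b"]
detected_at = None
for i, s in enumerate(states):
    try:
        detector.check(s)
    except RuntimeError:
        detected_at = i
        break
```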

Enhancements

  • Language detection improvements:
    • Better exception handling [0b9c6da]
    • Reduced debug log verbosity [d7589cc]
    • General improvements [c0e2ce7]
  • Batch file loader: Reduced verbosity of progress logging [d207d98]
  • Testing: Improved model detection logic [5257c5a]
  • Post-install: Use logger.error instead of print during installation [c0795e9]

Refactoring

  • wdoc class: Added dynamic interaction_settings property [f806b98]
  • Type hints: Improved type annotations across multiple modules [a94a889, 920e5d3]

Documentation

  • Help text: Fixed powerpoint filetype documentation incorrectly mentioning .doc/.docx instead of .ppt/.pptx [e9b29eb]

Dependencies

  • Bumped litellm to enable latest OpenRouter pricing [577e6f6]

Maintenance

  • Removed debug print statement [80f7f32]
  • Better warning messages [faa5d3b]
  • Fixed setup.py logger usage [4a672c1]

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

wdoc/utils/misc.py

wdoc/utils/loaders/__init__.py

wdoc/utils/misc.py

wdoc/utils/misc.py

wdoc/utils/misc.py

wdoc/utils/batch_file_loader.py

wdoc/wdoc.py

wdoc/wdoc.py

wdoc/main.py
wdoc/utils/batch_file_loader.py
wdoc/utils/loaders/__init__.py
wdoc/utils/misc.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/parse.py
wdoc/utils/tasks/summarize.py
wdoc/utils/tasks/types.py
wdoc/wdoc.py

wdoc/utils/llm.py
wdoc/utils/misc.py
wdoc/wdoc.py

wdoc/utils/loaders/anki.py

wdoc/utils/loaders/powerpoint.py

wdoc/utils/batch_file_loader.py

wdoc/utils/loaders/pdf.py

wdoc/utils/batch_file_loader.py

setup.py

wdoc/utils/batch_file_loader.py

wdoc/utils/batch_file_loader.py

wdoc/docs/help.md

setup.py

setup.py

Release 4.0.2

15 Oct 08:51

What's new

This release focuses on bug fixes, performance improvements, and code cleanup related to docstore filtering and retriever functionality.

🐛 Fixes

  • Docstore filtering improvements

    • Fixed missing arguments when calling filter_docstore ([1a2442d])
    • Fixed type hints for filter_docstore ([d96c2f3]) and create_filter_metadata ([ee9cc6f])
    • Corrected docstore serialization behavior ([9c1d967])
  • Retriever fixes

    • Fixed parent retriever when loading from embeddings ([cf9171d])
    • Fixed type hint for retrievers in edge cases ([39951a8])

⚡ Performance

  • No longer store or serialize the unfiltered docstore ([d29d3a5], [a9a0a35])
    • Renamed filter_docstore to filter_vectorstore for clarity

✨ Features

  • Added timing measurements for docstore serialization and deletion ([375c1a1])
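The performance change and the new timing measurements work together: only the filtered store gets serialized, and the time that takes is measured. The following is a minimal sketch that assumes the docstore is a plain dict mapping IDs to metadata. `filter_entries` and `timed_serialize` are hypothetical names, not wdoc's `filter_vectorstore`:

```python
import pickle
import time
from typing import Callable


def filter_entries(docstore: dict[str, dict], predicate: Callable[[dict], bool]) -> dict[str, dict]:
    """Hypothetical filter: keep only entries whose metadata passes `predicate`."""
    return {doc_id: meta for doc_id, meta in docstore.items() if predicate(meta)}


def timed_serialize(obj) -> tuple[bytes, float]:
    """Pickle `obj` and return the bytes plus the elapsed wall-clock seconds."""
    start = time.perf_counter()
    payload = pickle.dumps(obj)
    return payload, time.perf_counter() - start


full = {f"doc{i}": {"source": "a.pdf" if i % 2 else "b.pdf"} for i in range(1000)}
filtered = filter_entries(full, lambda m: m["source"] == "a.pdf")

# Only the filtered store is serialized; the unfiltered one is never pickled.
payload, elapsed = timed_serialize(filtered)
print(f"serialized {len(filtered)} docs in {elapsed:.4f}s ({len(payload)} bytes)")
```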

🧹 Chores

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

wdoc/utils/filters.py
wdoc/wdoc.py

wdoc/utils/filters.py
wdoc/wdoc.py

wdoc/utils/retrievers.py

wdoc/utils/retrievers.py

wdoc/utils/retrievers.py

wdoc/utils/filters.py

wdoc/utils/filters.py

wdoc/utils/filters.py
wdoc/wdoc.py

wdoc/utils/filters.py

wdoc/utils/filters.py

wdoc/wdoc.py

Release 4.0.1

07 Oct 23:02

What's new

This release focuses on Langfuse v3 compatibility and improved error handling.

🐛 Fixes

  • Langfuse v3 compatibility

    • [89f5132] Update callback import for langfuse v3
    • [07257e0] Use langfuse opentelemetry for v3
  • Document loading robustness

    • [3039bcf] Prevent crash when no documents remain after transform_documents
    • [101c7f7] Add assertion to verify documents were found
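The following is a minimal sketch of the two robustness checks: fail clearly if no documents were found, and fail clearly if the transform drops all of them. `transform_or_fail` and `drop_blank` are hypothetical helpers, and documents are simplified to plain strings:

```python
from typing import Callable


def transform_or_fail(docs: list[str], transform: Callable[[list[str]], list[str]]) -> list[str]:
    """Hypothetical guard: run a transform and fail clearly if nothing survives."""
    if not docs:
        raise ValueError("No documents were found to load")
    out = transform(docs)
    if not out:
        # Explicit error instead of a confusing crash later on an empty list.
        raise ValueError(
            f"All {len(docs)} document(s) were dropped by the transform; nothing left to process"
        )
    return out


def drop_blank(docs: list[str]) -> list[str]:
    # Example transform that drops empty or whitespace-only documents.
    return [d for d in docs if d.strip()]


kept = transform_or_fail(["hello", "   "], drop_blank)
```

The sketch raises `ValueError` rather than using `assert`, because Python strips assert statements when run with `-O`, which would silently disable the check.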

📝 Documentation

  • [56866d1] Add a warning when the youtube audio backend is used instead of whisper or deepgram

🔧 Maintenance

  • [fb49e60] Bump version 4.0.0 → 4.0.1

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

wdoc/utils/misc.py

wdoc/utils/loaders/youtube.py

wdoc/utils/misc.py

wdoc/utils/loaders/__init__.py

wdoc/utils/loaders/__init__.py