KnowledgeXLab · tulongshaonian771 · May 13, 2026
diff --git a/.github/workflows/pypi-publish.yml b/.github/workflows/pypi-publish.yml
@@ -0,0 +1,36 @@
+name: publish
+
+on:
+  release:
+    types: [published]
+
+permissions:
+  contents: read
+  id-token: write
+
+jobs:
+  pypi:
+    name: build and publish to PyPI
+    runs-on: ubuntu-latest
+    environment: pypi
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Install build tooling
+        run: python -m pip install --upgrade build twine
+
+      - name: Build distribution
+        run: python -m build
+
+      - name: Check distribution
+        run: python -m twine check dist/*
+
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
diff --git a/.gitignore b/.gitignore
@@ -3,6 +3,9 @@ __pycache__/
 .pytest_cache/
 .mypy_cache/
 .ruff_cache/
+.coverage
+coverage.xml
+htmlcov/
 .venv/
 dist/
 build/
@@ -11,3 +14,34 @@ workspace/
 output/
 .codex/
 .agents/
+
+# Local secrets/configuration
+.env
+.env.*
+!.env.example
+
+# Runtime data and generated indexes
+*.db
+*.db-*
+*.sqlite
+*.sqlite3
+*.sqlite3-*
+
+# Logs and temporary run output
+*.log
+*.out
+*.tmp
+*.bak
+
+# Large local artifacts
+*.tar
+*.tar.gz
+*.tgz
+*.zip
+*.7z
+
+# Local staging and evaluation artifacts
+staging/
+
+# Local generated previews
+docs/*preview*.html
diff --git a/README.md b/README.md
@@ -1,106 +1,240 @@
 # Little Heta
 
-Little Heta is a lightweight command line tool for personal knowledge, memory,
-and document intelligence workflows. It converts local documents into a
-Markdown wiki, keeps wiki page identity stable, and can maintain a SQLite
-vector index for faster semantic retrieval.
+<p align="center">
+  <img src="docs/assets/little-heta-banner.png" alt="Little Heta banner">
+</p>
+
+<p align="center">
+  <a href="README.md">English</a> ·
+  <a href="docs/i18n/README.zh-CN.md">简体中文</a> ·
+  <a href="docs/i18n/README.zh-TW.md">繁體中文</a> ·
+  <a href="docs/i18n/README.ja.md">日本語</a> ·
+  <a href="docs/i18n/README.ko.md">한국어</a> ·
+  <a href="docs/i18n/README.es.md">Español</a> ·
+  <a href="docs/i18n/README.pt.md">Português</a> ·
+  <a href="docs/i18n/README.fr.md">Français</a> ·
+  <a href="docs/i18n/README.de.md">Deutsch</a>
+</p>
+
+<p align="center">
+  <a href="https://pypi.org/project/little-heta/"><img src="https://img.shields.io/badge/pypi-v0.1.0-3775A9?style=for-the-badge&logo=pypi&logoColor=white" alt="PyPI v0.1.0"></a>
+  <img src="https://img.shields.io/badge/python-3.10%2B-2B6CB0?style=for-the-badge&logo=python&logoColor=white" alt="Python 3.10+">
+  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-2EA44F?style=for-the-badge" alt="License: MIT"></a>
+  <a href="https://knowledgexlab.github.io/"><img src="https://img.shields.io/badge/KnowledgeXLab-Little%20Heta-111827?style=for-the-badge&logo=github&logoColor=white" alt="KnowledgeXLab"></a>
+</p>
+
+Little Heta is local, command-line knowledge infrastructure for personal
+documents, agent memory, and document intelligence. It turns PDFs, Office
+files, images, audio, code, HTML, Markdown, and notes into a stable Markdown
+wiki, adds semantic vector retrieval, and lets agents reuse distilled knowledge
+through a memory layer.
 
-## Status
-
-This repository is an early `v0.1.0` implementation. The current focus is a
-fast local workflow for initialization, document insertion, wiki maintenance,
-and optional vector indexing.
-
-## Features
+## Install
 
-- Interactive first-time setup with `heta init`
-- Provider configuration for Qwen, ChatGPT, or Gemini
-- Optional MinerU integration for PDF parsing
-- Markdown wiki generation under the Little Heta workspace
-- Stable numeric wiki page ids in page filenames
-- Optional SQLite + sqlite-vec wiki chunk index
-- CLI status view with provider, MinerU, KB, wiki, and space usage summaries
+Install from PyPI:
 
-## Install
+```bash
+pip install little-heta
+```
 
 From a local checkout:
 
 ```bash
 pip install -e .
 ```
 
-For development dependencies:
+For development:
 
 ```bash
 pip install -e ".[dev]"
 ```
 
-## Quick Start
-
-Initialize Little Heta:
+The package installs the `heta` command:
 
 ```bash
-heta init
+heta --help
 ```
 
-The wizard writes configuration to:
-
-```text
-~/.heta/heta.yaml
-```
+## Initialize
 
-Check the current workspace and provider status:
+Run the first-time setup:
 
 ```bash
-heta status
+heta init
 ```
 
-Insert one file or a directory:
+Before running it, have the following ready:
 
-```bash
-heta insert ./docs
+- An LLM API key for one provider: Qwen, ChatGPT, or Gemini.
+- Optional: MinerU access for PDF and Office parsing. Request access or learn
+  more at [MinerU](https://mineru.net/apiManage/docs).
+
+`heta init` writes config and workspace data under:
+
+```text
+~/.heta/
 ```
 
-Large PDFs are profiled and split before parsing by default. Little Heta gives a
-lightweight PDF profile to a planning agent, validates the returned page ranges,
-and falls back to fixed page windows when planning is unavailable. Disable this
-behavior when you want to parse a PDF as one source file:
+It also installs the Little Heta agent skill automatically into:
 
-```bash
-heta insert --no-pdf-planning ./large.pdf
+```text
+~/.codex/skills/heta
+~/.claude/skills/heta
 ```
 
-Ask a read-only question against the wiki:
+## Use with Codex and Claude Code
+
+After `heta init`, Codex and Claude Code can discover the Little Heta skill
+globally. The skill tells the agent when to use:
 
 ```bash
-heta query "What is HetaGen?"
+heta ask "..."
+heta query "..."
+heta recall "..."
+heta remember "..."
 ```
 
-Clean wiki pages and the vector database while keeping raw files:
+You can refresh or reinstall the skill at any time:
 
 ```bash
-heta clean
+heta skill
 ```
 
-Manage vector indexing:
+For other agent frameworks, copy these two files:
 
-```bash
-heta vector status
-heta vector on
-heta vector off
+```text
+~/.heta/skills/heta/SKILL.md
+~/.heta/skills/heta/COMMANDS.md
 ```
 
+## What You Get
+
+Most personal knowledge bases eventually become a `/raw` folder: papers,
+slides, screenshots, audio clips, code files, notes, and half-finished drafts
+all pile up together. A normal agent can read those files directly, but every
+question pays the same cost again: open the index, guess which page matters,
+read long pages, and spend tokens rediscovering context it has already found.
+
+Little Heta separates the external knowledge base from the agent's internal
+memory. The KB remains the source of truth: a structured, versioned wiki built
+from the user's files. Memory, by contrast, is the agent's persistent working
+layer, storing reusable information that helps the agent reason, route, and
+avoid repeated deep retrieval. Together they form a memory-first, KB-grounded
+retrieval loop: the agent checks memory first and falls back to the wiki when
+memory cannot answer.
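+
+The memory-first loop can be sketched in a few lines of Python. This is a
+hypothetical illustration, not Little Heta's implementation:
+`MemoryFirstRetriever` and the `kb_lookup` callable are invented names, and
+memory is keyed by normalized question text as a stand-in for real semantic
+recall. Memory is checked first, the KB is traversed only on a miss, and the
+distilled answer is stored for later questions.
+
+```python
+from typing import Callable
+
+
+class MemoryFirstRetriever:
+    """Hypothetical sketch: consult memory first, fall back to the KB on a miss."""
+
+    def __init__(self, kb_lookup: Callable[[str], str]) -> None:
+        self.kb_lookup = kb_lookup        # expensive wiki traversal (assumed interface)
+        self.memory: dict[str, str] = {}  # normalized question -> distilled KB insight
+        self.kb_calls = 0                 # how many times the KB was actually consulted
+
+    def ask(self, question: str) -> str:
+        # Naive normalization (case and whitespace) stands in for semantic recall.
+        key = " ".join(question.lower().split())
+        if key in self.memory:
+            return self.memory[key]  # memory hit: no wiki traversal
+        self.kb_calls += 1
+        insight = self.kb_lookup(question)  # grounded in the KB, the source of truth
+        self.memory[key] = insight          # keep it for later questions
+        return insight
+```
+
+In this sketch, asking the same question twice (even with different casing or
+spacing) calls `kb_lookup` only once.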
+
+Little Heta turns that `/raw` pile into a persistent agent workspace:
+
+- **Wiki foundation**: raw files are compiled into stable Markdown pages with
+  numeric page ids, clean `[[Wiki Links]]`, and Git history.
+- **Vector Wiki**: each page is chunked by Markdown structure, so `heta query`
+  can jump to the right section instead of relying only on sparse `index.md`
+  summaries.
+- **Memory-first retrieval**: `heta ask` stores distilled KB insights after
+  expensive lookups, allowing later questions to reuse prior KB understanding
+  instead of repeating the same deep wiki traversal.
+- **Synchronized memory + KB management**: memory stays tied to the evolving
+  wiki. When KB content changes, related memories can be invalidated to prevent
+  stale cached insights from drifting away from the source of truth.
+- **Agent reuse**: larger teams and multi-agent workflows benefit because useful
+  KB discoveries can be reused across later questions, sessions, and agents.
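+
+The Vector Wiki bullet above relies on splitting pages by Markdown structure.
+Here is a minimal sketch of that idea, assuming chunks break at ATX headings
+(`#` to `######`) and each chunk carries its heading path as context.
+`chunk_markdown` is a hypothetical helper; Little Heta's real chunker may also
+apply size limits or other rules.
+
+```python
+import re
+
+HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")
+FENCE = "`" * 3  # three backticks: start or end of a fenced code block
+
+
+def chunk_markdown(page: str) -> list[tuple[str, str]]:
+    """Split a Markdown page into (heading path, section text) chunks."""
+    chunks: list[tuple[str, str]] = []
+    path: list[tuple[int, str]] = []  # currently open headings as (level, title)
+    lines: list[str] = []
+    in_fence = False
+
+    def flush() -> None:
+        text = "\n".join(lines).strip()
+        if text:
+            chunks.append((" > ".join(title for _, title in path), text))
+        lines.clear()
+
+    for line in page.splitlines():
+        if line.lstrip().startswith(FENCE):
+            in_fence = not in_fence  # "#" lines inside code fences are not headings
+        match = None if in_fence else HEADING.match(line)
+        if match:
+            flush()
+            level = len(match.group(1))
+            while path and path[-1][0] >= level:
+                path.pop()  # close sections at the same or a deeper level
+            path.append((level, match.group(2)))
+        else:
+            lines.append(line)
+    flush()
+    return chunks
+```
+
+Each chunk can then be embedded on its own, so a query can land on one section
+of a long page instead of the page as a whole.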
+
+Heta's memory architecture stores four complementary types of information:
+
+- **Raw dialogue memory**: original user-agent interaction history, preserving
+  full context and wording.
+- **Atomic fact memory**: compact factual statements extracted from
+  interactions, useful for precise attribute or preference recall.
+- **Episodic memory**: event-level summaries that capture tasks, decisions,
+  temporal context, and multi-step work sessions.
+- **KB insight memory**: distilled insights produced after KB retrieval,
+  storing what the agent learned from external documents so future questions
+  can reuse that understanding without repeating the same expensive traversal.
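+
+A hypothetical data model for these four types, with KB insights linked to the
+wiki pages they were distilled from. `MemoryKind`, `MemoryRecord`,
+`invalidate`, and the page-id linkage are assumptions for illustration, not
+Little Heta's schema. The sketch shows why the link matters: a KB change can
+then invalidate only the insights that depend on the changed pages.
+
+```python
+from dataclasses import dataclass, field
+from enum import Enum
+
+
+class MemoryKind(Enum):
+    RAW_DIALOGUE = "raw_dialogue"  # original user-agent turns, verbatim
+    ATOMIC_FACT = "atomic_fact"    # compact factual statement, e.g. a preference
+    EPISODIC = "episodic"          # event-level summary of a task or session
+    KB_INSIGHT = "kb_insight"      # distilled result of a KB retrieval
+
+
+@dataclass
+class MemoryRecord:
+    kind: MemoryKind
+    text: str
+    source_pages: set[int] = field(default_factory=set)  # wiki page ids behind a KB insight
+
+
+def invalidate(records: list[MemoryRecord], changed_pages: set[int]) -> list[MemoryRecord]:
+    """Drop KB insights derived from any changed wiki page; keep all other memories."""
+    return [
+        r
+        for r in records
+        if not (r.kind is MemoryKind.KB_INSIGHT and r.source_pages & changed_pages)
+    ]
+```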
+
+Retrieval quality depends heavily on corpus structure. In corpora where
+important details are buried deep inside long wiki pages and poorly represented
+by summaries, index-only wiki navigation can collapse, failing to reach those
+details at all. In our initial stress scenarios, Vector Wiki and memory-backed
+retrieval improved answer accuracy by roughly **1.25x to 5x or more**, with
+some cases going from **0% to 100%** accuracy.
+
+Memory-backed reuse used **82.1% fewer tokens** than index-only wiki querying
+and answered **2.58x faster** in a multi-page comparison setting. We expect
+this gap to grow in larger or messier workspaces: index-only wiki navigation
+scales with the number and length of pages an agent may need to inspect, while
+memory-backed reuse answers repeated questions from previously distilled
+insights. The main extra cost is the first pass that creates the reusable
+insight.
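+
+The amortization argument can be written as a simple cost model. Assume,
+hypothetically, that an index-only query costs `c_index` tokens on every ask,
+while the memory path pays `c_first` once to create the insight and `c_reuse`
+for each repeat. Over `n` asks, memory-backed reuse breaks even once
+`n * c_index >= c_first + (n - 1) * c_reuse`. The symbols are placeholders for
+illustration, not measured values.
+
+```python
+import math
+
+
+def tokens_for_repeats(n: int, c_index: float, c_first: float, c_reuse: float) -> tuple[float, float]:
+    """Total tokens over n asks of one question: (index-only, memory-backed)."""
+    index_only = n * c_index                     # every ask re-traverses the wiki
+    memory_backed = c_first + (n - 1) * c_reuse  # one distilling pass, then reuse
+    return index_only, memory_backed
+
+
+def break_even(c_index: float, c_first: float, c_reuse: float) -> int:
+    """Smallest n >= 1 at which the memory-backed path costs no more than index-only."""
+    if c_reuse >= c_index:
+        raise ValueError("reuse must be cheaper than an index-only query to break even")
+    return max(1, math.ceil((c_first - c_reuse) / (c_index - c_reuse)))
+```
+
+In this model the savings after break-even grow linearly with `n`, by
+`c_index - c_reuse` per additional ask.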
+
+## Core CLI
+
+The main commands are:
+
+- `heta init`: set up providers, workspace, and agent skills.
+- `heta status`: show provider, MinerU, wiki, memory, and space status.
+- `heta insert`: add files or folders to the knowledge base.
+- `heta query`: ask a read-only question against inserted documents.
+- `heta ask`: answer using memory and the document KB together.
+- `heta remember`: save a fact, decision, or preference.
+- `heta recall`: retrieve saved memory.
+- `heta clean`: remove generated wiki pages and the vector DB while keeping raw
+  files.
+- `heta vector`: turn document vector indexing on or off, or show its status.
+- `heta insert-planning`: turn smart insert planning on or off, or show its
+  status.
+- `heta mem-show`: inspect stored KB memories.
+- `heta mem-clean`: erase memory data.
+- `heta skill`: install or refresh agent skills.
+
+Detailed command docs:
+
+- [init](docs/cli/init.md)
+- [status](docs/cli/status.md)
+- [insert](docs/cli/insert.md)
+- [query](docs/cli/query.md)
+- [ask](docs/cli/ask.md)
+- [remember](docs/cli/remember.md)
+- [recall](docs/cli/recall.md)
+- [clean](docs/cli/clean.md)
+- [vector](docs/cli/vector.md)
+- [insert-planning](docs/cli/insert-planning.md)
+- [mem-show](docs/cli/mem-show.md)
+- [mem-clean](docs/cli/mem-clean.md)
+- [skill](docs/cli/skill.md)
+
+## Supported Files
+
+Little Heta can insert:
+
+- Markdown and text: `.md`, `.markdown`, `.txt`
+- PDF and Office: `.pdf`, `.doc`, `.docx`, `.ppt`, `.pptx`, `.xls`, `.xlsx`
+- Images: `.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`
+- Audio and video (transcribed): `.mp3`, `.wav`, `.m4a`, `.flac`, `.ogg`, `.mp4`
+- Code and config files: `.py`, `.js`, `.ts`, `.tsx`, `.jsx`, `.java`, `.go`,
+  `.rs`, `.cpp`, `.c`, `.h`, `.hpp`, `.sh`, `.sql`, `.yaml`, `.yml`, `.json`,
+  `.toml`
+- HTML: `.html`, `.htm`
+
+PDF and Office parsing require MinerU. Images and audio/video require a
+multimodal or transcription-capable LLM provider.
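+
+Read as a routing rule, the list above maps each suffix to the capability it
+needs. This is a hypothetical helper that uses only the extensions listed; the
+route names are illustrative, not Little Heta's internal identifiers, and it
+assumes Markdown, text, code, config, and HTML files are read directly.
+
+```python
+from pathlib import Path
+
+# Extension groups copied from the list above; route names are illustrative.
+ROUTES: dict[str, set[str]] = {
+    "direct": {  # assumed to need no external parser
+        ".md", ".markdown", ".txt", ".html", ".htm",
+        ".py", ".js", ".ts", ".tsx", ".jsx", ".java", ".go", ".rs", ".cpp",
+        ".c", ".h", ".hpp", ".sh", ".sql", ".yaml", ".yml", ".json", ".toml",
+    },
+    "mineru": {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"},
+    "multimodal_llm": {".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp"},
+    "transcription": {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".mp4"},
+}
+
+
+def route_for(path: str) -> str | None:
+    """Return the capability a file needs, or None if its suffix is unsupported."""
+    suffix = Path(path).suffix.lower()
+    for route, suffixes in ROUTES.items():
+        if suffix in suffixes:
+            return route
+    return None
+```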
+
 ## Workspace
 
-Little Heta stores local runtime data under:
+Runtime data lives under:
 
 ```text
 ~/.heta/
 ```
 
-The workspace contains raw source files, generated wiki pages, worktrees, and
-the local database used by the vector index. Runtime workspace data is not
-intended to be committed to this repository.
+Important paths:
+
+```text
+~/.heta/heta.yaml                             config
+~/.heta/workspace/kb/raw/                     archived source files
+~/.heta/workspace/kb/wiki/index.md            wiki entry index
+~/.heta/workspace/kb/wiki/pages/              generated Markdown wiki pages
+~/.heta/workspace/kb/wiki/log.md              wiki operation log
+~/.heta/workspace/kb/db/wiki_vectors.sqlite3  local wiki vector database
+~/.heta/workspace/mem/mem.sqlite3             local memory database
+~/.heta/skills/heta/                          portable Little Heta agent skill
+```
 
 ## Development
 
@@ -113,11 +247,18 @@ pytest
 Project layout:
 
 ```text
-src/heta/          CLI, config, providers, and KB implementation
+src/heta/          CLI, config, assistants, memory, and KB implementation
+docs/              user and technical documentation
 tests/             unit tests
 pyproject.toml     package metadata and dependencies
 ```
 
+## Community
+
+If Little Heta is useful to you, please consider giving the project a star. If
+you run into bugs, rough edges, or missing workflows, open an issue and tell us
+what happened.
+
 ## License
 
 Little Heta is released under the MIT License. See [LICENSE](LICENSE).
diff --git a/docs/assets/little-heta-banner.png b/docs/assets/little-heta-banner.png