Merged
2 changes: 1 addition & 1 deletion .github/workflows/release.yaml
@@ -192,7 +192,7 @@ jobs:
MAX_ATTEMPTS=30
SLEEP_SECONDS=10

echo "Attempting to install es2==${WHEEL_VERSION} from TestPyPI..."
echo "Attempting to install pyenvector==${WHEEL_VERSION} from TestPyPI..."
ATTEMPTS=0
while true; do
ATTEMPTS=$((ATTEMPTS + 1))
4 changes: 1 addition & 3 deletions .gitignore
@@ -39,10 +39,8 @@ keys/
VECTORSTORE.md

# External symlinks (local workspace references)
es2-msa
es2-msa/
es2-deploy
es2-deploy/
envector-deployment/

# Local helper scripts
run_unit_tests.py
8 changes: 4 additions & 4 deletions CONTRIBUTE.md
@@ -13,16 +13,16 @@ Thanks for your interest in improving the project! This guide covers local setup

## Testing
- **Unit tests** (fakes only): `python run_unit_tests.py`
- **Integration tests** (requires ES2 server + keys):
- Export `ES2_ADDRESS`, `ES2_KEY_PATH`, `ES2_KEY_ID`
- Optional: `ES2_USE_EMBEDDINGS=1`, `ES2_EMB_MODEL`, `ES2_USE_HF_DATASET=1`
- **Integration tests** (requires EnVector server + keys):
- Export `ENVECTOR_ADDRESS`, `ENVECTOR_KEY_PATH`, `ENVECTOR_KEY_ID`
- Optional: `ENVECTOR_USE_EMBEDDINGS=1`, `ENVECTOR_EMB_MODEL`, `ENVECTOR_USE_HF_DATASET=1`
- Run `pytest -m integration -s`

Please run relevant tests before submitting a PR and mention coverage in the description.

## Development Guidelines
- Keep code, comments, and docs in English.
- Prefer the high-level `es2` SDK APIs; avoid direct gRPC/indexer calls unless required.
- Prefer the high-level `pyenvector` SDK APIs; avoid direct gRPC/indexer calls unless required.
- Keep changes focused and documented; update README or notebooks when behavior changes.
- Follow existing formatting and type-hint conventions.

28 changes: 14 additions & 14 deletions README.md
@@ -1,6 +1,6 @@
# LangChain Envector Integration

Encrypted vector search for LangChain using Envector (ES2), powered by homomorphic encryption (CKKS). This repo ships a LangChain-compatible VectorStore and retriever utilities built on the high-level `es2` Python SDK.
Encrypted vector search for LangChain using Envector, powered by homomorphic encryption (CKKS). This repo ships a LangChain-compatible VectorStore and retriever utilities built on the high-level `pyenvector` Python SDK.

## Features
- LangChain `VectorStore` interface with `similarity_search`, `from_texts`, etc.
@@ -13,10 +13,10 @@ Encrypted vector search for LangChain using Envector (ES2), powered by homomorph
- `python3.11 -m venv .venv && source .venv/bin/activate`
- Install runtime dependencies:
- `pip install -U pip setuptools wheel`
- `pip install es2 langchain sentence-transformers`
- `pip install pyenvector langchain sentence-transformers`

## Usage Overview
1. Configure Envector using `EnvectorConfig`, pointing to your ES2 endpoint and keys.
1. Configure Envector using `EnvectorConfig`, pointing to your EnVector endpoint and keys.
2. Initialize embeddings (or provide pre-computed vectors).
3. Instantiate `Envector(config=cfg, embeddings=emb)` and call `add_texts`, `add_documents`, or use `as_retriever`.
4. Run `similarity_search` or plug the retriever into your LangChain pipeline.
@@ -25,13 +25,13 @@ Encrypted vector search for LangChain using Envector (ES2), powered by homomorph

## Configuration
Key dataclasses live in `libs/envector/config.py`:
- `ConnectionConfig`: address or host/port for ES2.
- `ConnectionConfig`: address or host/port for EnVector.
- `KeyConfig`: key path, key ID, optional preset/eval mode.
- `IndexSettings`: index name, dimension (32–4096), query encryption mode, optional output fields and fetch parameters.
- `EnvectorConfig`: wraps the above and enables auto-creation via `create_if_missing`.

## Data Model
- Each vector stores a single `metadata` string in ES2.
- Each vector stores a single `metadata` string in EnVector.
- To align with LangChain’s `Document`, inserts wrap data as JSON: `{"text": ..., "metadata": ...}`.
- Retrieval unwraps JSON, returning `Document(page_content=text, metadata={...})`.
- Client-side filtering requires the JSON envelope to include an object under `metadata`.
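
The envelope described in the Data Model list can be sketched as follows. This is a minimal illustration, not the package's code: `wrap`, `unwrap`, and `filter_by_metadata` are hypothetical names, and the sample records are made up. It shows why client-side filtering needs an object under `metadata`: the filter compares keys inside that object.

```python
import json
from typing import Any, Dict, List, Optional


def wrap(text: str, metadata: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one item as the JSON envelope {"text": ..., "metadata": ...}."""
    return json.dumps({"text": text, "metadata": metadata or {}})


def unwrap(raw: str) -> Dict[str, Any]:
    """Parse an envelope back into its text and metadata parts."""
    data = json.loads(raw)
    return {"text": data.get("text", ""), "metadata": data.get("metadata") or {}}


def filter_by_metadata(raws: List[str], wanted: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep items whose metadata object matches every key/value in `wanted`."""
    items = [unwrap(r) for r in raws]
    return [
        it for it in items
        if all(it["metadata"].get(k) == v for k, v in wanted.items())
    ]


# Hypothetical stored strings, one envelope per vector.
stored = [wrap("alpha", {"lang": "en"}), wrap("beta", {"lang": "ko"}), wrap("gamma")]
hits = filter_by_metadata(stored, {"lang": "en"})
print([h["text"] for h in hits])
```

An item inserted without metadata, such as `gamma` above, still unwraps cleanly but can never match a non-empty filter.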
@@ -48,12 +48,12 @@ Key dataclasses live in `libs/envector/config.py`:

cfg = EnvectorConfig(
connection=ConnectionConfig(
address=ES2_ADDRESS,
access_token=ES2_ACCESS_TOKEN
address=ENVECTOR_ADDRESS,
access_token=ENVECTOR_ACCESS_TOKEN
),
key=KeyConfig(
key_path=ES2_KEY_PATH,
key_id=ES2_KEY_ID,
key_path=ENVECTOR_KEY_PATH,
key_id=ENVECTOR_KEY_ID,
preset="ip",
eval_mode="rmp"
),
@@ -100,18 +100,18 @@ Key dataclasses live in `libs/envector/config.py`:
The methods `similarity_search` and `similarity_search_with_vector` (with `embeddings.embed_query()`) are also available to perform vector search.

## Troubleshooting
- Connection issues: verify ES2 address and registered keys.
- Connection issues: verify EnVector address and registered keys.
- Embeddings mismatch: ensure embedding dimension equals `index.dim` when supplying vectors.
- Unexpected raw strings: confirm inserts used the JSON envelope.
- Key issues: confirm that the local key's metadata matches the registered key.

## Testing Without ES2
- Run unit tests offline (no ES2 or SDK required):
## Testing Without EnVector
- Run unit tests offline (no EnVector or SDK required):
- `python -m pytest -q -m "not integration"`
- or `python scripts/run_unit_tests.py`
- Run integration tests (requires server and keys):
- Export `ES2_ADDRESS`, `ES2_KEY_PATH`, `ES2_KEY_ID`
- Optional: `ES2_USE_EMBEDDINGS=1`, `ES2_EMB_MODEL`, `ES2_USE_HF_DATASET=1`
- Export `ENVECTOR_ADDRESS`, `ENVECTOR_KEY_PATH`, `ENVECTOR_KEY_ID`
- Optional: `ENVECTOR_USE_EMBEDDINGS=1`, `ENVECTOR_EMB_MODEL`, `ENVECTOR_USE_HF_DATASET=1`
- `python -m pytest -q -m integration -s`
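
Because the variable prefix changes from `ES2_` to `ENVECTOR_`, existing shells and CI jobs that still export the old names will stop being picked up. One way to soften the transition is sketched below. It is an assumption for illustration only: `read_setting` is a hypothetical helper, nothing in this diff shows such a fallback, and the address values are placeholders.

```python
import os
from typing import Mapping, Optional


def read_setting(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return ENVECTOR_<name>, falling back to the legacy ES2_<name> if unset.

    Hypothetical helper: the test suite in this repository may read the
    variables differently.
    """
    env = os.environ if env is None else env
    return env.get(f"ENVECTOR_{name}") or env.get(f"ES2_{name}")


# Placeholder values; the new name wins when both are exported.
legacy_only = {"ES2_ADDRESS": "old-host:1234"}
both = {"ES2_ADDRESS": "old-host:1234", "ENVECTOR_ADDRESS": "new-host:1234"}
print(read_setting("ADDRESS", legacy_only))
print(read_setting("ADDRESS", both))
```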

## Contributing
6 changes: 3 additions & 3 deletions libs/envector/README.md
@@ -1,15 +1,15 @@
# Envector (LangChain VectorStore)

High-level VectorStore adaptor for Envector (ES2), using the `es2` SDK. Vectors are always encrypted on the server; the SDK performs required crypto client-side.
High-level VectorStore adaptor for Envector, using the `pyenvector` SDK. Vectors are always encrypted on the server; the SDK performs required crypto client-side.

Key points
- Use high-level `es2.ES2` and `es2.Index`; avoid low-level `es2.api.Indexer`/gRPC.
- Use high-level `pyenvector.EnvectorClient` and `pyenvector.Index`; avoid low-level `pyenvector.api.Indexer`/gRPC.
- Index encryption is fixed to `cipher`. Query can be `plain` or `cipher`.
- Metadata is stored as a single JSON string per item: `{id, text, metadata}`.

Files
- `config.py`: Configuration dataclasses (connection, key, index).
- `client.py`: Initializes ES2 + index and returns an `Index` instance.
- `client.py`: Initializes EnVector + index and returns an `Index` instance.
- `vectorstore.py`: `Envector` VectorStore implementation.
- `retriever.py`: Optional wrapper retriever.
- `examples/`: Minimal examples.
2 changes: 1 addition & 1 deletion libs/envector/examples/basic_usage.py
@@ -1,7 +1,7 @@
"""Basic usage example for Envector VectorStore.

Requirements:
- `es2`
- `pyenvector`
- `langchain` (version providing VectorStore APIs)
- An embeddings backend, e.g. sentence-transformers
"""
2 changes: 1 addition & 1 deletion libs/envector/examples/ingest_synthetic_1k.py
@@ -1,7 +1,7 @@
"""Ingest the synthetic 1K dataset into Envector.

Requires:
- ES2 server and keys.
- EnVector server and keys.
- Dataset at `data/synthetic_rag_1k.jsonl` (run scripts/make_synthetic_rag_dataset.py).

Usage:
2 changes: 1 addition & 1 deletion libs/envector/langchain_envector/__init__.py
@@ -1,6 +1,6 @@
"""Envector LangChain integration package.

Provides a LangChain-compatible VectorStore that wraps the high-level `es2` SDK.
Provides a LangChain-compatible VectorStore that wraps the high-level `pyenvector` SDK.
All code and comments are in English as per project rules.
"""

32 changes: 16 additions & 16 deletions libs/envector/langchain_envector/client.py
@@ -4,45 +4,45 @@


class EnvectorClient:
"""Thin convenience client around the high-level `es2` SDK.
"""Thin convenience client around the high-level `pyenvector` SDK.

- Establishes a connection
- Initializes key and index configuration
- Optionally creates the index if missing
- Provides access to the ES2 `Index` instance
- Provides access to the envector `Index` instance
"""

def __init__(self, config: EnvectorConfig):
self.config = config
self._es2 = None
self._ev = None
self._index = None

def init(self):
import es2
import pyenvector as ev

c = self.config.connection
k = self.config.key
i = self.config.index

es2_client = es2.ES2()
ev_client = ev.EnvectorClient()

# Connection
if c.address:
es2_client.init_connect(address=c.address, access_token=c.access_token)
ev_client.init_connect(address=c.address, access_token=c.access_token)
else:
if not (c.host and c.port):
raise ValueError("Either address or host+port must be provided.")
es2_client.init_connect(
ev_client.init_connect(
host=c.host, port=c.port, access_token=c.access_token
)

# Key path baseline for Index
from es2.index import Index as _Index
from pyenvector.index import Index as _Index

_Index.init_key_path(k.key_path)

# Index config + key setup
es2_client.init_index_config(
ev_client.init_index_config(
index_name=i.index_name,
dim=i.dim,
key_path=k.key_path,
@@ -59,13 +59,13 @@ def init(self):

# Create index if missing
if self.config.create_if_missing:
idx_list = es2_client.get_index_list()
idx_list = ev_client.get_index_list()
if i.index_name not in idx_list:
es2_client.create_index(index_name=i.index_name, dim=i.dim)
ev_client.create_index(index_name=i.index_name, dim=i.dim)

# Bind index instance
self._index = es2.Index(i.index_name)
self._es2 = es2_client
self._index = ev.Index(i.index_name)
self._ev = ev_client
return self

@property
@@ -75,7 +75,7 @@ def index(self):
return self._index

@property
def es2(self):
if self._es2 is None:
def ev(self):
if self._ev is None:
raise RuntimeError("Client not initialized. Call init().")
return self._es2
return self._ev
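
The last hunk renames the public `es2` property to `ev`, so any caller that still reads `client.es2` will raise `AttributeError` after this change. If backward compatibility matters, a deprecated alias is one option. The sketch below is an assumption, not part of this diff: `ClientSketch` is a stand-in class that models only the accessor logic, and the `object()` placeholder replaces the real SDK client.

```python
import warnings


class ClientSketch:
    """Stand-in for the wrapper class; only the accessor logic is modeled."""

    def __init__(self):
        self._ev = None

    def init(self):
        self._ev = object()  # placeholder for the real SDK client
        return self

    @property
    def ev(self):
        if self._ev is None:
            raise RuntimeError("Client not initialized. Call init().")
        return self._ev

    @property
    def es2(self):
        # Hypothetical compatibility alias: warn, then delegate to `ev`.
        warnings.warn(
            "`es2` is deprecated; use `ev` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.ev


client = ClientSketch().init()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    same = client.es2 is client.ev
print(same, [w.category.__name__ for w in caught])
```

Delegating to `ev` keeps the "not initialized" check in one place, so the alias cannot drift from the primary accessor.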
8 changes: 4 additions & 4 deletions libs/envector/langchain_envector/types.py
@@ -29,9 +29,9 @@ class SearchResult:


def pack_metadata(text: str, metadata: Optional[Dict[str, Any]] = None) -> str:
"""Pack text and metadata into a single JSON string field accepted by ES2.
"""Pack text and metadata into a single JSON string field accepted by pyenvector.

ES2 metadata API stores lists of strings; we store a single JSON blob per item.
pyenvector metadata API stores lists of strings; we store a single JSON blob per item.
Item-level IDs are not persisted/addressable.
"""
import json
@@ -46,7 +46,7 @@ def pack_metadata(text: str, metadata: Optional[Dict[str, Any]] = None) -> str:
def unpack_metadata(raw: Any) -> Dict[str, Any]:
"""Return metadata as a dict regardless of the raw payload type.

Recent ES2 versions may return decrypted metadata as a Python dict instead
Recent pyenvector versions may return decrypted metadata as a Python dict instead
of the JSON string we originally stored. We normalise the payload here so
downstream code always works with a dictionary.
"""
@@ -79,7 +79,7 @@ def unpack_metadata(raw: Any) -> Dict[str, Any]:
if isinstance(data, dict):
return data
except Exception:
# Some ES2 responses return Python-literal strings (single quotes).
# Some pyenvector responses return Python-literal strings (single quotes).
try:
import ast

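
The hunk above is truncated, so the full body of `unpack_metadata` is not visible here. The fallback it describes (try JSON first, then a Python-literal string with single quotes) can be sketched in isolation as below. `to_dict` is a hypothetical name, and what happens to payloads that parse to something other than a dict is an assumption of this sketch, not the library's documented behavior.

```python
import ast
import json
from typing import Any, Dict


def to_dict(raw: Any) -> Dict[str, Any]:
    """Normalize a metadata payload to a dict.

    Order of attempts: already a dict, strict JSON, Python literal.
    Anything that does not yield a dict is wrapped as {"text": raw}
    (an assumption made for this sketch).
    """
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, (bytes, bytearray)):
        raw = raw.decode("utf-8", errors="replace")
    if not isinstance(raw, str):
        return {}
    try:
        data = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        try:
            # literal_eval only evaluates literals, never arbitrary code.
            data = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            return {"text": raw}
    return data if isinstance(data, dict) else {"text": raw}


print(to_dict('{"text": "a", "metadata": {"k": 1}}'))
print(to_dict("{'text': 'a', 'metadata': {'k': 1}}"))
print(to_dict("plain string"))
```

`ast.literal_eval` is used rather than `eval` because the payload comes back from a server and should never be executed as code.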
12 changes: 6 additions & 6 deletions libs/envector/langchain_envector/vectorstore.py
@@ -38,10 +38,10 @@ def __init__(


class Envector(VectorStore): # type: ignore[misc]
"""LangChain-compatible VectorStore adaptor for Envector (ES2).
"""LangChain-compatible VectorStore adaptor for Envector.

This class wraps the high-level `es2` SDK. It does not use low-level
gRPC stubs or `es2.api.Indexer` directly.
This class wraps the high-level `pyenvector` SDK. It does not use low-level
gRPC stubs or `pyenvector.api.Indexer` directly.
"""

def __init__(
@@ -89,7 +89,7 @@ def add_texts(
# Prepare metadata JSON strings per item
packed = [pack_metadata(t, m) for t, m in zip(texts, metadatas)]

# Insert using high-level ES2 Index
# Insert using high-level pyenvector Index
result_ids = self.client.index.insert(data=vectors, metadata=packed)

# Return ephemeral placeholders to satisfy VectorStore interface,
@@ -111,7 +111,7 @@ def _similarity_search_with_scores(
results = self.client.index.search(
query=embedding, top_k=top_k, output_fields=self.config.index.output_fields
)
# ES2 Index.search returns a list for each query; we passed single query
# pyenvector Index.search returns a list for each query; we passed single query
result = (
results[0]
if isinstance(results, list) and results and isinstance(results[0], list)
@@ -265,7 +265,7 @@ def add_documents(
extracting `page_content` and `metadata` from each Document.

Notes:
- Manual `ids` are ignored (ES2 does not support user-provided IDs).
- Manual `ids` are ignored (EnVector does not support user-provided IDs).
- When `embeddings` is not configured, you must supply `vectors`.
- Returns ephemeral IDs as produced by the client insert.
"""
8 changes: 4 additions & 4 deletions pyproject.toml
@@ -7,19 +7,19 @@ build-backend = "setuptools.build_meta"

[project]
name = "langchain-envector"
version = "0.1.2"
description = "LangChain VectorStore integration for Envector (ES2) encrypted vector search"
version = "0.1.3"
description = "LangChain VectorStore integration for Envector"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.9,<3.14"
authors = [
{ name = "Envector Contributors" }
]
dependencies = [
"es2",
"pyenvector",
"langchain>=0.2.0",
]
keywords = ["langchain", "vectorstore", "homomorphic-encryption", "ckks", "encrypted-search", "envector", "es2"]
keywords = ["langchain", "vectorstore", "homomorphic-encryption", "ckks", "encrypted-search", "envector", "pyenvector"]
classifiers = [
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.9",
2 changes: 1 addition & 1 deletion pytest.ini
@@ -1,6 +1,6 @@
[pytest]
markers =
integration: tests that require a running ES2 server and the real es2 SDK
integration: tests that require a running EnVector server and the real EnVector SDK
testpaths =
tests
