NeuralCache is a drop-in reranker for Retrieval-Augmented Generation (RAG) that learns which context the model actually uses.


Maverick0351a/neuralcache


NeuralCache 🧠⚡

Adaptive reranker for Retrieval-Augmented Generation (RAG)


NeuralCache is a lightweight reranker for RAG pipelines that actually remembers what helped. It blends dense semantic similarity with a narrative memory of past wins and stigmergic pheromones that reward helpful passages while decaying stale ones, then spices in MMR diversity and ε-greedy exploration. The result: more relevant context for your LLM without rebuilding your stack.

NeuralCache is an opinionated, stateful reranking layer designed to increase practical usefulness of RAG retrieval results by remembering what historically mattered, decaying stale signals, maintaining diversity, and optimizing compute via intelligent gating. The repository is production-minded (CI, packaging, adapters, metrics) yet approachable with minimal dependencies out of the box. Its architecture cleanly separates scoring components, adapters, and API surfaces, making it a solid foundation for iterative improvement and integration into existing LLM pipelines.

This repository open-sources the NeuralCache reranker. The broader “Cognitive Tetrad” engine remains proprietary IP and is not included here.


⚡ 60-second quickstart

    # 1. Install
    pip install neuralcache

    # 2. Launch the API (Ctrl+C to stop)
    uvicorn neuralcache.api.server:app --port 8080 --reload

    # 3. Hit the reranker
    curl -s -X POST http://127.0.0.1:8080/rerank \
      -H "Content-Type: application/json" \
      -d '{
        "query": "What is stigmergy?",
        "documents": [
          {"id": "a", "text": "Stigmergy is indirect coordination via shared context."},
          {"id": "b", "text": "Vector DBs store embeddings for retrieval."}
        ],
        "top_k": 2
      }' | python -m json.tool

Prefer a single command? 👇

    pip install neuralcache && \
    uvicorn neuralcache.api.server:app --port 8080 --reload & \
    server_pid=$! && sleep 3 && \
    curl -s -X POST http://127.0.0.1:8080/rerank -H "Content-Type: application/json" \
         -d '{"query":"What is stigmergy?","documents":[{"id":"a","text":"Stigmergy is indirect coordination."},{"id":"b","text":"Vector DBs store embeddings."}],"top_k":2}' | python -m json.tool && \
    kill $server_pid

Need batch reranking or Prometheus metrics?

    pip install neuralcache[ops]
    uvicorn neuralcache.api.server_plus:app --port 8081 --reload

  • Batch endpoint: POST http://127.0.0.1:8081/rerank/batch
  • Metrics scrape: GET http://127.0.0.1:8081/metrics (requires the prometheus-client dependency supplied by the ops extra)
  • Legacy routes remain available under /v1/...
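
For callers who would rather use Python than curl, the quickstart request can be sketched with the standard library alone. The helper names build_rerank_request and rerank are illustrative and not part of the NeuralCache package; the endpoint path and payload fields mirror the curl example above.

```python
import json
import urllib.request


def build_rerank_request(base_url, query, documents, top_k=2):
    """Build (but do not send) a POST request for the /rerank endpoint."""
    payload = {"query": query, "documents": documents, "top_k": top_k}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(base_url, query, documents, top_k=2, timeout=10.0):
    """Send the request to a running server and return the decoded JSON body."""
    request = build_rerank_request(base_url, query, documents, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server, so it is safe to inspect offline.
documents = [
    {"id": "a", "text": "Stigmergy is indirect coordination via shared context."},
    {"id": "b", "text": "Vector DBs store embeddings for retrieval."},
]
request = build_rerank_request("http://127.0.0.1:8080", "What is stigmergy?", documents)
```

With the API from step 2 running, rerank("http://127.0.0.1:8080", "What is stigmergy?", documents) sends the same payload as the curl call.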

Why teams choose NeuralCache

  • Drop-in reranker for any retriever that can send JSON. Works with Pinecone, Weaviate, Qdrant, Chroma—or your own Postgres table.
  • Narrative memory (EMA) keeps track of passages that consistently helped users, biasing future reranks toward them.
  • Stigmergic pheromones reward useful documents but decay over time, preventing filter bubbles.
  • MMR + ε-greedy introduces diversity without tanking relevance.
  • Zero external dependencies by default. Uses a hashing trick for embeddings so you can see results instantly, but slots in any vector model when you’re ready.
  • Adapters included. LangChain and LlamaIndex adapters ship in neuralcache.adapters; install them on demand with pip install "neuralcache[adapters]".
  • CLI + REST API + FastAPI docs give you multiple ways to integrate and debug.
  • Plus API adds /rerank/batch and Prometheus-ready /metrics endpoints when you run uvicorn neuralcache.api.server_plus:app (install the neuralcache[ops] extra for dependencies).
  • SQLite persistence out of the box. neuralcache.storage.sqlite_state.SQLiteState keeps narrative + pheromone state durable across workers without JSON file juggling.
  • Cognitive gating right-sizes the rerank set on the fly, trimming obvious non-starters to save downstream tokens without losing recall.
  • Transparent scoring spec documented in docs/SCORING_MODEL.md for auditability and reproducible benchmarks.

Use cases

  • Customer support copilots → surface articles with the exact resolution steps.
  • Internal knowledge bases → highlight documents that past agents actually referenced.
  • Vertical SaaS (legal/health/finance) → pair compliance-ready snippets with LLM summaries.
  • Evaluation harnesses → measure and tune Context-Use@K uplift before going live.

How it works

| Signal | What it captures | Why it matters |
| --- | --- | --- |
| Dense similarity | Cosine similarity over embeddings (hash-based fallback out of the box) | Makes sure obviously relevant passages rank high. |
| Narrative EMA | Exponential moving average of successful context windows | Remembers story arcs across multi-turn conversations. |
| Stigmergic pheromones | Exposure-aware reinforcement with decay | Rewards docs that helped recently while fading stale ones. |
| MMR diversity | Maximal Marginal Relevance | Reduces redundancy and surfaces complementary evidence. |
| ε-greedy exploration | Occasional exploration of long-tail docs | Keeps fresh signals flowing so the model doesn’t get stuck. |
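
The two memory signals in the table can be illustrated with a small sketch. This is not the code in narrative.py or pheromone.py: the smoothing factor alpha, the half-life decay, and the class and function names are assumptions chosen for illustration, and the exposure-aware part of the pheromone update is not modelled.

```python
def ema_update(memory, observation, alpha=0.1):
    """Blend a newly successful context vector into the running narrative vector.

    alpha is an assumed smoothing factor: higher values forget the past faster.
    """
    return [(1.0 - alpha) * m + alpha * o for m, o in zip(memory, observation)]


class PheromoneStore:
    """Per-document reward that decays exponentially (assumed half-life model)."""

    def __init__(self, half_life_s=3600.0):
        self.half_life_s = half_life_s
        self._state = {}  # doc_id -> (value at last update, last update timestamp)

    def value(self, doc_id, now):
        """Return the decayed pheromone level for doc_id at time `now` (seconds)."""
        value, updated_at = self._state.get(doc_id, (0.0, now))
        elapsed = max(0.0, now - updated_at)
        return value * 0.5 ** (elapsed / self.half_life_s)

    def reinforce(self, doc_id, reward, now):
        """Add a reward on top of whatever pheromone has survived decay so far."""
        self._state[doc_id] = (self.value(doc_id, now) + reward, now)


store = PheromoneStore(half_life_s=3600.0)
store.reinforce("doc-a", reward=1.0, now=0.0)
level_after_one_half_life = store.value("doc-a", now=3600.0)
```

Storing the timestamp alongside the value means decay is computed lazily on read, so no background job has to touch every document.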

All of this is orchestrated by neuralcache.rerank.Reranker, configurable through Settings or environment variables (NEURALCACHE_*).
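
To make the combination of signals concrete, here is a minimal sketch of a weighted blend followed by greedy MMR selection. It assumes a plain linear blend using the default weights from the configuration table (1.0, 0.6, 0.3); the authoritative formula lives in docs/SCORING_MODEL.md and may differ. The function names are hypothetical and are not the Reranker API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def blended_score(dense, narrative, pheromone,
                  w_dense=1.0, w_narrative=0.6, w_pheromone=0.3):
    """Assumed linear blend of the three relevance signals."""
    return w_dense * dense + w_narrative * narrative + w_pheromone * pheromone


def mmr_select(relevance, embeddings, top_k, mmr_lambda=0.5):
    """Greedy Maximal Marginal Relevance.

    relevance:  {doc_id: blended relevance score}
    embeddings: {doc_id: vector}
    Each step picks the doc maximising
    lambda * relevance - (1 - lambda) * max similarity to already-selected docs.
    """
    selected = []
    remaining = set(relevance)
    while remaining and len(selected) < top_k:
        def mmr(doc_id):
            redundancy = max(
                (cosine(embeddings[doc_id], embeddings[s]) for s in selected),
                default=0.0,
            )
            return mmr_lambda * relevance[doc_id] - (1.0 - mmr_lambda) * redundancy

        best = max(sorted(remaining), key=mmr)  # sorted() keeps ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


relevance = {"a": 0.9, "b": 0.85, "c": 0.5}
embeddings = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
diverse = mmr_select(relevance, embeddings, top_k=2, mmr_lambda=0.5)
greedy = mmr_select(relevance, embeddings, top_k=2, mmr_lambda=1.0)
```

In this toy input, document b is a near-duplicate of a, so a balanced lambda prefers the complementary document c, while lambda = 1.0 falls back to pure relevance ordering.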


Cognitive gating

NeuralCache now ships with an entropy-aware gating layer that decides how many candidates to score for each query. The gate looks at the dense similarity distribution, estimates uncertainty with a softmax entropy probe, and then uses a logistic curve to select a candidate budget between your configured min/max bounds.

  • Modes: off (never trims), auto (entropy-driven; default), on (always apply gating using provided thresholds).
  • Overrides: Pass a gating_overrides dict on /rerank or /rerank/batch calls to tweak mode, min/max candidates, threshold, or temperature per request.
  • Observability: Enable return_debug=true to receive gating telemetry (mode, uncertainty, chosen candidate count, masked ids) alongside the rerank results.

Gating plugs in before narrative, pheromone, and MMR scoring—so downstream memories and pheromones still receive consistent updates even when the candidate pool shrinks.
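
The entropy probe and logistic budget described in this section can be sketched as follows. This is a hedged illustration, not the contents of gating.py: the steepness constant, the normalisation of entropy to the 0..1 range, and the choice that higher uncertainty yields a larger budget are all assumptions. The threshold, bounds, and temperature defaults are taken from the configuration table.

```python
import math


def softmax_entropy(similarities, temperature=1.0):
    """Normalised Shannon entropy (0..1) of a softmax over similarity scores."""
    if len(similarities) < 2:
        return 0.0
    peak = max(similarities)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in similarities]
    total = sum(exps)
    probs = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(similarities))


def candidate_budget(similarities, threshold=0.45, min_candidates=8,
                     max_candidates=48, temperature=1.0, steepness=10.0):
    """Map uncertainty to a candidate count between the min/max bounds.

    Assumption: a logistic curve centred on `threshold`, so flat (uncertain)
    similarity distributions keep more candidates and peaked ones keep fewer.
    """
    uncertainty = softmax_entropy(similarities, temperature)
    gate = 1.0 / (1.0 + math.exp(-steepness * (uncertainty - threshold)))
    budget = round(min_candidates + gate * (max_candidates - min_candidates))
    return min(budget, len(similarities))  # never ask for more docs than exist


flat_budget = candidate_budget([0.5] * 100)             # no clear winner
peaked_budget = candidate_budget([10.0] + [0.0] * 99)   # one dominant document
```

A flat distribution pushes the budget toward the upper bound, while a single dominant candidate lets the gate trim down to the lower bound.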


Multi-tenancy & namespaces

NeuralCache now supports lightweight logical isolation using a namespace header:

X-NeuralCache-Namespace: tenantA

If omitted, the default namespace is used. Narrative + pheromone feedback effects do not bleed across namespaces. See MULTITENANCY.md for deeper design notes.

| Setting | Purpose | Default |
| --- | --- | --- |
| NEURALCACHE_NAMESPACE_HEADER | Header key to read namespace | X-NeuralCache-Namespace |
| NEURALCACHE_DEFAULT_NAMESPACE | Fallback namespace when header missing | default |
| NEURALCACHE_NAMESPACE_PATTERN | Validation regex (400 on mismatch) | ^[a-zA-Z0-9_.-]{1,64}$ |
| NEURALCACHE_MAX_NAMESPACES | Optional cap on total in-memory namespaces (including default); LRU evicts oldest non-default when exceeded | unset |
| NEURALCACHE_NAMESPACE_EVICTION_POLICY | Eviction strategy (currently only lru) | lru |
| NEURALCACHE_METRICS_NAMESPACE_LABEL | If true, adds namespace label to rerank metrics families | false |
| NEURALCACHE_NAMESPACED_PERSISTENCE | If true, per-namespace narrative + pheromone JSON files are used | false |
| NEURALCACHE_NARRATIVE_STORE_TEMPLATE | Template for per-namespace narrative file | narrative.{namespace}.json |
| NEURALCACHE_PHEROMONE_STORE_TEMPLATE | Template for per-namespace pheromone file | pheromones.{namespace}.json |

Invalid namespaces return a standardized error envelope:

    {
      "error": {
        "code": "BAD_REQUEST",
        "message": "Invalid namespace",
        "detail": null
      }
    }
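
The header lookup, default fallback, and pattern check described above can be sketched in a few lines. The function name resolve_namespace is hypothetical, and the case-insensitive header lookup is an assumption about typical HTTP handling; the header name, default namespace, regex, and envelope are the documented defaults.

```python
import re

NAMESPACE_HEADER = "X-NeuralCache-Namespace"
DEFAULT_NAMESPACE = "default"
NAMESPACE_PATTERN = re.compile(r"^[a-zA-Z0-9_.-]{1,64}$")


def resolve_namespace(headers):
    """Return (namespace, error_envelope); exactly one of the two is None."""
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get(NAMESPACE_HEADER.lower())
    if raw is None:
        return DEFAULT_NAMESPACE, None
    if not NAMESPACE_PATTERN.fullmatch(raw):
        envelope = {
            "error": {
                "code": "BAD_REQUEST",
                "message": "Invalid namespace",
                "detail": None,
            }
        }
        return None, envelope  # a server would send this with HTTP 400
    return raw, None


ok_namespace, ok_error = resolve_namespace({"X-NeuralCache-Namespace": "tenantA"})
bad_namespace, bad_error = resolve_namespace({"X-NeuralCache-Namespace": "tenant A!"})
```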

Standardized error envelopes

All errors (including validation) resolve to a stable shape documented in docs/ERROR_ENVELOPES.md:

    {
      "error": {
        "code": "VALIDATION_ERROR",
        "message": "Validation failed",
        "detail": [
          {"loc": ["body", "query"], "msg": "Field required"}
        ]
      }
    }

Common codes: BAD_REQUEST, UNAUTHORIZED, NOT_FOUND, ENTITY_TOO_LARGE, VALIDATION_ERROR, RATE_LIMITED, INTERNAL_ERROR.
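
Because the envelope shape is stable, a client can turn it into a typed exception. The sketch below is a client-side convenience, not something shipped in the package; NeuralCacheError and raise_for_envelope are hypothetical names, and falling back to INTERNAL_ERROR when no code is present is an assumption.

```python
class NeuralCacheError(Exception):
    """Client-side exception carrying the fields of an error envelope."""

    def __init__(self, code, message, detail=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.detail = detail


def raise_for_envelope(body):
    """Raise NeuralCacheError if the decoded JSON body is an error envelope."""
    error = body.get("error") if isinstance(body, dict) else None
    if isinstance(error, dict):
        raise NeuralCacheError(
            error.get("code", "INTERNAL_ERROR"),
            error.get("message", ""),
            error.get("detail"),
        )
    return body  # not an error: hand the payload back unchanged


validation_body = {
    "error": {
        "code": "VALIDATION_ERROR",
        "message": "Validation failed",
        "detail": [{"loc": ["body", "query"], "msg": "Field required"}],
    }
}
try:
    raise_for_envelope(validation_body)
    caught = None
except NeuralCacheError as exc:
    caught = exc
```

Branching on caught.code keeps retry logic (for example on RATE_LIMITED) separate from hard failures such as VALIDATION_ERROR.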


Privacy & data handling

A concise operator playbook for data classification, retention, and namespace isolation is available in PRIVACY.md. Before production, review both PRIVACY.md and SECURITY.md and set appropriate retention and auth settings.


Configuration essentials

| Env var | Purpose | Default |
| --- | --- | --- |
| NEURALCACHE_WEIGHT_DENSE | Weight on dense similarity | 1.0 |
| NEURALCACHE_WEIGHT_NARRATIVE | Weight on narrative memory | 0.6 |
| NEURALCACHE_WEIGHT_PHEROMONE | Weight on pheromone signal | 0.3 |
| NEURALCACHE_MAX_DOCUMENTS | Safety cap on rerank set size | 128 |
| NEURALCACHE_MAX_TEXT_LENGTH | Hard limit on document length (characters) | 8192 |
| NEURALCACHE_STORAGE_DIR | Where SQLite + JSON state is stored | storage/ |
| NEURALCACHE_STORAGE_PERSISTENCE_ENABLED | Disable to keep narrative + pheromones in-memory only | true |
| NEURALCACHE_STORAGE_RETENTION_DAYS | Days before old state is purged on boot (supports SQLite + JSON) | unset |
| NEURALCACHE_STORAGE_RETENTION_SWEEP_INTERVAL_S | Interval (seconds) for background retention sweeper (0 disables) | 0 |
| NEURALCACHE_STORAGE_RETENTION_SWEEP_ON_START | Run a purge cycle synchronously at startup when true | true |
| NEURALCACHE_GATING_MODE | Cognitive gate mode (off, auto, on) | auto |
| NEURALCACHE_GATING_THRESHOLD | Uncertainty threshold for trimming | 0.45 |
| NEURALCACHE_GATING_MIN_CANDIDATES | Lower bound for rerank candidates | 8 |
| NEURALCACHE_GATING_MAX_CANDIDATES | Upper bound for rerank candidates | 48 |
| NEURALCACHE_GATING_TEMPERATURE | Softmax temperature when estimating entropy | 1.0 |
| NEURALCACHE_DETERMINISTIC | Force deterministic reranks (seed RNG, disable exploration) | false |
| NEURALCACHE_DETERMINISTIC_SEED | Seed used when deterministic mode is enabled | 1337 |
| NEURALCACHE_EPSILON | Override ε-greedy exploration rate (0-1). Ignored when deterministic. | unset |
| NEURALCACHE_MMR_LAMBDA_DEFAULT | Default MMR lambda when request omits/nulls mmr_lambda | 0.5 |
| NEURALCACHE_NAMESPACE_HEADER | Header key to read namespace | X-NeuralCache-Namespace |
| NEURALCACHE_DEFAULT_NAMESPACE | Fallback namespace when header missing | default |
| NEURALCACHE_NAMESPACE_PATTERN | Validation regex (400 on mismatch) | ^[a-zA-Z0-9_.-]{1,64}$ |

Adjust everything via .env, environment variables, or direct Settings(...) instantiation. NEURALCACHE_EPSILON (when set) takes precedence over the epsilon_greedy setting unless deterministic mode is active. NEURALCACHE_MMR_LAMBDA_DEFAULT supplies fallback diversity weighting when omitted.
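
The precedence rules in the paragraph above can be written down as a small sketch. The function names are hypothetical, and two details are assumptions rather than documented behaviour: that deterministic mode resolves epsilon to 0.0, and that out-of-range values are clamped to the 0..1 interval.

```python
def epsilon_from_env(environ):
    """Read NEURALCACHE_EPSILON from a mapping; return None when unset or empty."""
    raw = environ.get("NEURALCACHE_EPSILON")
    return float(raw) if raw not in (None, "") else None


def effective_epsilon(settings_epsilon, env_epsilon=None, deterministic=False):
    """Env override beats the epsilon_greedy setting; deterministic mode beats both."""
    if deterministic:
        return 0.0  # exploration disabled (assumed to mean epsilon = 0)
    value = settings_epsilon if env_epsilon is None else env_epsilon
    return min(1.0, max(0.0, value))


def effective_mmr_lambda(request_lambda=None, default_lambda=0.5):
    """Use the request value when present, else the configured default; clamp to 0..1."""
    value = default_lambda if request_lambda is None else request_lambda
    return min(1.0, max(0.0, value))


env_override = epsilon_from_env({"NEURALCACHE_EPSILON": "0.3"})
resolved = effective_epsilon(settings_epsilon=0.1, env_epsilon=env_override)
```

Passing os.environ to epsilon_from_env reads the real process environment; a plain dict is used here so the example has no side effects.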

Persistence happens automatically using SQLite (or JSON fallback) so narrative and pheromone stores survive restarts. Point NEURALCACHE_STORAGE_DIR at shared storage for multi-worker deployments, or import SQLiteState directly if you need to wire the persistence layer into an existing app container. Under the hood the SQLite state:

  • enables WAL mode with synchronous=NORMAL so multiple workers can read while a writer appends.
  • tracks a metadata row with the current schema version (SQLiteState.schema_version()), raising if a newer schema is encountered so upgrades can run explicit migrations before boot.
  • stores pheromone exposures and timestamps so retention/evaporation policies can prune long-lived records.
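
The first two points can be reproduced with the standard sqlite3 module. This is a sketch of the general technique and not the SQLiteState implementation: the table layout, the SUPPORTED_SCHEMA_VERSION constant, and the open_state helper are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile

SUPPORTED_SCHEMA_VERSION = 1  # hypothetical version understood by this code


def open_state(path):
    """Open a SQLite file in WAL mode and refuse schemas newer than we support."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers are not blocked by a writer
    conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs; commonly paired with WAL
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    row = conn.execute(
        "SELECT value FROM metadata WHERE key = 'schema_version'"
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO metadata (key, value) VALUES ('schema_version', ?)",
            (str(SUPPORTED_SCHEMA_VERSION),),
        )
        conn.commit()
    elif int(row[0]) > SUPPORTED_SCHEMA_VERSION:
        conn.close()
        raise RuntimeError(f"schema version {row[0]} is newer than supported")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_state(os.path.join(tmp, "state.db"))
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()
```

Failing fast on a newer schema keeps an older worker from writing rows in a layout it does not understand.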

Evaluation: prove the uplift

We ship scripts/eval_context_use.py to measure Context-Use@K on any JSONL dataset (query, docs, answer). It can compare a baseline retriever with a NeuralCache-powered candidate. Install the neuralcache[ops] extra to pull in the requests dependency used by the script and Prometheus exporters in one go.

Want to stress-test gating specifically? Run scripts/eval_gating.py to generate a synthetic A/B comparison between the entropy-driven gate and a control configuration. The script logs summaries to stdout and writes a CSV artifact you can pull into spreadsheets or dashboards.

    python scripts/eval_context_use.py \
      --api http://localhost:8080 \
      --data data/sample_rag.jsonl \
      --out reports/neuralcache_eval.csv \
      --top-k 5

    # Optional: compare against another API host
    python scripts/eval_context_use.py \
      --api http://localhost:8000 --data data/sample_rag.jsonl \
      --compare-api http://localhost:8080 --out reports/compare.csv

Example output (toy dataset):

Eval complete in 4.82s | Baseline Context-Use@5: 9/20 | NeuralCache: 13/20

Use the generated CSV to inspect which queries improved, regressions, and latency statistics.
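
This README does not spell out how the script decides that a context was used, so the following is only one plausible lexical proxy, offered to make the hits/total output format concrete. The token-overlap rule, the min_overlap threshold, and the function names are assumptions and should not be read as the definition used by scripts/eval_context_use.py.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_used(answer, doc_text, min_overlap=0.5):
    """Assumed proxy: the doc counts as used if it covers enough answer tokens."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return False
    shared = answer_tokens & tokens(doc_text)
    return len(shared) / len(answer_tokens) >= min_overlap


def context_use_at_k(examples, k):
    """examples: iterable of (answer, ranked_doc_texts). Returns (hits, total)."""
    hits = total = 0
    for answer, ranked_docs in examples:
        total += 1
        if any(context_used(answer, text) for text in ranked_docs[:k]):
            hits += 1
    return hits, total


examples = [
    ("indirect coordination",
     ["Vector DBs store embeddings.", "Stigmergy is indirect coordination."]),
    ("pheromone decay", ["Nothing relevant here."]),
]
at_1 = context_use_at_k(examples, k=1)
at_2 = context_use_at_k(examples, k=2)
```

Comparing the tuple for a baseline ranking with the tuple for a reranked list gives the same kind of side-by-side count shown in the example output.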

Sample datasets

We ship a small, neutral illustrative dataset at data/sample_eval.jsonl (5 queries) covering:

  • Stigmergy concept recall
  • MMR rationale
  • ε-greedy exploration purpose
  • Pheromone decay motivation
  • Narrative memory function

Each line contains:

{"query":"...","docs": [{"id":"d1","text":"..."},...],"answer":"..."}

Run a smoke eval against a locally running API:

    python scripts/eval_context_use.py \
      --api http://127.0.0.1:8080 \
      --data data/sample_eval.jsonl \
      --out reports/sample_eval.csv \
      --top-k 3

Inspect reports/sample_eval.csv for per-query hits. Extend by appending more JSONL lines that follow the same schema; avoid sensitive data, because this file is published.
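
Before appending new lines, it can help to check them against the schema shown above. The validator below is a hypothetical helper, not part of the repository; it only enforces the keys visible in the sample line (query, docs with id and text, answer).

```python
import json


def validate_eval_line(line):
    """Parse one JSONL line and check it has the query/docs/answer shape."""
    record = json.loads(line)  # raises ValueError (JSONDecodeError) on bad JSON
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")
    for key in ("query", "docs", "answer"):
        if key not in record:
            raise ValueError(f"missing key: {key}")
    if not isinstance(record["docs"], list) or not record["docs"]:
        raise ValueError("docs must be a non-empty list")
    for doc in record["docs"]:
        if not isinstance(doc, dict) or not {"id", "text"} <= doc.keys():
            raise ValueError("each doc needs 'id' and 'text'")
    return record


good_line = (
    '{"query": "What is stigmergy?", '
    '"docs": [{"id": "d1", "text": "Stigmergy is indirect coordination."}], '
    '"answer": "indirect coordination"}'
)
record = validate_eval_line(good_line)
```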


Project layout

    neuralcache/
    ├─ assets/                # Logos, diagrams, and other static media
    ├─ examples/              # Quickstart notebooks and scripts
    ├─ scripts/               # Evaluation + operational tooling
    ├─ src/neuralcache/
    │  ├─ api/                # FastAPI app exposing REST endpoints
    │  ├─ adapters/           # LangChain + LlamaIndex integrations
    │  ├─ metrics/            # Context-Use@K helpers & Prometheus hooks
    │  ├─ gating.py           # Cognitive gating heuristics
    │  ├─ narrative.py        # Narrative memory tracker
    │  ├─ pheromone.py        # Pheromone store with decay/exposure logic
    │  ├─ rerank.py           # Core reranking orchestrator
    │  └─ config.py           # Pydantic Settings (env + .env aware)
    ├─ tests/                 # Pytest suite (unit + adapter sanity)
    └─ .github/workflows/     # CI, lint, release, docker, code scanning

Metrics & observability

  • /metrics exposes Prometheus counters for request volume, success rate, and Context-Use@K proxy. Install the neuralcache[ops] extra (bundles prometheus-client) and run the Plus API for an out-of-the-box scrape target.
  • Structured logging (via rich + standard logging) shows rerank decisions with scores.
  • Extend telemetry by dropping in OpenTelemetry exporters or shipping events to your own observability stack.

Roadmap

  • ✅ SQLite persistence (drop-in)
  • ✅ Batch /rerank endpoint
  • ✅ LangChain + LlamaIndex adapters
  • ✅ Namespace eviction (LRU)
  • ✅ Namespaced persistence (optional JSON templates)
  • ✅ Metrics namespace labeling (opt-in)
  • ☐ Semantic Context-Use@K metric
  • ☐ Prometheus/OpenTelemetry exporters
  • ☐ Optional Rust / Numba core for hot loops

Have ideas? Open an issue or grab a ticket.


Contributing & community

    pip install -e .[dev,test]
    pre-commit install
    ruff check && mypy && pytest --cov=neuralcache --cov-report=term-missing

  • Look for good first issues.
  • Add test coverage for user-visible changes.
  • Coverage gate currently enforces >=89%. We'll continue to ratchet this upward as core adaptive components gain additional tests (latest uplift added namespace isolation, eviction, namespaced persistence, metrics namespace labeling, narrative purge stale, CR empty candidate fallback, encoder unknown-backend warning, rate limiting & API auth envelopes, batch gating debug, malformed envelopes, retention sweeper, pheromone purge, gating overrides, epsilon override, and narrative resize/skip branches).

Namespace eviction

Set NEURALCACHE_MAX_NAMESPACES to constrain memory growth in multi-tenant scenarios (edge cases where thousands of low-traffic tenants appear). When the cap is reached, the least recently used non-default namespace is evicted (policy lru). The default namespace is never evicted. Access updates recency automatically.
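
The eviction behaviour described here maps naturally onto an ordered dictionary. The class below is a minimal sketch under that assumption; NamespaceRegistry and its methods are hypothetical names, and protecting the namespace currently being accessed from eviction is a design choice of this example rather than documented behaviour.

```python
from collections import OrderedDict


class NamespaceRegistry:
    """LRU-capped map of namespace -> state; the default namespace never leaves."""

    def __init__(self, max_namespaces=None, default="default"):
        self.max_namespaces = max_namespaces  # None means unlimited
        self.default = default
        self._states = OrderedDict({default: {}})

    def get(self, namespace):
        """Return the state for a namespace, creating it and refreshing recency."""
        if namespace not in self._states:
            self._states[namespace] = {}
        self._states.move_to_end(namespace)  # most recently used goes last
        self._evict(protect=namespace)
        return self._states[namespace]

    def _evict(self, protect):
        if self.max_namespaces is None:
            return
        while len(self._states) > self.max_namespaces:
            # Oldest entry first; skip the default and the one being accessed.
            victim = next(
                (n for n in self._states if n not in (self.default, protect)), None
            )
            if victim is None:
                break
            del self._states[victim]

    def namespaces(self):
        """Namespaces from least to most recently used."""
        return list(self._states)


registry = NamespaceRegistry(max_namespaces=3)
for name in ("a", "b", "a", "c"):  # touching "a" again makes "b" the oldest
    registry.get(name)
remaining = registry.namespaces()
```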

Metrics namespace labeling

Opt-in via NEURALCACHE_METRICS_NAMESPACE_LABEL=true to export parallel Prometheus metrics with a namespace label. Useful for per-tenant latency SLOs and request volume dashboards. When disabled, metrics remain cardinality-safe for large tenant counts.

Namespaced persistence

Enable NEURALCACHE_NAMESPACED_PERSISTENCE=true to write per-namespace narrative + pheromone JSON stores using the templates:

    NEURALCACHE_NARRATIVE_STORE_TEMPLATE=narrative.{namespace}.json
    NEURALCACHE_PHEROMONE_STORE_TEMPLATE=pheromones.{namespace}.json

This allows selective archival or scrubbing of a single tenant’s adaptive state. SQLite mode continues to provide shared durable state; the namespaced JSON layer is most useful when running the lightweight default (non-SQLite) persistence path or when you want filesystem-level isolation.
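
As a rough illustration of how the templates expand, the sketch below formats both defaults for one tenant. It assumes the files live directly under the storage directory and that the {namespace} placeholder is filled with str.format; the helper name namespaced_paths is hypothetical.

```python
from pathlib import Path

NARRATIVE_TEMPLATE = "narrative.{namespace}.json"
PHEROMONE_TEMPLATE = "pheromones.{namespace}.json"


def namespaced_paths(storage_dir, namespace):
    """Return (narrative_path, pheromone_path) for one namespace."""
    base = Path(storage_dir)
    return (
        base / NARRATIVE_TEMPLATE.format(namespace=namespace),
        base / PHEROMONE_TEMPLATE.format(namespace=namespace),
    )


narrative_path, pheromone_path = namespaced_paths("storage", "tenantA")
```

Scrubbing a single tenant then comes down to deleting or archiving those two files. The default namespace pattern allows no path separators, so a validated namespace cannot point outside the storage directory.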

  • PRs with docs, demos, and eval improvements are extra appreciated.

Optionally, join the discussion in #neuralcache on Discord (coming soon; watch this space).


Upgrading

0.3.2

Release 0.3.2 introduces multi-tenant operational features. All changes are backward compatible; existing deployments that do nothing will behave exactly as before.

Key additions:

  • Namespace cap & eviction: set NEURALCACHE_MAX_NAMESPACES (with policy NEURALCACHE_NAMESPACE_EVICTION_POLICY=lru) to bound memory; default is unlimited.
  • Namespaced persistence: opt-in with NEURALCACHE_NAMESPACED_PERSISTENCE=true to emit per-namespace JSON state files (templates overrideable with NEURALCACHE_NARRATIVE_STORE_TEMPLATE / NEURALCACHE_PHEROMONE_STORE_TEMPLATE).
  • Metrics namespace labeling: enable NEURALCACHE_METRICS_NAMESPACE_LABEL=true to expose parallel Prometheus metric families with a namespace label. Leave false to avoid high-cardinality metrics.
  • Version constant bumped to 0.3.2 (neuralcache.__version__).

No breaking schema migrations were required. SQLite schema version unchanged. If you previously relied on the absence of eviction, simply leave NEURALCACHE_MAX_NAMESPACES unset (or remove it) and behavior matches 0.3.1.

Upgrading checklist

  1. Bump dependency: pip install --upgrade neuralcache.
  2. (Optional) Export per-tenant metrics: set NEURALCACHE_METRICS_NAMESPACE_LABEL=true (assess Prometheus cardinality first).
  3. (Optional) Constrain namespace memory: set NEURALCACHE_MAX_NAMESPACES=<cap>.
  4. (Optional) Enable namespaced JSON persistence: NEURALCACHE_NAMESPACED_PERSISTENCE=true (ensure filesystem ACLs align with privacy expectations).
  5. Restart your API workers; confirm /metrics and rerank endpoints behave as expected.

Future versions will continue to maintain stability for existing Settings fields; newly added fields default to safe inactive behavior unless explicitly enabled.


License

Apache-2.0. The NeuralCache reranker is open source; the broader Cognitive Tetrad engine remains proprietary.


Automation details

Need to replicate our CI? The workflow templates are listed below.

.github/workflows/ci.yml (lint, type-check, test)

    name: CI
    on:
      pull_request:
      push:
        branches: [ main ]
    jobs:
      ci:
        runs-on: ubuntu-latest
        strategy:
          matrix:
            python-version: ["3.11", "3.12"]
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: ${{ matrix.python-version }}
          - uses: actions/cache@v4
            with:
              path: ~/.cache/pip
              key: pip-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('pyproject.toml') }}
              restore-keys: pip-${{ runner.os }}-${{ matrix.python-version }}-
          - name: Install
            run: |
              python -m pip install --upgrade pip
              pip install -e .[dev,test]
          - name: Ruff (lint + format check)
            run: ruff check .
          - name: Type-check (mypy)
            run: mypy src
          - name: Pytest
            run: pytest -q --maxfail=1 --disable-warnings --cov=neuralcache --cov-report=xml
          - name: Upload coverage artifact
            uses: actions/upload-artifact@v4
            with:
              name: coverage-xml
              path: coverage.xml

.github/workflows/lint.yml (pre-commit)

    name: Lint
    on:
      pull_request:
      push:
        branches: [ main ]
    jobs:
      precommit:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.11"
          - name: Install
            run: |
              python -m pip install --upgrade pip
              pip install -e .[dev]
          - name: Run pre-commit
            run: pre-commit run --all-files

.github/workflows/tests.yml (scheduled coverage)

    name: Tests
    on:
      workflow_dispatch:
      schedule:
        - cron: "0 7 * * *"  # daily @ 07:00 UTC
    jobs:
      tests:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.11"
          - name: Install
            run: |
              python -m pip install --upgrade pip
              pip install -e .[test]
          - name: Pytest
            run: pytest -q --maxfail=1 --disable-warnings --cov=neuralcache --cov-report=xml

.github/workflows/release.yml (PyPI publish)

    name: Release
    on:
      push:
        tags:
          - "v*.*.*"
    jobs:
      pypi:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.11"
          - name: Build sdist & wheel
            run: |
              python -m pip install --upgrade pip build
              python -m build
          - name: Publish to PyPI
            uses: pypa/gh-action-pypi-publish@release/v1
            with:
              password: ${{ secrets.PYPI_API_TOKEN }}

.github/workflows/docker.yml (GHCR images)

    name: Docker
    on:
      push:
        branches: [ main ]
        tags:
          - "v*.*.*"
    jobs:
      docker:
        runs-on: ubuntu-latest
        permissions:
          contents: read
          packages: write
        steps:
          - uses: actions/checkout@v4
          - name: Login to GHCR
            uses: docker/login-action@v3
            with:
              registry: ghcr.io
              username: ${{ github.actor }}
              password: ${{ secrets.GITHUB_TOKEN }}
          - name: Extract version
            id: meta
            run: |
              REF="${GITHUB_REF##*/}"
              if [[ "$GITHUB_REF" == refs/tags/* ]]; then
                echo "tag=$REF" >> $GITHUB_OUTPUT
              else
                echo "tag=latest" >> $GITHUB_OUTPUT
              fi
          - name: Build & push
            uses: docker/build-push-action@v6
            with:
              context: .
              push: true
              tags: |
                ghcr.io/${{ github.repository_owner }}/neuralcache:${{ steps.meta.outputs.tag }}
                ghcr.io/${{ github.repository_owner }}/neuralcache:latest

.github/dependabot.yml

    version: 2
    updates:
      - package-ecosystem: "pip"
        directory: "/"
        schedule:
          interval: "weekly"
        open-pull-requests-limit: 5
      - package-ecosystem: "github-actions"
        directory: "/"
        schedule:
          interval: "weekly"

Support the project

If NeuralCache saves you time, consider starring the repo or sharing a demo with the community. Contributions, bug reports, and evaluation results are the best way to help the project grow.


Debug envelope fields

Each /rerank response may include a debug object (structure stable across patch releases). For the standardized error envelope format see docs/ERROR_ENVELOPES.md.

| Field | Description |
| --- | --- |
| gating | Cognitive gating decision telemetry (mode, uncertainty, counts) |
| deterministic | True when deterministic mode is active (exploration disabled) |
| epsilon_used | Effective epsilon after env override & deterministic suppression |
| mmr_lambda_used | Final MMR lambda applied (request value clamped or default) |

Use this for audit logs or offline evaluation dashboards. Avoid parsing internal sub-keys of gating beyond those documented; future versions may extend it.
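
For audit logging, one defensive approach is to copy only the four documented fields and ignore everything else. The sketch assumes the object sits under a top-level debug key of the decoded response; the sample response and its values are made up for illustration and are not real server output.

```python
DOCUMENTED_DEBUG_FIELDS = ("gating", "deterministic", "epsilon_used", "mmr_lambda_used")


def audit_record(response):
    """Extract only the documented debug fields; missing ones become None."""
    debug = response.get("debug") or {}
    return {field: debug.get(field) for field in DOCUMENTED_DEBUG_FIELDS}


# Hypothetical response shape, for illustration only.
sample_response = {
    "debug": {
        "gating": {"mode": "auto"},
        "deterministic": False,
        "epsilon_used": 0.1,
        "mmr_lambda_used": 0.5,
        "some_future_field": "ignored",
    }
}
record_for_log = audit_record(sample_response)
```

Treating the gating value as an opaque blob, as done here, follows the advice above not to depend on its undocumented sub-keys.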
