
Commit ba7ed3f


feat(eval): add sample evaluation dataset, README docs, smoke CI job

1 parent 26c9116, commit ba7ed3f

4 files changed: 65 additions, 1 deletion


`.github/workflows/ci.yml`

Lines changed: 30 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -51,3 +51,33 @@ jobs:`
`51`	`51`	`        with:`
`52`	`52`	`          name: coverage-xml-${{ matrix.python-version }}-${{ github.run_id }}`
`53`	`53`	`          path: coverage.xml`
	`54`	`+`
	`55`	`+  smoke-eval:`
	`56`	`+    runs-on: ubuntu-latest`
	`57`	`+    needs: ci`
	`58`	`+    steps:`
	`59`	`+      - uses: actions/checkout@v4`
	`60`	`+      - name: Setup Python`
	`61`	`+        uses: actions/setup-python@v5`
	`62`	`+        with:`
	`63`	`+          python-version: '3.11'`
	`64`	`+      - name: Install (runtime + ops extra)`
	`65`	`+        run: |`
	`66`	`+          python -m pip install --upgrade "pip>=25.1.1,<25.2" setuptools`
	`67`	`+          pip install .[ops]`
	`68`	`+      - name: Launch API (background)`
	`69`	`+        run: |`
	`70`	`+          uvicorn neuralcache.api.server:app --port 8080 &`
	`71`	`+          echo $! > api.pid`
	`72`	`+          sleep 2`
	`73`	`+      - name: Smoke eval (sample dataset)`
	`74`	`+        run: |`
	`75`	`+          python scripts/eval_context_use.py --api http://127.0.0.1:8080 --data data/sample_eval.jsonl --out sample_eval.csv --top-k 3`
	`76`	`+          test -f sample_eval.csv`
	`77`	`+      - name: Show eval summary`
	`78`	`+        if: always()`
	`79`	`+        run: head -n 10 sample_eval.csv || true`
	`80`	`+      - name: Stop API`
	`81`	`+        if: always()`
	`82`	`+        run: |`
	`83`	`+          kill $(cat api.pid) || true`
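
The "Launch API (background)" step above relies on a fixed `sleep 2` before the eval runs. One way to make that wait more robust is to poll the port until it accepts connections. The following is only a sketch and not part of this commit: the helper name `wait_for_port` and its default timeout are assumptions, and it checks only that the TCP port is open, not that the app has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not part of the NeuralCache repo.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. connection refused)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A workflow step could call `wait_for_port("127.0.0.1", 8080)` and fail the job when it returns `False`, so a slow server start shows up as a clear error rather than a failed eval.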

`CHANGELOG.md`

Lines changed: 5 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -8,17 +8,22 @@ All notable changes to this project will be documented in this file. The format`
`8`	`8`	`- SECURITY.md with disclosure process, supported versions guidance, and dependency audit integration (pip-audit)`
`9`	`9`	- Optional extras separation clarified: `adapters`, `ops`, `embeddings` now explicitly documented in README
`10`	`10`	`- README summary paragraph positioning NeuralCache as opinionated, stateful reranking layer`
	`11`	`+- Structured API success + error envelopes with standardized error codes (see README)`
	`12`	+- Scoring pipeline specification (`docs/SCORING_MODEL.md`) detailing dense, narrative, pheromone, MMR, and exploration fusion formula
	`13`	+- Sample evaluation dataset (`data/sample_eval.jsonl`) for quick Context-Use@K smoke tests
`11`	`14`
`12`	`15`	`### Changed`
`13`	`16`	`- Build bootstrap hardening: enforce safe pip range excluding 25.2 (GHSA-4xh5-x5gv-qwph) and updated setuptools minimum`
`14`	`17`	`- CI workflow now upgrades both pip and setuptools prior to installation; pip-audit runs post-install`
	`18`	`+- Broadened dependency ranges (FastAPI, Starlette, Uvicorn) to reduce upgrade churn while retaining safety bounds`
`15`	`19`
`16`	`20`	`### Fixed`
`17`	`21`	- Prevent accidental install of vulnerable pip version 25.2 by pinning `<25.2`
`18`	`22`	`- Improved alignment between PyPI metadata and README summary`
`19`	`23`
`20`	`24`	`### Notes`
`21`	`25`	`- Next minor (0.4.x) will introduce formal API versioning header, deterministic mode, standardized error envelopes, and scoring spec documentation (tracked in issues)`
	`26`	`+(Structured envelopes + scoring spec landed early in 0.3.1; versioning header + deterministic mode still pending.)`
`22`	`27`
`23`	`28`	`### Added`
`24`	`29`	`- Cognitive gating layer with entropy-aware candidate trimming and configuration overrides`

`README.md`

Lines changed: 25 additions & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -234,7 +234,31 @@ Use the generated CSV to inspect which queries improved, regressions, and latenc`
`234`	`234`
`235`	`235`	`### Sample datasets`
`236`	`236`
`237`		-The previous synthetic Context-Use demo is being redesigned. We’ll publish a refreshed walkthrough once the new baseline is validated. In the meantime you can point `scripts/eval_context_use.py` at your own JSONL datasets to measure uplift between any two rerankers.
	`237`	+We ship a small, neutral illustrative dataset at `data/sample_eval.jsonl` (5 queries) covering:
	`238`	`+`
	`239`	`+- Stigmergy concept recall`
	`240`	`+- MMR rationale`
	`241`	`+- ε-greedy exploration purpose`
	`242`	`+- Pheromone decay motivation`
	`243`	`+- Narrative memory function`
	`244`	`+`
	`245`	`+Each line contains:`
	`246`	`+`
	`247`	+```json
	`248`	`+{"query":"...","docs": [{"id":"d1","text":"..."},...],"answer":"..."}`
	`249`	+```
	`250`	`+`
	`251`	`+Run a smoke eval against a locally running API:`
	`252`	`+`
	`253`	+```bash
	`254`	`+python scripts/eval_context_use.py \`
	`255`	`+ --api http://127.0.0.1:8080 \`
	`256`	`+ --data data/sample_eval.jsonl \`
	`257`	`+ --out reports/sample_eval.csv \`
	`258`	`+ --top-k 3`
	`259`	+```
	`260`	`+`
	`261`	+Inspect `reports/sample_eval.csv` for per-query hits. Extend by appending more JSONL lines that follow the same schema; avoid sensitive data, as this file is published.
`238`	`262`
`239`	`263`	`---`
`240`	`264`
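
The README text above suggests extending the dataset by appending lines that follow the same schema. A minimal sketch of doing that programmatically is shown here. The helper names `make_record` and `append_record` are hypothetical and are not provided by the repository; only the `query`/`docs`/`answer` layout comes from the README.

```python
import json
from pathlib import Path


def make_record(query, docs, answer):
    """Build one dataset record; docs is an iterable of (id, text) pairs.

    Hypothetical helper; mirrors the schema shown in the README diff.
    """
    return {
        "query": query,
        "docs": [{"id": doc_id, "text": text} for doc_id, text in docs],
        "answer": answer,
    }


def append_record(path, record):
    """Append the record to the file as a single JSON line."""
    # json.dumps never emits raw newlines, so each record stays on one line.
    line = json.dumps(record, ensure_ascii=False)
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```

This assumes the existing file already ends with a newline; if it does not, the appended record would be joined onto the last line and break the JSONL format.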

`data/sample_eval.jsonl`

Lines changed: 5 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,5 @@`
	`1`	`+{"query":"What is stigmergy?","docs": [{"id":"d1","text":"Stigmergy is indirect coordination mediated by environment-modifying traces."}, {"id":"d2","text":"Reinforcement learning uses rewards to optimize actions."}, {"id":"d3","text":"Indirect coordination systems appear in social insects like ants."}, {"id":"d4","text":"A vector database stores embeddings for similarity search."}],"answer":"Stigmergy is indirect coordination via environmental traces."}`
	`2`	`+{"query":"Why apply MMR in reranking?","docs": [{"id":"d5","text":"MMR reduces redundancy by balancing relevance and diversity."}, {"id":"d6","text":"Cosine similarity measures angle between embedding vectors."}, {"id":"d7","text":"Diversity in retrieved contexts prevents repeated evidence."}],"answer":"To reduce redundancy; MMR balances relevance with diversity."}`
	`3`	`+{"query":"What does epsilon-greedy do?","docs": [{"id":"d8","text":"Epsilon-greedy occasionally explores random candidates."}, {"id":"d9","text":"Exploration helps avoid local optima in adaptive systems."}, {"id":"d10","text":"Embedding models map text into vector space."}],"answer":"It occasionally explores alternatives (epsilon-greedy) to avoid local optima."}`
	`4`	`+{"query":"Why decay pheromones?","docs": [{"id":"d11","text":"Pheromone decay prevents stale documents from dominating."}, {"id":"d12","text":"EMA narrative memory emphasizes recent successful context windows."}, {"id":"d13","text":"Caching strategies can include LRU or LFU."}],"answer":"Decay prevents stale documents from dominating future reranks."}`
	`5`	`+{"query":"What is narrative memory?","docs": [{"id":"d14","text":"Narrative memory tracks successful multi-turn context windows with an EMA."}, {"id":"d15","text":"Transformers use self-attention to model token dependencies."}, {"id":"d16","text":"SQLite provides transactional persistence for small apps."}],"answer":"It tracks successful multi-turn context windows (EMA narrative memory)."}`
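
Before pointing `scripts/eval_context_use.py` at an extended dataset, it can help to check that every line matches the shape used by the five records above. The sketch below is an illustration under assumptions: the function names are hypothetical, and the rules (non-empty `query` and `answer` strings, a non-empty `docs` list with string `id` and `text`) are inferred from the sample rows rather than taken from any validation the eval script is known to perform.

```python
import json


def validate_line(line: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list if it looks valid)."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(rec, dict):
        return ["record is not a JSON object"]

    problems = []
    for key in ("query", "answer"):
        value = rec.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{key}' must be a non-empty string")

    docs = rec.get("docs")
    if not isinstance(docs, list) or not docs:
        problems.append("'docs' must be a non-empty list")
        return problems
    for i, doc in enumerate(docs):
        ok = (
            isinstance(doc, dict)
            and isinstance(doc.get("id"), str)
            and isinstance(doc.get("text"), str)
        )
        if not ok:
            problems.append(f"docs[{i}] needs string 'id' and 'text'")
    return problems


def validate_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to their problems; blank lines are skipped."""
    report = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue
            problems = validate_line(line)
            if problems:
                report[lineno] = problems
    return report
```

An empty dictionary from `validate_file("data/sample_eval.jsonl")` would indicate that every record has the expected fields.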

0 commit comments
