Commit ba7ed3f

feat(eval): add sample evaluation dataset, README docs, smoke CI job
1 parent 26c9116, commit ba7ed3f

4 files changed: +65 -1 lines changed

.github/workflows/ci.yml

Lines changed: 30 additions & 0 deletions
@@ -51,3 +51,33 @@ jobs:
         with:
           name: coverage-xml-${{ matrix.python-version }}-${{ github.run_id }}
           path: coverage.xml
+
+  smoke-eval:
+    runs-on: ubuntu-latest
+    needs: ci
+    steps:
+      - uses: actions/checkout@v4
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      - name: Install (runtime + ops extra)
+        run: |
+          python -m pip install --upgrade "pip>=25.1.1,<25.2" setuptools
+          pip install .[ops]
+      - name: Launch API (background)
+        run: |
+          uvicorn neuralcache.api.server:app --port 8080 &
+          echo $! > api.pid
+          sleep 2
+      - name: Smoke eval (sample dataset)
+        run: |
+          python scripts/eval_context_use.py --api http://127.0.0.1:8080 --data data/sample_eval.jsonl --out sample_eval.csv --top-k 3
+          test -f sample_eval.csv
+      - name: Show eval summary
+        if: always()
+        run: head -n 10 sample_eval.csv || true
+      - name: Stop API
+        if: always()
+        run: |
+          kill $(cat api.pid) || true
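
Review note: the `Launch API (background)` step relies on a fixed `sleep 2` before the eval script connects, which may be too short on a slow runner. One possible alternative is to poll the port until it accepts connections. The following is a minimal sketch only; `wait_for_port` is a hypothetical helper that is not part of this commit, and the timeout values are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper (not in the commit): polls instead of sleeping a fixed time.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A workflow step could call `wait_for_port("127.0.0.1", 8080)` and fail the job if it returns `False`. Note that this only confirms the socket is open, not that the application has finished its own startup work.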

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -8,17 +8,22 @@ All notable changes to this project will be documented in this file. The format
 - SECURITY.md with disclosure process, supported versions guidance, and dependency audit integration (pip-audit)
 - Optional extras separation clarified: `adapters`, `ops`, `embeddings` now explicitly documented in README
 - README summary paragraph positioning NeuralCache as opinionated, stateful reranking layer
+- Structured API success + error envelopes with standardized error codes (see README)
+- Scoring pipeline specification (`docs/SCORING_MODEL.md`) detailing dense, narrative, pheromone, MMR, and exploration fusion formula
+- Sample evaluation dataset (`data/sample_eval.jsonl`) for quick Context-Use@K smoke tests
 
 ### Changed
 - Build bootstrap hardening: enforce safe pip range excluding 25.2 (GHSA-4xh5-x5gv-qwph) and updated setuptools minimum
 - CI workflow now upgrades both pip and setuptools prior to installation; pip-audit runs post-install
+- Broadened dependency ranges (FastAPI, Starlette, Uvicorn) to reduce upgrade churn while retaining safety bounds
 
 ### Fixed
 - Prevent accidental install of vulnerable pip version 25.2 by pinning `<25.2`
 - Improved alignment between PyPI metadata and README summary
 
 ### Notes
 - Next minor (0.4.x) will introduce formal API versioning header, deterministic mode, standardized error envelopes, and scoring spec documentation (tracked in issues)
+(Structured envelopes + scoring spec landed early in 0.3.1; versioning header + deterministic mode still pending.)
 
 ### Added
 - Cognitive gating layer with entropy-aware candidate trimming and configuration overrides

README.md

Lines changed: 25 additions & 1 deletion
@@ -234,7 +234,31 @@ Use the generated CSV to inspect which queries improved, regressions, and latenc
 
 ### Sample datasets
 
-The previous synthetic Context-Use demo is being redesigned. We’ll publish a refreshed walkthrough once the new baseline is validated. In the meantime you can point `scripts/eval_context_use.py` at your own JSONL datasets to measure uplift between any two rerankers.
+We ship a small, neutral illustrative dataset at `data/sample_eval.jsonl` (5 queries) covering:
+
+- Stigmergy concept recall
+- MMR rationale
+- ε-greedy exploration purpose
+- Pheromone decay motivation
+- Narrative memory function
+
+Each line contains:
+
+```json
+{"query": "...", "docs": [{"id": "d1", "text": "..."}, ...], "answer": "..."}
+```
+
+Run a smoke eval against a locally running API:
+
+```bash
+python scripts/eval_context_use.py \
+  --api http://127.0.0.1:8080 \
+  --data data/sample_eval.jsonl \
+  --out reports/sample_eval.csv \
+  --top-k 3
+```
+
+Inspect `reports/sample_eval.csv` for per-query hits. Extend by appending more JSONL lines that follow the same schema; avoid sensitive data—this file is published.
 
 ---
 
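
Review note: the README change invites readers to append more JSONL lines that follow the same `query` / `docs` / `answer` shape. A small check along the following lines could catch malformed rows before they reach the eval script. This is a sketch under assumptions: `validate_record` is a hypothetical helper that does not appear in the commit, and the non-empty and unique-`id` rules are stricter than anything the README states.

```python
import json


def validate_record(line: str) -> dict:
    """Parse one JSONL line and check the query/docs/answer shape shown in the README.

    Hypothetical helper; the non-empty and unique-id checks are assumptions.
    """
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")
    for key in ("query", "answer"):
        if not isinstance(record.get(key), str) or not record[key]:
            raise ValueError(f"'{key}' must be a non-empty string")
    docs = record.get("docs")
    if not isinstance(docs, list) or not docs:
        raise ValueError("'docs' must be a non-empty list")
    seen_ids = set()
    for doc in docs:
        if not isinstance(doc, dict):
            raise ValueError("each doc must be a JSON object")
        if not isinstance(doc.get("id"), str) or not isinstance(doc.get("text"), str):
            raise ValueError("each doc needs string 'id' and 'text' fields")
        if doc["id"] in seen_ids:
            raise ValueError(f"duplicate doc id: {doc['id']}")
        seen_ids.add(doc["id"])
    return record
```

Calling `validate_record` on every non-blank line of a dataset file, and reporting the line number on failure, would be enough for a pre-commit or CI check.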

data/sample_eval.jsonl

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{"query": "What is stigmergy?", "docs": [{"id": "d1", "text": "Stigmergy is indirect coordination mediated by environment-modifying traces."}, {"id": "d2", "text": "Reinforcement learning uses rewards to optimize actions."}, {"id": "d3", "text": "Indirect coordination systems appear in social insects like ants."}, {"id": "d4", "text": "A vector database stores embeddings for similarity search."}], "answer": "Stigmergy is indirect coordination via environmental traces."}
+{"query": "Why apply MMR in reranking?", "docs": [{"id": "d5", "text": "MMR reduces redundancy by balancing relevance and diversity."}, {"id": "d6", "text": "Cosine similarity measures angle between embedding vectors."}, {"id": "d7", "text": "Diversity in retrieved contexts prevents repeated evidence."}], "answer": "To reduce redundancy; MMR balances relevance with diversity."}
+{"query": "What does epsilon-greedy do?", "docs": [{"id": "d8", "text": "Epsilon-greedy occasionally explores random candidates."}, {"id": "d9", "text": "Exploration helps avoid local optima in adaptive systems."}, {"id": "d10", "text": "Embedding models map text into vector space."}], "answer": "It occasionally explores alternatives (epsilon-greedy) to avoid local optima."}
+{"query": "Why decay pheromones?", "docs": [{"id": "d11", "text": "Pheromone decay prevents stale documents from dominating."}, {"id": "d12", "text": "EMA narrative memory emphasizes recent successful context windows."}, {"id": "d13", "text": "Caching strategies can include LRU or LFU."}], "answer": "Decay prevents stale documents from dominating future reranks."}
+{"query": "What is narrative memory?", "docs": [{"id": "d14", "text": "Narrative memory tracks successful multi-turn context windows with an EMA."}, {"id": "d15", "text": "Transformers use self-attention to model token dependencies."}, {"id": "d16", "text": "SQLite provides transactional persistence for small apps."}], "answer": "It tracks successful multi-turn context windows (EMA narrative memory)."}
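
Review note: `scripts/eval_context_use.py` is not part of this diff, so how it decides a per-query hit is not shown here. Purely as an illustration of how the `answer` field could be compared against the top-k document texts, the sketch below computes a token-overlap ratio. It is an assumed stand-in, not the project's Context-Use@K metric; `tokens` and `answer_overlap` are hypothetical names.

```python
import re


def tokens(text: str) -> set:
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_overlap(answer: str, ranked_doc_texts: list, k: int) -> float:
    """Fraction of answer tokens that appear in the first k ranked document texts.

    Illustrative proxy only; the real scoring in the eval script may differ.
    """
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = set()
    for text in ranked_doc_texts[:k]:
        context_tokens |= tokens(text)
    return len(answer_tokens & context_tokens) / len(answer_tokens)
```

With a reranker that places the most relevant document first, a row such as the stigmergy example would be expected to score higher at small `k` than with the distractor documents ranked first, which is the kind of difference a smoke test can surface.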
