- Notifications
You must be signed in to change notification settings - Fork132
Implement semantic map + filter#209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Implement semantic map + filter#209
Conversation
ccbd704 to0be21c7Comparelotus/models/lm.py Outdated
| # Only sleep if the batch was faster than the required time | ||
| # Each request should be spaced by min_interval_per_request | ||
| required_time_for_batch=len(sub_batch)*min_interval_per_request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
repeated lines
lotus/sem_ops/sem_filter.py Outdated
| ifsafe_mode: | ||
| estimated_total_calls=len(docs) | ||
| estimated_total_cost=sum(model.count_tokens(input)forinputininputs) | ||
| estimated_total_calls=len(docs)*max(1,n_sample) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
we should raise a value error if n_sample is <= 0 since that does not make sense
lotus/sampling_utils.py Outdated
| return"yes"ifmean>=0.5else"no" | ||
| defapply_ensemble(strategy:Optional[str],all_outputs:List[List[str]],*,default_yes:bool)->List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Its better to have Enums for the stategy similar to WebSearchCorpus. That makes the values standardized and its clear to user what are the available choices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Also I think strategy should not be allowed to be None. It does not make sense to apply "None" ensemble. If needed we can add "pick first" strategy and make it default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Further, the strategies will probably be different for different sem operators (majority vote is not making sense to me for sem_map since it is just asking LM for a generation and can have lot of variations). We should validate this as well
lotus/sampling_utils.py Outdated
| defresample_batch( | ||
| call_once:Callable[[bool,Optional[float],bool,str],Any], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
we can make this general to accept any input and use args+kwargs in resample_batch as arguments. Is there a reason to limit it?
lotus/sem_ops/sem_map.py Outdated
| # call model | ||
| lm_output:LMOutput=model(inputs,progress_bar_desc=progress_bar_desc,**model_kwargs) | ||
| def_call_once(_want_logprobs,temp,spb,desc): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
why are there arguments in this function? We can just capture it from parent since its defined inside the sem_* function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
unless there is something changing with different runs, there is no need for args
lotus/sampling_utils.py Outdated
| iflen(run)!=batch: | ||
| raiseValueError(f"Inconsistent batch sizes across runs: expected{batch}, got{len(run)}") | ||
| chosen:Optional[List[Any]]=logs[0]if (want_logprobsandlen(logs)>0)elseNone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We should decide the logprobs based on which output was finally chosen.
README.md Outdated
| url = {https://doi.org/10.14778/3749646.3749685}, | ||
| } | ||
| @article{patel2024semanticoperators, | ||
| @misc{patel2024semanticoperators, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
why is this changed?
lotus/sampling_utils.py Outdated
| return"yes"ifmean>=0.5else"no" | ||
| defapply_ensemble(strategy:Optional[str],all_outputs:List[List[str]],*,default_yes:bool)->List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why are extra parameters allowed when they are not used?
lotus/sampling_utils.py Outdated
| fromtypingimportAny,Callable,List,Optional,Sequence,Tuple | ||
| def_norm(x:Any)->str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
this is redoing the job of postprocessors. If needed, postprocessors should be applied before emsembling
| ) | ||
| assertisinstance(filtered,pd.DataFrame) | ||
| assert"Text"infiltered.columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
we should add assertions checking the output columns are as expected
Uh oh!
There was an error while loading.Please reload this page.
Purpose
This PR implements the requested enhancements for sem_filter, addressing [Issue#200]Specifically:
n_sample parameter for repeated sampling at test time.
Ensembling strategies (e.g., majority_vote, average_prob/mean_prob) to stabilize outputs.
Stores per-run outputs:
preds,parsed_outputs,explanations,logprobsPlaced before "SemanticFilterOutput" to avoid forward reference issues
New .github/tests e2e tests using Ollama for df.sem_filter().
Summary of Changes
New: lotus/sampling_utils.py
resample_batch(call_once, n_sample, *args, **kwargs) – generic, args/kwargs forwarded (no fixed signature).
apply_ensemble(strategy: EnsembleStrategy, ...) – collapses [n_sample][batch] → [batch].
Supports PICK_FIRST, MAJORITY (generic hashable), and MEAN_BOOL (booleans only).
Updated:
Adds n_sample, ensemble: EnsembleStrategy, temperature.
Validates n_sample > 0.
Uses resample_batch → postprocess each run → apply_ensemble (returns labels + chosen indices).
Raw outputs, explanations, and (optionally) logprobs are taken from the chosen run per item.
Preserves existing cascade logic; helper model path also honors sampling and ensembling for its own pass.
Returns complete rollout data in
SemanticFilterOutput.raw_outputs_all_runsPreserves existing cascade logic; helper model path also honors sampling and ensembling
Test Plan
I tested the new sem_filter and sem_map features by running the full pytest suite with the required environment variables set.Ran .github/tests/test_sem_filter.py to validate df.sem_filter() on simple sentiment filtering and the sampling path (n_sample, temperature).
Behavioral checks:
Ensured outputs exist, are strings, and negative examples are excluded in filtering.
Confirmed tolerant assertions .
Verified new parameters (n_sample, ensemble, temperature) integrate without breaking defaults.
Static checks:
Passed ruff, ruff-format, and mypy locally.
Test Results
All tests for sem_filter and sem_map passed locally, confirming correct behavior and stability(Optional) Documentation Update
Type of Change
Checklist
BEFORE SUBMITTING, PLEASE READhttps://github.com/lotus-data/lotus/blob/main/CONTRIBUTING.md
anything written below this line will be removed by GitHub Actions