forstepinsteps:mover=step.change.canonical.moverifnotstep.change.acceptedandisinstance(mover,my_mover_type):trials=paths.SampleSet(step.change.canonical.trials)ifmy_ensembleintrials.ensembles:sample=trials[my_ensemble]# do something with that sample

Here the hypothetical user is checking rejected trials of a specific mover and a specific ensemble. It's some messy code already a few levels of indent in, and with all thatstep.change stuff they need to remember. (And this is even using the hack that you can get a sample for a specific ensemble by converting the list of samples to aSampleSet.)

[EDIT: I updated the above to actually reflect the same behavior as below -- it's a little more complicated]

Here's how it works with the filters I'm introducing here:

filt=SampleFilter(step_filter=RejectedSteps&CanonicalMover(my_mover_type),sample_selector=TrialSamples,sample_filter=Ensemble(my_ensemble),)forsampleinfilt(steps,flatten=True):# do something with that sample

I guess it doesn't save too much vertical space, but it does save some characters, and I think it will be much easier to learn/teach this approach.

More usage in the notebook added in this PR. For easy use cases, it's really nice (for step in RejectedSteps(steps))

For discussion: does it make more sense stylistically to be:

RejectedSteps&CanonicalMover(my_mover_type)

rejected_steps&canonical_mover(my_mover_type)

Formally, I think PEP8 would advise

REJECTED_STEPS&CanonicalMover(my_mover_type)

since rejected states is actually a constant instance, while canonical mover really is a class. But consistency in user experience with filters is more important than foolish consistency with PEP8.

Start to analysis filters

9144729

dwhswenson added the feature label

May 15, 2021

Copy link

codecovbot commentedMay 15, 2021•
edited
Loading

Codecov Report

Base:81.75% // Head:81.65% // Decreases project coverage by-0.10%⚠️

Coverage data is based on head(96495a3) compared to base(02102f3).
Patch coverage: 73.73% of modified lines in pull request are covered.

❗ Current head96495a3 differs from pull request most recent head999bea2. Consider uploading reports for the commit999bea2 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@##           master    #1026      +/-   ##==========================================- Coverage   81.75%   81.65%   -0.11%==========================================  Files         142      143       +1       Lines       15612    15810     +198     ==========================================+ Hits        12764    12910     +146- Misses       2848     2900      +52

Impacted Files	Coverage Δ
openpathsampling/analysis/filters.py	`73.73% <73.73%> (ø)`

Help us with your feedback. Take ten seconds to tell ushow you rate us. Have a feature suggestion?Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment?Let us know in this issue.

Copy link

MemberAuthor

dwhswenson commentedMay 15, 2021•
edited
Loading

I started to play with using the__matmul__ special method to create a simple syntax. I'd appreciate feedback on whether this syntax is understandable or if it is too obscure. The filter described above would be defined as:

filt=TrialSamples @ (RejectedSteps&CanonicalMover(my_mover_type),Ensemble(my_ensemble))

Eventually I'd like to remove the tuple and make it so a user can use& where the comma is, but that would require dealing with the distributive property of boolean algebra, which I'm not terribly interested in implementing right now. We'd probably continue to support this older notation, since it's easy to check whether the argument is a 2-tuple (and the implementation is trivial).

A few simpler examples of this syntax:

TrialSamples @ (RejectedSteps,None)# all rejected samplesActiveSamples @ (None,Ensemble(my_ensemble))# samples repeated with correct weight for this ensembleTrialSamples @ (RejectedSteps,Ensemble(my_ensemble))# rejected samples for this ensemble

EDIT: Not liking theNone above, the following syntaxes will also be available. Would appreciate feedback on the choice ofusing --where seemed like another option:

TrialSamples @ (RejectedSteps,AllSamples)# all rejected samplesTrialSamples.using(RejectedSteps)# all rejected samplesTrialSamples.using(step_filter=RejectedSteps)# all rejected samplesActiveSamples @ (AllSteps,Ensemble(my_ensemble))# samples repeated with correct weight for this ensembleActiveSamples.using(sample_filter=Ensemble(my_ensemble))# samples repeated with correct weight for this ensembleTrialSamples.using(RejectedSteps,Ensemble(my_ensemble))# rejected samples for this ensemble

EDIT: What those simpler examples would look like with the target syntax (probably not near future, though):

TrialSamples @RejectedSteps# all rejected samplesActiveSamples @Ensemble(my_ensemble)# samples repeated with correct weight for this ensembleTrialSamples @ (RejectedSteps&Ensemble(my_ensemble))# rejected samples for this ensemble

Parentheses are necessary for logical combinations because@ has higher precedence than&,|, etc..

Copy link

Member

sroet commentedMay 17, 2021

For discussion: does it make more sense stylistically to be:

RejectedSteps & CanonicalMover(my_mover_type)

Definetly go for the double CamelCase ( you can even makeRejectedSteps a class that just selects the rejected steps (with subclasses for the different types of rejections)

About the__matmul__ override, I think that that might be a bit obscure to be user facing. But it might be a welcome addition inside core OPS. This is mainly due to the strict tuple order for the filters, which could probably lead to strange user errors if done incorrectly.

One question about the output order. This seems to always return an outer loop over the steps with an inner loop over the ensembles, is there any ways for a user to invert those two loops with this syntax?

Add shooting points; misc updates to filters

61d2570

Copy link

MemberAuthor

dwhswenson commentedMay 17, 2021•
edited
Loading

Thanks for the feedback!

you can even makeRejectedSteps a class that just selects the rejected steps (with subclasses for the different types of rejections)

I originally had it as a class, but I didn't like the parentheses afterRejectedSteps(). Actually, I really like the way this is currently implemented:

RejectedSteps=StepFilterCondition(lambdastep:notstep.change.accepted,name="RejectedSteps")

(where thename is required for error strings; I may make that optional) This means you can easily create any weird custom filter. For example, let's say you wanted to select all the steps wherecv(shooting_point) < 0.3:

defmy_condition(step):# did I mention that I added an extractor called `ShootingPoints`?returncv(ShootingPoints(step))<0.3CustomFilter=StepFilterCondition(my_condition,name="Custom")

You might need to use that filter asShootingSteps & CustomFilter, but still -- I think that's pretty accessible even to novice users.

About the__matmul__ override, I think that that might be a bit obscure to be user facing. But it might be a welcome addition inside core OPS. This is mainly due to the strict tuple order for the filters, which could probably lead to strange user errors if done incorrectly.

Fair enough. Any thoughts on theusing (or possiblywhere) syntax? Example:

TrialSamples.using(RejectedSteps,Ensemble(my_ensemble))

This still requires both arguments, but with this you have all the standard introspection into Python functions that IDEs use for help. I'm starting to think this should be the preferred syntax -- I think it reads pretty well. Internally, it's just a wrapper around creating theSampleFilter.

One question about the output order. This seems to always return an outer loop over the steps with an inner loop over the ensembles, is there any ways for a user to invert those two loops with this syntax?

No, and I'm not sure that's even possible. The inner loop is overSamples; the samples come from inside the step, and so aren't defined until you have picked a step.

I'm guessing you're thinking about something like standard TIS analysis, where you want to group trajectories by the ensemble? We have thesteps_to_weighted_trajectories method for that. I could add agroup_by_ensemble method to return a dict mapping each ensemble to list of samples (probably list here, not generator, because of user confusion when the generator is exahausted):

# samples with unique accepted trajectories, sorted by ensemble# assuming all trajectory generation is in shooting steps (no shifting, etc.)unique_traj_samples=TrialSamples.using(AcceptedSteps&ShootingSteps)group_by_ensemble(unique_traj_samples(steps))

Did you have a use case in mind other than a group-by operation?

Copy link

Member

sroet commentedMay 17, 2021

Fair enough. Any thoughts on theusing (or possiblywhere) syntax?

With the python introspection, this seems like a good idea. I have a slight preference forusing as it does not remind me ofnp.where (which requires explicit conditions).

Did you have a use case in mind other than a group-by operation?

Nope, the group-by was the one I was thinking about.

Copy link

MemberAuthor

dwhswenson commentedMay 18, 2021

I've been thinking more about the nested levels of filtering, and it seems that my thinking was a little off before. First, some terminology I've been using (would love to know if there's more official terms for these things):

condition filter:AcceptedSteps,RejectedSteps,Replica, etc are condition filters, i.e., filters that are based on some condition.AcceptedSteps is a step condition filter,Replica is a sample condition filter.
extractor:TrialSamples andActiveSamples are extractors. Extractors extract some piece of information from aMCStep object.
extractor-filter:SampleFilter (or anything made byextractor.using()) is an extractor-filter. An extractor-filter combines an extractor and a filter (surprising, right?).

When I started on this, the extractors I was working on all returned lists of samples, so it made sense to add a second layer of filtering for the samples. But for other extractors, it doesn't make sense:ShootingPoints doesn't return samples -- you might want a step condition filter calledTrialReplica orTrialEnsemble, which checks if any of the trials involved the given replica/ensemble. However, the filters to extract the right sample from a list of samples aren't relevant toShootingPoints, because there's no list of samples.

So for a general extractor (ShootingPoints will probably be a good example), it will be:

extractor.using(step_filter)

For extractors that return lists of objects, here are the possible syntaxes:

# (A) current implementationextractor.using(step_filter,secondary_filter)TrialSamples.using(RejectedSteps,Replica(0))# (B) Add a `with_filter` method to extractor-filters that sets the secondary filterextractor.using(step_filter).with_filter(secondary_filter)TrialSamples.using(RejectedSteps).with_filter(Replica(0))

In (B) I called itwith_filter instead of justfilter as a reminder that this is actually setting an attribute of the object being created.

Any thoughts? I think I'm leaning toward (B) since that should make it possible to make only one class forExtractorFilter. Another nice thing about (B) is that I'd feel more comfortable makingsteps an optional parameter tousing:

# allows thisforstepinTrialSamples.using(RejectedSteps,steps=steps):    ...# instead offorstepinTrialSamples.using(RejectedSteps)(steps):# the )( often confuses new users    ...# orcustom_filter=TrialSamples.using(RejectedSteps)# gets rid of )( but adds lineforstepincustom_filter(steps):    ...# it even allowsforstepinTrialSamples.using(steps=steps):# default filter is AllSteps    ...

Copy link

Member

sroet commentedMay 18, 2021

I've been thinking more about the nested levels of filtering, and it seems that my thinking was a little off before. First, some terminology I've been using (would love to know if there's more official terms for these things):

condition filter: AcceptedSteps, RejectedSteps, Replica, etc are condition filters, i.e., filters that are based on some condition. AcceptedSteps is a step condition filter, Replica is a sample condition filter.
extractor: TrialSamples and ActiveSamples are extractors. Extractors extract some piece of information from a MCStep object.
extractor-filter: SampleFilter (or anything made by extractor.using()) is an extractor-filter. An extractor-filter combines an extractor and a filter (surprising, right?).

So the quickest somewhat similar reference is frompandas.Dataframes.
If you mimic their case, then yourcondition filter is alocalisation based on a boolean mask on the condition of certain columns inside certain rows. While yourextractor would befiltering based on a singlecolumn/row name.

Now, this is just my personal impression as someone who likes to think of this data as a DataFrame with a row for every MCStep, which can be pivoted arbitrarily (like apandas.DataFrame).

I personally have no preference between A and B, but as you describe it I do think B would lead to cleaner user facing code

Copy link

MemberAuthor

dwhswenson commentedMay 18, 2021

I think of it as more of filter and apply, in the pandas sense, because technically it can apply any function to the step. But I think that, to users, "apply" might seem like an odd term here. So I guess I’ll stick with extract!

I also just noticed that the syntax usingwith_filter (which I've now implemented) reminds me a bit of the syntax for list comprehensions. Consider:

# to get the ensemble for a replica at each MC stepfollow_replica=ActiveSamples.using(AllSteps).with_filter(Replica(0))history= [sample.ensembleforsampleinfollow_replica(storage.steps)]# write things in the order of:# what you want, what to iterate over, the optional if-statement to filter

updates to analysis filters

0b8aa53

Copy link

MemberAuthor

dwhswenson commentedJun 15, 2021

@sroet (and@gyorgy-hantal or anyone else who might be interested) : Before I write tests for this stuff, could you take a look at thedocumentation notebook and give any feedback on this implementation?

I am thinking I'll probably convert these tosnake_case, instead ofCamelCase. Reasoning is by comparison with custom implementations. Why is itlist(AcceptedSteps(steps)) in cell 6 butlist(custom_filter(steps)) in cell 11? WhyShootingPoints.using(AllSteps) buttiming.using(AllSteps) in cell 21? Butcustom_filter andtiming definitely feel correct.

One point I'd like advice on: We have "extractors," which extract some piece of information from the step, and "step filters", which select certain steps based on some property. For the most part, there isn't much namespace collision (RejectedSteps is a step filter;ShootingPoints is an extractor). However, the canonical mover is something that you might either want to extractor filter over. I think filtering is more common, so my leaning is to useCanonicalMover as the name for the step filter. But what, then, should the extractor be called? (See the notebook under "Predefined objects" for names of other extractors and filters.)

I'm thinking about importing all of these into theopenpathsampling.analysis namespace, rather than requiring something likefrom openpathsampling.analysis import filters. I'm open to other suggestions about namespace organization here.

Copy link

Member

sroet commentedJun 16, 2021

Before I write tests for this stuff, could you take a look at the documentation notebook and give any feedback on this implementation?

Seems fine by me.

But custom_filter and timing definitely feel correct.

I do agree that that feels correct, so +1 for snake_case

But what, then, should the extractor be called?

Is there a reason we could not just switch behaviour based on call input type? Is there any overlap? (a call with astring returns afilter and any other should return as an extractor)

I'm open to other suggestions about namespace organization here.

I am in favor for importing on top namespace, but do have split files forfilter andextractor (just so one could do auto-completion onopenpathsampling.analysis.filters oropenpathsampling.analysis.extractors if you want a quick list for available filters/extractors)

Add start to tests

9527d4b

dwhswenson added this to the1.6 milestone

Jun 18, 2021

dwhswenson added2 commits

July 5, 2021 10:10

cleanup of STAGES; several new filters

3ce4a88

snake_case

b91cb2e

Copy link

MemberAuthor

dwhswenson commentedJul 5, 2021

Is there a reason we could not just switch behaviour based on call input type? Is there any overlap? (a call with astring returns afilter and any other should return as an extractor)

While I think that's theoretically possible, I think it might be really confusing. It isn't as simple as a factory that returns different objects depending on the setup:

# as a filterfilt1=canonical_mover('shooting')shooting_steps=list(filt1(storage.steps))filt2=canonical_mover(scheme['shooting'][0])shooter0_steps=list(filt2(storage.steps))# as an extractormover=canonical_mover(step)ef=canonical_mover.using(rejected_steps)n_rejected_per_mover=collections.Counter(ef(storage.steps))

Really getting this to work would require either weird flags setting the type of object it was acting as, or some metaclass work.

So far, allExtractors are plural. This makes sense when used withusing to get an extract-filter, although it's a little weird in cases like this (or evenshooting_points). So one possibility is to use that as a distinction:

sp=shooting_points(step)# already doing thismover=canonical_movers(step)ef=canonical_movers.using(rejected_steps)...

It risks subtle errors (most likely incorrectly usingcanonical_mover(step) -- using filter in place of extractor). However, since the filter's__init__ already needs to handle some type checking, it's easy enough to add a very clear error message if it receives anMCStep.

I am in favor for importing on top namespace, but do have split files forfilter andextractor (just so one could do auto-completion onopenpathsampling.analysis.filters oropenpathsampling.analysis.extractors if you want a quick list for available filters/extractors)

I was thinking the same more for coding purposes (shorter files; clearer where to find each thing). I hadn't considered the autocompletion, but that trick gives another good reason to do that split!

Copy link

Member

sroet commentedJul 6, 2021

So far, all Extractors are plural. This makes sense when used with using to get an extract-filter, although it's a little weird in cases like this (or even shooting_points). So one possibility is to use that as a distinction:
sp = shooting_points(step)  # already doing thismover = canonical_movers(step)ef = canonical_movers.using(rejected_steps)...
It risks subtle errors (most likely incorrectly using canonical_mover(step) -- using filter in place of extractor). However, since the filter'sinit already needs to handle some type checking, it's easy enough to add a very clear error message if it receives an MCStep.

This probably the most reasonable solution.
Please make sure this convention of Extractors being plural makes it somewhere in the official documentation.

dwhswenson mentioned this pull request

Jul 6, 2021

OpenPathSampling 1.5#1031

Merged

dwhswenson added2 commits

July 6, 2021 09:50

add canonical_movers extractor; cleanup

766d81f

Add start to docs on extractors and filters

d17a04a

Copy link

MemberAuthor

dwhswenson commentedJul 6, 2021

Please make sure this convention of Extractors being plural makes it somewhere in the official documentation.

Started on docs for this (ind17a04a). There will be a prominent note discussing this at the beginning of the section on extractors.

dwhswenson added6 commits

July 6, 2021 12:47

get rid of f-strings

8ab5a21

outline tests we need to do

31df2f2

Merge branch 'master' of github.com:openpathsampling/openpathsampling…

0ef0885

… into analysis_filters

start on test fixtures

68ab949

Merge remote-tracking branch 'upstream/master' into analysis_filters

a9456ea

split off testing structure (separate PR)

9a3cb08

dwhswenson mentioned this pull request

Sep 10, 2021

New analysis testing utilities#1068

Merged

dwhswenson added16 commits

September 12, 2021 13:39

Merge branch 'test-analysis-utils' into tmp_test_analysis_filters

d8bd07e

tests for step filter

376b876

... well that was easy

steps toward testing

4108ecd

Merge branch 'master' of github.com:openpathsampling/openpathsampling…

6ecd22b

… into analysis_filters

Merge branch 'tmp_test_analysis_filters' into analysis_filters

cefb597

update test_filters for better analysis test utils

0fca990

fix Py2, more tests

298f73b

tests for bad input to canonical_mover

ec7cb4b

test_extractor

8aaa92a

Merge remote-tracking branch 'upstream/master' into analysis_filters

96495a3

steps toward better docs

20d12f8

starting on tests for shooting analysis

80bffc4

Parts of testing for filters

a6c9cf5

steps toward tests

a31b017

Merge remote-tracking branch 'upstream/master' into analysis_filters

a2bd212

Merge remote-tracking branch 'upstream/master' into analysis_filters

999bea2

Labels

feature

Movatterモバイル変換

[WIP] Analysis Filters#1026

Are you sure you want to change the base?

[WIP] Analysis Filters#1026

Uh oh!

Conversation

dwhswenson commentedMay 15, 2021• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

codecovbot commentedMay 15, 2021• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Codecov Report

Uh oh!

dwhswenson commentedMay 15, 2021• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

sroet commentedMay 17, 2021

Uh oh!

dwhswenson commentedMay 17, 2021• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

sroet commentedMay 17, 2021

Uh oh!

dwhswenson commentedMay 18, 2021

Uh oh!

sroet commentedMay 18, 2021

Uh oh!

dwhswenson commentedMay 18, 2021

Uh oh!

dwhswenson commentedJun 15, 2021

Uh oh!

sroet commentedJun 16, 2021

Uh oh!

dwhswenson commentedJul 5, 2021

Uh oh!

sroet commentedJul 6, 2021

Uh oh!

dwhswenson commentedJul 6, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dwhswenson commentedMay 15, 2021•
edited
Loading

codecovbot commentedMay 15, 2021•
edited
Loading

dwhswenson commentedMay 15, 2021•
edited
Loading

dwhswenson commentedMay 17, 2021•
edited
Loading