Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[WIP] Analysis Filters#1026

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
dwhswenson wants to merge30 commits intoopenpathsampling:master
base:master
Choose a base branch
Loading
fromdwhswenson:analysis_filters

Conversation

@dwhswenson
Copy link
Member

@dwhswensondwhswenson commentedMay 15, 2021
edited
Loading

This PR introduces analysis filters, an idea I've had for a while but just finally figured out how to implement correctly.

Basically, the idea is that a lot of OPS analysis code looks something like:

forstepinsteps:mover=step.change.canonical.moverifnotstep.change.acceptedandisinstance(mover,my_mover_type):trials=paths.SampleSet(step.change.canonical.trials)ifmy_ensembleintrials.ensembles:sample=trials[my_ensemble]# do something with that sample

Here the hypothetical user is checking rejected trials of a specific mover and a specific ensemble. It's some messy code already a few levels of indent in, and with all thatstep.change stuff they need to remember. (And this is even using the hack that you can get a sample for a specific ensemble by converting the list of samples to aSampleSet.)

[EDIT: I updated the above to actually reflect the same behavior as below -- it's a little more complicated]

Here's how it works with the filters I'm introducing here:

filt=SampleFilter(step_filter=RejectedSteps&CanonicalMover(my_mover_type),sample_selector=TrialSamples,sample_filter=Ensemble(my_ensemble),)forsampleinfilt(steps,flatten=True):# do something with that sample

I guess it doesn't save too much vertical space, but it does save some characters, and I think it will be much easier to learn/teach this approach.

More usage in the notebook added in this PR. For easy use cases, it's really nice (for step in RejectedSteps(steps))


For discussion: does it make more sense stylistically to be:

RejectedSteps&CanonicalMover(my_mover_type)

or

rejected_steps&canonical_mover(my_mover_type)

Formally, I think PEP8 would advise

REJECTED_STEPS&CanonicalMover(my_mover_type)

since rejected states is actually a constant instance, while canonical mover really is a class. But consistency in user experience with filters is more important than foolish consistency with PEP8.

@codecov
Copy link

codecovbot commentedMay 15, 2021
edited
Loading

Codecov Report

Base:81.75% // Head:81.65% // Decreases project coverage by-0.10%⚠️

Coverage data is based on head(96495a3) compared to base(02102f3).
Patch coverage: 73.73% of modified lines in pull request are covered.

❗ Current head96495a3 differs from pull request most recent head999bea2. Consider uploading reports for the commit999bea2 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@##           master    #1026      +/-   ##==========================================- Coverage   81.75%   81.65%   -0.11%==========================================  Files         142      143       +1       Lines       15612    15810     +198     ==========================================+ Hits        12764    12910     +146- Misses       2848     2900      +52
Impacted FilesCoverage Δ
openpathsampling/analysis/filters.py73.73% <73.73%> (ø)

Help us with your feedback. Take ten seconds to tell ushow you rate us. Have a feature suggestion?Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment?Let us know in this issue.

@dwhswenson
Copy link
MemberAuthor

dwhswenson commentedMay 15, 2021
edited
Loading

I started to play with using the__matmul__ special method to create a simple syntax. I'd appreciate feedback on whether this syntax is understandable or if it is too obscure. The filter described above would be defined as:

filt=TrialSamples @ (RejectedSteps&CanonicalMover(my_mover_type),Ensemble(my_ensemble))

Eventually I'd like to remove the tuple and make it so a user can use& where the comma is, but that would require dealing with the distributive property of boolean algebra, which I'm not terribly interested in implementing right now. We'd probably continue to support this older notation, since it's easy to check whether the argument is a 2-tuple (and the implementation is trivial).

A few simpler examples of this syntax:

TrialSamples @ (RejectedSteps,None)# all rejected samplesActiveSamples @ (None,Ensemble(my_ensemble))# samples repeated with correct weight for this ensembleTrialSamples @ (RejectedSteps,Ensemble(my_ensemble))# rejected samples for this ensemble

EDIT: Not liking theNone above, the following syntaxes will also be available. Would appreciate feedback on the choice ofusing --where seemed like another option:

TrialSamples @ (RejectedSteps,AllSamples)# all rejected samplesTrialSamples.using(RejectedSteps)# all rejected samplesTrialSamples.using(step_filter=RejectedSteps)# all rejected samplesActiveSamples @ (AllSteps,Ensemble(my_ensemble))# samples repeated with correct weight for this ensembleActiveSamples.using(sample_filter=Ensemble(my_ensemble))# samples repeated with correct weight for this ensembleTrialSamples.using(RejectedSteps,Ensemble(my_ensemble))# rejected samples for this ensemble

EDIT: What those simpler examples would look like with the target syntax (probably not near future, though):

TrialSamples @RejectedSteps# all rejected samplesActiveSamples @Ensemble(my_ensemble)# samples repeated with correct weight for this ensembleTrialSamples @ (RejectedSteps&Ensemble(my_ensemble))# rejected samples for this ensemble

Parentheses are necessary for logical combinations because@ has higher precedence than&,|, etc..

@sroet
Copy link
Member

For discussion: does it make more sense stylistically to be:

RejectedSteps & CanonicalMover(my_mover_type)

Definetly go for the double CamelCase ( you can even makeRejectedSteps a class that just selects the rejected steps (with subclasses for the different types of rejections)

About the__matmul__ override, I think that that might be a bit obscure to be user facing. But it might be a welcome addition inside core OPS. This is mainly due to the strict tuple order for the filters, which could probably lead to strange user errors if done incorrectly.

One question about the output order. This seems to always return an outer loop over the steps with an inner loop over the ensembles, is there any ways for a user to invert those two loops with this syntax?

@dwhswenson
Copy link
MemberAuthor

dwhswenson commentedMay 17, 2021
edited
Loading

Thanks for the feedback!

you can even makeRejectedSteps a class that just selects the rejected steps (with subclasses for the different types of rejections)

I originally had it as a class, but I didn't like the parentheses afterRejectedSteps(). Actually, I really like the way this is currently implemented:

RejectedSteps=StepFilterCondition(lambdastep:notstep.change.accepted,name="RejectedSteps")

(where thename is required for error strings; I may make that optional) This means you can easily create any weird custom filter. For example, let's say you wanted to select all the steps wherecv(shooting_point) < 0.3:

defmy_condition(step):# did I mention that I added an extractor called `ShootingPoints`?returncv(ShootingPoints(step))<0.3CustomFilter=StepFilterCondition(my_condition,name="Custom")

You might need to use that filter asShootingSteps & CustomFilter, but still -- I think that's pretty accessible even to novice users.

About the__matmul__ override, I think that that might be a bit obscure to be user facing. But it might be a welcome addition inside core OPS. This is mainly due to the strict tuple order for the filters, which could probably lead to strange user errors if done incorrectly.

Fair enough. Any thoughts on theusing (or possiblywhere) syntax? Example:

TrialSamples.using(RejectedSteps,Ensemble(my_ensemble))

This still requires both arguments, but with this you have all the standard introspection into Python functions that IDEs use for help. I'm starting to think this should be the preferred syntax -- I think it reads pretty well. Internally, it's just a wrapper around creating theSampleFilter.

One question about the output order. This seems to always return an outer loop over the steps with an inner loop over the ensembles, is there any ways for a user to invert those two loops with this syntax?

No, and I'm not sure that's even possible. The inner loop is overSamples; the samples come from inside the step, and so aren't defined until you have picked a step.

I'm guessing you're thinking about something like standard TIS analysis, where you want to group trajectories by the ensemble? We have thesteps_to_weighted_trajectories method for that. I could add agroup_by_ensemble method to return a dict mapping each ensemble to list of samples (probably list here, not generator, because of user confusion when the generator is exahausted):

# samples with unique accepted trajectories, sorted by ensemble# assuming all trajectory generation is in shooting steps (no shifting, etc.)unique_traj_samples=TrialSamples.using(AcceptedSteps&ShootingSteps)group_by_ensemble(unique_traj_samples(steps))

Did you have a use case in mind other than a group-by operation?

@sroet
Copy link
Member

Fair enough. Any thoughts on theusing (or possiblywhere) syntax?

With the python introspection, this seems like a good idea. I have a slight preference forusing as it does not remind me ofnp.where (which requires explicit conditions).

Did you have a use case in mind other than a group-by operation?

Nope, the group-by was the one I was thinking about.

@dwhswenson
Copy link
MemberAuthor

I've been thinking more about the nested levels of filtering, and it seems that my thinking was a little off before. First, some terminology I've been using (would love to know if there's more official terms for these things):

  • condition filter:AcceptedSteps,RejectedSteps,Replica, etc are condition filters, i.e., filters that are based on some condition.AcceptedSteps is a step condition filter,Replica is a sample condition filter.
  • extractor:TrialSamples andActiveSamples are extractors. Extractors extract some piece of information from aMCStep object.
  • extractor-filter:SampleFilter (or anything made byextractor.using()) is an extractor-filter. An extractor-filter combines an extractor and a filter (surprising, right?).

When I started on this, the extractors I was working on all returned lists of samples, so it made sense to add a second layer of filtering for the samples. But for other extractors, it doesn't make sense:ShootingPoints doesn't return samples -- you might want a step condition filter calledTrialReplica orTrialEnsemble, which checks if any of the trials involved the given replica/ensemble. However, the filters to extract the right sample from a list of samples aren't relevant toShootingPoints, because there's no list of samples.

So for a general extractor (ShootingPoints will probably be a good example), it will be:

extractor.using(step_filter)

For extractors that return lists of objects, here are the possible syntaxes:

# (A) current implementationextractor.using(step_filter,secondary_filter)TrialSamples.using(RejectedSteps,Replica(0))# (B) Add a `with_filter` method to extractor-filters that sets the secondary filterextractor.using(step_filter).with_filter(secondary_filter)TrialSamples.using(RejectedSteps).with_filter(Replica(0))

In (B) I called itwith_filter instead of justfilter as a reminder that this is actually setting an attribute of the object being created.

Any thoughts? I think I'm leaning toward (B) since that should make it possible to make only one class forExtractorFilter. Another nice thing about (B) is that I'd feel more comfortable makingsteps an optional parameter tousing:

# allows thisforstepinTrialSamples.using(RejectedSteps,steps=steps):    ...# instead offorstepinTrialSamples.using(RejectedSteps)(steps):# the )( often confuses new users    ...# orcustom_filter=TrialSamples.using(RejectedSteps)# gets rid of )( but adds lineforstepincustom_filter(steps):    ...# it even allowsforstepinTrialSamples.using(steps=steps):# default filter is AllSteps    ...

@sroet
Copy link
Member

I've been thinking more about the nested levels of filtering, and it seems that my thinking was a little off before. First, some terminology I've been using (would love to know if there's more official terms for these things):

condition filter: AcceptedSteps, RejectedSteps, Replica, etc are condition filters, i.e., filters that are based on some condition. AcceptedSteps is a step condition filter, Replica is a sample condition filter.
extractor: TrialSamples and ActiveSamples are extractors. Extractors extract some piece of information from a MCStep object.
extractor-filter: SampleFilter (or anything made by extractor.using()) is an extractor-filter. An extractor-filter combines an extractor and a filter (surprising, right?).

So the quickest somewhat similar reference is frompandas.Dataframes.
If you mimic their case, then yourcondition filter is alocalisation based on a boolean mask on the condition of certain columns inside certain rows. While yourextractor would befiltering based on a singlecolumn/row name.

Now, this is just my personal impression as someone who likes to think of this data as a DataFrame with a row for every MCStep, which can be pivoted arbitrarily (like apandas.DataFrame).

I personally have no preference between A and B, but as you describe it I do think B would lead to cleaner user facing code

@dwhswenson
Copy link
MemberAuthor

I think of it as more of filter and apply, in the pandas sense, because technically it can apply any function to the step. But I think that, to users, "apply" might seem like an odd term here. So I guess I’ll stick with extract!

I also just noticed that the syntax usingwith_filter (which I've now implemented) reminds me a bit of the syntax for list comprehensions. Consider:

# to get the ensemble for a replica at each MC stepfollow_replica=ActiveSamples.using(AllSteps).with_filter(Replica(0))history= [sample.ensembleforsampleinfollow_replica(storage.steps)]# write things in the order of:# what you want, what to iterate over, the optional if-statement to filter

@dwhswenson
Copy link
MemberAuthor

@sroet (and@gyorgy-hantal or anyone else who might be interested) : Before I write tests for this stuff, could you take a look at thedocumentation notebook and give any feedback on this implementation?

I am thinking I'll probably convert these tosnake_case, instead ofCamelCase. Reasoning is by comparison with custom implementations. Why is itlist(AcceptedSteps(steps)) in cell 6 butlist(custom_filter(steps)) in cell 11? WhyShootingPoints.using(AllSteps) buttiming.using(AllSteps) in cell 21? Butcustom_filter andtiming definitely feel correct.

One point I'd like advice on: We have "extractors," which extract some piece of information from the step, and "step filters", which select certain steps based on some property. For the most part, there isn't much namespace collision (RejectedSteps is a step filter;ShootingPoints is an extractor). However, the canonical mover is something that you might either want to extractor filter over. I think filtering is more common, so my leaning is to useCanonicalMover as the name for the step filter. But what, then, should the extractor be called? (See the notebook under "Predefined objects" for names of other extractors and filters.)

I'm thinking about importing all of these into theopenpathsampling.analysis namespace, rather than requiring something likefrom openpathsampling.analysis import filters. I'm open to other suggestions about namespace organization here.

@sroet
Copy link
Member

Before I write tests for this stuff, could you take a look at the documentation notebook and give any feedback on this implementation?

Seems fine by me.

But custom_filter and timing definitely feel correct.

I do agree that that feels correct, so +1 for snake_case

But what, then, should the extractor be called?

Is there a reason we could not just switch behaviour based on call input type? Is there any overlap? (a call with astring returns afilter and any other should return as an extractor)

I'm open to other suggestions about namespace organization here.

I am in favor for importing on top namespace, but do have split files forfilter andextractor (just so one could do auto-completion onopenpathsampling.analysis.filters oropenpathsampling.analysis.extractors if you want a quick list for available filters/extractors)

@dwhswensondwhswenson added this to the1.6 milestoneJun 18, 2021
@dwhswenson
Copy link
MemberAuthor

Is there a reason we could not just switch behaviour based on call input type? Is there any overlap? (a call with astring returns afilter and any other should return as an extractor)

While I think that's theoretically possible, I think it might be really confusing. It isn't as simple as a factory that returns different objects depending on the setup:

# as a filterfilt1=canonical_mover('shooting')shooting_steps=list(filt1(storage.steps))filt2=canonical_mover(scheme['shooting'][0])shooter0_steps=list(filt2(storage.steps))# as an extractormover=canonical_mover(step)ef=canonical_mover.using(rejected_steps)n_rejected_per_mover=collections.Counter(ef(storage.steps))

Really getting this to work would require either weird flags setting the type of object it was acting as, or some metaclass work.

So far, allExtractors are plural. This makes sense when used withusing to get an extract-filter, although it's a little weird in cases like this (or evenshooting_points). So one possibility is to use that as a distinction:

sp=shooting_points(step)# already doing thismover=canonical_movers(step)ef=canonical_movers.using(rejected_steps)...

It risks subtle errors (most likely incorrectly usingcanonical_mover(step) -- using filter in place of extractor). However, since the filter's__init__ already needs to handle some type checking, it's easy enough to add a very clear error message if it receives anMCStep.

I am in favor for importing on top namespace, but do have split files forfilter andextractor (just so one could do auto-completion onopenpathsampling.analysis.filters oropenpathsampling.analysis.extractors if you want a quick list for available filters/extractors)

I was thinking the same more for coding purposes (shorter files; clearer where to find each thing). I hadn't considered the autocompletion, but that trick gives another good reason to do that split!

@sroet
Copy link
Member

So far, all Extractors are plural. This makes sense when used with using to get an extract-filter, although it's a little weird in cases like this (or even shooting_points). So one possibility is to use that as a distinction:

sp = shooting_points(step)  # already doing thismover = canonical_movers(step)ef = canonical_movers.using(rejected_steps)...

It risks subtle errors (most likely incorrectly using canonical_mover(step) -- using filter in place of extractor). However, since the filter'sinit already needs to handle some type checking, it's easy enough to add a very clear error message if it receives an MCStep.

This probably the most reasonable solution.
Please make sure this convention of Extractors being plural makes it somewhere in the official documentation.

@dwhswensondwhswenson mentioned this pull requestJul 6, 2021
@dwhswenson
Copy link
MemberAuthor

Please make sure this convention of Extractors being plural makes it somewhere in the official documentation.

Started on docs for this (ind17a04a). There will be a prominent note discussing this at the beginning of the section on extractors.

sroet reacted with thumbs up emoji

... well that was easy
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

Projects

None yet

Milestone

1.6

Development

Successfully merging this pull request may close these issues.

2 participants

@dwhswenson@sroet

[8]ページ先頭

©2009-2025 Movatter.jp