- Notifications
You must be signed in to change notification settings - Fork49
[WIP] Analysis Filters#1026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:master
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
codecovbot commentedMay 15, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Codecov ReportBase:81.75% // Head:81.65% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@## master #1026 +/- ##==========================================- Coverage 81.75% 81.65% -0.11%========================================== Files 142 143 +1 Lines 15612 15810 +198 ==========================================+ Hits 12764 12910 +146- Misses 2848 2900 +52
Help us with your feedback. Take ten seconds to tell ushow you rate us. Have a feature suggestion?Share it here. ☔ View full report at Codecov. |
dwhswenson commentedMay 15, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I started to play with using the filt=TrialSamples @ (RejectedSteps&CanonicalMover(my_mover_type),Ensemble(my_ensemble)) Eventually I'd like to remove the tuple and make it so a user can use A few simpler examples of this syntax: TrialSamples @ (RejectedSteps,None)# all rejected samplesActiveSamples @ (None,Ensemble(my_ensemble))# samples repeated with correct weight for this ensembleTrialSamples @ (RejectedSteps,Ensemble(my_ensemble))# rejected samples for this ensemble EDIT: Not liking the TrialSamples @ (RejectedSteps,AllSamples)# all rejected samplesTrialSamples.using(RejectedSteps)# all rejected samplesTrialSamples.using(step_filter=RejectedSteps)# all rejected samplesActiveSamples @ (AllSteps,Ensemble(my_ensemble))# samples repeated with correct weight for this ensembleActiveSamples.using(sample_filter=Ensemble(my_ensemble))# samples repeated with correct weight for this ensembleTrialSamples.using(RejectedSteps,Ensemble(my_ensemble))# rejected samples for this ensemble EDIT: What those simpler examples would look like with the target syntax (probably not near future, though): TrialSamples @RejectedSteps# all rejected samplesActiveSamples @Ensemble(my_ensemble)# samples repeated with correct weight for this ensembleTrialSamples @ (RejectedSteps&Ensemble(my_ensemble))# rejected samples for this ensemble Parentheses are necessary for logical combinations because |
RejectedSteps & CanonicalMover(my_mover_type) Definetly go for the double CamelCase ( you can even make About the One question about the output order. This seems to always return an outer loop over the steps with an inner loop over the ensembles, is there any ways for a user to invert those two loops with this syntax? |
dwhswenson commentedMay 17, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Thanks for the feedback!
I originally had it as a class, but I didn't like the parentheses after RejectedSteps=StepFilterCondition(lambdastep:notstep.change.accepted,name="RejectedSteps") (where the defmy_condition(step):# did I mention that I added an extractor called `ShootingPoints`?returncv(ShootingPoints(step))<0.3CustomFilter=StepFilterCondition(my_condition,name="Custom") You might need to use that filter as
Fair enough. Any thoughts on the TrialSamples.using(RejectedSteps,Ensemble(my_ensemble)) This still requires both arguments, but with this you have all the standard introspection into Python functions that IDEs use for help. I'm starting to think this should be the preferred syntax -- I think it reads pretty well. Internally, it's just a wrapper around creating the
No, and I'm not sure that's even possible. The inner loop is over I'm guessing you're thinking about something like standard TIS analysis, where you want to group trajectories by the ensemble? We have the # samples with unique accepted trajectories, sorted by ensemble# assuming all trajectory generation is in shooting steps (no shifting, etc.)unique_traj_samples=TrialSamples.using(AcceptedSteps&ShootingSteps)group_by_ensemble(unique_traj_samples(steps)) Did you have a use case in mind other than a group-by operation? |
With the python introspection, this seems like a good idea. I have a slight preference for
Nope, the group-by was the one I was thinking about. |
I've been thinking more about the nested levels of filtering, and it seems that my thinking was a little off before. First, some terminology I've been using (would love to know if there's more official terms for these things):
When I started on this, the extractors I was working on all returned lists of samples, so it made sense to add a second layer of filtering for the samples. But for other extractors, it doesn't make sense: So for a general extractor ( extractor.using(step_filter) For extractors that return lists of objects, here are the possible syntaxes: # (A) current implementationextractor.using(step_filter,secondary_filter)TrialSamples.using(RejectedSteps,Replica(0))# (B) Add a `with_filter` method to extractor-filters that sets the secondary filterextractor.using(step_filter).with_filter(secondary_filter)TrialSamples.using(RejectedSteps).with_filter(Replica(0)) In (B) I called it Any thoughts? I think I'm leaning toward (B) since that should make it possible to make only one class for # allows thisforstepinTrialSamples.using(RejectedSteps,steps=steps): ...# instead offorstepinTrialSamples.using(RejectedSteps)(steps):# the )( often confuses new users ...# orcustom_filter=TrialSamples.using(RejectedSteps)# gets rid of )( but adds lineforstepincustom_filter(steps): ...# it even allowsforstepinTrialSamples.using(steps=steps):# default filter is AllSteps ... |
So the quickest somewhat similar reference is from Now, this is just my personal impression as someone who likes to think of this data as a DataFrame with a row for every MCStep, which can be pivoted arbitrarily (like a I personally have no preference between A and B, but as you describe it I do think B would lead to cleaner user facing code |
I think of it as more of filter and apply, in the pandas sense, because technically it can apply any function to the step. But I think that, to users, "apply" might seem like an odd term here. So I guess I’ll stick with extract! I also just noticed that the syntax using # to get the ensemble for a replica at each MC stepfollow_replica=ActiveSamples.using(AllSteps).with_filter(Replica(0))history= [sample.ensembleforsampleinfollow_replica(storage.steps)]# write things in the order of:# what you want, what to iterate over, the optional if-statement to filter |
@sroet (and@gyorgy-hantal or anyone else who might be interested) : Before I write tests for this stuff, could you take a look at thedocumentation notebook and give any feedback on this implementation? I am thinking I'll probably convert these to One point I'd like advice on: We have "extractors," which extract some piece of information from the step, and "step filters", which select certain steps based on some property. For the most part, there isn't much namespace collision ( I'm thinking about importing all of these into the |
Seems fine by me.
I do agree that that feels correct, so +1 for snake_case
Is there a reason we could not just switch behaviour based on call input type? Is there any overlap? (a call with a
I am in favor for importing on top namespace, but do have split files for |
While I think that's theoretically possible, I think it might be really confusing. It isn't as simple as a factory that returns different objects depending on the setup: # as a filterfilt1=canonical_mover('shooting')shooting_steps=list(filt1(storage.steps))filt2=canonical_mover(scheme['shooting'][0])shooter0_steps=list(filt2(storage.steps))# as an extractormover=canonical_mover(step)ef=canonical_mover.using(rejected_steps)n_rejected_per_mover=collections.Counter(ef(storage.steps)) Really getting this to work would require either weird flags setting the type of object it was acting as, or some metaclass work. So far, all sp=shooting_points(step)# already doing thismover=canonical_movers(step)ef=canonical_movers.using(rejected_steps)... It risks subtle errors (most likely incorrectly using
I was thinking the same more for coding purposes (shorter files; clearer where to find each thing). I hadn't considered the autocompletion, but that trick gives another good reason to do that split! |
This probably the most reasonable solution. |
Started on docs for this (ind17a04a). There will be a prominent note discussing this at the beginning of the section on extractors. |
… into analysis_filters
... well that was easy
… into analysis_filters
Uh oh!
There was an error while loading.Please reload this page.
This PR introduces analysis filters, an idea I've had for a while but just finally figured out how to implement correctly.
Basically, the idea is that a lot of OPS analysis code looks something like:
Here the hypothetical user is checking rejected trials of a specific mover and a specific ensemble. It's some messy code already a few levels of indent in, and with all that
step.changestuff they need to remember. (And this is even using the hack that you can get a sample for a specific ensemble by converting the list of samples to aSampleSet.)[EDIT: I updated the above to actually reflect the same behavior as below -- it's a little more complicated]
Here's how it works with the filters I'm introducing here:
I guess it doesn't save too much vertical space, but it does save some characters, and I think it will be much easier to learn/teach this approach.
More usage in the notebook added in this PR. For easy use cases, it's really nice (
for step in RejectedSteps(steps))For discussion: does it make more sense stylistically to be:
or
Formally, I think PEP8 would advise
since rejected states is actually a constant instance, while canonical mover really is a class. But consistency in user experience with filters is more important than foolish consistency with PEP8.