Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[WIP] New "snapshot features" implementation#1059

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
dwhswenson wants to merge6 commits intoopenpathsampling:dev-2.0
base:dev-2.0
Choose a base branch
Loading
fromdwhswenson:attach_features

Conversation

@dwhswenson
Copy link
Member

@dwhswensondwhswenson commentedAug 27, 2021
edited
Loading

Background

"Snapshot features" are a really powerful tool in OPS that make it easy to create a snapshot class based on the fields that should be associated with that snapshot. Those fields vary from engine to engine.

Unfortunately, the function that handles this, theattach_features decorator, is written in a way that makes it extremely hard to maintain or test. It is a single function taking over 500 lines in the file, which has a CodeClimate cognitive complexity of 203. (For context, CodeClimate recommends keeping that number under 5, and I find that anything over 10 or 15 can be a headache to deal with.)

Additionally, the documentation for snapshot features is not very good. The best docs are in the docstring for the_decorator method nested insideattach_features, which can't be viewed unless you actually read the fileopenpathsampling/engines/features/base.py. This is not easy to find, and even that docstring doesn't quite fully correspond to what the code does.

For these reasons, I want to reimplement theattach_features decorator, and simplify some the process of creating snapshot features.

A big part of the functionality ofattach_features is that it actually writes the code for several methods on the snapshot, including:copy,copy_to,create_reversed,create_empty,__init__, andinit_empty. The reason to writecode here, instead of just using the information in the__features__ object, is to get the correct signature and docstring for introspection. For example, we can do:

In [1]:importopenpathsamplingaspathsIn [2]:paths.engines.toy.Snapshot?Initsignature:paths.engines.toy.Snapshot(velocities=None,coordinates=None,engine=None)Docstring:Simulationsnapshot.OnlyreferencestocoordinatesandvelocitiesAttributes----------velocities :numpy.ndarray,shape=(atoms,3),dtype=numpy.float32atomicvelocitiescoordinates :numpy.ndarray,shape=(atoms,3),dtype=numpy.float32atomiccoordinatesengine : :class:`openpathsampling.DynamicsEngine`referencetotheengineusedtogeneratethesnapshot

where the signature and docstrings are determined by the features that are attached to the snapshot.

Changes in this PR

The goal here is to simplify both the process of creating new snapshot features and the code for attaching snapshot features, as well as to clean up some of the code that was autogenerated.

NewFeatureCollection class

The current implementation uses aFeatureTuple namedtuple which is populated by theattach_features decorator. This has been replaced with aFeatureCollection class. In general the approach is to create aFeatureCollection for each feature module using itsfrom_module method, and then to combine all theFeatureCollections into oneFeatureCollection bysumming them.

Simplerattach_features decorator

This replaces the oldattach_features decorator with a new one, which has considerably simpler code structure. First, all the features are gathered into a singleFeatureCollection, then the input class is decorated by adding the necessary functions, etc.

Separate code writing

Rather than including all the code for code writing/compiling in theattach_features method, this is put in a separate module, which should improve testability.

API break: Removingcopy,copy_to

Snapshots are immutable data-containing objects. I do not think there is a need for thecopy orcopy_to methods. There is, however, acopy_with_replacement method -- if you give it no arguments, it will create an exact copy of your snapshot with a different UUID. It also provides the ability to create modified version of snapshots.

API break: Removinginit_empty, fixingcreate_empty

I don't see a purpose forinit_empty that isn't already handled bycreate_empty (again, with the understanding that these are immutable objects).create_empty might be used as a template or a placeholder, so I'm keeping it around, but the code it currently generates is wrong: it should be aclassmethod, but it isn't marked as one, and it usesself in the signature butcls in the code.

Moving more functionality into superclass methods (newSnapshot base class)

Rather than writing out self-contained methods for each class, the new code keeps the core logic in a newSnapshot superclass, and simply passes the parameters on when needed. The key observation here is that we need dynamically-generated code for the purpose of signatures and docstrings, but the logic can be in general functions. This should be better for testing and for debugging.

NewParameter class to define field for a snapshot

In the old approach, you needed to go through the following steps to define a snapshot field (e.g.,coordinates orvelocities)

  1. Add the name to thevariables list in the feature module
  2. Decide how to change that feature on time reversal: if it was abool that flipped, add it to the list namedflip in the module. If it was a number that changed sign, add it to the list namedminus in the feature.
  3. If it is a numpy array, add it to the list namednumpy in the feature.
  4. If it will be loaded by a lazy proxy, add it to the list namedlazy in the feature.
  5. Add a docstring to the module file, covering all thevariables included.

In the new approach, we have aParameter class that will contain the relevant information. With the old way, you may have needed to consider several top-level names for a given functionality (does this parameter go in theflip list for time reversal? in theminus list? neither? certainly not both, that would be nonsense -- but you could write code that tried that!) The new approach makes a class that describes each parameter, and the specific types of functionality (what to do on time reversal; how to copy; whether to lazy load) are contained in that object.

I think this is an easier way to think about these things, and it also has the advantage that instead of predefining certain behaviors (e.g., for copying), developers are welcome to create their own.

Time-reversal is treated a little differently than copying. Copying is an issue in computer programming, and is broadly relevant. Time-reversal is specific to MD. It's entirely possible that other such specific problems will arise in some subfield, so I tried to design a generic solution for them. Therefore,'time_reverse' is an entry in adict calledoperations. Theoperations can be extended with other domain-specific needs as they are found.

Decorator to declare instance methods to attach

If you want to attach functions that would become instance methods of your snapshot class, the old way was to define the function in the feature module and include its name in the list namedfunctions. The new way is to decorate the function with the@function_feature decorator.

(Temporary?) Code to generate newParameters from old style feature modules

I think the newParameter approach is more intuitive, but for now I've implemented methods inFeatureCollection that actually convert the old version to the new version. TODO: details to come.

Unchanged:@property

Just as before, and@property in a feature module becomes a property of the snapshot class.

@dwhswensondwhswenson added the 2.0issues and PRs for the 2.0 release labelAug 27, 2021
@dwhswenson
Copy link
MemberAuthor

Drafting this PR pretty early (and while it's still in the experimental package) so people interested in developing engines (@bdice,@davidkastner) can see some of what's coming soon. If I keep this inexperimental, I might retarget the PR for the 1.x cycle. That would be useful if people wanted to immediately use some of this functionality.

@dwhswenson
Copy link
MemberAuthor

@sroet (and others subscribed to this thread): I'd appreciate some feedback on whether to keep support for the old-style snapshot feature modules or not.

As currently implemented, this code works with the old snapshot feature modules (although there are some API-breaking differences in the resulting snapshots). However, if we will be removing support for the old style of features, I won't bother writing tests for that part.

You can get a sense for the differences in these from therough example modules I've drafted to be part of the tests. It's pretty straightforward to convert old-style to new-style (that's essentially how support for old-style works internally), and I expect to do that in our built-in features. The only question is whether it's worth leaving support for the old style in 2.0.

@bdice
Copy link
Contributor

I took a brief look at the code. I am not very familiar with the old style, so it is a little hard to provide meaningful feedback. In my understanding, most of the code for adding features to snapshots is internal to the OPS package and its implementations of engines/snapshots. In my understanding, most users don't interact with this API directly. If that's correct, I would definitely advocate for removing legacy/old methods to simplify the code base for new developers. I would prefer for there to be a single supported way to add features, because otherwise it requires a lot of effort to understand what is "old" (and should be avoided) and what is "new."

There should be one-- and preferably only one --obvious way to do it.
--The Zen of Python

@sroet
Copy link
Member

@sroet (and others subscribed to this thread): I'd appreciate some feedback on whether to keep support for the old-style snapshot feature modules or not.

First, don't put a lot of weight on my opinion, as I have not interacted with this part of the code (I have never written an engine)

I am in favor of dropping as much legacy code as we can get away with for2.0 as long as there is no user need, mainly to easy maintenance burden. It might be useful to have an unofficialconversion_script just in case users run into issues using OPS2.0 they can easily convert, which would be dropped for any release after 2.0.

One important party that you might want to ask about this any commercial/industry users of OPS (I don't know of any, but I remember there was some industry collaboration/adoption supposed to happen from the original OPS funding?)

I also fully agree with@bdice's comments about just having one way of doing things making it easier for new developers to figure out the "right" way of doing things

@dwhswenson
Copy link
MemberAuthor

@sroet@bdice Thanks to both of you for the feedback!

Some further context should be given here: Yes, this is code that is mainly used by developers of engines. I am aware of one engine that has been developed for OPS which has not been officially announced (still hoping to convince the devs to contribute that back before 2.0.) That was in an academic context. There may be other engines out there that I don't know about.

As far as one obvious way, the old way would definitely be deprecated, which should encourage the current "obvious" way -- the only question is if it ispossible to use the old way.

I don't think there would be a conversion script as such, but I do think there could easily be a guide to developers. With the caveat that old and new snapshots aren't identical due to API breaks, most of the update guide boils down to:

  • For any item invariables, create aParameter for it
  • If your feature was innumpy, usecopy=numpy.copy inParameter
  • If your feature was inminus, useoperations={'time_reverse': operator.neg} inParameter
  • If your feature was inflip, useoperations={'time_reverse': operator.invert} inParameter
  • If your feature was inlazy, uselazy=True inParameter
  • For any function infunctions, decorate it with@feature_function
  • Docstrings go in each individualParameter as an instance variable, instead of using the module docstring for all parameters

(This covers the changes here; doesn't include differences between SimStore and netcdfplus.)

Assuming you agree that this is straightforward enough (caveat: I may find other issues as I do the migration) then I'll just use the code that bridges between the two types of snapshots until I fully update all snapshot features that are in core. If we ever need code to support the old versions (which would mainly be to provide workarounds for users who can't get the changes made for some outside engine), we can always dig it out of this PR.

@sroet
Copy link
Member

Assuming you agree that this is straightforward enough (caveat: I may find other issues as I do the migration) then I'll just use the code that bridges between the two types of snapshots until I fully update all snapshot features that are in core.

Seems straightforward enough to me

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

2.0issues and PRs for the 2.0 release

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@dwhswenson@bdice@sroet

[8]ページ先頭

©2009-2025 Movatter.jp