"Snapshot features" are a really powerful tool in OPS that make it easy to create a snapshot class based on the fields that should be associated with that snapshot. Those fields vary from engine to engine.

Unfortunately, the function that handles this, theattach_features decorator, is written in a way that makes it extremely hard to maintain or test. It is a single function taking over 500 lines in the file, which has a CodeClimate cognitive complexity of 203. (For context, CodeClimate recommends keeping that number under 5, and I find that anything over 10 or 15 can be a headache to deal with.)

Additionally, the documentation for snapshot features is not very good. The best docs are in the docstring for the_decorator method nested insideattach_features, which can't be viewed unless you actually read the fileopenpathsampling/engines/features/base.py. This is not easy to find, and even that docstring doesn't quite fully correspond to what the code does.

For these reasons, I want to reimplement theattach_features decorator, and simplify some the process of creating snapshot features.

A big part of the functionality ofattach_features is that it actually writes the code for several methods on the snapshot, including:copy,copy_to,create_reversed,create_empty,__init__, andinit_empty. The reason to writecode here, instead of just using the information in the__features__ object, is to get the correct signature and docstring for introspection. For example, we can do:

In [1]:importopenpathsamplingaspathsIn [2]:paths.engines.toy.Snapshot?Initsignature:paths.engines.toy.Snapshot(velocities=None,coordinates=None,engine=None)Docstring:Simulationsnapshot.OnlyreferencestocoordinatesandvelocitiesAttributes----------velocities :numpy.ndarray,shape=(atoms,3),dtype=numpy.float32atomicvelocitiescoordinates :numpy.ndarray,shape=(atoms,3),dtype=numpy.float32atomiccoordinatesengine : :class:`openpathsampling.DynamicsEngine`referencetotheengineusedtogeneratethesnapshot

where the signature and docstrings are determined by the features that are attached to the snapshot.

Changes in this PR

The goal here is to simplify both the process of creating new snapshot features and the code for attaching snapshot features, as well as to clean up some of the code that was autogenerated.

New`FeatureCollection` class

The current implementation uses aFeatureTuple namedtuple which is populated by theattach_features decorator. This has been replaced with aFeatureCollection class. In general the approach is to create aFeatureCollection for each feature module using itsfrom_module method, and then to combine all theFeatureCollections into oneFeatureCollection bysumming them.

Simpler`attach_features` decorator

This replaces the oldattach_features decorator with a new one, which has considerably simpler code structure. First, all the features are gathered into a singleFeatureCollection, then the input class is decorated by adding the necessary functions, etc.

Separate code writing

Rather than including all the code for code writing/compiling in theattach_features method, this is put in a separate module, which should improve testability.

API break: Removing`copy`,`copy_to`

Snapshots are immutable data-containing objects. I do not think there is a need for thecopy orcopy_to methods. There is, however, acopy_with_replacement method -- if you give it no arguments, it will create an exact copy of your snapshot with a different UUID. It also provides the ability to create modified version of snapshots.

API break: Removing`init_empty`, fixing`create_empty`

I don't see a purpose forinit_empty that isn't already handled bycreate_empty (again, with the understanding that these are immutable objects).create_empty might be used as a template or a placeholder, so I'm keeping it around, but the code it currently generates is wrong: it should be aclassmethod, but it isn't marked as one, and it usesself in the signature butcls in the code.

Moving more functionality into superclass methods (new`Snapshot` base class)

Rather than writing out self-contained methods for each class, the new code keeps the core logic in a newSnapshot superclass, and simply passes the parameters on when needed. The key observation here is that we need dynamically-generated code for the purpose of signatures and docstrings, but the logic can be in general functions. This should be better for testing and for debugging.

New`Parameter` class to define field for a snapshot

In the old approach, you needed to go through the following steps to define a snapshot field (e.g.,coordinates orvelocities)

Add the name to thevariables list in the feature module
Decide how to change that feature on time reversal: if it was abool that flipped, add it to the list namedflip in the module. If it was a number that changed sign, add it to the list namedminus in the feature.
If it is a numpy array, add it to the list namednumpy in the feature.
If it will be loaded by a lazy proxy, add it to the list namedlazy in the feature.
Add a docstring to the module file, covering all thevariables included.

In the new approach, we have aParameter class that will contain the relevant information. With the old way, you may have needed to consider several top-level names for a given functionality (does this parameter go in theflip list for time reversal? in theminus list? neither? certainly not both, that would be nonsense -- but you could write code that tried that!) The new approach makes a class that describes each parameter, and the specific types of functionality (what to do on time reversal; how to copy; whether to lazy load) are contained in that object.

I think this is an easier way to think about these things, and it also has the advantage that instead of predefining certain behaviors (e.g., for copying), developers are welcome to create their own.

Time-reversal is treated a little differently than copying. Copying is an issue in computer programming, and is broadly relevant. Time-reversal is specific to MD. It's entirely possible that other such specific problems will arise in some subfield, so I tried to design a generic solution for them. Therefore,'time_reverse' is an entry in adict calledoperations. Theoperations can be extended with other domain-specific needs as they are found.

Decorator to declare instance methods to attach

If you want to attach functions that would become instance methods of your snapshot class, the old way was to define the function in the feature module and include its name in the list namedfunctions. The new way is to decorate the function with the@function_feature decorator.

(Temporary?) Code to generate new`Parameter`s from old style feature modules

I think the newParameter approach is more intuitive, but for now I've implemented methods inFeatureCollection that actually convert the old version to the new version. TODO: details to come.

Unchanged:`@property`

Just as before, and@property in a feature module becomes a property of the snapshot class.

start to new attach_features

eafea92

dwhswenson added the 2.0issues and PRs for the 2.0 release label

Aug 27, 2021

Copy link

MemberAuthor

dwhswenson commentedAug 27, 2021

Drafting this PR pretty early (and while it's still in the experimental package) so people interested in developing engines (@bdice,@davidkastner) can see some of what's coming soon. If I keep this inexperimental, I might retarget the PR for the 1.x cycle. That would be useful if people wanted to immediately use some of this functionality.

dwhswenson added5 commits

August 27, 2021 19:53

more on new attach_features: more unified approach

77e2a0d

cleanup; some docstrings

de1d97f

start to tests for write_code

d8b2856

start to tests on attach_feature/core

67fd0ed

add __init__.py for subpackage

3587310

Copy link

MemberAuthor

dwhswenson commentedAug 28, 2021

@sroet (and others subscribed to this thread): I'd appreciate some feedback on whether to keep support for the old-style snapshot feature modules or not.

As currently implemented, this code works with the old snapshot feature modules (although there are some API-breaking differences in the resulting snapshots). However, if we will be removing support for the old style of features, I won't bother writing tests for that part.

You can get a sense for the differences in these from therough example modules I've drafted to be part of the tests. It's pretty straightforward to convert old-style to new-style (that's essentially how support for old-style works internally), and I expect to do that in our built-in features. The only question is whether it's worth leaving support for the old style in 2.0.

Copy link

Contributor

bdice commentedAug 28, 2021

I took a brief look at the code. I am not very familiar with the old style, so it is a little hard to provide meaningful feedback. In my understanding, most of the code for adding features to snapshots is internal to the OPS package and its implementations of engines/snapshots. In my understanding, most users don't interact with this API directly. If that's correct, I would definitely advocate for removing legacy/old methods to simplify the code base for new developers. I would prefer for there to be a single supported way to add features, because otherwise it requires a lot of effort to understand what is "old" (and should be avoided) and what is "new."

There should be one-- and preferably only one --obvious way to do it.
--The Zen of Python

Copy link

Member

sroet commentedAug 29, 2021

@sroet (and others subscribed to this thread): I'd appreciate some feedback on whether to keep support for the old-style snapshot feature modules or not.

First, don't put a lot of weight on my opinion, as I have not interacted with this part of the code (I have never written an engine)

I am in favor of dropping as much legacy code as we can get away with for2.0 as long as there is no user need, mainly to easy maintenance burden. It might be useful to have an unofficialconversion_script just in case users run into issues using OPS2.0 they can easily convert, which would be dropped for any release after 2.0.

One important party that you might want to ask about this any commercial/industry users of OPS (I don't know of any, but I remember there was some industry collaboration/adoption supposed to happen from the original OPS funding?)

I also fully agree with@bdice's comments about just having one way of doing things making it easier for new developers to figure out the "right" way of doing things

Copy link

MemberAuthor

dwhswenson commentedAug 29, 2021

@sroet @bdice Thanks to both of you for the feedback!

Some further context should be given here: Yes, this is code that is mainly used by developers of engines. I am aware of one engine that has been developed for OPS which has not been officially announced (still hoping to convince the devs to contribute that back before 2.0.) That was in an academic context. There may be other engines out there that I don't know about.

As far as one obvious way, the old way would definitely be deprecated, which should encourage the current "obvious" way -- the only question is if it ispossible to use the old way.

I don't think there would be a conversion script as such, but I do think there could easily be a guide to developers. With the caveat that old and new snapshots aren't identical due to API breaks, most of the update guide boils down to:

For any item invariables, create aParameter for it
If your feature was innumpy, usecopy=numpy.copy inParameter
If your feature was inminus, useoperations={'time_reverse': operator.neg} inParameter
If your feature was inflip, useoperations={'time_reverse': operator.invert} inParameter
If your feature was inlazy, uselazy=True inParameter
For any function infunctions, decorate it with@feature_function
Docstrings go in each individualParameter as an instance variable, instead of using the module docstring for all parameters

(This covers the changes here; doesn't include differences between SimStore and netcdfplus.)

Assuming you agree that this is straightforward enough (caveat: I may find other issues as I do the migration) then I'll just use the code that bridges between the two types of snapshots until I fully update all snapshot features that are in core. If we ever need code to support the old versions (which would mainly be to provide workarounds for users who can't get the changes made for some outside engine), we can always dig it out of this PR.

Copy link

Member

sroet commentedAug 31, 2021

Assuming you agree that this is straightforward enough (caveat: I may find other issues as I do the migration) then I'll just use the code that bridges between the two types of snapshots until I fully update all snapshot features that are in core.

Seems straightforward enough to me

Labels

2.0

issues and PRs for the 2.0 release

Movatterモバイル変換

[WIP] New "snapshot features" implementation#1059

Are you sure you want to change the base?

[WIP] New "snapshot features" implementation#1059

Uh oh!

Conversation

dwhswenson commentedAug 27, 2021• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Background

Changes in this PR

NewFeatureCollection class

Simplerattach_features decorator

Separate code writing

API break: Removingcopy,copy_to

API break: Removinginit_empty, fixingcreate_empty

Moving more functionality into superclass methods (newSnapshot base class)

NewParameter class to define field for a snapshot

Decorator to declare instance methods to attach

(Temporary?) Code to generate newParameters from old style feature modules

Unchanged:@property

Uh oh!

dwhswenson commentedAug 27, 2021

Uh oh!

dwhswenson commentedAug 28, 2021

Uh oh!

bdice commentedAug 28, 2021

Uh oh!

sroet commentedAug 29, 2021

Uh oh!

dwhswenson commentedAug 29, 2021

Uh oh!

sroet commentedAug 31, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dwhswenson commentedAug 27, 2021•
edited
Loading

New`FeatureCollection` class

Simpler`attach_features` decorator

API break: Removing`copy`,`copy_to`

API break: Removing`init_empty`, fixing`create_empty`

Moving more functionality into superclass methods (new`Snapshot` base class)

New`Parameter` class to define field for a snapshot

(Temporary?) Code to generate new`Parameter`s from old style feature modules

Unchanged:`@property`