NotificationsYou must be signed in to change notification settings
Fork923
Star1.7k

Re-organizing VASP I/O code#3997

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Draft

mkhorton wants to merge2 commits intomaster

base:master

Choose a base branch

fromvasp-split

Draft

Re-organizing VASP I/O code#3997

mkhorton wants to merge2 commits intomasterfromvasp-split

Conversation

Copy link

Member

mkhorton commentedAug 12, 2024

Summary

Re-opening changes initially introduced by@janosh in#3865 and later reverted by@shyuep (seethread).

I will be taking responsibility for this PR. This PR will require approval from both@janosh and@shyuep before merging.

Specifically,

For each module changed, I will manually inspect__dir__() to make sureall imports remain unaffected and that there are no backwards-incompatible changes.
Also splitpymatgen.io.vasp.inputs andpymatgen.io.vasp.outputs into separate files.

This change is well-motivated to improve maintainability.

I hope we can get this PR merged without too much difficulty. Thank you again to@janosh for starting this PR, to@Andrew-S-Rosen for the bug report due to the errors caused, and to@shyuep for implementing a rapid fix.

Matthew Horton added2 commits

August 12, 2024 15:23

Revert "Add back missing sets.py file."

f035def

This reverts commit636165b.

Reapply "Split VASP input sets into submodules (#3865)" (#3989)

1c9684e

This reverts commitf1eb364.

mkhorton requested review fromjanosh andshyuep ascode owners

August 12, 2024 22:29

mkhorton changed the title~~Re-organizing~~Re-organizing VASP I/O code

Aug 12, 2024

mkhorton marked this pull request as draft

August 12, 2024 22:30

DanielYang59 reviewed

Aug 13, 2024

View reviewed changes

src/pymatgen/io/vasp/sets/lobster.py

		@@ -0,0 +1,107 @@
		# ruff: noqa: PGH003

This comment was marked as resolved.

Copy link

Member

shyuep commentedAug 13, 2024•
edited
Loading

Here are my comments:

Axiom 1: No. of lines is not a good metric of whether a module should be broken up. In even the most basic IDEs, there are tools to show the class structure, not to mention keyboard shortcuts to jump to any class/method name/line. I can't recall when the last time I had to use mouse scrolling for any file was. Note that the code files in vasp io are hardly the largest ones in pymatgen.
Given Axiom 1, the only reason to break up a module into separate pieces is if there is alogical consistency to the subparts. While I can get onboard with breaking up sets.py (which has many classes), inputs.py and outputs.py should stay as they are. There is no good reason to split them up. Inputs only has 5 classes and they all pertain to vasp inputs.
Even when breaking up sets.py, there is a certain inconsistency to the current choice. Why should input sets be split based on who wrote them, i.e., MP, MIT, MVL? Why not based on type of calculation (relaxation, static, electronic structure, NEB, etc.)? Do people select input sets based on organization? Isn't it more logical that they select an input set based on what they want to do?
In so far as module subdivision is merely a developer-related detail (which it seems you are suggesting it is, since the primary justification was "ease of maintenance") and not a user API detail, the modules should all be private, i.e., with a "_" prefix. This gives the flexibility to completely reorganize when needed without some user seeing his codefrom pymatgen.vasp.io.sets.mp import MPRelaxSet suffer from backwards incompatible breakages. I do not want to see anyone using that line - it must befrom pymatgen.vasp.io.sets import MPRelaxSet, even if the sets reside in separate files.

I refer you to the scikit-learn module and class organization. Please use that as a guide on how to organize. Finally, I would point out that many of sklearn's module files are about the same length as sets.py (2800-3200 lines). They do not break up a module for maintenance - they break it up when there is a logic to the organization.

Copy link

MemberAuthor

mkhorton commentedAug 13, 2024

The sklearn comparison is illustrative: even where there are large LOC, often there is only a single class per file, and they are quite readable. This is not the case for these specific files in pymatgen. I am not making axiomatic statements; I agree that a high number of LOC is not in itself an error.

Long-time pymatgen developers (we are in good company) might know exactly where in the file to go to make an edit, but before you can get there, one has to read and understand the structure of the code. It is nice to know what classes are available, for example, without needing to read the entire file first.

Point 3 I concur; perhaps it is better to split up the input sets according to functionality instead.

Copy link

Member

janosh commentedAug 15, 2024

Re 3, i think the most common use case is to have consistent settings across whatever mixture of workflows you run (relax, band gap, EOS, phonon, etc.). so after having run some number of relaxations, a user is unlikely to care about half a dozen other band gap workflows with inconsistent VASP settings to their relaxations. more likely, the only thing they want to know is if a band gap workflow from the same lineage (say MP) exists. hence splitting input sets by lineage makes sense imo.

i guess it doesn't have to be one or the other. if you look atatomate2/vasp/jobs, it has a mixture of splits by lineage (mp.py,matpes.py,lobster.py,amset.py) and by workflow type (md.py,elastic.py,phonon.py, ...)

Copy link

Member

shyuep commentedAug 21, 2024

If splitting by source is preferred, I am ok with that. Note that I don't really think atomate2's organizational structure should be the reference here. In any case, if the module split is a private implementation, then it really doesn't matter since we can always reorganize as needed without breaking downstream codes.

Labels

None yet

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Re-organizing VASP I/O code#3997

Are you sure you want to change the base?

Re-organizing VASP I/O code#3997

Uh oh!