- Notifications
You must be signed in to change notification settings - Fork923
Re-organizing VASP I/O code#3997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:master
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
This reverts commit636165b.
| @@ -0,0 +1,107 @@ | |||
| # ruff: noqa: PGH003 | |||
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
Uh oh!
There was an error while loading.Please reload this page.
shyuep commentedAug 13, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Here are my comments:
I refer you to the scikit-learn module and class organization. Please use that as a guide on how to organize. Finally, I would point out that many of sklearn's module files are about the same length as sets.py (2800-3200 lines). They do not break up a module for maintenance - they break it up when there is a logic to the organization. |
mkhorton commentedAug 13, 2024
The sklearn comparison is illustrative: even where there are large LOC, often there is only a single class per file, and they are quite readable. This is not the case for these specific files in pymatgen. I am not making axiomatic statements; I agree that a high number of LOC is not in itself an error. Long-time pymatgen developers (we are in good company) might know exactly where in the file to go to make an edit, but before you can get there, one has to read and understand the structure of the code. It is nice to know what classes are available, for example, without needing to read the entire file first. Point 3 I concur; perhaps it is better to split up the input sets according to functionality instead. |
janosh commentedAug 15, 2024
Re 3, i think the most common use case is to have consistent settings across whatever mixture of workflows you run (relax, band gap, EOS, phonon, etc.). so after having run some number of relaxations, a user is unlikely to care about half a dozen other band gap workflows with inconsistent VASP settings to their relaxations. more likely, the only thing they want to know is if a band gap workflow from the same lineage (say MP) exists. hence splitting input sets by lineage makes sense imo. i guess it doesn't have to be one or the other. if you look at |
shyuep commentedAug 21, 2024
If splitting by source is preferred, I am ok with that. Note that I don't really think atomate2's organizational structure should be the reference here. In any case, if the module split is a private implementation, then it really doesn't matter since we can always reorganize as needed without breaking downstream codes. |
Summary
Re-opening changes initially introduced by@janosh in#3865 and later reverted by@shyuep (seethread).
I will be taking responsibility for this PR. This PR will require approval from both@janosh and@shyuep before merging.
Specifically,
__dir__()to make sureall imports remain unaffected and that there are no backwards-incompatible changes.pymatgen.io.vasp.inputsandpymatgen.io.vasp.outputsinto separate files.This change is well-motivated to improve maintainability.
I hope we can get this PR merged without too much difficulty. Thank you again to@janosh for starting this PR, to@Andrew-S-Rosen for the bug report due to the errors caused, and to@shyuep for implementing a rapid fix.