Okina GPU [okina] #631
I'm really impressed by all the work done. It gives a good idea of all the different parts of the code that have to change to port the first set of functionalities of MFEM to GPU. From the MFEM core side, I think most of the changes are not too disruptive. I really like that the device/backend specific functions are exposed, so a developer doesn't have to guess what has to be written; the abstraction is at a granularity where it is easy for a developer to understand what the backend functions have to do. I think I would feel comfortable writing a backend in this code. However, I think the design should be pushed a little further toward separating what is device specific from what is backend specific.
I would give the following definitions of device specific and backend specific:
- Device functions are meant to be uniquely implemented for a specific device. If a better implementation is found for a device function, it should replace the current implementation. For instance, all backends should use the same implementation of `dot` on a GPU (I think). Device functions assume that there exists a best implementation on a device usable for all the different use cases.
- Backend functions are meant to have multiple implementations for a specific device. Backend specific functions are selected by the user and are expected to have different performance on different problem configurations.
Instead of having everything in `kernels` files, maybe having some in `device` files and others in `backend` files would help clarify this. Showing how the different `backend` functions could be selected would also be really interesting (a `map<Backend, FunctionPointer>`?).
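One way to read the `map<Backend, FunctionPointer>` idea is sketched below. This is only an illustration under assumptions: the `Backend` enum, the `AxpyFn` signature, and the two `Axpy*` implementations are hypothetical names invented for this example and are not part of MFEM or of this branch. The point is just that several implementations of the same backend function share one signature and the user's choice picks the entry to run.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical backend identifiers (assumption, not MFEM types).
enum class Backend { Reference, Unrolled };

// Signature shared by every implementation of one backend function.
// The operation (y += a * x) is only a stand-in kernel.
using AxpyFn = void (*)(double a, const std::vector<double> &x,
                        std::vector<double> &y);

// Straightforward implementation.
void AxpyReference(double a, const std::vector<double> &x,
                   std::vector<double> &y)
{
   assert(x.size() == y.size());
   for (std::size_t i = 0; i < x.size(); i++) { y[i] += a * x[i]; }
}

// Alternative implementation of the same function (unrolled by two).
void AxpyUnrolled(double a, const std::vector<double> &x,
                  std::vector<double> &y)
{
   assert(x.size() == y.size());
   const std::size_t n = x.size();
   std::size_t i = 0;
   for (; i + 1 < n; i += 2)
   {
      y[i]     += a * x[i];
      y[i + 1] += a * x[i + 1];
   }
   if (i < n) { y[i] += a * x[i]; } // remaining entry when n is odd
}

// The selection table: one function pointer per backend.
const std::map<Backend, AxpyFn> &AxpyTable()
{
   static const std::map<Backend, AxpyFn> table =
   {
      { Backend::Reference, &AxpyReference },
      { Backend::Unrolled,  &AxpyUnrolled  }
   };
   return table;
}

// Runs the implementation registered for the user-selected backend.
// std::map::at throws std::out_of_range if the backend has no entry.
void Axpy(Backend b, double a, const std::vector<double> &x,
          std::vector<double> &y)
{
   AxpyTable().at(b)(a, x, y);
}
```

With this layout, a call such as `Axpy(Backend::Unrolled, 2.0, x, y)` selects the implementation at run time, and adding a backend means adding one table entry.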
Having `backend` files would also make it possible to see directly what needs to be implemented by a backend.
Going even further in this design, I think it is important that a `backend` be allowed to implement only a subset of all the `backend` functions, and either select which `backend` to use when a backend function is not implemented, or use a default backend.
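The fallback behavior described above could look roughly like the following sketch. Everything here is an assumption made for illustration: `BackendId`, `KernelFn`, `Registry`, and the sample kernels are hypothetical and do not exist in MFEM. Each backend registers only the functions it implements and may name another backend to consult when a function is missing; otherwise the lookup goes to the default backend.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical backend identifiers (assumption, not MFEM types).
enum class BackendId { Default, Tuned, Experimental };

// Placeholder signature for a backend function.
using KernelFn = double (*)(double);

// Sample kernels, used only to populate the registry.
double ScaleDefault(double v)  { return 2.0 * v; }
double ScaleTuned(double v)    { return v + v; }
double NegateDefault(double v) { return -v; }

class Registry
{
   struct Entry
   {
      std::map<std::string, KernelFn> kernels;   // implemented subset only
      BackendId fallback = BackendId::Default;   // consulted on a miss
   };
   std::map<BackendId, Entry> entries;

public:
   void Register(BackendId b, const std::string &name, KernelFn fn)
   {
      assert(fn != nullptr);
      entries[b].kernels[name] = fn;
   }

   void SetFallback(BackendId b, BackendId fb) { entries[b].fallback = fb; }

   // Follows the fallback chain until an implementation is found.
   // The hop limit guards against cyclic fallback declarations.
   KernelFn Resolve(BackendId b, const std::string &name) const
   {
      for (std::size_t hops = 0; hops <= entries.size(); hops++)
      {
         const auto e = entries.find(b);
         if (e != entries.end())
         {
            const auto k = e->second.kernels.find(name);
            if (k != e->second.kernels.end()) { return k->second; }
         }
         if (b == BackendId::Default) { break; } // nothing left to try
         b = (e != entries.end()) ? e->second.fallback : BackendId::Default;
      }
      throw std::runtime_error("no implementation of " + name);
   }
};
```

For example, if `Tuned` registers only `"scale"` and `Experimental` declares `Tuned` as its fallback, resolving `"negate"` for `Experimental` walks `Experimental`, then `Tuned`, then `Default`. Whether the fallback should be chosen per backend or per function is a design choice this sketch does not settle.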
Regarding the `Array` and `Vector` classes, I would like to check whether the double pointer has any impact on performance when some computations are done on the CPU. If there is a negative impact on performance, we might want to consider keeping the `mfem::Array` and `mfem::Vector` classes as they are and making them compatible with the proposed `Array` and `Vector` classes, which would have to be renamed to `DeviceArray` and `DeviceVector` (or something else). These classes would have a more limited set of functions (that could grow when needed) that all work on both GPU and CPU.
I personally find the `memoryManager` class a bit scary, and the different macros not so natural to use. I have difficulty seeing the consequences of such a class, as it operates at a very low level. I think it could be worth introducing this class in a way that lets us backtrack in the future, or replace it with something else.
I think that the `PABilinearForm` is going in the right direction, but a lot still needs to be done if we want to be able to offer the different levels of assembly: matrix-free, partial assembly, local matrices, global matrix. This could have a non-negligible impact on the rest of the design, though.
Congratulations @camierjs, impressive work done in such a short time frame!
I'm having trouble running ex1 with -p or -g with any order other than 1 and in 3D, so it's hard to look at performance. Is this a known limitation, or is it just something that I'm doing wrong, or something that can be fixed?
Yes Jakub, it is a known limitation: I forgot to re-enable the kernels.
Building on a Mac fails with the following errors:
Okina fallback kernels [okina-fallback-kernels]
…cts to be allocated.
Okina mm object [okina-mm-object]
Small updates in ex1, ex1p, ex6, ex6p.
parallel FE space.
Lambda const propagation [lambda-capture-problem]
Updates in the examples, class BilinearForm and related [okina-bilinearform-updates]
🎉
👍
Thank you @camierjs, @dmed256, @v-dobrev, @jakubcerveny, @jdahm, @pazner and @YohannDudouit for your contributions to this branch! 🚀 I know it was not easy, and I know that we are not done, but I think this is a really important step which would not have been possible without the hard and persistent work of David, Veselin and Jean-Sylvain. Thank you guys, you are truly amazing!
See the `mfem-4.0-rc1` tasks in #813.