I'm really impressed by all the work done. It gives a good idea of all the different parts of the code that have to change to port the first set of MFEM functionality to GPUs. From the MFEM core side, I think most of the changes are not too disruptive. I really like that the device/backend-specific functions are exposed, so a developer doesn't have to guess what has to be written; the abstraction is at a granularity where it is easy to understand what the backend functions have to do. I think I would feel comfortable writing a backend in this code. However, I think the design should be pushed a little further toward separating what is device specific from what is backend specific.
I would give the following definitions of device specific and backend specific:

Device functions are meant to be uniquely implemented for a specific device. If a better implementation is found for a device function, it should replace the current implementation. For instance, all backends should use the same implementation of `dot` on a GPU (I think). Device functions assume that there exists a best implementation on a device usable for all the different use cases.
Backend functions are meant to have multiple implementations for a specific device. Backend specific functions are selected by the user and are expected to have different performance on different problem configurations.

Instead of having everything in `kernels` files, maybe having some in `device` files and others in `backend` files would help clarify this. Showing how the different `backend` functions could be selected would also be really interesting (a `map<Backend, FunctionPointer>`?).
Having `backend` files would also make it possible to see directly what needs to be implemented by a backend.
Going even further in this design, I think it is important that a `backend` can provide implementations for only a subset of all the `backend` functions, and either select which `backend` to use when a backend function is not implemented, or use a default backend.
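
To make the `map<Backend, FunctionPointer>` idea concrete, here is a minimal sketch of what I have in mind. Every name in it (`Backend`, `ScaleFn`, `ScaleRegistry`, `ScaleDefault`) is hypothetical and not part of this branch; it only shows one backend function being selected per backend, with a fallback to a default implementation when a backend does not provide it.

```cpp
#include <cassert>
#include <map>

// Hypothetical backend identifiers; the names are illustrative only.
enum class Backend { Default, Tuned };

// Signature shared by every implementation of one backend function
// (here, a vector scaling kernel y[i] = a * x[i]).
using ScaleFn = void (*)(int n, double a, const double *x, double *y);

// Reference implementation, registered under Backend::Default.
inline void ScaleDefault(int n, double a, const double *x, double *y)
{
   for (int i = 0; i < n; i++) { y[i] = a * x[i]; }
}

// Registry for a single backend function: map<Backend, FunctionPointer>.
class ScaleRegistry
{
   std::map<Backend, ScaleFn> table;

public:
   ScaleRegistry() { table[Backend::Default] = ScaleDefault; }

   // A backend registers only the functions it actually implements.
   void Register(Backend b, ScaleFn f) { table[b] = f; }

   // Return the implementation for `b`, or the default one when `b`
   // does not provide this function.
   ScaleFn Select(Backend b) const
   {
      auto it = table.find(b);
      if (it == table.end()) { it = table.find(Backend::Default); }
      assert(it != table.end());
      return it->second;
   }
};
```

With this layout, `registry.Select(Backend::Tuned)` returns `ScaleDefault` until the `Tuned` backend registers its own kernel, so a backend can start with a small subset of functions and grow over time.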

Regarding the `Array` and `Vector` classes, I would like to check that the double pointer has no impact on performance when some computations are done on the CPU. If there is a negative impact on performance, we might want to consider keeping the `mfem::Array` and `mfem::Vector` classes as they are and making them compatible with the proposed `Array` and `Vector` classes, which would have to be renamed to `DeviceArray` and `DeviceVector` (or something else). These classes would have a more limited set of functions (that could grow when needed) that all work both on GPU and on CPU.
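
A rough sketch of the split I am suggesting, under strong assumptions: `HostVector` is a stand-in for the existing host class (it is not `mfem::Vector`), and `DeviceVector` is a hypothetical non-owning wrapper with a deliberately small interface. In this CPU-only sketch the wrapped pointer is a plain host pointer; a real backend would presumably hold a device pointer instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the existing host-only vector class: data is reached
// through a single pointer, with no extra indirection.
class HostVector
{
   std::vector<double> data;

public:
   explicit HostVector(std::size_t n, double v = 0.0) : data(n, v) { }
   std::size_t Size() const { return data.size(); }
   double *GetData() { return data.data(); }
   double &operator[](std::size_t i) { return data[i]; }
};

// Hypothetical device-aware wrapper offering only a few operations.
class DeviceVector
{
   double *ptr;       // host pointer in this sketch
   std::size_t size;

public:
   // Non-owning: reuses the storage of an existing HostVector.
   explicit DeviceVector(HostVector &v) : ptr(v.GetData()), size(v.Size()) { }

   std::size_t Size() const { return size; }

   // Dot product of two wrapped vectors of equal size.
   double Dot(const DeviceVector &other) const
   {
      assert(size == other.size);
      double sum = 0.0;
      for (std::size_t i = 0; i < size; i++) { sum += ptr[i] * other.ptr[i]; }
      return sum;
   }

   // In-place update: this += a * x.
   void Axpy(double a, const DeviceVector &x)
   {
      assert(size == x.size);
      for (std::size_t i = 0; i < size; i++) { ptr[i] += a * x.ptr[i]; }
   }
};
```

The point of the design is that code which never touches a device keeps using the host class unchanged, while device-capable code opts in through the wrapper.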

I personally find the `memoryManager` class a bit scary, and the different macros not so natural to use. I have difficulty seeing the consequences of such a class, as it operates at a very low level. I think it could be worth introducing this class in a way that lets us backtrack in the future, or replace it with something else.

I think that the `PABilinearForm` is going in the right direction, but a lot still needs to be done if we want to be able to offer the different levels of assembly: matrix free, partial assembly, local matrices, global matrix. This could have a non-negligible impact on the rest of the design though.
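
To illustrate what I mean by the four levels, here is a toy sketch for the 1D diffusion (stiffness) operator with linear elements on a uniform mesh. The `AssemblyLevel` enum and the `Apply` function are hypothetical names of mine, not an interface from this branch; the sketch only shows that the levels differ in what is stored, while the action `y = A x` is the same.

```cpp
#include <cassert>
#include <vector>

enum class AssemblyLevel { GlobalMatrix, LocalMatrices, Partial, MatrixFree };

// y = A x for the 1D stiffness operator, `ne` linear elements of size `h`.
std::vector<double> Apply(AssemblyLevel level, int ne, double h,
                          const std::vector<double> &x)
{
   const int nd = ne + 1; // number of nodes
   assert(static_cast<int>(x.size()) == nd);
   std::vector<double> y(nd, 0.0);
   switch (level)
   {
      case AssemblyLevel::GlobalMatrix:
      {
         // Store the full (dense, for simplicity) matrix, then multiply.
         std::vector<double> A(nd * nd, 0.0);
         for (int e = 0; e < ne; e++)
         {
            A[e * nd + e] += 1.0 / h;
            A[e * nd + e + 1] -= 1.0 / h;
            A[(e + 1) * nd + e] -= 1.0 / h;
            A[(e + 1) * nd + e + 1] += 1.0 / h;
         }
         for (int i = 0; i < nd; i++)
         {
            for (int j = 0; j < nd; j++) { y[i] += A[i * nd + j] * x[j]; }
         }
         break;
      }
      case AssemblyLevel::LocalMatrices:
      {
         // Store one 2x2 matrix per element and apply element by element.
         std::vector<double> K(4 * ne);
         for (int e = 0; e < ne; e++)
         {
            K[4 * e + 0] = 1.0 / h;  K[4 * e + 1] = -1.0 / h;
            K[4 * e + 2] = -1.0 / h; K[4 * e + 3] = 1.0 / h;
         }
         for (int e = 0; e < ne; e++)
         {
            y[e] += K[4 * e + 0] * x[e] + K[4 * e + 1] * x[e + 1];
            y[e + 1] += K[4 * e + 2] * x[e] + K[4 * e + 3] * x[e + 1];
         }
         break;
      }
      case AssemblyLevel::Partial:
      {
         // Store only quadrature-point data D (one value per element here),
         // and apply G^T D G with the reference gradient G = [-1, 1].
         std::vector<double> D(ne, 1.0 / h);
         for (int e = 0; e < ne; e++)
         {
            const double g = D[e] * (x[e + 1] - x[e]);
            y[e] -= g;
            y[e + 1] += g;
         }
         break;
      }
      case AssemblyLevel::MatrixFree:
      {
         // Store nothing: recompute the geometric factor on the fly.
         for (int e = 0; e < ne; e++)
         {
            const double g = (x[e + 1] - x[e]) / h;
            y[e] -= g;
            y[e + 1] += g;
         }
         break;
      }
   }
   return y;
}
```

All four branches should return the same vector for the same input; what changes is the memory footprint and the work done per application, which is where the design impact would come from.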

Congratulations @camierjs, impressive work done in such a small time frame!

Member

jakubcerveny commented Oct 17, 2018

I'm having trouble running ex1 with -p or -g with any order other than 1 and in 3D, so it's hard to look at performance. Is this a known limitation, or is it just something that I'm doing wrong or something that can be fixed?

Member, Author

camierjs commented Oct 18, 2018

Yes Jakub, it is a known limitation: I forgot to re-enable the kernels.
I'll do this as soon as I can.

v-dobrev reviewed

Oct 18, 2018

Member

v-dobrev commented Oct 18, 2018

Building on a Mac fails with the following errors:

    general/memmng.cpp:319:24: error: member reference type 'struct __darwin_mcontext64 *' is a pointer; did you mean to use '->'?
       context->uc_mcontext.gregs[REG_RIP]++;
       ~~~~~~~~~~~~~~~~~~~~^
                           ->
    general/memmng.cpp:319:25: error: no member named 'gregs' in '__darwin_mcontext64'
       context->uc_mcontext.gregs[REG_RIP]++;
       ~~~~~~~~~~~~~~~~~~~~ ^
    general/memmng.cpp:319:31: error: use of undeclared identifier 'REG_RIP'
       context->uc_mcontext.gregs[REG_RIP]++;
                                  ^
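
One possible way to make that line portable is to guard the register access per platform. This is only a sketch: the helper name `AdvanceInstructionPointer` is hypothetical, the Darwin field names (`__ss.__rip`) are what I would expect from the macOS SDK headers in their default mode on x86-64, and other platforms are simply reported as unsupported.

```cpp
#include <cassert>
#include <signal.h>

// Advance the saved instruction pointer of a signal context by `bytes`.
// Returns false on platforms this sketch does not cover.
inline bool AdvanceInstructionPointer(ucontext_t *context, int bytes)
{
   assert(context != nullptr);
#if defined(__APPLE__) && defined(__x86_64__)
   // macOS: uc_mcontext is a pointer to struct __darwin_mcontext64
   // (assumed field names, see the note above).
   context->uc_mcontext->__ss.__rip += bytes;
   return true;
#elif defined(__linux__) && defined(__x86_64__) && defined(REG_RIP)
   // Linux: uc_mcontext is a struct holding the gregs[] register array.
   context->uc_mcontext.gregs[REG_RIP] += bytes;
   return true;
#else
   (void) context;
   (void) bytes;
   return false;
#endif
}
```

The failing statement would then become something like `AdvanceInstructionPointer(context, 1);`, with the caller deciding what to do when the function returns false.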

v-dobrev reviewed

Oct 18, 2018

camierjs added 3 commits

April 10, 2019 10:41

Add mass fallback kernels

8183e23

Revert example ex1

aad6c4c

Merge pull request #850 from mfem/okina-fallback-kernels

7f1352e

Okina fallback kernels [okina-fallback-kernels]

vladotomov approved these changes

Apr 11, 2019

camierjs and others added 5 commits

April 10, 2019 17:06

Move the 'destroyed' bool to an 'exists' section to allow pre-mm objects to be allocated.

eb32fdf

Small changes

aa1053b

Merge pull request #849 from mfem/okina-mm-object

3248f96

Okina mm object [okina-mm-object]

typos

a4bd80b

Updated CHANGELOG

3ac003d

tzanio approved these changes

Apr 11, 2019

v-dobrev and others added 12 commits

April 10, 2019 21:34

Updates in the BilinearForm and related classes.

678759a

Small updates in ex1, ex1p, ex6, ex6p.

Initialize new data members in all ctors in class BilinearForm.

1c3e24e

Lambda const propagation problem.

7f88a3b

Working around nvcc lambda constant propagation problem.

b895ab8

In ParMesh::Rebalance, make sure the mesh nodes use a ParGridFunction.

14f063d

In ParMesh::Rebalance(), check for nodes before checking for parallel FE space.

b750de4

Merge pull request #852 from mfem/lambda-capture-problem

3386c41

Lambda const propagation [lambda-capture-problem]

Further small tweaks of the examples.

2d39e09

Memory manager maps erase fix

2a554ee

Merge pull request #851 from mfem/okina-bilinearform-updates

a3aa7e3

Updates in the examples, class BilinearForm and related [okina-bilinearform-updates]

A few small updates in CHANGELOG and INSTALL.

e60be6f

Error message for ex1 -d cuda

de3c950

v-dobrev approved these changes

Apr 12, 2019

camierjs added the in-next label

Apr 12, 2019

camierjs merged commit ff9819e into master

Apr 12, 2019

camierjs deleted the okina branch

April 12, 2019 01:45

Member

dmed256 commented Apr 12, 2019

🎉

Contributor

artv3 commented Apr 12, 2019

👍

Member

tzanio commented Apr 12, 2019

Thank you @camierjs, @dmed256, @v-dobrev, @jakubcerveny, @jdahm, @pazner and @YohannDudouit for your contributions to this branch! 🚀

I know it was not easy, and I know that we are not done, but I think this is a really important step which would not have been possible without the hard and persistent work of David, Veselin and Jean-Sylvain. Thank you guys, you are truly amazing!