ENH: kernels for `random.vonmisses`; part 2 #681

Draft

samir-nasibli wants to merge 17 commits into master from samir-nasibli/enh/vonmisses_random

samir-nasibli commented Apr 14, 2021 (edited)

Description

Enable computations on devices [CPU/GPU].

Tests

DPNP own:

tests/test_random.py::TestDistributionsVonmises::test_moments[large_kappa] PASSED
tests/test_random.py::TestDistributionsVonmises::test_moments[small_kappa] PASSED
tests/test_random.py::TestDistributionsVonmises::test_invalid_args PASSED
tests/test_random.py::TestDistributionsVonmises::test_seed[large_kappa] FAILED
tests/test_random.py::TestDistributionsVonmises::test_seed[small_kappa] FAILED

+ numpy external

TODO

tests/test_random.py::TestDistributionsVonmises::test_seed fails on both devices (CPU and GPU). This is a bug.

ENH: kernels for random.vonmisses

63eeab1

samir-nasibli added the "in progress" label (Please do not merge. Work is in progress.)

Apr 14, 2021

samir-nasibli added 2 commits

April 14, 2021 15:18

update

5e6086c

refactoring

4268517

samir-nasibli removed the "in progress" label (Please do not merge. Work is in progress.)

Apr 14, 2021

samir-nasibli requested a review from shssf

April 14, 2021 20:27

samir-nasibli added the "in progress" label (Please do not merge. Work is in progress.)

Apr 17, 2021

samir-nasibli added 4 commits

April 21, 2021 16:26

Merge branch 'master' into samir-nasibli/enh/vonmisses_random

6f77dc0

Merge branch 'master' into samir-nasibli/enh/vonmisses_random

b5f539a

disabled tests on CPU

e9c17c7

Merge branch 'master' into samir-nasibli/enh/vonmisses_random

75dc985

shssf reviewed

May 13, 2021

dpnp/backend/kernels/dpnp_krnl_random.cpp Outdated

		Vvec = reinterpret_cast<_DataType*>(dpnp_memory_alloc_c(size * sizeof(_DataType)));

		for (size_t n = 0; n < size;)
		n = reinterpret_cast<size_t*>(dpnp_memory_alloc_c(sizeof(size_t)));

Contributor

This is quite strange (turning a scalar into an array with one element).
I think it should be a scalar, not an array.

Author

A scalar and this are the same thing. You just cannot pass n into the SYCL region any other way.

shssf reviewed

May 13, 2021

dpnp/backend/kernels/dpnp_krnl_random.cpp Outdated

		Y = 0.0;
		else if (Y > 1.0)
		Y = 1.0;
		n[0] = n[0] + 1;

Contributor

This is a mistake. This is a parallel environment (a SYCL kernel). Writing to the same memory location from inside the kernel causes a race condition: https://en.wikipedia.org/wiki/Race_condition
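
The lost update behind this comment can be shown without a device. The sketch below is a single-threaded simulation in plain C++, not DPNP or SYCL code; `simulate_increments` is a hypothetical helper introduced only for illustration. It replays `n[0] = n[0] + 1` for several work-items under two schedules: one where each item finishes before the next starts, and one where every item reads the counter before any item writes it back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-threaded simulation of `num_items` work-items, each executing
// `n[0] = n[0] + 1` on one shared counter. Illustrative only.
//
// interleaved == false: each item reads and writes before the next starts.
// interleaved == true : every item reads the old value first, then all of
//                       them write back; a parallel device is free to
//                       pick such a schedule.
std::size_t simulate_increments(std::size_t num_items, bool interleaved)
{
    assert(num_items > 0);
    std::size_t n = 0; // the shared counter (n[0] in the review snippet)

    if (!interleaved)
    {
        for (std::size_t item = 0; item < num_items; ++item)
        {
            n = n + 1;
        }
        return n;
    }

    std::vector<std::size_t> seen(num_items);
    for (std::size_t item = 0; item < num_items; ++item)
    {
        seen[item] = n; // read phase: every item observes 0
    }
    for (std::size_t item = 0; item < num_items; ++item)
    {
        n = seen[item] + 1; // write phase: every item stores 1
    }
    return n;
}
```

With eight simulated items, `simulate_increments(8, false)` returns 8 while `simulate_increments(8, true)` returns 1, so seven of the eight updates are lost under the interleaved schedule.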

shssf reviewed

May 13, 2021

dpnp/backend/kernels/dpnp_krnl_random.cpp

		V = Vvec[i];
		sn2 = sn * sn;
		cn2 = cn * cn;
		auto paral_kernel_some = [&](cl::sycl::handler& cgh) {

Contributor

shssf commented May 13, 2021 (edited)

The kernel sits inside a loop with a larger trip count. It would be more efficient to parallelize the algorithm (build the kernel) over the larger value size instead of size-n. That will require a loop inside the kernel.
It is an open question which will perform better:

* a loop with a queue of kernels (data dependent)
* a kernel with a loop

It is hard to predict without performance measurements, but I would expect parallelization with a larger number of threads to do better.
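
The "kernel with a loop" option can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the algorithm used in this PR: `sample_one` and `sample_all` are hypothetical names, the sampler is a naive rejection scheme for the von Mises distribution with mean 0 (uniform proposal on [-pi, pi), acceptance probability exp(kappa * (cos(x) - 1))), and the per-item seeding is deliberately simplistic. The outer `for` stands in for a parallel launch over all `size` items.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// One "work-item": keep drawing candidates until one is accepted.
// Naive rejection sampling for von Mises(mu = 0, kappa); an assumption
// made for this sketch, not the DPNP kernel.
double sample_one(double kappa, std::uint32_t seed, std::size_t i)
{
    assert(kappa >= 0.0);
    const double pi = std::acos(-1.0);
    // Per-item generator so that items do not share any state.
    std::mt19937 rng(seed + static_cast<std::uint32_t>(i));
    std::uniform_real_distribution<double> angle(-pi, pi);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    for (;;) // the loop inside the kernel
    {
        const double x = angle(rng);
        if (unit(rng) <= std::exp(kappa * (std::cos(x) - 1.0)))
        {
            return x;
        }
    }
}

// One pass over all `size` slots; slot i is written only by item i.
std::vector<double> sample_all(std::size_t size, double kappa, std::uint32_t seed)
{
    std::vector<double> result(size);
    for (std::size_t i = 0; i < size; ++i) // stands in for a parallel launch
    {
        result[i] = sample_one(kappa, seed, i);
    }
    return result;
}
```

In this shape each item may need a different number of iterations before it accepts, which is the cost to weigh against relaunching a kernel for the remaining size-n slots; only measurements can settle which is faster.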

samir-nasibli added 3 commits

May 24, 2021 16:11

Merge branch 'master' into samir-nasibli/enh/vonmisses_random

7221851

tmp solution

df3160f

revert last changes on dpnp_krnl_random.cpp

0e743d2

Contributor

shssf commented Jun 22, 2021

@samir-nasibli Is this PR ready for review or still in the development stage?

Author

samir-nasibli commented Jun 22, 2021

> @samir-nasibli Is this PR ready for review or still in the development stage?

I will either update this PR or move part of these changes to another PR and close this one.

Merge branch 'master' into samir-nasibli/enh/vonmisses_random

ff8de8e

samir-nasibli changed the title from "ENH: kernels for `random.vonmisses`" to "ENH: kernels for `random.vonmisses`; part 2"

Jul 12, 2021

samir-nasibli mentioned this pull request

Jul 12, 2021

ENH: kernels for `random.vonmisses`; part 1 #779

Merged

samir-nasibli and others added 4 commits

July 13, 2021 16:23

Merge branch 'master' into samir-nasibli/enh/vonmisses_random

1492555

Merge branch 'master' into samir-nasibli/enh/vonmisses_random

233cd59

Merge branch 'master' into samir-nasibli/enh/vonmisses_random

30637c6

Merge branch 'master' into samir-nasibli/enh/vonmisses_random

3222bc5

Alexander-Makaryev reviewed

Sep 29, 2021

dpnp/backend/kernels/dpnp_krnl_random.cpp Outdated

Comment on lines 1314 to 1315

		*n = *n + 1;
		result1[*n] = cl::sycl::asin(cl::sycl::sqrt(Y));

Contributor

Alexander-Makaryev commented Sep 29, 2021

It looks like we have a race condition here, which is why we get wrong results. To prevent it, we should calculate n (the index into the result) from i.

Alexander-Makaryev reviewed

Sep 29, 2021

dpnp/backend/kernels/dpnp_krnl_random.cpp Outdated

Comment on lines 1400 to 1401

		*n = *n + 1;
		result1[*n] = cl::sycl::acos(W);

Contributor

Alexander-Makaryev commented Sep 29, 2021

It looks like we have a race condition here, which is why we get wrong results. To prevent it, we should calculate n (the index into the result) from i.
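
A minimal sketch of that suggestion in plain C++, assuming every work-item produces exactly one output value (handling rejected candidates would need an extra compaction step that is left out here). `write_slot` and `run_items` are hypothetical names, not functions from `dpnp_krnl_random.cpp`. Because item `i` owns slot `i` and no shared counter is touched, the result does not depend on the order in which items run.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel body: work-item `i` writes only
// to output slot `i`, so no shared counter is read or modified.
void write_slot(const std::vector<double>& W, std::vector<double>& result, std::size_t i)
{
    assert(W.size() == result.size());
    assert(i < W.size());
    result[i] = std::acos(W[i]);
}

// Run all work-items front-to-back or back-to-front to mimic two
// different schedules; the output must be the same for both.
std::vector<double> run_items(const std::vector<double>& W, bool reversed)
{
    std::vector<double> result(W.size(), 0.0);
    for (std::size_t k = 0; k < W.size(); ++k)
    {
        const std::size_t i = reversed ? W.size() - 1 - k : k;
        write_slot(W, result, i);
    }
    return result;
}
```

Calling `run_items(W, false)` and `run_items(W, true)` on the same input gives identical vectors, which is the property the shared `*n` counter could not provide.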

Alexander-Makaryev assigned LukichevaPolina

Oct 1, 2021

Merge branch 'master' into samir-nasibli/enh/vonmisses_random

2c3eeb2

Plukiche/vonmisses random (#998)

b2b3c42

* Fix race condition in dpnp_rng_vonmises_small_kappa_c and dpnp_rng_vonmises_large_kappa_c
* Rename arrays and change if condition from kernels in dpnp_rng_vonmises_large_kappa_c and dpnp_rng_vonmises_small_kappa_c
* Add space
* Fix indices in dpnp_rng_vonmises_small_kappa_c and dpnp_rng_vonmises_large_kappa_c

densmirn approved these changes

Oct 7, 2021

Author

samir-nasibli commented Oct 7, 2021 (edited)

@LukichevaPolina
Using extra memory that grows with the amount of data is not good practice in optimization. We must avoid such cases.
We have to remove any possibility of a race condition in the algorithm.

Contributor

densmirn commented Oct 7, 2021

> Using extra memory that grows with the amount of data is not good practice in optimization. We must avoid such cases.

Ideas?

> We have to remove any possibility of a race condition in the algorithm.

Done.

Author

samir-nasibli commented Oct 7, 2021

> Using extra memory that grows with the amount of data is not good practice in optimization. We must avoid such cases. - Ideas?

Any optimization that relies on additional memory can actually degrade performance (depending on the input data) and undermine the benefits of parallelism. Allocating, deallocating, and working with additional memory is expensive.

> We have to remove any possibility of a race condition in the algorithm. - Done.

Using extra memory is a brute-force approach.