Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

ENH: kernels forrandom.vonmisses; part 2#681

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
samir-nasibli wants to merge17 commits intomaster
base:master
Choose a base branch
Loading
fromsamir-nasibli/enh/vonmisses_random

Conversation

@samir-nasibli
Copy link

@samir-nasiblisamir-nasibli commentedApr 14, 2021
edited
Loading

Description

Enable computations on devices [CPU/GPU].

Tests

  • DPNP own:
tests/test_random.py::TestDistributionsVonmises::test_moments[large_kappa] PASSEDtests/test_random.py::TestDistributionsVonmises::test_moments[small_kappa] PASSEDtests/test_random.py::TestDistributionsVonmises::test_invalid_args PASSEDtests/test_random.py::TestDistributionsVonmises::test_seed[large_kappa] FAILEDtests/test_random.py::TestDistributionsVonmises::test_seed[small_kappa] FAILED
  • + numpy external

TODO

tests/test_random.py::TestDistributionsVonmises::test_seed failed on both devices. Bug.

@samir-nasiblisamir-nasibli added the in progressPlease do not merge. Work is in progress. labelApr 14, 2021
@samir-nasiblisamir-nasibli removed the in progressPlease do not merge. Work is in progress. labelApr 14, 2021
@samir-nasiblisamir-nasibli added the in progressPlease do not merge. Work is in progress. labelApr 17, 2021
Vvec =reinterpret_cast<_DataType*>(dpnp_memory_alloc_c(size *sizeof(_DataType)));

for (size_t n =0; n < size;)
n =reinterpret_cast<size_t*>(dpnp_memory_alloc_c(sizeof(size_t)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

this is quite strange (Make scalar as a array with one element).
I think it should be a scalar, not an array.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

scalar and this is the same. You just can not pass n to sycl region in other way.

Y =0.0;
elseif (Y >1.0)
Y =1.0;
n[0] = n[0] +1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is a mistake. This is parallel environment (SYCL kernel). Writing inside the kernel into same memory causehttps://en.wikipedia.org/wiki/Race_condition

V = Vvec[i];
sn2 = sn * sn;
cn2 = cn * cn;
auto paral_kernel_some = [&](cl::sycl::handler& cgh) {
Copy link
Contributor

@shssfshssfMay 13, 2021
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Kernel inside the loop with bigger trip count. It would be more efficient to parallelize (make kernel) the algorithm by bigger valuesize insteadsize-n. So, it will require a loop inside the kernel.
It is questionable what will be more performant

  1. loop with a kernels queue (data dependent)
  2. kernel with a loop

It is hard to predict it with no perf measurements but I would vote that parallelization with bigger number of threads should be better.

@shssf
Copy link
Contributor

@samir-nasibli Is this PR ready to review or still in development stage?

@samir-nasibli
Copy link
Author

@samir-nasibli Is this PR ready to review or still in development stage?

I will update this PR or move some part of this changes to another PR with closing this.

@samir-nasiblisamir-nasibli changed the titleENH: kernels forrandom.vonmissesENH: kernels forrandom.vonmisses; part 2Jul 12, 2021
Comment on lines 1314 to 1315
*n = *n +1;
result1[*n] =cl::sycl::asin(cl::sycl::sqrt(Y));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looks like here we are getting race condition, that is why we are getting wrong results. To prevent it we should calculaten (index of result) fromi.

samir-nasibli reacted with thumbs up emoji
Comment on lines 1400 to 1401
*n = *n +1;
result1[*n] =cl::sycl::acos(W);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looks like here we are getting race condition, that is why we are getting wrong results. To prevent it we should calculate n (index of result) from i.

samir-nasibli reacted with thumbs up emoji
* Fix race condition in dpnp_rng_vonmises_small_kappa_c and dpnp_rng_vonmises_large_kappa_c* Rename arrays and change if condition from kernels in dpnp_rng_vonmises_large_kappa_c and dpnp_rng_vonmises_small_kappa_c* Add space* Fix indices in dpnp_rng_vonmises_small_kappa_c and dpnp_rng_vonmises_large_kappa_c
@samir-nasibli
Copy link
Author

samir-nasibli commentedOct 7, 2021
edited
Loading

@LukichevaPolina
The use of extra memory with the amount of data is not a good practice in optimization. We must avoid this cases.
We have to remove the possibilities for a potential race condition in the algorithm.

@densmirn
Copy link
Contributor

The use of extra memory with the amount of data is not a good practice in optimization. We must avoid this cases.- ideas?
We have to remove the possibilities for a potential race condition in the algorithm.- done

@samir-nasibli
Copy link
Author

The use of extra memory with the amount of data is not a good practice in optimization. We must avoid this cases.- ideas?
We have to remove the possibilities for a potential race condition in the algorithm.- done

  1. Any optimization with the use of additional memory can actually degrade (depending on the input data) and underestimate all the benefits from parallelism. Allocation/Deallocation/Working with additional memory is expensive.

  2. We have to remove the possibilities for a potential race condition in the algorithm.- done

using extra mem is brute force approach.

  1. - ideas?

We need to investigate it.

  1. I also recommend to use perf tests and profiler tools during optimization. Comparative analysis is important in such work.

@oleksandr-pavlyk
Copy link
Contributor

Stale PR?

@antonwolfyantonwolfy marked this pull request as draftFebruary 20, 2024 17:03
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@antonwolfyantonwolfyAwaiting requested review from antonwolfyantonwolfy will be requested when the pull request is marked ready for reviewantonwolfy is a code owner

@vlad-perevezentsevvlad-perevezentsevAwaiting requested review from vlad-perevezentsevvlad-perevezentsev will be requested when the pull request is marked ready for reviewvlad-perevezentsev is a code owner

@ndgrigorianndgrigorianAwaiting requested review from ndgrigorianndgrigorian will be requested when the pull request is marked ready for reviewndgrigorian is a code owner

3 more reviewers

@shssfshssfshssf left review comments

@Alexander-MakaryevAlexander-MakaryevAlexander-Makaryev left review comments

@densmirndensmirndensmirn approved these changes

Reviewers whose approvals may not affect merge requirements

At least 1 approving review is required to merge this pull request.

Assignees

@LukichevaPolinaLukichevaPolina

Labels

in progressPlease do not merge. Work is in progress.

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

6 participants

@samir-nasibli@shssf@densmirn@oleksandr-pavlyk@Alexander-Makaryev@LukichevaPolina

[8]ページ先頭

©2009-2025 Movatter.jp