
NTHU 2021 Parallel Programming HW1 Odd Even Sort implemented with MPI to Speed up The Performance of MergeSort and Quicksort


Howeng98/Odd-Even-Sort


1. Implementation

In this homework, the goal is to implement the odd-even sort algorithm with the MPI interface. The whole process can be separated into six parts:

  • allocate memory
  • read input_data
  • MPI process communication
  • local sorting
  • reduce
  • write output_data

Load Balancing

In the load-balancing part, the number of data elements (n) is always greater than the number of processes (size), so most processes need to handle:

n / size

elements, but some processes need to handle one extra:

n / size + 1 if rank < n % size
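The rule above can be sketched as two small helpers (a minimal sketch; the function names `local_count` and `local_offset` are illustrative, not from the repo):

```c
/* Number of elements assigned to `rank` when n elements are split
 * across `size` ranks: the first n % size ranks get one extra. */
static int local_count(int n, int size, int rank) {
    return n / size + (rank < n % size ? 1 : 0);
}

/* Starting offset of `rank`'s block in the global array: `rank` full
 * base-sized blocks, plus one extra slot per earlier oversized rank. */
static int local_offset(int n, int size, int rank) {
    int base = n / size, extra = n % size;
    return rank * base + (rank < extra ? rank : extra);
}
```

For example, with n = 10 and size = 3 the blocks are of sizes 4, 3, 3 starting at offsets 0, 4, 7, so every element is covered exactly once.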

Data Preprocessing

I prepare three memory spaces (data, temp, buf) for each rank. data stores the currently loaded input_data, or the sorted_data received from another rank; after all processes have finished their assigned tasks, data holds each rank's output. temp is scratch space for the merge-sort phase. buf is the receive buffer used with MPI_Recv when receiving sorted data from another rank.

Odd-Even Sort Phase

Let's explain how the odd-even phases work. In the even stage, only the ranks for which rank % 2 == 0 (equivalently, rank & 1 is false) receive data from their odd-ranked neighbor (rank - 1) via MPI_Send and MPI_Recv. A receiving rank first saves the incoming data in buf, merges the local data and the received buf into temp, and sends the partner's half back after sorting. The odd stage does the same thing with the pairing shifted by one; for more details please check my report documents.
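The merge in each exchange only needs to keep one half of the combined data: the lower-ranked partner keeps the n smallest values. A serial sketch of that helper (the name `merge_keep_low` is my own, not from the repo):

```c
#include <stddef.h>

/* Merge two sorted arrays `a` (n elements) and `b` (m elements) and
 * keep only the n smallest values, written to `out`. The lower-ranked
 * process keeps this half; its partner keeps the largest values via
 * the mirror of this loop, scanning from the array ends. */
static void merge_keep_low(const int *a, size_t n,
                           const int *b, size_t m,
                           int *out) {
    size_t i = 0, j = 0, k = 0;
    while (k < n) {
        if (j >= m || (i < n && a[i] <= b[j]))
            out[k++] = a[i++];   /* take from the local block */
        else
            out[k++] = b[j++];   /* take from the received buffer */
    }
}
```

Because both inputs are already sorted, each exchange costs O(n + m) instead of a full re-sort of the combined block.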

Reduce

The even and odd stages are executed in a while loop, which breaks when the MPI_Reduce result under the MPI_LOR operator is false: if no swap happened anywhere, every rank's has_swap flag (set in the merge-sort part) is false, so the data is globally sorted.
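A serial analogue of this termination check: keep alternating even/odd passes while any pass swaps. In the MPI version, each rank's local has_swap would instead be combined across ranks with an MPI_LOR reduction (a sketch under that assumption; serial for clarity):

```c
#include <stdbool.h>
#include <stddef.h>

/* Serial odd-even transposition sort. The loop stops when a full
 * even + odd round performs no swap, mirroring the MPI_LOR reduction
 * over each rank's has_swap flag in the parallel version. */
static void odd_even_sort(int *a, size_t n) {
    bool has_swap = true;
    while (has_swap) {
        has_swap = false;
        for (size_t phase = 0; phase < 2; ++phase)   /* even, then odd */
            for (size_t i = phase; i + 1 < n; i += 2)
                if (a[i] > a[i + 1]) {
                    int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                    has_swap = true;
                }
    }
}
```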

2. Optimization

  1. Use & instead of %

For power-of-two divisors, a bitwise AND can replace the modulo and avoids a division instruction; modern compilers often perform this rewrite themselves, but writing it explicitly makes the intent clear.
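For example, the parity test used to pick even/odd stages: for non-negative integers, i % 2 and i & 1 agree.

```c
/* Parity check: (i & 1) equals (i % 2) for any unsigned i. */
static int is_odd(unsigned i) { return i & 1u; }
```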

  2. Avoid the ternary operator (e.g. a ? b : c)

In hot loops the ternary operator can compile to a branch, which is relatively costly, so I try to avoid it on the critical path.

  3. Reassign pointer targets instead of copying array values when swapping array memory

This is important: relocating the array pointers instead of copying array elements one by one saves a lot of time (O(1) instead of O(n) per swap).
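Concretely, after merging into temp, the buffers can trade roles by exchanging pointers (a sketch; the variable names follow the buffer names used above):

```c
/* Exchange two int-array pointers in O(1). After merging `data` and
 * `buf` into `temp`, calling swap_ptr(&data, &temp) adopts the merged
 * result as the new `data` without copying any elements. */
static void swap_ptr(int **a, int **b) {
    int *t = *a;
    *a = *b;
    *b = t;
}
```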

  4. ++i > i++ > i += 1 > i = i + 1

A code-writing hint; for plain integers compilers usually generate identical code for all four, but the habit costs nothing.

  5. Pick a suitable sorting algorithm

For the local sort before the odd-even phases begin, in this repo's case: spreadsort > qsort (C library) > self-defined quicksort. qsort is well optimized inside the library, so be careful when you choose a sorting algorithm.
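For reference, the C library qsort used for the local sort takes a comparison callback (standard API; the comparator name is my own):

```c
#include <stdlib.h>

/* Comparison callback for qsort over ints, ascending.
 * Returns <0, 0, >0 without risking overflow from subtraction. */
static int cmp_int(const void *p, const void *q) {
    int a = *(const int *)p, b = *(const int *)q;
    return (a > b) - (a < b);
}

/* Usage: qsort(data, n, sizeof(int), cmp_int); */
```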

  6. Dynamic allocation, load balancing

Distribute the task load to each MPI rank dynamically, as described in the load-balancing section.

  7. Avoid repeated memory allocation inside functions
  8. Avoid redundant, duplicated computation; hoist constant values

3. Experiments

System Spec

  • 4 nodes, 12 CPUs each (with 1 thread per CPU, at most 48 threads run at the same time)
  • OS: Arch Linux
  • Compilers: GCC 10.2.0, Clang 11.0.1
  • MPI: Intel MPI Library, Version 2019 Update 8
  • Scheduler: Slurm 20.02.5
  • Network: Infiniband

Strong Scalability (Speedup Factor)

(Figures: single-node and multi-node speedup plots.)

Time Profile

(Figures: single-node and multi-node time-profile plots.)

Discussion

The experiment results above were measured on testcase34, which contains 546869888 data elements. I picked this testcase because its large size makes the execution times, and therefore the differences between settings, easier to distinguish in the result plots.

From my experiment results, in both the single-node and multi-node cases the speedup grows only slowly. Sometimes the performance drops sharply, especially at 12 processes on a single node and at node 4 in the multi-node runs. I think the cause is that the last node can only communicate with its left neighbor, so its processes gain little efficiency and the sort step costs a lot of time.

4. Conclusion

From this homework I learned how to use MPI to let different processes communicate with each other. I think the bottleneck in this repo is the communication time (plus the MPI_Wtime() measurement overhead); in future work we can put more effort into these parts to make the program more efficient.
