import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.commons.lang.time.StopWatch;
import java.util.ArrayList;
import java.util.List;

public class Prime {
    // Method to calculate and count the prime numbers
    public List<Integer> countPrime(int n) {
        List<Integer> primes = new ArrayList<>();
        for (int i = 2; i < n; i++) {
            boolean isPrime = true;
            // check if the number is prime or not
            for (int j = 2; j < i; j++) {
                if (i % j == 0) {
                    isPrime = false;
                    break;  // exit the inner for loop
                }
            }
            // add the primes into the List
            if (isPrime) {
                primes.add(i);
            }
        }
        return primes;
    }

    // Main method to run the program
    public static void main(String[] args) {
        StopWatch watch = new StopWatch();
        watch.start();
        // creating JavaSparkContext object
        SparkConf conf = new SparkConf().setAppName("haha").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // new Prime object
        Prime prime = new Prime();
        //prime.countPrime(1000000);
        // parallelize the collection
        JavaRDD<Integer> rdd = sc.parallelize(prime.countPrime(1000000), 12);
        long count = rdd.filter(e -> e == 2 || e % 2 != 0).count();
        // Stopping the execution time and printing the results
        watch.stop();
        System.out.println("Total time took to run the process is " + watch);
        System.out.println("The number of prime between 0 to 1000000  is " + count);
        sc.stop();
    }
}

Hi there, I have the following code, which parallelizes an algorithm that counts the number of primes in a given range. However, the code only parallelizes the already-computed list of primes, not the computation itself. How can I modify the code to parallelize the process of finding the primes?

asked Jan 13, 2021 at 15:28 by Rickshaw Tomtom

1 Answer


It's an order of operations issue - you're running prime.countPrime before you've created your Spark RDD. Spark runs operations in parallel only when they are defined within the RDD object's map, reduce, filter, etc. operations. You need to rethink your approach:

  1. Use sc.range(1, 1000000, 1, 12) to create an RDD of all integers from 1 to 1,000,000.

  2. Create an isPrime(int n) method to evaluate if a given integer is prime.

  3. filter your RDD on the condition of your isPrime method (this is the part that will execute in parallel).

  4. count the filtered RDD.
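The steps above can be sketched roughly as follows. This is a minimal sketch, not your exact code: it assumes Spark is on the classpath, the class name ParallelPrimes and the small 1,000-element range are mine, and I build the candidate RDD with sc.parallelize over a generated list (for this size it behaves the same as the sc.range suggestion in step 1):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelPrimes {
    // Pure per-element predicate: Spark can apply this to each
    // candidate independently, so the work distributes across partitions.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int j = 2; (long) j * j <= n; j++) {
            if (n % j == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("primes").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Step 1: an RDD of raw candidates (no primality work done yet).
        List<Integer> candidates =
                IntStream.range(2, 1000).boxed().collect(Collectors.toList());
        JavaRDD<Integer> rdd = sc.parallelize(candidates, 12);

        // Steps 3-4: the primality test now runs inside the filter,
        // so Spark executes it in parallel, then counts the survivors.
        long count = rdd.filter(ParallelPrimes::isPrime).count();
        System.out.println("Primes below 1000: " + count);

        sc.stop();
    }
}
```

The key difference from the original is that countPrime no longer runs to completion on the driver before Spark is involved; the expensive per-number check is inside the filter, which is the step Spark distributes.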

answered Jan 14, 2021 at 16:36 by Charlie Flowers