import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.commons.lang.time.StopWatch;
import java.util.ArrayList;
import java.util.List;

public class Prime {
    // Method to calculate and count the prime numbers
    public List<Integer> countPrime(int n) {
        List<Integer> primes = new ArrayList<>();
        for (int i = 2; i < n; i++) {
            boolean isPrime = true;
            // check if the number is prime or not
            for (int j = 2; j < i; j++) {
                if (i % j == 0) {
                    isPrime = false;
                    break;  // exit the inner for loop
                }
            }
            // add the primes into the List
            if (isPrime) {
                primes.add(i);
            }
        }
        return primes;
    }

    // Main method to run the program
    public static void main(String[] args) {
        StopWatch watch = new StopWatch();
        watch.start();
        // creating JavaSparkContext object
        SparkConf conf = new SparkConf().setAppName("haha").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // new Prime object
        Prime prime = new Prime();
        //prime.countPrime(1000000);
        // parallelize the collection
        JavaRDD<Integer> rdd = sc.parallelize(prime.countPrime(1000000), 12);
        long count = rdd.filter(e -> e == 2 || e % 2 != 0).count();
        // Stopping the execution time and printing the results
        watch.stop();
        System.out.println("Total time took to run the process is " + watch);
        System.out.println("The number of prime between 0 to 1000000  is " + count);
        sc.stop();
    }
}

Hi there, I have the following code, which parallelizes an algorithm that counts the number of primes in a given range. However, the code only parallelizes the already-computed list of primes, not the computation itself. How can I modify the code to parallelize the process of finding the primes?

asked Jan 13, 2021 at 15:28 by Rickshaw Tomtom

1 Answer


It's an order of operations issue - you're running prime.countPrime before you've created your Spark RDD. Spark runs operations in parallel only when they are defined within the RDD object's map, reduce, filter, etc. operations. You need to rethink your approach:

  1. Use sc.range(1, 1000000, 1, 12) to create an RDD of all integers from 1 to 1,000,000.

  2. Create an isPrime(int n) method to evaluate if a given integer is prime.

  3. filter your RDD on the condition of your isPrime method (this is the part that will execute in parallel).

  4. count the filtered RDD.
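The steps above can be sketched roughly as follows. This is a minimal sketch, not your exact code: it assumes Spark is on the classpath, the class name ParallelPrimes and the small 1,000-element range are mine, and I build the candidate RDD with sc.parallelize over a generated list (for this size it behaves the same as the sc.range suggestion in step 1):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelPrimes {
    // Pure per-element predicate: Spark can apply this to each
    // candidate independently, so the work distributes across partitions.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int j = 2; (long) j * j <= n; j++) {
            if (n % j == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("primes").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Step 1: an RDD of raw candidates (no primality work done yet).
        List<Integer> candidates =
                IntStream.range(2, 1000).boxed().collect(Collectors.toList());
        JavaRDD<Integer> rdd = sc.parallelize(candidates, 12);

        // Steps 3-4: the primality test now runs inside the filter,
        // so Spark executes it in parallel, then counts the survivors.
        long count = rdd.filter(ParallelPrimes::isPrime).count();
        System.out.println("Primes below 1000: " + count);

        sc.stop();
    }
}
```

The key difference from the original is that countPrime no longer runs to completion on the driver before Spark is involved; the expensive per-number check is inside the filter, which is the step Spark distributes.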

answered Jan 14, 2021 at 16:36 by Charlie Flowers