
I have a Spark 2.1 job where I maintain multiple Dataset objects/RDDs that represent different queries over our underlying Hive/HDFS datastore. I've noticed that if I simply iterate over the List of Datasets, they execute one at a time. Each individual query runs in parallel, but I feel that by not also running the different Datasets in parallel with each other, we are not making full use of our resources.

There doesn't seem to be much out there about doing this, as most questions appear to be about parallelizing a single RDD or Dataset, not about running multiple ones in parallel within the same job.

Is this inadvisable for some reason? Can I just use an executor service, thread pool, or futures to do this?

Thanks!

asked Feb 17, 2018 at 5:58 by Brian
  • You can find multiple questions and answers on Stack Overflow itself, for example stackoverflow.com/questions/31757737/… and stackoverflow.com/questions/30214474/…, and there is plenty of material on the web explaining how to do this as well. Commented Feb 17, 2018 at 12:14
  • Yes, you can do this; the easiest way is to use Scala's parallel collections. Commented Feb 17, 2018 at 20:39
  • @RameshMaharjan Upon review, yes, those questions are relevant, but without knowing that this is the question I should be asking, it's hard to find those answers :). Commented Feb 18, 2018 at 2:50

1 Answer


Yes, you can use multithreading in the driver code, but normally this does not increase performance unless your queries operate on very skewed data and/or cannot be parallelized well enough to fully utilize the cluster's resources.

You can do something like this:

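```scala
import org.apache.spark.sql.Dataset

val datasets: Seq[Dataset[_]] = ???

datasets
  .par // transform to a parallel Seq
  .foreach(ds => ds.write.saveAsTable(...)) // each save is submitted from its own driver thread
```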
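The question also asks whether an executor service, thread pool, or futures would work instead. A minimal sketch of that variant, assuming only the Scala standard library: each job is a zero-argument function (in a Spark driver, one that ends in an action such as a Dataset write), the jobs are submitted to a fixed-size thread pool wrapped as an `ExecutionContext`, and the driver blocks until all of them finish. Unlike `.par`, whose default thread count follows the driver's core count, this makes the number of concurrently submitted jobs an explicit parameter. The names `ConcurrentJobs`, `runConcurrently`, and `parallelism` are hypothetical.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ConcurrentJobs {
  /** Runs every job on a fixed-size thread pool and waits for all of them.
    * Results come back in the same order as `jobs`.
    * Assumption: in a Spark driver each job ends in an action (e.g. a write),
    * so up to `parallelism` Spark jobs are handed to the scheduler at once.
    */
  def runConcurrently[T](jobs: Seq[() => T], parallelism: Int): Seq[T] = {
    require(parallelism > 0, "parallelism must be positive")
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Start every job on the pool; at most `parallelism` run at the same time
      val futures = jobs.map(job => Future(job()))
      // Future.sequence keeps input order and fails if any job fails
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      // Let already-running jobs finish, then release the pool's threads
      pool.shutdown()
    }
  }
}
```

In a Spark driver, a call might look like `ConcurrentJobs.runConcurrently(datasets.map(ds => () => ds.write.parquet(pathFor(ds))), 4)`, where `pathFor` is a hypothetical helper that picks an output path per Dataset. The concurrent jobs still share one cluster: under Spark's default FIFO scheduler the earliest job gets priority on resources, while setting `spark.scheduler.mode` to `FAIR` shares them more evenly between jobs.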
answered Feb 17, 2018 at 20:43 by Raphael Roth

1 Comment

I have multiple DataFrames that I am reading from SQL Server. How do I run them in parallel to create a Parquet file for each DataFrame?
