Commit 4f0a014

Author: Pedro Bernardo
Commit message: Added pairRdd/sort/*.py
1 parent: 35b653c

File tree

3 files changed: +59 -0 lines changed
Lines changed: 23 additions & 0 deletions
```python
from pairRdd.aggregation.reducebykey.housePrice.AvgCount import AvgCount
from pyspark import SparkContext


if __name__ == "__main__":

    sc = SparkContext("local", "averageHousePriceSolution")
    sc.setLogLevel("ERROR")

    lines = sc.textFile("in/RealEstate.csv")
    cleanedLines = lines.filter(lambda line: "Bedrooms" not in line)
    housePricePairRdd = cleanedLines.map(lambda line:
        (int(float(line.split(",")[3])), AvgCount(1, float(line.split(",")[2]))))

    housePriceTotal = housePricePairRdd.reduceByKey(lambda x, y:
        AvgCount(x.count + y.count, x.total + y.total))

    housePriceAvg = housePriceTotal.mapValues(lambda avgCount: avgCount.total / avgCount.count)

    sortedHousePriceAvg = housePriceAvg.sortByKey()

    for bedrooms, avgPrice in sortedHousePriceAvg.collect():
        print("{} : {}".format(bedrooms, avgPrice))
```
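The AvgCount helper imported above is not included in this commit. A minimal sketch consistent with how it is used here (fields named count and total, merged pairwise in reduceByKey) might look like the following; the real class lives at pairRdd.aggregation.reducebykey.housePrice.AvgCount and may differ.

```python
class AvgCount:
    """Accumulator pairing a record count with a running price total.

    Hypothetical reconstruction: the actual AvgCount class is not
    shown in this commit; only its usage above is known.
    """

    def __init__(self, count, total):
        self.count = count
        self.total = total
```

With this shape, reduceByKey merges two partial aggregates via AvgCount(x.count + y.count, x.total + y.total), and mapValues then divides total by count to yield the per-bedroom average price.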
Lines changed: 16 additions & 0 deletions
```python
from pyspark import SparkContext

if __name__ == "__main__":

    '''
    Create a Spark program to read an article from in/word_count.text and
    output the number of occurrences of each word in descending order.

    Sample output:

    apple : 200
    shoes : 193
    bag : 176
    ...

    '''
```
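As a quick Spark-free reference for the spec above, the expected output can be cross-checked with collections.Counter. This sketch assumes words are split on single spaces with no lowercasing or punctuation stripping, matching the Spark solution in this commit.

```python
from collections import Counter


def word_counts_desc(text):
    # Split each line on single spaces, mirroring line.split(" ")
    # in the Spark job; no normalization is applied.
    counts = Counter()
    for line in text.splitlines():
        counts.update(line.split(" "))
    # Highest count first, like sortByKey(ascending=False)
    # on (count, word) pairs.
    return sorted(counts.items(), key=lambda wc: wc[1], reverse=True)
```

Printing each pair as `"{} : {}".format(word, count)` reproduces the sample output format.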
Lines changed: 20 additions & 0 deletions
```python
from pyspark import SparkContext

if __name__ == "__main__":

    sc = SparkContext("local", "wordCounts")
    sc.setLogLevel("ERROR")
    lines = sc.textFile("in/word_count.text")
    wordRdd = lines.flatMap(lambda line: line.split(" "))

    wordPairRdd = wordRdd.map(lambda word: (word, 1))
    wordToCountPairs = wordPairRdd.reduceByKey(lambda x, y: x + y)

    countToWordPairs = wordToCountPairs.map(lambda wordToCount: (wordToCount[1], wordToCount[0]))

    sortedCountToWordPairs = countToWordPairs.sortByKey(ascending=False)

    sortedWordToCountPairs = sortedCountToWordPairs.map(lambda countToWord: (countToWord[1], countToWord[0]))

    for word, count in sortedWordToCountPairs.collect():
        print("{} : {}".format(word, count))
```
