Commit f637b18

Author: Pedro Bernardo committed

Setting log level to ERROR in scripts that print to the standard output

1 parent 131e3cf commit f637b18

File tree

8 files changed: +23 −18 lines changed
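
Every diff below applies the same one-line change: raise the log threshold on the SparkContext immediately after it is created, so Spark's INFO/WARN chatter no longer interleaves with the values the script print()s. A minimal sketch of the pattern (the app name and the final print are illustrative, not from the commit):

from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext("local", "example app")  # app name is illustrative
    # From here on only ERROR (and FATAL) messages reach the console;
    # accepted levels include ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN.
    sc.setLogLevel("ERROR")
    print(sc.parallelize([1, 2, 3]).count())  # output no longer buried in log lines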

rdd/WordCount.py

Lines changed: 7 additions & 6 deletions

@@ -2,9 +2,10 @@
 from pyspark import SparkContext

 if __name__ == "__main__":
-    sc = SparkContext("local", "word count")
-    lines = sc.textFile("in/word_count.text")
-    words = lines.flatMap(lambda line: line.split(" "))
-    wordCounts = words.countByValue()
-    for word, count in wordCounts.items():
-        print(word, count)
+    sc = SparkContext("local", "word count")
+    sc.setLogLevel("ERROR")
+    lines = sc.textFile("in/word_count.text")
+    words = lines.flatMap(lambda line: line.split(" "))
+    wordCounts = words.countByValue()
+    for word, count in wordCounts.items():
+        print(word, count)

rdd/collect/CollectExample.py

Lines changed: 1 addition & 0 deletions

@@ -2,6 +2,7 @@

 if __name__ == "__main__":
     sc = SparkContext("local", "collect")
+    sc.setLogLevel("ERROR")
     inputWords = ["spark", "hadoop", "spark", "hive", "pig", "cassandra", "hadoop"]
     wordRdd = sc.parallelize(inputWords)
     words = wordRdd.collect()

rdd/count/CountExample.py

Lines changed: 2 additions & 1 deletion

@@ -2,10 +2,11 @@

 if __name__ == "__main__":
     sc = SparkContext("local", "count")
+    sc.setLogLevel("ERROR")
     inputWords = ["spark", "hadoop", "spark", "hive", "pig", "cassandra", "hadoop"]
     wordRdd = sc.parallelize(inputWords)
     print("Count: {}".format(wordRdd.count()))
     worldCountByValue = wordRdd.countByValue()
     print("CountByValue: ")
     for word, count in worldCountByValue.items():
-        print("{} : {}".format(word, count))
+        print("{} : {}".format(word, count))

rdd/nasaApacheWebLogs/UnionLogSolutions.py

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 from pyspark import SparkContext

-def isNotHeader(line: str):
+def isNotHeader(line: str):
     return not (line.startswith("host") and "bytes" in line)

 if __name__ == "__main__":

rdd/reduce/ReduceExample.py

Lines changed: 1 addition & 0 deletions

@@ -2,6 +2,7 @@

 if __name__ == "__main__":
     sc = SparkContext("local", "reduce")
+    sc.setLogLevel("ERROR")
     inputIntegers = [1, 2, 3, 4, 5]
     integerRdd = sc.parallelize(inputIntegers)
     product = integerRdd.reduce(lambda x, y: x * y)
Lines changed: 2 additions & 3 deletions

@@ -1,11 +1,10 @@
-
 import sys
 from pyspark import SparkContext

 if __name__ == "__main__":

-    '''
+    '''
     Create a Spark program to read the first 100 prime numbers from in/prime_nums.text,
     print the sum of those numbers to console.
     Each row of the input file contains 10 prime numbers separated by spaces.
-    '''
+    '''

rdd/sumOfNumbers/SumOfNumbersSolution.py

Lines changed: 2 additions & 1 deletion

@@ -3,9 +3,10 @@

 if __name__ == "__main__":
     sc = SparkContext("local", "primeNumbers")
+    sc.setLogLevel("ERROR")
     lines = sc.textFile("in/prime_nums.text")
     numbers = lines.flatMap(lambda line: line.split("\t"))
     validNumbers = numbers.filter(lambda number: number)
     intNumbers = validNumbers.map(lambda number: int(number))
     print("Sum is: ")
-    print(intNumbers.reduce(lambda x, y: x + y))
+    print(intNumbers.reduce(lambda x, y: x + y))

rdd/take/TakeExample.py

Lines changed: 7 additions & 6 deletions

@@ -2,9 +2,10 @@
 from pyspark import SparkContext

 if __name__ == "__main__":
-    sc = SparkContext("local", "take")
-    inputWords = ["spark", "hadoop", "spark", "hive", "pig", "cassandra", "hadoop"]
-    wordRdd = sc.parallelize(inputWords)
-    words = wordRdd.take(3)
-    for word in words:
-        print(word)
+    sc = SparkContext("local", "take")
+    sc.setLogLevel("ERROR")
+    inputWords = ["spark", "hadoop", "spark", "hive", "pig", "cassandra", "hadoop"]
+    wordRdd = sc.parallelize(inputWords)
+    words = wordRdd.take(3)
+    for word in words:
+        print(word)
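
Design note: setLogLevel overrides the log4j defaults at runtime, so each script controls its own console verbosity without editing Spark's global log configuration; that is presumably why the commit patches every script that prints to stdout rather than changing a cluster-wide setting. Running any of the scripts (e.g. spark-submit rdd/WordCount.py from the repository root, assuming the in/ data files are present) should now print only the script's own output plus genuine errors.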

