An example in Scala of reading data saved in HBase with Spark, and an example of a converter for Python.
Spark has its own example of integrating HBase and Spark in Scala, `HBaseTest.scala`, along with Python converters in `HBaseConverters.scala`. However, the Python converter `HBaseResultToStringConverter` in `HBaseConverters.scala` returns only the value of the first column in the result, and `HBaseTest.scala` stops at returning `org.apache.hadoop.hbase.client.Result` and calling `.count()`.
Here we provide a new example in Scala of transferring data saved in HBase into `String` with Spark, and a new example of a converter for Python. The Scala example, `HBaseInput.scala`, transfers the data saved in HBase into `RDD[String]`, where each record contains `columnFamily`, `qualifier`, `timestamp`, `type`, and `value`.
The Python converter example, `pythonConverters.scala`, transfers the data saved in HBase into strings containing the same information as the example above. We can use the `ast` package to easily convert such a string into a dictionary.
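For instance, a record string can be parsed like this (a minimal sketch; the record below is copied from the sample output at the end of this README):

```python
import ast

# One record string as produced by HBaseResultToStringConverter
# (note: 'columnFamliy' is the key name the converter actually emits)
record = ("{'columnFamliy': 'c1', 'timestamp': '1420329575846', "
          "'type': 'Put', 'qualifier': 'a', 'value': 'a1'}")

# ast.literal_eval safely evaluates a string containing a Python literal
cell = ast.literal_eval(record)
print(cell['value'])      # a1
print(cell['qualifier'])  # a
```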
- Make sure that you have `git` set up properly.
- Download this application:

```
$ git clone https://github.com/GenTang/spark_hbase.git
```
- Build the assembly by using sbt-assembly:

```
$ <the path to spark_hbase>/sbt/sbt clean assembly
```
Run the example Python script `hbase_input.py`, which uses the Python converters `ImmutableBytesWritableToStringConverter` and `HBaseResultToStringConverter` to convert the data in HBase to dictionaries.

If you are using `SPARK_CLASSPATH`: add

```
export SPARK_CLASSPATH=$SPARK_CLASSPATH":<the path to hbase>/lib/*:<the path to spark_hbase>/target/scala-2.10/spark_hbase-assembly-1.0.jar"
```

to `./conf/spark-env.sh`, then launch the script by

```
$ ./bin/spark-submit <the path to hbase_input.py> \
    <host> <table> <column>
```
You can also use `spark.executor.extraClassPath` and `--driver-class-path` (recommended): add

```
spark.executor.extraClassPath <the path to hbase>/lib/*
```

to `spark-defaults.conf`, then launch the script by

```
$ ./bin/spark-submit \
    --driver-class-path <the path to spark_hbase>/target/scala-2.10/spark_hbase-assembly-1.0.jar \
    <the path to hbase_input.py> \
    <host> <table> <column>
```
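For reference, here is a minimal sketch of what `hbase_input.py` does with these converters. The fully qualified converter names below are an assumption; check `pythonConverters.scala` for the exact package and class names.

```python
from pyspark import SparkContext

sc = SparkContext(appName="HBaseInput")
host, table, column = "localhost", "test", "c1"

# Standard TableInputFormat configuration keys
conf = {"hbase.zookeeper.quorum": host,
        "hbase.mapreduce.inputtable": table,
        "hbase.mapreduce.scan.column.family": column}

# Read (ImmutableBytesWritable, Result) pairs from HBase; the two converters
# turn them into (row key string, record string) pairs on the Python side.
# NOTE: the converter names below are assumed -- verify them in pythonConverters.scala.
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="examples.pythonConverters.ImmutableBytesWritableToStringConverter",
    valueConverter="examples.pythonConverters.HBaseResultToStringConverter",
    conf=conf)

# hbase_input.py then parses each record string into a dictionary (e.g. with ast);
# here we simply print the raw converted pairs
for key, value in hbase_rdd.collect():
    print(key, value)
```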
Run the example Scala program `HBaseInput.scala`.
If you are using `SPARK_CLASSPATH`: add

```
export SPARK_CLASSPATH=$SPARK_CLASSPATH":<the path to hbase>/lib/*"
```

to `./conf/spark-env.sh`, then launch the program by

```
$ ./bin/spark-submit \
    --class examples.HBaseInput \
    <the path to spark_hbase>/target/scala-2.10/spark_hbase-assembly-1.0.jar \
    <host> <table>
```
You can also use `spark.executor.extraClassPath` and `--driver-class-path` (recommended): use the same configuration as above, then launch the program by

```
$ ./bin/spark-submit \
    --driver-class-path <the path to hbase>/lib/* \
    --class examples.HBaseInput \
    <the path to spark_hbase>/target/scala-2.10/spark_hbase-assembly-1.0.jar \
    <host> <table>
```
Assume that you already have some data in HBase, as follows:

```
hbase(main):028:0> scan "test"
ROW                  COLUMN+CELL
 r1                  column=c1:a, timestamp=1420329575846, value=a1
 r1                  column=c1:b, timestamp=1420329640962, value=b1
 r2                  column=c1:a, timestamp=1420329683843, value=a2
 r3                  column=c1:, timestamp=1420329810504, value=3
```
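If you want to reproduce this table, it can be created in the HBase shell along these lines (a sketch; your timestamps will differ):

```
hbase(main):001:0> create 'test', 'c1'
hbase(main):002:0> put 'test', 'r1', 'c1:a', 'a1'
hbase(main):003:0> put 'test', 'r1', 'c1:b', 'b1'
hbase(main):004:0> put 'test', 'r2', 'c1:a', 'a2'
hbase(main):005:0> put 'test', 'r3', 'c1', '3'
```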
By launching

```
$ ./bin/spark-submit --driver-class-path <the path to spark_hbase>/target/scala-2.10/spark_hbase-assembly-1.0.jar <the path to hbase_input.py> localhost test c1
```

you will get

```
(u'r1', {'columnFamliy': 'c1', 'timestamp': '1420329575846', 'type': 'Put', 'qualifier': 'a', 'value': 'a1'})
(u'r1', {'columnFamliy': 'c1', 'timestamp': '1420329640962', 'type': 'Put', 'qualifier': 'b', 'value': 'b1'})
(u'r2', {'columnFamliy': 'c1', 'timestamp': '1420329683843', 'type': 'Put', 'qualifier': 'a', 'value': 'a2'})
(u'r3', {'columnFamliy': 'c1', 'timestamp': '1420329810504', 'type': 'Put', 'qualifier': '', 'value': '3'})
```