cloudera/impyla


Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)

License

Apache-2.0 license


impyla

Python client for HiveServer2 implementations (e.g., Impala, Hive) for distributed query engines.

For higher-level Impala functionality, including a Pandas-like interface over distributed data sets, see the Ibis project.

Features

HiveServer2 compliant; works with Impala and Hive, including nested data
Fully DB API 2.0 (PEP 249)-compliant Python client (similar to sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.
Works with Kerberos, LDAP, SSL
SQLAlchemy connector
Converter to pandas DataFrame, allowing easy integration into the Python data stack (including scikit-learn and matplotlib); but see the Ibis project for a richer experience

Dependencies

Required:

Python 2.7+ or 3.5+
six, bitarray
thrift==0.16.0
thrift_sasl==0.4.3

Optional:

kerberos>=1.3.0 for Kerberos over HTTP support. This also requires Kerberos libraries to be installed on your system; see System Kerberos.
- On Windows, 'winkerberos' (installable with pip) can be used as an alternative.
pandas for conversion to DataFrame objects; but see the Ibis project instead
sqlalchemy for the SQLAlchemy engine
pytest and requests for running tests; unittest2 for testing on Python 2.6

System Kerberos

Different systems require different packages to be installed to enable Kerberos support in Impyla. Some examples of how to install the packages on different distributions follow.

Ubuntu:

apt-get install libkrb5-dev krb5-user

RHEL/CentOS:

yum install krb5-libs krb5-devel krb5-server krb5-workstation

Installation

Install the latest release with pip:

pip install impyla

For the latest (dev) version, install directly from the repo:

pip install git+https://github.com/cloudera/impyla.git

or clone the repo:

git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install

Running the tests

impyla uses the pytest toolchain and depends on the following environment variables:

export IMPYLA_TEST_HOST=your.impalad.com
export IMPYLA_TEST_PORT=21050
export IMPYLA_TEST_AUTH_MECH=NOSASL
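The test suite reads these variables itself. If you want to reuse the same settings in your own scripts, a minimal sketch might look like the following. The helper name `connection_kwargs_from_env` and its defaults are hypothetical (not part of impyla), and the returned keys are assumed to map onto the `host`, `port` and `auth_mechanism` arguments of `impala.dbapi.connect`.

```python
import os


def connection_kwargs_from_env(environ=None):
    """Build keyword arguments for impala.dbapi.connect from the
    IMPYLA_TEST_* variables (hypothetical helper, not part of impyla)."""
    env = os.environ if environ is None else environ
    host = env.get("IMPYLA_TEST_HOST")
    if not host:
        raise KeyError("IMPYLA_TEST_HOST is not set")
    return {
        "host": host,
        # Environment values are strings; the port must be an integer.
        "port": int(env.get("IMPYLA_TEST_PORT", "21050")),
        # Assumed fallback; NOSASL matches the example exports above.
        "auth_mechanism": env.get("IMPYLA_TEST_AUTH_MECH", "NOSASL"),
    }


kwargs = connection_kwargs_from_env({
    "IMPYLA_TEST_HOST": "your.impalad.com",
    "IMPYLA_TEST_PORT": "21050",
    "IMPYLA_TEST_AUTH_MECH": "NOSASL",
})
print(kwargs)  # → {'host': 'your.impalad.com', 'port': 21050, 'auth_mechanism': 'NOSASL'}
```

With impyla installed, the result could then be passed straight through as connect(**connection_kwargs_from_env()).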

To run the maximal set of tests, run:

cd path/to/impyla
py.test --connect impala

Leave out the --connect option to skip tests for DB API compliance.

To test impyla with different Python versions, tox can be used. The commands below will run all impyla tests with all supported and installed Python versions:

cd path/to/impyla
tox

To filter environments or tests, use -e and pytest arguments after --:

tox -e py310 -- -k test_utf8_strings

Usage

Impyla implements the Python DB API v2.0 (PEP 249) database interface (refer to it for API details):

from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)  # auth_mechanism='PLAIN' for unsecured Hive connection, see function doc
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
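Because PEP 249 requires connections and cursors to provide close(), the standard library's contextlib.closing can release them even when a query raises. The sketch below uses Python's built-in sqlite3 module as a stand-in so it runs without an Impala cluster; with impyla you would obtain conn from impala.dbapi.connect instead. The helper name fetch_all is hypothetical, and closing() relies only on close(), not on any impyla-specific context-manager support.

```python
import sqlite3
from contextlib import closing


def fetch_all(conn, sql):
    """Run a query and return all rows, closing the cursor even on error."""
    with closing(conn.cursor()) as cursor:
        cursor.execute(sql)
        return cursor.fetchall()


# sqlite3 stands in for impala.dbapi here so the sketch runs without a cluster.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b")])
    rows = fetch_all(conn, "SELECT * FROM mytable ORDER BY id LIMIT 100")
    print(rows)  # → [(1, 'a'), (2, 'b')]
```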

The Cursor object also exposes the iterator interface, which is buffered (controlled by cursor.arraysize):

cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    print(row)
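The same buffering can be made explicit with the PEP 249 fetchmany method, which returns at most size rows per call (defaulting to cursor.arraysize) and an empty sequence once the result set is exhausted. A minimal sketch of a batch generator built on it follows; iter_batches is a hypothetical helper name, and sqlite3 again stands in for an impyla cursor so the code runs locally.

```python
import sqlite3
from contextlib import closing


def iter_batches(cursor, size=None):
    """Yield lists of rows, fetching at most `size` rows per call."""
    size = cursor.arraysize if size is None else size
    while True:
        batch = cursor.fetchmany(size)
        if not batch:  # an empty sequence means no rows are left
            return
        yield batch


with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE mytable (id INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(10)])
    cursor = conn.cursor()
    cursor.execute("SELECT id FROM mytable ORDER BY id")
    batch_sizes = [len(batch) for batch in iter_batches(cursor, size=4)]
    cursor.close()
    print(batch_sizes)  # → [4, 4, 2]
```

Leaving size unset makes the generator follow cursor.arraysize, so adjusting that attribute before iterating changes the batch size.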

Furthermore, the Cursor object provides information about the columns returned by the query through cursor.description. This is useful for exporting your data as a CSV file:

import csv
cursor.execute('SELECT * FROM mytable LIMIT 100')
columns = [datum[0] for datum in cursor.description]
targetfile = '/tmp/foo.csv'
with open(targetfile, 'w', newline='') as outcsv:
    writer = csv.writer(outcsv, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerow(columns)
    for row in cursor:
        writer.writerow(row)
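The column names taken from cursor.description can also turn each row into a dictionary keyed by column name, which is handy for JSON output or lookups by name. The sketch below is a hypothetical helper, rows_as_dicts, that relies only on the PEP 249 description attribute and cursor iteration, with sqlite3 standing in for an impyla cursor.

```python
import sqlite3
from contextlib import closing


def rows_as_dicts(cursor):
    """Yield one dict per row, keyed by the column names in cursor.description."""
    columns = [datum[0] for datum in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b")])
    cursor = conn.cursor()
    cursor.execute("SELECT id, name FROM mytable ORDER BY id")
    records = list(rows_as_dicts(cursor))
    cursor.close()
    print(records)  # → [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```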

You can also get back a pandas DataFrame object:

from impala.util import as_pandas
cursor.execute('SELECT * FROM mytable LIMIT 100')
df = as_pandas(cursor)
# carry df through scikit-learn, for example

How do I contribute code?

Before we can accept and redistribute your contribution, you need to sign and return an ICLA and CCLA. Submit these to CLA@cloudera.com. Once they are submitted, you are free to start contributing to impyla.

Find

We use GitHub issues to track bugs for this project. Find an issue that you would like to work on (or file one if you have discovered a new issue!). If no one is working on it, assign it to yourself only if you intend to work on it shortly.

It's a good idea to discuss your intended approach on the issue. You are much more likely to have your patch reviewed and committed if you've already got buy-in from the impyla community before you start.

Fix

Now start coding! As you are writing your patch, please keep the following things in mind:

First, please include tests with your patch. If your patch adds a feature or fixes a bug and does not include tests, it will generally not be accepted. If you are unsure how to write tests for a particular component, please ask on the issue for guidance.

Second, please keep your patch narrowly targeted to the problem described by the issue. It's better for everyone if we maintain discipline about the scope of each patch. In general, if you find a bug while working on a specific feature, file an issue for the bug, check whether you can assign it to yourself, and fix it independently of the feature. This helps us to differentiate between bug fixes and features and allows us to build stable maintenance releases.

Finally, please write a good, clear commit message, with a short, descriptive title and a message that is exactly long enough to explain what the problem was and how it was fixed.

Please create a pull request on GitHub with your patch.
