Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)

cloudera/impyla

Python client for HiveServer2 implementations (e.g., Impala, Hive) for distributed query engines.

For higher-level Impala functionality, including a Pandas-like interface over distributed data sets, see the Ibis project.

Features

  • HiveServer2 compliant; works with Impala and Hive, including nested data

  • Fully DB API 2.0 (PEP 249)-compliant Python client (similar to sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.

  • Works with Kerberos, LDAP, SSL

  • SQLAlchemy connector

  • Converter to pandas DataFrame, allowing easy integration into the Python data stack (including scikit-learn and matplotlib); but see the Ibis project for a richer experience

Dependencies

Required:

  • Python 2.7+ or 3.5+

  • six, bitarray

  • thrift==0.16.0

  • thrift_sasl==0.4.3

Optional:

  • kerberos>=1.3.0 for Kerberos over HTTP support. This also requires Kerberos libraries to be installed on your system; see System Kerberos

  • pandas for conversion to DataFrame objects; but see the Ibis project instead

  • sqlalchemy for the SQLAlchemy engine

  • pytest and requests for running tests; unittest2 for testing on Python 2.6

System Kerberos

Different systems require different packages to enable Kerberos support in impyla. Example install commands for two common distributions follow.

Ubuntu:

apt-get install libkrb5-dev krb5-user

RHEL/CentOS:

yum install krb5-libs krb5-devel krb5-server krb5-workstation

Installation

Install the latest release with pip:

pip install impyla

For the latest (dev) version, install directly from the repo:

pip install git+https://github.com/cloudera/impyla.git

or clone the repo:

git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install

Running the tests

impyla uses the pytest toolchain, and depends on the following environment variables:

export IMPYLA_TEST_HOST=your.impalad.com
export IMPYLA_TEST_PORT=21050
export IMPYLA_TEST_AUTH_MECH=NOSASL
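As a rough illustration of how these variables map onto connection settings, the sketch below reads them into a dictionary. The helper name `impyla_test_settings` is hypothetical and not part of impyla, and the fallback values are assumptions made for this example only.

```python
import os


def impyla_test_settings(environ=None):
    """Collect connection settings from the IMPYLA_TEST_* variables.

    Hypothetical helper, not part of impyla. The fallback values are
    assumptions chosen for this sketch.
    """
    env = os.environ if environ is None else environ
    return {
        "host": env.get("IMPYLA_TEST_HOST", "localhost"),
        "port": int(env.get("IMPYLA_TEST_PORT", "21050")),
        "auth_mechanism": env.get("IMPYLA_TEST_AUTH_MECH", "NOSASL"),
    }


# Pass an explicit mapping here so the sketch does not depend on the shell.
settings = impyla_test_settings({
    "IMPYLA_TEST_HOST": "your.impalad.com",
    "IMPYLA_TEST_PORT": "21050",
    "IMPYLA_TEST_AUTH_MECH": "NOSASL",
})
print(settings)
```

Since host, port and auth_mechanism are keyword arguments of connect, a dictionary like this could be passed along as connect(**settings).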

To run the maximal set of tests, run

cd path/to/impyla
py.test --connect impala

Leave out the --connect option to skip tests for DB API compliance.

To test impyla with different Python versions, tox can be used. The commands below will run all impyla tests with all supported and installed Python versions:

cd path/to/impyla
tox

To filter environments / tests use -e and pytest arguments after --:

tox -e py310 -- -ktest_utf8_strings

Usage

Impyla implements the Python DB API v2.0 (PEP 249) database interface (refer to it for API details):

from impala.dbapi import connect

conn = connect(host='my.host.com', port=21050)  # auth_mechanism='PLAIN' for unsecured Hive connection, see function doc
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
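To try the same DB API 2.0 call sequence without an Impala or Hive server, the sketch below uses Python's built-in sqlite3 module as a stand-in driver. That substitution is an assumption made purely so the example runs anywhere; the table contents are made up, and the `?` placeholder style belongs to sqlite3 and may differ in other drivers.

```python
import sqlite3

# Stand-in driver: sqlite3 is used only so this runs without a cluster.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Made-up sample data for the illustration.
cursor.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
cursor.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

# Same PEP 249 steps as with impyla: execute, inspect the schema, fetch.
cursor.execute("SELECT * FROM mytable ORDER BY id LIMIT 100")
column_names = [col[0] for col in cursor.description]
results = cursor.fetchall()

print(column_names)
print(results)

cursor.close()
conn.close()
```

Each entry of cursor.description is a sequence whose first item is the column name, which is why the schema can be reduced to a list of names this way.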

The Cursor object also exposes the iterator interface, which is buffered (controlled by cursor.arraysize):

cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    print(row)
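One way to picture buffered iteration is as repeated fetchmany() calls sized by cursor.arraysize. The sketch below shows that general PEP 249 pattern; it is not a description of impyla's internals. The generator `iter_rows` is a hypothetical helper, and sqlite3 again stands in for a real server so the code runs as written.

```python
import sqlite3


def iter_rows(cursor, batch_size=2):
    """Yield rows one at a time while fetching them in batches.

    Hypothetical helper illustrating the PEP 249 fetchmany() pattern.
    """
    cursor.arraysize = batch_size
    while True:
        # With no argument, fetchmany() returns up to cursor.arraysize rows.
        batch = cursor.fetchmany()
        if not batch:
            break
        for row in batch:
            yield row


conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE mytable (n INTEGER)")
cursor.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(5)])

cursor.execute("SELECT n FROM mytable ORDER BY n")
rows = list(iter_rows(cursor, batch_size=2))
print(rows)
conn.close()
```

A larger batch size means fewer round trips to the server at the cost of holding more rows in memory at once.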

Furthermore, the Cursor object provides information about the columns returned by the query. This is useful for exporting your data as a CSV file:

import csv

cursor.execute('SELECT * FROM mytable LIMIT 100')
columns = [datum[0] for datum in cursor.description]
targetfile = '/tmp/foo.csv'

with open(targetfile, 'w', newline='') as outcsv:
    writer = csv.writer(outcsv, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerow(columns)
    for row in cursor:
        writer.writerow(row)

You can also get back a pandas DataFrame object:

from impala.util import as_pandas

df = as_pandas(cursor)
# carry df through scikit-learn, for example

How do I contribute code?

You need to first sign and return an ICLA and CCLA before we can accept and redistribute your contribution. Once these are submitted you are free to start contributing to impyla. Submit these to CLA@cloudera.com.

Find

We use GitHub issues to track bugs for this project. Find an issue that you would like to work on (or file one if you have discovered a new issue!). If no one is working on it, assign it to yourself only if you intend to work on it shortly.

It's a good idea to discuss your intended approach on the issue. You are much more likely to have your patch reviewed and committed if you've already got buy-in from the impyla community before you start.

Fix

Now start coding! As you are writing your patch, please keep the following things in mind:

First, please include tests with your patch. If your patch adds a feature or fixes a bug and does not include tests, it will generally not be accepted. If you are unsure how to write tests for a particular component, please ask on the issue for guidance.

Second, please keep your patch narrowly targeted to the problem described by the issue. It's better for everyone if we maintain discipline about the scope of each patch. In general, if you find a bug while working on a specific feature, file an issue for the bug, check if you can assign it to yourself and fix it independently of the feature. This helps us to differentiate between bug fixes and features and allows us to build stable maintenance releases.

Finally, please write a good, clear commit message, with a short, descriptive title and a message that is exactly long enough to explain what the problem was, and how it was fixed.

Please create a pull request on GitHub with your patch.
