cloudspannerecosystem/spanner-analytics

License

Apache-2.0 license

spanner-analytics

This package aims to facilitate common data-analytic operations in Python using data from Cloud Spanner. This includes integrations with Jupyter Notebooks.

Using

Installation

Install from PyPI:

pip install spanner-analytics

Developing

This package can be used from Python code. For example:

from spanner_analytics import Database
db = Database.connect('<project>', '<instance>', '<database>')
dataframe = db.execute_sql("SELECT * FROM my_table")

The package also offers a "magic" command that can be used within a Jupyter Notebook. For example:

%load_ext spanner_analytics.magic

%%spanner --project <project> --instance <instance> --database <database>
SELECT * FROM my_table

Queries are executed using Cloud Spanner DataBoost. DataBoost allocates dedicated compute resources to execute your query, so it won't compete for resources with other workloads on your production database. However, you will be billed for the compute resources consumed by a query. See the DataBoost Pricing page for more details.

Root-partitionable queries

Queries currently must be root-partitionable. This means that the query can be logically decomposed into independent operations that operate on collocated data, with no data shuffling and no final aggregation required. The query plan must specifically contain a DistributedUnion operator as its topmost operator, and otherwise follow the documentation on reading data in parallel. This enables Spanner's client to connect in parallel to multiple Spanner nodes to fetch data with maximum performance.

For example,

SELECT a + b FROM t

is root-partitionable because it operates on each row independently. Similarly,

SELECT a + b FROM t WHERE c < 5

is root-partitionable because, while some nodes may not have any data that's relevant to the query, that determination can be made independently.
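The idea behind the two examples above can be illustrated with a small plain-Python sketch. It does not use Spanner at all: the `partitions` list is a hypothetical stand-in for table t split across three nodes, and `run_on_partition` is an illustrative helper, not part of this package. It only shows that evaluating the filtered query on each partition separately and concatenating the results gives the same rows as evaluating it over the whole table.

```python
# Hypothetical stand-in for table t, split across three nodes (partitions).
partitions = [
    [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 9}],
    [{"a": 7, "b": 8, "c": 1}],
    [{"a": 10, "b": 11, "c": 7}],  # no row on this node matches c < 5
]


def run_on_partition(rows):
    """Evaluate SELECT a + b FROM t WHERE c < 5 against one set of rows."""
    return [row["a"] + row["b"] for row in rows if row["c"] < 5]


# Each partition is processed on its own, with no data exchanged between them.
per_partition = [run_on_partition(rows) for rows in partitions]

# The final step is a plain concatenation: no shuffle, no aggregation.
unioned = [value for result in per_partition for value in result]

# Reference result: the same query evaluated over all rows at once.
all_rows = [row for rows in partitions for row in rows]
whole_table = run_on_partition(all_rows)

print(per_partition)           # the third partition contributes an empty list
print(unioned == whole_table)  # the union of partial results is the full result
```

The third partition returns an empty list without consulting any other partition, which is the sense in which the determination "can be made independently". Row order is kept here only because the sketch uses Python lists.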

SELECT sum(a) FROM t   -- Nope!

is NOT root-partitionable: while each node can scan in parallel, the query requires bringing data back to a single node to compute the final sum. The same result can instead be obtained by reading all data from t and performing the sum using Pandas.
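A minimal sketch of that workaround follows. It assumes that `db.execute_sql` returns a Pandas DataFrame, as the `dataframe` variable in the usage example suggests; the helper name `client_side_sum` is hypothetical and not part of this package.

```python
def client_side_sum(db, table, column):
    """Sum one column locally instead of asking Spanner for SUM(column).

    Assumption: `db` is a spanner_analytics Database (or anything with a
    compatible execute_sql method) whose results behave like a Pandas
    DataFrame, so that df[column].sum() is available.
    """
    # A plain column scan is root-partitionable: every node can return its
    # rows independently. Only pass trusted identifiers here, since they are
    # interpolated directly into the SQL text.
    df = db.execute_sql(f"SELECT {column} FROM {table}")

    # The final aggregation happens on the client, after all rows have been
    # fetched.
    return df[column].sum()
```

With a connected database this would be called as `client_side_sum(db, "t", "a")`. Note that every value of the column is transferred to the client, so for large tables it can help to select only the columns that the aggregation needs.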

SELECT * FROM t1 JOIN t2 ON t1.x = t2.y

MAY be root-partitionable IF t1 and t2 are INTERLEAVED together. Interleaved tables are stored together, so joins between them can be performed locally. Non-interleaved tables require shuffling each affected record from one table over to the node that stores the corresponding record from the other table. Because of this requirement to send data between nodes, non-interleaved joins are generally not root-partitionable.

Building

This package uses the setuptools and build packages. cd into the repository's top-level directory and run:

python3 -m build

This will produce a .whl file under dist/. For more information about Python's build process, see Python's packaging documentation. Also see package_test.py.

Testing changes

This project uses pytest to test its code. To execute all tests, cd to the repository's top-level directory and run:

pytest .

The end-to-end tests in this suite depend on Google's gcloud command-line tool, and will be skipped if it's not available. The tool is used to launch a local Spanner Emulator process, to test that this code can correctly connect to a Spanner database and handle the results that it returns. gcloud can be installed by following Google's installation instructions for the Google Cloud CLI.
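To check ahead of time whether the end-to-end tests are likely to be skipped on your machine, a check along these lines may help. This is only a sketch: `gcloud_available` is a hypothetical helper, and the project's own tests may detect gcloud differently.

```python
import shutil


def gcloud_available() -> bool:
    """Return True if a `gcloud` executable can be found on PATH."""
    return shutil.which("gcloud") is not None


if __name__ == "__main__":
    if gcloud_available():
        print("gcloud found: the end-to-end tests should be able to run")
    else:
        print("gcloud not found: the end-to-end tests would be skipped")
```

In a pytest suite, a condition like this is commonly passed to `pytest.mark.skipif`, for example `pytest.mark.skipif(not gcloud_available(), reason="gcloud is not installed")`.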
