python-graphblas

License: Apache-2.0

Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics. For algorithms, see graphblas-algorithms.

Documentation: https://python-graphblas.readthedocs.io/
- FAQ: https://python-graphblas.readthedocs.io/en/stable/getting_started/faq.html
- GraphBLAS C API: https://graphblas.org/docs/GraphBLAS_API_C_v2.0.0.pdf
- SuiteSparse:GraphBLAS User Guide: https://github.com/DrTimothyAldenDavis/GraphBLAS/raw/stable/Doc/GraphBLAS_UserGuide.pdf
Source: https://github.com/python-graphblas/python-graphblas
Bug reports: https://github.com/python-graphblas/python-graphblas/issues
GitHub discussions: https://github.com/python-graphblas/python-graphblas/discussions
Weekly community call: python-graphblas#247 or https://scientific-python.org/calendars/
Chat via Discord: https://discord.com/invite/vur45CbwMz in the #graphblas channel

Install

Install the latest version of python-graphblas via conda:

$ conda install -c conda-forge python-graphblas

or pip:

$ pip install 'python-graphblas[default]'

This will also install the SuiteSparse:GraphBLAS compiled C library. We currently support the GraphBLAS C API 2.0 specification.

Optional Dependencies

The following are not required by python-graphblas, but may be needed for certain functionality to work.

pandas – required for nicer __repr__;
matplotlib – required for basic plotting of graphs;
scipy – used in the io module to read/write scipy.sparse format;
networkx – used in the io module to interface with networkx graphs;
fast-matrix-market – for faster read/write of Matrix Market files with gb.io.mmread and gb.io.mmwrite.

Description

Currently works with SuiteSparse:GraphBLAS, but the goal is to make it work with all implementations of the GraphBLAS spec.

The approach taken with this library is to follow the C-API 2.0 specification as closely as possible while making improvements allowed by Python syntax. Because the spec always passes in the output object to be written to, we do the same, which is very different from the way Python normally operates. In fact, many who are familiar with other Python data libraries (numpy, pandas, etc.) will find it strange not to create new objects for every call.

At the highest level, the goal is to separate output, mask, and accumulator on the left side of the assignment operator = and put the computation on the right side. Unfortunately, that approach doesn't always work very well with how Python handles assignment, so instead we (ab)use the left-shift << notation to give the same flavor of assignment. This opens up all kinds of nice possibilities.

This is an example of how the mapping works:

// C call
GrB_Matrix_mxm(M, mask, GrB_PLUS_INT64, GrB_MIN_PLUS_INT64, A, B, NULL)

# Python call
M(mask.V, accum=binary.plus) << A.mxm(B, semiring.min_plus)

The expression on the right, A.mxm(B), creates a delayed object which does no computation. Once it is used in the << expression with M, the whole thing is translated into the equivalent GraphBLAS call.
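To make the delayed-object idea concrete, here is a minimal pure-Python sketch. It is a toy model, not how python-graphblas is implemented: the `Delayed` and `Output` classes and the `add` method are hypothetical names invented for this illustration. It only shows the pattern of recording an operation first and executing it when `<<` is applied.

```python
class Delayed:
    """Records an operation and its arguments without running it."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def new(self):
        # Force the computation and return a brand-new result object.
        return Output(self.func(*self.args))


class Output:
    """A toy output container (hypothetical, not the graphblas Matrix class)."""

    def __init__(self, value=None):
        self.value = value

    def add(self, other):
        # Build a delayed expression; nothing is computed here.
        return Delayed(lambda a, b: a + b, self.value, other.value)

    def update(self, expr):
        # The single place where the delayed work is actually executed.
        self.value = expr.func(*expr.args)

    def __lshift__(self, expr):
        # `out << expr` simply forwards to `out.update(expr)`.
        self.update(expr)


a, b, out = Output(2), Output(3), Output()
expr = a.add(b)  # delayed: no arithmetic has happened yet
out << expr      # computation happens here, writing into `out`
```

Because the output object sees the whole expression at once, a real implementation can fold the operation, mask, accumulator, and descriptor into one backend call.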

Delayed objects also have a .new() method which can be used to force computation and return a new object. This is convenient and often appropriate, but will create many unnecessary objects if used in a loop. It also loses the ability to perform accumulation with existing results. For best performance, following the standard GraphBLAS approach of (1) creating the object outside the loop and (2) using the object repeatedly within each loop is a much better approach, even if it doesn't feel very Pythonic.

Descriptor flags are set on the appropriate elements to keep logic close to what it affects. Here is the same call with descriptor bits set. ttcsr indicates transpose the first and second matrices, complement the structure of the mask, and do a replacement on the output.

// C call
GrB_Matrix_mxm(M, mask, GrB_PLUS_INT64, GrB_MIN_PLUS_INT64, A, B, desc.ttcsr)

# Python call
M(~mask.S, accum=binary.plus, replace=True) << A.T.mxm(B.T, semiring.min_plus)

The objects receiving the flag operations (A.T, ~mask, etc.) are also delayed objects. They hold on to the state but do no computation, allowing the correct descriptor bits to be set in a single GraphBLAS call.
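For readers new to GraphBLAS, the effect of a structural mask, its complement, an accumulator, and the replace flag can be sketched in plain Python. This is an illustrative model over dictionaries keyed by (row, col), following the write semantics of the GraphBLAS spec as we understand them. `masked_update` is a hypothetical helper and not part of python-graphblas.

```python
from operator import add


def masked_update(C, T, mask, *, complement=False, accum=None, replace=False):
    """Write result T into output C; both are {(row, col): value} dicts.

    `mask` is a set of positions (a structural mask).
    """
    # Step 1: merge the old output with the new result via the accumulator.
    if accum is None:
        Z = dict(T)
    else:
        Z = dict(C)
        for pos, t in T.items():
            Z[pos] = accum(C[pos], t) if pos in C else t

    def allowed(pos):
        # Complementing the mask flips which positions may be written.
        return (pos in mask) != complement

    # Step 2: outside the mask, old entries survive unless replace=True.
    out = {} if replace else {p: v for p, v in C.items() if not allowed(p)}
    # Step 3: inside the mask, the output takes exactly the entries of Z.
    out.update({p: v for p, v in Z.items() if allowed(p)})
    return out


C = {(0, 0): 1, (0, 1): 2, (1, 1): 5}
T = {(0, 0): 10, (1, 0): 7, (1, 1): 1}
mask = {(0, 0)}

kept = masked_update(C, T, mask, complement=True, accum=add)
cleared = masked_update(C, T, mask, complement=True, accum=add, replace=True)
```

With the complemented mask, position (0, 0) is protected from writes. Without replace its old value 1 is kept; with replace it is dropped from the output.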

If no mask or accumulator is used, the call looks like this:

M << A.mxm(B, semiring.min_plus)

The use of << to indicate updating is actually just syntactic sugar for a real .update() method. The above expression could be written as:

M.update(A.mxm(B, semiring.min_plus))
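For intuition about what the min_plus semiring in these calls computes, here is a small pure-Python sketch over sparse matrices stored as dictionaries. `min_plus_mxm` is a hypothetical helper written for this illustration only; the library delegates the real work to the GraphBLAS backend.

```python
def min_plus_mxm(A, B):
    """Multiply sparse matrices stored as {(row, col): value} dicts,
    using min as the "addition" and + as the "multiplication"."""
    C = {}
    for (i, k), a in A.items():
        for (k2, j), b in B.items():
            if k != k2:
                continue  # only matching inner indices contribute
            candidate = a + b  # semiring "multiply"
            if (i, j) not in C or candidate < C[i, j]:
                C[i, j] = candidate  # semiring "add" keeps the minimum
    return C


A = {(0, 0): 1, (0, 1): 2, (1, 1): 3}
B = {(0, 0): 10, (1, 0): 20}
C = min_plus_mxm(A, B)
```

Entry (0, 0) is min(1 + 10, 2 + 20) = 11 and entry (1, 0) is 3 + 20 = 23. Positions with no contributing pair stay empty rather than becoming zero.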

Operations

M(mask, accum) << A.mxm(B, semiring)         # mxm
w(mask, accum) << A.mxv(v, semiring)         # mxv
w(mask, accum) << v.vxm(B, semiring)         # vxm
M(mask, accum) << A.ewise_add(B, binaryop)   # eWiseAdd
M(mask, accum) << A.ewise_mult(B, binaryop)  # eWiseMult
M(mask, accum) << A.kronecker(B, binaryop)   # kronecker
M(mask, accum) << A.T                        # transpose

Extract

M(mask, accum) << A[rows, cols]       # rows and cols are a list or a slice
w(mask, accum) << A[rows, col_index]  # extract column
w(mask, accum) << A[row_index, cols]  # extract row
s = A[row_index, col_index].value     # extract single element

Assign

M(mask, accum)[rows, cols] << A       # rows and cols are a list or a slice
M(mask, accum)[rows, col_index] << v  # assign column
M(mask, accum)[row_index, cols] << v  # assign row
M(mask, accum)[rows, cols] << s       # assign scalar to many elements
M[row_index, col_index] << s          # assign scalar to single element
                                      # (mask and accum not allowed)
del M[row_index, col_index]           # remove single element

Apply

M(mask, accum) << A.apply(unaryop)
M(mask, accum) << A.apply(binaryop, left=s)   # bind-first
M(mask, accum) << A.apply(binaryop, right=s)  # bind-second
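The difference between bind-first and bind-second only matters for non-commutative operators. The plain-Python sketch below illustrates it with subtraction. `apply_bound` is a hypothetical helper operating on a dictionary of stored values, used here only to show which argument slot the scalar fills.

```python
from operator import sub


def apply_bound(values, binaryop, *, left=None, right=None):
    """Apply binaryop to each stored value in {index: value},
    with a scalar bound to either the first or the second argument."""
    if (left is None) == (right is None):
        raise ValueError("pass exactly one of left or right")
    if left is not None:
        # bind-first: the scalar is the first argument, binaryop(s, x)
        return {i: binaryop(left, x) for i, x in values.items()}
    # bind-second: the scalar is the second argument, binaryop(x, s)
    return {i: binaryop(x, right) for i, x in values.items()}


v = {0: 1, 2: 5}
first = apply_bound(v, sub, left=10)    # computes 10 - x
second = apply_bound(v, sub, right=10)  # computes x - 10
```

Only stored entries are touched in either case, so the sparsity pattern of the input is preserved.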

Reduce

v(mask, accum) << A.reduce_rowwise(op)     # reduce row-wise
v(mask, accum) << A.reduce_columnwise(op)  # reduce column-wise
s(accum) << A.reduce_scalar(op)
s(accum) << v.reduce(op)

Creating new Vectors / Matrices

A = Matrix.new(dtype, num_rows, num_cols)                    # new_type
B = A.dup()                                                  # dup
A = Matrix.from_coo([row_indices], [col_indices], [values])  # build

New from delayed

Delayed objects can be used to create a new object using the .new() method:

C = A.mxm(B, semiring).new()

Properties

size = v.size                           # size
nrows = M.nrows                         # nrows
ncols = M.ncols                         # ncols
nvals = M.nvals                         # nvals
rindices, cindices, vals = M.to_coo()   # extractTuples

Initialization

There is a mechanism to initialize graphblas with a context prior to use. This allows for setting the backend to use as well as the blocking/non-blocking mode. If the context is not initialized, a default initialization will be performed automatically.

import graphblas as gb

# Context initialization must happen before any other imports
gb.init("suitesparse", blocking=True)

# Now we can import other items from graphblas
from graphblas import binary, semiring
from graphblas import Matrix, Vector, Scalar

Performant User Defined Functions

Python-graphblas requires numba, which enables compiling user-defined Python functions to native code for use in GraphBLAS.

Example customized UnaryOp:

from graphblas import Vector, unary

def force_odd_func(x):
    if x % 2 == 0:
        return x + 1
    return x

unary.register_new("force_odd", force_odd_func)
v = Vector.from_coo([0, 1, 3], [1, 2, 3])
w = v.apply(unary.force_odd).new()
w  # indexes=[0, 1, 3], values=[1, 3, 3]

Similar methods exist for BinaryOp, Monoid, and Semiring.

Relation to other network analysis libraries

Python-graphblas aims to provide an efficient and consistent expression of graph operations using linear algebra. This allows the development of high-performance implementations of existing and new graph algorithms (also see graphblas-algorithms).

While end-to-end analysis can be done using python-graphblas, users might find that other libraries in the Python ecosystem provide a more convenient high-level interface for data pre-processing and transformation (e.g. pandas, scipy.sparse), visualization (e.g. networkx, igraph), interactive exploration and analysis (e.g. networkx, igraph) or for algorithms that are not (yet) implemented in graphblas-algorithms (e.g. networkx, igraph, scipy.sparse.csgraph). To facilitate communication with other libraries, graphblas.io contains multiple connectors; see the following section.

Import/Export connectors to the Python ecosystem

graphblas.io contains functions for converting to and from:

import graphblas as gb

# scipy.sparse matrices
A = gb.io.from_scipy_sparse(m)
m = gb.io.to_scipy_sparse(A, format="csr")

# networkx graphs
A = gb.io.from_networkx(g)
g = gb.io.to_networkx(A)

# numpy arrays can use `from_dense` and `to_dense` on Vector and Matrix
v = gb.Vector.from_dense(m)
m = v.to_dense()

A = gb.Matrix.from_dense(m, missing_value=0)
m = A.to_dense(fill_value=0)