Best practices for vectorising GraphBLAS #298

Unanswered
rciric asked this question in Q&A

Hello all, and thanks for the incredible work.

The problem

I have several use cases for GraphBLAS that involve either multi-channel graphs or batched graphs, where the sparsity pattern is the same across a vector of observations. For instance, this occurs when we have multiple data channels associated with each edge in a fixed mesh or grid, or when we have a batch of graphs whose connectivity structure is identical but whose weights are different.

I’m not sure if there’s an “out-of-the-box” or recommended way of using python-graphblas for this class of problem, but the most direct approach (expectedly) fails with an error like 'plus_times does not work with FP32[100]' (for batch dim 100 and float32 datatype). To address these use cases, I’m thus using the UDT utilities of python-graphblas to create vector datatypes and define operators over those types. The SuiteSparse user guide seems to advise against this (the bit about generic kernels in Performance, section 14.2), but I’ve found we still get substantially improved performance relative to the most naive loop this way, provided that we can amortise the compilation cost (i.e., provided that the “batch” or “channel” dimension remains the same across subsequent calls).
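To pin down the intended semantics, here is a minimal numpy sketch (illustration only, not python-graphblas code) of what a "batched" plus_times product means when every edge carries a length-C channel vector, compared against the naive per-channel loop:

```python
import numpy as np

def batched_plus_times(A, B):
    """Matrix product where * and + act elementwise per channel.

    A: (n, k, C) dense array, B: (k, m, C) dense array -> (n, m, C).
    """
    # einsum contracts the shared k dimension while carrying the channel axis
    return np.einsum("ikc,kjc->ijc", A, B)

def naive_loop(A, B):
    """The naive alternative: C separate ordinary matrix products."""
    return np.stack(
        [A[..., c] @ B[..., c] for c in range(A.shape[-1])], axis=-1
    )
```

Both produce identical results; the point of the UDT approach is to get the left-hand semantics in a single GraphBLAS call instead of C calls.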

The current solution (not great)

Directly registering new operators with the is_udt=True flag fails for my case, for several reasons related to C pointer datatypes. (I can provide more details if this problem is of interest to others. There are some basic notes in the linked gist below, but it's not a pretty piece of code.)

I’ve thus created some very hackish patches of the _get_udt_wrapper and _compile_udt methods of operator classes that allow the vectorisation to work in principle (at least for the small subset of cases I’ve tested). But these implementations are very brittle — they’re defined in terms of python-graphblas internals rather than the public API, and I presume the internals could be subject to change at any time. (In fact, their location has changed since I installed python-graphblas.)
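For intuition, the kind of "vectorisation map" these patches implement can be sketched as a plain-Python lifting of a scalar binary op to fixed-length array values (the name vectorize_binary is illustrative, not python-graphblas API; the real version would be JIT-compiled with Numba):

```python
import numpy as np

def vectorize_binary(scalar_op):
    """Lift a scalar binary op to operate channel-by-channel on array UDTs."""
    def vector_op(x, y, out):
        # x, y, out are 1-D views of the UDT's array payload;
        # Numba would compile this loop per array length.
        for i in range(out.shape[0]):
            out[i] = scalar_op(x[i], y[i])
    return vector_op

vplus = vectorize_binary(lambda a, b: a + b)
x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])
out = np.empty(3)
vplus(x, y, out)   # out is now [11., 22., 33.]
```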

Is there a better way of doing this that I’m missing, which doesn’t rely on patching the internals? Is this use case something that python-graphblas supports in principle? Any suggestions would be welcome. Thanks!

Asides:

  • I also found that SelectOps aren’t compatible with my UDTs because SelectOp doesn’t implement a _compile_udt method. Not sure if this is intentional.
  • I was unable to get a custom IndexUnaryOp working without overriding find_opclass, since that function isn’t subclass-friendly. There were a few other places (e.g., _build methods) where I needed to define patched classes because of strict type checks that didn’t accommodate subclasses.

Replies: 3 comments · 1 reply


Hi @rciric! Thanks for the questions.

Wow, I am beyond impressed. This is a challenging part of the codebase. What you want to do makes sense to me, and I suspect you're right that it would give good performance for UDTs of this size. Using UDTs like this would be the "GraphBLAS way" to do things IMHO.

I added UDTs to python-graphblas earlier this year in #177, and, as you have seen, the job isn't quite done. Some things don't work yet. At the time, I couldn't figure out how to have generic UDFs return a UDT, which, uh, might be important.

Thanks so much for the gist ❤️ . Looks great and surprisingly readable at a glance (beauty is in the eye of the beholder ;) ). I will take a closer look at your code ASAP, and then we can decide how we can get the functionality into python-graphblas. I really, really want vectorized or "batched" UDFs over array UDTs to work in python-graphblas. Are you willing to work together to get this in?

In my final comment on #177 (comment), I call out the possibility of having default operators "just work" on array UDTs, such as plus(A | B) or plus_times(A @ B) or A + 1 where the dtypes are e.g. FP32[100]. If we can get vectorized UDFs to work, then I think it would be straightforward to make operators super-nice to use on UDTs.

Feel free to share any more details in this thread that you think may be useful. I'll read it and try to use it.

Do you know of any shortcomings to your current solution? Are you blocked by anything, and how urgent are things to get unblocked or improved? We try to be responsive to user needs.

> But these implementations are very brittle — they’re defined in terms of python-graphblas internals rather than the public API, and I presume the internals could be subject to change at any time. (In fact, their location has changed since I installed python-graphblas.)

Sorry about that! We try to give a deprecation cycle of at least six months for breaking changes if possible. Moving things to graphblas.core should actually help us improve stability in the future, and we won't move things again without very good reason. graphblas.core is now considered part of the public API for advanced users such as yourself.

Takeaways and actions

Thanks again @rciric, and welcome to our community 🎉


To clarify, this is an example workflow that I want to "just work" in python-graphblas:

```python
>>> import graphblas as gb
>>> A = gb.Matrix("int[3]", 2, 2)
>>> A << (1, 2, 3)
>>> A
"M_0"      nvals  nrows  ncols     dtype       format
gb.Matrix      4      2      2  INT64[3]  fullr (iso)
-----------------------------------------------------
            0          1
 0  [1,2,3]    [1,2,3]
 1  [1,2,3]    [1,2,3]
>>> B = gb.Matrix("int[3]", 2, 2)
>>> B << (2, 3, 4)
>>> B
"M_1"      nvals  nrows  ncols     dtype       format
gb.Matrix      4      2      2  INT64[3]  fullr (iso)
-----------------------------------------------------
            0          1
 0  [2,3,4]    [2,3,4]
 1  [2,3,4]    [2,3,4]
>>> (A @ B).new()
"M_2"      nvals  nrows  ncols     dtype       format
gb.Matrix      4      2      2  INT64[3]  fullr (iso)
-----------------------------------------------------
              0            1
 0  [4,12,24]    [4,12,24]
 1  [4,12,24]    [4,12,24]
```

The final expression will be the most difficult to do, but this is my target end-state.
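The intended result of that final product can be checked with plain numpy (a sketch of the semantics only, not python-graphblas code): each entry of A is the vector [1, 2, 3], each entry of B is [2, 3, 4], and plus_times applies * and + elementwise per channel.

```python
import numpy as np

# 2x2 iso-valued matrices whose entries are length-3 vectors
A = np.tile(np.array([1, 2, 3]), (2, 2, 1))   # every entry is [1, 2, 3]
B = np.tile(np.array([2, 3, 4]), (2, 2, 1))   # every entry is [2, 3, 4]

# plus_times(A @ B) with the semiring acting elementwise per channel:
# C[i, j] = sum over k of A[i, k] * B[k, j], channel by channel
C = np.einsum("ikc,kjc->ijc", A, B)
print(C[0, 0])   # -> [ 4 12 24], matching the M_2 result above
```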

The earlier expressions need these PRs -- #299 and #300 -- which make using UDTs extra-nice imho.


Hello, and thanks for being so responsive!

It’s great to hear you’ve planned on supporting this functionality — I would definitely be happy to work with you all to get a solution (particularly for the last case above) into the code base. I can also open issues for some of the specific trouble spots I mentioned. There are a couple of other projects I’m working on simultaneously, so I will try to work with you all toward a preliminary PR in the next couple of weeks, but no promises yet ☺️. And thanks for the invites — I’ll sign up for the Discord / Slack and see if I can perhaps make the community meeting as well!

IMO the biggest deficiencies in my implementation, relative to the above and your #177 comment, are (1) that I’ve only hard-coded solutions for one elementary operator of each type, and (2) that the implementation defines and registers all new ops as part of the creation of the vector/array UDT, which is done explicitly. I’ll say a little bit about each of these issues below, along with my working model of how they might be fixed.

  1. On first pass, I suspect that the right thing to do here might be to retrieve a numpy primitive function corresponding to each operator from numpy namespaces like gb.binary.numpy._np (using _graphblas_to_numpy dictionaries, when those are implemented), and then pass the matched numpy function to a vectorisation map like the ones that I wrote in the gist, thereby returning a vector UDF. This can be done iteratively over all operators when a new vector/array UDT is instantiated, or better…

  2. If I'm understanding correctly, in line with your previous comment when introducing UDTs and your new post here, a better solution might define each vector UDF automatically at call time, as a typed implementation of an existing operator (rather than as a new operator). So in summary: when plus is called on a previously unregistered vector datatype, rather than throw an error, the numpy primitive corresponding to plus is retrieved from the gb.binary.numpy._np submodule. Then, the vectorisation map for binary operators is called to transform the primitive and compile it, returning the vectorised plus operator for that array type. This vectorised operator is then registered as a typed implementation of plus, so that it can be dispatched automatically the next time the same vector/array datatype is encountered.

Or, perhaps there is a better way to do this (e.g., that wouldn't require a separate JIT for each new array shape)?
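The call-time flow described in (2) can be sketched as a small dispatch cache. All names below are illustrative: NUMPY_PRIMITIVE stands in for the planned _graphblas_to_numpy mapping, and "compiling" here is just closing over the ufunc where a real version would JIT with Numba per array length:

```python
import numpy as np

# Illustrative stand-in for the _graphblas_to_numpy mapping discussed above.
NUMPY_PRIMITIVE = {"plus": np.add, "times": np.multiply}

# Cache of typed implementations, keyed by (op name, channel count).
_typed_impls = {}

def get_typed_op(name, channels):
    """Return a vectorised implementation of `name` for length-`channels`
    array UDTs, building and registering it on first use."""
    key = (name, channels)
    if key not in _typed_impls:
        ufunc = NUMPY_PRIMITIVE[name]      # look up the scalar primitive
        def impl(x, y, _ufunc=ufunc):
            assert x.shape == y.shape == (channels,)
            return _ufunc(x, y)            # numpy already vectorises here; a
        _typed_impls[key] = impl           # real version would Numba-JIT this
    return _typed_impls[key]

op = get_typed_op("plus", 3)
op(np.ones(3), np.full(3, 2.0))   # -> array([3., 3., 3.])
```

Subsequent calls with the same (operator, shape) pair hit the cache, so the compile cost is paid once per array dtype, which matches the amortisation behaviour described in the original question.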

Preliminarily, I suspect another source of difficulty in integrating my changes would stem from reconciling some of the patches with existing code. Off the top of my head, there are the patched op classes, Batched~, which I introduced mainly to accommodate explicit return-type specification for UDFs over vector UDTs. I made this decision because it would get me to a working solution fast, but there might be a better way that extends your Numba implementation of return-type inference. That said, another option might be allowing an explicit return type for custom ops as a temporary workaround.

There’s definitely no urgency/blocking here — there are plenty of other features I’m also experimenting with for my use case, etc. (To summarise briefly without going off-topic, the application is something DL-adjacent but not quite DL — I can share more details at the meeting / on the chat, and maybe we can see whether / how it might feasibly fit into the GraphBLAS ecosystem.)

Thanks again, and happy to contribute however I can!

@eriknw

That's exactly right. We'll use _graphblas_to_numpy to map builtin operators to UDFs, and we should JIT compile the UDFs on demand at call time. We should be pragmatic and try to get something working, then we can iterate and try to make it better (and I suspect we'll need to compile for each new UDT, including shape). I think allowing explicit input and output dtypes is something we should add to our UDF operators anyway. If we need to require this for certain kinds of UDFs on UDTs, then so be it.

Thanks for your thoughtfulness and willingness to contribute. We're in no particular rush on this feature other than I'm eager for it :). I'm also curious to hear how your application might fit into the GraphBLAS ecosystem. If you want or need anything from us, please ask. Cheers!
