PostgresML is Moving to Rust for our 2.0 Release - PostgresML · Discussion #326


In PostgresML 2.0, we'd like to address runtime speed, memory consumption and the overall reliability we've seen for machine learning deployments running at scale, in addition to simplifying the workflow for building and deploying models.

https://postgresml.org/blog/postgresml-is-moving-to-rust-for-our-2.0-release/


Replies: 5 comments · 6 replies


The Rust example could use rayon to parallelize.
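As a sketch of that idea (hypothetical function name, not from the post): with rayon this would be the one-liner `a.par_iter().zip(b).map(|(x, y)| x * y).sum()`; the dependency-free version below uses `std::thread::scope` to split the dot product across chunks.

```rust
// Sketch: parallelize a dot product across chunks of the input slices.
// With rayon: a.par_iter().zip(b).map(|(x, y)| x * y).sum()
fn dot_product_parallel(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    if a.is_empty() {
        return 0.0;
    }
    let n_threads = 4;
    let chunk = (a.len() + n_threads - 1) / n_threads; // ceiling division
    std::thread::scope(|s| {
        // Spawn one scoped thread per chunk; scoped threads may borrow a and b.
        let handles: Vec<_> = a
            .chunks(chunk)
            .zip(b.chunks(chunk))
            .map(|(ca, cb)| {
                s.spawn(move || ca.iter().zip(cb).map(|(x, y)| x * y).sum::<f32>())
            })
            .collect();
        // Reduce the per-chunk partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```

`std::thread::scope` is stable Rust (1.63+); for real workloads rayon's work-stealing pool would amortize the thread-spawn cost that this sketch pays on every call.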

@montanalow

This is a good point, although I think we'd like to leave most parallelization to the Postgres workers themselves. They can map/reduce large query result sets like this, but not the individual computations the way we could in Rust. I have glanced at https://github.com/AdamNiederer/faster for SIMD speedups, though, which I think will take us up to or past parity with BLAS implementations in most cases. Do you have experience with that?

@CLEckhardt

> I have glanced at https://github.com/AdamNiederer/faster for SIMD speedups, which I think will take us up to or past parity with BLAS implementations in most cases. Do you have experience with that?

FWIW, I've been looking at SIMD in Rust... To me, implementing those operations directly seems appealing. You might find this helpful: https://medium.com/@Razican/learning-simd-with-rust-by-finding-planets-b85ccfb724c3


I've not used that crate, but I have tried out [numeric-array](https://crates.io/crates/numeric-array) before. It would probably be worth writing some comparative examples and benchmarks, both to try the different crates out and to build a mental model. SIMD would definitely speed this up, and on top of that, rewriting this to use fixed-length arrays or slices would be much better than using Vecs here. With fixed-length arrays, the length can be padded with 0/1, and then perhaps this could be optimized down to a single SIMD instruction per call.
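A dependency-free sketch of the fixed-width idea (hypothetical names, not from the thread): on stable Rust, without explicit SIMD intrinsics, working in `[f32; 8]`-sized chunks with a per-lane accumulator gives the optimizer independent lanes to auto-vectorize; the remainder is handled as a scalar tail rather than padded.

```rust
// Fixed-width lanes so the optimizer can emit SIMD for the hot loop.
const LANES: usize = 8;

fn dot_product_chunked(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0_f32; LANES]; // one accumulator per lane
    let mut chunks_a = a.chunks_exact(LANES);
    let mut chunks_b = b.chunks_exact(LANES);
    for (ca, cb) in (&mut chunks_a).zip(&mut chunks_b) {
        for i in 0..LANES {
            // No cross-lane dependency, so this inner loop vectorizes cleanly.
            acc[i] += ca[i] * cb[i];
        }
    }
    // Scalar tail for the final len % LANES elements.
    let tail: f32 = chunks_a
        .remainder()
        .iter()
        .zip(chunks_b.remainder())
        .map(|(x, y)| x * y)
        .sum();
    acc.iter().sum::<f32>() + tail
}
```

Note that lane-wise accumulation reorders the floating-point additions relative to a plain sequential sum, so results can differ in the last bits; for ML feature vectors that is usually acceptable.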

Pretty strange to see the comparison with sklearn. AFAIK nobody uses it in production, especially for inference, because everyone knows it is slow.


Very interesting to see the numpy implementation come in so much lower than everything else. Were you able to investigate why? If so, can you elaborate?


Hi there - great write-up! (It also inspired me to give pgx a try, as I'm keen on the idea of Rust inside Postgres.)

How did you run your dot products per second timings?

I ran your test_*.sql scripts with a basic pgx setup on Ubuntu 20.04, and dot_product_rust was slower than dot_product_sql (as timed with the \timing setting inside psql).

PG14
Time: 200.828 ms -- dot_product_rust
Time: 180.507 ms -- dot_product_sql

PG13
Time: 197.446 ms -- dot_product_rust
Time: 184.035 ms -- dot_product_sql

Admittedly these were only single runs, but I was expecting a clear performance boost from Rust...

@montanalow

We always warmed up the runs first, and then took the average of three. Python warmup for our larger code base was always a killer, as mentioned in the article. That also means your hardware may be storage bound rather than compute bound if you're seeing similar results. Do you happen to be on spinny disks or network-attached storage?
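The warmup-then-average scheme described here can be sketched as follows (hypothetical helper, not code from the article):

```rust
use std::time::Instant;

// Benchmark methodology from the thread: discard one warmup run
// (to populate OS page cache and Postgres buffers), then report
// the average wall time of three measured runs.
fn bench<F: FnMut()>(mut f: F) -> f64 {
    f(); // warmup run, not measured
    let runs = 3;
    let start = Instant::now();
    for _ in 0..runs {
        f();
    }
    start.elapsed().as_secs_f64() / runs as f64
}
```

The same discipline applies when timing with `\timing` in psql: throw away the first query's time and average the rest, otherwise cold storage dominates the measurement.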

@selwyn-mccracken

The initial results were with a cloud VM with an SSD for the OS disk (where the results were run). I ran each several times and got similar results, then achieved similar timings on my M1 Mac locally (also with an SSD).

I also tried to activate pg_prewarm with the default values from the pg docs in ~/.pgx/data-14/postgresql.conf, but pgx didn't like that, and pre-warming isn't something I'm familiar with, so I stopped there.

If you have a recommended, repeatable method of prewarming with pgx, I'd be happy to try it.

@montanalow

My next thought would be to make sure you are running `cargo pgx run --release` for the tests, so that optimizations are enabled. I'll try to put together a long-form writeup tomorrow if that doesn't resolve the discrepancy.

@selwyn-mccracken

That did it! Time: 16.727 ms for Rust (and consistently thereabouts over successive runs).

Thank you for the help. Looking forward to making use of rust with PG :-)

6 participants: @montanalow, @selwyn-mccracken, @cyrusmsk, @Bajix, @CLEckhardt, @maxmindlin
