PostgresML is Moving to Rust for our 2.0 Release - PostgresML#326
-
In PostgresML 2.0, we'd like to address runtime speed, memory consumption, and the overall reliability we've seen for machine learning deployments running at scale, in addition to simplifying the workflow for building and deploying models. https://postgresml.org/blog/postgresml-is-moving-to-rust-for-our-2.0-release/
-
The Rust example could use rayon to parallelize.
-
This is a good point, although I think we'd like to leave most parallelization to Postgres workers themselves. They can map/reduce large query result sets like this, but not the individual computations like we could in Rust. I have glanced at https://github.com/AdamNiederer/faster though for SIMD speedups, which I think will take us up to or past parity with BLAS implementations in most cases. Do you have experience with that?
-
FWIW, I've been looking at SIMD in Rust... To me, implementing those operations directly seems appealing. You might find this helpful: https://medium.com/@Razican/learning-simd-with-rust-by-finding-planets-b85ccfb724c3
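One way to get at SIMD on stable Rust without the `faster` crate or nightly `std::simd` is to write the loop with several independent accumulators, which the optimizer can usually auto-vectorize. A sketch (the function name and lane count of 8 are illustrative choices, not from the codebase):

```rust
// Dot product written so the optimizer can auto-vectorize it: eight
// independent accumulators break the serial dependency chain, letting
// the compiler map them onto SIMD lanes. Explicit SIMD (the `faster`
// crate, or nightly `std::simd`) would make this guaranteed rather
// than best-effort.
fn dot_product_chunked(a: &[f32], b: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];
    let chunks = a.len() / LANES;
    for i in 0..chunks {
        for lane in 0..LANES {
            let idx = i * LANES + lane;
            acc[lane] += a[idx] * b[idx];
        }
    }
    // Handle the remainder scalar-wise.
    let mut sum: f32 = acc.iter().sum();
    for idx in chunks * LANES..a.len() {
        sum += a[idx] * b[idx];
    }
    sum
}

fn main() {
    let a: Vec<f32> = (0..10).map(|i| i as f32).collect();
    let b = vec![1.0f32; 10];
    println!("{}", dot_product_chunked(&a, &b)); // 0 + 1 + ... + 9 = 45
}
```

Note the lane-wise accumulation changes the floating-point summation order relative to a naive loop, so results can differ in the last bits.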
-
I've not used that crate, but I have tried out [numeric-array](https://crates.io/crates/numeric-array) before. It would probably be worth making some comparative examples and benchmarking, just to try different crates out and also to build a mental model. SIMD would definitely speed this up, and on top of that, if this can be rewritten to use fixed-length arrays or slices, that would be far better than using Vecs here. With fixed-length arrays the length can be padded with 0/1, and then perhaps this could be optimized down to a single SIMD instruction per call.
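The fixed-length-array idea above can be sketched with const generics, which have been stable since Rust 1.51. The function name and the padded length of 8 are illustrative assumptions:

```rust
// Sketch of the fixed-length-array suggestion: because N is known at
// compile time, the compiler can fully unroll and vectorize the loop,
// with no bounds checks or Vec indirection.
fn dot_product_fixed<const N: usize>(a: &[f32; N], b: &[f32; N]) -> f32 {
    let mut sum = 0.0;
    for i in 0..N {
        sum += a[i] * b[i];
    }
    sum
}

fn main() {
    // Inputs zero-padded to the fixed length, as suggested above;
    // the padding contributes nothing to the dot product.
    let a = [1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0];
    let b = [4.0, 5.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0];
    println!("{}", dot_product_fixed(&a, &b)); // 4 + 10 + 18 = 32
}
```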
-
Pretty strange to see a comparison with sklearn. AFAIK nobody uses it in production, especially for inference, because everyone knows it is slow.
-
Very interesting to see the numpy implementation perform so much worse than everything else. Were you able to investigate why that is, and could you elaborate on it?
-
hi there - great writeup! (It inspired me to give pgx a try, as I'm keen on the idea of Rust inside Postgres.) How did you run your dot-products-per-second timings? I ran your test_*.sql scripts with a basic pgx setup on Ubuntu 20.04, and dot_product_rust was slower than dot_product_sql (as timed with the \timing setting inside psql).

PG14

PG13

Admittedly these were only single runs, but I was expecting a clear performance boost for Rust...
-
We always warmed up the runs first, and then took the average of three. |
-
The initial results were from a cloud VM with an SSD as the OS disk (where the results were run). I ran each test several times and got similar results, then achieved similar timings on my M1 Mac locally (also with an SSD). I also tried to activate pg_prewarm with the default values from the PG docs in ~/.pgx/data-14/postgresql.conf, but pgx didn't like that, and pre-warming is something I'm not familiar with, so I stopped there. If you have a recommended, repeatable method of pre-warming with pgx, I'd be happy to try it.
-
My next thought would be to make sure you are running `cargo pgx run --release` for the tests, so that optimizations are enabled. I'll try to put together a longer write-up tomorrow if that doesn't resolve the discrepancy.
-
That did it! Time: 16.727 ms for Rust (and consistently thereabouts over successive runs). Thank you for the help. Looking forward to making use of Rust with PG :-)