add a new example #14
Conversation
from pgml.exceptions import PgMLException
from pgml.sql import q
def flatten(S):
Are you sure this won't blow the stack on a large dataset?
It’s called per row, and I haven’t seen datasets with more than 4D arrays.
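To make the stack concern concrete: the body of `flatten` is not shown in this excerpt, so the following is only a sketch of what a recursive version might look like, next to an iterative alternative. Both function names are hypothetical. In the recursive form the call depth tracks the nesting depth of one row (around 4 for a 4D array), not the number of rows in the dataset.

```python
def flatten_recursive(S):
    """Flatten nested lists recursively; call depth equals nesting depth."""
    if not isinstance(S, list):
        return [S]
    out = []
    for item in S:
        out.extend(flatten_recursive(item))
    return out


def flatten_iterative(S):
    """Flatten nested lists with an explicit stack of iterators (no recursion)."""
    out = []
    stack = [iter(S if isinstance(S, list) else [S])]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            # The innermost list is exhausted; resume its parent.
            stack.pop()
            continue
        if isinstance(item, list):
            stack.append(iter(item))
        else:
            out.append(item)
    return out


row = [[1.0, [2.0, 3.0]], [4.0]]
print(flatten_recursive(row))
print(flatten_iterative(row))
```

If arbitrarily deep inputs ever became a concern, the iterative form would sidestep Python's recursion limit at the cost of slightly more code.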
SELECT models.*
FROM pgml.models
WHERE project_id = {q(project.id)}
ORDER BY models.metrics->>{q(metric)} DESC
Why not flatten/normalize the structure into the table?
Relevant metrics are different depending on the objective. We could have another join table to hold just metrics per model, but that seems like overkill for now.
I was just thinking of making them nullable and only filling in the relevant columns for the model being trained.
---
--- Predict
---
CREATE OR REPLACE FUNCTION pgml.predict(project_name TEXT, VARIADIC features DOUBLE PRECISION[])
CREATE OR REPLACE FUNCTION pgml.predict(project_name TEXT, features NUMERIC[])
I'm wondering about this because variadic allows us to pass columns as arguments, e.g.:
SELECT pgml.predict('Red Wine Quality', quality_wine_red.acidity, quality_wine_red.color, ...) FROM quality_wine_red WHERE ...
I see this to be the more likely use case than passing in some raw numbers.
That’s true, but you can always put those columns in an array just like this, and having the features as a single param will allow us to extend the API with additional parameters in the future if we need to.
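The trade-off in this thread can be illustrated with a Python analogy. This is not the SQL function itself: `predict_variadic`, `predict_array`, and the `version` parameter are hypothetical names used only for illustration. A variadic parameter absorbs every trailing positional argument, whereas a single array parameter leaves room for further parameters after it.

```python
def predict_variadic(project_name, *features):
    """Variadic form: every trailing positional argument is treated as a feature."""
    return project_name, list(features)


def predict_array(project_name, features, version=None):
    """Array form: features is one parameter, so more parameters can follow it.

    `version` is a made-up example of a later extension.
    """
    return project_name, list(features), version


# A row whose columns are passed as features (values are illustrative only).
row = {"acidity": 7.4, "color": 0.56}

# Columns passed directly as separate arguments.
print(predict_variadic("Red Wine Quality", row["acidity"], row["color"]))

# The same columns wrapped in a list, plus an extra option.
print(predict_array("Red Wine Quality", [row["acidity"], row["color"]], version=2))
```

The caller pays a small cost in the array form (wrapping the columns), which is the point made in the reply above.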
Bump the version in |
pgml is not compatible with plpython: if both pgml and plpython are used in the same session, PostgreSQL will crash.

Minimum reproducible code:

```sql
SELECT pgml.embed('intfloat/e5-small', 'hi mom');

create or replace function pyudf()
returns int as
$$
return 0
$$ language 'plpython3u';
```

The call stack:

```
Stack trace of thread 161970:
#0  0x00007efc1429edb8 PyImport_Import (libpython3.9.so.1.0 + 0x9edb8)
#1  0x00007efc1429f125 PyImport_ImportModule (libpython3.9.so.1.0 + 0x9f125)
#2  0x00007efb04b0f496 n/a (plpython3.so + 0x10496)
#3  0x00007efb04b1039d plpython3_validator (plpython3.so + 0x1139d)
#4  0x0000559d0cdbc5c2 OidFunctionCall1Coll (postgres + 0x6465c2)
#5  0x0000559d0c9d68bb ProcedureCreate (postgres + 0x2608bb)
#6  0x0000559d0ca5030c CreateFunction (postgres + 0x2da30c)
#7  0x0000559d0ce1c730 n/a (postgres + 0x6a6730)
#8  0x0000559d0cc5a030 standard_ProcessUtility (postgres + 0x4e4030)
#9  0x0000559d0cc545ed n/a (postgres + 0x4de5ed)
#10 0x0000559d0cc546e7 n/a (postgres + 0x4de6e7)
#11 0x0000559d0cc54beb PortalRun (postgres + 0x4debeb)
#12 0x0000559d0cc55249 n/a (postgres + 0x4df249)
#13 0x0000559d0cc576f0 PostgresMain (postgres + 0x4e16f0)
#14 0x0000559d0cbc3e9c n/a (postgres + 0x44de9c)
#15 0x0000559d0cbc50aa PostmasterMain (postgres + 0x44f0aa)
#16 0x0000559d0c8ce7d2 main (postgres + 0x1587d2)
#17 0x00007efc18427cd0 n/a (libc.so.6 + 0x27cd0)
#18 0x00007efc18427d8a __libc_start_main (libc.so.6 + 0x27d8a)
#19 0x0000559d0c8cee15 _start (postgres + 0x158e15)
```

This is because PostgreSQL uses dlopen(RTLD_GLOBAL). That resolves some of the symbols to the previously opened .so file, while the others use a relative offset in pgml.so, which causes a null-pointer crash.

This commit hides all symbols except the UDF symbols (ending with `_wrapper`) and the magic symbols (`_PG_init`, `Pg_magic_func`), so dlopen(RTLD_GLOBAL) will resolve the symbols to the correct position.