Is the pgml.digits dataset storing images in psql? #488

Answered by montanalow
MatsMoll asked this question in Q&A

I see one of the examples you present in the docs is a digits dataset.

SELECT pgml.train('My First PostgresML Project', task => 'regression', relation_name => 'pgml.digits', y_column_name => 'target', algorithm => 'xgboost');

Now I am guessing this is the classic MNIST dataset.
However, I have always thought that storing images in psql is bad practice, so I am wondering whether that is what happens here, or whether this is a totally separate dataset.
And if it is an image dataset, is it stored in a "smarter" way than usual?

Btw, super interesting project you have going here! Keep it going 🚀

In terms of image format, ML algos require images to be either 2D arrays for black and white or 3D arrays for color. The MNIST data is stored as an 8x8 pixel black and white image with 16 shades of gray, i.e. a Postgres SMALLINT[][].

https://github.com/postgresml/postgresml/blob/master/pgml-extension/src/orm/dataset.rs#L371
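
To make the 2D-array idea concrete, here is a minimal Python sketch of what one such image looks like as plain integers. It is not taken from the PostgresML source: the helper names, the synthetic `sample` image, and the 0 to 16 intensity range are all assumptions for illustration. It checks the shape, renders the pixels as a Postgres-style nested array literal, and flattens them into the 64 values an algorithm would see as features.

```python
from typing import List

Image = List[List[int]]  # rows of pixel intensities


def validate(image: Image, size: int = 8, max_level: int = 16) -> None:
    """Raise ValueError unless image is size x size with pixels in 0..max_level.

    The 8x8 shape and the 0..16 range are assumptions for this sketch.
    """
    if len(image) != size or any(len(row) != size for row in image):
        raise ValueError(f"expected a {size}x{size} image")
    if any(not 0 <= p <= max_level for row in image for p in row):
        raise ValueError(f"pixel values must be in 0..{max_level}")


def to_pg_array_literal(image: Image) -> str:
    """Render a 2D list in Postgres array-literal form, e.g. {{1,2},{3,4}}."""
    rows = ("{" + ",".join(str(p) for p in row) + "}" for row in image)
    return "{" + ",".join(rows) + "}"


def flatten(image: Image) -> List[int]:
    """Flatten the 2D image row by row into a 1D feature vector."""
    return [p for row in image for p in row]


# Synthetic image made up for this example; it is not a real digit.
sample: Image = [[(r + c) % 17 for c in range(8)] for r in range(8)]

validate(sample)
literal = to_pg_array_literal(sample)
features = flatten(sample)
print(literal[:40] + "...")
print(len(features), "features")
```

Nothing here is binary in the file-format sense: there is no PNG or JPEG blob, just 64 small integers per row, which is why this kind of data sits comfortably in a table.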

You definitely can store images and other binary data in a database, but I think the question is: should you? A CDN fronting something like an S3 bucket is a better way to store and serve image content for a web application, rather than directly out of a database. Here are a few reasons you should consider vertically sharding your binary data (image, audio, large text...) into a different storage and distribution mechanism other than your primary database.

  • Binary data is often large compared to numeric or text data
  • Binary data is not relational like numeric or text data
  • Binary data frequently wants bespoke and CPU-intensive functionality not found in DBs (image resizing, audio sampling, video transcoding)

These same reasons may not hold up for an ML application with a dedicated ML database.

  • If your dataset is larger than RAM, regardless of the types of features, you're going to need to figure out scale up front. But many ML datasets fit completely in memory; in fact, they often have to for the algorithms to be reasonably performant. You may still want to have separate databases for separate use cases and workloads.
  • ML algorithms like conv nets change what we'd consider binary data (images/audio) into something that is structured and meaningful, and that you may want to mix with other non-binary features in a relational way.
  • PostgresML offers (some of) that bespoke functionality you may want, i.e. doing ML, conveniently in your DB.
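
As a rough back-of-the-envelope check on the "fits in RAM" point, here is a small Python sketch. The figures are assumptions, not measurements: 1797 rows is the size of the scikit-learn digits dataset (assumed here to be what pgml.digits contains), 2 bytes is the raw size of a Postgres SMALLINT with array and tuple overhead ignored, and the photo collection, the 16 GiB machine, and the 50% headroom are hypothetical.

```python
def dataset_bytes(rows: int, values_per_row: int, bytes_per_value: int) -> int:
    """Raw payload size only; ignores storage and per-row overhead."""
    return rows * values_per_row * bytes_per_value


def fits_in_memory(total_bytes: int, ram_bytes: int, headroom: float = 0.5) -> bool:
    """True if the data fits in the given fraction of RAM.

    The 0.5 default is an arbitrary assumption that leaves room for the
    algorithm's own working memory.
    """
    return total_bytes <= ram_bytes * headroom


RAM = 16 * 1024**3  # hypothetical 16 GiB machine

# 8x8 images of 2-byte integers, 1797 rows (assumed size of pgml.digits).
digits = dataset_bytes(rows=1797, values_per_row=8 * 8, bytes_per_value=2)

# Hypothetical collection: one million 1024x1024 RGB photos, 1 byte per channel.
photos = dataset_bytes(rows=1_000_000, values_per_row=1024 * 1024 * 3, bytes_per_value=1)

print(f"digits: {digits:,} bytes, fits: {fits_in_memory(digits, RAM)}")
print(f"photos: {photos:,} bytes, fits: {fits_in_memory(photos, RAM)}")
```

Under these assumptions the digits data is a couple of hundred kilobytes, while the photo collection runs to terabytes, which is the kind of gap that separates "keep it in the ML database" from "put it behind a CDN".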