Added embed2 which returns a table structure #1186

Open: ns1000 wants to merge 4 commits into postgresml:master from ns1000:master
pgml-extension/src/api.rs: 20 changes (20 additions, 0 deletions)
@@ -558,6 +558,26 @@ pub fn embed_batch(
}
}

#[cfg(all(feature = "python", not(feature = "use_as_lib")))]
#[pg_extern(immutable, parallel_safe, name = "embed2")]
pub fn embed_batch2<'a>(
montanalow (Contributor) commented on Nov 25, 2023 (edited):
My rough thoughts, without running the code on some examples.

I think we should name this SQL `embed_3` and Rust `embed_batch_3`, with the goal of establishing this as the 3.0 `embed` API, as well as a pattern for releasing 3.0 APIs early while we develop them in an alpha state (with potentially breaking changes, and with the alpha versions dropped completely in 3.1 in favor of the newly established default behavior).

Your example convinces me that batch APIs should return a table, but I think that table's rows should be JSONB with `{id, embedding}` keys (at least), unless there is a significant performance implication on that front. My thinking is that embedding models are getting more complicated, and some now take JSON rather than TEXT for inputs, including a `prompt`. It would be nice to have an optional `id` in the input JSON; if it's not present, just return the entire input JSON as the `id`, which acts just like your TEXT as the key.

Final thought: `kwargs` is currently JSONB, which works well with the underlying Python dependencies, but I'd like to structure it as much as possible for the final 3.0. We should find a way to flag this clearly as an alpha API that will break and eventually be dropped once a final version is available.
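A minimal sketch of the `{id, embedding}` row shape suggested in the review, written in plain Rust so it runs without PostgreSQL. It assumes each input is a flat JSON object with an optional `"id"` field and models that object as a `BTreeMap<String, String>`; `InputObject`, `RowId`, `EmbeddingRow`, and `rows_with_ids` are hypothetical names, and a real `embed_3` would presumably take and return pgrx `JsonB` values instead.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one JSONB input object, e.g.
/// {"id": "doc-1", "text": "hello"}; the real API would use pgrx::JsonB.
type InputObject = BTreeMap<String, String>;

/// The identifier returned alongside each embedding.
#[derive(Debug, Clone, PartialEq)]
enum RowId {
    /// The caller supplied an "id" field, so it is echoed back.
    Explicit(String),
    /// No "id" field: the whole input object acts as the key,
    /// the same way the raw TEXT input does in embed2.
    WholeInput(InputObject),
}

/// One output row, mirroring the proposed {id, embedding} JSONB shape.
#[derive(Debug, Clone, PartialEq)]
struct EmbeddingRow {
    id: RowId,
    embedding: Vec<f32>,
}

/// Pair each input with its embedding, using the input's "id" when present
/// and falling back to the entire input object otherwise.
fn rows_with_ids(inputs: &[InputObject], embeddings: Vec<Vec<f32>>) -> Vec<EmbeddingRow> {
    inputs
        .iter()
        .zip(embeddings)
        .map(|(input, embedding)| {
            let id = match input.get("id") {
                Some(id) => RowId::Explicit(id.clone()),
                None => RowId::WholeInput(input.clone()),
            };
            EmbeddingRow { id, embedding }
        })
        .collect()
}

fn main() {
    let mut with_id = InputObject::new();
    with_id.insert("id".to_string(), "doc-1".to_string());
    with_id.insert("text".to_string(), "hello".to_string());

    let mut without_id = InputObject::new();
    without_id.insert("text".to_string(), "world".to_string());

    let rows = rows_with_ids(&[with_id, without_id], vec![vec![0.1, 0.2], vec![0.3, 0.4]]);
    for row in &rows {
        println!("{:?}", row);
    }
}
```

Keeping the fallback as the whole input object, rather than a serialized string, preserves the point that the input itself acts as the key, just as the TEXT column does in `embed2`.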

    transformer: &str,
    inputs: Vec<&'a str>,
    kwargs: default!(JsonB, "'{}'"),
) -> TableIterator<'a, (name!(text, String), name!(embedding, Vec<f32>))> {
    let rows = match crate::bindings::transformers::embed(transformer, inputs.clone(), &kwargs.0) {
        Ok(rows) => rows,
        Err(e) => {
            error!("{e}");
        }
    };
    TableIterator::new(
        inputs.into_iter().zip(rows.into_iter()).map(|(text, embedding)| {
            (text.to_string(), embedding)
        }),
    )
}

/// Clears the GPU cache.
///
/// # Arguments
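`embed_batch2` pairs inputs with embeddings using `zip`, which stops at the shorter of the two iterators, so if the binding ever returned a different number of embeddings than inputs, the extra items would be dropped without an error. A minimal sketch of the same pairing with an explicit length check, in plain Rust without pgrx; `pair_inputs` is a hypothetical helper, and inside the extension the `Err` case would presumably be raised with `error!`, as the existing `match` already does.

```rust
/// Pair each input text with its embedding, refusing to truncate silently.
/// Hypothetical helper that mirrors the zip in embed_batch2 without pgrx types.
fn pair_inputs(
    inputs: Vec<&str>,
    embeddings: Vec<Vec<f32>>,
) -> Result<Vec<(String, Vec<f32>)>, String> {
    // zip alone would stop at the shorter side, so check the lengths first.
    if inputs.len() != embeddings.len() {
        return Err(format!(
            "expected {} embeddings, got {}",
            inputs.len(),
            embeddings.len()
        ));
    }
    Ok(inputs
        .into_iter()
        .zip(embeddings)
        .map(|(text, embedding)| (text.to_string(), embedding))
        .collect())
}

fn main() {
    let rows = pair_inputs(vec!["hello", "world"], vec![vec![0.1, 0.2], vec![0.3, 0.4]]);
    println!("{:?}", rows);

    let mismatch = pair_inputs(vec!["hello", "world"], vec![vec![0.1, 0.2]]);
    println!("{:?}", mismatch);
}
```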