Adding embed_array for getting the embeddings of multiple strings #686


Merged
montanalow merged 15 commits into postgresml:master from jsaied99:embeddings_inputs
Jun 5, 2023

jsaied99 (Contributor) commented Jun 5, 2023 (edited)

Works the same way as pgml.embed but can take an array of inputs. Example:

SELECT pgml.embed(
    transformer => 'intfloat/e5-small',
    inputs => ARRAY['Hello', 'World'],
    kwargs => '{"device": "cpu"}'::JSONB
);

SELECT pgml.embed(
    transformer => 'hkunlp/instructor-base',
    inputs => ARRAY['Hello World', 'I love Rust'],
    kwargs => '{"device": "cpu", "instruction": "Represent the content for retrieving supporting documents:"}'::JSONB
);

For instructor models, I'm passing the same instruction to every input. We could potentially make inputs a JSON array that carries an instruction alongside each individual input.

jsaied99 marked this pull request as ready for review June 5, 2023 20:20

try:
    inputs = json.loads(inputs)
except json.decoder.JSONDecodeError:
Contributor:

What is the known case that this is handling?

Contributor (Author):

Without passing any extra arguments, I couldn't think of a way of knowing whether inputs was a plain string or a JSON string. I thought the simplest way was to try converting it into a Python object.
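A minimal sketch of the "try to parse it" approach described above, for illustration only. The helper name `parse_inputs` is hypothetical and not taken from the PR. One caveat worth keeping in mind: a plain string such as `42` or `true` is also valid JSON, so the sketch additionally checks that the parsed value is a list of strings before trusting it.

```python
import json


def parse_inputs(inputs):
    """Return a list of texts from either a plain string or a JSON array string.

    Hypothetical helper: the name and the fallback rules are assumptions.
    """
    try:
        parsed = json.loads(inputs)
    except json.JSONDecodeError:
        # Not JSON at all, so treat the whole value as one text.
        return [inputs]
    if isinstance(parsed, list) and all(isinstance(item, str) for item in parsed):
        return parsed
    # Valid JSON but not an array of strings (e.g. "42"): still a single text.
    return [inputs]


single = parse_inputs("Hello")
many = parse_inputs('["Hello", "World"]')
numeric_text = parse_inputs("42")
```

The ambiguity in the last case is one reason an always-an-array signature can be simpler than sniffing the input.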


else:
    texts_with_instructions = []
    instruction = kwargs.pop("instruction")
Contributor:

Seems like it may be common to have multiple instructions with multiple inputs? Hmm, in that case I'm not sure we have a nice way to structure the args... we can leave that for some future work.

Contributor (Author):

Okay I'll leave it as it is.
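For readers wondering what the deferred "multiple instructions with multiple inputs" case might look like, here is one possible shape, sketched under assumptions. The function `pair_instructions` is hypothetical and not part of this PR. It accepts either one instruction (applied to every input, as the PR does) or one instruction per input, and builds `[instruction, text]` pairs.

```python
def pair_instructions(instructions, inputs):
    """Build [instruction, text] pairs for an instruction-tuned embedding model.

    Hypothetical helper. `instructions` may be a single string, which is
    reused for every input, or a list with one entry per input.
    """
    if isinstance(instructions, str):
        instructions = [instructions] * len(inputs)
    if len(instructions) != len(inputs):
        raise ValueError(
            f"expected {len(inputs)} instructions, got {len(instructions)}"
        )
    return [[instruction, text] for instruction, text in zip(instructions, inputs)]


shared = pair_instructions("Represent the content:", ["Hello World", "I love Rust"])
per_input = pair_instructions(
    ["Represent the query:", "Represent the document:"],
    ["Hello World", "I love Rust"],
)
```

How such per-input instructions would be passed through the SQL arguments is exactly the open question raised above, so this only covers the Python side.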

if instructor:
result = model.encode(inputs, **kwargs)
if instructor and len(result) == 1:
    result = result[0]
Contributor:

I think we also need to handle the multi instructor case, right? Or does instructor always just return an appropriate single dimension array?
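To make the concern concrete: unwrapping whenever `len(result) == 1` would also unwrap a batch that happens to contain a single input, which no longer matches a list-of-vectors return shape. The sketch below is an assumption about one way to avoid that, deciding from how the function was called and not from the result length. `shape_result` and its `batched` flag are hypothetical names.

```python
def shape_result(result, batched):
    """Return a list of vectors for batch calls, or one vector for single calls.

    Hypothetical helper: `result` is assumed to be a sequence with one
    embedding per input, as a model's encode step would typically return.
    """
    vectors = [list(vector) for vector in result]
    if batched:
        # Keep the outer list even when the batch holds just one input.
        return vectors
    if len(vectors) != 1:
        raise ValueError("single-input call produced multiple embeddings")
    return vectors[0]


batch_of_one = shape_result([(0.1, 0.2)], batched=True)
single_call = shape_result([(0.1, 0.2)], batched=False)
```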

    inputs: default!(Vec<String>, "ARRAY[]::TEXT[]"),
    kwargs: default!(JsonB, "'{}'"),
) -> Vec<Vec<f32>> {
    crate::bindings::transformers::embed_batch(transformer, &inputs, &kwargs.0)
Contributor:

I would make crate::bindings::transformers::embed take a list of inputs always, and modify pub fn embed to pass a slice in, similar to generate and generate_batch.

Contributor (Author):

Sure, I'll change this. Makes sense

Contributor (Author):

I guess if we always pass an input array, I could get rid of the try/except, since it's always an array of strings.
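The design agreed on in this thread (one batch implementation, with the single-input entry point passing a one-element list) can be sketched roughly as follows. This is an illustration in Python, not the PR's actual Rust or Python code. The `encode` parameter and `fake_encode` stand in for the real model call and are assumptions.

```python
def embed_batch(encode, inputs):
    """Embed a list of texts; always returns one vector per input."""
    return [list(vector) for vector in encode(inputs)]


def embed(encode, text):
    """Embed a single text by delegating to the batch path."""
    return embed_batch(encode, [text])[0]


def fake_encode(texts):
    # Stand-in for a real model: one-dimensional "embedding" = text length.
    return [[float(len(text))] for text in texts]


one = embed(fake_encode, "Hello")
several = embed_batch(fake_encode, ["Hello", "Hi"])
```

With the inputs always arriving as a list of strings, the JSON sniffing discussed earlier in the review is no longer needed on this path.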

jsaied99 requested a review from montanalow June 5, 2023 21:26
montanalow merged commit 96dd570 into postgresml:master Jun 5, 2023

Reviewers

montanalow approved these changes

2 participants

@jsaied99, @montanalow
