- Notifications
You must be signed in to change notification settings - Fork352
Adding embed_array for getting the embeddings of multiple strings#686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
| try: | ||
| inputs=json.loads(inputs) | ||
| exceptjson.decoder.JSONDecodeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
What is the known case that this is handling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Without passing any extra arguments, I couldn't think of a way of knowing wether inputs was a string or a JSON string. Thought the simplest way was trying to convert it into a python object.
| else: | ||
| texts_with_instructions= [] | ||
| instruction=kwargs.pop("instruction") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Seems like it may be common to have multiple instructions with multiple inputs? Hmm, in that case I'm not sure we have a nice way to structure the args... we can leave that for some future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Okay I'll leave it as it is.
| ifinstructor: | ||
| result=model.encode(inputs,**kwargs) | ||
| ifinstructorandlen(result)==1: | ||
| result=result[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I think we also need to handle the multi instructor case, right? Or does instructor always just return an appropriate single dimension array?
pgml-extension/src/api.rs Outdated
| inputs:default!(Vec<String>,"ARRAY[]::TEXT[]"), | ||
| kwargs:default!(JsonB,"'{}'"), | ||
| ) ->Vec<Vec<f32>>{ | ||
| crate::bindings::transformers::embed_batch(transformer,&inputs,&kwargs.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I would makecrate::bindings::transformers::embed take a list of inputs always, and modifypub fn embed to pass a slice in, similar togenerate andgenerate_batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Sure, I'll change this. Makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I guess if we do pass an input array always, I could get rid of the try catch since it's always an array of strings.
Uh oh!
There was an error while loading.Please reload this page.
Works the same way as pgml.embed but can take an array of inputs. Example:
For instructor, I'm passing the same instruction to each input, we could potentially have a inputs array that is a json and each individual input.