Add `Dense` layer in `2_Dense/` modules #660


Open

alvarobartt wants to merge 15 commits into `main` from `add-dense`

Conversation

alvarobartt (Member) commented on Jun 26, 2025 (edited)

What does this PR do?

This PR adds support for `2_Dense/` modules, since some models, e.g. https://huggingface.co/sentence-transformers/LaBSE, require the extra `Dense` module, i.e. an extra `Linear` layer on top of the pooled embeddings, when generating the embeddings.

To that end, this PR introduces the `DenseLayer` trait, implements it for `Dense`, and adds `DenseConfig`; these are modules with essentially a single `Linear` layer, pulling their configuration from `2_Dense/config.json` and their weights from `2_Dense/model.safetensors`.
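For intuition, what a `2_Dense/` module computes can be sketched in NumPy (a toy illustration, not this PR's Rust implementation; the sizes and values below are made up, and the Tanh activation mirrors what LaBSE's `2_Dense/config.json` specifies):

```python
# Hedged sketch: a 2_Dense/ module is conceptually a single Linear layer
# (weights normally loaded from 2_Dense/model.safetensors, shapes from
# 2_Dense/config.json) applied to the pooled embedding, plus an activation.
import numpy as np

rng = np.random.default_rng(0)
in_features, out_features = 4, 3  # toy dimensions, not LaBSE's real sizes

weight = rng.standard_normal((out_features, in_features)).astype(np.float32)
bias = np.zeros(out_features, dtype=np.float32)

pooled = rng.standard_normal(in_features).astype(np.float32)  # pooled embedding
dense_out = np.tanh(weight @ pooled + bias)  # Linear + Tanh activation

assert dense_out.shape == (out_features,)
assert np.all(np.abs(dense_out) <= 1.0)  # tanh output is bounded
```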

Note

The `2_Dense/` module is only required when generating embeddings, meaning it only applies to the `Embedding` model type; the `Reranker` and `Classifier` types are not affected by this addition, and neither are the `rank` or `predict` methods of the given backend.

This PR solves the issue recently reported at https://discuss.huggingface.co/t/inference-result-not-aligned-with-local-version-of-same-model-and-revision/160514.

Additionally, this PR also fixes a shape mismatch produced when performing matrix multiplication of 2D tensors on Metal devices, due to the `candle` Metal kernels expecting the tensors to be contiguous. The error only seems to arise on Metal for 2D tensors, whereas e.g. 3D tensors work fine without calling `.contiguous()` (which is expensive, as it needs to clone the tensor).
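The contiguity requirement here is analogous to NumPy's C-contiguity: a transposed array is a strided view, and materializing it as contiguous memory costs a copy (a NumPy analogy, not the `candle` code):

```python
# NumPy analogy for the Metal contiguity issue: a transposed array is a
# non-contiguous view, and making it contiguous requires a real copy, which
# is why calling .contiguous() unconditionally would be expensive.
import numpy as np

a = np.ones((4, 8), dtype=np.float32)
b = a.T                      # strided view, no copy
assert not b.flags["C_CONTIGUOUS"]

c = np.ascontiguousarray(b)  # explicit copy, akin to Tensor::contiguous()
assert c.flags["C_CONTIGUOUS"]
assert c.flags["OWNDATA"]    # c owns fresh memory; the copy was real
```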

Reproduce

To ensure that the implementation works and produces correct results, i.e. that `allclose`-style checks pass and the cosine similarity is 1.0 (or as close as possible), the following test has been run:

  1. Deploy Text Embeddings Inference (TEI), e.g.:

cargo run --release --features candle,http --no-default-features -- --model-id sentence-transformers/LaBSE --dtype float16

  2. Once it's running, run the following Python script (requires `torch`, `transformers`, `sentence-transformers`, `accelerate` and `numpy`):
import numpy as np
import requests
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "sentence-transformers/LaBSE",
    model_kwargs={
        "torch_dtype": "float16",
        "device_map": "mps",
    },
)
out_py = model.encode(
    "What is Deep Learning?",
    normalize_embeddings=True,
    convert_to_numpy=True,
)

response = requests.post(
    "http://localhost:3000/embed",
    json={
        "inputs": "What is Deep Learning?",
        "normalize": True,
    },
)
response.raise_for_status()

out = response.json()[0]
out_http = np.array(out, dtype=np.float16)

print(f"Embeddings are close: {np.allclose(out_py, out_http, atol=1e-3, rtol=1e-4)=}")

def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    dot_product = np.dot(x, y)
    norm_x = np.linalg.norm(x)
    norm_y = np.linalg.norm(y)
    return dot_product / (norm_x * norm_y)

print(f"The similarity score is: {cosine_similarity(out_py, out_http)=}")

It should produce the following on any combination of device (CPU, MPS, CUDA) and dtype (float32, float16):

Embeddings are close: np.allclose(out_py, out_http, atol=1e-3, rtol=1e-4)=True
The similarity score is: cosine_similarity(out_py, out_http)=np.float16(1.0)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@Narsil

Apparently, `candle` expects the tensors to be contiguous on Metal when performing 2D matrix multiplication
alvarobartt requested a review from Narsil on Jun 26, 2025 16:26
@kozistr (Contributor) commented:

@alvarobartt Hi! Just for your reference (you may already be aware): the Stella v5 model uses an `Identity` layer as its activation function for `2_Dense`! See `2_Dense/config.json`.
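A hedged sketch of how the `activation_function` field in `2_Dense/config.json` could be dispatched (the class paths below match what sentence-transformers writes for these models, but the mapping itself is illustrative, not this PR's Rust code):

```python
# Illustrative dispatch on the activation_function field of 2_Dense/config.json:
# LaBSE's config names a Tanh activation, while Stella v5's names Identity.
import numpy as np

ACTIVATIONS = {
    "torch.nn.modules.activation.Tanh": np.tanh,
    "torch.nn.modules.linear.Identity": lambda x: x,
}

stella_cfg = {"activation_function": "torch.nn.modules.linear.Identity"}
act = ACTIVATIONS[stella_cfg["activation_function"]]

x = np.array([0.5, -1.0], dtype=np.float32)
assert np.array_equal(act(x), x)  # Identity leaves the embedding untouched
```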


@Narsil (Collaborator) left a comment:

Looks good, I think we can simplify the parsing part a bit.

If `--dense-path` was not allowed, that would prevent users from using other `Dense` layers when available, as per e.g. https://huggingface.co/NovaSearch/stella_en_400M_v5, which contains different directories for different `Dense` layers with different output vector dimensionality, as `2_Dense_<dims>/`.
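One possible shape for that selection logic, as a hypothetical Python sketch (`resolve_dense_dir` and its fallback behavior are illustrative only, not the PR's actual `--dense-path` handling):

```python
# Hypothetical sketch of resolving which Dense directory to load: honor an
# explicit --dense-path-style request if present, otherwise fall back to the
# default 2_Dense/ when the repository ships one. Names are illustrative.
def resolve_dense_dir(available, requested=None):
    if requested is not None:
        return requested if requested in available else None
    return "2_Dense" if "2_Dense" in available else None

# LaBSE-style repo: the single 2_Dense/ module is picked by default.
assert resolve_dense_dir(["2_Dense"]) == "2_Dense"
# Stella-style repo: dimension-specific folders must be requested explicitly.
assert resolve_dense_dir(["2_Dense_1024", "2_Dense_2048"]) is None
assert resolve_dense_dir(["2_Dense_1024", "2_Dense_2048"], "2_Dense_1024") == "2_Dense_1024"
```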
alvarobartt changed the title from "Add `Dense`, `DenseLayer` and `DenseConfig` to handle `2_Dense/`" to "Add `Dense` layer in `2_Dense/` modules" on Jul 2, 2025

alvarobartt marked this pull request as ready for review on Jul 3, 2025 08:57
Reviewers

Narsil left review comments. At least 1 approving review is required to merge this pull request.

3 participants: @alvarobartt, @kozistr, @Narsil
