Stephan Tulkens (@stephantul)

🌳
Busy planting trees

Organizations

@clips @MinishLab

stephantul/README.md

I'm Stéphan Tulkens! I'm a computational linguistics/AI person. I am currently working as a machine learning engineer at Ecosia. I am one of the two founding members of Minish.

I got my PhD at CLiPS at the University of Antwerp under the watchful eyes of Walter Daelemans (Computational Linguistics) and Dominiek Sandra (Psycholinguistics). The topic of my PhD was the way people process orthography during reading. You can find a copy here. Before that I studied computational linguistics (MA), philosophy (BA) and software engineering (BA).

My goal is always to make things as fast and small as possible. I like it when simple models work well, and I love it when simple models get close in accuracy to big models. I do not believe absolute accuracy is a metric to be chased, and I think we should always be mindful of what a model computes or learns from the data.

I’m currently working on 🏃‍♂️:

  • vicinity: an ANN/kNN interface library.
  • model2vec: a library for creating extremely fast sentence-transformers through distillation.
  • semhash: a library for data deduplication and other dataset work.
  • reach: a library for loading and working with word embeddings.

Other stuff I made (most of it from my PhD) 🐕:

  • wordkit: a library for working with orthography.
  • old20: calculate the orthographic Levenshtein distance 20 (OLD20) metric.
  • metameric: fast interactive activation networks in NumPy.
  • humumls: load the UMLS database into a MongoDB instance. Fast!
  • dutchembeddings: word embeddings for Dutch (back when this was a cool thing to do).

My research interests 🤖:

  • Tokenizers, specifically subword tokenizers.
  • Embeddings, specifically static embeddings (so old-fashioned! 💀), and how to combine these in meaningful ways.
  • String similarity, and how to compute it without using dynamic programming.

Contact:

Pinned

  1. reach (Public)

    Load embeddings and featurize your sentences.

    Python · 28 stars · 6 forks

  2. clips/dutchembeddings (Public)

    Repository for the word embeddings experiments described in "Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource", presented at LREC 2016.

    Python · 83 stars · 14 forks

  3. MinishLab/model2vec (Public)

    Fast State-of-the-Art Static Embeddings

    Python · 1.1k stars · 49 forks

