David Berenstein (davidberenstein1957) 🦦

ML & DevRel at 🤗 @huggingface || 👨🏽‍🍳 Cooking, 👨🏽‍💻 Coding, 🏆 Committing

Hi there 👋

From failing to study medicine ➡️ BSc industrial engineer ➡️ MSc computer scientist.
Life can be strange, so you had better enjoy it.
I'm sure I do, by 👨🏽‍🍳 Cooking, 👨🏽‍💻 Coding, and 🏆 Committing.

Conferences/Presentations 📖

Synthetic Data - Weaviate Podcast #118! (podcast)
SmolAgents - From ~~Bells and Whistles~~ to Agents and Tools (slides, video)
No data? No problem! - synthetic data to the rescue (slides)
Practical AI Podcast - Towards high-quality (maybe synthetic) datasets (podcast)
Code Together Podcast, Intel Software - Scaling LLM Datasets with Less Effort Using Argilla (video)
Mastering LLMs - Creating, curating, and cleaning data for LLMs (slides, video)
🧼 From GPU-poor to data-rich - data quality practices for LLM fine-tuning (slides)
DeepLearning.AI LLM workshop - get started with Argilla for human feedback and distilabel for AI feedback (video)
NLP Healthcare Summit 2023 - Smart Shortcuts for Bootstrapping a Healthcare NER Project (video)
Anyscale Ray Europe Meetup - Smart Shortcuts for Bootstrapping a Text Classification Project (video)

AI Code Content

Employers 👨🏽‍💻

Hugging Face 🤗 (2024-current) - The AI community building the future
Argilla (2022-2024) - data annotation and monitoring for enterprise NLP
Pandora Intelligence (2020-2022) - an independent intelligence company, specialized in security risks

Open source ⭐️

Maintainer 🤓

observers - A Lightweight Library for AI Observability
dataset-viber - Data viber is your chill repo for data collection and vibe checks
concise-concepts - a word similarity approach to few-shot NER
fast-sentence-transformers - simply, faster, sentence-transformers
classy-classification - a quick and dirty few-shot text classification solution
crosslingual-coreference - a multi-lingual CoRef resolver using cross-lingual training
adept-augmentations - a Python library aimed at dissecting and augmenting NER training data
spacy-setfit - a Python library aimed at facilitating easy SetFit usage in spaCy

Contributions 🫱🏾‍🫲🏼

Haystack - small feature and CI/CD updates
- InMemoryDatabase - serialization plus to-disk and from-disk methods
- GitHub Actions - caching for pip environment
spaCy - several additions to the spacy-universe
- spanmarker - added a .pipe() method to the spaCy integration
- spacy-dbpedia-spotlight - added batch processing functionality
- spacy-fishing - added batch processing functionality and bug fixes
- spacy-opentapioca - added batch processing functionality
streamlit-url-fragment - resolved Python versioning issues
allennlp-models - added batch processing functionality
mutate - resolved Python versioning issues and added PyPI support
rebel - added batch processing functionality
trl - updated RLHF documentation for PPOTrainer
vicinity
- added vector search on serializable item types
- improved docs
minhash - added deduplication based on entropy score

Volunteering 🌍

Bonfari - small- to medium-scale sustainable projects in the Gambia 🇬🇲
510 Red Cross - occasional projects to improve humanitarian aid with data

Contacts

Pinned

argilla-io/argilla (Public)
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
Python · ★ 4.4k · 420 forks

argilla-io/distilabel (Public)
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Python · ★ 2.6k · 188 forks

dataset-viber (Public)
Dataset Viber is your chill repo for data collection, annotation and vibe checks.
Python · ★ 46 · 12 forks

concise-concepts (Public)
This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with entity scoring.
Python · ★ 244 · 14 forks

crosslingual-coreference (Public)
A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy.
Python · ★ 105 · 18 forks

spacy-setfit (Public)
This repository contains an easy and intuitive approach to use SetFit in combination with spaCy.
Python · ★ 78 · 5 forks
