From failing to study medicine ➡️ BSc industrial engineer ➡️ MSc computer scientist.
Life can be strange, so better enjoy it.
I'm sure I do, by: 👨🏽‍🍳 Cooking, 👨🏽‍💻 Coding, 🏆 Committing.
- Synthetic Data - Weaviate Podcast #118! - podcast
- SmolAgents - From Bells and Whistles to Agents and Tools - slides, video
- No data? No problem! - synthetic data to the rescue - slides
- Practical AI Podcast - Towards high-quality (maybe synthetic) datasets - podcast
- Code Together Podcast Intel Software - Scaling LLM Datasets with Less Effort Using Argilla - video
- Mastering LLMs - Creating, curating, and cleaning data for LLMs - slides, video
- 🧼 From GPU-poor to data-rich - data quality practices for LLM fine-tuning - slides
- Deeplearning.ai LLM workshop - get started with Argilla for human- and distilabel for AI feedback - video
- NLP Healthcare Summit 2023 - Smart Shortcuts for Bootstrapping a Healthcare NER Project - video
- Anyscale Ray Europe Meetup - Smart shortcuts for Bootstrapping a Text Classification project - video
- Hugging Face 🤗 (2024-current) - The AI community building the future
- Argilla (2022-2024) - data annotation and monitoring for enterprise NLP
- Pandora Intelligence (2020-2022) - an independent intelligence company, specialized in security risks
- observers - A Lightweight Library for AI Observability
- dataset-viber - Dataset Viber is your chill repo for data collection and vibe checks
- concise-concepts - a word similarity approach to few-shot NER
- fast-sentence-transformers - simply, faster, sentence-transformers
- classy-classification - a quick and dirty few-shot text classification solution
- crosslingual-coreference - a multi-lingual CoRef resolver using cross-lingual training
- adept-augmentations - a Python library aimed at dissecting and augmenting NER training data
- spacy-setfit - a Python library aimed to facilitate easy SetFit usage in spaCy
- Haystack - small feature and CI/CD updates
- InMemoryDatabase - Serialization + to and from disk methods
- GitHub Actions - caching for pip environment
- spaCy - several additions to the spacy-universe
- spanmarker - added .pipe() method to spaCy integration
- spacy-dbpedia-spotlight - added a batch processing functionality
- spacy-fishing - added a batch processing functionality + bug fixes
- spacy-opentapioca - added a batch processing functionality
- spanmarker - added
- streamlit-url-fragment - resolved Python versioning issues
- allennlp-models - added a batch processing functionality
- mutate - resolved Python versioning issues and added PyPI support
- rebel - added a batch processing functionality
- trl - updated RLHF documentation for PPOTrainer
- vicinity
- minhash - added deduplication based on entropy score
- Bonfari - small- to medium-scale sustainable projects in Gambia 🇬🇲
- 510 red-cross - occasional projects to improve humanitarian aid with data
Pinned
- argilla-io/argilla - Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
- argilla-io/distilabel - Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
- dataset-viber - Dataset Viber is your chill repo for data collection, annotation and vibe checks.
- concise-concepts - This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with entity scoring.
- crosslingual-coreference - A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy.
- spacy-setfit - This repository contains an easy and intuitive approach to use SetFit in combination with spaCy.