In this presentation, I will build onInes Montani's keynote, "Applied NLP in the Age of Generative AI", by demonstrating how to create an information extraction pipeline. The talk will focus on using thespaCy NLP library and theProdigy annotation tool, although the principles discussed will also apply to other frameworks.
spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. It’s designed specifically for production use and helps you build applications that process and “understand” large volumes of text.
Prodigy is a modern annotation tool for creating training data for machine learning models. It’s so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration.
https://github.com/explosion/spacy-llm
spacy-llm features a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various NLP tasks, no training data required.
https://explosion.ai/blog/human-in-the-loop-distillation
This blog post presents practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.
https://explosion.ai/blog/sp-global-commodities
A case study on S&P Global’s efficient information extraction pipelines for real-time commodities trading insights in a high-security environment using human-in-the-loop distillation.