Welcome to Extractus
We develop and share open source tools for collecting media content.
- feed-extractor: extract and normalize RSS/Atom/JSON feeds
- article-extractor: extract the main article from a given URL
- oembed-extractor: extract oEmbed data from supported providers
You can use one of them, or several in combination, to build news sites, create automated content systems for marketing campaigns, or gather datasets for NLP projects.
Here is an example based on our news engine.
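Independently of that example, here is a minimal sketch of how two of the tools might be combined: read a feed, then fetch the main article behind each entry. The function name `collectArticles`, the `limit` parameter, and the stub extractors are hypothetical and exist only for illustration. The result shapes (a feed with an `entries` array of `{ title, link }`, an article with a `content` field) are assumptions, so check each package's README before relying on them.

```javascript
// Sketch only: combine a feed extractor and an article extractor.
// Both extractors are injected, so this runs without network access.
// The result shapes used here are assumptions, not a documented contract.
async function collectArticles (feedUrl, { extractFeed, extractArticle }, limit = 5) {
  const feed = await extractFeed(feedUrl)
  const entries = feed && Array.isArray(feed.entries)
    ? feed.entries.slice(0, limit)
    : []

  const results = []
  for (const entry of entries) {
    try {
      const article = await extractArticle(entry.link)
      // Keep only entries for which an article body was found.
      if (article) {
        results.push({ title: entry.title, link: entry.link, content: article.content })
      }
    } catch (err) {
      // One failing page should not abort the whole run; skip it.
    }
  }
  return results
}

// Hypothetical stand-ins for the real extractors, used to demo the flow.
const stubs = {
  extractFeed: async () => ({
    entries: [
      { title: 'A', link: 'https://example.com/a' },
      { title: 'B', link: 'https://example.com/b' }
    ]
  }),
  // Returns null for one URL to simulate a page with no extractable article.
  extractArticle: async (url) =>
    url.endsWith('/b') ? null : { content: `<p>Body of ${url}</p>` }
}

collectArticles('https://example.com/feed', stubs)
  .then((articles) => console.log(articles.map((a) => a.title)))
```

In real use, the stubs would be replaced by the extraction functions exported by feed-extractor and article-extractor. Passing them in as arguments keeps the pipeline easy to test and makes it simple to swap in oembed-extractor for entries that link to embeddable media.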
If you have an idea, want more features, or run into a problem while using them, please create an issue.
In the future, we would like to add more dedicated tools for extracting links, tweets, audio, video, products, and crypto/stock prices.
We do not have much time, as this is a non-profit side project that we maintain for self-training. Contributions and collaborators are always welcome 🙂