
corpus-builder

Here are 21 public repositories matching this topic...

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

  • Updated Sep 12, 2025
  • Python

Crawler for linguistic corpora

  • Updated Aug 18, 2025
  • Python

Praaline is an open-source system to manage, annotate, visualise and analyse spoken language corpora

  • Updated Sep 21, 2022
  • C

Collector and speech cutter for LibriVox audiobooks

  • Updated Dec 8, 2022
  • C#

Ebook Corpus - A parser and extractor for electronic books

  • Updated Jan 29, 2026
  • Ruby

dictpress-tts

TTS plugin for dictpress

  • Updated Dec 21, 2025
  • Go

Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!

  • Updated Mar 14, 2024
  • Go

A corpus builder for evaluation of plagiarism detection tools

  • Updated Dec 12, 2016
  • PHP

Automated text preprocessing pipeline for large corpora. Features customizable filters for diacritics, stop words, punctuation, and regex.

  • Updated Oct 2, 2025
  • Python

Extract text from Vikidia/Wikipedia articles [fr]

  • Updated Jul 20, 2021
  • Python

Crawl Ask.fm Q&A lists and create a corpus for ML.

  • Updated Dec 15, 2023
  • Python

The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing

  • Updated Feb 15, 2026
  • PHP

crow_frontend

The user interface for the Corpus & Repository of Writing, built in Angular

  • Updated Feb 17, 2026
  • TypeScript

A Polish-language chatbot based on the Transformer architecture, trained on movie subtitles collected through web scraping.

  • Updated Jun 30, 2024
  • Jupyter Notebook

An app and scripts that work with the corpus builder CorpusCook to keep a corpus updated with corrections of wrong predictions

  • Updated Mar 20, 2020
  • Python

A text corpus management system for the German linguistics department of the University of Basel.

  • Updated Apr 15, 2020
  • PHP

Builds Wikipedia corpora in I5 (a TEI-based format)

  • Updated Jul 12, 2025
  • Java

Corpus Development Software for Machine Translation

  • Updated Apr 23, 2024
  • JavaScript
