CoNLL-U

CoNLL-U is a revised version of the CoNLL-X format. Annotations are encoded in plain text files (UTF-8, normalized to NFC, using only the LF character as line break, including an LF character at the end of file) with three types of lines:

  • Word lines containing the annotation of a word/token in 10 fields separated by single tab characters; see below.
  • Blank lines marking sentence boundaries.
  • Comment lines starting with hash (#).
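
The three line types above can be told apart with a few string checks. The following is a minimal sketch, not part of LangChain or any CoNLL-U library; the function name `classify_line` is hypothetical and chosen only for illustration.

```python
from typing import Literal

LineKind = Literal["word", "blank", "comment"]


def classify_line(line: str) -> LineKind:
    """Classify one CoNLL-U line (hypothetical helper, for illustration only)."""
    line = line.rstrip("\n")
    if line.strip() == "":
        # Blank lines mark sentence boundaries.
        return "blank"
    if line.startswith("#"):
        # Comment lines start with a hash.
        return "comment"
    # Anything else should be a word line with exactly 10 tab-separated fields.
    fields = line.split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
    return "word"


# Build a word line from its 10 fields so the tab separators are explicit.
word_line = "\t".join(
    ["1", "They", "they", "PRON", "PRP", "Case=Nom|Number=Plur", "2", "nsubj", "_", "_"]
)
print(classify_line("# text = They buy and sell books."))
print(classify_line(""))
print(classify_line(word_line))
```

Raising on a wrong field count is one possible design choice; a more lenient reader might skip malformed lines instead.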

This is an example of how to load a file in CoNLL-U format. The whole file is treated as one document. The example data (conllu.conllu) is based on one of the standard UD/CoNLL-U examples.

from langchain_community.document_loaders import CoNLLULoader
loader = CoNLLULoader("example_data/conllu.conllu")
document = loader.load()
document
[Document(page_content='They buy and sell books.', metadata={'source': 'example_data/conllu.conllu'})]
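
To see how a plain sentence like the one above can be recovered from word lines, here is a rough sketch that joins the FORM column (the second field) and omits the space after any token whose MISC column (the tenth field) contains `SpaceAfter=No`. It is an illustration under those assumptions, not necessarily how `CoNLLULoader` is implemented; the function `join_forms` and the inline sample rows are hypothetical.

```python
def join_forms(conllu_text: str) -> str:
    """Rebuild surface text from CoNLL-U word lines (illustrative sketch)."""
    pieces = []
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence boundaries) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        form, misc = fields[1], fields[9]
        pieces.append(form)
        # Add a space unless the token is marked as having no following space.
        if "SpaceAfter=No" not in misc.split("|"):
            pieces.append(" ")
    return "".join(pieces).rstrip()


# Hypothetical sample rows; dependency and feature columns are simplified.
rows = [
    ["1", "They", "they", "PRON", "PRP", "_", "2", "nsubj", "_", "_"],
    ["2", "buy", "buy", "VERB", "VBP", "_", "0", "root", "_", "_"],
    ["3", "and", "and", "CCONJ", "CC", "_", "4", "cc", "_", "_"],
    ["4", "sell", "sell", "VERB", "VBP", "_", "2", "conj", "_", "_"],
    ["5", "books", "book", "NOUN", "NNS", "_", "2", "obj", "_", "SpaceAfter=No"],
    ["6", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"],
]
sample = "# text = They buy and sell books.\n"
sample += "\n".join("\t".join(row) for row in rows) + "\n"

print(join_forms(sample))  # prints: They buy and sell books.
```

This sketch ignores multiword-token ranges and empty nodes, which a fuller reader of the format would need to handle.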
