Hezar Documentation

hezar.data.datasets.ocr_dataset module

class hezar.data.datasets.ocr_dataset.OCRDataset(config: OCRDatasetConfig, split=None, preprocessor=None, **kwargs)

Bases: Dataset

General OCR dataset class.

The OCR dataset supports two kinds of image-to-text data. For tokenizer-based models, the labels are tokens; for char-level models, the labels are split into individual characters and then converted to ids. This behavior is controlled by text_split_type in the config, which can be either tokenize or char_split.
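The two label-encoding modes described above can be illustrated with a minimal, self-contained sketch. The function encode_label and its arguments are hypothetical and not part of Hezar's API. In char_split mode it inverts an id2label mapping and looks up each character; in tokenize mode it simply delegates to whatever tokenizer callable is passed in (in Hezar this role would be played by the model's tokenizer, whose exact interface is not shown here).

```python
from typing import Callable, Dict, List, Optional


def encode_label(
    text: str,
    text_split_type: str,
    id2label: Optional[Dict[int, str]] = None,
    tokenizer: Optional[Callable[[str], List[int]]] = None,
) -> List[int]:
    """Hypothetical helper: turn a text label into a list of ids."""
    if text_split_type == "char_split":
        if id2label is None:
            raise ValueError("char_split requires an id2label mapping")
        # Invert id -> character so each character can be looked up directly.
        label2id = {char: idx for idx, char in id2label.items()}
        # One id per character; an unknown character raises KeyError.
        return [label2id[char] for char in text]
    if text_split_type == "tokenize":
        if tokenizer is None:
            raise ValueError("tokenize requires a tokenizer callable")
        # Delegate to the (assumed) tokenizer, which returns token ids.
        return tokenizer(text)
    raise ValueError(f"Unknown text_split_type: {text_split_type!r}")


# Usage with a toy three-character vocabulary.
id2label = {0: "a", 1: "b", 2: "c"}
print(encode_label("cab", "char_split", id2label=id2label))  # [2, 0, 1]

# A toy whitespace "tokenizer" standing in for a real subword tokenizer.
vocab = {"hello": 7, "world": 9}
print(encode_label("hello world", "tokenize", tokenizer=lambda s: [vocab[w] for w in s.split()]))  # [7, 9]
```

The char-level path yields one id per character, so the label length equals the text length; the tokenize path's length depends entirely on the tokenizer.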

required_backends: List[str | Backends] = [Backends.SCIKIT]

class hezar.data.datasets.ocr_dataset.OCRDatasetConfig(path: str | None = None, task: hezar.constants.TaskType = TaskType.IMAGE2TEXT, max_size: int | float | None = None, hf_load_kwargs: dict | None = None, text_split_type: str | hezar.data.datasets.ocr_dataset.TextSplitType = TextSplitType.CHAR_SPLIT, id2label: typing.Dict[int, str] = <factory>, text_column: str = 'label', images_paths_column: str = 'image_path', max_length: int | None = None, invalid_characters: list | None = None, reverse_text: bool | None = None, reverse_digits: bool | None = None)

Bases: DatasetConfig

Configuration class for OCR datasets.

Parameters:

path (str) – Path to the dataset.
text_split_type (TextSplitType) – Type of text splitting (CHAR_SPLIT or TOKENIZE).
id2label (Dict[int, str]) – Mapping of label IDs to characters.
text_column (str) – Column name for text in the dataset.
images_paths_column (str) – Column name for image paths in the dataset.
max_length (int) – Maximum length of text.
invalid_characters (list) – List of invalid characters.
reverse_text (bool) – Whether to reverse the text.
reverse_digits (bool) – Whether to reverse the digits in text.
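Because id2label maps label IDs to characters, a char-level vocabulary could be derived from the training texts. The sketch below uses a hypothetical helper, build_id2label, that is not Hezar code: it collects the unique characters, sorts them for a deterministic order, and numbers them from 0. It does not reserve any special ids (for example a CTC blank), which a real CRNN-style model may require, so treat the numbering scheme as an assumption.

```python
from typing import Dict, Iterable


def build_id2label(texts: Iterable[str]) -> Dict[int, str]:
    """Hypothetical helper: map consecutive ids to the unique characters in texts."""
    # Collect every distinct character across all label strings.
    unique_chars = set()
    for text in texts:
        unique_chars.update(text)
    # Sort so the same corpus always yields the same mapping.
    return {idx: char for idx, char in enumerate(sorted(unique_chars))}


print(build_id2label(["ba", "ca"]))  # {0: 'a', 1: 'b', 2: 'c'}
```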

id2label: Dict[int, str]

images_paths_column: str = 'image_path'

invalid_characters: list = None

max_length: int = None

name: str = 'ocr'

path: str = None

reverse_digits: bool = None

reverse_text: bool = None

task: TaskType = 'image2text'

text_column: str = 'label'

text_split_type: str | TextSplitType = 'char_split'
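The <factory> default shown for id2label in the OCRDatasetConfig signature is how Python renders a dataclass field declared with default_factory, which gives every config instance its own fresh dict instead of one shared mutable default. The class below, OCRConfigSketch, is a hypothetical, trimmed-down stand-in that mirrors only a few of the documented fields and defaults; it is not the real OCRDatasetConfig, which also inherits from DatasetConfig.

```python
import inspect
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OCRConfigSketch:
    """Hypothetical stand-in mirroring a subset of OCRDatasetConfig's fields."""

    text_split_type: str = "char_split"
    # default_factory builds a new dict per instance; the signature shows it as <factory>.
    id2label: Dict[int, str] = field(default_factory=dict)
    text_column: str = "label"
    images_paths_column: str = "image_path"
    max_length: Optional[int] = None


first = OCRConfigSketch()
second = OCRConfigSketch(max_length=32)
first.id2label[0] = "a"

print(second.id2label)  # {} (not shared with first)
print("<factory>" in str(inspect.signature(OCRConfigSketch)))  # True
```

With a plain mutable default such as id2label = {}, the dataclass decorator would refuse the field outright, which is why default_factory is the idiomatic choice here.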

class hezar.data.datasets.ocr_dataset.TextSplitType(value)

Bases: str, Enum

Enumeration of the supported text splitting strategies for OCR labels.

CHAR_SPLIT = 'char_split'

TOKENIZE = 'tokenize'
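Because TextSplitType subclasses both str and Enum, its members compare equal to their plain string values, which is presumably why the config's text_split_type field accepts either a str or a TextSplitType. The snippet below re-declares the enum locally (so it runs without Hezar installed) and adds a hypothetical normalize_split_type helper that is not part of Hezar.

```python
from enum import Enum
from typing import Union


class TextSplitType(str, Enum):
    """Local re-declaration matching the documented members."""

    CHAR_SPLIT = "char_split"
    TOKENIZE = "tokenize"


def normalize_split_type(value: Union[str, TextSplitType]) -> TextSplitType:
    """Hypothetical helper: accept a raw string or a member and return the member."""
    # Lookup by value works for plain strings and returns existing members unchanged;
    # an unknown string raises ValueError.
    return TextSplitType(value)


print(TextSplitType.CHAR_SPLIT == "char_split")  # True
print(normalize_split_type("tokenize") is TextSplitType.TOKENIZE)  # True
print(normalize_split_type(TextSplitType.CHAR_SPLIT).value)  # char_split
```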
