pdf-translator translates English PDF files into Japanese, preserving the original layout.

discus0434/pdf-translator

This repository offers a web UI and an API endpoint that translate English PDF files into Japanese while preserving the original layout.

Features

For readability, the translated PDF file shows the original PDF page on the left and the translated text on the right (see the image above).
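
The side-by-side arrangement can be pictured as simple page geometry. The following is a minimal sketch, not the repository's code: `Rect` and `side_by_side_layout` are hypothetical names, and it assumes the combined page is exactly twice as wide as the source page.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in PDF points, origin at the top-left corner."""
    x0: float
    y0: float
    x1: float
    y1: float


def side_by_side_layout(page_width: float, page_height: float) -> Tuple[Rect, Rect, Rect]:
    """Return (canvas, left_slot, right_slot) for one combined page.

    Hypothetical helper: the left slot holds the original page and the
    right slot holds the translated page, both at their original size.
    """
    canvas = Rect(0, 0, 2 * page_width, page_height)
    left = Rect(0, 0, page_width, page_height)
    right = Rect(page_width, 0, 2 * page_width, page_height)
    return canvas, left, right


# Example with an A4 page measured in PDF points (595 x 842).
canvas, left, right = side_by_side_layout(595, 842)
print(canvas, left, right)
```

Keeping both slots at the source page size means no scaling is needed, at the cost of a wider output page.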

To speed up translation, only the text before the "References" section of the PDF file is translated. From that point on, the remaining content is copied as is.
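
The "References" cutoff can be sketched as a per-page plan. This is only an illustration under stated assumptions, not how the repository detects the section (it relies on layout analysis): the regular expression, the function names, and the three action labels are all hypothetical, and the input is assumed to be one plain-text string per page.

```python
import re
from typing import List, Optional

# Assumption: the heading is a line containing only "References",
# optionally preceded by a section number such as "7" or "7.".
REFERENCES_HEADING = re.compile(
    r"^\s*(?:\d+\.?\s+)?references\s*$", re.IGNORECASE | re.MULTILINE
)


def find_references_page(page_texts: List[str]) -> Optional[int]:
    """Return the index of the first page with a References heading, if any."""
    for index, text in enumerate(page_texts):
        if REFERENCES_HEADING.search(text):
            return index
    return None


def plan_pages(page_texts: List[str]) -> List[str]:
    """Label each page "translate", "partial", or "copy".

    "partial" marks the page where the heading appears: text before the
    heading would be translated, the remainder copied.
    """
    cutoff = find_references_page(page_texts)
    if cutoff is None:
        return ["translate"] * len(page_texts)
    plan = []
    for index in range(len(page_texts)):
        if index < cutoff:
            plan.append("translate")
        elif index == cutoff:
            plan.append("partial")
        else:
            plan.append("copy")
    return plan


pages = ["1 Introduction\nSome text", "2 Method", "3 Results\nReferences\n[1] A", "[2] B"]
print(plan_pages(pages))
```

Skipping the reference list saves time because bibliography entries are long, numerous, and rarely benefit from translation.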

This repository still has some unresolved issues. Pull requests for improvements are always welcome.

Installation

  1. Clone this repository
   git clone https://github.com/discus0434/pdf-translator.git
   cd pdf-translator/docker
  2. Build the Docker image via the Makefile
   make build
  3. Run the Docker container via the Makefile
   make run

GUI Usage

Open the GUI in a browser:

http://localhost:8288
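
If the page does not load, a quick reachability check can tell whether the container is listening at all. The sketch below uses only the Python standard library and the address given above; the function name `gui_is_up` is an assumption, and it only requests the root URL, since the API routes are not documented here.

```python
import urllib.error
import urllib.request

GUI_URL = "http://localhost:8288"  # address given in this README


def gui_is_up(url: str = GUI_URL, timeout: float = 3.0) -> bool:
    """Return True if any HTTP server answers at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


if __name__ == "__main__":
    if gui_is_up():
        print("GUI reachable at", GUI_URL)
    else:
        print("GUI not reachable; check that `make run` is still running")
```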

CLI Usage

cd pdf-translator/docker && make translate INPUT="path/to/input_pdf_or_dir"

INPUT can be a single PDF file or a directory containing PDF files.

The translated PDF files are saved in the ./outputs directory.
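
For scripted use, the documented `make translate` call can be wrapped in Python. This is a minimal sketch under assumptions: the helper names are hypothetical, `collect_pdfs` only illustrates the file-or-directory rule and is not the logic in translator.py, and the default `docker_dir` assumes the script runs from the directory that contains the cloned repository. How the Makefile resolves a relative INPUT path is not stated here, so an absolute path may be the safer choice.

```python
import subprocess
from pathlib import Path
from typing import List


def collect_pdfs(input_path: Path) -> List[Path]:
    """Return the PDF files that a file-or-directory input refers to."""
    if input_path.is_dir():
        # Sorted for a stable order; subdirectories are not searched.
        return sorted(
            p for p in input_path.iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
    if input_path.is_file() and input_path.suffix.lower() == ".pdf":
        return [input_path]
    return []


def build_translate_command(input_path: Path) -> List[str]:
    """Mirror the documented call: make translate INPUT="path"."""
    return ["make", "translate", f"INPUT={input_path}"]


def translate(input_path: Path, docker_dir: Path = Path("pdf-translator/docker")) -> None:
    """Run the documented make target after a basic sanity check."""
    if not collect_pdfs(input_path):
        raise FileNotFoundError(f"no PDF files found at {input_path}")
    subprocess.run(build_translate_command(input_path), cwd=docker_dir, check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.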

Requirements

  • NVIDIA GPU (currently the only supported hardware)
  • Docker
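
A short preflight check can confirm both requirements before building the image. This sketch assumes that Docker provides a `docker` executable and that the NVIDIA driver provides `nvidia-smi` on the PATH; the function name and the labels are hypothetical. Finding `nvidia-smi` on the host does not guarantee that containers can see the GPU, which typically also needs the NVIDIA Container Toolkit.

```python
import shutil
from typing import Callable, List, Optional

# Assumption: these executables indicate that each requirement is installed.
REQUIRED_TOOLS = {
    "docker": "Docker",
    "nvidia-smi": "NVIDIA GPU driver",
}


def missing_requirements(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return labels of requirements whose executable is not on the PATH."""
    return [label for tool, label in REQUIRED_TOOLS.items() if which(tool) is None]


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing requirements:", ", ".join(missing))
    else:
        print("Docker and the NVIDIA driver tools were found")
```

The `which` parameter exists only so the lookup can be replaced in tests.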

License

This repository does not allow commercial use.

This repository is licensed under CC BY-NC 4.0. See LICENSE for more information.

References

  • PDF layout analysis uses DiT.

  • PDF-to-text conversion uses a PaddlePaddle model.

  • Text translation uses the FuguMT model from HuggingFace.

    FuguMT models are distributed under the CC BY-SA 4.0 license. Note also that their use is explicitly stated to be "for research purposes only" and that "no responsibility is assumed for operation or output".

  • Font files are from Source Han Serif.

TODOs

  • Make it possible to highlight the translated text
  • Support M1 Mac or CPU

Contributors

Thanks to the following people who have contributed to this project:

  • Akira Ishino: Improvements to the text truncation algorithm
  • hibit: Implementation of directory input to translator.py
