Stirling-PDF OCR clean-up yields no visible UI (with fix) #7225

stiggy87 started this conversation in General

I was trying to clean up various PDF files for use with a local LLM. When the nordic encoder is used, it reads a lot of data as a garbled mess. To fix this, I went to my local Stirling-PDF install to clean up the OCR, but when I tried, it didn't show anything!

Digging some more, I found that the tesseract-data was not installed correctly. A couple of things I found out:

  1. The tesseract-data was installed in `/usr/share/tesseract-ocr/5`, while Stirling-PDF looks for `/usr/share/tessdata` by default.
  2. `/opt/Stirling-PDF/.env` does not contain `TESSDATA_PREFIX`, which is the path to the data location. (You could use the existing path, but the Stirling-PDF docs suggest the `tessdata` directory.)

I fixed these items and restarted the service, and everything seems to work!
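For anyone hitting the same thing, the `.env` part of the fix can be sketched roughly as below. This is a minimal sketch, not what the install/update script does: `ensure_tessdata_prefix` is a hypothetical helper name, and the value passed in assumes the traineddata files are actually reachable at that path (for example `/usr/share/tessdata`).

```shell
#!/bin/sh
# Sketch only: ensure_tessdata_prefix is a hypothetical helper,
# not part of Stirling-PDF or the LXC install script.
# Usage: ensure_tessdata_prefix ENV_FILE TESSDATA_DIR
ensure_tessdata_prefix() {
    env_file="$1"
    tessdata_dir="$2"

    # Leave the file alone if the variable is already defined.
    if [ -f "$env_file" ] && grep -q '^TESSDATA_PREFIX=' "$env_file"; then
        return 0
    fi

    # If the file is non-empty and lacks a trailing newline, add one first
    # so the new entry does not get glued onto the last existing line.
    if [ -s "$env_file" ] && [ -n "$(tail -c 1 "$env_file")" ]; then
        printf '\n' >> "$env_file"
    fi

    printf 'TESSDATA_PREFIX=%s\n' "$tessdata_dir" >> "$env_file"
}

# On the LXC described above this would be run as root, e.g.:
#   ensure_tessdata_prefix /opt/Stirling-PDF/.env /usr/share/tessdata
# followed by a restart of the Stirling-PDF service.
```

The helper only appends when the variable is missing, so running it again after an update should not duplicate the entry.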

I think the Stirling-PDF install/update script should be adding this info to make it work out of the box.

Anyone else have this issue? Or is it my install being bad? I did make this LXC before the refactor.


Replies: 4 comments 3 replies

Comment options

Tesseract is installed with a simple `apt-get install -y 'tesseract-ocr-*'`. We don't have control over how the tesseract deb packages install.

Comment options

I understand there's no control over the installation of the tesseract-ocr packages.

What I'm bringing up is that the install/update script might need to be updated to follow the Stirling-PDF documentation on where the tesseract-ocr data is placed: https://docs.stirlingpdf.com/Advanced%20Configuration/OCR/

What I mentioned is a good workaround if people want to use the OCR feature and run into the problem I did.

1 reply
@MickLesk

Isn't it much easier to just create a symlink?

Comment options

Same issue - do you mind explaining how you fixed it?

1 reply
@smarthomelawyer

Never mind - I found a prior discussion on tteck's GitHub that fixed the issue: https://github.com/tteck/Proxmox/discussions/2538

In short, I ran: `cp -r /usr/share/tesseract-ocr/5/* /usr/share/`

Comment options

`ln -s /usr/share/tesseract-ocr/5/tessdata /usr/share/tessdata` is enough in the LXC to make the languages available. No need to copy the files.
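A slightly more defensive version of that symlink, as a rough sketch: `link_tessdata` is a hypothetical helper name, and the paths in the usage comment are the ones from this thread, which may differ on other installs. It refuses to link a missing source and leaves any existing destination untouched, so it should be safe to re-run.

```shell
#!/bin/sh
# Sketch only: link_tessdata is a hypothetical helper, not part of any script
# mentioned in this thread.
# Usage: link_tessdata SOURCE_DIR DEST_LINK
link_tessdata() {
    src="$1"
    dest="$2"

    # Refuse to create a dangling link if the packaged data is not there.
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi

    # Do not overwrite a directory, file, or link that already exists.
    if [ -e "$dest" ] || [ -L "$dest" ]; then
        echo "already exists, leaving untouched: $dest" >&2
        return 0
    fi

    ln -s "$src" "$dest"
}

# On the LXC from this thread (as root):
#   link_tessdata /usr/share/tesseract-ocr/5/tessdata /usr/share/tessdata
```

Unlike the `cp -r` approach, the link keeps following the packaged files, so language data added later by apt shows up without another copy.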

1 reply
@nlubello

Worked for me!! Thanks

6 participants
@stiggy87 @nlubello @derreisende77 @MickLesk @tremor021 @smarthomelawyer
