
fix to issue #94 #95

Merged

lolipopshock merged 2 commits into Layout-Parser:master from kforcodeai:master

Feb 2, 2022

Conversation

Contributor

kforcodeai commented Oct 31, 2021

Fixes #94 (comment)
The issue was that all digit sequences were inferred as float. With this fix, all text (numeric and non-numeric) is inferred as string, and the user can change it to their desired data type.
But with this fix, the user will be required to convert the numeric columns themselves.
I could not find a better solution than this.
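A minimal sketch of the inference problem described above. The TSV string is a made-up stand-in for Tesseract's tab-separated output (an assumption, not real OCR data); it only shows how `pd.read_csv` treats a digit-only `text` cell that sits next to an empty one.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for Tesseract's TSV output: the "text" column
# holds one empty cell and one digit-only token.
tsv = "level\tconf\ttext\n1\t-1\t\n5\t96\t2021\n"

df = pd.read_csv(io.StringIO(tsv), sep="\t", quoting=csv.QUOTE_NONE)

# The empty cell becomes NaN, so the whole column is inferred as float
# and the token "2021" is stored as 2021.0.
print(df["text"].dtype)    # float64
print(df["text"].iloc[1])  # 2021.0
```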

fix to Layout-Parser#94 (comment)

9b2fa43

Now all text will be inferred as string, and the user can change it to their desired data type.

kforcodeai changed the title ~~fix to https://github.com/Layout-Parser/layout-parser/issues/94#issue…~~ fix to issue #94

Oct 31, 2021

lolipopshock reviewed

Nov 3, 2021

src/layoutparser/ocr/tesseract_agent.py Outdated

		_cols.remove('text')
		for col in _cols:
			_df[col] = _df[col].astype(int)
		res['data'] = _df

Member

lolipopshock Nov 3, 2021

Can you try the following code:

_data = pytesseract.image_to_data(img_content, lang=self.lang, **self.configs)
df = pd.read_csv(
    io.StringIO(_data), quoting=csv.QUOTE_NONE, encoding="utf-8", sep="\t"
)
df['text'] = df['text'].astype('str')
res["data"] = df

Contributor Author

kforcodeai Nov 4, 2021

@lolipopshock sorry, it does not work; I have tried this.
And yes, I get it, the for loop and all that stuff looks ugly :)

here's the screenshot

Member

lolipopshock Nov 4, 2021

I see -- it's the `.0` issue from floating point numbers, right?

Contributor Author

kforcodeai Nov 4, 2021

Yes
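To illustrate the `.0` problem confirmed in this exchange, here is a small sketch on a made-up TSV string (an assumption, not real Tesseract output). Casting to `str` after `read_csv` has already parsed the column as float comes too late, because the float formatting is carried into the string.

```python
import io

import pandas as pd

# Hypothetical TSV: an empty "text" cell forces float inference.
tsv = "conf\ttext\n-1\t\n96\t2021\n"

df = pd.read_csv(io.StringIO(tsv), sep="\t")

# The column is already float64 here, so the cast only stringifies
# the float value and the trailing ".0" survives.
df["text"] = df["text"].astype("str")
print(repr(df["text"].iloc[1]))  # '2021.0'
```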

maybe a simpler solution

09f630e

lolipopshock closed this

Feb 2, 2022

lolipopshock reopened this

Feb 2, 2022

Member

lolipopshock commented Feb 2, 2022 • edited

I think the new solution can solve your issue -- see example below:

Let's say we have a csv file test.csv:

Col_A, Col_B,
1
2, 3
245.0,

And if we read it via:

df = pd.read_csv("test.csv", converters={"Col_A": str})

We have

Test	B
	1
2	3
245.0

(There's no `.0` for 2 in the 2nd row and 1st col.)
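The same idea can be sketched for OCR-style data. This is an illustration under assumptions: the TSV string is made up, and whether the merged commit uses exactly this `read_csv` call is not shown in this thread. Passing `converters={"text": str}` keeps the raw token for the `text` column while the other columns are still inferred as numbers.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for Tesseract's TSV output.
tsv = "level\tconf\ttext\n1\t-1\t\n5\t96\t2021\n"

df = pd.read_csv(
    io.StringIO(tsv),
    sep="\t",
    quoting=csv.QUOTE_NONE,
    converters={"text": str},  # keep the raw string for this column only
)

print(df["text"].tolist())  # ['', '2021']
print(df["conf"].dtype)     # int64
```

Only the `text` column bypasses type inference, so no loop over the remaining columns is needed to cast them back to integers.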

lolipopshock merged commit 0809fa8 into Layout-Parser:master

Feb 2, 2022
