fix to issue #94 #95


Merged
lolipopshock merged 2 commits into Layout-Parser:master from kforcodeai:master
Feb 2, 2022


kforcodeai
Contributor

Fixes #94 (comment)
The issue was that all digit sequences were inferred as float. With this fix, all text (numeric and non-numeric) is inferred as string, and the user can change it to their desired data type.
The drawback is that the user now has to convert the numeric columns to a numeric type themselves.
I could not find a better solution than this.
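
A minimal sketch of the behaviour described above, assuming pandas is installed. The sample data and the read_tsv helper are hypothetical stand-ins for pytesseract's tab-separated output and are not taken from the PR. It shows default inference turning a digit-only text column into floats, and the alternative of reading every column as a string and casting the numeric columns back by hand.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for pytesseract's tab-separated output (not from the PR).
SAMPLE_TSV = "left\ttop\ttext\n10\t20\t\n30\t40\t2021\n50\t60\t007\n"


def read_tsv(**kwargs):
    """Parse SAMPLE_TSV, forwarding any extra options to pandas.read_csv."""
    return pd.read_csv(
        io.StringIO(SAMPLE_TSV), sep="\t", quoting=csv.QUOTE_NONE, **kwargs
    )


# Default inference: the empty cell becomes NaN, so the digits are parsed as floats.
inferred = read_tsv()

# Read every column as a string, then cast the non-text columns back to int.
as_text = read_tsv(dtype=str)
for col in as_text.columns:
    if col != "text":
        as_text[col] = as_text[col].astype(int)

print(inferred["text"].tolist())
print(as_text["text"].tolist())
```

In this sketch the string read keeps "007" and "2021" exactly as written, while the default read loses the leading zeros and gains a trailing .0. The cast back to int assumes the remaining columns hold only integer strings.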

now all text will be inferred as string and the user can change it to their desired data type.
kforcodeai changed the title from "fix to https://github.com/Layout-Parser/layout-parser/issues/94#issue…" to "fix to issue #94" on Oct 31, 2021
_cols.remove('text')
for col in _cols:
    _df[col] = _df[col].astype(int)
res['data'] = _df
Member


Can you try the following code:

_data = pytesseract.image_to_data(img_content, lang=self.lang, **self.configs)
df = pd.read_csv(
    io.StringIO(_data), quoting=csv.QUOTE_NONE, encoding="utf-8", sep="\t"
)
df['text'] = df['text'].astype('str')
res["data"] = df

Contributor, Author


@lolipopshock Sorry, it does not work; I have tried this.
And yes, I get it, the for loop and all that stuff looks ugly :)

Here's the screenshot:
[image: layout_parse]

Member


I see -- the issue comes from the trailing .0 on floating point numbers, right?

Contributor, Author


Yes
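
To illustrate why casting after the read does not help, here is a small sketch assuming pandas is installed; the input string is hypothetical and not taken from the PR. By the time astype runs, the column has already been parsed as float, so the string form carries the .0 along.

```python
import csv
import io

import pandas as pd

# Hypothetical tab-separated input with one empty text cell (not from the PR).
tsv = "conf\ttext\n-1\t\n96\t2021\n"

df = pd.read_csv(io.StringIO(tsv), sep="\t", quoting=csv.QUOTE_NONE)

# The text column is already float64 here, so the cast stringifies 2021.0.
late_cast = df["text"].astype("str")
print(late_cast.tolist())
```

The second value comes out as "2021.0" rather than "2021", which is the symptom confirmed in the exchange above.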

Member

lolipopshock commented Feb 2, 2022 (edited)

I think the new solution can solve your issue -- see example below:

Let's say we have a csv file test.csv:

Col_A, Col_B
, 1
2, 3
245.0,

And if we read it via:

df = pd.read_csv("test.csv", converters={"Col_A": str})

We have

Test    B
        1
2       3
245.0

(There's no .0 for 2 in the 2nd row and 1st col.)
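
A runnable sketch of the converters approach described above, assuming pandas is installed. The CSV content is illustrative and written without spaces after the commas so the column names match exactly; it is read from an in-memory string rather than a test.csv file.

```python
import io

import pandas as pd

# Illustrative CSV in the spirit of the test.csv example; values are assumptions.
csv_text = "Col_A,Col_B\n,1\n2,3\n245.0,\n"

# Default inference: Col_A mixes an empty cell, 2 and 245.0, so it becomes float.
default = pd.read_csv(io.StringIO(csv_text))

# With a converter, each raw Col_A cell is passed through str and kept as written.
converted = pd.read_csv(io.StringIO(csv_text), converters={"Col_A": str})

print(default["Col_A"].tolist())
print(converted["Col_A"].tolist())
```

With the converter, the 2 in the second row stays "2" and 245.0 stays "245.0", while the default read turns the 2 into 2.0.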

lolipopshock merged commit 0809fa8 into Layout-Parser:master on Feb 2, 2022