Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

refactor(gtfs-reader): replace javacsv with opencsv#3084

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
kschrab wants to merge7 commits intographhopper:master
base:master
Choose a base branch
Loading
fromkschrab:fork/master

Conversation

@kschrab
Copy link
Contributor

This PR exchanges the librarynet.sourceforge.javacsv:javacsv withcom.opencsv:opencsv which is used to parse CSV files from GTFS.

The motivation for this change is, thatjavacsv was released 16 years ago, its source code cannot be found anymore, and it was distributed under LGPL, whereas opencsv is Apache 2.0.

For the start I just wrote a wrapper to preserve the old API of the CsvReader from javacsv, and then used the new CSVReader from opencsv within that wrapper. There's also code for writing CSV files, which is not used anywhere though.

@karussell
Copy link
Member

karussell commentedDec 2, 2024
edited
Loading

Thanks! Did you test the changes already with a few GTFS files?

There's also code for writing CSV files, which is not used anywhere though.

Then we could remove it for GraphHopper here? (you can keep it of course in this PR)

Can you also adapt NOTICE.md?

Stringstr =reader.get(column);
if (str ==null) {
if (!missingRequiredColumns.contains(column)) {
if (required &&!missingRequiredColumns.contains(column)) {
Copy link
ContributorAuthor

@kschrabkschrabDec 3, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

There's one difference between both implementations. Thereader.get(..) method injavacsv returns an empty string if the column is not existing, whereas inopencsv it returnsnull. This now leads to thisMissingColumnError for required fields instead of anEmptyFieldError. I also added therequired check here, as otherwise theMissingColumError was raised even for unrequired fields.

@kschrab
Copy link
ContributorAuthor

I double checked for several GTFS feeds, such as VBB, London, Munich, Japan, and Cairo. The only difference I found was in the number of feed errors being raised, due to a behavioral change in the get method (see other comment).

I adjusted theNOTICE.md.

@kschrab
Copy link
ContributorAuthor

I furthermore removed the unused code for writing GTFS feeds.

@kschrab
Copy link
ContributorAuthor

kschrab commentedDec 4, 2024
edited
Loading

@karussell I just noticed thatopencsv comes with several transitive dependencies from Apache Commons:commons-beanutils,commons-text,commons-collections4, andcommons-lang3.

However, the CSV parsing requires onlycommons-lang3, the others are just for mapping CSV rows directly on Java objects, which we do not need here. Therefore, I would excludecommons-beanutils,commons-text, andcommons-collections4 viapom.xml.

pom.xml Outdated
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.13.0</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We don't use it explicitly, so I'd keep it as a transitive dependency.

kschrab reacted with thumbs up emoji
try {
currenRecord =this.reader.readNext();
returncurrenRecord !=null;
}catch (Exceptione) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Wouldn't it be better to check usinghasNext() instead of waiting for an exception?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

There is nohasNext(), but I could usepeek() instead. I rewrote the whole method and it now takes care of skipping empty lines by itself now, as the LineValidator does not work as expected (see other comment).

}
}

staticclassSkipEmptyLinesimplementsLineValidator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Does this really skip? Looks like it would simply throw an exception on the fist occurrence of an empty line.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

From my understanding the CSVReader would skip all lines in which the LineValidator raises an exception, but I will double check this!

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Your intention was right. There is no nice way to skip empty lines in opencsv though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Do we even encounter such files?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

In the most examples there's an empty line at the end of the file which produced that one empty record.

line filtering in opencsv not possible, we therefore read the row and check if it's produced from an empty line

currenRecord =nextRecord;
returntrue;
}catch (IOExceptione) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This error should be propagated, not quietly discarded.

@karussell
Copy link
Member

karussell commentedJan 20, 2025
edited
Loading

Thanks for the work@kschrab. The additional dependencies are indeed a bit ugly. It seemsfastcsv.org is MIT licensed and comes without dependency (they say) or is this a larger single dependency? Are there any other reasons you picked opencsv?

btw: I found the source code of javacsvhere.

@easbar
Copy link
Member

Did you consider using Apache Commons Csv instead? On theirwebsite it is stated that "There are three pre-existing BSD compatible CSV parsers which this component will hopefully make redundant (authors willing)" and one of them is Open CSV. It only depends on commons-io (which gtfs-reader already depends on) and commons-codec. These two have no further dependencies themselves.

@kschrab
Copy link
ContributorAuthor

I will check that asap! I somehow forgot about this PR 🫣

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

1 more reviewer

@otbutzotbutzotbutz requested changes

Reviewers whose approvals may not affect merge requirements

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@kschrab@karussell@easbar@otbutz

[8]ページ先頭

©2009-2025 Movatter.jp