- Notifications
You must be signed in to change notification settings - Fork1.8k
refactor(gtfs-reader): replace javacsv with opencsv#3084
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:master
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
karussell commentedDec 2, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Thanks! Did you test the changes already with a few GTFS files?
Then we could remove it for GraphHopper here? (you can keep it of course in this PR) Can you also adapt NOTICE.md? |
| Stringstr =reader.get(column); | ||
| if (str ==null) { | ||
| if (!missingRequiredColumns.contains(column)) { | ||
| if (required &&!missingRequiredColumns.contains(column)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
There's one difference between both implementations. Thereader.get(..) method injavacsv returns an empty string if the column is not existing, whereas inopencsv it returnsnull. This now leads to thisMissingColumnError for required fields instead of anEmptyFieldError. I also added therequired check here, as otherwise theMissingColumError was raised even for unrequired fields.
kschrab commentedDec 3, 2024
I double checked for several GTFS feeds, such as VBB, London, Munich, Japan, and Cairo. The only difference I found was in the number of feed errors being raised, due to a behavioral change in the get method (see other comment). I adjusted the |
kschrab commentedDec 4, 2024
I furthermore removed the unused code for writing GTFS feeds. |
kschrab commentedDec 4, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@karussell I just noticed that However, the CSV parsing requires only |
pom.xml Outdated
| <dependency> | ||
| <groupId>org.apache.commons</groupId> | ||
| <artifactId>commons-lang3</artifactId> | ||
| <version>3.13.0</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We don't use it explicitly, so I'd keep it as a transitive dependency.
Uh oh!
There was an error while loading.Please reload this page.
| try { | ||
| currenRecord =this.reader.readNext(); | ||
| returncurrenRecord !=null; | ||
| }catch (Exceptione) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Wouldn't it be better to check usinghasNext() instead of waiting for an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
There is nohasNext(), but I could usepeek() instead. I rewrote the whole method and it now takes care of skipping empty lines by itself now, as the LineValidator does not work as expected (see other comment).
| } | ||
| } | ||
| staticclassSkipEmptyLinesimplementsLineValidator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Does this really skip? Looks like it would simply throw an exception on the fist occurrence of an empty line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
From my understanding the CSVReader would skip all lines in which the LineValidator raises an exception, but I will double check this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Your intention was right. There is no nice way to skip empty lines in opencsv though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Do we even encounter such files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
In the most examples there's an empty line at the end of the file which produced that one empty record.
line filtering in opencsv not possible, we therefore read the row and check if it's produced from an empty line
| currenRecord =nextRecord; | ||
| returntrue; | ||
| }catch (IOExceptione) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This error should be propagated, not quietly discarded.
karussell commentedJan 20, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Thanks for the work@kschrab. The additional dependencies are indeed a bit ugly. It seemsfastcsv.org is MIT licensed and comes without dependency (they say) or is this a larger single dependency? Are there any other reasons you picked opencsv? btw: I found the source code of javacsvhere. |
easbar commentedJan 20, 2025
Did you consider using Apache Commons Csv instead? On theirwebsite it is stated that "There are three pre-existing BSD compatible CSV parsers which this component will hopefully make redundant (authors willing)" and one of them is Open CSV. It only depends on commons-io (which gtfs-reader already depends on) and commons-codec. These two have no further dependencies themselves. |
kschrab commentedFeb 28, 2025
I will check that asap! I somehow forgot about this PR 🫣 |
This PR exchanges the library
net.sourceforge.javacsv:javacsvwithcom.opencsv:opencsvwhich is used to parse CSV files from GTFS.The motivation for this change is, that
javacsvwas released 16 years ago, its source code cannot be found anymore, and it was distributed under LGPL, whereas opencsv is Apache 2.0.For the start I just wrote a wrapper to preserve the old API of the CsvReader from javacsv, and then used the new CSVReader from opencsv within that wrapper. There's also code for writing CSV files, which is not used anywhere though.