- Notifications
You must be signed in to change notification settings - Fork135
🍬 A library converting XLS and XLSX files to a list of Java objects based on Apache POI
License
ozlerhakan/poiji
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Poiji is a tiny thread-safe Java library that provides one way mapping from Excel sheets to Java classes.In a way it lets us convert each row of the specified excel data into Java objects.Poiji usesApache Poi (the Java API for Microsoft Documents) under the hood to fulfill the mapping process.
- Getting Started
- Usage
- Documentation
- Annotations
- Prefer Default Value
- Sheet Name Option
- Protected Excels
- Annotation ExcelCellName
- Super Class Inheritance
- Annotation ExcelCellsJoinedByName
- ExcelCellRange Annotation
- Support Consumer Interface
- Custom Casting Implementation
- Parse UnknownCells
- Optional Mandatory Headers and Cells
- Debug Cells Formats
- Modify Cells Formats
- Read Excel Properties
- Disable Cells Formats
- Create Custom Formatting
- Poi Sheet Support
- Update Default Locale
- License
In your Maven/Gradle project, first add the corresponding dependency:
<dependency> <groupId>com.github.ozlerhakan</groupId> <artifactId>poiji</artifactId> <version>4.7.0</version></dependency>
You can find the latest and earlier development versions including javadoc and source files onSonatypes OSS repository.
Poiji.fromExcel
Structurecom.poiji.bind.Poiji#fromExcel(java.io.File, java.lang.Class<T>)com.poiji.bind.Poiji#fromExcel(java.io.File, java.lang.Class<T>, java.util.function.Consumer<? super T>)com.poiji.bind.Poiji#fromExcel(java.io.File, java.lang.Class<T>, com.poiji.option.PoijiOptions)com.poiji.bind.Poiji#fromExcel(java.io.File, java.lang.Class<T>, com.poiji.option.PoijiOptions, java.util.function.Consumer<? super T>)com.poiji.bind.Poiji#fromExcel(java.io.InputStream, com.poiji.exception.PoijiExcelType, java.lang.Class<T>)com.poiji.bind.Poiji#fromExcel(java.io.InputStream, com.poiji.exception.PoijiExcelType, java.lang.Class<T>, java.util.function.Consumer<? super T>)com.poiji.bind.Poiji#fromExcel(java.io.InputStream, com.poiji.exception.PoijiExcelType, java.lang.Class<T>, com.poiji.option.PoijiOptions)com.poiji.bind.Poiji#fromExcel(java.io.InputStream, com.poiji.exception.PoijiExcelType, java.lang.Class<T>, com.poiji.option.PoijiOptions, java.util.function.Consumer<? super T>)com.poiji.bind.Poiji#fromExcel(org.apache.poi.ss.usermodel.Sheet, java.lang.Class<T>)com.poiji.bind.Poiji#fromExcel(org.apache.poi.ss.usermodel.Sheet, java.lang.Class<T>, com.poiji.option.PoijiOptions)com.poiji.bind.Poiji#fromExcel(org.apache.poi.ss.usermodel.Sheet, java.lang.Class<T>, com.poiji.option.PoijiOptions, java.util.function.Consumer<? super T>)com.poiji.bind.Poiji#fromExcelProperties(java.io.File, java.lang.Class<T>)com.poiji.bind.Poiji#fromExcelProperties(java.io.File, java.lang.Class<T>, com.poiji.option.PoijiOptions)com.poiji.bind.Poiji#fromExcelProperties(java.io.InputStream, com.poiji.exception.PoijiExcelType, java.lang.Class<T>)com.poiji.bind.Poiji#fromExcelProperties(java.io.InputStream, com.poiji.exception.PoijiExcelType, java.lang.Class<T>, com.poiji.option.PoijiOptions)
PoijiOptions.PoijiOptionsBuilder
Structurecom.poiji.option.PoijiOptions.PoijiOptionsBuilder #settings() #build() #dateLenient(boolean) #dateRegex(String) #datePattern(String) #dateTimeFormatter(java.time.format.DateTimeFormatter) #ignoreHiddenSheets(boolean) #password(String) #preferNullOverDefault(boolean) #settings(int) #sheetIndex(int) #skip(int) #limit(int) #trimCellValue(boolean) #headerStart(int) #withCasting(Casting) #withFormatting(Formatting) #caseInsensitive(boolean) #ignoreWhitespaces(boolean) #poijiNumberFormat(PoijiNumberFormat) #poijiLogCellFormat(PoijiLogCellFormat) #disableXLSXNumberCellFormat() #addListDelimiter(String) #setLocale(java.util.Locale) #rawData(boolean)
Here are the list of features with examples that the latest version of Poiji supports.
Create your object model:
publicclassEmployee {@ExcelRow (1)privateintrowIndex;@ExcelCell(0) (2)privatelongemployeeId; (3)@ExcelCell(1)privateStringname;@ExcelCell(2)privateStringsurname;@ExcelCell(3)privateintage;@ExcelCell(4)privatebooleansingle;@ExcelCellName("emails") (4)List<String>emails;@ExcelCell(5)List<BigDecimal>bills;//no need getters/setters to map excel cells to fields}
Optionally, we can access the index of each row item by using the
ExcelRow
annotation. Annotated variable should be of typeint
,double
,float
orlong
.A field must be annotated with
@ExcelCell
along with its property in order to get the value from the right coordinate in the target excel sheet.An annotated field can be either protected, private or public modifier. The field may be either of
boolean
,int
,long
,float
,double
, or their wrapper classes. You can add a field ofjava.util.Date
,java.time.LocalDate
,java.time.LocalDateTime
andString
as well.If one column contains multiple value, you can get them using a List field. A List field can store items which is of type
BigDecimal
,Long
,Double
,Float
,Integer
,Boolean
andString
.
This is the excel file (employees.xlsx
) we want to map to a list ofEmployee
instance:
ID | NAME | SURNAME | AGE | SINGLE | BILLS | EMAILS |
---|---|---|---|---|---|---|
123923 | Joe | Doe | 30 | TRUE | 123,10;99.99 | |
123123 | Sophie | Derue | 20 | TRUE | 1022 | |
135923 | Paul | Raul | 31 | FALSE | 73,25;70 |
The snippet below shows how to obtain the excel data usingPoiji
.
PoijiOptionsoptions =PoijiOptions.PoijiOptionsBuilder.settings() .addListDelimiter(";") (1) .build();List<Employee>employees =Poiji.fromExcel(newFile("employees.xls"),Employee.class,options);// alternativelyInputStreamstream =newFileInputStream(newFile("employees.xls"))List<Employee>employees =Poiji.fromExcel(stream,PoijiExcelType.XLS,Employee.class,options);employees.size();// 3EmployeefirstEmployee =employees.get(0);// Employee{rowIndex=1, employeeId=123923, name='Joe', surname='Doe', age=30, single=true, emails=[joe@doe.com, joedoe@gmail.com], biils=[123,10, 99.99]}
By default the delimiter/separator is
,
to split items in a cell. There is an option to change this behavior. Since we use;
between items, we need to tell Poiji to use;
as a separator.
By default, Poiji ignores the header row of the excel data. If you want to ignore the first row of data, you need to usePoijiOptions
.
PoijiOptionsoptions =PoijiOptionsBuilder.settings(1).build();// we eliminate Joe Doe.List<Employee>employees =Poiji.fromExcel(newFile("employees.xls"),Employee.class,options);EmployeefirstEmployee =employees.get(0);// Employee{rowIndex=2, employeeId=123123, name='Sophie', surname='Derue', age=20, single=true, emails=[sophie.derue@gmail.com, sophie@derue.com], biils=[1022]}
By default, Poiji selects the first sheet of an excel file. You can override this behaviour like below:
PoijiOptionsoptions =PoijiOptionsBuilder.settings() .sheetIndex(1) (1) .build();
Poiji should look at the second (zero-based index) sheet of your excel file.== Documentation
If you want a date field to returnnull
rather than a default date, usePoijiOptionsBuilder
with thepreferNullOverDefault
method as follows:
PoijiOptionsoptions =PoijiOptionsBuilder.settings() .preferNullOverDefault(true) (1) .build();
a field that is of type either
java.util.Date
,Float
,Double
,Integer
,Long
orString
will have anull
value.
Poiji allows specifying the sheet name using annotation
@ExcelSheet("Sheet2") (1)publicclassStudent {@ExcelCell(0)privateStringname;@ExcelCell(1)privateStringid;@ExcelCell(2)privateStringphone;@OverridepublicStringtoString() {return"Student {" +" name=" +name +",pl-s1">id +"'" +", phone='" +phone +"'" +'}'; }}
With the
ExcelSheet
annotation we are configuring the name of the sheet to read data from. The other sheets will be ignored.
Consider that your excel file is protected with a password, you can define the password viaPoijiOptionsBuilder
to read rows:
PoijiOptions options = PoijiOptionsBuilder.settings() .password("1234") .build();List<Employee> employees = Poiji.fromExcel(new File("employees.xls"), Employee.class, options);
UsingExcelCellName
, we can read the values by column names directly.
publicclassPerson {@ExcelCellName("Name") (1)protectedStringname;@ExcelCellName("Address")protectedStringaddress;@ExcelCellName("Age")protectedintage;@ExcelCellName("Email")protectedStringemail;@ExcelCellName(value ="",expression ="Surname|Second name") (2)privateStringsurname;}
We need to specify the
name
of the column for which the corresponding value is looked. By default,@ExcelCellName
is case-sensitive and the excel file should’t contain duplicated column names. However, you can manipulate this feature usingPoijiOptionsBuilder#caseInsensitive(boolean)
and you can ignore white spaces usingPoijiOptionsBuilder#ignoreWhitespaces(boolean)
.In rare situations a column can have synonyms, especially when the column was renamed and backward compatibility is needed.Here we specify the
expression
leaving the columnname
empty.
For example, here is the excel (person.xls
) file we want to use:
Name | Address | Age | Surname | |
---|---|---|---|---|
Joe | San Francisco, CA | 30 | Doe | |
Sophie | Costa Mesa, CA | 20 | Doe |
List<Person>people =Poiji.fromExcel(newFile("person.xls"),Person.class);people.size();// 2Personperson =people.get(0);// Joe// San Francisco, CA// 30// joe@doe.com// Doe
Given that the first column always stands for the names of people, you’re able to combine theExcelCell
annotation withExcelCellName
in your object model:
publicclassPerson {@ExcelCell(0)protectedStringname;@ExcelCellName("Address")protectedStringaddress;@ExcelCellName("Age")protectedintage;@ExcelCellName("Email")protectedStringemail;}
Your object model may be derived from a super class:
publicabstractclassVehicle {@ExcelCell(0)protectedStringname;@ExcelCell(1)protectedintyear;}publicclassCarextendsVehicle {@ExcelCell(2)privateintnOfSeats;}
and you want to map the table (car.xlsx
) below to Car objects:
NAME | YEAR | SEATS |
---|---|---|
Honda Civic | 2017 | 4 |
Chevrolet Corvette | 2017 | 2 |
Using Poiji, you can map the annotated field(s) of super class(es) of the target class like so:
List<Car>cars =Poiji.fromExcel(newFile("cars.xls"),Car.class);cars.size();// 2Carcar =cars.get(0);// Honda Civic// 2017// 4
UsingExcelCellsJoinedByName
we can read columns which name meets same regular expression. Values will be combined as a multi valued map.
Please pay attention the variable must be initialized explicitly.
publicclassAlbum {@ExcelCellsJoinedByName(expression ="Artist") (1)privateMultiValuedMap<String,String>artists =newArrayListValuedHashMap<>();@ExcelCellsJoinedByName(expression ="Track[0-9]+") (2)privateMultiValuedMap<String,String>tracks =newArrayListValuedHashMap<>();}
Here we map multiple columns with
name
Artist.Here we map multiple columns with
name
Track1,Track2,Track3, etc.
For example, here is the excel (album.xls
) file we want to use:
Artist | Artist | Artist | Track1 | Track2 |
---|---|---|---|---|
Michael Jackson | Lionel Richie | Stevie Wonder | We are the World | We are the World (instrumental) |
artist 1 | artist 1 | artist 1 | track 1 | track 1 |
List<Album>albums =Poiji.fromExcel(newFile("album.xls"),Album.class);albums.size();// 2Albumalbum1 =albums.get(0);// artists = { Artist = [Michael Jackson, Lionel Richie, Stevie Wonder] }// tracks = { Track1 = [We are the World], Track2 = [We are the World (instrumental)] }Albumalbum2 =albums.get(1);// artists = {Artist = [artist 1, artist 1, artist 1] }// tracks = {Track2 = [track 1], Track1=[track 1] }
Json presentation foralbum1
will be as follows
{"artists": {"Artist": ["Michael Jackson","Lionel Richie","Stevie Wonder" ] },"tracks": {"Track1": ["We are the World" ],"Track2": ["We are the World (instrumental)" ] }}
Consider you have a table like below:
No. | Personal Information | Credit Card Information | ||||||
Name | Age | City | State | Zip Code | Card Type | Last 4 Digits | Expiration Date | |
1 | John Doe | 21 | Vienna | Virginia | 22349 | VISA | 1234 | Jan-21 |
2 | Jane Doe | 28 | Greenbelt | Maryland | 20993 | MasterCard | 2345 | Jun-22 |
3 | Paul Ryan | 19 | Alexandria | Virginia | 22312 | JCB | 4567 | Oct-24 |
TheExcelCellRange
annotation lets us aggregate a range of information in one object model. In this case, we collect the data inPersonCreditInfo
plus details of the person inPersonInfo
and for the credit card inCardInfo
:
publicclassPersonCreditInfo {@ExcelCellName("No.")privateIntegerno;@ExcelCellRangeprivatePersonInfopersonInfo;@ExcelCellRangeprivateCardInfocardInfo;publicstaticclassPersonInfo {@ExcelCellName("Name")privateStringname;@ExcelCellName("Age")privateIntegerage;@ExcelCellName("City")privateStringcity;@ExcelCellName("State")privateStringstate;@ExcelCellName("Zip Code")privateStringzipCode; }publicstaticclassCardInfo {@ExcelCellName("Card Type")privateStringtype;@ExcelCellName("Last 4 Digits")privateStringlast4Digits;@ExcelCellName("Expiration Date")privateStringexpirationDate; }}
Using the conventional way, we can retrieve the data usingPoiji.fromExcel
:
PoijiOptionsoptions =PoijiOptions.PoijiOptionsBuilder.settings().headerCount(2).build();List<PersonCreditInfo>actualPersonalCredits =Poiji.fromExcel(newFile(path),PersonCreditInfo.class,options);PersonCreditInfopersonCreditInfo1 =actualPersonalCredits.get(0);PersonCreditInfo.PersonInfoexpectedPerson1 =personCreditInfo1.getPersonInfo();PersonCreditInfo.CardInfoexpectedCard1 =personCreditInfo1.getCardInfo();
Poiji supports Consumer Interface. As@fmarazita explained the usage, there are several benefits of having a Consumer:
Huge excel file ( without you have all in memory)
Run time processing/filtering data
DB batch insertion
For example, we have a Calculation entity class and want to insert each row into a database while retrieving:
classCalculation {@ExcelCell(0)Stringname@ExcelCell(1) inta@ExcelCell(2) intbpublic intgetA(){returna; }publicintgetB(){returnb; }publicintgetName(){returnname; }}
FilefileCalculation =newFile(example.xlsx);PoijiOptionsoptions =PoijiOptionsBuilder.settings().sheetIndex(1).build();Poiji.fromExcel(fileCalculation,Calculation.class,options,this::dbInsertion);privatevoiddbInsertion(CalculationsiCalculation) {intvalue=siCalculation.getA() +siCalculation.getB();Stringname =siCalculation.getName();insertDB(name ,value);}
You can create your own casting implementation without relying on the default Poiji casting configuration using theCasting
interface.
publicclassMyCastingimplementsCasting {@OverridepublicObjectcastValue(Class<?>fieldType,Stringvalue,PoijiOptionsoptions) {returnvalue.trim(); }}publicclassPerson {@ExcelCell(0)protectedStringemployeeId;@ExcelCell(1)protectedStringname;@ExcelCell(2)protectedStringsurname;}
Then you can add your custom implementation with thewithCasting
method:
PoijiOptionsoptions =PoijiOptions.PoijiOptionsBuilder.settings() .withCasting(newMyCasting()) .build();List<Person>people =Poiji.fromExcel(excel,Person.class,options);
You can annotate aMap<String, String>
with@ExcelUnknownCells
to parse all entries,which are not mapped in any other way (for example by index or by name).
This is our object model:
publicclassMusicTrack {@ExcelCellName("ID")privateStringemployeeId;@ExcelCellName("AUTHOR")privateStringauthor;@ExcelCellName("NAME")privateStringname;@ExcelUnknownCellsprivateMap<String,String>unknownCells;}
This is the excel file we want to parse:
ID | AUTHOR | NAME | ENCODING | BITRATE |
---|---|---|---|---|
123923 | Joe Doe | The example song | mp3 | 256 |
56437 | Jane Doe | The random song | flac | 1500 |
The object corresponding to the first row of the excel sheet then has a map with{ENCODING=mp3, BITRATE=256}
and the one for the second row has{ENCODING=flac, BITRATE=1500}
.
Note that If you use thePoijiOptionsBuilder#caseInsensitive(true)
option, the ExcelUnknownCells map will be parsed with lowercase.
As of 4.0.0,@ExcelCellName
and@ExcelCell
are given to providemandatoryHeader
andmandatoryCell
fields, meaning we can be more specific by declaring rules in headers or cells or both in excel files.mandatoryHeader
expects the labeled excel column exists in a given excel file, whilemandatoryCell
looks into the values of the mentioned header/column. If any of the options is active and your excel breaks the rule, either aHeaderMissingException
orPoijiMultiRowException
will be thrown accordingly.
// default behavior@ExcelCellName(value = "COLUMN_NAME", mandatoryHeader = false, mandatoryCell = false)String fieldName;// default behavior@ExcelCell(value = COLUMN_INDEX, mandatoryHeader = false, mandatoryCell = false)String fieldName;
Note | The |
Note | The |
We can observe each cell format of a given excel file. Assume that we have an excel file like below:
Date |
12/31/2020 12.00 AM |
We can get all the list of cell formats usingPoijiLogCellFormat
withPoijiOptions
:
PoijiLogCellFormat log = new PoijiLogCellFormat();PoijiOptions options = PoijiOptions.PoijiOptionsBuilder.settings() .poijiCellFormat(log) .build();List<Model> dates = Poiji.fromExcel(stream, poijiExcelType, Model.class, options);Model model = rows.get(0)model.getDate();// 12.00
Hmm, It looks like we did not achieve the correct date format since we get the date value as (12.00
). Let’s see how internally the excel file is being parsed viaPoijiLogCellFormat
:
List<InternalCellFormat> formats = log.formats();InternalCellFormat cell10 = formats.get(1);cell10.getFormatString()// mm:ss.0cell10.getFormatIndex()// 47
Now that we know the reason of why we don’t see the expected date value, it’s because the default format of the date cell is themm:ss.0
format with a given index 47, we need to change the default format of index (i.e.47
). This format was automatically assigned to the cell having a number, but almost certainly with a special style or format. Note that this option should be used for debugging purpose only.
We can change the default format of a cell usingPoijiNumberFormat
. RecallDebug Cells Formats
, we are unable to see the correct cell format, what’s more the excel file uses another format which we do not want to.
Date |
12/31/2020 12.00 AM |
UsingPoijiNumberFormat
option, we are able to change the behavior of the format of a specific index:
PoijiNumberFormat numberFormat = new PoijiNumberFormat();numberFormat.putNumberFormat((short) 47, "mm/dd/yyyy hh.mm aa");PoijiOptions options = PoijiOptions.PoijiOptionsBuilder.settings() .poijiNumberFormat(numberFormat) .build();List<Model> rows = Poiji.fromExcel(stream, poijiExcelType, Model.class, options);Model model = rows.get(0)model.getDate();// 12/31/2020 12.00 AM(1)
Voila!
We know that the index 47 uses the formatmm:ss.0
by default in the given excel file, thus we’re able to override its format withmm/dd/yyyy hh.mm aa
using theputNumberFormat
method.
It is possible to read excel properties from xlsx files. To achieve that, create a class with fields annotated with@ExcelProperty
.
Example:
publicclassExcelProperties {@ExcelPropertyprivateStringtitle;@ExcelPropertyprivateStringcustomProperty;}
The field name corresponds to the name of the property inside the Excel file.To use a different one than the field name, you can specify apropertyName
(e.g.@ExcelProperty(propertyName = "customPropertyName")
)
The list of built-in (e.g. non-custom) properties in an Excel file, which can be read by Poiji can be found in the classDefaultExcelProperties
.
Poiji can only read Text properties from an Excel file, so you have to use aString
to read them.This does not apply to "modified", "lastPrinted" and "created", which are deserialized into aDate
.
Consider we have a xls or xlsx excel file like below:
Amount |
25,00 |
(50,00) |
(65,00) |
Since we use a cell format on line 4 and 5 (i.e.(50,00)
and(65,00)
), we don’t want to see the formatted value of each cell after processing. In order to do that, we can use@DisableCellFormatXLS
on a field if the file ends withxls
ordisableXLSXNumberCellFormat()
for xlsx files usingPoijiOptions
.
public class TestInfo { @ExcelCell(0) @DisableCellFormatXLS(1) public BigDecimal amount;}
we only disable cell formats on the specified column.
public class TestInfo { @ExcelCell(0) private BigDecimal amount;}PoijiOptions options = PoijiOptions.PoijiOptionsBuilder.settings() .disableXLSXNumberCellFormat()(1) .build();
when disablingnumber cell format, we disable it in the entire cells for xlsx files.
and let Poiji ignores the cell formats:
List<TestInfo> result = Poiji.fromExcel(new File(path), TestInfo.class, options);(1)result.get(1).amount// -50
Add
options
, if your excel is xlsx file.
You can create your own formatting implementation without relying on the default Poiji formatting configuration using theFormatting
interface.
publicclassMyFormattingimplementsFormatting {@OverridepublicStringtransform(PoijiOptionsoptions,Stringvalue) {returnvalue.toUpperCase().trim(); (1) }}publicclassPerson {@ExcelCellName("ID")protectedStringemployeeId;@ExcelCellName("NAME")protectedStringname;@ExcelCellName("SURNAME")protectedStringsurname;}
Suppose that all the header names of an excel file have different formatting. Using custom formatting, we are able to look at headers with a custom format. All the headers will be uppercase and don’t have white spaces before and after.
Then you can add your custom implementation with thewithFormatting
method:
PoijiOptionsoptions =PoijiOptions.PoijiOptionsBuilder.settings() .withFormatting(newMyFormatting()) .build();List<Person>people =Poiji.fromExcel(excel,Person.class,options);
Poiji accepts excel records via Poi Sheet object as well:
Filefile =newFile("/tmp/file.xlsx");FileInputStreamfileInputStream =newFileInputStream(file);Workbookworkbook =newXSSFWorkbook(fileInputStream);Sheetsheet =workbook.getSheetAt(0);List<Model>result =Poiji.fromExcel(sheet,Model.class);
For parsing numbers and dates java.lang.Locale is used. Also Apache Poi uses the Locale for parsing.As default, Poij uses Locale.US irrespective of Locale used on the running system. If you want to change thatyou can use a option to pass the Locale to be used like shown below.
In this example the Jvm default locale is used. Beware that if your code run’s on a other Jvm with another Locale set as default parsing could give different results. Better is to use a fixed locale.Also be aware of differences how Locales behave between Java 8 and 9+. For example AM/PM in Locale.GERMANY is displayed as AM/PM in Java 8 but Vorn./Nam. in Java 9 or higher.This is due to the changes in Java 9. SeeJEP-252 for more details.
PoijiOptionsoptions =PoijiOptions.PoijiOptionsBuilder.settings() .setLocale(Locale.getDefault()) .build();
About
🍬 A library converting XLS and XLSX files to a list of Java objects based on Apache POI