Issue
I am experimenting with Apache POI 5.2.4
and managed to read a toy Excel file using the following cut-together code (Java):
// .. get file
FileInputStream excelFile = new FileInputStream(file);
Workbook workbook = new XSSFWorkbook(excelFile);
Sheet datatypeSheet = workbook.getSheetAt(0);
Iterator<Row> iterator = datatypeSheet.iterator();
while (iterator.hasNext()) {
Row currentRow = iterator.next();
Iterator<Cell> cellIterator = currentRow.iterator();
while (cellIterator.hasNext()) {
Cell currentCell = cellIterator.next();
if (currentCell.getCellType() == CellType.FORMULA) {
// !!! Interesting part !!!
String cellLabel = currentCell.getCellFormula();
// Here I get values like "[1]Sheet1!$A$3"
// instead of "[Book1.xlsx]Sheet1!$A$3"
// !!! End interesting part !!!
}
}
}
The Excel sheet I read in contains just a cell referencing another workbook in the same folder. It was created using Excel 2019 for Mac and is correctly displaying the linked value.
Instead of the file name I put into the Excel [Book1.xlsx]Sheet1!$A$3
I only get
[1]Sheet1!$A$3
.
All other fields like values or strings are returned correctly.
I tried looking in the returned structures for currentCell
for a string mapping table but could not find anything that was not empty. How can I resolve this [1] into the original workbook name of the referenced Excel? It does not need to find or load it, just return the file name.
Solution
The formula string [1]Sheet1!$A$3
is how Excel in Office Open XML format (*.xlsx
) stores links to external workbooks in formulas. So you get what is stored in the file.
You can see that if you unzip the *.xlsx
file an have a look into /xl/worksheets/sheet*.xml
. There you will find someting like:
...
<row ...>
<c ...>
<f>[1]Sheet1!$A$3</f>
<v>...</v>
</c>
...
</row>
...
Probably this is to avoid getting formula strings too long.
The index within the brackets refers a external reference stored in /xl/workbook.xml
:
...
<externalReferences>
<externalReference r:id="rId2"/>
<externalReference r:id="rId3"/>
</externalReferences>
...
The r:id
refers a external link in /xl/workbook.xml.rels
.
This then refers a external link in /xl/externalLinks/externalLink*.xml
.
And this external link then refers to the external workbook using /xl/externalLinks/_rels/externalLink*.xml.rels
.
Apache POI provides org.apache.poi.xssf.model.ExternalLinksTable. But that also may contain DDE and/or OLE linked files and not only linked external workbooks. So the correct way is to get the externalReference
first from the workbook and then get the corresponding ExternalLinksTable
instead to get all via XSSFWorkbook.getExternalLinksTable.
Following method should do what you want if cell
is XSSFCell
.
... String getCellFormula(XSSFCell cell) {
String cellFormula = cell.getCellFormula();
java.util.regex.Pattern pattern = java.util.regex.Pattern.compile("(\\[)(\\d*)(\\])");
java.util.regex.Matcher matcher = pattern.matcher(cellFormula);
while (matcher.find()) {
String externalWorkbookIndex = matcher.group(2); // 1 based : [1] is first
int externalReferenceIdx = Integer.valueOf(externalWorkbookIndex) - 1; // 0 based : 0 is first
XSSFWorkbook workbook = cell.getSheet().getWorkbook();
if (workbook.getCTWorkbook().getExternalReferences() != null) {
if (workbook.getCTWorkbook().getExternalReferences().getExternalReferenceList().size() > externalReferenceIdx) {
String rId = workbook.getCTWorkbook().getExternalReferences().getExternalReferenceList().get(externalReferenceIdx).getId();
if (workbook.getRelationById(rId) instanceof org.apache.poi.xssf.model.ExternalLinksTable) {
org.apache.poi.xssf.model.ExternalLinksTable externalLinksTable =
(org.apache.poi.xssf.model.ExternalLinksTable)workbook.getRelationById(rId);
String referencedFileName = externalLinksTable.getLinkedFileName();
String match = matcher.group(1) + matcher.group(2) + matcher.group(3);
String replacement = matcher.group(1) + referencedFileName + matcher.group(3);
cellFormula = cellFormula.replace(match, replacement);
}
}
}
}
return cellFormula;
}
Answered By - Axel Richter
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.