Manipulating Excel Files with Apache POI in Java
Handling srpeadsheet data programmatically in Java is most efficiently achieved using the Apache POI library. To support both legacy .xls and modern .xlsx formats, include the following Maven dependency in your project configuration:
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>5.2.3</version>
</dependency>
Workbook API Architecture
Apache POI separates Excel handling based on file versions and memory requirements:
HSSFWorkbook: Targets older Excel 97-2003 files (.xls).XSSFWorkbook: Targets Excel 2007+ files (.xlsx).SXSSFWorkbook: A low-memory streaming implementation of XSSF. Standard XSSF structures consume excessive heap space and will throw errors when processing datasets exceeding approximate 65,000 rows.WorkbookFactory: An abstract factory that inspects the file signature and automatically returns the appropriateWorkbookinstance, eliminating manual format detection.
Reading Spreadsheet Data
Extraction follows a hierarchical traversal: load the workbook, select a sheet, iterate through row indices, and extract cell values. Always verify cell existence and utilize safe type conversion to avoid runtime casting exceptions.
Data Extraction Implementation
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class ExcelDataSource {
public static void main(String[] args) {
String targetPath = "C:/data/inventory.xlsx";
List<String> parsedRecords = new ArrayList<>();
try (FileInputStream stream = new FileInputStream(new File(targetPath))) {
Workbook spreadsheet = WorkbookFactory.create(stream);
Sheet activeSheet = spreadsheet.getSheetAt(0);
DataFormatter valueFormatter = new DataFormatter();
int startRow = activeSheet.getFirstRowNum();
int endRow = activeSheet.getLastRowNum();
for (int r = startRow; r <= endRow; r++) {
Row currentLine = activeSheet.getRow(r);
if (currentLine != null) {
int startCol = currentLine.getFirstCellNum();
int endCol = currentLine.getLastCellNum();
for (int c = startCol; c < endCol; c++) {
Cell targetCell = currentLine.getCell(c, Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);
if (targetCell != null) {
// Automatically handles numeric, date, and boolean types
String cellText = valueFormatter.formatCellValue(targetCell);
parsedRecords.add(cellText);
}
}
}
}
parsedRecords.forEach(System.out::println);
} catch (IOException | InvalidFormatException e) {
System.err.println("Spreadsheet read operation failed: " + e.getMessage());
}
}
}
Writing Spreadsheet Data
Document generation requires instantiating workbook and sheet objects, mapping data to specific coordinates, and flushing the in-memory structure to a file output stream. Proper stream closure is critical to prevent file corruption or handle leaks.
Data Export Implementation
import org.apache.poi.ss.usermodel.*;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
public class ExcelDataSink {
public static void main(String[] args) {
String outputPath = "C:/data/report.xlsx";
try (Workbook targetBook = new XSSFWorkbook();
FileOutputStream outputStream = new FileOutputStream(new File(outputPath))) {
Sheet outputSheet = targetBook.createSheet("ProcessedReport");
Row insertRow = outputSheet.createRow(2);
Cell dataCell = insertRow.createCell(3);
dataCell.setCellValue("Automated Report Entry");
targetBook.write(outputStream);
} catch (IOException e) {
System.err.println("Export stream error: " + e.getMessage());
}
}
}
Development Best Practices
- Dimension Calculation: Avoid
getPhysicalNumberOfRows()andgetPhysicalNumberOfCells()for boundary detection. These methods ignore empty slots, resulting in truncated loops. UsegetLastRowNum()andgetLastCellNum()combined with null checks to safely handle sparse datasets. - Type Safety: Directly invoking
getStringCellValue()on numeric or boolean cells triggers anIllegalStateException. ImplementDataFormatteror a custom type-switching mechanism to guarantee stable extraction across mixed cell formats. - Resource Lifecycle: Always manage
WorkbookandOutputStreaminstances via try-with-resources or explicitfinallyblocks. Unclosed streams frequently causeFileLockexceptions in subsequent operations.