Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Manipulating Excel Files with Apache POI in Java

Tech May 9 4

Handling srpeadsheet data programmatically in Java is most efficiently achieved using the Apache POI library. To support both legacy .xls and modern .xlsx formats, include the following Maven dependency in your project configuration:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.3</version>
</dependency>

Workbook API Architecture

Apache POI separates Excel handling based on file versions and memory requirements:

  • HSSFWorkbook: Targets older Excel 97-2003 files (.xls).
  • XSSFWorkbook: Targets Excel 2007+ files (.xlsx).
  • SXSSFWorkbook: A low-memory streaming implementation of XSSF. Standard XSSF structures consume excessive heap space and will throw errors when processing datasets exceeding approximate 65,000 rows.
  • WorkbookFactory: An abstract factory that inspects the file signature and automatically returns the appropriate Workbook instance, eliminating manual format detection.

Reading Spreadsheet Data

Extraction follows a hierarchical traversal: load the workbook, select a sheet, iterate through row indices, and extract cell values. Always verify cell existence and utilize safe type conversion to avoid runtime casting exceptions.

Data Extraction Implementation

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ExcelDataSource {
    public static void main(String[] args) {
        String targetPath = "C:/data/inventory.xlsx";
        List<String> parsedRecords = new ArrayList<>();

        try (FileInputStream stream = new FileInputStream(new File(targetPath))) {
            Workbook spreadsheet = WorkbookFactory.create(stream);
            Sheet activeSheet = spreadsheet.getSheetAt(0);
            DataFormatter valueFormatter = new DataFormatter();

            int startRow = activeSheet.getFirstRowNum();
            int endRow = activeSheet.getLastRowNum();

            for (int r = startRow; r <= endRow; r++) {
                Row currentLine = activeSheet.getRow(r);
                if (currentLine != null) {
                    int startCol = currentLine.getFirstCellNum();
                    int endCol = currentLine.getLastCellNum();

                    for (int c = startCol; c < endCol; c++) {
                        Cell targetCell = currentLine.getCell(c, Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);
                        if (targetCell != null) {
                            // Automatically handles numeric, date, and boolean types
                            String cellText = valueFormatter.formatCellValue(targetCell);
                            parsedRecords.add(cellText);
                        }
                    }
                }
            }
            parsedRecords.forEach(System.out::println);
        } catch (IOException | InvalidFormatException e) {
            System.err.println("Spreadsheet read operation failed: " + e.getMessage());
        }
    }
}

Writing Spreadsheet Data

Document generation requires instantiating workbook and sheet objects, mapping data to specific coordinates, and flushing the in-memory structure to a file output stream. Proper stream closure is critical to prevent file corruption or handle leaks.

Data Export Implementation

import org.apache.poi.ss.usermodel.*;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class ExcelDataSink {
    public static void main(String[] args) {
        String outputPath = "C:/data/report.xlsx";

        try (Workbook targetBook = new XSSFWorkbook();
             FileOutputStream outputStream = new FileOutputStream(new File(outputPath))) {

            Sheet outputSheet = targetBook.createSheet("ProcessedReport");
            Row insertRow = outputSheet.createRow(2);
            Cell dataCell = insertRow.createCell(3);

            dataCell.setCellValue("Automated Report Entry");

            targetBook.write(outputStream);
        } catch (IOException e) {
            System.err.println("Export stream error: " + e.getMessage());
        }
    }
}

Development Best Practices

  • Dimension Calculation: Avoid getPhysicalNumberOfRows() and getPhysicalNumberOfCells() for boundary detection. These methods ignore empty slots, resulting in truncated loops. Use getLastRowNum() and getLastCellNum() combined with null checks to safely handle sparse datasets.
  • Type Safety: Directly invoking getStringCellValue() on numeric or boolean cells triggers an IllegalStateException. Implement DataFormatter or a custom type-switching mechanism to guarantee stable extraction across mixed cell formats.
  • Resource Lifecycle: Always manage Workbook and OutputStream instances via try-with-resources or explicit finally blocks. Unclosed streams frequently cause FileLock exceptions in subsequent operations.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

SBUS Signal Analysis and Communication Implementation Using STM32 with Fus Remote Controller

Overview In a recent project, I utilized the SBUS protocol with the Fus remote controller to control a vehicle's basic operations, including movement, lights, and mode switching. This article is aimed...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.