Processing and Sanitizing External Excel Documents in Java
Fetching external spreadsheet documents requires establishing a network connection, parsing the binary stream into an in-memory structure, applying targeted data transformations, and transmitting the revised bytes back to the requester. The implementation below retrieves an Excel file from a remote URI, masks sensitive columnar information, and streams the modified artifact directly to the client without persisting temporary files on the host system.
The workflow initiates by opening a connection to the supplied resource locator. The returned input stream is immediaetly consumed by the workbook parser factory. When traversing the worksheet, iteration should skip header rows to avoid structural corruption while targeting data entries. Defensive programming practices dictate checking for null cells and validating cell types before attempting value extraction.
Sanitization routines typically rely on pattern matching against alphanumeric strings. For example, mobile number obfuscation can be achieved by replacing the central digit sequence with asterisks. This operation generates a fresh cell instance within the row, ensuring the original memory footprint remains consistent while embedding the altered content.
Response configuration demands careful attention to cross-browser compatibility. Legacy Microsoft Edge and Intenret Explorer versions require strict RFC 5987 compliance for UTF-8 filenames, utilizing percent-encoding. Modern browsers prefer ISO-8859-1 encoded byte arrays wrapped in quotation marks. Assigning the correct Content-Type header ensures the client correctly interprets the attachment as a spreadsheet document.
import org.apache.poi.ss.usermodel.*;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
/**
* Retrieves a remote spreadsheet, applies columnar data masking,
* and streams the modified file to the responding client.
*/
@GetMapping("/api/data/export-sanitized")
public void streamProcessedSpreadsheet(@RequestParam String documentUri, HttpServletResponse httpResponse) throws IOException {
String targetColumnLabel = "PHONE_NUMBER";
int targetColumnIndex = 2; // Zero-based index representing the 3rd column
// Establish connection and parse workbook in a single resource-managed block
try (InputStream remoteStream = new URL(documentUri).openStream();
OutputStream clientStream = httpResponse.getOutputStream()) {
Workbook document = WorkbookFactory.create(remoteStream);
Sheet primaryWorksheet = document.getSheetAt(0);
// Iterate through rows, skipping the header
for (Row currentRow : primaryWorksheet) {
if (currentRow.getRowNum() == 0) {
continue;
}
Cell sensitiveCell = currentRow.getCell(targetColumnIndex);
if (sensitiveCell != null && sensitiveCell.getCellType() == CellType.STRING) {
String rawValue = sensitiveCell.getStringCellValue();
String sanitizedValue = maskTelephoneNumber(rawValue);
Cell modifiedCell = currentRow.createCell(targetColumnIndex);
modifiedCell.setCellValue(sanitizedValue);
}
}
// Construct safe filename based on original source path
String originalName = documentUri.substring(documentUri.lastIndexOf("/") + 1);
String sanitizedFilename = originalName.replaceAll("\\.[^.]+$", "_secured.xlsx");
httpResponse.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
httpResponse.setCharacterEncoding("UTF-8");
// Handle legacy browser encoding requirements
String userAgent = httpResponse.getHeader("User-Agent");
String finalFilename = (userAgent.contains("MSIE") || userAgent.contains("Trident"))
? java.net.URLEncoder.encode(sanitizedFilename, "UTF-8")
: new String(sanitizedFilename.getBytes("UTF-8"), "ISO-8859-1");
httpResponse.setHeader("Content-Disposition", "attachment; filename=\"" + finalFilename + "\"");
// Transmit revised workbook to client
document.write(clientStream);
}
}
/**
* Applies a standard privacy filter to mobile number formats.
* Preserves non-matching strings unchanged.
*/
private String maskTelephoneNumber(String inputValue) {
Pattern mobilePattern = Pattern.compile("^1[3-9]\\d{9}$");
Matcher validator = mobilePattern.matcher(inputValue);
if (validator.find()) {
String digits = inputValue;
String obscured = digits.substring(0, 3) + "****" + digits.substring(7);
return obscured;
}
return inputValue;
}
Resource cleanup is automatically managed through try-with-resources syntax, eliminating explicit stream closure blocks. The workbook factory handles both legacy .xls and modern .xlsx formats transparently. Error propagation should be delegated to a global exception handler in production environments to avoid leaking stack traces to end users. Proper validation of incoming URIs prevents unauthorized network access attempts.