Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Processing and Sanitizing External Excel Documents in Java

Tech 1

Fetching external spreadsheet documents requires establishing a network connection, parsing the binary stream into an in-memory structure, applying targeted data transformations, and transmitting the revised bytes back to the requester. The implementation below retrieves an Excel file from a remote URI, masks sensitive columnar information, and streams the modified artifact directly to the client without persisting temporary files on the host system.

The workflow initiates by opening a connection to the supplied resource locator. The returned input stream is immediaetly consumed by the workbook parser factory. When traversing the worksheet, iteration should skip header rows to avoid structural corruption while targeting data entries. Defensive programming practices dictate checking for null cells and validating cell types before attempting value extraction.

Sanitization routines typically rely on pattern matching against alphanumeric strings. For example, mobile number obfuscation can be achieved by replacing the central digit sequence with asterisks. This operation generates a fresh cell instance within the row, ensuring the original memory footprint remains consistent while embedding the altered content.

Response configuration demands careful attention to cross-browser compatibility. Legacy Microsoft Edge and Intenret Explorer versions require strict RFC 5987 compliance for UTF-8 filenames, utilizing percent-encoding. Modern browsers prefer ISO-8859-1 encoded byte arrays wrapped in quotation marks. Assigning the correct Content-Type header ensures the client correctly interprets the attachment as a spreadsheet document.

import org.apache.poi.ss.usermodel.*;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;

/**
 * Retrieves a remote spreadsheet, applies columnar data masking, 
 * and streams the modified file to the responding client.
 */
@GetMapping("/api/data/export-sanitized")
public void streamProcessedSpreadsheet(@RequestParam String documentUri, HttpServletResponse httpResponse) throws IOException {
    
    String targetColumnLabel = "PHONE_NUMBER";
    int targetColumnIndex = 2; // Zero-based index representing the 3rd column
    
    // Establish connection and parse workbook in a single resource-managed block
    try (InputStream remoteStream = new URL(documentUri).openStream();
         OutputStream clientStream = httpResponse.getOutputStream()) {
        
        Workbook document = WorkbookFactory.create(remoteStream);
        Sheet primaryWorksheet = document.getSheetAt(0);
        
        // Iterate through rows, skipping the header
        for (Row currentRow : primaryWorksheet) {
            if (currentRow.getRowNum() == 0) {
                continue;
            }
            
            Cell sensitiveCell = currentRow.getCell(targetColumnIndex);
            if (sensitiveCell != null && sensitiveCell.getCellType() == CellType.STRING) {
                String rawValue = sensitiveCell.getStringCellValue();
                String sanitizedValue = maskTelephoneNumber(rawValue);
                
                Cell modifiedCell = currentRow.createCell(targetColumnIndex);
                modifiedCell.setCellValue(sanitizedValue);
            }
        }
        
        // Construct safe filename based on original source path
        String originalName = documentUri.substring(documentUri.lastIndexOf("/") + 1);
        String sanitizedFilename = originalName.replaceAll("\\.[^.]+$", "_secured.xlsx");
        
        httpResponse.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        httpResponse.setCharacterEncoding("UTF-8");
        
        // Handle legacy browser encoding requirements
        String userAgent = httpResponse.getHeader("User-Agent");
        String finalFilename = (userAgent.contains("MSIE") || userAgent.contains("Trident")) 
            ? java.net.URLEncoder.encode(sanitizedFilename, "UTF-8") 
            : new String(sanitizedFilename.getBytes("UTF-8"), "ISO-8859-1");
            
        httpResponse.setHeader("Content-Disposition", "attachment; filename=\"" + finalFilename + "\"");
        
        // Transmit revised workbook to client
        document.write(clientStream);
    }
}

/**
 * Applies a standard privacy filter to mobile number formats.
 * Preserves non-matching strings unchanged.
 */
private String maskTelephoneNumber(String inputValue) {
    Pattern mobilePattern = Pattern.compile("^1[3-9]\\d{9}$");
    Matcher validator = mobilePattern.matcher(inputValue);
    
    if (validator.find()) {
        String digits = inputValue;
        String obscured = digits.substring(0, 3) + "****" + digits.substring(7);
        return obscured;
    }
    return inputValue;
}

Resource cleanup is automatically managed through try-with-resources syntax, eliminating explicit stream closure blocks. The workbook factory handles both legacy .xls and modern .xlsx formats transparently. Error propagation should be delegated to a global exception handler in production environments to avoid leaking stack traces to end users. Proper validation of incoming URIs prevents unauthorized network access attempts.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.