Automated Preprocessing and Character Segmentation for Ancient Oracle Bone Script Rubbings
Problem 1: Oracle Bone Script Image Preprocessing and Feature Extraction
The initial challenge is to develop a robust preprocessing pipeline for ancient oracle bone script rubbings. These historical artifacts exhibit significant degradation, including speckle noise, artificial textures, and intrinsic surface patterns, all of which must be suppressed before analysis.
The preprocessing workflow comprises several sequential operations:
1.1 Noise Reduction
Median filtering effectively suppresses salt-and-pepper artifacts while preserving edge information. For a kernel window W of size n×n, the filtered pixel value at position (i,j) is computed as:
μ_{ij} = median{ I_{pq} | (p, q) ∈ W_{ij} }
where I_{pq} represents the intensity value at coordinates (p, q).
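A minimal sketch of this step with OpenCV (the 5×5 aperture is an assumed starting point, not a tuned value):
import cv2

image = cv2.imread("Pre_test/1.jpg", cv2.IMREAD_GRAYSCALE)
# Replace each pixel with the median of its 5x5 neighborhood; this
# suppresses salt-and-pepper speckle while keeping stroke edges sharp.
denoised = cv2.medianBlur(image, 5)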
1.2 Illumination Normalization
Adaptive histogram equalization compensates for non-uniform lighting conditions across the rubbing surface. The transformation function T maps input intensity values to enhance local contrast:
T(I_{ij}) = α · (I_{ij} − μ_{local}) + β
where μ_{local} denotes the local mean intensity and α, β are scaling parameters.
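In OpenCV this step can be approximated with CLAHE (contrast-limited adaptive histogram equalization); the clip limit and tile grid below are assumed defaults rather than tuned values:
import cv2

gray = cv2.imread("Pre_test/1.jpg", cv2.IMREAD_GRAYSCALE)
# Equalize the histogram within 8x8 tiles; clipLimit bounds noise amplification.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
normalized = clahe.apply(gray)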
1.3 Binarization
Otsu's method automatically determines the optimal threshold θ* by maximizing inter-class variance:
θ* = argmax_θ [ ω_0(θ) ω_1(θ) (μ_0(θ) − μ_1(θ))^2 ]
The binary image B is then obtained via thresholding: B_{ij} = 255 if I_{ij} > θ*, otherwise 0.
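OpenCV performs this variance-maximizing search internally; a sketch (the explicit threshold argument is ignored when THRESH_OTSU is set):
import cv2

gray = cv2.imread("Pre_test/1.jpg", cv2.IMREAD_GRAYSCALE)
# Returns the Otsu threshold theta* along with the binarized image.
theta_star, binary = cv2.threshold(gray, 0, 255,
                                   cv2.THRESH_BINARY + cv2.THRESH_OTSU)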
1.4 Morphological Cleaning
Opening operations remove small connected components while closing fills gaps within characters. The opening of image B by structuring element S is defined as:
B ∘ S = (B ⊖ S) ⊕ S
where ⊖ and ⊕ denote erosion and dilation, respectively.
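A sketch of both operations with an assumed 3×3 rectangular structuring element:
import cv2

gray = cv2.imread("Pre_test/1.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
# Opening (erosion then dilation) removes specks smaller than the kernel.
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
# Closing (dilation then erosion) fills small gaps inside character strokes.
cleaned = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)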
1.5 Feature Representation
After preprocessing, we extract Histogram of Oriented Gradients (HOG) features. The gradient magnitude and orientation at each pixel are:
G_x = ∂I/∂x, G_y = ∂I/∂y
|G| = √(G_x^2 + G_y^2), φ = arctan(G_y / G_x)
These gradients are accumulated into orientation histograms within spatial cells to form the final feature descriptor.
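These quantities map directly onto Sobel responses; a sketch of the per-pixel gradients that feed the orientation histograms:
import cv2
import numpy as np

gray = cv2.imread("Pre_test/1.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
# First-order derivatives approximated with 3x3 Sobel kernels.
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
# Per-pixel magnitude and orientation from the formulas above;
# arctan2 handles the G_x = 0 case that arctan(G_y/G_x) cannot.
magnitude = np.sqrt(gx**2 + gy**2)
orientation = np.arctan2(gy, gx)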
1.6 Implementation Framework
import cv2
import numpy as np

class OraclePreprocessor:
    def __init__(self, blur_ksize=5, min_area=100, hog_win=(64, 64)):
        self.blur_ksize = blur_ksize  # median filter aperture (odd integer)
        self.min_area = min_area      # smallest contour area kept as a character region
        self.hog_win = hog_win        # HOG detection window size

    def process(self, input_image):
        # Convert to grayscale.
        grayscale = cv2.cvtColor(input_image, cv2.COLOR_BGR2GRAY)
        # Median filtering suppresses salt-and-pepper speckle (Section 1.1).
        denoised = cv2.medianBlur(grayscale, self.blur_ksize)
        # Adaptive thresholding copes with uneven lighting across the rubbing.
        binary = cv2.adaptiveThreshold(
            denoised, 255,
            cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
            cv2.THRESH_BINARY_INV, 11, 2
        )
        # Morphological opening removes small noise components (Section 1.4).
        structuring_element = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
        cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, structuring_element)
        # Locate candidate character regions via external contours.
        contours, _ = cv2.findContours(
            cleaned, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
        )
        # Filter contours by area to discard tiny artifacts.
        valid_contours = [c for c in contours if cv2.contourArea(c) > self.min_area]
        # Crop the bounding box enclosing all surviving contours.
        if valid_contours:
            x, y, w, h = cv2.boundingRect(np.vstack(valid_contours))
            region_of_interest = denoised[y:y + h, x:x + w]
        else:
            region_of_interest = denoised
        # Resize to the HOG window so every image yields a fixed-length descriptor.
        resized = cv2.resize(region_of_interest, self.hog_win)
        hog = cv2.HOGDescriptor(self.hog_win, (16, 16), (8, 8), (8, 8), 9)
        features = hog.compute(resized)
        return region_of_interest, features

# Process multiple rubbing images.
processor = OraclePreprocessor()
test_images = [cv2.imread(f"Pre_test/{i}.jpg") for i in range(1, 4)]
results = [processor.process(img) for img in test_images if img is not None]
Problem 2: Automated Character Segmentation Model
The segmentation phase aims to isolate individual characters from preprocessed rubbings. An encoder-decoder deep learning architecture combining convolutional feature extraction with spatial attention proves effective for this task.
2.1 Model Architecture
The proposed network consists of:
- Encoder Path: Multiple convolutional layers with residual connections extract hierarchical features. For layer ℓ with input feature map X^{(ℓ)}, the output is:
Y^{(ℓ)} = ReLU(BN(W^{(ℓ)} ∗ X^{(ℓ)} + b^{(ℓ)})) + X^{(ℓ)}
where ∗ denotes convolution, BN represents batch normalization, and W^{(ℓ)}, b^{(ℓ)} are learnable parameters.
- Attention Mechanism: Spatial attention weights α_{ij} are computed as:
α_{ij} = σ(f_att(X_{ij}))
where σ is the sigmoid function and f_att is a learned attention function (combined with the residual block in the sketch after this list).
- Decoder Path: Transposed convolutions upsample the feature maps to original resolution, producing pixel-wise segmentation masks.
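A minimal PyTorch sketch of one encoder stage combining the residual update and spatial attention described above; the channel width and the 1×1 attention head are illustrative assumptions, not the exact configuration:
import torch
import torch.nn as nn

class ResidualAttentionBlock(nn.Module):
    # One encoder stage: residual convolution followed by spatial attention.
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        # f_att: 1x1 convolution producing one attention logit per pixel (assumed form).
        self.f_att = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        # Y^(l) = ReLU(BN(W^(l) * X^(l) + b^(l))) + X^(l)
        y = torch.relu(self.bn(self.conv(x))) + x
        # alpha_ij = sigma(f_att(X_ij)), broadcast across channels.
        alpha = torch.sigmoid(self.f_att(y))
        return y * alpha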
2.2 Loss Function
The composite loss function combines dice coefficient loss and focal loss to handle class imbalance:
L_dice = 1 − (2 ∑_i p_i g_i + ε) / (∑_i p_i^2 + ∑_i g_i^2 + ε)
L_focal = −∑_i (1 − p_i)^γ g_i log(p_i)
L_total = λ_1 L_dice + λ_2 L_focal
where p_i and g_i are the predicted and ground-truth probabilities at pixel i, γ is the focusing parameter, and λ_1, λ_2 balance the loss components.
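A direct transcription of the composite loss in PyTorch; γ = 2 and equal λ weights are assumed placeholder values, not reported settings:
import torch

def composite_loss(p, g, gamma=2.0, lam1=0.5, lam2=0.5, eps=1e-6):
    # p: predicted foreground probabilities, g: binary ground truth, same shape.
    # Dice term penalizes low overlap between prediction and ground truth.
    dice = 1 - (2 * (p * g).sum() + eps) / ((p ** 2).sum() + (g ** 2).sum() + eps)
    # Focal term down-weights easy, high-confidence foreground pixels.
    focal = -((1 - p) ** gamma * g * torch.log(p.clamp_min(eps))).sum()
    return lam1 * dice + lam2 * focal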
2.3 Evaluation Metrics
Model performance is assessed across multiple dimensions:
- Pixel Accuracy: Ratio of correctly classified pixels to total pixels.
- Intersection over Union (IoU): For character class c, IoU_c = TP_c / (TP_c + FP_c + FN_c) (see the sketch after this list).
- Inference Speed: Frames per second (FPS) measured on standard hardware.
- Robustness Score: Performance degradation when tested on images with varying degradation levels.
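A sketch of the per-class IoU computation on label masks (NumPy; the small constant guards against empty classes):
import numpy as np

def iou(pred, gt, cls=1):
    # Per-class IoU: TP / (TP + FP + FN) over boolean masks.
    p, g = (pred == cls), (gt == cls)
    tp = np.logical_and(p, g).sum()
    fp = np.logical_and(p, ~g).sum()
    fn = np.logical_and(~p, g).sum()
    return tp / (tp + fp + fn + 1e-9)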
Cross-validation across different rubbing collections verifies generalization capability. Performance is further improved by transfer learning from large-scale text recognition datasets, followed by fine-tuning on oracle bone script corpora.