Automated Oracle Bone Script Analysis: Noise Reduction, Segmentation, and Character Recognition
Let the original rubbing image be represented as $I_{raw}$. The preprocessing stage applies a transformation $T_{prep}$ to yield a cleaned image $I_{clean} = T_{prep}(I_{raw})$. Subsequently, a descriptor extractor $T_{feat}$ maps the cleaned image to a feature vector $\mathbf{v} = T_{feat}(I_{clean})$. A classification model $C$ then determines the presence of noise: $y = C(\mathbf{v})$, where $y=0$ indicates the presence of interference and $y=1$ signifies a clean region.
Interference in ancient script rubbings primarily manifests as point noise, artificial textures, and inherent surface textures. Point noise is effectively suppressed using adaptive median filtering, which dynamically adjusts the kernel size to preserve edges while eliminating isolated artifacts. For artificial and inherent textures, which predominantly occupy the high-frequency domain, frequency-domain filtering is applied. A Gaussian low-pass filter smooths the image to attenuate high-frequency textural patterns, while wavelet transformation allows for multi-scale texture separation by selecting appropriate basis functions.
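The frequency-domain step can be sketched with a Gaussian low-pass filter built on NumPy's FFT; this is an illustrative implementation, and the cutoff parameter `sigma` is an assumed tuning knob, not a value from the original pipeline:

```python
import numpy as np

def gaussian_lowpass(img, sigma=30.0):
    """Attenuate high-frequency texture by multiplying the centered
    spectrum with a Gaussian mask, then transforming back."""
    rows, cols = img.shape
    # Centered frequency-domain representation of the image
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    # Gaussian mask centered on the DC component
    u = np.arange(rows) - rows / 2
    v = np.arange(cols) - cols / 2
    V, U = np.meshgrid(v, u)
    mask = np.exp(-(U**2 + V**2) / (2 * sigma**2))
    # Suppress high frequencies and return to the spatial domain
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return np.real(filtered)
```

Smaller `sigma` values attenuate texture more aggressively; the wavelet-based alternative mentioned above would replace the global FFT with a multi-scale decomposition.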
Feature extraction encompasses shape, texture, and intensity characteristics. For a given region $R$ with area $A$, the features are defined as:
Shape Feature: $$ \Phi_{shape}(R) = \frac{1}{A} \iint_R (x^2 + y^2) \,dx\,dy $$
Texture Feature: $$ \Phi_{tex}(R) = \frac{1}{A} \iint_R G(x,y) \cdot I(x,y) \,dx\,dy $$
Intensity Feature: $$ \Phi_{int}(R) = \frac{1}{A} \iint_R I(x,y) \,dx\,dy $$
where $G(x,y)$ represents a Gaussian kernel. The feature vector $\mathbf{v} = [\Phi_{shape}, \Phi_{tex}, \Phi_{int}]$ is fed into classifiers such as Support Vector Machines (SVM), Random Forests, or Convolutional Neural Networks (CNN) to categorize the region as script or noise.
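A discrete approximation of the three descriptors might look as follows; note that, as a modelling assumption, the moments and the Gaussian weight are taken about the region centroid (the integrals above leave the reference point open), and `sigma` is an assumed kernel width:

```python
import numpy as np

def region_features(intensity, mask, sigma=2.0):
    """Compute [shape, texture, intensity] descriptors for region R.
    `intensity` is the grayscale image, `mask` a boolean region mask."""
    ys, xs = np.nonzero(mask)
    area = len(xs)
    cy, cx = ys.mean(), xs.mean()
    # Shape: mean squared distance from the centroid (second moment / A)
    phi_shape = np.mean((xs - cx) ** 2 + (ys - cy) ** 2)
    # Texture: Gaussian-weighted mean intensity over the region
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    phi_tex = np.sum(g * intensity[ys, xs]) / area
    # Intensity: plain mean intensity over the region
    phi_int = intensity[ys, xs].mean()
    return np.array([phi_shape, phi_tex, phi_int])
```

The resulting 3-vector can be fed directly to an SVM or random forest; a CNN would instead consume the raw pixels of the region.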
import cv2
import numpy as np

def preprocess_rubbing(img_path, out_path):
    src_img = cv2.imread(img_path)
    if src_img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")
    gray_img = cv2.cvtColor(src_img, cv2.COLOR_BGR2GRAY)
    # Denoise with a 3x3 median filter to suppress point noise
    denoised_img = cv2.medianBlur(gray_img, 3)
    # Otsu's thresholding (inverted so dark glyphs become foreground)
    _, binary_img = cv2.threshold(denoised_img, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Morphological opening to remove small residual speckles
    morph_kernel = np.ones((3, 3), np.uint8)
    cleaned_img = cv2.morphologyEx(binary_img, cv2.MORPH_OPEN, morph_kernel,
                                   iterations=2)
    # Detect outer contours and overlay them on the original image
    contours, _ = cv2.findContours(cleaned_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    result_img = cv2.drawContours(src_img.copy(), contours, -1, (0, 255, 0), 2)
    cv2.imwrite(out_path, result_img)
    return result_img
The segmentation of individual characters from rubbings can be modeled as a composite function $S(I_{raw}) = N_{point}(F(I_{raw})) \cup N_{art}(F(I_{raw})) \cup N_{inher}(F(I_{raw}))$, where $F$ extracts the feature space, and $N_{point}$, $N_{art}$, and $N_{inher}$ denote the respective noise removal operators. To achieve robust single-character isolation, a U-Net architecture is employed for pixel-wise segmentation. This encoder-decoder structure captures contextual information while maintaining spatial localization, effectively separating characters from complex backgrounds.
The model is trained using categorical cross-entropy loss and optimized via Adam or SGD. Performance is quantified with $k$-fold cross-validation to ensure generalization to unseen data; metrics include precision, recall, and the F1-score.
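For the binary script-vs-noise decision, the three metrics reduce to simple counts; a minimal sketch, treating label 1 ("script") as the positive class:

```python
import numpy as np

def prf1(y_true, y_pred):
    """Precision, recall and F1 from true/predicted binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Under $k$-fold cross-validation these scores are averaged over the $k$ held-out folds.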
import torch
import torch.nn as nn

class OracleSegNet(nn.Module):
    """Lightweight CNN head; the flattened size 128 * 16 * 16 assumes
    128 x 128 single-channel input crops (three 2x poolings: 128 -> 16)."""

    def __init__(self, num_classes=4):
        super(OracleSegNet, self).__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        x = self.feature_extractor(x)
        return self.classifier(x)
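The cross-entropy/Adam training loop described above reduces to a step like the following; `train_step` is an illustrative helper, not part of the original pipeline, and works with `OracleSegNet` given batches of 128x128 single-channel crops:

```python
import torch
import torch.nn as nn

def train_step(model, batch, optimizer, criterion=nn.CrossEntropyLoss()):
    """One optimisation step: forward pass, categorical cross-entropy,
    backward pass, parameter update. Returns the scalar loss."""
    images, labels = batch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)  # logits vs. integer labels
    loss.backward()
    optimizer.step()
    return loss.item()
```

For example, `train_step(model, batch, torch.optim.Adam(model.parameters(), lr=1e-3))` would be called once per mini-batch inside the epoch loop.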
Applying the segmentation model to a batch of test images involves preprocessing, inference, and post-processing. The watershed algorithm is utilized for boundary refinement. For an image $I_t$, the foreground and background are separated using distance transforms, and connected components label isolated regions. Bounding boxes are extracted for components exceeding a minimum area threshold.
import pandas as pd

def extract_characters(src_img):
    gray = cv2.cvtColor(src_img, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    opened = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)
    # Sure background: dilation of the opened foreground
    background = cv2.dilate(opened, kernel, iterations=3)
    # Sure foreground: high-confidence peaks of the distance transform
    dist_transform = cv2.distanceTransform(opened, cv2.DIST_L2, 5)
    _, foreground = cv2.threshold(dist_transform,
                                  0.5 * dist_transform.max(), 255, 0)
    foreground = np.uint8(foreground)
    unknown_region = cv2.subtract(background, foreground)
    # Marker labelling, shifted so the background is 1 and unknown is 0
    _, markers = cv2.connectedComponents(foreground)
    markers = markers + 1
    markers[unknown_region == 255] = 0
    markers = cv2.watershed(src_img, markers)
    bounding_boxes = []
    # After watershed, label 1 is the background and -1 marks boundaries,
    # so character components start at label 2
    for label in range(2, markers.max() + 1):
        mask = np.uint8(markers == label)
        x, y, w, h = cv2.boundingRect(mask)
        if w > 15 and h > 15:
            bounding_boxes.append([x, y, x + w, y + h])
    return bounding_boxes

def process_test_dataset(test_dir, output_excel):
    results = []
    for idx in range(1, 201):
        img_path = f"{test_dir}/img_{idx}.jpg"
        img = cv2.imread(img_path)
        if img is None:
            continue  # skip missing or unreadable images
        boxes = extract_characters(img)
        results.append({"image_id": idx, "bounding_boxes": str(boxes)})
    df = pd.DataFrame(results)
    df.to_excel(output_excel, index=False)
Character recognition is formulated as a multi-class classification task over $K$ distinct character categories. The output is a $K$-dimensional probability vector $\mathbf{p} = \text{Softmax}(\mathbf{W} \cdot \text{CNN}(I_{char}) + \mathbf{b})$. The pipeline consists of:
- Preprocessing: $I_{ref} = \Psi_{prep}(I_{char})$
- Feature Extraction: $\mathbf{f} = \Psi_{feat}(I_{ref})$
- Model Training: $\Theta = \Psi_{train}(\mathbf{f})$
- Data Augmentation: $D_{aug} = \Psi_{aug}(D_{orig})$ using elastic deformations, rotations, and scaling to mitigate class imbalance.
- Prediction: $\hat{y} = \Psi_{predict}(I_{test}) = \Theta(\Psi_{feat}(\Psi_{prep}(I_{test})))$
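The final softmax step above can be sketched in a few lines; `predict_char` is a hypothetical helper where `W` is the $K \times d$ weight matrix, `b` the bias, and `features` the $d$-dimensional CNN output:

```python
import numpy as np

def predict_char(features, W, b):
    """p = Softmax(W.f + b); returns the K-dim probability vector
    and the arg-max class index."""
    logits = W @ features + b
    logits = logits - logits.max()  # shift for numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return p, int(np.argmax(p))
```

The returned index is then mapped through the category table to the actual character.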
To address the high variance in ancient script morphology and the presence of variant characters, attention mechanisms are integrated into the CNN backbone, allowing the network to focus on critical stroke regions while ignoring residual background interference. Additionally, transfer learning from a CRNN (Convolutional Recurrent Neural Network) pre-trained on modern or synthetic script datasets accelerates convergence and improves feature representation. Predicted labels are mapped back to their corresponding characters and exported in a structured format.
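One common way to realise such attention integration is a squeeze-and-excitation style channel-attention block; this is a minimal sketch (the reduction ratio `r` is an assumed hyperparameter), insertable between convolutional stages of the backbone:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style block: learns per-channel gates
    so stroke-bearing feature maps are emphasised."""

    def __init__(self, channels, r=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: global context
            nn.Conv2d(channels, channels // r, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(channels // r, channels, kernel_size=1),
            nn.Sigmoid(),                                  # gates in (0, 1)
        )

    def forward(self, x):
        # Re-weight each feature channel by its learned gate
        return x * self.gate(x)
```

Because the block preserves the input shape, it can be dropped after any convolutional layer without changing the rest of the architecture.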