Deploying YOLOv8 Pose Models on RKNN, Horizon, and TensorRT Platforms
Model Training
Training utilizes the official YOLOv8 pose implementation, with modifications to replace SiLU activation functions with ReLU for compatibility with embedded platforms. The dataset used is a subset of COCO, intended solely for deployment testing purposes. Results may vary depending on image content.
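The SiLU-to-ReLU substitution can be done by traversing the module tree and swapping activations in place. The sketch below is a minimal illustration of that idea; `replace_silu_with_relu` is a hypothetical helper, not part of the training script described here.

```python
import torch.nn as nn

def replace_silu_with_relu(module: nn.Module) -> nn.Module:
    # Recursively walk child modules, swapping every SiLU for a ReLU.
    # This mirrors the activation change made for embedded-platform support.
    for name, child in module.named_children():
        if isinstance(child, nn.SiLU):
            setattr(module, name, nn.ReLU(inplace=True))
        else:
            replace_silu_with_relu(child)
    return module

# Example: a small conv block similar in shape to a YOLOv8 Conv layer
block = nn.Sequential(nn.Conv2d(3, 8, 3), nn.SiLU())
replace_silu_with_relu(block)
```

After the swap, the network should be retrained (or fine-tuned), since ReLU and SiLU produce different activation statistics.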
ONNX Export Process
The following steps outline how to generate an ONNX model suitable for edge devices:
Step 1: Save Model Weights
To isolate weights for further processing, update the model saving logic:
import torch
self.model.fuse()
self.model.eval()
torch.save(self.model.state_dict(), './weights/yolov8pos_relu_dict.pt')
Then run a prediction with the original model weights so the saving logic above executes and writes the state dict:
model = YOLO('./weights/yolov8pos_relu.pt')
results = model(task='pose', mode='predict', source='./images/test.jpg', line_width=3, show=True, save=True, device='cpu')
Step 2: Modify Detection Head
Update the Detect class to prepare for ONNX export:
class Detect(nn.Module):
    dynamic = False
    export = False
    shape = None
    anchors = torch.empty(0)
    strides = torch.empty(0)

    def __init__(self, nc=80, ch=()):
        super().__init__()
        self.nc = nc  # number of classes
        self.nl = len(ch)  # number of detection layers
        self.reg_max = 16  # DFL channels
        self.no = nc + self.reg_max * 4  # outputs per anchor
        self.stride = torch.zeros(self.nl)
        c2, c3 = max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], self.nc)
        self.cv2 = nn.ModuleList(
            nn.Sequential(Conv(x, c2, 3), Conv(c2, c2, 3), nn.Conv2d(c2, 4 * self.reg_max, 1)) for x in ch)
        self.cv3 = nn.ModuleList(
            nn.Sequential(Conv(x, c3, 3), Conv(c3, c3, 3), nn.Conv2d(c3, self.nc, 1)) for x in ch)
        self.dfl = DFL(self.reg_max) if self.reg_max > 1 else nn.Identity()

        # ONNX export enhancement: a frozen 1x1 conv with weights 0..15
        # computes the DFL expectation directly on the graph
        self.conv1x1 = nn.Conv2d(16, 1, 1, bias=False).requires_grad_(False)
        x = torch.arange(16, dtype=torch.float)
        self.conv1x1.weight.data[:] = nn.Parameter(x.view(1, 16, 1, 1))

    def forward(self, x):
        shape = x[0].shape  # BCHW
        # Export path: return the raw per-branch regression and classification
        # maps; box decoding and sigmoid are deferred to the host
        y = []
        for i in range(self.nl):
            t1 = self.cv2[i](x[i])
            t2 = self.cv3[i](x[i])
            y.append(self.conv1x1(t1.view(t1.shape[0], 4, 16, -1).transpose(2, 1).softmax(1)))
            y.append(t2)
        return y
        # Original inference path below is left for reference; it is
        # unreachable after the early return above
        for i in range(self.nl):
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
        if self.training:
            return x
        elif self.dynamic or self.shape != shape:
            self.anchors, self.strides = (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))
            self.shape = shape
        x_cat = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)
        if self.export and self.format in ('saved_model', 'pb', 'tflite', 'edgetpu', 'tfjs'):
            box = x_cat[:, :self.reg_max * 4]
            cls = x_cat[:, self.reg_max * 4:]
        else:
            box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)
        dbox = dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywh=True, dim=1) * self.strides
        y = torch.cat((dbox, cls.sigmoid()), 1)
        return y if self.export else (y, x)
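The frozen `conv1x1` added in `__init__` replicates the DFL decode: a softmax over the 16 distance bins followed by a weighted sum with weights 0..15 is exactly the expected bin index. The NumPy sketch below demonstrates this equivalence on the tensor layout used in the modified `forward`; the shapes and names are illustrative, not taken from the code above.

```python
import numpy as np

# Simulate the export-head layout: raw cv2 output reshaped as in forward
t = np.random.randn(1, 4, 16, 8)           # (batch, 4 box sides, 16 bins, anchors)
t = t.transpose(0, 2, 1, 3)                # -> (1, 16, 4, 8), bins on the channel axis

# Softmax over the 16 distance bins (channel axis)
e = np.exp(t - t.max(axis=1, keepdims=True))
probs = e / e.sum(axis=1, keepdims=True)

# A 1x1 conv with frozen weights 0..15 is just a weighted sum over the bin axis
w = np.arange(16, dtype=np.float64)
conv_out = np.einsum('c,bcsa->bsa', w, probs)   # (1, 4, 8) decoded distances

# Reference DFL decode: expectation of the bin index under the softmax
dfl_out = (probs * w[None, :, None, None]).sum(axis=1)
```

Because the conv has no bias and its weights never train, the exported graph contains a plain convolution instead of the `DFL` module, which embedded toolchains handle more reliably.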
Step 3: Update Pose Head
Modify the Pose class accordingly:
class Pose(Detect):
    def __init__(self, nc=80, kpt_shape=(17, 3), ch=()):
        super().__init__(nc, ch)
        self.kpt_shape = kpt_shape  # (number of keypoints, dims per keypoint)
        self.nk = kpt_shape[0] * kpt_shape[1]  # total keypoint channels
        self.detect = Detect.forward
        c4 = max(ch[0] // 4, self.nk)
        self.cv4 = nn.ModuleList(
            nn.Sequential(Conv(x, c4, 3), Conv(c4, c4, 3), nn.Conv2d(c4, self.nk, 1)) for x in ch)

    def forward(self, x):
        bs = x[0].shape[0]  # batch size
        # Export path: emit the raw per-branch keypoint maps alongside the
        # Detect outputs; keypoint decoding moves to the host
        ps = []
        for i in range(self.nl):
            ps.append(self.cv4[i](x[i]))
        x = self.detect(self, x)
        return x, ps
        # Original inference path below is unreachable after the early return;
        # `kpt` is the concatenated cv4 output in the stock implementation
        if self.training:
            return x, kpt
        pred_kpt = self.kpts_decode(bs, kpt)
        return torch.cat([x, pred_kpt], 1) if self.export else (torch.cat([x[0], pred_kpt], 1), (x[1], kpt))
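Since the exported graph returns the raw keypoint maps (ps1..ps3), the decode step that `kpts_decode` performed now runs on the host. The NumPy sketch below mirrors the Ultralytics decode formula for a single grid cell: raw x/y are scaled by 2, offset by the anchor centre minus 0.5, and multiplied by the stride, while visibility passes through a sigmoid. `decode_keypoints` and its argument names are illustrative, not part of the exported model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_keypoints(raw_kpts, anchor_xy, stride, kpt_shape=(17, 3)):
    # raw_kpts: flat vector of nk = 17 * 3 values for one grid cell
    kpts = np.asarray(raw_kpts, dtype=np.float64).reshape(kpt_shape).copy()
    kpts[:, 0] = (kpts[:, 0] * 2.0 + (anchor_xy[0] - 0.5)) * stride  # x in pixels
    kpts[:, 1] = (kpts[:, 1] * 2.0 + (anchor_xy[1] - 0.5)) * stride  # y in pixels
    kpts[:, 2] = sigmoid(kpts[:, 2])                                 # visibility score
    return kpts

# Example: all-zero raw outputs at the first grid cell of the stride-8 level
raw = np.zeros(17 * 3, dtype=np.float32)
decoded = decode_keypoints(raw, anchor_xy=(0.5, 0.5), stride=8.0)
```

In a real pipeline this runs per detection, using the anchor position and stride of the cell that produced the box.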
Step 4: Generate ONNX Model
Include the following snippet in the model initialization process:
import torch
self.model.fuse()
self.model.eval()
# Load the fused state dict saved in Step 1; strict=False tolerates the
# export-only parameters added to the heads (e.g. conv1x1)
self.model.load_state_dict(torch.load('./weights/yolov8pos_relu_dict.pt', map_location='cpu'), strict=False)

print("=========== onnx =========== ")
dummy_input = torch.randn(1, 3, 640, 640)
input_names = ["data"]
output_names = ["cls1", "reg1", "cls2", "reg2", "cls3", "reg3", "ps1", "ps2", "ps3"]
torch.onnx.export(self.model, dummy_input, "./weights/yolov8pos_relu.onnx", verbose=False,
                  input_names=input_names, output_names=output_names, opset_version=11)
print("======================== convert onnx Finished! .... ")
Instantiating the model from its YAML configuration runs the initialization code above and writes the ONNX file:
model = YOLO('./ultralytics/models/v8/yolov8-pose.yaml')
Testing Results
Compare results obtained from PyTorch and ONNX versions to ensure consistency.
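A simple way to check consistency is to run the same preprocessed image through both models and report the worst absolute deviation per output. The sketch below assumes you have collected the nine head outputs from the PyTorch forward pass and from an ONNX Runtime session as lists of arrays; `compare_outputs` and the tolerance are illustrative choices, not part of the original toolchain.

```python
import numpy as np

OUTPUT_NAMES = ["cls1", "reg1", "cls2", "reg2", "cls3", "reg3", "ps1", "ps2", "ps3"]

def compare_outputs(pt_outs, onnx_outs, atol=1e-4):
    """Per-output max absolute difference, with a pass/fail flag per tensor."""
    report = {}
    for name, a, b in zip(OUTPUT_NAMES, pt_outs, onnx_outs):
        diff = float(np.max(np.abs(np.asarray(a, dtype=np.float64) -
                                   np.asarray(b, dtype=np.float64))))
        report[name] = (diff, diff <= atol)
    return report

# Synthetic demonstration with identical tensors: every diff should be zero
fake = [np.random.rand(1, 80, 400).astype(np.float32) for _ in OUTPUT_NAMES]
report = compare_outputs(fake, fake)
```

Small deviations (on the order of 1e-5) are expected from operator reordering; large ones usually indicate a mismatch in preprocessing or in the modified head code.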
C++ Deployment for RKNN
Refer to full C++ code samples for RKNN deployment.
Simulation Testing on RKNN and Horizon Platforms
Detailed setup instructions for the RKNN and Horizon simulation environments can be found in each platform's documentation; see the references below.
References
- YOLOv8 Detection on RKNN and Horizon Chips
- C++ Deployment for YOLOv8 on RKNN
- Official Model Export for RKNN and Horizon
- YOLOv8 Segment Deployment