April 27, 2026

DMS on STM32MP2


Hi everyone,

I’m working on a Driver Monitoring System (DMS) pipeline on the STM32MP25 and trying to understand both NPU utilization and overall performance limitations.

Setup details:
Platform: STM32MP25

Framework: stai_mpu_network (using .nb models with use_hw_acceleration=True)

Camera pipeline: GStreamer (appsink + cairooverlay)

Current performance: ~15 FPS

GPU load: ~25%

Models in use:
Face Detection (.nb) → runs every frame

Face Landmark (.nb) → runs every 3 frames

Iris Landmark (.nb) → runs every 3 frames (same schedule as landmarks)

YOLOv8n (Smoking/Calling, .nb) → runs every 3 frames (offset scheduling)

Scheduling logic:
Face detection: every frame

Face landmark + iris landmark models: frame_count % 3 == 0

YOLO model: frame_count % 3 == 1

So not all models are executed every frame, but FPS is still limited to ~15.
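As a sketch, the posted schedule can be written as a helper that lists which models run on each frame. `per_phase_cost_ms` then sums per-model latencies, which you would measure yourself, for each of the three phases. The function and model names are hypothetical and not part of stai_mpu.

```python
from typing import Dict, List

PERIOD = 3  # landmark/iris and YOLO each run once every PERIOD frames


def models_for_frame(frame_count: int) -> List[str]:
    """Return the models that run on a given frame under the posted schedule."""
    models = ["face_detection"]  # runs on every frame
    phase = frame_count % PERIOD
    if phase == 0:
        models += ["face_landmark", "iris_landmark"]
    elif phase == 1:
        models.append("yolov8n")
    # phase == 2: only face detection runs
    return models


def per_phase_cost_ms(latency_ms: Dict[str, float]) -> List[float]:
    """Sum measured per-model latencies (ms) for each phase of the 3-frame cycle."""
    return [sum(latency_ms[m] for m in models_for_frame(p)) for p in range(PERIOD)]
```

In a synchronous pipeline, the most expensive phase sets the worst-case frame time and the mean over the three phases sets the average. At ~15 FPS the average frame budget is about 1000 / 15 ≈ 66.7 ms. On phase 2 only face detection runs. If those frames alone already come close to the budget, that would suggest the limit is face detection itself or work outside inference.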

Questions:
1. NPU verification
Even though .nb models are used with HW acceleration enabled, I don’t have confirmation that inference is actually running on the NPU.

How can I verify that the NPU is being used (logs, tools, counters)?

Is there a way to monitor NPU utilization in real time?
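One indirect check is to time the inference call repeatedly and compare the result against the same model run on the CPU, for example an equivalent .tflite export. The `benchmark` helper below is hypothetical and only measures wall-clock latency. It does not read any NPU counter.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], warmup: int = 5, iters: int = 50,
              clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Call run_once repeatedly and return latency statistics in milliseconds."""
    for _ in range(warmup):  # early runs may include one-time setup costs
        run_once()
    samples = []
    for _ in range(iters):
        t0 = clock()
        run_once()
        samples.append((clock() - t0) * 1000.0)
    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "mean_ms": statistics.fmean(samples),
    }
```

For stai_mpu_network, `run_once` would presumably wrap the model's run call after inputs are set. This sketch assumes that call is `net.run()`; check the X-LINUX-AI API for your release. A large gap between the .nb and CPU timings is consistent with NPU execution. Near-identical numbers would be worth investigating as a possible CPU fallback.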

2. Performance / FPS optimization
Given that:

GPU usage is only ~25%

Models are not all running every frame

Still only ~15 FPS

What are the typical bottlenecks on STM32MP25 in this kind of pipeline?
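To locate the bottleneck, it may help to time each stage of the per-frame work separately. The following is a minimal sketch that assumes all timed stages run in the same thread. The class and stage names are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class StageProfiler:
    """Accumulate wall-clock time per pipeline stage across frames."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._total_s: Dict[str, float] = defaultdict(float)
        self._frames = 0

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add it to the named stage's total."""
        start = self._clock()
        try:
            yield
        finally:
            self._total_s[name] += self._clock() - start

    def end_frame(self) -> None:
        """Mark the end of one frame so report() can average per frame."""
        self._frames += 1

    def report(self) -> Dict[str, float]:
        """Return the mean milliseconds per frame spent in each stage."""
        if self._frames == 0:
            return {}
        return {name: 1000.0 * t / self._frames for name, t in self._total_s.items()}
```

Wrap buffer mapping, preprocessing, each model, and overlay drawing in `stage()`, then call `end_frame()` once per frame. If the per-stage means add up to much less than 1000 / 15 ≈ 67 ms, the remaining time is spent outside the instrumented code, for example waiting on upstream GStreamer elements. The cairooverlay draw callback may run on a different thread than appsink, so give it its own profiler or add a lock.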

3. Face recognition pipeline
We are planning to extend this DMS pipeline with face recognition (driver identification).

Considering STM32MP25 constraints:

What would be a suitable lightweight face recognition pipeline?
Are there suggested models (quantized / NPU-friendly) for face embedding (e.g., MobileFaceNet or similar)?
What are the best practices for integrating recognition without significantly impacting FPS?
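One commonly used pattern works as follows. An L2-normalized face embedding, from a model such as MobileFaceNet, is compared against enrolled driver embeddings by cosine similarity. The embedding model is rerun only occasionally rather than on every frame. The sketch below follows those assumptions. `DriverIdentifier`, the threshold, and the recheck interval are hypothetical placeholders to tune on your own data.

```python
import math
from typing import Dict, List, Optional, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


class DriverIdentifier:
    """Match face embeddings against enrolled drivers and cache the identity."""

    def __init__(self, gallery: Dict[str, Sequence[float]],
                 threshold: float = 0.5, recheck_every: int = 30) -> None:
        self.gallery = {name: l2_normalize(e) for name, e in gallery.items()}
        self.threshold = threshold  # placeholder value; tune on enrollment data
        self.recheck_every = recheck_every  # frames between re-identifications
        self._last_frame: Optional[int] = None
        self._identity: Optional[str] = None

    def needs_embedding(self, frame_count: int) -> bool:
        """True on the first frame of a face, then every recheck_every frames."""
        return (self._last_frame is None
                or frame_count - self._last_frame >= self.recheck_every)

    def update(self, frame_count: int, embedding: Sequence[float]) -> Optional[str]:
        """Return the best-matching driver, or None if below the threshold."""
        query = l2_normalize(embedding)
        best_name, best_score = None, -1.0
        for name, ref in self.gallery.items():
            score = sum(q * r for q, r in zip(query, ref))  # cosine similarity
            if score > best_score:
                best_name, best_score = name, score
        self._identity = best_name if best_score >= self.threshold else None
        self._last_frame = frame_count
        return self._identity

    def reset(self) -> None:
        """Forget the cached identity, e.g. when face detection loses the face."""
        self._last_frame = None
        self._identity = None

    @property
    def identity(self) -> Optional[str]:
        return self._identity
```

The embedding model would run only on frames where `needs_embedding()` returns True. Its cost could then be amortized the same way as the landmark and YOLO models, for example on frames where frame_count % 3 == 2, when only face detection runs. Calling `reset()` when the face is lost forces a fresh identification for the next face.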


Any guidance on confirming NPU usage, improving FPS, and selecting a suitable face recognition pipeline would be very helpful.

Thanks.