DMS on STM32MP2
Hi everyone,
I’m working on a Driver Monitoring System (DMS) pipeline on the STM32MP25 and trying to understand both NPU utilization and overall performance limitations.
Setup details:
Platform: STM32MP25
Framework: stai_mpu_network (using .nb models with use_hw_acceleration=True)
Camera pipeline: GStreamer (appsink + cairooverlay)
Current performance: ~15 FPS
GPU load: ~25%
Models in use:
Face Detection (.nb) → runs every frame
Face Landmark (.nb) → runs every 3 frames
Iris Landmark (.nb) → runs every 3 frames (same schedule as landmarks)
YOLOv8n (Smoking/Calling, .nb) → runs every 3 frames (offset scheduling)
Scheduling logic:
Face detection: every frame
Face landmark + iris landmark models: frame_count % 3 == 0
YOLO model: frame_count % 3 == 1
So not every model runs on every frame, yet the pipeline is still limited to ~15 FPS.
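To make the schedule above concrete, here is a minimal sketch of the per-frame model selection. It also computes the average inference cost per frame, assuming the models run sequentially inside the frame loop. The model names are hypothetical labels, not stai_mpu_network identifiers, and the latencies you pass in would come from your own measurements.

```python
# Sketch of the 3-frame schedule described in the post.
# Model names are hypothetical labels, not real identifiers.
PERIOD = 3  # the schedule repeats every 3 frames


def models_for_frame(frame_count: int) -> list[str]:
    """Return the models that run on a given frame."""
    models = ["face_detection"]                      # every frame
    phase = frame_count % PERIOD
    if phase == 0:
        models += ["face_landmark", "iris_landmark"]
    elif phase == 1:
        models.append("yolov8n")
    return models                                    # phase 2: detection only


def average_frame_time_ms(latency_ms: dict[str, float]) -> float:
    """Average per-frame inference cost over one schedule period,
    assuming the models run one after another in the frame loop."""
    total = sum(
        latency_ms[m] for f in range(PERIOD) for m in models_for_frame(f)
    )
    return total / PERIOD


if __name__ == "__main__":
    for f in range(PERIOD):
        print(f, models_for_frame(f))
```

Under this schedule the average cost per frame is detection + (landmark + iris + YOLO) / 3, so face detection, which runs on every frame, weighs three times as much as any other model. Phase 2 frames carry only detection, while phases 0 and 1 are the heaviest and set the worst-case frame time.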
Questions:
1. NPU verification
Even though .nb models are used with HW acceleration enabled, I don’t have confirmation that inference is actually running on the NPU.
How can I verify that the NPU is being used (logs, tools, counters)?
Is there a way to monitor NPU utilization in real time?
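One indirect check is to time inference in isolation, outside the GStreamer pipeline, and compare the result with a CPU-only run of an equivalent model, if you have one. The helper below is a generic sketch. It times any zero-argument callable and does not depend on a specific stai_mpu API. It is not an NPU counter, only latency evidence.

```python
import statistics
import time
from typing import Callable


def benchmark(infer: Callable[[], object], warmup: int = 5,
              runs: int = 50) -> dict[str, float]:
    """Time a zero-argument inference callable; returns latency stats in ms."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):          # early runs may include load/warm-up cost
        infer()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        # Approximate 95th percentile (nearest-rank style on sorted samples).
        "p95_ms": samples[min(len(samples) - 1, int(0.95 * len(samples)))],
    }
```

As a usage example, with the input already set on your network object, you could call `benchmark(lambda: model.run_inference())`. This assumes `run_inference()` is the call your code already uses. Keeping the input fixed means only inference time is measured.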
2. Performance / FPS optimization
Given that:
GPU usage is only ~25%
Models are not all running every frame
Still only ~15 FPS
What are the typical bottlenecks on STM32MP25 in this kind of pipeline?
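A reasonable first step is to time each stage of the per-frame loop. The ~25% GPU figure does not account for CPU-side work such as buffer copies, Python preprocessing, postprocessing, and Cairo drawing. At 15 FPS each frame has 1000 / 15 ≈ 66.7 ms, and reaching 30 FPS means fitting everything into ≈ 33.3 ms. The sketch below accumulates wall-clock time per named stage. The stage names in the demo are hypothetical, and `time.sleep` stands in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageProfiler:
    """Accumulates wall-clock time per named stage over many frames."""

    def __init__(self) -> None:
        self.totals_ms = defaultdict(float)
        self.frames = 0

    @contextmanager
    def stage(self, name: str):
        t0 = time.perf_counter()
        try:
            yield
        finally:
            self.totals_ms[name] += (time.perf_counter() - t0) * 1000.0

    def end_frame(self) -> None:
        self.frames += 1

    def report(self) -> dict[str, float]:
        """Average milliseconds per frame spent in each stage."""
        n = max(self.frames, 1)
        return {name: total / n for name, total in self.totals_ms.items()}


def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame at a given frame rate."""
    return 1000.0 / target_fps


if __name__ == "__main__":
    prof = StageProfiler()
    for _ in range(10):
        with prof.stage("preprocess"):      # hypothetical stage names;
            time.sleep(0.002)               # sleeps stand in for real work
        with prof.stage("inference"):
            time.sleep(0.005)
        with prof.stage("overlay"):
            time.sleep(0.001)
        prof.end_frame()
    print(prof.report(), "budget @15 FPS:", frame_budget_ms(15))
```

Compare the summed stage times with the frame interval. If they fit well within it, the limit may lie elsewhere, for example in the camera's configured frame rate or in sink synchronization.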
3. Face recognition pipeline
We are planning to extend this DMS pipeline with face recognition (driver identification).
Considering STM32MP25 constraints:
What would be a suitable lightweight face recognition pipeline?
Any suggested models (quantized / NPU-friendly) for:
Face embedding (e.g., MobileFaceNet or similar)
Best practices for integrating recognition without significantly impacting FPS?
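Whatever embedding model is chosen (MobileFaceNet or similar), the identification step after it is typically a nearest-neighbour match of an L2-normalized embedding against the enrolled drivers. The sketch below shows that step in pure Python. It also includes a hypothetical gate that schedules recognition on the frames where the current schedule runs only face detection (frame_count % 3 == 2), and runs it much less often once an identity is confirmed. `DriverGallery`, `RecognitionGate`, the 0.5 threshold, and the intervals are illustrative placeholders to be tuned on real data, not recommended values.

```python
import math
from typing import Optional, Sequence


def l2_normalize(v: Sequence[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("embedding has zero norm")
    return [x / norm for x in v]


class DriverGallery:
    """Enrolled drivers: name -> L2-normalized reference embedding."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold  # placeholder; tune on real data
        self.entries = {}

    def enroll(self, name: str, embedding: Sequence[float]) -> None:
        self.entries[name] = l2_normalize(embedding)

    def identify(self, embedding: Sequence[float]) -> tuple[Optional[str], float]:
        """Best match by cosine similarity, or (None, score) below threshold."""
        query = l2_normalize(embedding)
        best_name, best_score = None, -1.0
        for name, ref in self.entries.items():
            # Dot product of unit vectors equals cosine similarity.
            score = sum(a * b for a, b in zip(query, ref))
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None or best_score < self.threshold:
            return None, best_score
        return best_name, best_score


class RecognitionGate:
    """Run recognition on the detection-only phase (frame_count % 3 == 2):
    every period while unconfirmed, rarely once an identity is confirmed."""

    def __init__(self, unconfirmed_every: int = 3,
                 confirmed_every: int = 90) -> None:
        # Intervals must be multiples of 3 so the gate stays on phase 2.
        if unconfirmed_every % 3 or confirmed_every % 3:
            raise ValueError("intervals must be multiples of 3")
        self.unconfirmed_every = unconfirmed_every
        self.confirmed_every = confirmed_every

    def should_run(self, frame_count: int, confirmed: bool) -> bool:
        every = self.confirmed_every if confirmed else self.unconfirmed_every
        return frame_count % every == 2
```

The embedding itself would come from the recognition model run on the face crop that detection already provides, possibly aligned using the landmarks the pipeline already computes. Doing so avoids a second detection pass.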
Any guidance on confirming NPU usage, improving FPS, and selecting a suitable face recognition pipeline would be very helpful.
Thanks.
