ramkumarkoppu
Associate III
April 25, 2025
Question

Quantized Gemma Model Inference on STM32MP257F-DK Board


Hi,

Could you share documentation or examples for running quantized foundational models (e.g. Google Gemma) on the STM32MP257F-DK, first in Python, then in C/C++ using the STM32MP2 NPU? Specifically:

  • Does the STM32MP2 NPU support transformer-based architectures, or is it limited to CNNs (like the STM32N6)?

  • Which inference frameworks are supported for GenAI on this platform? Has ST ported llama.cpp to this NPU?

Sorry, I couldn't find the required info on the STM32 MPU wiki pages.

Thanks!

3 replies

Visitor II
November 7, 2025

Hello

We are currently evaluating hardware options and have the same question. Can somebody from ST answer it here?


Thank you and best regards
Jan

Associate III
January 6, 2026

Hello,

I have the same question. Is it possible to run LLMs on the STM32MP2 series?

Additionally, what is the expected performance/inference efficiency?

Thanks!

Technical Moderator
January 8, 2026

Hello, 

The NPU architecture of the STM32MP2 series does not support transformer-based models.
LLMs can, however, be run on the CPU.

The frameworks supported by the X-LINUX-AI expansion package are listed on this wiki page:
https://wiki.st.com/stm32mpu/wiki/Category:X-LINUX-AI_expansion_package
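Since inference runs on the CPU, memory footprint and DRAM bandwidth are the main limits on throughput. A rough back-of-envelope sketch (the parameter count, quantization width, and bandwidth figures are illustrative assumptions, not ST benchmarks or measured STM32MP2 numbers):

```python
# Back-of-envelope sizing for CPU-only LLM inference. All figures below
# (parameter count, quantization width, DRAM bandwidth) are illustrative
# assumptions, not measured STM32MP2 numbers.

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of a quantized weight file, ignoring metadata."""
    return n_params * bits_per_weight / 8

def decode_tokens_per_s(weights: float, dram_bytes_per_s: float) -> float:
    """Memory-bound decode: each generated token streams all weights from DRAM,
    so throughput is capped at bandwidth / weight size."""
    return dram_bytes_per_s / weights

# A 2B-parameter model (Gemma-class) quantized to 4 bits per weight:
gemma_2b_q4 = weight_bytes(2e9, 4)
print(f"weights: {gemma_2b_q4 / 1e9:.1f} GB")        # weights: 1.0 GB

# Assuming ~4 GB/s of effective DRAM bandwidth (hypothetical figure):
print(f"ceiling: {decode_tokens_per_s(gemma_2b_q4, 4e9):.1f} tok/s")  # ceiling: 4.0 tok/s
```

Real throughput will be lower once KV-cache traffic, CPU compute, and thermals are accounted for; the second number is only a memory-bandwidth ceiling.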

BR


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.
PatrickF
Technical Moderator
January 8, 2026

Here is an example of running an LLM locally on the STM32MP257F-EV1 (as said, using the CPU only):
https://www.linkedin.com/posts/danilopietropau_another-great-example-of-llm-on-stm32mp2-activity-7293222309333495809-Qe0d

Regards.
