Explorer
June 11, 2024
Question

SPI DMA request triggered by Timer-compare

  • 12 replies
  • 6948 views

Hello everyone,

As an engineer who started exploring the world of MCUs years after my retirement, I regularly encounter problems that puzzle me for some time. However, since I have been stuck for several days now, I decided to ask this forum for assistance.

For my project, I selected a high-performance MCU (STM32H723VGT6) mounted on a WeAct test board, because of its price and attractive form factor.
I want to digitize the analog output of a linear CCD with an ADC that has an SPI interface. This ADC (ADS8319) needs a rising edge to start a conversion, and >1.6us later it expects a series of 16 clock pulses to shift out its data. I found an attractive solution for this challenge on the 'StackExchange' site: it sends dummy uint16 data to an SPI with DMA, triggered by a timer.
An SPI in Full-Duplex mode produces 16 clock pulses to transmit a 16-bit word. The timer must produce PWM pulses at the frequency wanted for the CCD readout. The rising edge of each pulse triggers the ADC conversion, while the falling edge triggers the DMA transfer of a dummy uint16 to the SPI, which produces the clock pulses the ADC needs to shift out its data. The ADC's uint16 data arrives on the SPI MISO pin and is transferred to the global data array by another DMA action. The scheme below shows the timing:

Scheme_Readout_CCD_w_SPI_ADC.png

Unfortunately, the organization of DMA on my MCU (BDMA, DMA with Mux, MDMA) is different from the STM32F746 on the 'StackExchange' forum, meaning the shown code snippets cannot be copied.

I configured SPI2 as Master in Full-Duplex mode with HW NSS signal handling, set 16bit frame-size, the baudrate and MSB-first. Despite many changes and different approaches, I have not been able to make SPI2 produce Clock-pulses, while the task looks quite simple: define and start a DMA action which, triggered by a Timer-compare event, writes a uint16 value to the SPI2 TXDR register.

I sincerely hope I made myself clear and that some readers are willing to think with me about how to crack this problem. Thank you in advance,

Fred Schimmel

    This topic has been closed for replies.

    12 replies

    ST Employee
    June 14, 2024

    Hello @FredS 

     

    I suggest you start with a CubeMX STM32H723 project with an SPI2 + DMA1 configuration. Then check the full-duplex data transfer with an oscilloscope, together with the ADS8319 response (with a known analog input voltage).

    After that, use CubeMX to implement a TIM8 OC or PWM interrupt handler in order to trigger the ADS8319 externally at the falling edge, and a second one to start the SPI2 DMA transfer. Again, use an instrument to synchronize the rising/falling OC edges on the GPIO that triggers the ADS8319 with SPI2 CLK/NSS.

    In the meantime, could you share the project you already have, so I can check what is wrong?

    Best regards,

    Romain

     

    FredSAuthor
    Explorer
    June 15, 2024

      Dear Romain,

    Thank you for your reply to my question; I really appreciate that you took the time for it.

    I'm sorry to say that I don't understand how the suggestion in the first two lines of your reply should work. I can configure SPI2 with DMA for transmission (Tx) and reception (Rx), but I don't have a clue how to connect my ADS8319 ADC to this configuration to read out the ADC (which should give about a fixed number for a known fixed analog input).

    In your second paragraph, you mention "implement TIM8 OC or PWM interrupt handler". My question arose precisely from the wish to avoid interrupts in this time-critical application.

    I would prefer a solution with a tiny, fixed delay, like the one discussed in a thread on another forum:

    https://electronics.stackexchange.com/questions/353152/stm32f-how-to-config-dma-transfer-to-spi-triggered-by-timer 

    From the suggested solution I understood the following concept:

    1. configure a timer (TIM8 in my case) to create PWM pulses,
    2. configure DMA for TIM8,
    3. define the period equal to the ADC repetition rate,
    4. connect the PWM-output to the ADC 'CNVST' input (= trigger for conversion),
    5. define the pulse-length to the time difference between the ADC trigger (= rising edge) and the start event of the TIM-DMA (falling edge),
    6. configure the TIM-DMA to retrieve a constant uint16 value (from a prefixed address, no pointer increment) and transfer this value to SPI Tx-DMA register 'hspi2.Instance->TXDR',
    7. configure the SPI2-DMA-Rx to receive uint16 values on its RXDR register and transfer this value to the next index of a global data-array 'g_CCD_Buff'.
    8. a value written to the SPI_TXDR register means the peripheral will transmit the bit pattern on its MOSI pin (= not connected), synchronous with clock pulses on the SPI_CLK pin, which is connected to the ADC_CLK pin.
    9. if the duration of the TIM8 pulse is longer than the ADC conversion time, the ADC will output its conversion value on its 'SDO' pin, in the rhythm of the pulses on its CLK input,
    10. the ADC 'SDO' pin is connected to the SPI_MISO pin (so hspi2.Instance->RXDR receives the ADC output),
    11. as SPI2 has a DMA configured for its Rx channel, the received value will be transferred to the next index of the global data array.

    I understand most of the actions described above and know how to implement them. But I cannot figure out how to define a Timer-DMA that performs its action on an SPI peripheral.

    Or, in case I understand it all wrong: how can I let a TIM8 OC1 event trigger an SPI-Tx DMA transfer?

    I hope I made myself more clear now,

    best regards,

    Fred Schimmel

    Graduate II
    June 15, 2024

    Hi Fred,

     

    Your post is quite complex and I've spent a good long while attempting to figure out what you're actually trying to do.

     

    [Heavily pared down to try and simplify matters]

    Is it fair to summarize your post as:

    "I want to read one 16bit word from SPI (with CSn asserted) every N microseconds,

    and I want to use DMA to do it so that the incoming data is stored into a memory buffer"

     

    ...Is that it?

     

     

    FredSAuthor
    Explorer
    June 15, 2024

        Dear Barry,

    Thank you for your reply.

    I will try to respond to the four sections of your reply, one by one.

    About 1): I have learnt the trick of transmitting a dummy word on a Full-Duplex SPI to force the SPI peripheral to produce the clock pulses that the ADC needs to output its conversion word. The SPI-Rx side then receives the ADC word and transfers that data to an array. It was mentioned that "Receive Only Master" expects external clock pulses fed to its SCLK pin to synchronize. I may experiment with your suggestion, but in that setup I still need to trigger the SPI-Rx action to read the ADC data (which I don't know how to do).

    About 2): In my text I didn't explain the complete setup of the application. 120,000us are needed for the complete readout of 3700 CCD pixels. It takes 8us to convert a pixel, so 3700*8us = 29,600us for digitizing all of them. In the past I encountered problems when exporting collected data as an ASCII stream while converting new data, so I decided to do it sequentially. Writing 3700 uint16 values as hex ASCII chars on a 921,600 Baud port takes ~80.3ms => one complete readout takes 109.9ms, rounded to 120ms.
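The budget above can be re-checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up, and it assumes 8N1 UART framing (10 bit times per byte), which the post does not state. Under that assumption the ~80.3ms figure matches 2 bytes per sample (7400 bytes); sending 4 hex characters per sample would take roughly twice as long.

```c
#include <assert.h>
#include <stdint.h>

/* Time to digitize all pixels, in microseconds. */
uint32_t conversion_time_us(uint32_t pixels, uint32_t us_per_pixel)
{
    return pixels * us_per_pixel;
}

/* Time to send `samples` values over a UART, in microseconds (rounded).
 * ASSUMPTION: 8N1 framing, i.e. 10 bit times per transmitted byte. */
uint32_t uart_time_us(uint32_t samples, uint32_t bytes_per_sample,
                      uint32_t baud)
{
    assert(baud > 0);
    uint64_t bits = (uint64_t)samples * bytes_per_sample * 10u;
    return (uint32_t)((bits * 1000000u + baud / 2u) / baud);
}
```

For example, `conversion_time_us(3700, 8)` gives the 29,600us quoted above, and `uart_time_us(3700, 2, 921600)` lands at about 80.3ms.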

    About 3): Apparently, I failed to explain my reasoning (valid or invalid). I want to use the TIM8 PWM for two things:

    1. the rep.rate of the ADC trigger. Each new pulse should trigger a new conversion of a CCD pixel voltage, where the period is determined by the ARR register.
    2. the delay between the ADC trigger (start of the conversion by the rising edge) and the readout of the ADC data (by triggering a SPI-DMA action on the falling edge), the delay equals the pulse length, determined by CCR1.

    The SPI clock frequency only needs to be fast enough to read 16 bits from the ADC within 8us - ADC-conversion time (= ~6.2us), but not so fast that my HW can no longer keep the pulses separated. And, as I already made clear, I want to read 3700 samples per cycle, not one.
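As a rough illustration of how ARR and CCR1 follow from the wanted period and pulse length, here is a minimal sketch. It assumes an upcounting timer in PWM mode 1 with a prescaler of 1; the timer clock and the example durations are placeholders, not values taken from this project, and the type and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a duration in nanoseconds to timer ticks, rounding up. */
uint32_t ns_to_ticks(uint32_t timer_hz, uint32_t ns)
{
    return (uint32_t)(((uint64_t)timer_hz * ns + 999999999u) / 1000000000u);
}

typedef struct {
    uint32_t arr;   /* auto-reload value: period = (arr + 1) ticks          */
    uint32_t ccr1;  /* compare value: output is high while CNT < ccr1
                       (ASSUMPTION: PWM mode 1, upcounting)                 */
} pwm_timing_t;

/* period_ns: ADC repetition period; pulse_ns: delay between the rising
 * edge (conversion start) and the falling edge (DMA trigger). */
pwm_timing_t pwm_timing(uint32_t timer_hz, uint32_t period_ns,
                        uint32_t pulse_ns)
{
    pwm_timing_t t;
    uint32_t period_ticks = ns_to_ticks(timer_hz, period_ns);

    t.ccr1 = ns_to_ticks(timer_hz, pulse_ns);
    assert(period_ticks > 0 && t.ccr1 < period_ticks);
    t.arr = period_ticks - 1u;
    return t;
}
```

With a hypothetical 200MHz timer clock, an 8us period and a 1.8us pulse would give ARR = 1599 and CCR1 = 360. The result still has to fit the counter width of the timer actually used.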

    About 4): Maybe I'm wrong, but in my perception the suggested solution in the stackexchange example defines a Timer-DMA that is triggered by TIM8.OC1REF, just as you describe in the first two sentences of your point 3.

    The next step is to configure this DMA transfer for another peripheral (SPI2), by specifying the address of a uint16 constant as the 'source pointer' and the address of SPI2.TXDR as the 'destination pointer'. Up to here I think I understand this approach, but I am missing how to program the last part: how to start the Timer-DMA.

    I hope I finally managed to make clear my ideas,

    many greetings from the Netherlands,

     

    Fred Schimmel

    Graduate II
    June 16, 2024

    > It was mentioned that "Receive Only Master" expects external clock-pulses fed to its SCLK pin to synchronize

     

    This is wrong. The definition of "Master" in SPI is the side that controls SCLK. CubeMX offers both "Receive-only master" and "Receive-only slave". The difference between them is precisely whether the MCU is in charge of SCLK.

     

    > I still need to trigger the SPI-Rx action to read the ADC data (which I don't know how).

    A DMA read from the proper SPI register will trigger an SPI read by the peripheral, and after the peripheral acknowledges with the data, the DMA will place the result in memory.

     

    > In the past I encountered problems with the export of collected data as ASCII stream, converting new data, so I decided to do it sequentially. Writing 3700 uint16 values as hex ASCII chars

     

    I don't understand this. Do you mean you had trouble transferring binary data over a serial terminal? That's a software issue on the host side, not the STM. At least on Linux, you can set the terminal to "raw mode" for binary data. I'm sure there's an equivalent in every OS.

     

    > It takes 8us to convert a pixel,

     

    Where did this number come from? The datasheet says max conversion time is 1400ns, and min acquisition time (data readout time) is 600ns. In particular, you can clock out the 16 bits at 33MHz (30ns min SCLK period for 3.3V VDD), which the STM32H723 can easily do, and this takes ~600ns. Perhaps it's not realistic to expect to hit this optimal point exactly with an MCU (you could with an FPGA), but I think your timing budget is off.

     

    > on a 921,600 Baud port takes ~80.3ms => one complete readout takes 109.9ms, rounded to 120ms.

     

    FYI, USB->Serial ports can easily run at 1/2/4Mbps and even 8Mbps. The STM32H723 also has HS USB which can easily do 16bit*500ksps .

     

    If your timing budget is tight, there's no need to separate SPI reception and UART transmission into distinct phases, it can in principle be done concurrently (word by word for example).

     

     

    It is HIGHLY recommended that your egress channel (UART or maybe USB in the future) be at least a bit faster than your ingress channel (SPI). The required throughput depends on your target sample rate for the ADC.
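To make the egress/ingress comparison concrete, here is a small sketch of the arithmetic. It assumes a UART with 8N1 framing (10 bit times per byte), which is an assumption and not something stated in this thread; the function names are made up, and the check uses "at least as fast" as the pass criterion, with no extra margin.

```c
#include <assert.h>
#include <stdint.h>

/* Baud rate needed to stream samples in real time over a UART.
 * ASSUMPTION: 8N1 framing, so each byte costs 10 bit times. */
uint32_t required_baud(uint32_t samples_per_s, uint32_t bytes_per_sample)
{
    uint64_t baud = (uint64_t)samples_per_s * bytes_per_sample * 10u;
    assert(baud <= UINT32_MAX);
    return (uint32_t)baud;
}

/* Nonzero when the egress link can keep up with the ingress rate. */
int egress_keeps_up(uint32_t link_baud, uint32_t samples_per_s,
                    uint32_t bytes_per_sample)
{
    return link_baud >= required_baud(samples_per_s, bytes_per_sample);
}
```

At 125 ksps (one sample every 8us) and 2 raw bytes per sample, this gives 2.5 Mbaud, so a 921,600 baud link would not keep up with real-time streaming while a 4 Mbaud link would.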

     

    It might be possible to cobble together ADC->DMA->(SPI->MEM)->DMA->(MEM->UART) using the DMAMUX's DMA request generator and request chaining (no CPU involvement). I'm not sure, but it would be an interesting exercise to try. For this, egress throughput must be higher than ingress (real-time streaming).

     

     >  Each new pulse should trigger a new conversion of a CCD pixel voltage, where the period is determined by the ARR register.

     

    The DS is a little hard to parse, but it looks like this ADC has several modes.

     

    In "3-wire without busy" mode, IIUC, a conversion is started whenever CSn is deasserted (the SPI bus is idle). If you time your SPI read to start at least tcnv later, you can have the STM32 read out the data at 33MHz. All you need to do is figure out the right period in which to trigger the reads, and anything more than ~2us should be ok. I really think that might be all that's required.

     

    Alternatively, in "3-wire with busy" mode you can have the ADC output an interrupt signal when data is ready, and you can configure the DMAMUX request generator to use this signal to trigger a DMA write. This could work as well.

     

    I just don't think using "Timer Output Compare" is a good solution here. 

     

    > I miss how to program the last part: how to start the Timer-DMA.

     

    I'm gonna assume you'll relinquish the output compare idea.

     

    For a timer to periodically trigger a DMA transfer, follow the StackExchange code.

    The important steps are:

    1. link the DMA to the timer with __HAL_LINKDMA() (or some such).

    2. set the DMA mode (Peripheral-To-Memory).

    3. program the DMA with src/dest addresses and transfer length with HAL_DMA_Start.

    4. use __HAL_TIM_ENABLE_DMA(&htim, TIM_DMA_UPDATE); to allow the update event of the timer to trigger DMA (this is important).

    5. start the timer.

     

    You cannot do (all of) this with CubeMX code generation.

     

    Every time the timer overflows, the DMA will issue a DMA request to the SPI peripheral, which will read the number of bits configured at the baud rate configured. When the data is read, it will signal the DMA peripheral, which will read the data and store it in memory. That's it.

    Super User
    June 16, 2024

    You appear to want to run before having learned to walk.

    The 'H7 are overcomplicated beasts and now you have to cope with many concepts at once.

    > I cannot figure out how to define a Timer-DMA to perform its action on a SPI peripheral.

    First, have a look at Figure 1 System architecture in the RM. The MDMA is probably not very helpful in this situation; it's mostly aimed at heavy lifting in the AXI domain. BDMA is mainly aimed at working autonomously within the low-power domain when the rest of the chip is asleep, and it doesn't have access to peripherals beyond that domain (except AHB3/APB3).

    Thus, assuming you are going to use one of DMA1/DMA2, set up the trigger first. In the timer, enable DMA from the source you want (Update, or one of the Capture/Compare channels) by setting the respective TIMx_DIER.UDE/CCxDE bit. For DMAMUX1, look up the trigger from the timer in Table 118. DMAMUX1: assignment of multiplexer inputs to resources, and write it to one of the DMAMUX1_CxCR.DMAREQ_ID fields - the x there then determines to which DMA Stream this request will be routed.

    You then set up that DMA Stream to perform the transfers from memory (beware of caching issues) to given SPI data register or FIFO (the 'H7 SPI is again an overcomplicated beast, much more complicated than SPI in 'F7 or other families; and I am not familiar with it so can't give specific clues for that) by setting the SPI data register address in DMA Stream's Peripheral Address register, memory buffer address in Memory Address register, number of transfers in NDTR, set the appropriate direction, transfer size, etc. in Control register. You don't need to use FIFO at this point.
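The order of operations described above can be sketched at register level as follows. This is a host-side model, not target code: the structs are stand-ins so the sketch compiles on a PC, and on the MCU the same writes would go to the TIM8, DMAMUX1 channel and DMA stream registers from the device header. The bit positions are the usual STM32 stream-DMA and timer layout as I understand it and should be verified against the RM. The request ID is passed in because it must be looked up in Table 118, addresses are plain 32-bit numbers, and the function name is invented. The SPI-side setup and cache handling are not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-ins for the registers named in the post. */
typedef struct { uint32_t DIER; } tim_regs_t;
typedef struct { uint32_t CCR; } dmamux_chan_regs_t;
typedef struct { uint32_t CR, NDTR, PAR, M0AR; } dma_stream_regs_t;

#define TIM_DIER_CC1DE_BIT   (1u << 9)   /* DMA request on capture/compare 1 */
#define DMAMUX_DMAREQ_ID_MSK 0xFFu       /* request ID field, starts at bit 0 */
#define DMA_CR_EN            (1u << 0)
#define DMA_CR_DIR_M2P       (1u << 6)   /* DIR = 01: memory to peripheral   */
#define DMA_CR_PSIZE_16      (1u << 11)  /* peripheral size: half-word       */
#define DMA_CR_MSIZE_16      (1u << 13)  /* memory size: half-word           */

/* Route a timer compare request to a DMA stream that copies one fixed
 * 16-bit dummy word to the SPI data register per request, `count` times. */
void setup_timer_triggered_tx(tim_regs_t *tim, dmamux_chan_regs_t *mux,
                              dma_stream_regs_t *st, uint32_t req_id,
                              uint32_t spi_txdr_addr, uint32_t dummy_addr,
                              uint16_t count)
{
    assert(req_id <= DMAMUX_DMAREQ_ID_MSK && count > 0);

    st->CR &= ~DMA_CR_EN;                /* stream off while it is configured */
    mux->CCR = (mux->CCR & ~DMAMUX_DMAREQ_ID_MSK) | req_id;
    st->PAR  = spi_txdr_addr;            /* destination: SPI data register    */
    st->M0AR = dummy_addr;               /* source: the constant dummy word   */
    st->NDTR = count;                    /* one transfer per timer request    */
    /* Neither PINC nor MINC is set, so both addresses stay fixed. */
    st->CR = DMA_CR_DIR_M2P | DMA_CR_PSIZE_16 | DMA_CR_MSIZE_16;
    st->CR |= DMA_CR_EN;
    tim->DIER |= TIM_DIER_CC1DE_BIT;     /* compare 1 now raises DMA requests */
}
```

The DMAMUX channel passed in has to be the one paired with the chosen stream, and the timer itself still has to be started afterwards.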

    This may or may not be that simple to click in CubeMX. I don't know, I don't use Cube/CubeMX. Generally, Cube inevitably implements only a fraction of what the hardware is capable of - whatever Cube's authors deemed "typical" - and is helpful as long as you want something from that fraction. Otherwise it may or may not get into your way more than help. Now you've been warned.

    JW

    FredSAuthor
    Explorer
    June 16, 2024

         Good day Jan,

    Thank you very much for your straightforward response to my request for help. Over time I have read many of your replies to questions posed by me and by other members. From those texts I conclude you are a "no-nonsense" guru with a broad overview, willing to advise people of all kinds of skill levels. And despite your aversion to CUBEIDE, you respect people who need such a framework (like me) to get their application running, and you still guide them towards understanding and/or a solution. Thank you very much for that attitude!

    BarryWhit and you gave me a lot of directions, corrections, and advice which I have to chew on. It will take me some time to comprehend the new information and then implement and test a new approach, so a lack of updates on this forum does not mean I am ignoring it.

    Thank you both a lot for your attempts to get me on the right track,

    many greetings,

    Fred Schimmel.

    Graduate II
    June 16, 2024

    Dear Fred,

     

    >> a conversion is started whenever CSn is deasserted 

    > This is just where I want to apply TIM8-CH1

     

    Fred, I've done my best to communicate to you my belief (it is always possible that I'm the one in the wrong, of course) that you have a misunderstanding of how this part works. I'll try one last time. You seem to think that the 1.4us delay required between toggling CSn and the start of clocking out the data means that you have to toggle CSn separately, carefully manage a 1.4us delay, and only then trigger an SPI read. This is not what the (obfuscating) datasheet says or what its timing diagrams show. You're thinking about it in the wrong way.

    The CSn logic starts the conversion when you *deassert* CSn (i.e. when it goes High) and it is only when you issue an SPI read that the SPI peripheral will assert CSn (set it low).  So *The conversion doesn't start just before you issue a read, it starts as soon as the previous read concludes*, i.e. when the SPI peripheral releases the bus. Thus, you don't have to do anything special to trigger a conversion, except to space your SPI reads sufficiently apart for the ADC to complete a full conversion during the "Idle time" between SPI transfers.
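The spacing condition can be written down as a simple check. This is a sketch under assumptions: it treats a read as exactly 16 SCLK periods with no extra SPI delays, and the function names are made up. The 1400ns conversion time and 30ns SCLK period used in the usage note are the figures quoted earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Idle time left between two 16-bit reads, in nanoseconds.
 * ASSUMPTION: a frame takes exactly 16 SCLK periods. */
uint32_t idle_time_ns(uint32_t sample_period_ns, uint32_t sclk_period_ns)
{
    uint32_t frame_ns = 16u * sclk_period_ns;
    assert(frame_ns < sample_period_ns);
    return sample_period_ns - frame_ns;
}

/* Nonzero if the idle time between reads covers the ADC conversion time. */
int conversion_fits(uint32_t sample_period_ns, uint32_t sclk_period_ns,
                    uint32_t t_conv_max_ns)
{
    return idle_time_ns(sample_period_ns, sclk_period_ns) >= t_conv_max_ns;
}
```

With a 30ns SCLK period and a 1400ns conversion time, a 2us read period leaves 1520ns of idle time and passes, while 1.8us leaves only 1320ns and does not.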

     

    I hope this drives the point home. If it doesn't - I surrender. :)

     

    Finally, I will also counsel you that ST has parts (cheaper, simpler, and less power-hungry ones than the H723) that include  two 4msps 12bit (16bit with oversampling) ADC. I've recently used the G431 for a pet project and was very happy with it (the G474 is its beefy bigger brother). WeAct sell very affordable boards with both these parts, and switching to those might just simplify your design a great deal. Of course this is assuming your CCD isn't part of some specialized module that bakes in the ADC chip as the sole interface to the sensor.

     

    Good luck.

    FredSAuthor
    Explorer
    June 17, 2024

        Hallo Barry,

    First I want to express my appreciation for your help and for your attempts to convince me that I misunderstand how the ADS8319 ADC must be handled to retrieve its data.

    I may have misled you with my description of the SPI peripheral configuration: "Full-Duplex with HW NSS control". The reality is: I left the SPI MOSI and NSS pins unconnected and applied the TIM8-CH1 PWM pulses to the CNVST input of the ADC. The timing scheme for '3-Wire CS Mode Without Busy Indicator (SDI = 1)' on datasheet page 20 shows the following; I added my signals at the left and below:

    ADCread_Timing_Diagram_2024-06-17.png

     I may seem a stubborn old ***, but I think this approach should be able to work (assuming I manage to program the SPI-read part). I already had a working situation with such layout, but running on a STM32F401CCU6 with an interrupt on TIM8 by DIER.CC1IE. The ISR was simple:

    void Adapted_SPI_TxISR_16BIT(struct __SPI_HandleTypeDef *hspi)
    {
        /* Transmit data in 16-bit mode */
        hspi->Instance->DR = (uint32_t)(*hspi->pTxBuffPtr);
        // don't update Tx-parameters, always transmit same word
    }   // end of ISR 'Adapted_SPI_TxISR_16BIT()'

    I was not satisfied with this solution because I had the impression the MCU could not handle 125k interrupts per second in a timely manner, causing some ADC conversions to be skipped.

    The remaining question is now: is this approach feasible? or are you still convinced, I am thinking in the wrong direction?

    As I wrote before: I have a lot to read, to think about and to experiment with, so don't expect any result on short notice.

    Have a nice day,

    Fred Schimmel

    Graduate II
    June 16, 2024

    @waclawek.jan just replied to someone in another thread and he made me realize something I didn't before (thanks for that).

    You can indeed trigger a dma directly from an output compare/Input capture event. You do not need the DMAMUX (though you might be able to do it that way too). 

    Again, you can't do it with CubeMX, but you can get a sense of how to do it by enabling a timer and then setting

    a DMA channel with DMA request "TIMx_COM". Generate the code, then look at the implementation

    of `HAL_TIM_OC_Start_DMA`. It hardcodes the source/target peripheral address to be one of the timer's registers (*), so it's unusable for general purpose, but other than that it's a complete example of how to trigger DMA from a timer event.

    To settle this once and for all, the DMA trigger sources are plainly listed in the RM in the documentation of the TIMx_DIER register. I should have checked this, but I did not.

     

    And yes, I still claim this is the wrong way to do what you're trying to do.

     

    (*) In the case of Input Capture, you can use HAL_TIM_IC_Start_DMA to have a series of capture values written to memory, which can be quite useful. In the case of Output Compare, I think HAL_TIM_OC_Start_DMA is used for waveform generation - for example, you can modify the interval between toggles (from a list in memory) each time the timer expires.

    Super User
    June 16, 2024

    > You do not need the DMAMUX

    In STM32 families with DMAMUX there is no way you can avoid using it, if you want to use DMA. Triggers (requests) for DMA go through DMAMUX.

    JW

    Graduate II
    June 16, 2024

    JW,  that's an invisible implementation detail. There is no explicit configuration of DMAMUX facilities, namely the DMA Request generator. Instead, you set the TIMx_DIER->CC1R register bit. This is exactly how it works in the STM32F103 for example. Still, I accept you're correct in terms of architecture.

    Super User
    June 17, 2024

    > that's an invisible implementation detail. There is no explicit configuration of DMAMUX facilities, namely the DMA Request generator.

    You still have to set up DMAMUX facilities, namely the DMA request selection.

    This may appear to be "invisible" to Cube users, but not everybody uses Cube.

    > Instead, you set the TIMx_DIER->CC1R register bit.

    You probably meant TIMx_DIER.CCxDE. It's not "instead", it's "in addition".

    > This is exactly how it works in the STM32F103 for example.

    The way DMA requests are handled is not irrelevant. In the 'F103 example, DMA requests from particular sources are steered to particular DMA Channels, so you cannot arbitrarily select a DMA Channel (as you can in models with DMAMUX). Also, you have to keep in mind that the requests to that particular DMA Channel are ORed, i.e. if you enable multiple colliding requests in the respective peripherals, you are in for a nasty surprise.

    There are other DMA requests schemes implemented in other STM32 families, too; these things evolve and DMAMUX is one of the evolution steps. You may click in Cube to find out the constraints, but also you can simply read the respective manual/datasheet and design the system from a thorough understanding of it.

    JW

    Graduate II
    June 17, 2024

    JW, I appreciate both your attention to detail and your deep knowledge. And your corrections. I also agree that the way to gain a deep understanding of a part is to know it at the register level, not the GUI level. But I'm not there yet.

     

    I've verified what you say is true. When  Cube generates for G4 (a Part which includes a DMAMUX),

    main->MX_TIMx_Init->HAL_TIM_Base_Init->HAL_TIM_Base_MspInit ->HAL_DMA_Init

    does program the DMAMUX registers. Like you said, those who write bare-metal code can't ignore the DMAMUX in this scenario, so it's by no means "transparent" as I suggested.

     

    Graduate II
    June 16, 2024

    I created a bug report thread for the recurring pain point of triggering DMA from timer event with CubeMX/HAL

    Usability: CubeMX/HAL is a footgun if you want to trigger SPI/UART/foo DMA from timer events 

    Super User
    June 17, 2024

    > I had the impression the MCU could not handle 125k interrupts per second in a timely manner

    At an 84MHz system clock, you had 672 machine cycles per interrupt. ISR entry/exit, C function prologue/epilogue, and qualifying and clearing the interrupt source may take around 50 cycles, maybe a tad more; with an ISR that does nothing but fetch a variable and store it into a peripheral register, that may push towards 100 cycles. Out of the 672. That's more than enough, provided you don't use another lengthy interrupt (e.g. USB) with the same or higher priority.
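The cycle budget is easy to reproduce. The helper names below are made up for this sketch, and the 100-cycle ISR cost is the rough estimate given above, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Core clock cycles available between two consecutive interrupts. */
uint32_t cycles_per_interrupt(uint32_t core_hz, uint32_t irq_per_s)
{
    assert(irq_per_s > 0);
    return core_hz / irq_per_s;
}

/* Share of the CPU spent in the ISR, in tenths of a percent (rounded down). */
uint32_t isr_load_permille(uint32_t isr_cycles, uint32_t core_hz,
                           uint32_t irq_per_s)
{
    return (isr_cycles * 1000u) / cycles_per_interrupt(core_hz, irq_per_s);
}
```

For 84MHz and 125k interrupts per second this gives 672 cycles per interrupt, and an ISR costing about 100 cycles would use roughly 15% of the CPU.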

    What I guess choked your mcu was the overhead brought in by Cube's method of handling the interrupts, possibly exacerbated by not using compiler optimizations.

    JW

     

    FredSAuthor
    Explorer
    June 17, 2024

        Hallo Jan,

    After stating that 8us were more than enough to handle the interrupt, you added a reservation: "provided you don't use other lengthy interrupt (e.g. USB) with the same or higher priority".

    That was exactly the case: I had also added code to export the collected data as a stream of hex values in ASCII by means of USB_OTG_FS, which is a big piece of code and probably consumes a lot of CPU cycles.

    Anyway, thank you for pointing out that the simple ISR alone would fit in the available time on a 84MHz clocked MCU.

    Have a nice day,

    Fred Schimmel

    Graduate II
    June 17, 2024

    Fred, I wish you nothing but success with your chosen approach. It's possible it will work just fine. I think taking advantage of the hardware support for CSn in the SPI peripheral would be a far simpler (and natural, and perhaps even the manufacturer's recommended) way to do it, and I tried to push you in that direction. But it's your project - only you can decide on the direction you want to take.

     

    You now know you can trigger a DMA directly from an output compare event (I laid out roughly how to go about it with Cube/HAL, and JW gave a detailed explanation at the register level). You said several times that implementing this was the primary obstacle. That means you've made definite progress and therefore all is well.

    Graduate II
    June 17, 2024

    Wow, this is interesting. I'm not sure which families it applies to but it seems to suggest the described behavior is by design (i.e. applies to all STM32 families):

    STM32 gotcha No. 21: SPI master NSS (CSn) is unusable 

     

    FredSAuthor
    Explorer
    June 17, 2024

    Good day Barry,

    Thank you for your understanding and best wishes.

    And what's more important: informing me (and all other readers) about the real behavior of the HW SPI NSS signal prevents me from struggling with an incomplete picture of the reality.

    I hope to report some success after some time,

    many greetings,

    Fred Schimmel