Explorer
January 31, 2024
Question

NUCLEO-U5A5ZJ-Q USB CDC ACM maximum speed using USBx

  • January 31, 2024
  • 2 replies
  • 4949 views

Hi,

I am using a NUCLEO-U5A5ZJ-Q board to set up a USB VCP connection with a PC. I am currently using the CDC-ACM example project and get a maximum speed of up to 4.4 MB/s over the connection after increasing the Tx FIFO size and the max packet size in the device-side example code.

 

 HAL_PCDEx_SetTxFiFo(&hpcd_USB_OTG_HS, 0, USBD_MAX_EP0_SIZE/4);
 HAL_PCDEx_SetTxFiFo(&hpcd_USB_OTG_HS, 1, 1920);
 HAL_PCDEx_SetTxFiFo(&hpcd_USB_OTG_HS, 2, USBD_CDCACM_EPINCMD_HS_MPS/4);

 

Since this example is a USB-UART bridge, I have removed the UART part and kept only the usbx_cdc_acm_write_thread_entry thread function enabled for sending data to the PC. I have written a Python script on the Windows side to receive the buffer.

I have attached the file ux_user.h for reference,

 

 

I have also increased the memory pool sizes for the device:

 

#define TX_APP_MEM_POOL_SIZE (1024*1024)

#define UX_DEVICE_APP_MEM_POOL_SIZE (500*1024)

#define USBPD_DEVICE_APP_MEM_POOL_SIZE (10000)

 

Also the stack size,

 

#define USBX_DEVICE_MEMORY_STACK_SIZE (100*1024)

#define UX_DEVICE_APP_THREAD_STACK_SIZE 1024

 

I pass a buffer of 32768 bytes to the ux_device_class_cdc_acm_write function and send 100 * 32768 bytes in a loop to check the speed, receiving them on the PC side.

Or do I need to use the non-blocking function ux_device_class_cdc_acm_write_with_callback to get maximum throughput? I have tried that anyway, by disabling the macro UX_DEVICE_CLASS_CDC_ACM_TRANSMISSION_DISABLE and setting up the callback functions in the USBD_CDC_ACM_Activate function.

 

 /* Start Bulk transmission thread */
 UX_SLAVE_CLASS_CDC_ACM_CALLBACK_PARAMETER CDC_VCP_Callback;
 CDC_VCP_Callback.ux_device_class_cdc_acm_parameter_read_callback = &USBD_CDC_ACM_read_callback;
 CDC_VCP_Callback.ux_device_class_cdc_acm_parameter_write_callback = &USBD_CDC_ACM_write_callback;
 if (ux_device_class_cdc_acm_ioctl(cdc_acm, UX_SLAVE_CLASS_CDC_ACM_IOCTL_TRANSMISSION_START,
                                   &CDC_VCP_Callback) != UX_SUCCESS)
 {
   Error_Handler();
 }

 

Unfortunately the code somehow ends up in the hard fault handler. I haven't dug into it much, since I am not sure whether it would solve the speed issue. Is there an example project for this callback mode, if it provides better throughput?

Is there any configuration I can change in this example code to get at least 10 MB/s over the USB HS VCP class? Or is this a limitation of the USBx stack? Do I need to write my own USB stack instead of using USBx to get more speed? Since USB HS supports up to 480 Mbps, I think I should get at least half of that, i.e. 240 Mbps (30 MB/s). What is the limiting factor for the speed in this application?

Please let me know your suggestions.

Thanks


    2 replies

    Kannan1Author
    Explorer
    January 31, 2024

    Update:

    Here is the MX_USB_OTG_HS_PCD_Init function. DMA is disabled; do I need to enable DMA for better speed?

    void MX_USB_OTG_HS_PCD_Init(void)
    {
    
     /* USER CODE BEGIN USB_OTG_HS_Init 0 */
    
     /* USER CODE END USB_OTG_HS_Init 0 */
    
     /* USER CODE BEGIN USB_OTG_HS_Init 1 */
    
     /* USER CODE END USB_OTG_HS_Init 1 */
     hpcd_USB_OTG_HS.Instance = USB_OTG_HS;
     hpcd_USB_OTG_HS.Init.dev_endpoints = 9;
     hpcd_USB_OTG_HS.Init.speed = PCD_SPEED_HIGH;
     hpcd_USB_OTG_HS.Init.phy_itface = USB_OTG_HS_EMBEDDED_PHY;
     hpcd_USB_OTG_HS.Init.Sof_enable = DISABLE;
     hpcd_USB_OTG_HS.Init.low_power_enable = DISABLE;
     hpcd_USB_OTG_HS.Init.lpm_enable = DISABLE;
     hpcd_USB_OTG_HS.Init.use_dedicated_ep1 = DISABLE;
     hpcd_USB_OTG_HS.Init.vbus_sensing_enable = DISABLE;
     hpcd_USB_OTG_HS.Init.dma_enable = DISABLE;
     if (HAL_PCD_Init(&hpcd_USB_OTG_HS) != HAL_OK)
     {
       Error_Handler();
     }
     /* USER CODE BEGIN USB_OTG_HS_Init 2 */
    
     /* USER CODE END USB_OTG_HS_Init 2 */
    
    }

     

    Visitor II
    February 1, 2024

    Good question:
    I had to measure my U5A5 VCP UART speed as well (yes, I am also using a U5A5 with VCP UART).
    The U5A5 should use OTG USB HS (with integrated PHY), so we would "assume" the speed is now 480 Mbps.

    But that is not true!

    Did you know that an MCU running a VCP UART device depends on the host (in terms of USB timing)? The link (wire) speed (e.g. 480 Mbps) does not matter so much: it is more related to the question "when will the host (PC) ask again for new characters (or when will it send new characters from host to MCU)?".

    I understand VCP UART in this way:

    • the MCU provides the Device, but the host controls the timing, e.g. when the MCU is able to send something, or with which period the host sends something to the MCU
    • FS and HS can differ, e.g. HS can use USB "micro-frames" (and maybe send more often)

    Usually I think of VCP UART (as Device on the MCU) this way:

    • the host (as master) will "query" the device (MCU), e.g. every 1 ms (in FS mode)
    • it will ask for new bytes to receive on the host side (so that the MCU is "allowed" to send), or it will send some bytes from the host to the MCU
    • the max. packet size (number of characters) is 64 for USB VCP in FS mode (a high-speed bulk endpoint allows 512 bytes)

    But it all comes down to "understanding USB": FS vs. HS (with faster, more frequent transmission using "micro-frames").

    You cannot "judge" the MCU: the USB speed is controlled by USB and especially by the host. The Device cannot send anything without being "requested/allowed" to do so. So the possible throughput depends heavily on the host side.

    My suggestion:

    • always send the maximum packet size, e.g. these 64 characters, on both sides (from host to MCU, from MCU to host)
    • now measure the "throughput" (in bits per second)
    • you cannot improve beyond what you get on the MCU side: the Master (the PC as USB Host) sets the timing (when the MCU is asked to send something back to the host; if this "query" comes only every x ms, there is nothing you can do on the Device side)

    My guess: with VCP UART you can reach a throughput of up to 4..8 Mbps, regardless of whether FS or HS is used.

    Technical Moderator
    February 5, 2024

    Hello @Kannan1 

    > Unfortunately the code goes to hardfault handler somehow.

    Would you share the fault analysis? 

    For USB OTG HS, enabling DMA can potentially improve the data transfer speed. OTG_HS embeds an internal DMA with thresholding support and a software-selectable AHB burst type in DMA mode.

    Visitor II
    February 6, 2024

    Just raising questions:

    You configure too many large memories, e.g.:

    #define TX_APP_MEM_POOL_SIZE (1024*1024)
    
    #define UX_DEVICE_APP_MEM_POOL_SIZE (500*1024)

    This is already about 1.5 MB, plus some other configs (malloc heap, stack...).

    Are you sure you have that much memory?

    You potentially end up in the hard fault handler because you try to use memory outside the available space ("jumping into the forest").

    For ACM (USB VCP UART) it should really be enough to test with 64-byte packets. As I understand it:

    • the max. packet size for one ACM/VCP UART packet is 64 bytes at full speed (a high-speed bulk endpoint allows 512 bytes)
    • see the EP configuration in the USB stack, e.g. for Enumeration/Descriptors
    • check the DMA config for USB: whether it is really able to send more than 64 bytes per USB packet
    • but why send more than USB allows in one packet?

    Even in my ACM project the DMA is set to "disable":

    void MX_USB_OTG_HS_PCD_Init(void)
    {
     hpcd_USB_OTG_HS.Instance = USB_OTG_HS;
     hpcd_USB_OTG_HS.Init.dev_endpoints = 9;
     /*
     * ATTENTION: this must be FS speed in order to work on Android Ethernet tethering!
     * enumeration works but not the DHCP server to get an IP address. This setting solves the issue!
     */
     hpcd_USB_OTG_HS.Init.speed = PCD_SPEED_HIGH;		//PCD_SPEED_FULL;	//PCD_SPEED_HIGH;
     hpcd_USB_OTG_HS.Init.phy_itface = USB_OTG_HS_EMBEDDED_PHY;
     hpcd_USB_OTG_HS.Init.Sof_enable = DISABLE;
     hpcd_USB_OTG_HS.Init.low_power_enable = DISABLE;
     hpcd_USB_OTG_HS.Init.lpm_enable = DISABLE;
     hpcd_USB_OTG_HS.Init.use_dedicated_ep1 = DISABLE;
     hpcd_USB_OTG_HS.Init.vbus_sensing_enable = DISABLE;
     hpcd_USB_OTG_HS.Init.dma_enable = DISABLE;
     if (HAL_PCD_Init(&hpcd_USB_OTG_HS) != HAL_OK)
     {
       Error_Handler();
     }
    }

    I assume that running USB with DMA needs much more than just setting it to ENABLE: a DMA engine/channel must be enabled, DMA interrupt handlers have to be there... maybe this is what causes your hard fault handler to be called...

     

     

    Visitor II
    February 6, 2024

    BTW, I tried it: I have enabled DMA:

    void MX_USB_OTG_HS_PCD_Init(void)
    {
     hpcd_USB_OTG_HS.Instance = USB_OTG_HS;
     hpcd_USB_OTG_HS.Init.dev_endpoints = 9;
     /*
     * ATTENTION: this must be FS speed in order to work on Android Ethernet tethering!
     * enumeration works but not the DHCP server to get an IP address. This setting solves the issue!
     */
     hpcd_USB_OTG_HS.Init.speed = PCD_SPEED_HIGH;		//PCD_SPEED_FULL;	//PCD_SPEED_HIGH;
     hpcd_USB_OTG_HS.Init.phy_itface = USB_OTG_HS_EMBEDDED_PHY;
     hpcd_USB_OTG_HS.Init.Sof_enable = DISABLE;
     hpcd_USB_OTG_HS.Init.low_power_enable = DISABLE;
     hpcd_USB_OTG_HS.Init.lpm_enable = DISABLE;
     hpcd_USB_OTG_HS.Init.use_dedicated_ep1 = DISABLE;
     hpcd_USB_OTG_HS.Init.vbus_sensing_enable = DISABLE;
     hpcd_USB_OTG_HS.Init.dma_enable = ENABLE;	//DISABLE;
     if (HAL_PCD_Init(&hpcd_USB_OTG_HS) != HAL_OK)
     {
       Error_Handler();
     }
    }

    It works as before (no issues for me, but also no difference).

    I have tried it with my ACM code (U5A5, Azure RTOS). Even though I do not trust the tick count frequency, this is what I get:

    Send 640 KB of data from MCU to host:

    	int i;
    	unsigned int startTS, endTS;
    	startTS = HAL_GetTick();
    	for (i = 0; i < 10000; i++)
    		VCP_UART_Send((const uint8_t *)"1111111111222222222233333333334444444444555555555566666666667777", 64);
    	endTS = HAL_GetTick();
    	print_log(out, "\r\nstart: %u | end: %u | delta: %u | %u bytes\r\n", startTS, endTS, endTS - startTS, (unsigned int)(i * 64));

    It reports:

    start: 763 | end: 1264 | delta: 501 | 640000 bytes

    My Azure RTOS HAL_GetTick() seems to be wrong:
    It takes approx. 1 s to send these 640,000 bytes (based on observation in the host terminal).
    The elapsed tick count of 501 actually seems to correspond to 1000 ms.

    This would give a VCP UART (ACM) throughput of approx. 5,120,000 bps (5.1 Mbps).

    Even though I have enabled HS, I was also expecting a higher throughput.
    Anyway, it is still in the range I expected for ACM/VCP UART: 5...7 Mbps (even with USB configured as HS). It depends on the host, e.g. the host requests a new response from the MCU only every 1 ms, or with an even longer period when it is busy.

    Maybe:
    The host's display of the received bytes slows it down! The PC prints all the received bytes in TeraTerm. Maybe this reduces the speed (the host will not request data as fast anymore when it cannot display it as fast).

    Remark:
    Measuring the USB VCP (ACM) throughput depends heavily on how the host interacts. You cannot judge the performance/speed of the MCU just by measuring what you can do on the MCU side: the MCU is a Device and the timing depends on the Host (PC), and the host can slow things down!

    What you measure in the end is "what the host is capable of processing in real time" (what it allows the MCU to send).