DHase.1
Associate III
June 7, 2022
Question

Can the STM32 CRC peripheral be made to work with the CRC-15_CAN polynomial?


Can the CRC for the ADBMS1818 (and other Analog Devices BMS parts) be generated using the STM32 CRC peripheral?

The ADBMS1818 datasheet shows the 15-bit CRC polynomial as--

$x^{15} + x^{14} + x^{10} + x^{8} + x^{7} + x^{4} + x^{3} + 1$

Other sources call this the CRC-15-CAN polynomial, e.g., the Wikipedia article "Cyclic redundancy check".

Wikipedia lists this polynomial as "even", and the Reference Manual for the STM32L431 says that the CRC peripheral does not support even polynomials. However, neither source makes clear exactly what it means by "even", so it is not clear whether the 'L431 CRC can handle this polynomial.

However, if the problem is odd/even, there might be some tricks to make it work, e.g., reversing the polynomial with a zero added, but I'm not sure that is possible. My attempts so far have not been successful.
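Since the question turns on what "even" means, a small sketch can separate the two possible readings. Both readings are assumptions, not definitions quoted from either source: (1) an even number of nonzero terms, which is what Wikipedia's parity column appears to count, and (2) a zero x^0 coefficient, i.e., an even value when the coefficients are written as a binary number, which is one plausible reading of the RM. The helper names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: count the nonzero terms of a polynomial stored as
   bits (bit k = coefficient of x^k). */
static int term_count(uint32_t poly_bits)
{
    int n = 0;
    while (poly_bits) {
        n += (int)(poly_bits & 1u);
        poly_bits >>= 1;
    }
    return n;
}

/* Reading 1 (assumed Wikipedia "parity"): an even number of terms.
   Pass the full polynomial, including the top term. */
static bool has_even_term_count(uint32_t full_poly_bits)
{
    return (term_count(full_poly_bits) % 2) == 0;
}

/* Reading 2 (assumed RM meaning): x^0 coefficient is 0, i.e. an even value. */
static bool has_even_value(uint32_t poly_bits)
{
    return (poly_bits & 1u) == 0;
}
```

Under these assumed readings, the full CAN polynomial 0xC599 has 8 terms (even parity), 0x4599 itself is an odd value, and the shifted value 0x8B32 is an even value.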

Finally, the datasheet includes an example software routine for generating the CRC. It represents the polynomial as 0x4599, whereas Wikipedia and its references show it as 0xC599 (16 bits!).

For a software implementation the usual table lookup is probably satisfactory. However, if it is possible to use the 'L431 hardware, it would save the memory for the table and give a small improvement in computation time.


Tesla DeLorean
Guru
June 7, 2022

Pretty sure the ST hardware doesn't explicitly support 15-bit CRCs, but it might work if you set it up carefully and mask/shift the answer; it really depends on the feed direction, alignment and injection point.

0x4599 and 0xC599 are basically the same; the high-order bit is typically left implicit. For a 15-bit CRC, 2**15 (x**15) still fits in a 16-bit number space (hence 0xC599). For a 16-bit CRC, 2**16 would not fit, but it can be seen as the carry out of the register as it shifts.
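A minimal sketch of that point, using hypothetical helper names: build the polynomial from the exponents in the datasheet expression. Writing every term, including x^15, gives 0xC599; dropping the implicit top term gives the usual 0x4599.

```c
#include <stdint.h>
#include <stddef.h>

/* Exponents of x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1 */
static const unsigned crc15_exponents[] = {15, 14, 10, 8, 7, 4, 3, 0};

/* Full form: every term, including the top x^15 term, set as a bit. */
static uint16_t crc15_full_poly(void)
{
    uint16_t poly = 0;
    for (size_t i = 0; i < sizeof crc15_exponents / sizeof crc15_exponents[0]; i++)
        poly |= (uint16_t)(1u << crc15_exponents[i]);
    return poly;
}

/* Conventional "normal" form: the top term is implicit, so drop bit 15. */
static uint16_t crc15_normal_poly(void)
{
    return (uint16_t)(crc15_full_poly() & 0x7FFFu);
}
```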

The STM32 implementation isn't a rocket either; the bus takes at least 4 cycles.

Do you have some example test patterns?

DHase.1 (Author)
Associate III
June 8, 2022

Thanks for the response.

I tried shifting 0x4599 left one bit, making it 0x8B32. With the seed/initial value also shifted left one bit, the result matches the software routine's output when the input data is all zeros (for various lengths), but not when the data contains a 1 bit.

As for a test pattern, I've been using the ADBMS1818 datasheet example: the two-byte input {0x00, 0x01} produces 0x3D6E. It uses the polynomial 0x4599 with a seed/initial value of 0x10, and shifts the 15-bit result left by 1.

One possibility I haven't investigated is whether two CRCs could be generated using the 8-bit and 7-bit polynomial size settings and then combined. That would depend on being able to factor the 15-bit polynomial into 8b and 7b polynomials whose product is the original. ...

Speed isn't a big issue in this application, and saving the flash used by the lookup table is unlikely to be critical, but it could become a factor in staying within the flash limit of a 'B (128K flash) part. A simple bit-by-bit shifting computation avoids the table when speed is not important, so for this application the question is somewhat academic (and also interesting).
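For reference, a bit-by-bit version might look like the following. It is a sketch of the datasheet algorithm as summarized in this thread (polynomial 0x4599, seed 0x10, MSB first, 15-bit result shifted left by 1), not the datasheet's own code, and the function name is hypothetical. It needs no table and reproduces the {0x00, 0x01} example, giving 0x3D6E.

```c
#include <stdint.h>
#include <stddef.h>

/* Bit-by-bit PEC15: CRC-15, polynomial 0x4599, processed MSB first. */
static uint16_t pec15_bitwise(const uint8_t *data, size_t len)
{
    uint16_t rem = 0x0010;                      /* PEC15 seed */
    for (size_t i = 0; i < len; i++) {
        rem ^= (uint16_t)(data[i] << 7);        /* align byte with bits 14..7 */
        for (int bit = 0; bit < 8; bit++) {
            /* If bit 14 is set, shifting it out corresponds to the implicit
               x^15 term, so XOR in the rest of the polynomial. */
            if (rem & 0x4000u)
                rem = (uint16_t)((rem << 1) ^ 0x4599u);
            else
                rem = (uint16_t)(rem << 1);
            rem &= 0x7FFFu;                     /* keep 15 bits */
        }
    }
    return (uint16_t)(rem << 1);                /* ADBMS1818 16-bit format */
}
```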

Tesla DeLorean
Guru
June 8, 2022
#include <stdio.h>
#include <stdint.h>

/* CRC-15 (polynomial 0x4599), 4 bits at a time via a 16-entry table.
   crc holds the 15-bit remainder in bits 14..0. */
uint16_t Quick_CRC_Calc15Bits(uint16_t crc, int Size, uint8_t *Buffer)
{
  static const uint16_t CrcTable[] = { // Nibble Table for 0x4599, sourcer32@gmail.com
    0x0000,0xC599,0xCEAB,0x0B32,0xD8CF,0x1D56,0x1664,0xD3FD,
    0xF407,0x319E,0x3AAC,0xFF35,0x2CC8,0xE951,0xE263,0x27FA };

  while(Size--)
  {
    crc = crc ^ (*Buffer++ << (15-8)); // Align byte with the upper bits (14..7)

    crc = (crc << 4) ^ CrcTable[(crc >> (15-4)) & 0xF]; // Process byte 4 bits at a time
    crc = (crc << 4) ^ CrcTable[(crc >> (15-4)) & 0xF];
  }

  return(crc & 0x7FFF); // Keep the 15-bit CRC
}

int main(void)
{
  uint8_t data[] = { 0x00, 0x01 };

  // Seed 0x0010; shift left one bit for the ADBMS1818 16-bit PEC format
  printf("crc=%04X Quick\n", (unsigned)(Quick_CRC_Calc15Bits(0x0010, sizeof(data), data) << 1));
  return 0;
}

DHase.1 (Author)
Associate III
June 9, 2022

Thanks again. The different approaches I now have all agree. Works!

I was close, but I wasn't using HAL_CRC_Calculate, and my pointer cast was creating a word pointer rather than a byte pointer.

So, the answer to the posted question is "yes," and it is done by shifting the polynomial and seed/initial left one bit.
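Why the shift works can be checked in plain C. The sketch below models the peripheral in software under the configuration used in this thread (16-bit polynomial size, no bit reversal, byte writes, MSB first, no final XOR); it is an illustrative model, not ST code, and both function names are hypothetical. Because the polynomial and seed are both shifted left one bit, bit 0 of the 16-bit register is never set, and the register ends up holding the 15-bit CRC already in the ADBMS1818 shifted format.

```c
#include <stdint.h>
#include <stddef.h>

/* Software model of a 16-bit, MSB-first, non-reflected CRC register fed
   one byte at a time, with no final XOR. */
static uint16_t crc16_msb_model(uint16_t poly, uint16_t init,
                                const uint8_t *data, size_t len)
{
    uint16_t crc = init;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);        /* byte enters at the top */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ poly)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* CRC-15 on 16-bit hardware: polynomial and seed shifted left one bit. */
static uint16_t pec15_via_16bit(const uint8_t *data, size_t len)
{
    return crc16_msb_model((uint16_t)(0x4599u << 1),   /* 0x8B32 */
                           (uint16_t)(0x0010u << 1),   /* 0x0020 */
                           data, len);
}
```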

DHase.1 (Author)
Associate III
June 12, 2022

I did some machine-cycle comparisons of a 256-entry (byte) table lookup, a 16-entry (nibble) table lookup, HAL_CRC_Calculate, and the CRC peripheral used without HAL. For a six-byte ADBMS1818 command the cycle counts were 111, 155, 114, and 24, respectively.

The problem with HAL_CRC_Calculate (byte format) is that it spends cycles packing the bytes into words, and into half-words when possible. A "for" or "while" loop that writes one byte at a time directly is faster: 86 cycles for six bytes.

HAL also brings in a lot of code. I didn't count the bytes used by MX_CRC_Init(), but it took 283 machine cycles. At roughly two bytes per cycle that would be about 566 bytes, although the cycle count somewhat overstates the number of half-word instructions because of branches, pushes and pops (I didn't see any loops when looking at the code). For convenience the test was run in a hacked FreeRTOS program, and the size jumped by 2048 bytes when the CRC was activated in STM32CubeMX. I suspect the additional code merely pushed the code size into another block.

The advantage of using HAL is that it takes care of the low-level setup, and one gains flexibility when moving to different STM32 versions and different applications. However, for a specific application such as this, going bare metal has size and speed advantages. Getting the casts, volatile qualifiers, etc., right for pointers can be a bit tricky. For this application, here is my non-HAL implementation for the STM32L431--

/******************************************************************************
* File Name : pec15_reg.c
* Date First Issued : 06/11/2022
* Description : ADBMS1818 PEC computation: non-HAL register direct
*******************************************************************************/
#include "pec15_reg.h"
 
/* *************************************************************************
 * void pec15_reg_init (void);
 * @brief : Initialize RCC and CRC registers for ADBMS1818 CRC-15 computation
 * *************************************************************************/
#define CRCBASE ((__IO uint32_t*)0x40023000) // CRC peripheral base (STM32L431)
#define SEED 0x10 // ADBMS1818 PEC15 initial value
void pec15_reg_init (void)
{
 __IO uint32_t* rccbase = (__IO uint32_t*)0x40021000; // RCC base

 /* RCC_AHB1ENR (offset 0x48 = 0x12 words), bit 12 CRCEN: CRC clock enable */
 *(rccbase+0x12) |= 0x1000; // Set CRCEN bit

 /* Set CRC registers (word offsets from CRCBASE). */
 *(CRCBASE+4) = SEED*2; // CRC_INIT (offset 0x10): seed * 2
 *(CRCBASE+5) = 0x8B32; // CRC_POL (offset 0x14): polynomial 0x4599 * 2
}
 
/* *************************************************************************
 * uint16_t pec15_reg (uint8_t *pdata , int len);
 * @brief : Reset and compute CRC
 * @param : pdata = pointer to input bytes
 * @param : len = number of bytes
 * @return : CRC-15 * 2 (ADBMS1818 16b format)
 * *************************************************************************/
uint16_t pec15_reg (uint8_t *pdata , int len)
{
 /* CRC_CR (offset 0x08): POLYSIZE = 16 bits, plus RESET = 0x9. */
 *(CRCBASE+2) = 0x9;

 uint8_t* pend = pdata + len;
 while (pdata < pend) // writes nothing when len == 0
 {
  *(__IO uint8_t*)CRCBASE = *pdata++; // byte write to CRC_DR
 }

 return (uint16_t)*CRCBASE; // CRC_DR: CRC-15 * 2
}
/******************************************************************************
* File Name : pec15_reg.h
* Date First Issued : 06/11/2022
* Description : ADBMS1818 PEC computation: non-HAL register direct
*******************************************************************************/
#ifndef __PEC15_REG
#define __PEC15_REG

#include <stdint.h>

#ifndef __IO
#define __IO volatile // normally supplied by the CMSIS core header
#endif

/* *************************************************************************/
 void pec15_reg_init (void);
 /* @brief : Initialize RCC and CRC registers for CRC-15 computation
 * *************************************************************************/
 uint16_t pec15_reg (uint8_t *pdata , int len);
/* @brief : Reset and compute CRC
 * @param : pdata = pointer to input bytes
 * @param : len = number of bytes
 * @return : CRC-15 * 2 (ADBMS1818 16b format)
 * *************************************************************************/

#endif

waclawek.jan
Super User
June 12, 2022

If you always calculate the CRC on 6 bytes, you can consider an unrolled loop. You can also consider the packing. I don't know why the ST implementation is inefficient, but I am not going to investigate; I really don't care about Cube.

 > The advantage of using HAL is that it takes care of the low level setup and one gains flexibility when it comes to moving to different STM32 versions

Yes, this is how ST advertises it, and academia for some inexplicable reason echoes that. But if you are willing to read the manual, the low-level setup is in most cases trivially simple (how hard was it to set up the CRC?), and portability ends exactly where the hardware starts to differ. Try porting your algorithm to the 'F4 (hint: the CRC in the 'F4 has its polynomial fixed in hardware).

JW

DHase.1 (Author)
Associate III
June 13, 2022

waclawek.jan,

>You can also consider the packing. I don't know why the ST implementation is inefficient, but I am not going to investigate; I really don't care about Cube.

Unless I missed it, the Ref Manual doesn't mention that when the input is bytes and a 32-bit word is written to CRC_DR, the bytes in that word are processed in reverse order, i.e., big-endian rather than the "natural" little-endian order. When there are 4 or more bytes, HAL_CRC_Calculate packs 4 bytes into a word, but to get the proper byte order it has to build the word with a series of four shift/OR operations. The compiled code shows that this takes as much time as simply writing the bytes to CRC_DR one at a time.
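To make the byte-order point concrete, here is a small sketch with hypothetical helper names; it mimics the packing described above rather than reproducing the HAL source. Packing four buffer bytes with shifts and ORs puts data[0] in the most significant byte, which is the order the CRC unit consumes a 32-bit write. On a little-endian core such as the Cortex-M in the STM32, a plain word load followed by a byte reverse gives the same value, which is what the REV instruction does in one step.

```c
#include <stdint.h>
#include <string.h>

/* Pack four bytes so that p[0] lands in the most significant byte. */
static uint32_t pack_be32_shift_or(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Portable stand-in for the Cortex-M REV instruction. */
static uint32_t rev32_portable(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Same packing via a native load plus a byte reverse. On a little-endian
   core the load yields p[3]:p[2]:p[1]:p[0]; reversing gives p[0]..p[3]. */
static uint32_t pack_be32_load_rev(const uint8_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);        /* avoids alignment/aliasing issues */
    return rev32_portable(w);
}
```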

It took me a while to realize the byte order in the word when I first skimmed over the HAL_CRC_Calculate source code. I didn't catch that their shift|or was setting up the bytes in reverse order. The comments suggest the programmer thought this was optimizing the speed.

The other problem is that HAL_CRC_Calculate includes switch statements and branches to handle all the different ways the CRC peripheral might be configured.

It would have helped if the Ref Manual had included a few sentences which would have clarified the byte order for word and 1/2 word loading.

DHase.1 (Author)
Associate III
June 13, 2022

PS: I just realized that the Cortex-M series processors have a "REV" instruction that will reverse the byte order in a word. That would provide a way for efficient packing.

waclawek.jan
Super User
June 13, 2022

> It would have helped if the Ref Manual had included a few sentences which would have clarified the byte order for word and 1/2 word loading.

Yes. Unfortunately, the ST documentation leaves out a lot of such details. This is the consequence of creating the documentation the same way they create the chips themselves: by slapping modules together without much thought given to the "long proven" modules themselves, or to properly documenting the interconnections and their consequences. In particular, I'd guess the CRC unit comes from a development around a naturally big-endian processor core (maybe the Power Architecture on which the automotive SPC56 are based).

And this is one of my problems with Cube, too: instead of providing clean and documented examples, ST hides these kinds of problems inside Cube with little or no comment or explanation.

> Cortex-M series processors have a "REV" instruction

I guess whoever wrote that Cube implementation was unaware of this instruction (and of REV16 for the 16-bit variant) and of the __REV() (__REV16()) CMSIS intrinsics (which you may not be aware of either, as you seem to avoid using the CMSIS-mandated device header and the symbols defined therein; I wonder why). But then, whoever wrote the CRC module implementation, and in particular its newer incarnations, which provide bit reversal (for which the CM core has an instruction, too) but not byte reversal, might have had a similar mindset.

But at least we have an 8-bit scratch register available in CRC.

JW

DHase.1 (Author)
Associate III
June 14, 2022

> I'd guess the CRC unit comes from a development around a naturally big-endian processor core (maybe the Power Architecture on which the automotive SPC56 are based).

My guess is that ST's CRC was based on "network order" rather than on a big-endian processor. I think the networking concepts originated in the early 1960s on IBM machines that were what we now call big-endian. The closest thing to networking I worked on in those days was a system using our own format on a slow 30 bps teletype network, with no thought of anything so impractical as implementing a CRC with relays and discrete transistors. A lot of progress in the last 60 years.

> I guess whoever wrote that Cube implementation either was unaware of this instruction (and REV16 for the 16-bit variant) and the __REV() (__REV16())

I think one problem with inserting ASM instructions is that it is compiler/linker specific, and HAL is designed to work with several different compilers, so dealing with that might have been an issue.

> CMSIS intrinsics (which you may not be aware of either, as you seem to avoid using the CMSIS-mandated device header and symbols from therein

I was quite aware of CMSIS, but I was having some difficulty getting the pointer casting sorted out, mixed in with the other uncertainties, and took the lazy/direct approach.

waclawek.jan
Super User
June 14, 2022

>> and the __REV() (__REV16()) CMSIS intrinsics

> I think one problem with inserting ASM instructions is that it is compiler/linker specific, and HAL is designed to work with several different compilers, so dealing with that might have been an issue.

That's why I mentioned the CMSIS intrinsics. They are C functions (static inline) provided by ARM, designed to work with the three dominant compilers.
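As an illustration of the idea (not the CMSIS source), a compiler-portable byte reverse can be written as a static inline function that uses a builtin where one is available and plain shifts otherwise. The names my_rev32/my_rev16 are hypothetical, and the sketch assumes GCC and Clang, both of which define __GNUC__ and provide __builtin_bswap32/__builtin_bswap16. On Cortex-M, the CMSIS __REV and __REV16 intrinsics already fill this role.

```c
#include <stdint.h>

/* Reverse the four bytes of a word, like REV / __REV. */
static inline uint32_t my_rev32(uint32_t v)
{
#if defined(__GNUC__)
    return __builtin_bswap32(v);            /* typically a single instruction */
#else
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
#endif
}

/* Swap the two bytes of a half-word, like REV16 applied to a 16-bit value. */
static inline uint16_t my_rev16(uint16_t v)
{
#if defined(__GNUC__)
    return __builtin_bswap16(v);
#else
    return (uint16_t)((v >> 8) | (v << 8));
#endif
}
```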

JW

DHase.1 (Author)
Associate III
June 15, 2022

Thanks. I see your point. I was not familiar with the term "intrinsics." The last time I embedded ASM in STM32 C code was 10 years ago and it was compiler (gcc) specific.

The following uses the REV and REV16 instructions (through the CMSIS __REV and __REV16 intrinsics) to reverse bytes and compute a six-byte PEC15 inline. The initialization is called once (assuming STM32CubeMX isn't used, i.e., the CRC is not activated there). pec15_reg is the looping routine that loads bytes into the CRC one at a time.

/* *************************************************************************
 * void pec15_reg_init (void);
 * @brief : Initialize RCC and CRC registers for ADBMS1818 CRC-15 computation
 * *************************************************************************/
#define SEED 0x10 // PEC15 initial value
void pec15_reg_init (void)
{
 /* Bit 12 CRCEN: CRC clock enable */
 RCC->AHB1ENR |= RCC_AHB1ENR_CRCEN;

 /* Set CRC registers. */
 CRC->INIT = SEED*2; // CRC_INIT: seed * 2
 CRC->POL = 0x8B32; // CRC_POL: 0x4599 Polynomial * 2
}
 
/* *************************************************************************
 * uint16_t pec15_reg (uint8_t *pdata , int len);
 * @brief : Reset and compute CRC
 * @param : pdata = pointer to input bytes
 * @param : len = number of bytes
 * @return : CRC-15 * 2 (ADBMS1818 16b format)
 * *************************************************************************/
uint16_t pec15_reg (uint8_t *pdata , int len)
{
 /* Control register: POLYSIZE = 16 bits, plus RESET (0x9). */
 CRC->CR = 0x9;

 uint8_t* pend = pdata + len;
 while (pdata < pend) // writes nothing when len == 0
 {
  *(__IO uint8_t*)CRC_BASE = *pdata++; // byte write to CRC_DR
 }

 return (uint16_t)CRC->DR; // CRC-15 * 2
}
 
 // ###SNIP Six bytes: Word, 1/2 word ###
 /* Six byte PEC15 computation (excerpt: data[] and p15H are declared in the test routine) */
 CRC->CR = 0x9; // 16b poly, + reset
 *(__IO uint32_t*)CRC_BASE = (uint32_t)__REV (*(uint32_t*)&data[0]);
 *(__IO uint16_t*)CRC_BASE = (uint16_t)__REV16 (*(uint16_t*)&data[4]);
 p15H = CRC->DR; // Store 1/2 word result
 
 
// ###SNIP Six bytes three 1/2 words ###
 /* Six byte PEC15 computation */
 CRC->CR = 0x9; // 16b poly, + reset
 *(__IO uint16_t*)CRC_BASE = (uint16_t)__REV16 (*(uint16_t*)&data[0]);
 *(__IO uint16_t*)CRC_BASE = (uint16_t)__REV16 (*(uint16_t*)&data[2]);
 *(__IO uint16_t*)CRC_BASE = (uint16_t)__REV16 (*(uint16_t*)&data[4]);
 p15E = CRC->DR; // Store 1/2 word result

My test routine runs the various approaches on the same input data, so they should all produce the same result. In the results below, the row after the CRC values gives the number of machine cycles for each approach.

A: FOR loop in main: one byte at a time to CRC DR

B: subroutine: pec15: 256-entry table lookup, 8-bit bytes

C: CRC_Handle_8 (routine embedded in HAL_CRC_Calculate)

D: HAL_CRC_Calculate

E: inline: three 1/2 words

F: subroutine: pec15_nibble: 16-entry table lookup, 4-bit nibbles

G: subroutine: pec15_reg: loop one byte at a time to CRC DR

H: inline: 32b word + 16b 1/2word

POLY: 8B32  SEED: 20  SIZE: 6  DATA[]: A5 02 03 04 05 FE

Run 0    A     B     C     D     E     F     G     H
CRC      EA4C  EA4C  EA4C  EA4C  EA4C  EA4C  EA4C  EA4C
Cycles   85    109   78    115   23    156   80    21

As expected, the inline versions are noticeably faster, and the nibble table lookup is the slowest.

One item I have not investigated is the CRC computation time. If the clock setup divides the bus clock, inline instructions could overrun the CRC input. The Ref Manual says,

"An input buffer allows to immediately write a second data without waiting for any wait states due to the previous CRC calculation."

This suggests that if inline instructions get ahead of the CRC peripheral, wait states would be generated. When the bus is not divided, it looks like overrunning would not be possible. Here is a snippet of the compiled code for the word + half-word six-byte CRC computation--

 CRC->CR = 0x9; // 16b poly, + reset
 8001efe:	2009 	movs	r0, #9
 8001f00:	6098 	str	r0, [r3, #8]
 *(__IO uint32_t*)CRC_BASE = (uint32_t)__REV (*(uint32_t*)&data[0]);
 8001f02:	4d51 	ldr	r5, [pc, #324]	; (8002048 <StartDefaultTask+0x22c>)
 8001f04:	682a 	ldr	r2, [r5, #0]
__STATIC_FORCEINLINE uint32_t __REV(uint32_t value)
 8001f06:	ba12 	rev	r2, r2
 8001f08:	601a 	str	r2, [r3, #0]
 *(__IO uint16_t*)CRC_BASE = (uint16_t)__REV16 (*(uint16_t*)&data[4]);
 8001f0a:	88aa 	ldrh	r2, [r5, #4]
__STATIC_FORCEINLINE uint32_t __REV16(uint32_t value)
 __ASM volatile ("rev16 %0, %1" : __CMSIS_GCC_OUT_REG (result) : __CMSIS_GCC_USE_REG (value) );
 8001f0c:	ba52 	rev16	r2, r2
 8001f0e:	b292 	uxth	r2, r2
 8001f10:	801a 	strh	r2, [r3, #0]
 p15H = *(uint32_t*)0x40023000;//hcrc.Instance->DR;//*(uint32_t*)(crcbase+0);
 8001f12:	681a 	ldr	r2, [r3, #0]
 8001f14:	920c 	str	r2, [sp, #48]	; 0x30

Writing both the word and the half-word takes 4 cycles. I don't think the uxth instruction is needed for the half-word, but that is not important. When the clock setup runs the bus at the same rate as the system clock, the above instructions could not overrun the CRC computation.