Fatal connection bug in bluenrg2 V2.X stack causes device lockup
Hello all,
I have found a fatal flaw in the 2.x BT library that comes with the latest SDK:
STSW-BLUENRG1-DK 3.2.1
Setup: bluenrg2, external balun
Problem: When a device connects using initial connection parameters with a very small connection interval (min = 7ms), the stack sometimes ends up in a limbo state where both the stack and the app are confused about whether the link is up.
It appears to be a race and is triggered like this:
1) The stack emits a normal hci_le_connection_complete_event() with status 0 (zero).
2) The remote end (a dongle in a PC) clearly signals on HCI level that the connection first succeeded with status 0 (success), but shortly after gets a "disconnection complete" with error code "0x3e" = "Reason: Connection Failed to be Established". This is a normal scenario if connection timeouts occur during the initial connection itself, and this has also been observed on other stacks/chipsets.
3) Locally (on the bluenrg2), the stack does _not_ signal "hci_disconnection_complete_event", leaving it in a bad (fatal) state.
4) No matter what is done on the remote end (e.g. eject dongle), the link never drops on the bluenrg2 with "hci_disconnection_complete_event".
5) If the state is forced on the bluenrg2 by calling hci_disconnect(), still nothing happens.
It would seem that the very short connection intervals cause the connection process itself to fail partway through. That failure then seems to hit a race, or a state that is not handled, inside the bluenrg2 bluetooth v2.X stack. I have seen connections fail with short connection intervals on other stacks and chipsets, and that by itself is not a big problem. The state lockup is a big problem however, and the only way to get out of it is a reset of the chip.
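Since a chip reset is the only way out that I know of, a stop-gap could be an application-side watchdog along these lines. This is only a sketch under assumptions: every name in it (link_watchdog_t, the link_wd_* functions, DISCONNECT_TIMEOUT_MS and its 2 second value) is made up for illustration and is not part of the SDK. It also assumes the app has some way of its own to decide the link is dead (e.g. no traffic for a while) and then calls hci_disconnect(). If no hci_disconnection_complete_event follows within the timeout, the app resets the chip, for example with the CMSIS NVIC_SystemReset().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical application-side link tracker. None of these names come
 * from the SDK; they only illustrate the idea. */
typedef enum { LINK_IDLE, LINK_CONNECTED, LINK_DISCONNECTING } link_state_t;

typedef struct {
    link_state_t state;
    uint32_t deadline_ms; /* only meaningful in LINK_DISCONNECTING */
} link_watchdog_t;

/* Assumed value: how long to wait for the disconnection event. */
#define DISCONNECT_TIMEOUT_MS 2000u

static void link_wd_init(link_watchdog_t *wd)
{
    wd->state = LINK_IDLE;
    wd->deadline_ms = 0u;
}

/* Call from hci_le_connection_complete_event() when status == 0. */
static void link_wd_on_connected(link_watchdog_t *wd)
{
    wd->state = LINK_CONNECTED;
}

/* Call right after the app has issued hci_disconnect(). */
static void link_wd_on_disconnect_requested(link_watchdog_t *wd, uint32_t now_ms)
{
    if (wd->state == LINK_CONNECTED) {
        wd->state = LINK_DISCONNECTING;
        wd->deadline_ms = now_ms + DISCONNECT_TIMEOUT_MS;
    }
}

/* Call from hci_disconnection_complete_event(). */
static void link_wd_on_disconnected(link_watchdog_t *wd)
{
    wd->state = LINK_IDLE;
}

/* Poll from the main loop. Returns true once the disconnection event is
 * overdue, i.e. the stack looks stuck and the app should reset the chip. */
static bool link_wd_reset_needed(const link_watchdog_t *wd, uint32_t now_ms)
{
    if (wd->state != LINK_DISCONNECTING) {
        return false;
    }
    /* Signed difference keeps the comparison valid across tick wrap-around. */
    return (int32_t)(now_ms - wd->deadline_ms) >= 0;
}
```

This does not fix the bug, it only bounds how long the device stays stuck. The millisecond tick passed in as now_ms is whatever time base the application already has.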
1) It can be triggered fairly easily with most devices. I can make it fail after very few tries with e.g. a "BT-400" from Asus (Broadcom/Cypress chipset).
2) It can be reproduced with any example peripheral project from the SDK. Just a few connections and it goes south.
3) The reason you do not normally see this is that e.g. iPhone and Android devices never do initial connections with such low intervals. They use high intervals and then re-negotiate after connection.
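To put numbers on why the low initial interval matters: BLE connection intervals are expressed in units of 1.25 ms, and the link layer treats a new connection as failed to be established (0x3e) if nothing is received from the peer within six connection intervals. The sketch below just does that arithmetic. The helper names are made up, and I am assuming the "7ms" above is really the 7.5 ms minimum (6 units), with 30 ms (24 units) as an arbitrary example of a more relaxed interval.

```c
#include <stdint.h>

/* Connection interval in microseconds; BLE uses units of 1.25 ms. */
static uint32_t conn_interval_us(uint16_t interval_units)
{
    return (uint32_t)interval_units * 1250u;
}

/* Time the link layer waits for the first packet from the peer before
 * giving up with "Connection Failed to be Established" (0x3e):
 * six connection intervals. */
static uint32_t establishment_window_us(uint16_t interval_units)
{
    return 6u * conn_interval_us(interval_units);
}
```

With 6 units (7.5 ms) the window is 45 ms, while with 24 units (30 ms) it is 180 ms, so a few missed packets right after connecting are far more likely to kill the link at the short interval.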
Any light on this or a fix/workaround would be great. Our device is suffering due to this exotic but fairly obvious bug.
Thanks,
/pedro
