Frame lengths are way too big

Hi, I’ve been working with the DWM3001CDK, which uses an nRF52833. I have 4 tags and 5 responders. Each tag is assigned a time slot in which it polls each of the responders. I cannot get any faster than 11Hz, which is ridiculously slow for my setup. My UWB config is as follows:

Preamble length: 64
PAC: 8
Data Rate: 6.8Mbps
STS Mode: Off
Payload size: 65 Bytes


SPI rate is set to 32MHz.

The time it takes for the frame to be sent seems ridiculously long. With this configuration I should expect my frame length to be around 155us. However, I’m seeing that it takes 304us.

About 150us is eaten up by the SPI talking to the UWB chip. This is accounted for when I set my delays. However, the packet itself should only require about 155us, which means that a delay of 600us should be plenty of time for it not to throw a timeout error. Yet a delay of less than 1400us throws timeout errors nearly every time.

To measure the time it takes to send out a packet I start a timer right after dwt_starttx() and stop it in the txdone callback. I have compared sending immediately and delayed and see barely any difference, which tells me my delays are the smallest I can make them.

Why is it taking so long to send out packets? With my current delays I can complete a TWR exchange between a tag and a responder in 3ms. I have 4 tags and 5 responders, which means it takes a single tag 15ms to range with all 5 responders. This gives me an update rate of about 10Hz-16Hz. According to the data sheet I should be able to get around 30Hz with this setup.

How are your responders detecting the initial packet? Are they interrupt based or polling? If polling, what’s the interval?

I make your packet 174us over the air (100.5us of data and 73.3us of preamble). The SPI transfers can eat up a lot of time; verify that you’re not performing unnecessary operations there and taking longer than needed. Also check things like the setup times and idle times between cycles on your SPI bus — running it fast doesn’t matter much if you have large wait times between cycles. The chip requirement is nanoseconds, but if the driver has any explicit delays in there they are probably far larger than that.

I have in the past managed a full DS-TWR exchange of 4 packets in under 1ms using a DW1000 with an SPI speed of 20 MHz. I had to increase this to 1.25ms after the data rate dropped to 850kb/s for range reasons, but that’s still far less than the times you’re seeing. I was however using a far smaller payload; 65 bytes seems huge.

Even with the lower data rate I can get 800 double sided two way ranges per second (so 40 Hz for your setup). So what you are aiming for is certainly completely possible with the correct optimisation.
By making use of the fact that I always use either 8 or 12 responders and modifying the radio protocol to exploit that my current implementation can hit 2400 double sided two way ranges per second. If a high update rate is critical and you have a controlled enough system you could probably do something similar.

So sorry, no idea exactly where your issue is but your expectations are reasonable and what you are aiming for is certainly possible.


Hi Andy,

My responders aren’t on a true interrupt. In the received packet callback I set a Boolean to true. It takes about 88-150us for the code to actually check that Boolean to determine whether or not it received a packet. My whole timing is as follows:

150us for poll SPI transfer. This is how long it takes starttx to complete.

240-304us for frame length. This is the time it takes after starttx has completed to the TX complete callback. This includes delay times but I’m not seeing a difference between send immediate and send delayed.

150us for my responder to get the packet received interrupt to actually checking the packet is received.

150us for response SPI transfer.

304-560us for frame length, measured the same way as for the poll message. (My poll packets are smaller, around 40 bytes, so I expect to see longer times for my response and final messages, which I do.)

150us for my tags to get packet received interrupt to actually check the packet they received.

150us SPI transfer for final message.

304-560us for frame length.

In total I’m seeing that it takes about 2ms to complete one ranging process. So to cover any occasional longer times, my minimum time allowed for one ranging exchange is 3ms.

I’ll confirm that my SPI isn’t taking up more time than it needs to, thanks for the suggestion.

It seems that as soon as starttx is complete it takes forever to get the tx done interrupt. Are there other processes going on during this time that might account for the long transmission times? I don’t believe this stage relies on SPI, so I ruled that out.

My estimated frame lengths should be under 200us but I’m seeing about 500us, which means there’s a missing 300us.

There are a few factors that could be contributing to the slower-than-expected UWB performance you are seeing with your DWM3001CDK.

One potential issue is the time it takes for the SPI communication between the nRF52833 and the DWM3001. This can be a significant overhead, particularly if you are using a slower SPI clock rate. It is worth double-checking that you have optimized your SPI settings as much as possible to minimize this overhead.

Another factor to consider is the payload size. A larger payload will take longer to transmit, so if you are using a payload size of 65 bytes, this could be contributing to the longer transmission times you are seeing. You could try reducing the payload size to see if this improves your UWB performance.

It is also worth considering the impact of STS (scrambled timestamp sequence) mode on your UWB performance. STS is designed to improve the security and robustness of UWB ranging, but it adds extra symbols to each frame and so lengthens transmissions. Since you have STS disabled, it should not be contributing to the slow performance you are seeing.

Finally, it is worth checking that you have optimized the delay settings (POLL_TX_DLY_UUS, POLL_RX_TO_RESP_TX_DLY_UUS, and RESP_RX_TO_FINAL_TX_DLY_UUS) for your specific setup. These delays are used to allow time for the UWB transceivers to transmit and receive the UWB frames, and if they are set too low, this can result in timeout errors. On the other hand, if they are set too high, this can also negatively impact your UWB performance.

I hope this information is helpful. Let me know if you have any further questions or if you would like more guidance on optimizing your UWB performance with the DWM3001CDK.

Andy and Keonte,

I double checked the SPI and it appears that it is taking longer than I initially thought but it still doesn’t alleviate my confusion. I don’t see any extra processes happening and the DW3000 is the only chip using SPI. I also confirmed that I am running a 32MHz SPI clock by hooking it up to a scope.

I experimented with different payload sizes and see about a 50us difference in frame length between a 65 byte payload and a 9 byte payload. However this is still a long frame length and the size of the payload is not the reason for that. I’m not using STS or PDOA mode so those are not contributing factors either.

My time is being sucked up between dwt_starttx() and the txdone callback. My understanding is that this time is used by the hardware to send out the packet, with no SPI overhead during this period. I can’t imagine that it actually takes the hardware this long to send out a packet, but I don’t see any other explanation. I’ve tried several DWM3001CDK boards and see the same thing.

Still not sure what’s causing the need for such long delay times.

I forgot to include an important piece of information. I am using a softdevice for BLE and flash save functionality. Everything works fine but there is a known history that SPIM3 and softdevices don’t get along very well.

SPIM3 shares specially allocated memory regions with the softdevice. The nRF52840 needed the anomaly 198 workaround in order to access these memory locations while the softdevice is enabled, but it looks like this was sorted out on the nRF52833. The nRF52833’s SPIM3 can access the softdevice memory locations without throwing hard fault errors, but these SPI transfers run at a much lower interrupt priority than the softdevice. As a result SPIM3 completes individual transactions but then gets interrupted by the softdevice in between transfers. This would explain the long SPI transaction times even with a 32MHz clock.
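One mitigation short of removing the softdevice might be raising the SPIM driver’s interrupt priority. SPIM3 is the only instance supporting 32MHz on this part, so moving to another instance isn’t an option; the softdevice reserves priorities 0, 1, 4 and 5 on nRF52, leaving 2 as the highest application-available level. A hypothetical sdk_config.h fragment (macro names follow nRF5 SDK conventions — check against the names in your own sdk_config.h):

```c
/* Keep SPIM3 (only instance with 32 MHz support) but raise the driver
   interrupt priority to the highest application-available level, so
   transfers are less likely to be preempted between transactions. */
#define NRFX_SPIM_ENABLED 1
#define NRFX_SPIM3_ENABLED 1
#define NRFX_SPIM_DEFAULT_CONFIG_IRQ_PRIORITY 2
```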

I’m removing the softdevice from my project and will post an update. Has anyone seen issues with SPIM3 with the softdevice enabled?

OK a few things to try:

Firstly - any way you can move things so that the reply is being processed and sent on the interrupt? I know this will result in long interrupt times that could cause issues for the softdevice but it would save you several hundred us. I’ve not used the softdevice or even that processor so can’t comment on the potential impacts it could be having.

Secondly can you shrink your messages? Absolute bare bones I managed to get the messages down to between 3 and 11 bytes. That doesn’t allow much flexibility but does make for far faster transfers.

What is starttx doing? All it needs to do is transfer the data message, set the message size, and initiate the operation: 3 SPI writes and then exit, with no need to wait for anything. So the total time should be the time taken to transfer your message plus around 5 bytes — maybe 20-30us, certainly nowhere near the time you are seeing. Can you probe the SPI bus to see where the time is going?

The TX done callback is helpful for logging the actual TX time but the timing of it or how quickly you process it isn’t that important. You certainly don’t need to wait for it. My code exits the TX loop as soon as the last SPI command is sent, Tx done is treated as a completely independent interrupt. All it does is store the Tx time and exit. The next Rx interrupt will then look at the stored value if it needs it for calculations.

You don’t have SPI checksums enabled do you? I have no idea what impact that may have on performance.

Thanks for the advice Andy.

I was able to reduce my delay times significantly. Having a softdevice enabled will screw with the TWR process.

The softdevice was interfering with the SPI transactions. The softdevice was completing events in between the SPI transfers which made the SPI times larger than they should be. I still see long times but now they are more predictable without the softdevice enabled. I also verified that there are no unexpected SPI transfers occurring.

Our project requires large payloads. Data is embedded within each packet to store the distance between devices in the network. This is to prevent having to send out extra messages. I could reduce each packet by a few bytes but it hardly made a difference in the frame lengths. However, I did notice that trying to parse the embedded data immediately after initiating a packet transfer somehow increased these times. I’m not sure how this would affect frame lengths because it doesn’t require any use of the DW3000 or SPI bus, but when I removed the data parsing I noticed that it helped. I now wait a specified time before parsing data to allow the frame to be sent, as this is not important to the TWR process.

The only reason I cared about TX times was to see how long it took to send out packets. I’m now able to see that I cut about 100us off each packet transfer with my new configuration. My delay times are as follows:


Switching to a true interrupt based response will help significantly. Hooked up to a scope, I see there’s about 250us between the time a packet is received and when it is processed, which contributes to the requirement for such large delays. In total about 750us is wasted in the overall TWR exchange.

The frame lengths I’m seeing are still about 100us - 200us longer than the calculated time given the configuration mentioned in my initial post. I’ll look to see if there are other nRF processes that may interfere with frame transfers.