[DWM3001CDK] How to achieve continuous, high-frequency CIR data streaming over UART?

Hi Qorvo Support Team,
I am working on a research project using the DWM3001CDK development kit. My primary goal is to continuously stream the raw Channel Impulse Response (CIR) data to a host PC for real-time analysis of human activity.
Current Setup:
Hardware: Two DWM3001CDK boards (one Initiator, one Responder).
Firmware: Based on the DWM3001CDK-DW3_QM33_SDK-FreeRTOS example.
Modification: I have successfully modified the firmware on the Responder board to read the CIR data from the accumulator. Specifically, within the mcps_rx_cb callback function, I am calling dwt_readaccdata() to get the full CIR (approx. 4KB) and then printing it to the UART.
The Problem:
While I can successfully read and print the CIR data, I am facing a significant bottleneck. The system exhibits the following behavior:
It prints a few CIR data blocks in a quick burst.
It then completely freezes for a long period (approximately 15-17 seconds).
After the long pause, the cycle repeats.
This behavior prevents me from achieving the continuous, high-frequency data stream required for my research.
Troubleshooting Steps Already Taken:
Based on my investigation and similar issues reported in the forums, I have already tried the following, which did not solve the long pause issue:
Increased UART Baud Rate: I modified the UART initialization code to operate at 1,000,000 baud and confirmed the setting in Tera Term.
Increased Task Stack Size: Suspecting a stack overflow, I significantly increased the MCPS_TASK_STACK_SIZE_BYTES to 8192 bytes.

My Core Question:
Given that printing 4KB of data, even over a fast UART, is a relatively slow operation, what is the recommended architectural approach to continuously stream the full CIR data from the DWM3001CDK without causing the entire system to halt or reset (likely due to a watchdog timer)?
Is there a specific configuration, a more efficient printing method (e.g., DMA with RTT), or a fundamental flaw in my approach of trying to stream the entire CIR at a high frequency?
Any guidance, code examples, or pointers to relevant application notes would be immensely appreciated.
Thank you for your time and support.
Best regards,

Munther Abdulameer

How fast are you attempting to get the CIR data?

On the DWM3000 each CIR sample is 6 bytes containing two 18-bit values. So depending on how you output it from your system, that works out to between 4.5 and 6 bytes per sample, which is slightly over the 4k per CIR that you mentioned.

Assuming 4k bytes per CIR for simplicity and running the UART at 1,000,000 baud, you can get 1e6 / (4000*10) = 25 CIR data sets per second (each byte costs 10 bits on the wire once you count the start and stop bits). Or to put it another way, you need to leave an absolute minimum of 40ms between each CIR that you attempt to send over serial. Generally, running a UART at 100%, especially at higher rates, results in synchronisation issues. You need occasional gaps in the data to let things re-sync. So realistically a basic approach is going to be capped at around 20 outputs per second.
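To make that arithmetic easy to redo for other baud rates or frame sizes, here is a minimal sketch in C. It assumes 8N1 framing (10 bits per byte) and a UART that never idles, so treat the result as an upper bound. The function names are made up for illustration and are not part of the SDK.

```c
#include <stdint.h>

/* Upper bound on frames per second for a UART, assuming 8N1 framing
 * (1 start + 8 data + 1 stop = 10 bits per payload byte) and no idle gaps.
 * Function names are hypothetical, not SDK calls. */
static uint32_t uart_max_frames_per_sec(uint32_t baud, uint32_t frame_bytes)
{
    const uint32_t bits_per_byte = 10u;
    return baud / (frame_bytes * bits_per_byte);
}

/* Minimum spacing between frames in milliseconds, rounded up. */
static uint32_t uart_min_frame_period_ms(uint32_t baud, uint32_t frame_bytes)
{
    const uint64_t bits = (uint64_t)frame_bytes * 10u;
    return (uint32_t)((bits * 1000u + baud - 1u) / baud);
}
```

With baud = 1000000 and frame_bytes = 4000 these give 25 frames per second and a 40 ms minimum period, matching the numbers above. Leaving some headroom on top of that gets you to the roughly 20 per second figure.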

If you try to output faster than the maximum then data will be going into the UART output buffer faster than it can output it. If that happens for any length of time it doesn’t matter how much buffer you allow, it’s going to fail.
Two things to get the current system stable:
1) Change your code so that when you get a new CIR you check whether there is space for it in the output buffer. If there isn’t, discard it.
2) Make sure the average rate at which you try to collect CIR data is less than the theoretical max for your UART output rate.
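A rough sketch of the first point, in plain C with no SDK calls: a byte ring buffer that accepts a CIR frame only if the whole frame fits and otherwise counts it as dropped. The type and function names and the buffer size are all hypothetical. In real firmware the producer (your RX callback) and the consumer (the UART or DMA side) run in different contexts, so the shared fields would need a critical section or a proper RTOS queue around them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_RING_SIZE 16384u /* hypothetical size: room for a few CIR frames */

typedef struct {
    uint8_t  data[TX_RING_SIZE];
    size_t   head;    /* next write index */
    size_t   tail;    /* next read index */
    size_t   used;    /* bytes currently queued */
    uint32_t dropped; /* frames discarded because the ring was full */
} tx_ring_t;

static size_t tx_ring_free(const tx_ring_t *r)
{
    return TX_RING_SIZE - r->used;
}

/* Queue a whole frame or nothing. Returns false (and counts a drop)
 * when the frame does not fit, so the caller never blocks. */
static bool tx_ring_push_frame(tx_ring_t *r, const uint8_t *frame, size_t len)
{
    if (len > tx_ring_free(r)) {
        r->dropped++;
        return false;
    }
    size_t first = TX_RING_SIZE - r->head; /* space before wrap-around */
    if (first > len) {
        first = len;
    }
    memcpy(&r->data[r->head], frame, first);
    memcpy(&r->data[0], frame + first, len - first);
    r->head = (r->head + len) % TX_RING_SIZE;
    r->used += len;
    return true;
}

/* Consumer side: take up to max bytes for the UART to transmit. */
static size_t tx_ring_pop(tx_ring_t *r, uint8_t *out, size_t max)
{
    size_t n = (r->used < max) ? r->used : max;
    for (size_t i = 0; i < n; i++) {
        out[i] = r->data[r->tail];
        r->tail = (r->tail + 1u) % TX_RING_SIZE;
    }
    r->used -= n;
    return n;
}
```

The idea is that the RX callback only calls tx_ring_push_frame() and returns straight away, while a lower-priority task drains the ring to the UART. Watching the dropped counter tells you whether your ranging rate is above what the link can carry.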

If you need data at a higher rate then look at some form of compression. Maybe you only need the magnitude rather than both the I and Q? In that case calculate it on the unit and send that, and you’ve just halved the amount of data. Worst case you’re at 19 bits per value. If you don’t mind clipping when it hits the max, or dropping the least significant couple of bits, then you can get that down to 16 bits per sample.
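As an illustration of the magnitude idea, here is a small integer-only sketch in C. It assumes you have already unpacked each sample into signed 32-bit I and Q values (the unpacking from the 6-byte accumulator format is not shown). All names are hypothetical and nothing here comes from the SDK.

```c
#include <stdint.h>

/* Integer square root (floor), bit-by-bit method, no floating point. */
static uint32_t isqrt64(uint64_t x)
{
    uint64_t res = 0;
    uint64_t bit = (uint64_t)1 << 62;
    while (bit > x) {
        bit >>= 2;
    }
    while (bit != 0) {
        if (x >= res + bit) {
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)res;
}

/* Magnitude of one CIR tap from already-unpacked signed I and Q. */
static uint32_t cir_magnitude(int32_t i, int32_t q)
{
    const int64_t ii = (int64_t)i * i;
    const int64_t qq = (int64_t)q * q;
    return isqrt64((uint64_t)(ii + qq));
}

/* Option A: keep full resolution and clip anything above 16 bits. */
static uint16_t mag_to_u16_clip(uint32_t mag)
{
    return (mag > UINT16_MAX) ? (uint16_t)UINT16_MAX : (uint16_t)mag;
}

/* Option B: drop the lowest `shift` bits first, then clip as a safety net. */
static uint16_t mag_to_u16_shift(uint32_t mag, unsigned shift)
{
    const uint32_t v = mag >> shift;
    return (v > UINT16_MAX) ? (uint16_t)UINT16_MAX : (uint16_t)v;
}
```

Clipping keeps the small values exact, which may matter if you care about the noise floor. Shifting keeps the peak shape but loses the bottom bits. Which one suits you depends on what the analysis on the PC side needs.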
On top of that, the first 3/4 of the data, before the reception time, is generally fairly low values. You could probably get away with fewer bits for that period. Or possibly only send the changes between values.
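One possible way to do the "only send the changes" idea, sketched in C under the assumption that you are sending 16-bit magnitudes: each sample is sent as a one-byte signed difference from the previous one when it fits, and as an escape byte followed by the full 16-bit value when it doesn't. The format and the function names are invented for this example. Whether it actually saves anything depends on how noisy your CIR is, so measure it on real captures.

```c
#include <stddef.h>
#include <stdint.h>

#define DELTA_ESCAPE (-128) /* marks "full 16-bit value follows" */

/* Encode n samples. `out` must have room for 3*n bytes in the worst case.
 * Returns the number of bytes written. */
static size_t delta_encode_u16(const uint16_t *in, size_t n, uint8_t *out)
{
    size_t w = 0;
    uint16_t prev = 0;
    for (size_t k = 0; k < n; k++) {
        const int32_t d = (int32_t)in[k] - (int32_t)prev;
        if (d >= -127 && d <= 127) {
            out[w++] = (uint8_t)(int8_t)d;              /* small change: 1 byte */
        } else {
            out[w++] = (uint8_t)(int8_t)DELTA_ESCAPE;   /* big jump: 3 bytes */
            out[w++] = (uint8_t)(in[k] & 0xFFu);        /* low byte first */
            out[w++] = (uint8_t)(in[k] >> 8);
        }
        prev = in[k];
    }
    return w;
}

/* Decode on the host side. Returns the number of samples recovered. */
static size_t delta_decode_u16(const uint8_t *in, size_t len,
                               uint16_t *out, size_t max_out)
{
    size_t r = 0;
    size_t n = 0;
    uint16_t prev = 0;
    while (r < len && n < max_out) {
        const int8_t d = (int8_t)in[r++];
        if (d == DELTA_ESCAPE) {
            if (r + 2 > len) {
                break; /* truncated input */
            }
            prev = (uint16_t)(in[r] | ((uint16_t)in[r + 1] << 8));
            r += 2;
        } else {
            prev = (uint16_t)(prev + d);
        }
        out[n++] = prev;
    }
    return n;
}
```

In the quiet part of the CIR most samples should fit in one byte, while the samples around the first path will mostly take the three-byte form. A lost byte corrupts everything after it within the frame, so you would want a frame header and a checksum around each encoded CIR.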