DWM3000EVB Raw CIR Capture Issues

Hello,
I have 2 DWM3000EVB boards attached to nRF52840 development kits, and I’m trying to capture the raw channel impulse response (CIR). I have run into some issues while doing so, and I need some help. Here is my procedure, in case anyone else is interested:

  1. I have compiled and run the simple TX code on one of the boards. I set the inter-frame delay to 1 s so that I can capture the frame fully. I have also experimented with other delays, and my issue does not seem to be related to this setting.

  2. I am running the RX-with-diagnostics example on the second device, and I have modified the diagnostics code as follows to output the real and imaginary parts of the channel impulse response. This is based on other code snippets from the forum, modified for the DWM3000EVB board based on the API documentation.

            int8_t *cir;
            int8_t cir_len = 992 * 6 + 1;
            cir = (int8_t *)malloc(cir_len * sizeof(int8_t));
            int32_t real = 0;
            int32_t imag = 0;
            dwt_readaccdata(cir, cir_len, 0);
            int8_t cidx = 0;

            for (int ijk = 0; ijk < 992; ijk++)
            {
                cidx = 6 * ijk;
                real = (int32_t)cir[cidx+3] << 16 | (int32_t)cir[cidx+2] << 8 | (int32_t)cir[cidx+1]; /*printf("%i,",real);*/
                imag = (int32_t)cir[cidx+6] << 16 | (int32_t)cir[cidx+5] << 8 | (int32_t)cir[cidx+4];    /*printf("%u\n",imag);*/

                //amp = max(abs(real), abs(imag)) + min(abs(real), abs(imag)) / 4;
                //printf("idx %d:", ijk);
                printf("%d ", real);
                printf("%d ", imag);

                //printf("%f,,,,,,,,,\n", amp);
                //printf("\r \n");
            }
            free(cir);
  3. I’m reading the output through JLink RTT Viewer. I can see the CIR values displayed on the screen and save the output as a log file.

The issues and questions are as follows:
When I print the CIR, I can only see roughly 400 samples instead of 992. I tried printing out the indices in the above code and realized that some indices are missing. E.g. it jumps from 55 to 105, and the jumps seem to be random. Instead of reading all the data at once from the accumulator (992*6+1 bytes), I tried reading the samples one by one in the for loop, but this gives a similar result.
Q1: How can I read the CIR without losing data? Is there a better way to capture the CIR than printing the values?
Q2: How can I read the output without JLink RTT Viewer, and what are good sources to read further about that part?
Q3: I’ve seen people verify the CIR by displaying it and checking the values for the DWM1000. Since the data format is different here (each value is 3 bytes instead of 2), how can I check whether the CIR is corrupted? E.g. I have seen some discussions about the CIR values being less than 30,000 etc., which may not be applicable here due to the 18-bit samples.
Q4: My application requires a high sampling rate, and eventually I’d like to reduce the inter-frame period. What would be the practical limits on the device?

Thanks!

For what it’s worth here’s my code for this that seems to work fine. I call this function in a loop reading blocks of 16 values. No specific reason why, it’s written so it should work fine for anything from 1 value to all of them at once. I just used smaller blocks when debugging it and never increased them once it was working because it wasn’t speed critical.

Keep in mind this is a lot of data. Quick back-of-the-envelope numbers: assuming you run the SPI bus at 30 MHz, you’re still looking at over 1.5 ms to read all the data. Reading the full CIR is not very compatible with high sample rates. I only ever use it in special diagnostics modes rather than normal operation for this reason.
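
To make that estimate easy to redo for other SPI clocks, here is a minimal sketch. It assumes 6 bytes per sample plus one leading dummy byte and counts only the 8 clock cycles per byte, so command headers and driver overhead come on top. The function names are made up for illustration and are not part of the DW3000 API.

```c
#include <stdint.h>

/* Bytes transferred for n CIR samples: 3 bytes I + 3 bytes Q each,
 * plus one leading dummy byte (layout assumed, see the text above). */
uint32_t cir_read_bytes(uint32_t n_samples)
{
    return n_samples * 6u + 1u;
}

/* Lower bound on the SPI transfer time in microseconds: 8 clock cycles
 * per byte at spi_hz, ignoring headers and driver overhead. */
double cir_read_time_us(uint32_t n_samples, double spi_hz)
{
    return (double)cir_read_bytes(n_samples) * 8.0 * 1e6 / spi_hz;
}
```

With 1016 samples at 30 MHz this comes out at roughly 1.63 ms, and with 992 samples at roughly 1.59 ms.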

Depending on the application, you could read the leading edge location and peak location and then only read the CIR data for that region. That would significantly reduce the time required, assuming it gives you the information you need.
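
As a rough sketch of the "only read that region" idea: the helper below turns a peak index into a clamped offset and length. The type and function names are hypothetical, and where the peak index comes from (for example the diagnostics data) is left open.

```c
#include <stdint.h>

typedef struct {
    uint16_t offset;  /* first sample index to read */
    uint16_t len;     /* number of samples to read  */
} cir_window_t;

/* Window of `before` samples ahead of the peak and `after` samples behind
 * it, clamped to the valid index range 0 .. total-1. */
cir_window_t cir_window(uint16_t peak, uint16_t before, uint16_t after, uint16_t total)
{
    cir_window_t w = {0, 0};
    if (total == 0 || peak >= total)
        return w;                        /* nothing sensible to read */

    uint32_t first = (peak > before) ? (uint32_t)(peak - before) : 0u;
    uint32_t last = (uint32_t)peak + after;
    if (last > (uint32_t)total - 1u)
        last = (uint32_t)total - 1u;

    w.offset = (uint16_t)first;
    w.len = (uint16_t)(last - first + 1u);
    return w;
}
```

For a peak at 740 with 60 samples before and 227 after, this gives offset 680 and length 288, which could then be passed as the offset and len of a read routine like the one below.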

/// read len CIR values starting at offset
/// store magnitudes in float array CIRValues
bool Dw1000::readDW3000CIRData(float *CIRValues, int len, int offset) {

    // DW3000 is 3 byte I, 3 byte Q (18 bits per value) with a 1 byte blank at the start. 
    // DW1000 is 2 byte I, 2 byte Q with a 1 byte blank at the start. 
    //	Max of 992 or 1016 points (16/64 prf)

    if ((len + offset)>1016)
        len = 1016-offset;
    if (len <1) 
        return false;

    const int bytesPerValue = 6; // dw3000


    uint8_t cir_buffer[1016*bytesPerValue+1]; // worst case
    dwt_readaccdata(cir_buffer, len*bytesPerValue+1, offset);
    int byteAddress = 1;
    for (int i=0;i<len;i++) {
        int32_t iValue = cir_buffer[byteAddress++];
        iValue |= ((int32_t)cir_buffer[byteAddress++]<<8);
        iValue |= ((int32_t)(cir_buffer[byteAddress++] & 0x03)<<16);

        int32_t qValue = cir_buffer[byteAddress++];
        qValue |= ((int32_t)cir_buffer[byteAddress++]<<8);         
        qValue |= ((int32_t)(cir_buffer[byteAddress++] & 0x03)<<16);
	
        if (iValue & 0x020000)  // MSB of 18 bit value is 1
            iValue |= 0xfffc0000;
        if (qValue & 0x020000)  // MSB of 18 bit value is 1
            qValue |= 0xfffc0000;
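        // Square in floating point: an 18 bit value squared can overflow int32_t.
        float iFloat = (float)iValue;
        float qFloat = (float)qValue;
        *(CIRValues+i) = sqrt(iFloat*iFloat + qFloat*qFloat);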
    }
    return true;	
}

I appreciate the suggestions and the code shared. I have used this code to capture the CIR amplitude (as in the code), or just the real/imaginary parts. I still cannot capture an entire CIR, and this code actually made the data loss a little worse. This may be due to the sqrt operation (I remember using an approximation), or to the large memory allocation at the beginning, especially when calling this function repeatedly. I tried to read the buffer all at once using this code, but the data loss is still there.

Since I’m still using similar code for printing the CIR values, is it possible that printf() is the actual bottleneck? It may be a trivial question, but is there a better way to read and output the data from the device? I’m not very familiar with these devices, so any reference is appreciated.

Also, reading only a portion of the CIR is a good suggestion, as I indeed do not need to read everything. On the other hand, the stable stretch is not long (e.g. after the first 50 values, some CIR values are lost), and it may not be long enough. Currently I’m transmitting at 1 Hz, and I guess the problem can easily get worse at a higher rate. I’m hoping to fix this part first, before getting into reading the CIR partially.

Thanks again!

How are you outputting this? Over a serial port to a PC?

Serial ports are painfully slow. Is it possible that your serial port code doesn’t have sufficient buffering and so that is dropping the data? In that situation I’d normally expect random bytes to drop rather than whole messages but it does depend a little on how the buffering is done. On the PC side use an FTDI based USB port adaptor rather than the one built into the PC (if it still has one). The USB ones have more buffering internally. At least the good ones do, I’ve seen some cheap ones drop data all over the place if you try to push too much through them. This buffering can be a pain for accurate receive timing but makes it far less likely that you’ll drop data at the PC end.

printf is fairly slow, but nothing compared to a serial port. It’s also not thread safe, so you shouldn’t call it from an interrupt, only from the main loop.
sqrt (or any floating point operation) is slow on a micro that doesn’t have hardware floating point (hundreds of cycles), so yes, that’s a big hit on some parts. If you can stick to integer operations then you’ll get a big speed-up. I have seen abs(largest value) + abs(smallest value)/4 used as an approximation of the magnitude.
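
A minimal integer-only sketch of that approximation, for illustration (the function names are made up). It assumes the inputs are already sign-extended 18 bit values, so negating them cannot overflow.

```c
#include <stdint.h>

/* Absolute value for inputs known to fit in 18 bits signed. */
static int32_t abs32(int32_t v)
{
    return v < 0 ? -v : v;
}

/* Approximate sqrt(i*i + q*q) as max(|i|,|q|) + min(|i|,|q|)/4,
 * using only integer operations. */
int32_t cir_mag_approx(int32_t i, int32_t q)
{
    int32_t a = abs32(i);
    int32_t b = abs32(q);
    int32_t hi = (a > b) ? a : b;
    int32_t lo = (a > b) ? b : a;
    return hi + lo / 4;
}
```

The result is exact when one component is zero, about 3% high at worst, and about 12% low when I and Q are equal in size, so it suits plotting and peak hunting better than precise power measurements.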

The memory allocation in that code is on the stack so as long as you aren’t running out of memory it’s not going to be a significant performance hit and there is no risk of it not getting freed up as soon as the function exits. It’s a far faster and safer way to allocate a temporary buffer on an embedded system than using malloc and then free.

In terms of better ways to do it: no, there isn’t a better way to read the data from the device. There may however be a better way of getting the data to the outside world, depending on your application. E.g. if you send the raw binary data to the outside world then that will be far less data than converting it to text using printf. Ideally you’d put the raw data into a large buffer and exit your DW chip handling code. Then in the background idle loop you can process the contents of that buffer in whatever way you want.
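
One way this could look, as a sketch only: a single-slot buffer that the DW chip handling code fills and the idle loop drains. All names here are hypothetical, the buffer size is the worst case of 1016 samples of 6 bytes plus one dummy byte, and a real implementation might read from SPI straight into the buffer rather than copying.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CIR_RAW_MAX (1016u * 6u + 1u)   /* worst case: 1016 samples + dummy byte */

static uint8_t cir_raw[CIR_RAW_MAX];
static volatile size_t cir_raw_len;     /* 0 means the slot is free */

/* Call from the DW chip handling code right after the accumulator read.
 * Returns false (and drops the capture) if the previous one is still pending. */
bool cir_capture_store(const uint8_t *raw, size_t n)
{
    if (raw == NULL || n == 0 || n > CIR_RAW_MAX || cir_raw_len != 0)
        return false;
    memcpy(cir_raw, raw, n);
    cir_raw_len = n;                    /* publish only after the copy */
    return true;
}

/* Call from the idle loop: returns the pending raw bytes, or NULL if none. */
const uint8_t *cir_capture_peek(size_t *n)
{
    *n = cir_raw_len;
    return (*n != 0) ? cir_raw : NULL;
}

/* Call from the idle loop once the bytes have been sent or processed. */
void cir_capture_release(void)
{
    cir_raw_len = 0;
}
```

The idle loop would call cir_capture_peek(), push the raw bytes out (or convert them to text at its own pace), and then call cir_capture_release(). A capture that arrives while the slot is busy is dropped as a whole, instead of losing values in the middle of a CIR.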

Thanks for the reply, and for the insights about the speed of the different pieces of code in the system. I connect to the device through the USB port on the Nordic board, and as far as I understand, I can either listen to the output through UART or use the aforementioned JLink. I’m using the latter.

I have tried modifying the code to find the bottleneck, and it appears there are two issues. One is trying to read too many samples at once from the accumulator, and the second seems to be the printing speed. Here are my findings:

  • If I try to read more than 33 CIR samples at once, reading into memory fails and the device only prints zeros. This seems to be related to the thread linked below, where they talk about the SPI buffer possibly being too small, but I did not find any explicit solution there. Incorrect reading CIR value from register
  • If I only print the index in the for loop (no bit shifting, no writing to memory, just a very simple for loop), some of the indices are lost randomly. The device cannot even print (or maybe the serial link cannot capture) all of the data. However, if I add a Sleep(1) (a 1 ms pause) within the for loop, I can print all indices successfully (1 to 992).
  • Likewise, if I read the data in chunks (smaller than 33 samples), I can read the entire CIR. I only managed to make this work with a 50 ms Sleep between consecutive dwt_readaccdata calls. If I make this Sleep shorter, I start to miss CIR data again (maybe because of the print issue, or because reading the accumulator is slow, or because of data loss). However, the CIR values do not seem to be meaningful, and the values jump abruptly in a periodic fashion. I have attached two figures for the full CIR: one is the absolute value (fig1), and the other is the difference between the values of two captures (fig2). I feel there might be a bit shift that is corrupting all of the numbers, but I’m not sure. The total number of samples captured is as expected, so it should not be an issue with the printf command. The location of this shift (or change) seems to be random. I have also tried printing a subset of the values around the peak location (fig3: 640 samples, 300 before the peak location; fig4: 288 samples, 60 before the peak location), but they do not reveal anything either.

Thanks again
fig1_absolute_value_full_cir
fig2_diff_cir_packages
fig3_640tap_subset_cir
fig4_288tap_subset_cir

When reading the CIR you should be able to read all of the values in one go if you are reading them directly into a large enough buffer in processor internal memory. The internal memory will be far faster than the SPI link. The speed issues should only arise when trying to output from the processor memory to the outside world.
Are you correctly handling negative values? One of the most common issues I’ve seen is people not correctly converting the 18 bit signed values into 32 bit signed values.
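
For reference, a minimal sketch of that conversion. The layout it assumes (one dummy byte, then 3 bytes I and 3 bytes Q per sample, low byte first, 18 significant bits) is taken from the code posted earlier in this thread, and the function names are made up. Note that the byte index reaches 992*6+1 = 5953 for a full read, so index and length variables need a type wide enough for that; an 8 bit type such as int8_t would wrap around.

```c
#include <stddef.h>
#include <stdint.h>

/* Build one signed 18 bit value from three little-endian bytes.
 * Bits above bit 17 in the top byte are ignored. */
int32_t cir_s18(uint8_t b0, uint8_t b1, uint8_t b2)
{
    int32_t v = (int32_t)b0 | ((int32_t)b1 << 8) | ((int32_t)(b2 & 0x03) << 16);
    return (v ^ 0x20000) - 0x20000;     /* sign-extend from bit 17 */
}

/* Decode n samples from a raw accumulator read (dummy byte first)
 * into separate real and imaginary arrays. */
void cir_decode(const uint8_t *raw, size_t n, int32_t *re, int32_t *im)
{
    for (size_t k = 0; k < n; k++) {
        const uint8_t *p = raw + 1 + 6 * k;   /* skip the dummy byte */
        re[k] = cir_s18(p[0], p[1], p[2]);
        im[k] = cir_s18(p[3], p[4], p[5]);
    }
}
```

A quick sanity check is that the bytes FF FF 03 decode to -1 and 00 00 02 to -131072; if they come out as large positive numbers, the sign extension is missing.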

And you seem to have an issue with reading the CIR peak. For any reasonable line of sight configuration and signal level the leading edge will be around an index of 730-740 with the peak a few bins later. I normally run with a PRF of 64 MHz so at 16 MHz this may be a little earlier but it should still be in the same region.

  1. I also believe I should be able to read all of the values into memory at once, but even without any mathematical operations or printing of values, I still have the issue. If I read the entire CIR at once and print just a few numbers afterwards, I can see that the values are all zeros. Maybe I have an issue with the MCU; I’ll try to reinstall the SDK/drivers.

  2. I believe so. I have written a separate parser (it runs offline on my desktop after receiving a hexadecimal representation of the bytes) that converts the 8-bit numbers into a 32-bit representation and then processes them as signed 32-bit numbers with the appropriate operations, similar to the sample code. I have run some basic tests, and it was working as expected.

  3. The peak location is actually correct (from the dwt_readdiagnostics function). I have checked the location of the peak, and it was around 740. As I was not able to read the entire CIR, I only read nearby bins and plotted them in the figures. The last figure covers 60 bins before the peak (which is again around 740) and another 240 bins, so the CIR taps/bins should be from 680 to 968 (likewise for fig3). But the CIR shape (as in the first figure) does not make sense, as you noted. Also, I see a similar pattern no matter how I read: some values seem to get corrupted after a while, or maybe something else is happening.

Also, I put the wrong link in the previous post, I was supposed to share the following:

Many thanks

Is there any chance you have signal integrity issues with the module that are resulting in bad data when reading large blocks? I would expect this to cause more fundamental issues than bad CIR data but it may be worth trying turning the SPI speed down just to make sure that doesn’t have any impact.

As I indicated, the code I gave has worked for me on the DWM3000, so the fundamental API appears to be sound. That implies it is something in either the hardware or the firmware of your implementation, but I’m running out of ideas as to what it could be.

Hello mzozt,
Did you solve the problem of reading the entire memory? Could you please reply to my question about more or less the same issue here?

Best regards,
Jogi

Hi, Andy! Thanks for sharing. I am using almost the same code, but in Python. It seems that in Python int32 is not enough for the I^2 + Q^2 calculation. But when I change int32 to int64 in Python, the CIR result is very strange. I think my byte endianness handling and my conversion to signed 18-bit values are wrong. I hope you can help me out with my situation. Here is the link to my topic: CIR data from DWM3000 is very strange
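
On the overflow itself, the arithmetic is the same in any language: a signed 18-bit value can be as large as 131072 in magnitude, its square is 2^34, and the sum of two such squares is 2^35, which does not fit in 32 bits. A minimal sketch in C (to match the code above; the function name is made up) that widens before multiplying:

```c
#include <stdint.h>

/* I^2 + Q^2 for sign-extended 18-bit inputs, computed in 64 bits.
 * Widening happens before the multiply, so nothing overflows. */
uint64_t cir_power(int32_t i, int32_t q)
{
    int64_t wi = i;
    int64_t wq = q;
    return (uint64_t)(wi * wi + wq * wq);
}
```

Any single component above 46340 in magnitude already overflows a 32-bit square. Widening only fixes that part of the problem, so strange values after switching to 64 bits point at the byte order or sign extension step instead.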