STM32 to nRF52832 porting: dwt_starttx() always fails at 6.8M data rate due to long processing time

I have started porting the TREK1000 navigation example, which is based on STM32 + DWM1000, to nRF52832 + DWM1000. The configuration parameters are as follows:

    5,               // channel
    DWT_PRF_16M,     // prf
    DWT_BR_6M8,      // datarate
    3,               // preambleCode
    DWT_PLEN_128,    // preambleLength
    DWT_PAC8,        // pacSize
    0,               // non-standard SFD
    (129 + 8 - 8)    // SFD timeout

#define TX_ANT_DLY 16300
#define RX_ANT_DLY 16456

I have just started testing with 1 tag and 1 anchor. The tag sends a POLL request every 109ms, but there is no response from the anchor. Debug instrumentation revealed that dwt_starttx() fails every time on the anchor. Tre1(us) = 320us @ 6.81Mbps, which means that when dwt_starttx() is invoked it starts a delayed transmission at RMARKER + 320us. Since this Tx event is never generated, I suspected that the programmed time was already in the past, causing the anchor's packet Tx to fail.

To validate this, I measured the time from POLL reception by anchor 0 to the dwt_starttx() call using dwt_readsystimestamphi32(). The difference was ~1.1ms, which supports my hypothesis that the Tx event is already in the past when dwt_starttx() is called. To prove this further, I added the code snippet below to find the delay at which dwt_starttx() succeeds:

    u64temp = u64temp + 100;
    instance_data[instance].fixedReplyDelayAnc = u64temp;
    instance_data[instance].delayedReplyTime += (instance_data[instance].fixedReplyDelayAnc >> 8);

When u64temp reached 0x032d78d4 (~836us), dwt_starttx() succeeded. Next I changed RX_RESPONSE1_TURNAROUND_6M81 to 1218, after which the anchor response and the tag's FINAL message both went through.
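As a sanity check on the delay value above, here is a minimal sketch of the unit conversion, assuming u64temp is expressed in DW1000 device time units (1 / (128 * 499.2 MHz), about 15.65 ps each). The helper names dtu_to_us() and us_to_dtu() are hypothetical and are not part of the Decawave driver.

```c
#include <stdint.h>

/* DW1000 device time unit: 1 / (128 * 499.2 MHz), about 15.65 ps,
 * which gives 63897.6 units per microsecond. */
#define DTU_PER_US 63897.6

/* Hypothetical helper: convert device time units to microseconds. */
static double dtu_to_us(uint64_t dtu)
{
    return (double)dtu / DTU_PER_US;
}

/* Hypothetical helper: convert microseconds to device time units,
 * rounded to the nearest unit. */
static uint64_t us_to_dtu(double us)
{
    return (uint64_t)(us * DTU_PER_US + 0.5);
}
```

With these helpers, dtu_to_us(0x032d78d4) comes out at roughly 834us, close to the ~836us quoted above, and us_to_dtu(320.0) gives the device-time equivalent of the 320us reply delay.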

RMARKER + 320us never worked because the Tx time programmed for dwt_starttx() was always in the past. The same code worked on the STM32 but not on the nRF52832. This suggests that the processing path on the nRF52832 (16MHz) is slower than on the STM32 (72MHz).
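The "in the past" check described above can be sketched as follows. This is only an illustration under stated assumptions: the timestamps are taken to be the high 32 bits of the DW1000 system time (as read with dwt_readsystimestamphi32()), so one tick is 256 device time units, or 1 / 249.6 MHz (about 4 ns). The function names and the margin_us parameter are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* One hi32 tick = 256 device time units = 1 / 249.6 MHz,
 * so there are 249.6 ticks per microsecond. */
#define HI32_TICKS_PER_US 249.6

/* Hypothetical helper: elapsed time between two hi32 timestamps.
 * Unsigned subtraction handles counter wraparound modulo 2^32. */
static double hi32_elapsed_us(uint32_t rx_hi32, uint32_t now_hi32)
{
    uint32_t delta = now_hi32 - rx_hi32;
    return (double)delta / HI32_TICKS_PER_US;
}

/* Hypothetical helper: true if the reply delay (measured from the
 * RX timestamp) is still in the future, with some safety margin
 * left for programming the delayed transmission. */
static bool reply_delay_is_feasible(uint32_t rx_hi32, uint32_t now_hi32,
                                    double reply_delay_us, double margin_us)
{
    return hi32_elapsed_us(rx_hi32, now_hi32) + margin_us < reply_delay_us;
}
```

For the numbers in this post, an elapsed time of about 1.1ms against a 320us reply delay makes reply_delay_is_feasible() return false, which matches the observed dwt_starttx() failures.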

Has anybody faced a similar problem? How can we get the same level of system performance on the nRF52832 as on the STM32? Does any configuration value need to be changed? Please advise.
Sandeep Suresh.