I have been experimenting with the polling method and the receive timeout. It seems that after setting a timeout delay, you need to wait a few µs after turning on the receiver before starting to poll the status register. This delay is not shown in the examples, and it may be related to my SPI driver and MCU being faster, but I wanted to bring the issue up.
I double-checked with the interrupt method, and there the timeout occurs when expected based on the programmed value. In summary: when using dwt_rxenable(DWT_START_RX_IMMEDIATE), you need to wait ~10 µs before polling the status register. When using dwt_starttx(DWT_START_TX_IMMEDIATE | DWT_RESPONSE_EXPECTED), you need to wait until the message has finished transmitting before you start polling. The transmission time, and so the appropriate delay, can be worked out from the config settings and the message size.
I am using timers on the STM32F413 to measure microsecond times in my test code.
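
For anyone who wants to try the workaround, here is a rough sketch of the "wait, then poll" pattern described above. It is not taken from the examples. The helper name and the two function-pointer hooks are hypothetical, and the fake_* functions are host-side stand-ins so the logic can be checked without hardware. On the real part I would expect the read hook to wrap the driver's status register read and the delay hook to wrap a microsecond timer, but treat both as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks (not part of the driver): on hardware these would wrap
 * the status register read and a microsecond timer delay. */
typedef uint32_t (*read_status_fn)(void);
typedef void (*delay_us_fn)(uint32_t us);

/* Wait settle_us first, then poll the status word until any bit in
 * event_mask is set. Returns true and stores the status on an event,
 * or false if max_polls reads pass without one. */
static bool poll_status_after_settle(read_status_fn read_status,
                                     delay_us_fn delay_us,
                                     uint32_t settle_us,
                                     uint32_t event_mask,
                                     uint32_t max_polls,
                                     uint32_t *status_out)
{
    delay_us(settle_us); /* the extra wait that the examples do not show */

    for (uint32_t i = 0; i < max_polls; i++) {
        uint32_t status = read_status();
        if (status & event_mask) {
            *status_out = status;
            return true;
        }
    }
    return false;
}

/* Host-side stand-ins, for illustration only. */
static uint32_t fake_delay_total_us = 0;
static uint32_t fake_reads = 0;

static void fake_delay_us(uint32_t us)
{
    fake_delay_total_us += us; /* record the requested delay, do not sleep */
}

static uint32_t fake_read_status(void)
{
    /* Report a made-up event bit (0x4000) from the third read onwards. */
    fake_reads++;
    return (fake_reads >= 3) ? 0x4000u : 0u;
}
```

With dwt_rxenable(DWT_START_RX_IMMEDIATE), settle_us would be the ~10 µs I measured. With dwt_starttx(DWT_START_TX_IMMEDIATE | DWT_RESPONSE_EXPECTED), it would be the calculated transmission time of the outgoing message. The event mask would be whatever good-frame, timeout, and error bits your polling loop already checks.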