Technical Document #108

Performing large PCI memory data transfers

The main methods for transferring large amounts of data between PCI-device memory addresses and a host machine's random-access memory (RAM) are block transfers (which may or may not result in PCI burst transfers), and bus-master DMA.
Block transfers are easier to implement, but DMA is much more effective and reliable for transferring large amounts of data, as explained in this document.

For general information on how to improve your driver's performance with WinDriver, refer to Technical Document #17.

Block Transfers

You can use the WinDriver WDC_ReadAddrBlock() and WDC_WriteAddrBlock() functions (or the low-level WD_Transfer() function with a string command) to perform block (string) transfers, i.e., to transfer blocks of data from the device memory (read) or to the device memory (write). You can also use WDC_MultiTransfer() (or the low-level WD_MultiTransfer() function) to group multiple block transfers into a single function call. This is more efficient than performing multiple single transfers.
The WinDriver block-transfer functions use assembler string instructions (such as REP MOVSD, or a 64-bit MMX instruction for 64-bit transfers) to move a block of memory between PCI-mapped memory on the device and the host's RAM. From a software perspective, this is the most that can be done to attempt to initiate PCI burst transfers.
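The difference between single transfers and a block transfer can be sketched in plain C. This is only an illustration and does not call the WinDriver API: the mock_dev_mem array is a hypothetical stand-in for PCI-mapped device memory, both function names are invented for this example, and memcpy() stands in for the string-instruction copy that the block-transfer functions perform.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a memory-mapped device region.
 * On real hardware this would be a mapped PCI address range, not an array. */
#define MOCK_DEV_DWORDS 256u
static uint32_t mock_dev_mem[MOCK_DEV_DWORDS];

/* Returns nonzero if [offset, offset + count) lies inside the mock region. */
static int range_ok(size_t offset_dwords, size_t count)
{
    return offset_dwords <= MOCK_DEV_DWORDS &&
           count <= MOCK_DEV_DWORDS - offset_dwords;
}

/* Single transfers: one separate DWORD access per element.
 * Returns the number of individual accesses issued (0 on a bad range). */
static size_t read_single_dwords(uint32_t *dst, size_t offset_dwords, size_t count)
{
    if (!range_ok(offset_dwords, count))
        return 0;

    const volatile uint32_t *src = mock_dev_mem + offset_dwords;
    size_t accesses = 0;
    for (size_t i = 0; i < count; i++) {
        dst[i] = src[i];   /* each iteration is its own access */
        accesses++;
    }
    return accesses;
}

/* Block transfer: the whole range is moved by one copy operation.
 * Returns the number of copy operations issued (1, or 0 on a bad range). */
static size_t read_block(uint32_t *dst, size_t offset_dwords, size_t count)
{
    if (!range_ok(offset_dwords, count))
        return 0;

    memcpy(dst, mock_dev_mem + offset_dwords, count * sizeof(uint32_t));
    return 1;
}
```

Both functions return the same data. The block version presents the range as one contiguous copy, which is the form that gives the hardware the opportunity, but no guarantee, to issue a burst.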

Burst Transfers

The hardware uses PCI burst mode to perform burst transfers — i.e., transfer the data in "bursts" of block reads/writes, resulting in a small performance improvement compared to the alternative of single WORD transfers. Some host controllers implement burst transfers by grouping access to successive PCI addresses into PCI bursts.
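As a rough illustration of the grouping described above, the following C sketch counts how many bursts a sequence of accesses would produce under a deliberately simplified model. The model is an assumption, not a description of any particular host controller: an access joins the current burst only if its address immediately follows the previous access, and limits such as maximum burst length are ignored. The function name count_bursts is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model (assumption): consecutive accesses are merged into one
 * burst only when each address equals the previous address plus the access
 * size. Returns the number of bursts needed for the whole sequence. */
static size_t count_bursts(const uint32_t *addrs, size_t n, uint32_t access_bytes)
{
    if (n == 0)
        return 0;

    size_t bursts = 1;                        /* the first access opens a burst */
    for (size_t i = 1; i < n; i++) {
        if (addrs[i] != addrs[i - 1] + access_bytes)
            bursts++;                         /* gap or jump: a new burst starts */
    }
    return bursts;
}
```

Under this model, five DWORD accesses to 0x1000, 0x1004, 0x1008, 0x2000, and 0x2004 form two bursts, whereas five accesses to scattered addresses form five.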

The host-side software has no way to control whether a target PCI transfer is issued as a burst transfer. The most the host can do is initiate transfers using assembler string instructions — as done by the WinDriver block-transfer APIs — but there's no guarantee that this will translate into burst transfers, as this is entirely up to the hardware. Most PCI host controllers support PCI burst mode for write transfers. It is generally less common to find similar burst-mode support for PCI read transfers.

64-Bit Transfers

WinDriver supports performing 64-bit transfers, using QWORD string (block) transfers, on both 64-bit and 32-bit platforms (see the WinDriver PCI User's Manual for the supported platforms). If you have 64-bit PCI hardware (card and bus), you may be able to improve the transfer rate by using 64-bit transfers, even if your host platform is only 32-bit. However, note the following:

  • The ability to perform actual 64-bit transfers requires that such transfers be supported by the hardware — including the CPU, the PCI card, the PCI host controller, and the PCI bridge — and it can be affected by any of these components or their specific combination.
  • The conventional wisdom among hardware engineers is that performing two 32-bit DWORD transfers is more efficient than performing a single 64-bit QWORD transfer; the reason is that the 64-bit transfer requires an additional CPU cycle to negotiate a 64-bit transfer mode, and this cycle can be used, instead, to perform a second 32-bit transfer. Therefore, performing 64-bit transfers is generally more advisable if you wish to transfer more than 64 bits of data in a single burst.
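The trade-off in the second point can be made concrete with a small cost model in C. The model is an assumption made for illustration: each 32-bit or 64-bit data transfer costs one cycle, a 64-bit burst pays one extra cycle to negotiate 64-bit mode, and all other overhead (address phases, wait states, arbitration) is ignored. The function names are hypothetical.

```c
#include <stdint.h>

/* Integer division rounded up; b must be nonzero. */
static uint64_t div_round_up(uint64_t a, uint64_t b)
{
    return (a + b - 1) / b;
}

/* Cycles for one burst of `bits` using 32-bit transfers:
 * one cycle per DWORD (simplified model). */
static uint64_t cycles_32bit(uint64_t bits)
{
    return div_round_up(bits, 32);
}

/* Cycles for one burst of `bits` using 64-bit transfers:
 * one negotiation cycle plus one cycle per QWORD (simplified model). */
static uint64_t cycles_64bit(uint64_t bits)
{
    return 1 + div_round_up(bits, 64);
}
```

In this model a 64-bit payload costs two cycles either way, so the 64-bit mode gains nothing. A 128-bit burst costs four cycles with 32-bit transfers and three with 64-bit transfers, and the advantage grows with the burst length.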


Bus-Master DMA

The best way to improve the performance of large PCI memory data transfers is by using bus-master direct memory access (DMA), and not by performing block transfers (which, as explained above, may or may not result in PCI burst transfers).

Most PCI architectures today provide DMA capability, which enables data to be transferred directly between memory-mapped addresses on the PCI device and the host's RAM, freeing the CPU from involvement in the data transfer and thus improving the host's performance. DMA data-buffer sizes are limited only by the size of the host's RAM and the available memory.

For detailed information on DMA and how to implement it with WinDriver, refer to the WinDriver PCI User's Manual. (The low-level WinDriver DMA APIs are documented in the WinDriver PCI Low-Level API Reference.)
In addition, see the WinDriver DMA Technical Documents.