10.2. Improving the Performance of a User-Mode Driver

As a general rule, transfers to memory-mapped regions are faster than transfers to I/O-mapped regions, because WinDriver enables you to access memory-mapped regions directly from the user mode, without the need for a function call, as explained in Section 10.2.1.

In addition, the WinDriver APIs enable you to improve the performance of your I/O and memory data transfers by using block (string) transfers and by grouping several data transfers into a single function call, as explained in Section 10.2.2.

10.2.1. Using Direct Access to Memory-Mapped Regions

When registering a card, using WDC_xxxDeviceOpen() (PCI [B.3.17] / ISA [B.3.18]) or the low-level WD_CardRegister() function (see the WinDriver PCI Low-Level API Reference), WinDriver returns both user-mode and kernel-mode mappings of the card's physical memory regions. These addresses can then be used to access the memory regions on the card directly, either from the user mode or from the kernel mode (respectively), thus eliminating the context switches between the user and kernel modes and the function-call overhead of accessing the memory.

The WDC_MEM_DIRECT_ADDR macro [B.4.4] provides the relevant direct memory access base address for a given memory address region on the card: the user-mode mapping when called from the user mode, or the kernel-mode mapping when called from a Kernel PlugIn driver [11]. You can then pass the mapped base address to the WDC_ReadMem8/16/32/64 [B.3.24] and WDC_WriteMem8/16/32/64 [B.3.25] macros, along with the desired offset within the selected memory region, to directly access a specific memory address on the card, either from the user mode or in the kernel.
In addition, all the WDC_ReadAddrXXX() [B.3.26] and WDC_WriteAddrXXX() [B.3.27] functions, with the exception of WDC_ReadAddrBlock() [B.3.28] and WDC_WriteAddrBlock() [B.3.29], access memory addresses directly, using the correct mapping, based on the calling context (user mode/kernel mode).

When using the low-level WD_xxx() APIs, described in the WinDriver PCI Low-Level API Reference, the kernel-mode and user-mode mappings of the card's physical memory regions are returned by WD_CardRegister() within the pTransAddr and pUserDirectAddr fields (respectively) of the pCardReg->Card.Item[i] card resource item structures. The pTransAddr result should be used as a base address in calls to WD_Transfer() or WD_MultiTransfer(), or when accessing memory directly from a Kernel PlugIn driver [11]. To access the memory directly from your user-mode process, use pUserDirectAddr as a regular pointer.
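Using a mapped base address "as a regular pointer" can be sketched as follows. This is an illustrative sketch only, not WinDriver code: the sketch_xxx helpers are hypothetical stand-ins for the direct-access macros, and an ordinary array (fake_region) stands in for the mapping that WinDriver would return, so that the example compiles without the WinDriver headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: an ordinary array stands in for the card's mapped memory.
 * In a real driver the base address would come from WinDriver. */
static uint32_t fake_region[16];

/* Hypothetical helper: returns the simulated "mapped base address". */
static inline volatile void *sketch_mapped_base(void)
{
    return fake_region;
}

/* Hypothetical helper: reads a 32-bit value at base + off through a
 * volatile pointer, so the compiler performs the access exactly once. */
static inline uint32_t sketch_read_mem32(const volatile void *base, size_t off)
{
    assert(off % sizeof(uint32_t) == 0); /* keep the access aligned */
    return *(const volatile uint32_t *)((const volatile uint8_t *)base + off);
}

/* Hypothetical helper: writes a 32-bit value at base + off. */
static inline void sketch_write_mem32(volatile void *base, size_t off,
                                      uint32_t val)
{
    assert(off % sizeof(uint32_t) == 0); /* keep the access aligned */
    *(volatile uint32_t *)((volatile uint8_t *)base + off) = val;
}
```

For example, sketch_write_mem32(base, 8, value) followed by sketch_read_mem32(base, 8) accesses the third 32-bit word of the region with plain loads and stores, without any function call into the kernel.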

Whichever method you select to access the memory on your card, it is important to align the base address according to the size of the data type, especially when issuing string transfer commands. Otherwise, the transfers are split into smaller portions.
The easiest way to align data is to use basic types when defining a buffer, for example:

BYTE buf[len];      /* for BYTE transfers - not aligned */
WORD buf[len];      /* for WORD transfers - aligned on a 2-byte boundary */
UINT32 buf[len];    /* for DWORD transfers - aligned on a 4-byte boundary */
UINT64 buf[len];    /* for QWORD transfers - aligned on an 8-byte boundary */
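If a buffer is obtained in some other way (for example, as an offset into a larger byte array), its alignment can be checked before issuing a transfer. The following is a minimal sketch in standard C; the sketch_xxx function names are hypothetical, and the logic only illustrates the alignment rule described above. It does not reproduce how WinDriver splits transfers internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if ptr lies on an access_size-byte boundary.
 * access_size must be a power of two (1, 2, 4 or 8). */
static inline int sketch_is_aligned(const void *ptr, size_t access_size)
{
    assert(access_size != 0 && (access_size & (access_size - 1)) == 0);
    return ((uintptr_t)ptr & (access_size - 1)) == 0;
}

/* Returns the largest access size, in bytes (8, 4, 2 or 1), for which
 * both the buffer address and the buffer length are aligned. */
static inline size_t sketch_max_access_size(const void *ptr, size_t len_bytes)
{
    size_t size = 8;

    while (size > 1 &&
           (!sketch_is_aligned(ptr, size) || len_bytes % size != 0)) {
        size /= 2;
    }
    return size;
}
```

A buffer for which sketch_max_access_size() returns less than the intended access size is a candidate for redefinition using the matching basic type, as shown in the declarations above.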

10.2.2. Block Transfers and Grouping Multiple Transfers

To transfer large amounts of data to/from memory addresses or I/O addresses (which, unlike memory addresses, by definition cannot be accessed directly; see Section 10.2.1), use the following methods to improve performance by reducing the function-call overhead and the context switches between the user and kernel modes:

  • Perform block (string) transfers using WDC_ReadAddrBlock() [B.3.28] / WDC_WriteAddrBlock() [B.3.29], or the low-level WD_Transfer() function (see the WinDriver PCI Low-Level API Reference).
  • Group several transfers into a single function call, using WDC_MultiTransfer() [B.3.30] or the low-level WD_MultiTransfer() function (see the WinDriver PCI Low-Level API Reference).
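The benefit of grouping can be illustrated with a small simulation. The sketch below is not the WinDriver API: sketch_cmd, sketch_single_transfer() and sketch_multi_transfer() are hypothetical names, a static array simulates the device registers, and a counter simulates the number of user/kernel transitions. It only shows why issuing N commands in one call costs one transition instead of N.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_REGS 8

/* Hypothetical transfer command (NOT the WinDriver transfer structure). */
typedef struct {
    int      is_write; /* 1 = write data to the register, 0 = read it */
    size_t   offset;   /* index of the simulated register */
    uint32_t data;     /* input for a write, output for a read */
} sketch_cmd;

static uint32_t sketch_regs[SKETCH_NUM_REGS]; /* simulated device registers */
static unsigned sketch_kernel_calls;          /* simulated mode transitions */

/* Executes count commands in order within ONE simulated kernel entry. */
static inline void sketch_multi_transfer(sketch_cmd *cmds, size_t count)
{
    sketch_kernel_calls++; /* a single transition for the whole group */
    for (size_t i = 0; i < count; i++) {
        assert(cmds[i].offset < SKETCH_NUM_REGS);
        if (cmds[i].is_write)
            sketch_regs[cmds[i].offset] = cmds[i].data;
        else
            cmds[i].data = sketch_regs[cmds[i].offset];
    }
}

/* Executes one command: one simulated kernel entry per call. */
static inline void sketch_single_transfer(sketch_cmd *cmd)
{
    sketch_multi_transfer(cmd, 1);
}
```

Issuing three commands through sketch_single_transfer() increments the transition counter three times, whereas passing the same three commands to sketch_multi_transfer() increments it once while producing the same register contents and read results.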

10.2.3. Performing 64-Bit Data Transfers

The ability to perform actual 64-bit transfers depends on whether the hardware, CPU, bridge, etc. support such transfers, and can be affected by any of these factors or their specific combination.

WinDriver supports 64-bit PCI data transfers on the supported Windows and Linux 64-bit platforms (see Appendix A for a full list), as well as on Windows and Linux 32-bit x86 platforms.

If your PCI hardware (card and bus) is 64-bit, the ability to perform 64-bit data transfers on 32-bit platforms will enable you to utilize your hardware's broader bandwidth, even if your host operating system is only 32-bit.

This capability makes possible data transfer rates that are otherwise unattainable on 32-bit platforms. To date, the WDK and other driver development tools do not enable 64-bit data transfer on x86 platforms running 32-bit operating systems, so drivers developed using WinDriver can attain significantly better performance on such platforms. Jungo's benchmark performance testing results for 64-bit data transfer indicate a significant improvement in data transfer rates compared to 32-bit data transfer.

You can perform 64-bit data transfers using any of the following methods:

  • Call WDC_ReadAddr64() [B.3.26] or WDC_WriteAddr64() [B.3.27].
  • Call WDC_ReadAddrBlock() [B.3.28] or WDC_WriteAddrBlock() [B.3.29] with an access mode of WDC_SIZE_64 [B.3.1.4].
  • Call WDC_MultiTransfer() [B.3.30] or the low-level WD_Transfer() or WD_MultiTransfer() functions (see the WinDriver PCI Low-Level API Reference) with QWORD read/write transfer commands (see the documentation of these functions for details).
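The effect of the access size on the number of accesses can be sketched in plain C. The functions below are hypothetical (they are not WinDriver functions) and operate on ordinary arrays that stand in for a mapped memory region. They show only that a QWORD access mode needs half as many accesses as a DWORD access mode for the same block; whether each 64-bit access is carried out as a single 64-bit bus transfer still depends on the hardware, as noted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies len_bytes from src using one 64-bit read per 8 bytes.
 * Returns the number of read accesses performed. */
static inline size_t sketch_read_block64(const volatile uint64_t *src,
                                         uint64_t *dst, size_t len_bytes)
{
    assert(len_bytes % sizeof(uint64_t) == 0);
    size_t n = len_bytes / sizeof(uint64_t);

    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return n;
}

/* Copies len_bytes from src using one 32-bit read per 4 bytes.
 * Returns the number of read accesses performed. */
static inline size_t sketch_read_block32(const volatile uint32_t *src,
                                         uint32_t *dst, size_t len_bytes)
{
    assert(len_bytes % sizeof(uint32_t) == 0);
    size_t n = len_bytes / sizeof(uint32_t);

    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return n;
}
```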

You can also perform 64-bit transfers to/from the PCI configuration space using WDC_PciReadCfg64() [B.3.38] / WDC_PciWriteCfg64() [B.3.39] and WDC_PciReadCfgBySlot64() [B.3.36] / WDC_PciWriteCfgBySlot64() [B.3.37].