Improving the performance of your User Mode driver

As a general rule, transfers to memory mapped regions are faster than transfers to IO mapped regions. The reason is that WinDriver enables the user to directly access the memory mapped regions, without calling the WD_Transfer() function.

Using Direct access to memory mapped regions

After registering a memory mapped region via WD_CardRegister(), two results are returned: dwTransAddr and dwUserDirectAddr. dwTransAddr should be used as the base address when calling WD_Transfer() to read from or write to the memory region. A more efficient way to perform memory transfers is to use dwUserDirectAddr directly as a pointer and access the memory mapped range through it. This method lets you read and write data in your memory mapped region without any function call overhead (i.e. zero performance degradation).
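The direct access described above can be sketched as follows. This is a minimal illustration, not WinDriver code: reg_write32() and reg_read32() are hypothetical helper names, and the sketch assumes that the value returned in dwUserDirectAddr can be cast to a pointer in the calling process and that the device tolerates aligned 32-bit accesses. The volatile qualifier keeps the compiler from caching or reordering away accesses to the mapped range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers (not part of the WinDriver API).
 * 'base' stands for the mapped region's base pointer, e.g. the value
 * returned in dwUserDirectAddr cast to a pointer (an assumption). */

/* Write a 32-bit value at a byte offset from the mapped base. */
void reg_write32(volatile void *base, size_t offset, uint32_t value)
{
    assert(offset % sizeof(uint32_t) == 0);  /* aligned accesses only */
    *(volatile uint32_t *)((volatile uint8_t *)base + offset) = value;
}

/* Read a 32-bit value at a byte offset from the mapped base. */
uint32_t reg_read32(const volatile void *base, size_t offset)
{
    assert(offset % sizeof(uint32_t) == 0);  /* aligned accesses only */
    return *(const volatile uint32_t *)((const volatile uint8_t *)base + offset);
}
```

Each access compiles to an ordinary load or store, so no driver call is made per transfer. For experimenting without hardware, an ordinary uint32_t array can be passed as the base.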

Accessing IO mapped regions

The only way to transfer data on IO mapped regions is by calling the WD_Transfer() function. If a large buffer needs to be transferred, the String (Block) Transfer commands can be used. For example, the RP_SBYTE (Read Port String Byte) command reads a buffer of bytes from the IO port. In such a case the function calling overhead is negligible compared to the block transfer time.

In a case where many short transfers are performed, the function calling overhead may accumulate to the point of degrading overall performance. This may happen if you need to call WD_Transfer() more than 20,000 times per second.

An example of such a case: a block of 1MB of data needs to be transferred word by word, where for each word the LOW byte is first transferred to IO port 0x300 and then the HIGH byte is transferred to IO port 0x301.

Normally this would mean calling WD_Transfer() 1 million times: Byte 0 to port 0x300, Byte 1 to port 0x301, Byte 2 to port 0x300, Byte 3 to port 0x301, etc. (using WP_BYTE, Write Port Byte).

A quick way to save 50% of the function call overhead is to call WD_Transfer() with a WP_SBYTE (Write Port String Byte) command that transfers two bytes at a time. The first call would transfer Byte 0 and Byte 1 to ports 0x300 and 0x301, the second call would transfer Byte 2 and Byte 3 to ports 0x300 and 0x301, etc. This way, WD_Transfer() is called only 500,000 times to transfer the block.

A third method is to prepare an array of 1000 WD_TRANSFER commands. Each command in the array is a WP_SBYTE command that transfers two bytes at a time. You then call WD_MultiTransfer() with a pointer to the array of WD_TRANSFER commands. One call to WD_MultiTransfer() transfers 2000 bytes of data, so transferring the 1MB of data requires only 500 calls to WD_MultiTransfer(). This is 0.05% of the original 1 million calls to WD_Transfer(). The trade-off in this case is the memory used to set up the 1000 WD_TRANSFER commands.
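The call counts for the three methods can be checked with a small simulation. The sketch below does not use the WinDriver API: mock_cmd, mock_dev, mock_transfer() and mock_multi_transfer() are hypothetical stand-ins for WD_TRANSFER, the device, WD_Transfer() and WD_MultiTransfer(), and it assumes that a two-byte string write sends its first byte to port 0x300 and its second byte to port 0x301, as the text describes. 1MB is taken as 1,000,000 bytes to match the "1 million" figure used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BASE_PORT 0x300u  /* LOW byte port; HIGH byte goes to BASE_PORT + 1 */
#define BATCH     1000u   /* commands per simulated multi-transfer call */

/* Hypothetical stand-in for one write command; NOT the WD_TRANSFER layout. */
typedef struct {
    uint16_t port;        /* first IO port written by this command */
    size_t count;         /* number of bytes in this command */
    const uint8_t *data;  /* source bytes */
} mock_cmd;

/* Simulated device: last byte seen on each port, plus counters. */
typedef struct {
    uint8_t last[2];
    size_t bytes_written;
    size_t calls;         /* number of simulated driver calls */
} mock_dev;

/* Carry out one command: byte i goes to port (cmd->port + i). */
static void mock_exec(mock_dev *dev, const mock_cmd *cmd)
{
    for (size_t i = 0; i < cmd->count; i++) {
        uint16_t port = (uint16_t)(cmd->port + i);
        assert(port >= BASE_PORT && port < BASE_PORT + 2);
        dev->last[port - BASE_PORT] = cmd->data[i];
        dev->bytes_written++;
    }
}

/* Stand-in for a single-command call: one call, one command. */
static void mock_transfer(mock_dev *dev, const mock_cmd *cmd)
{
    dev->calls++;
    mock_exec(dev, cmd);
}

/* Stand-in for a multi-command call: one call, n commands. */
static void mock_multi_transfer(mock_dev *dev, const mock_cmd *cmds, size_t n)
{
    dev->calls++;
    for (size_t i = 0; i < n; i++)
        mock_exec(dev, &cmds[i]);
}

/* Method 1: one call per byte, alternating between the two ports. */
size_t send_bytewise(mock_dev *dev, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        mock_cmd c = { (uint16_t)(BASE_PORT + (i & 1u)), 1, &buf[i] };
        mock_transfer(dev, &c);
    }
    return dev->calls;
}

/* Method 2: one call per two-byte string write. */
size_t send_pairs(mock_dev *dev, const uint8_t *buf, size_t len)
{
    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2) {
        mock_cmd c = { BASE_PORT, 2, &buf[i] };
        mock_transfer(dev, &c);
    }
    return dev->calls;
}

/* Method 3: up to BATCH two-byte commands per call. */
size_t send_batched(mock_dev *dev, const uint8_t *buf, size_t len)
{
    static mock_cmd cmds[BATCH];  /* the memory cost of batching */
    size_t i = 0;
    assert(len % 2 == 0);
    while (i < len) {
        size_t n = 0;
        for (; n < BATCH && i < len; n++, i += 2) {
            cmds[n].port = BASE_PORT;
            cmds[n].count = 2;
            cmds[n].data = &buf[i];
        }
        mock_multi_transfer(dev, cmds, n);
    }
    return dev->calls;
}
```

Run on a 1,000,000-byte buffer with a zero-initialized mock_dev for each method, the three functions report 1,000,000, 500,000 and 500 calls respectively, while writing the same bytes to the same ports. The static command array in send_batched() corresponds to the memory trade-off mentioned above.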

Performing 64-bit data transfers

WinDriver supports 64-bit PCI data transfer on x86 platforms running 32-bit operating systems. If your PCI hardware (device and bus) is 64-bit, this feature will enable you to utilize your hardware's broader bandwidth, even though your host operating system is only 32-bit.

This technology enables data transfer rates previously unattainable on such platforms. Drivers developed using WinDriver will attain significantly better performance than drivers written with the DDK or other driver development tools, which to date do not enable 64-bit data transfer on x86 platforms running 32-bit operating systems. Jungo's benchmark results for 64-bit data transfer indicate a significant improvement in data transfer rates compared to 32-bit data transfer, so drivers developed with WinDriver and KernelDriver can achieve better performance than 32-bit data transfer normally allows.