From: Jouni Malinen (jkmaline_at_cc.hut.fi)
Date: 2002-04-07 16:21:44 UTC
I did some profiling on which HFA384x commands are used and how long
they take to complete. The RX path does not use any commands (it just
reads the received frame, which the card has already prepared), but the
TX path issues a transmit command for each sent frame. Other commands are
used only during card initialization and/or configuration, so they are
not that interesting as far as CPU usage during normal operation is concerned.
To say the least, I was not very happy to notice that transmit command completion took on average about 150 usec on my laptop with a Compaq WL100. This time is spent in a busy-wait loop, with interrupts disabled, for every TX frame.
If the card is sending frames at full speed, say at about 5 Mbps with 1500-byte frames, there will be about 440 packets per second and, with one transmit command per packet, 66 ms of busy waiting every second. In other words, 6.6% of the (host) CPU time is spent in a busy loop, no matter how fast the CPU is. With smaller packets there are more packets per second, so an even larger share of CPU time goes to busy waiting. About 6700 packets/sec would be enough to consume all CPU time in busy waiting (again, no matter how fast the CPU is).
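The arithmetic above can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration, and it assumes exactly one 150 usec transmit-command wait per frame. Taking 5 Mbps as 5 x 10^6 bit/s gives about 417 frames/s rather than 440 (440 frames/s corresponds to roughly 5.3 Mbit/s); either way the busy-wait share comes out at around 6 to 7% of the CPU.

```c
/* Frames per second for a given throughput and frame size. */
static double packets_per_sec(double bits_per_sec, double frame_bytes)
{
    return bits_per_sec / (frame_bytes * 8.0);
}

/* Fraction of each second spent busy-waiting, assuming every TX frame
 * costs one transmit-command wait of cmd_wait_us microseconds. */
static double busy_wait_fraction(double pkts_per_sec, double cmd_wait_us)
{
    return pkts_per_sec * cmd_wait_us / 1e6;
}

/* Packet rate at which busy waiting alone would use the whole CPU,
 * independent of CPU speed because the wait is set by the card. */
static double saturation_rate(double cmd_wait_us)
{
    return 1e6 / cmd_wait_us;
}
```

For example, busy_wait_fraction(440.0, 150.0) is 0.066 (6.6%), and saturation_rate(150.0) is about 6667 packets/sec.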
This kind of load can be quite awful on systems that have limited resources and would need the CPU for more useful tasks. High-performance CPUs also suffer a lot, but they usually have resources to spare (although keeping interrupts disabled for most of the time is certainly not nice).
Luckily, HFA384x has a command completion event that can be used to avoid busy waiting. The transmit command can be started, and the host CPU can then finish the processing once the command event has generated an interrupt. In the meantime, the host CPU is free for other tasks.
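The difference between the two approaches can be sketched as a split of the transmit path into a "start" half and an interrupt-handler "finish" half. This is a simulation, not Host AP code: struct sim_card, SIM_EV_CMD, tx_start(), irq_handler() and sim_card_complete_cmd() are all hypothetical stand-ins for the real command and event-status registers and the driver's own functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command-completion event bit; a real driver would take
 * this from the chip's register definitions. */
#define SIM_EV_CMD 0x0010u

/* Simulated card: stands in for the event-status register and the
 * driver's bookkeeping for the one outstanding command. */
struct sim_card {
    uint16_t evstat;   /* pending event bits */
    bool cmd_busy;     /* a transmit command is in flight */
    int tx_completed;  /* transmit commands that have finished */
};

/* Start the transmit command and return at once instead of spinning.
 * Returns -1 if a command is already outstanding. */
static int tx_start(struct sim_card *card)
{
    if (card->cmd_busy)
        return -1;
    card->cmd_busy = true;   /* hypothetical: write the command register */
    return 0;
}

/* Interrupt handler: finish the command that tx_start() began once the
 * card reports the command-completion event, then acknowledge it. */
static void irq_handler(struct sim_card *card)
{
    if (!(card->evstat & SIM_EV_CMD))
        return;
    card->evstat &= (uint16_t)~SIM_EV_CMD;   /* acknowledge the event */
    if (card->cmd_busy) {
        card->cmd_busy = false;
        card->tx_completed++;
    }
}

/* Simulation helper: the card raises the command-completion event. */
static void sim_card_complete_cmd(struct sim_card *card)
{
    card->evstat |= SIM_EV_CMD;
}
```

Between tx_start() and irq_handler() the CPU does no polling. A real driver would also have to hold back further frames (for example by stopping its transmit queue) while a command is outstanding; that part is left out of this sketch.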
I modified the Host AP driver to use this command event instead of busy waiting, and the effect on CPU usage was huge. System time under high load was reduced a lot; in some test cases it dropped to about one tenth of the old value. For example, a CPU-intensive task that needs about 7.8 sec of CPU time on a 400 MHz Pentium III took about 17.2 sec to complete with the busy-wait version while the machine was sending a bit under 1000 packets per second to the wireless network. Using the command interrupt instead of busy waiting for transmit allowed the same task to complete in 11.5 sec. There is probably still some optimization work to be done, but the difference is already substantial.
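One way to read the Pentium III numbers is to look at the extra time beyond the task's own 7.8 sec: 9.4 sec with busy waiting versus 3.7 sec with the command interrupt, i.e. roughly 60% of the slowdown removed. The helper below (hypothetical names) only does that subtraction; the extra time also includes all the other work of sending the packets, so it is a rough measure rather than a direct measurement of busy-wait cost.

```c
/* Extra wall-clock time a task needed beyond its standalone CPU time. */
static double slowdown_s(double standalone_s, double measured_s)
{
    return measured_s - standalone_s;
}

/* Fraction of the busy-wait slowdown removed by the interrupt version. */
static double slowdown_reduction(double standalone_s,
                                 double busy_wait_s, double irq_s)
{
    double before = slowdown_s(standalone_s, busy_wait_s);
    double after = slowdown_s(standalone_s, irq_s);
    return (before - after) / before;
}
```

With the figures from the test, slowdown_reduction(7.8, 17.2, 11.5) is about 0.61.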
The changes have not yet been thoroughly tested, so I will keep the old method as the default in the next driver release. There will be a compile-time definition for selecting the interrupt-based version, and if it proves to be stable, I will eventually replace the old busy wait with the new code. In particular, those who have had problems with dropped interrupts during high WLAN load might want to test the new version.
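A compile-time switch of this kind usually looks something like the following. The macro name SKETCH_TX_CMD_IRQ is hypothetical (the post does not give the real option name); building with -DSKETCH_TX_CMD_IRQ would select the interrupt-based path, and leaving it undefined keeps the busy-wait default.

```c
/* Hypothetical build option; not the driver's actual macro name. */
#ifdef SKETCH_TX_CMD_IRQ
#define TX_USES_CMD_IRQ 1   /* new: finish TX in the command interrupt */
#else
#define TX_USES_CMD_IRQ 0   /* default: old busy-wait path */
#endif

/* Report which transmit completion method this build uses. */
static const char *tx_method_name(void)
{
    return TX_USES_CMD_IRQ ? "command-completion interrupt" : "busy wait";
}
```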
-- Jouni Malinen PGP id EFC895FA