Re: SMP instability


From: Jouni Malinen (jkmaline_at_cc.hut.fi)
Date: 2002-07-29 16:54:24 UTC



On Sun, Jul 28, 2002 at 04:49:31AM -0700, Aaron Kurtz wrote:

> I'm running the latest CVS driver with a Linksys WMP11 (1.3.4 secondary
> firmware, btw.) in Master mode. When it's in SMP mode (2.4.18 kernel),
> downloading any large file has a tendency to crash the machine hard in a
> few seconds, although uploading seems to work just fine. It spits out
>
> wlan0: Interrupt, but SWSUPPORT0 does not match: 8A32 != 8A32 - card
> removed?

Looks like SWSUPPORT0 can return invalid values even without PCI bus mastering. When the driver read the value for the first time, it got something other than 0x8a32 (probably 0x0000); by the second read (the one shown in that debug message), the value was already correct again. That verification in prism2_interrupt() could actually be removed completely from hostap_pci.o (unlike a PC Card, a PCI card cannot be removed during operation).

> hostap_pci: wlan0: resetting card
>
> right before it dies. When I run it in UP mode, it works as an Access
> Point perfectly. What other information is needed for debugging
> purposes? I'd really love to see this work properly in SMP mode.

Even though this reset was probably not needed, it should certainly not kill the host system. I haven't seen this happen in my tests, but I will try to force similar cases and check that the card resetting code at least does not crash the host system.

I will be mostly away from home (and my SMP test environment) during the next few weeks, so I will probably not be testing these issues during that time. However, I just noticed a bug in the TX path: it locked a spinlock without disabling interrupts, even though the same spinlock is also locked in the hardware IRQ handler. This can lead to deadlocks and may have caused some of the reported hangs with SMP. An untested fix is in CVS now.

To be able to do something about the reported problems, I would like to get a call trace of the crash/hang (Oops, NMI watchdog, wait_on_irq) with resolved symbols (i.e., function names). I've added some instructions here on how to get useful error messages. This is based on what I usually do, but there are certainly other methods as well.

Change to text mode if running X, and direct all debug messages to the console (e.g., with Alt-SysRq-9 if Magic SysRq is enabled, or by configuring syslogd/klogd). In addition, enable the NMI watchdog to catch some deadlock cases by adding nmi_watchdog=1 to the kernel command line (see Documentation/nmi_watchdog.txt for more detailed information).
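
As a rough sketch of checking this setup from a shell (not part of the driver; the helper name has_nmi_watchdog is made up for illustration), the following tests whether a kernel command line contains an nmi_watchdog= option. The comments also show `dmesg -n 8` as one way to raise the console log level as root when Magic SysRq is not available; I am assuming a dmesg that supports -n.

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel command line string
# contains an nmi_watchdog= option. Prints "yes" or "no".
has_nmi_watchdog() {
    case " $1 " in
        *" nmi_watchdog="*) echo yes ;;
        *) echo no ;;
    esac
}

# Usage on the running system (reads the command line the kernel booted with):
#   has_nmi_watchdog "$(cat /proc/cmdline)"
#
# To send all kernel messages to the console (as root), instead of Alt-SysRq-9:
#   dmesg -n 8
```

This only confirms that the option was passed at boot; whether the watchdog is actually working still has to be checked as described in Documentation/nmi_watchdog.txt.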

Assuming the kernel does not hang completely, the resulting call trace from the problem situation can then be recorded using a serial console or by writing down the screen contents manually. If the system is still somewhat usable (e.g., after an Oops due to a NULL pointer dereference from process context), this info can be saved to a file from the 'dmesg' output. Please also include any earlier debug messages that are available from the Host AP driver.

If the system is partially usable after the crash, it is usually possible to pipe the error message through ksymoops to get the symbols resolved for the Host AP driver modules as well. If this is not possible, some preparations before the crash would be useful. The needed information can be saved, e.g., with the following commands when loading the modules:

insmod -m hostap_crypt.o > hostap_crypt.o.map
insmod -m hostap.o > hostap.o.map
insmod -m hostap_pci.o > hostap_pci.o.map
cat /proc/modules > proc.modules
cat /proc/ksyms > proc.ksyms
sync

After this, do whatever is needed to crash/hang the system and record the error message (double-check at least the call trace values). After rebooting the system, pipe the recorded error message to ksymoops with the following options to use the saved data:

cat err.msg | ksymoops -k proc.ksyms -l proc.modules > err.msg.sym

(Also make sure that System.map is correct for the current kernel; a different file can be selected with the -m option.)
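
Before running ksymoops after the reboot, it may help to verify that all the saved files are really there. The following is only a sketch: the function name missing_oops_files is hypothetical, the file names are the ones used in the commands above, and the /boot/System.map path in the usage comment is an assumption that depends on the installation.

```shell
#!/bin/sh
# Hypothetical helper: list which of the files needed for offline symbol
# resolution are missing from a directory. Prints one missing name per line
# and prints nothing if everything is present.
missing_oops_files() {
    dir=$1
    for f in err.msg proc.ksyms proc.modules \
             hostap_crypt.o.map hostap.o.map hostap_pci.o.map; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Usage sketch: only run ksymoops once nothing is missing.
# The System.map location below is an assumption; adjust it for your kernel.
#   if [ -z "$(missing_oops_files .)" ]; then
#       ksymoops -k proc.ksyms -l proc.modules -m /boot/System.map \
#           < err.msg > err.msg.sym
#   fi
```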

Getting similar data (i.e., *.map, proc.modules, err.msg, and err.msg.sym) together with an error report detailing the steps leading to the crash would be quite useful. Especially if there are problems running ksymoops, please also include proc.ksyms and both System.map and vmlinux from the kernel compilation (these are created in the root directory of the kernel source tree).

It might be better not to send large files to the mailing list, so please send these (preferably as a compressed tar archive) to my mailbox (jkmaline_at_cc.hut.fi). A description of the problem and the resolved error message from ksymoops would still be welcome on the mailing list so that others can comment if they have noticed something similar.
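
One possible way to build such an archive is sketched below. The function name pack_report and the archive name in the usage comment are made up for illustration; the file names are the ones from the instructions above, and files that do not exist are simply skipped.

```shell
#!/bin/sh
# Hypothetical helper: pack the debugging files found in directory $1 into
# the gzip-compressed tar archive $2. Returns non-zero if no files were found.
pack_report() {
    dir=$1
    out=$2
    set --   # reuse the positional parameters as the list of files to pack
    for f in "$dir"/*.map "$dir"/proc.modules "$dir"/proc.ksyms \
             "$dir"/err.msg "$dir"/err.msg.sym; do
        [ -f "$f" ] && set -- "$@" "$(basename "$f")"
    done
    [ $# -gt 0 ] || return 1
    # -C makes the archive contain plain file names without the directory part
    tar -czf "$out" -C "$dir" "$@"
}

# Usage sketch (archive name is only an example):
#   pack_report . /tmp/hostap-report.tar.gz
```

System.map and vmlinux are left out here on purpose because of their size; they can be added by hand if ksymoops had problems.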

-- 
Jouni Malinen                                            PGP id EFC895FA

