7.0 CURRENT kernel's ath driver causes page fault, kernel panic (debugging kernel) - FreeBSD


  1. 7.0 CURRENT kernel's ath driver causes page fault, kernel panic (debugging kernel)

    Hi,

    I am getting a kernel panic (page fault) caused by the FreeBSD
    kernel's ath driver shortly after I begin using the internet. I use
    FreeBSD 7.0 CURRENT. I used cvs a few days ago to update the source
    tree (7/10/08) and rebuilt the world as well as the kernel (just got a
    brand new computer). There is very little software installed (ports
    only, also built 7/10/08), and no data. But let me describe the
    configuration and problem.

    I have recently purchased a Lenovo ThinkPad with a ThinkPad 11a/b/g
    Wi-Fi wireless LAN Mini-PCIe card (Lenovo part #41W1685). My Lenovo
    representative claims this uses an Atheros AR5006EX chipset (she's not
    an expert, though), but FreeBSD's dmesg detects an Atheros 5212
    chipset. The Linux-oriented thinkwiki.org says this card may use
    either chipset. I don't know whom to trust; maybe someone knows the
    answer, but Google hasn't cleared things up. In particular,
    from dmesg:

    ath0: mem 0xdf2f0000-0xdf2fffff irq 17 at device 0.0 on pci3
    ath0: [ITHREAD]
    ath0: using obsoleted if_watchdog interface
    ath0: Ethernet address: [Numbers]
    ath0: mac 10.3 phy 6.1 radio 10.2

    I don't know if the "obsoleted" message should worry me. A few posts
    I found through Google suggested that it isn't a big deal.

    I've built driver support into the kernel. I should note that I
    compile the kernel with SMP support. The relevant part of my kernel
    configuration:
    device wlan # 802.11 support
    device wlan_wep # 802.11 WEP support
    device wlan_ccmp # 802.11 CCMP support
    device wlan_tkip # 802.11 TKIP support
    device wlan_amrr # AMRR transmit rate control algorithm
    device wlan_scan_ap # 802.11 AP mode scanning
    device wlan_scan_sta # 802.11 STA mode scanning
    device ath # Atheros pci/cardbus NIC's
    device ath_hal # Atheros HAL (Hardware Access Layer)
    device ath_rate_sample # SampleRate tx rate control for ath

    If I add 'ifconfig_ath0="DHCP"' to /etc/rc.conf or run "dhclient
    ath0", the card connects and acquires an IP address from the router
    (my home network is unsecured). This is what I get:

    DHCPREQUEST on ath0 to 255.255.255.255 port 67
    ip length 281 disagrees with bytes received 534.
    accepting packet with data after udp payload.
    DHCPNAK from 192.168.1.1
    DHCPDISCOVER on ath0 to 255.255.255.255 port 67 interval 3
    ip length 314 disagrees with bytes received 534.
    accepting packet with data after udp payload.
    DHCPOFFER from 192.168.1.1
    DHCPREQUEST on ath0 to 255.255.255.255 port 67
    ip length 314 disagrees with bytes received 534.
    accepting packet with data after udp payload.
    DHCPACK from 192.168.1.1
    bound to 192.168.1.8 -- renewal in 43200 seconds.

    I don't know what to make of those ip lengths disagreeing with bytes received...
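
    For what it's worth, the arithmetic behind those messages can be
    sketched as below. It is only an illustration built from the numbers
    in the log above: each received frame was 534 bytes, while the IP
    header declared 281 or 314 bytes, so the remainder is the "data after
    udp payload" that dhclient says it is accepting. The variable names
    are made up for the example.

```shell
#!/bin/sh
# Numbers copied from the dhclient output above; variable names are illustrative.
received=534

for ip_len in 281 314; do
    # Bytes that arrived beyond what the IP header declared.
    extra=$(( received - ip_len ))
    echo "ip length $ip_len: $extra bytes beyond the declared packet"
done
```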

    I can launch lynx (my favorite non-graphical browser) or Firefox and
    load Google. I can run a search and get results, but the kernel
    always panics by the time I try to load a third web page (or earlier).
    This is what I get:

    Fatal trap 12: page fault while in kernel mode
    cpuid = 1; apic id = 01
    fault virtual address = 0x0
    fault code = supervisor read, page not present
    instruction pointer = 0x20:0xc0484aa6
    stack pointer = 0x28:0xe7ffe8bc
    frame pointer = 0x28:0xe7ffe928
    code segment = base 0x0, limit 0xfffff, type 0x1b
    = DPL 0, pres 1, def32 1, gran 1
    processor eflags = interrupt enabled, resume, IOPL = 0
    current process = 1427 (lynx) [I've also seen this from ath0 taskq]
    trap number = 12
    panic: page fault
    cpuid = 1
    Uptime: 51m37s
    Physical memory: 2014 MB
    Dumping 109 MB:
    Syncing disks, vnodes remaining...[Zeros]

    I'm pretty much a FreeBSD novice, so I don't know what to do as it
    slowly prints out line after line of zeros. Normally I just shut
    the sucker down, because it seems content to print zeros
    forever... When FreeBSD starts up again, it says "savecore: no dumps
    found." I have compiled the kernel with debugging symbols
    (makeoptions DEBUG=-g), though, so maybe you can advise.

    I hope someone is able to help. Do you think I should file a problem report?

    Thanks for reading. Good night!

    Sincerely,

    -- Ned Ruggeri
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...reebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"


  2. Re: 7.0 CURRENT kernel's ath driver causes page fault, kernel panic (debugging kernel)

    On Fri, Jul 18, 2008 at 6:36 AM, Garrett Cooper wrote:
    > Some notes:
    >
    > 1. *blinks*... I hope you mean 8-CURRENT, not 7-CURRENT. 7 hasn't been
    > CURRENT for some months now (~6 months IIRC).


    Oh my, I am an idiot. I'm using 7-STABLE, making this the wrong list
    to ask; sorry. I guess I could repost to freebsd-stable in addition
    to filing a PR. Would that be wise?

    > 2. pciconf -lv might help with the PCI ID info. Then someone might be
    > able to tie your card back to the appropriate chipset.


    This gives me:
    ath0@pci0:3:0:0: class=0x020000 card=0x058a1014 chip=0x1014168c rev=0x01 hdr=0x00
    vendor = 'Atheros Communications Inc.'
    device = 'AR5212 Atheros AR5212 802.11abg wireless'
    class = network
    subclass = ethernet
    class = base peripheral

    I get 167 pages on Google that contain ar5212 together with
    0x1014168c and 0x058a1014, and only one with ar5006ex instead of
    ar5212. I'm inclined to believe my Lenovo representative was wrong;
    she's just a sales rep and asked around about the part...
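
    A minimal sketch of how the IDs above break down, assuming the usual
    pciconf -l layout in which chip= and card= each pack a 16-bit device
    (or subdevice) ID in the upper half and a 16-bit vendor (or
    subvendor) ID in the lower half. 0x168c is the PCI vendor ID
    registered to Atheros, which agrees with the vendor line above, and
    0x1014 in the card field is IBM's vendor ID. The variable names are
    only illustrative.

```shell
#!/bin/sh
# Values copied from the pciconf -lv output above.
chip=0x1014168c
card=0x058a1014

# Lower 16 bits: vendor / subvendor. Upper 16 bits: device / subdevice.
vendor=$(printf '0x%04x' $(( $chip & 0xffff )))
device=$(printf '0x%04x' $(( ($chip >> 16) & 0xffff )))
subvendor=$(printf '0x%04x' $(( $card & 0xffff )))
subdevice=$(printf '0x%04x' $(( ($card >> 16) & 0xffff )))

echo "vendor=$vendor device=$device"
echo "subvendor=$subvendor subdevice=$subdevice"
```

    The vendor:device pair (168c:1014) is what is worth searching for,
    since marketing names like AR5006EX do not appear in the PCI IDs.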

    > 3. KDB, DDB, WITNESS and INVARIANTS support compiled into the kernel
    > would be extremely helpful, if not required to debug your issue.


    I'm currently recompiling the kernel with these debug options:

    makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
    options KDB
    options DDB
    options INVARIANTS
    options WITNESS

    As soon as it's done compiling, I'll try reproducing the error. I've
    also added 'set dumpdev="/var/crash"' to /etc/rc.conf.
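
    For reference, a hedged sketch of the relevant /etc/rc.conf entries
    as rc.conf(5) describes them: entries are plain variable assignments
    (no "set"), dumpdev names the device the kernel writes the dump to (a
    swap device, or AUTO), and dumpdir is the directory savecore(8)
    copies the dump into at the next boot. Whether AUTO finds a suitable
    swap device on this particular machine is an assumption.

```shell
# /etc/rc.conf fragment (sketch; the values are assumptions for this machine)
dumpdev="AUTO"          # kernel dumps to a swap device chosen automatically
dumpdir="/var/crash"    # savecore(8) stores the recovered dump here at boot
```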

    > As for the actual debug process, there's a spot in the dev handbook
    > about it (http://www.freebsd.org/doc/en/books/...rneldebug.html),
    > but when I tried debugging my issue with NTFS and SMB I didn't really
    > find it helpful to be honest...


    Once I have a core dump, how should I proceed? Use kdb, and execute
    "list *[instruction pointer]" to find out what (NULL) pointer is being
    dereferenced? Run backtrace? If I post a PR, is it likely that
    someone can guide me through this? I'm fairly familiar with C, but my
    experience using debuggers is very limited...
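
    A rough sketch of that post-mortem session, assuming the dump was
    saved as /var/crash/vmcore.0 and that the debug kernel was installed
    as /boot/kernel/kernel.symbols (both paths are assumptions; the
    kernel.debug file in the build directory may be what is needed
    instead). The tool for reading a saved dump is kgdb(1), while KDB is
    the in-kernel debugger framework. The address is the instruction
    pointer from the panic in the first message.

```shell
#!/bin/sh
# Assumed paths; adjust to wherever the debug kernel and the dump really are.
kernel=/boot/kernel/kernel.symbols
core=/var/crash/vmcore.0
fault_ip=0xc0484aa6     # instruction pointer reported by the panic

# Commands to type at the (kgdb) prompt: show the faulting source line,
# then print a backtrace of the crashing thread.
kgdb_cmds="list *${fault_ip}
bt"

if command -v kgdb >/dev/null 2>&1 && [ -r "$core" ]; then
    echo "Run: kgdb $kernel $core"
else
    echo "kgdb or $core not available; nothing to inspect yet"
fi
echo "Then, at the (kgdb) prompt:"
echo "$kgdb_cmds"
```

    The output of both commands is what would be most useful to paste
    into a PR.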

    > You may also have to compile without SMP and with the 4BSD scheduler
    > just to see whether or not it's an issue reproducible with the ULE
    > scheduler, the driver, or something else...


    After I get the dump with the current options (+ debug options), I'll
    try w/o SMP and ULE...

    > Hopefully this gets you started on the right path...
    > -Garrett


    Thanks so much, Garrett!