Re: firefox3-bin crashes near arc4random_buf() - FreeBSD
-
Re: firefox3-bin crashes near arc4random_buf()
On Sat, Oct 04, 2008 at 08:10:24PM +0400, Andrey Chernov wrote:
> On Sat, Oct 04, 2008 at 01:05:11AM -0700, Jos Backus wrote:
> > For a few weeks now firefox3-bin has been crashing semi-regularly for me.
> > Backtrace attached. I selected `Build a debugging image' but the resulting
> > binary is stripped, so no symbols.
> > #3 0x28237381 in XRE_InitEmbedding () from /usr/local/lib/firefox3/libxul.so
> > #4  <signal handler called>
> > #5 0x2a39eb2d in arc4random_buf () from /lib/libc.so.7
> > #6 0x2a39aa7d in dbopen () from /lib/libc.so.7
> > #7 0x2a39973a in __srget () from /lib/libc.so.7
> > #8 0x2a39ab49 in dbopen () from /lib/libc.so.7
> > #9 0x2a39916f in __srget () from /lib/libc.so.7
> > #10 0x2a39c220 in __hash_open () from /lib/libc.so.7
> > #11 0x2aae9b9c in ?? () from /usr/local/lib/firefox3/libnssdbm3.so
>
> It looks like the stack is damaged at this point. No libc function,
> including the db* functions, calls arc4random_buf().
I was surprised to see that, too. The problem is perfectly repeatable on my
system. I tried building firefox3 using
WITH_DEBUG=true STRIP= make deinstall reinstall clean
but the resulting binary is still stripped:
lizzy:~% file /usr/local/lib/firefox3/firefox-bin
/usr/local/lib/firefox3/firefox-bin: ELF 32-bit LSB executable, Intel 80386,
version 1 (FreeBSD), for FreeBSD 8.0 (800049), dynamically linked (uses shared
libs), FreeBSD-style, stripped
lizzy:~%
A few weeks ago, after these crashes had started happening, I rebuilt most
ports on this machine, hoping it would fix the issue, but it has not.
Any suggestions on how to debug this?
--
Jos Backus
jos at catnook.com
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/lis...reebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
-
Re: firefox3-bin crashes near arc4random_buf()
On Sat, Oct 04, 2008 at 05:49:06PM -0700, Tim Kientzle wrote:
> First, you need to share the first items in the
> backtrace, as they're more likely to be correct.
> I agree with Andrey that it looks like there's
> some stack corruption, so it's hard to trust
> anything except the first couple of entries.
Attached is a tarball containing firefox3.gdb which has the full output of
`bt'. Unfortunately it doesn't tell me very much more.
> You should also look at several independent core
> dumps and see how much the backtraces have in common.
I watched it crash a bunch more times and the backtraces are the same. That's
good, right? :-)
> It might also be worth running it under ktrace,
> forcing the crash, then sharing the last few dozen
> lines from kdump output.
Also attached is firefox3.kdump. The last few lines look like:
6855 firefox-bin RET clock_gettime 0
6855 firefox-bin CALL _umtx_op(0x8179760,0x8,0x1,0x8179740,0xbf8fdddc)
6855 firefox-bin PSIG SIGSEGV caught handler=0x28237290 mask=0x0 code=0x1
6855 firefox-bin CALL unlink(0x8179600)
6855 firefox-bin NAMI "/home/jos/.mozilla/firefox/tosfxhak.default/lock"
6855 firefox-bin RET unlink 0
6855 firefox-bin CALL sigaction(SIGSEGV,0x2978dfb4,0)
6855 firefox-bin RET sigaction 0
6855 firefox-bin CALL sigprocmask(SIG_UNBLOCK,0xbf4f906c,0)
6855 firefox-bin RET sigprocmask 0
6855 firefox-bin CALL thr_kill(0x1878c,SIGSEGV)
6855 firefox-bin RET thr_kill 0
6855 firefox-bin PSIG SIGSEGV SIG_DFL
6855 firefox-bin NAMI "firefox-bin.core"
6855 firefox-bin RET poll -1 errno 4 Interrupted system call
6855 firefox-bin RET _umtx_op -1 errno 4 Interrupted system call
6855 firefox-bin RET _umtx_op -1 errno 4 Interrupted system call
6855 firefox-bin RET _umtx_op -1 errno 60 Operation timed out
6855 firefox-bin RET _umtx_op -1 errno 4 Interrupted system call
6850 sh RET wait4 6855/0x1ac7
6850 sh CALL write(0x1,0x814e400,0x21)
6850 sh GIO fd 1 wrote 33 bytes
"Segmentation fault (core dumped)
"
6850 sh RET write 33/0x21
6850 sh CALL exit(0x8b)
6846 sh RET wait4 6850/0x1ac2
6846 sh CALL exit(0x8b)
This to me suggests that the segfault happens inside _umtx_op. Am I reading
that correctly?
Thanks for looking into this!
--
Jos Backus
jos at catnook.com
-
Re: firefox3-bin crashes near arc4random_buf()
On Mon, Oct 06, 2008 at 02:57:39AM +0300, Giorgos Keramidas wrote:
> Unfortunately, tarballs are stripped off by the list software.
>
> Can you upload this online somewhere and point us to a URL?
Oops, thanks for reminding me, Giorgos. Tarball at
http://lizzy.dyndns.org/~jos/firefox3.crash.tgz
--
Jos Backus
jos at catnook.com
-
Re: firefox3-bin crashes near arc4random_buf()
> I watched it crash a bunch more times and the backtraces are the same. That's
> good, right? :-)
Yes. For a suitable definition of "good." ;-)
>>It might also be worth running it under ktrace,
>>forcing the crash, then sharing the last few dozen
>>lines from kdump output.
>
> Also attached is firefox3.kdump. The last few lines look like:
>
> 6855 firefox-bin RET clock_gettime 0
> 6855 firefox-bin CALL _umtx_op(0x8179760,0x8,0x1,0x8179740,0xbf8fdddc)
> 6855 firefox-bin PSIG SIGSEGV caught handler=0x28237290 mask=0x0 code=0x1
> 6855 firefox-bin CALL unlink(0x8179600)
> 6855 firefox-bin NAMI "/home/jos/.mozilla/firefox/tosfxhak.default/lock"
> 6855 firefox-bin RET unlink 0
> 6855 firefox-bin CALL sigaction(SIGSEGV,0x2978dfb4,0)
> 6855 firefox-bin RET sigaction 0
> 6855 firefox-bin CALL sigprocmask(SIG_UNBLOCK,0xbf4f906c,0)
> 6855 firefox-bin RET sigprocmask 0
> 6855 firefox-bin CALL thr_kill(0x1878c,SIGSEGV)
> 6855 firefox-bin RET thr_kill 0
> 6855 firefox-bin PSIG SIGSEGV SIG_DFL
>
> This to me suggests that the segfault happens inside _umtx_op. Am I reading
> that correctly?
Not necessarily. Firefox is multi-threaded. The thread that
called _umtx_op() is not the thread that crashed (_umtx_op()
hadn't returned to userspace, so that thread was still in
the kernel).
This does, however, answer one puzzle: Firefox appears to
have a signal handler that catches SEGV, releases the lock
file, then re-throws SEGV to actually kill the program.
That explains stack frames #0-#4 in your backtrace; that's
the signal handler executing after the segfault but before
the program is terminated.
Something is still screwy about the backtrace. dbopen()
doesn't call arc4random_buf. However, it does call
mkstemp() which does call arc4random_uniform, which should
be right next to arc4random_buf in memory. GCC optimizations
could be obscuring the call stack here.
It's certainly possible that arc4random is involved
somehow but I don't yet see it. It does seem likely
that we're looking at a libc problem, so a debug
version of libc might help. Replacing libc on a
running system is a little tricky. I believe the
following works, though I've not tried it:
% cd /usr/src/lib/libc
% make clean
% make DEBUG_FLAGS=-g
% cp /lib/libc.so.7 /lib/libc.so.7-backup
... reboot to single user, use /rescue/sh as your shell ...
% cp /usr/src/lib/libc/libc.so.7 /lib/libc.so.7
... reboot ...
This should give you a standard libc with full
debugging symbols. Hopefully, the backtrace will
now give more details.
I think we're getting closer.
Tim
-
Re: firefox3-bin crashes near arc4random_buf()
On Sun, Oct 05, 2008 at 05:34:22PM -0700, Tim Kientzle wrote:
> > This to me suggests that the segfault happens inside _umtx_op. Am I reading
> > that correctly?
>
> Not necessarily. Firefox is multi-threaded. The thread that
> called _umtx_op() is not the thread that crashed (_umtx_op()
> hadn't returned to userspace, so that thread was still in
> the kernel).
>
> This does, however, answer one puzzle: Firefox appears to
> have a signal handler that catches SEGV, releases the lock
> file, then re-throws SEGV to actually kill the program.
> That explains stack frames #0-#4 in your backtrace; that's
> the signal handler executing after the segfault but before
> the program is terminated.
Understood.
> Something is still screwy about the backtrace. dbopen()
> doesn't call arc4random_buf. However, it does call
> mkstemp() which does call arc4random_uniform, which should
> be right next to arc4random_buf in memory. GCC optimizations
> could be obscuring the call stack here.
>
> It's certainly possible that arc4random is involved
> somehow but I don't yet see it. It does seem likely
> that we're looking at a libc problem, so a debug
> version of libc might help. Replacing libc on a
> running system is a little tricky. I believe the
> following works, though I've not tried it:
>
> % cd /usr/src/lib/libc
> % make clean
> % make DEBUG_FLAGS=-g
> % cp /lib/libc.so.7 /lib/libc.so.7-backup
> ... reboot to single user, use /rescue/sh as your shell ...
> % cp /usr/src/lib/libc/libc.so.7 /lib/libc.so.7
chflags noschg /lib/libc.so.7
/rescue/cp /usr/obj/usr/src/lib/libc/libc.so.7 /lib/libc.so.7
> ... reboot ...
>
> This should give you a standard libc with full
> debugging symbols. Hopefully, the backtrace will
> now give more details.
>
> I think we're getting closer.
Yeah. Oddly enough, the debug version seems to make a difference; firefox3
hasn't crashed yet. Normally, even if I don't touch it, firefox3 segfaults
within an hour or so. I will leave it up all night to see what happens.
Thanks, Tim. I'll keep you posted.
--
Jos Backus
jos at catnook.com
-
Re: firefox3-bin crashes near arc4random_buf()
> Yeah. Oddly enough the debug version seems to make a difference; firefox3
> hasn't crashed yet. Normally even without touching it firefox3 will segfault
> within an hour or so. I will leave it up all night to see what happens.
Either, as Peter Jeremy suggested, building with -g changed
the generated code, or else you've built different sources.
Have you updated your source since you last updated libc?
Tim
-
Re: firefox3-bin crashes near arc4random_buf()
> Before following your instructions, I cvsupped, ran a `make kernel' and booted
> into the new kernel. My userland is from Oct 4th.
I presume your 'cvsup' also updated your libc sources.
So you've basically upgraded to the newest libc...
> As of this moment, firefox3 is still running. Would you like me to try
> anything different?
... which seems to have fixed your problem. I suggest
you install a regular non-debugging libc and see if
everything remains fixed.
Tim
-
Re: firefox3-bin crashes near arc4random_buf()
On Tue, Oct 07, 2008 at 06:50:09PM -0700, Tim Kientzle wrote:
> This is a lot more interesting. This points to a crash
> within libc's db code. Somehow, it's trying to compute
> a hash for some element with length -10618, which is
> getting converted to an unsigned 4294956678, which is
> causing the crash.
>
> Does Firefox have knobs to use a newer Berkeley DB?
Not that I am aware of. Maybe I should ask ports@...
> I can't
> recall whether newer Berkeley DB versions are thread-safe but
> I'm pretty sure the old version in our libc isn't. If Firefox
> is assuming the BDB code is thread-safe that could certainly
> cause corruption of the BDB data with all sorts of unpleasant
> consequences. That's just a random guess, though. Maybe someone
> else on this mailing list knows better.
I think you're on to something.
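The length arithmetic above can be checked with a few lines of C. This sketch assumes a 32-bit unsigned length parameter, as on the i386 machine in this thread; `hash_len_as_unsigned()` and `show_wraparound()` are hypothetical helpers for illustration, not the libc hash code itself. Converting a negative value to a 32-bit unsigned type is well defined in C and wraps modulo 2^32, so -10618 becomes 4294967296 - 10618 = 4294956678, and a hash loop bounded by that length would read far past the end of the key.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: models a signed length reaching code that expects
 * a 32-bit unsigned length. The value is reduced modulo 2^32. */
uint32_t hash_len_as_unsigned(int32_t len)
{
    return (uint32_t)len;
}

/* Prints the original signed length and the length the hash code would see. */
void show_wraparound(int32_t len)
{
    printf("%ld -> %lu\n", (long)len, (unsigned long)hash_len_as_unsigned(len));
}
```

Calling `show_wraparound(-10618)` prints `-10618 -> 4294956678`, the same pair of values seen in the debug backtrace.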
Also, I have found a reliable way to cause the crash. It happens when I go to
https://wellpointnextrx.com/ and try to accept the cert for the session.
--
Jos Backus
jos at catnook.com