X "Hangs" with RS690 + 2.6.26 - Kernel

  1. X "Hangs" with RS690 + 2.6.26

    Hi.

    I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
    The symptoms are that load average goes up, X stops accepting keypresses
    or mouse clicks, but the cursor still moves around the screen in
    response to the mouse being moved. I can't switch to a VT but can ssh in
    remotely to see that things are still running. I don't seem to be able
    to kill X but "shutdown -r now" cleanly reboots.

    gdb fails to attach; it complains about an internal error. strace
    shows lots of ioctls against the DRM device, all returning EBUSY.
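    To put a number on the EBUSY storm, strace can be attached to the X
    server over ssh and its failing ioctls tallied by request code. This
    is a minimal sketch that assumes strace's usual
    "ioctl(fd, request, arg) = -1 EBUSY (...)" output format. The helper
    name count_ebusy_ioctls, the process name Xorg and the capture path
    are assumptions for illustration.

```shell
#!/bin/sh
# count_ebusy_ioctls: hypothetical helper. Reads strace output on stdin and
# prints how many ioctl() calls failed with EBUSY, per request code.
count_ebusy_ioctls() {
    # Keep only failing ioctl lines, extract the second argument (the
    # request code), then count identical codes.
    grep 'ioctl(.*= -1 EBUSY' |
        sed -n 's/.*ioctl([^,]*, *\([^,]*\),.*/\1/p' |
        sort | uniq -c
}

# Capture (as root, over ssh, while X is wedged), then summarise.
# The process name and output path are assumptions:
#   strace -f -e trace=ioctl -p "$(pidof Xorg)" -o /tmp/xorg.strace
#   count_ebusy_ioctls < /tmp/xorg.strace
```

    If a single request code dominates the count, that would suggest one
    DRM call being retried in a loop rather than general contention.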

    2.6.25 appears to work fine. I originally had PAT enabled under
    2.6.26, but I have since seen a patch fixing it go into git. I
    disabled PAT in my 2.6.26 kernel to see if that was the issue; no
    change AFAICT.

    Enabling DRM debug (echo 1 > /sys/module/drm/parameters/debug) gives
    lots of output from radeon_freelist_get after the following ioctl is
    received:

    Jul 25 10:11:14 meepok kernel: [drm:drm_ioctl] pid=3302, cmd=0xc0406429, nr=0x29 , dev 0xe200, auth=1

    followed by a "returning NULL" message.
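    The cmd value in that log line can be unpacked with the generic
    _IOC() layout used on x86: 2 direction bits, 14 size bits, 8 type
    bits and 8 number bits. This is a minimal sketch, and decode_ioctl is
    a hypothetical helper. Type 0x64 is 'd', the DRM ioctl base. In the
    drm.h of this era, nr 0x29 should be DRM_IOCTL_DMA, the
    buffer-request ioctl; that mapping is worth confirming against the
    headers of the kernel in question.

```shell
#!/bin/sh
# decode_ioctl: hypothetical helper that splits an ioctl request number
# into the fields of the generic _IOC() encoding used on x86:
# direction (bits 30-31), size (bits 16-29), type (bits 8-15), nr (bits 0-7).
decode_ioctl() {
    cmd=$(($1))                        # accepts hex such as 0xc0406429
    dir=$(( (cmd >> 30) & 0x3 ))       # 1 = write, 2 = read, 3 = read/write
    size=$(( (cmd >> 16) & 0x3fff ))   # size of the argument struct in bytes
    type=$(( (cmd >> 8) & 0xff ))      # 0x64 ('d') is the DRM ioctl base
    nr=$(( cmd & 0xff ))               # command number within that type
    printf 'dir=%d size=%d type=0x%02x nr=0x%02x\n' "$dir" "$size" "$type" "$nr"
}

decode_ioctl 0xc0406429
```

    For the value above it prints "dir=3 size=64 type=0x64 nr=0x29". That
    is a read/write DRM ioctl with a 64-byte argument.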

    The radeon driver is recent git (1c5858484da4fb1c9bc3ac3b4d7a97863ab99730),
    but I've seen it with older revisions too.

    It can take a couple of days for me to hit the problem, so a git bisect
    could be a lengthy process. If anyone has any suggestions about faster
    ways to track down the issue I'd like to hear them.
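    As a rough guide to what "lengthy" means here: a bisect needs about
    ceil(log2(N)) test kernels to isolate one culprit among N candidate
    commits, while reverting candidates one at a time needs up to N. This
    is a minimal sketch that assumes a fixed number of days per test
    kernel and that the culprit really is among the N candidates. The
    function names are hypothetical.

```shell
#!/bin/sh
# bisect_steps N: worst-case number of test kernels a bisect needs to pick
# the culprit out of N candidate commits, i.e. ceil(log2(N)).
bisect_steps() {
    n=$1 steps=0 range=1
    while [ "$range" -lt "$n" ]; do
        range=$(( range * 2 ))   # each good/bad verdict halves the suspects
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# hunt_days N DAYS_PER_TEST: compare bisecting against reverting one commit
# at a time (worst case: every candidate gets its own test kernel).
hunt_days() {
    n=$1 per=$2
    echo "bisect: $(( $(bisect_steps "$n") * per )) days, one-by-one reverts: $(( n * per )) days"
}

hunt_days 10 2
```

    Take the ten drivers/char/drm commits from Dave Airlie's git log list
    in this thread and two days per kernel. The script then prints
    "bisect: 8 days, one-by-one reverts: 20 days".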

    The machine is a dual-core AMD64 with 4GB of RAM running Debian
    unstable; the card is:

    01:05.0 VGA compatible controller [0300]: ATI Technologies Inc RS690 [Radeon X1200 Series] [1002:791e]

    Kernel configs at:

    http://the.earth.li/~noodles/radeon-.../config-2.6.25
    http://the.earth.li/~noodles/radeon-.../config-2.6.26
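    Both configs are published, so one quick check is whether they differ
    in DRM-, AGP- or PAT-related options. This also confirms that
    CONFIG_X86_PAT really is off in the second build. This is a minimal
    sketch. config_delta and normalize_config are hypothetical helper
    names, and the default option pattern is an assumption.

```shell
#!/bin/sh
# normalize_config FILE PATTERN: print matching options as sorted
# "CONFIG_FOO=value" lines; "# CONFIG_FOO is not set" becomes CONFIG_FOO=n.
normalize_config() {
    sed -n -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1=n/' \
           -e '/^CONFIG_/p' "$1" |
        grep -E "^CONFIG_[A-Za-z0-9_]*($2)[A-Za-z0-9_]*=" | sort
}

# config_delta OLD NEW [PATTERN]: hypothetical helper showing how the
# matching options differ between two kernel .config files.
config_delta() {
    pat=${3:-'DRM|AGP|X86_PAT'}
    tmp=$(mktemp -d) || return 1
    normalize_config "$1" "$pat" > "$tmp/old"
    normalize_config "$2" "$pat" > "$tmp/new"
    diff "$tmp/old" "$tmp/new" || :   # diff exits 1 when files differ; expected
    rm -r "$tmp"
}

# Usage, with the two configs above downloaded locally:
#   config_delta config-2.6.25 config-2.6.26
```

    Options that only exist in the newer kernel, such as CONFIG_X86_PAT,
    show up as plain additions.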

    Debug log from enabling drm debug:

    http://the.earth.li/~noodles/radeon-2.6.26-hang/debug

    Full dmesg (no obvious errors):

    http://the.earth.li/~noodles/radeon-...g/meepok.dmesg

    Xorg log file (no obvious errors):

    http://the.earth.li/~noodles/radeon-...ang/Xorg.0.log

    J.

    --
    "I put it down to corrosive groin sweat myself." -- John Burnham, asr
    This .sig brought to you by the letter N and the number 39
    Product of the Republic of HuggieTag
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: X "Hangs" with RS690 + 2.6.26

    On Friday 25 July 2008 12:12:59, Jerome Glisse wrote:
    > This looks like the usual engine lockup followed by a CP lockup, so
    > the DMA buffer age never gets written and we run out of DMA
    > buffers, thus the freelist lookup keeps failing in an infinite loop.
    >
    > I think we now know all the reasons why we lock up. While a
    > fix could be made for the old ioctl, we believe the best plan is
    > to work on the new ioctl with this fix in mind.


    I can't help but feel uneasy with that kind of plan. After all, do "we"
    *really* know what's going on? I always had the impression that we only knew
    things along the lines of "perhaps it's better to submit 3D stuff in indirect
    buffers".

    If you *really* know what causes the lockups, could you please document that?
    As in, what's the actual command processor sequence that is to blame? I know
    that running e.g. a Nexuiz demo + glxgears window above it is apparently a
    100% guaranteed lockup on my system (R420).

    If you could share your progress in tracking down the sources of the lockups,
    I'd happily try to write a patch against the current system.

    cu,
    Nicolai



  3. Re: X "Hangs" with RS690 + 2.6.26

    On Fri, 25 Jul 2008 19:04:55 +0200
    Nicolai Hähnle wrote:

    > On Friday 25 July 2008 12:12:59, Jerome Glisse wrote:
    > > This looks like the usual engine lockup followed by a CP lockup, so
    > > the DMA buffer age never gets written and we run out of DMA
    > > buffers, thus the freelist lookup keeps failing in an infinite loop.
    > >
    > > I think we now know all the reasons why we lock up. While a
    > > fix could be made for the old ioctl, we believe the best plan is
    > > to work on the new ioctl with this fix in mind.
    >
    > I can't help but feel uneasy with that kind of plan. After all, do "we"
    > *really* know what's going on? I always had the impression that we only knew
    > things along the lines of "perhaps it's better to submit 3D stuff in indirect
    > buffers".
    >
    > If you *really* know what causes the lockups, could you please document that?
    > As in, what's the actual command processor sequence that is to blame? I know
    > that running e.g. a Nexuiz demo + glxgears window above it is apparently a
    > 100% guaranteed lockup on my system (R420).
    >
    > If you could share your progress in tracking down the sources of the lockups,
    > I'd happily try to write a patch against the current system.
    >
    > cu,
    > Nicolai
    >


    Here is a brief list, off the top of my head, for the record:

    - don't emit RB3D_DSTCACHE twice in a row without a rendering
    command in between
    - initialize all clip registers to default values, and wait for
    engine idle after setting them
    - update wptr every 32 dwords (2 dwords seems to be enough, but that
    one is very hard to track down)
    - use indirect buffers
    - RB3D_DSTCACHE is not pipelined if neither the free nor the sync bit
    is set, so in that case you have to fill the FIFO and wait for idle
    before writing it
    - flush & wait until 3D is idle before 2D, and flush & wait for DMA &
    2D idle after 2D; also fill the FIFO with dummy 2D register writes
    so that unpipelined 3D registers don't get executed before idle is
    asserted
    - avoid emitting cliprects too often
    - do a txinval before changing textures
    - avoid touching RB3D_DSTCACHE & RB2D_DSTCACHE too often
    - set ISYNC properly through the CP
    - CP idle is wrong: we should wait for a tag rather than try to force
    the CP to go idle or inject a flush after idle
    - set vertex shader constants & inputs to safe default values

    And there are other things to think about scattered throughout my drm.
    Basically, things should be set in a certain order to make sure the
    engine will not be unhappy when faced with a command stream. Some of
    the above might be wrong, but I use them because each one of them
    seems to give me a more stable drm. My latest drm doesn't lock up
    with a few glxgears on top of another 3D app like celestia, and
    likely not with nexuiz either, though I haven't tried that one.
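    A toy model can illustrate the failure mode Jerome describes, quoted
    above: after a CP lockup the DMA buffer age is never written back, so
    no buffer ever comes free. This is not the radeon driver code. The
    function name and the owner:pending:age buffer format are invented
    for illustration. A buffer is handed out if nobody owns it, or if it
    is pending and its age is at or below the last age the GPU reported
    completing.

```shell
#!/bin/sh
# freelist_get DONE_AGE TRIES BUF...: toy model, not driver code.
# Each BUF is "owner:pending:age"; prints the index of the first reusable
# buffer, or NULL if none becomes reusable within TRIES passes.
freelist_get() {
    done_age=$1 tries=$2
    shift 2
    t=0
    while [ "$t" -lt "$tries" ]; do
        i=0
        for buf in "$@"; do
            owner=${buf%%:*}
            rest=${buf#*:}
            pending=${rest%%:*}
            age=${rest#*:}
            if [ "$owner" = none ] ||
               { [ "$pending" = 1 ] && [ "$age" -le "$done_age" ]; }; then
                echo "$i"
                return 0
            fi
            i=$(( i + 1 ))
        done
        # A real driver would re-read the completed age from the hardware
        # here; in this model DONE_AGE is frozen, standing in for a
        # locked-up CP that never writes a newer age.
        t=$(( t + 1 ))
    done
    echo NULL
    return 1
}

# freelist_get 7 3 x:1:9 x:1:8 x:1:5    # prints 2 (age 5 has completed)
# freelist_get 7 3 x:1:9 x:1:8 x:1:10   # prints NULL (nothing ever completes)
```

    With the completed age frozen, every retry sees the same state and
    the call ends in NULL. That matches the "returning NULL" debug
    message in the original report. A caller that simply asks again
    would then spin, which may be what the stream of EBUSY ioctls seen
    in strace reflects.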

    Cheers,
    Jerome Glisse

  4. Re: X "Hangs" with RS690 + 2.6.26

    On Fri, Jul 25, 2008 at 11:10:07AM +0100, Dave Airlie wrote:
    > > I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
    > > The symptoms are that load average goes up, X stops accepting keypresses
    > > or mouse clicks, but the cursor still moves around the screen in
    > > response to the mouse being moved. I can't switch to a VT but can ssh in
    > > remotely to see that things are still running. I don't seem to be able
    > > to kill X but "shutdown -r now" cleanly reboots.
    > >
    > > radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
    > > but I've seen it with older revisions too.
    > >
    > > It can take a couple of days for me to hit the problem, so a git bisect
    > > could be a lengthy process. If anyone has any suggestions about faster
    > > ways to track down the issue I'd like to hear them.
    >
    > git log v2.6.25..v2.6.26 drivers/char/drm
    >
    > 5e35eff13f7dd0f5c1d82b3b4708b2f7a5f44113
    > 5cfb6956073a9e42d44a26790b7800980634d037


    No joy.

    > d396db321bcaec54345e7e9e87cea8482d6ae3a8


    I thought this might be it; nearly 5 days of uptime rather than the
    usual less than 2. But I got the same symptoms today so I'll continue
    working down the list.
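    How long must a test kernel run before it counts as good? Here is a
    rough estimate based on a modelling assumption that does not come from
    the thread. Assume hangs occur independently at a constant rate, so
    the time to a hang is exponentially distributed with mean μ. Take
    μ = 2 days, the "usual less than 2" above, which is an upper bound.

```latex
\begin{aligned}
P(\text{no hang within } t) &= e^{-t/\mu} \\
P(\text{no hang within 5 days} \mid \mu = 2\ \text{days}) &= e^{-2.5} \approx 0.082 \\
e^{-t/\mu} \le 0.05 \iff t &\ge \mu \ln 20 \approx 3.0\,\mu \approx 6\ \text{days}
\end{aligned}
```

    An unfixed kernel would therefore survive 5 days roughly one time in
    twelve, which is suggestive but not conclusive. A reproducer that
    hangs in under an hour, like the one reported later in this thread,
    cuts the same bound to about three hours per test kernel.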

    > 259434acccbc823ee8bc00b2d2689ccccd25e1fd
    > d7463eb41d88a39de2653fd41857c4ccddb8707b
    > 45e519052e8f583a709edd442a23f59581d3fe42
    > 2735977b12cb0f113aae24afff04747b6d0f5bf1
    > 3722bfc607d46275369865c02fe8694486d640b5
    > fa0d71b967506031f7cb08ced6095d1c4f988594
    > 9f18409ea3d778a171a9505c0a849d846f352bd0
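    Working down this list means building one test kernel per candidate,
    each with that single commit reverted on top of v2.6.26. The sketch
    below generates the commands for those branches. The helper name
    revert_plan and the revert-<id> branch naming are assumptions. git
    log --format=%H, git checkout -b and git revert --no-edit are
    standard git commands.

```shell
#!/bin/sh
# revert_plan BASE: hypothetical helper. Reads commit IDs on stdin and prints,
# for each one, the commands to build a test branch from BASE with just that
# commit reverted. Printing instead of running keeps it reviewable; pipe the
# output to sh to execute it.
revert_plan() {
    base=$1
    while read -r commit; do
        [ -n "$commit" ] || continue          # skip blank lines
        short=$(printf '%.12s' "$commit")     # abbreviated ID for the branch name
        printf 'git checkout -b revert-%s %s && git revert --no-edit %s\n' \
            "$short" "$base" "$commit"
    done
}

# Example: one branch per drivers/char/drm commit between the two releases.
#   git log --format=%H v2.6.25..v2.6.26 -- drivers/char/drm | revert_plan v2.6.26
```

    Later drm commits can build on earlier ones, so reverting one in
    isolation may conflict. Such cases have to be resolved by hand.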


    J.

    --
    Friends are God's apology for relations.

  5. Re: X "Hangs" with RS690 + 2.6.26

    On Fri, 1 Aug 2008, Jonathan McDowell wrote:

    > On Fri, Jul 25, 2008 at 11:10:07AM +0100, Dave Airlie wrote:
    >>> I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
    >>> The symptoms are that load average goes up, X stops accepting keypresses
    >>> or mouse clicks, but the cursor still moves around the screen in
    >>> response to the mouse being moved. I can't switch to a VT but can ssh in
    >>> remotely to see that things are still running. I don't seem to be able
    >>> to kill X but "shutdown -r now" cleanly reboots.
    >>>
    >>> radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
    >>> but I've seen it with older revisions too.
    >>>
    >>> It can take a couple of days for me to hit the problem, so a git bisect
    >>> could be a lengthy process. If anyone has any suggestions about faster
    >>> ways to track down the issue I'd like to hear them.
    >>
    >> git log v2.6.25..v2.6.26 drivers/char/drm
    >>
    >> 5e35eff13f7dd0f5c1d82b3b4708b2f7a5f44113
    >> 5cfb6956073a9e42d44a26790b7800980634d037
    >
    > No joy.
    >
    >> d396db321bcaec54345e7e9e87cea8482d6ae3a8
    >
    > I thought this might be it; nearly 5 days of uptime rather than the
    > usual less than 2. But I got the same symptoms today so I'll continue
    > working down the list.
    >
    >> 259434acccbc823ee8bc00b2d2689ccccd25e1fd
    >> d7463eb41d88a39de2653fd41857c4ccddb8707b
    >> 45e519052e8f583a709edd442a23f59581d3fe42
    >> 2735977b12cb0f113aae24afff04747b6d0f5bf1
    >> 3722bfc607d46275369865c02fe8694486d640b5
    >> fa0d71b967506031f7cb08ced6095d1c4f988594
    >> 9f18409ea3d778a171a9505c0a849d846f352bd0


    Any joy? I apparently have the same problem with my RS690. I
    noticed it after upgrading from 2.6.25 to 2.6.26, alongside
    xorg-server (1.4.99.904 to 1.4.99.905) and Mesa (7.1-rc1 to
    7.1-rc3). The ATI driver is 6.9.0.

    Here it always freezes within a few minutes, or at most an hour.
    When it happens, I'm not running any 3D application and the CPU
    is idle; I may just be typing something in a shell. Disabling DRI
    makes it work, though.

    Alt-SysRq-s/u/b is the only way out. Trying q freezes the mouse
    cursor.
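    The keyboard Alt-SysRq-s/u/b sequence (sync, remount read-only,
    reboot) only works if those functions are enabled in
    /proc/sys/kernel/sysrq. That value is either 1, meaning everything is
    allowed, or a bitmask. According to the kernel's sysrq documentation,
    16 enables sync, 32 enables remount read-only and 128 enables
    reboot/poweroff. This is a minimal check; the function name is
    hypothetical, and the bit values are worth confirming in
    Documentation/sysrq.txt for the running kernel.

```shell
#!/bin/sh
# sysrq_sub_allowed VALUE: hypothetical helper. Succeeds if the kernel.sysrq
# value permits the keyboard s (sync, bit 16), u (remount ro, bit 32) and
# b (reboot, bit 128) functions. A value of 1 enables every function.
sysrq_sub_allowed() {
    v=$1
    [ "$v" -eq 1 ] && return 0
    need=$(( 16 | 32 | 128 ))
    [ $(( v & need )) -eq "$need" ]
}

# Check the running setting:
#   sysrq_sub_allowed "$(cat /proc/sys/kernel/sysrq)" && echo "s/u/b enabled"
# Over ssh, root can trigger the same steps via /proc/sysrq-trigger, which
# sysrq.txt documents as not being subject to this mask:
#   for k in s u b; do echo "$k" > /proc/sysrq-trigger; sleep 2; done
```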

  6. Re: X "Hangs" with RS690 + 2.6.26

    On Sat, Aug 09, 2008 at 05:47:42AM -0300, Frédéric L. W. Meunier wrote:
    > On Fri, 1 Aug 2008, Jonathan McDowell wrote:
    > >On Fri, Jul 25, 2008 at 11:10:07AM +0100, Dave Airlie wrote:
    > >>>I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
    > >>>The symptoms are that load average goes up, X stops accepting keypresses
    > >>>or mouse clicks, but the cursor still moves around the screen in
    > >>>response to the mouse being moved. I can't switch to a VT but can ssh in
    > >>>remotely to see that things are still running. I don't seem to be able
    > >>>to kill X but "shutdown -r now" cleanly reboots.
    > >>>
    > >>>radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
    > >>>but I've seen it with older revisions too.
    > >>>
    > >>>It can take a couple of days for me to hit the problem, so a git bisect
    > >>>could be a lengthy process. If anyone has any suggestions about faster
    > >>>ways to track down the issue I'd like to hear them.
    > >>
    > >>git log v2.6.25..v2.6.26 drivers/char/drm
    > >>
    > >>5e35eff13f7dd0f5c1d82b3b4708b2f7a5f44113
    > >>5cfb6956073a9e42d44a26790b7800980634d037
    > >
    > >No joy.
    > >
    > >>d396db321bcaec54345e7e9e87cea8482d6ae3a8
    > >
    > >I thought this might be it; nearly 5 days of uptime rather than the
    > >usual less than 2. But I got the same symptoms today so I'll continue
    > >working down the list.
    > >
    > >>259434acccbc823ee8bc00b2d2689ccccd25e1fd
    > >>d7463eb41d88a39de2653fd41857c4ccddb8707b
    > >>45e519052e8f583a709edd442a23f59581d3fe42
    > >>2735977b12cb0f113aae24afff04747b6d0f5bf1
    > >>3722bfc607d46275369865c02fe8694486d640b5
    > >>fa0d71b967506031f7cb08ced6095d1c4f988594
    > >>9f18409ea3d778a171a9505c0a849d846f352bd0
    >
    > Any joy ?


    259434acccbc823ee8bc00b2d2689ccccd25e1fd
    d7463eb41d88a39de2653fd41857c4ccddb8707b
    45e519052e8f583a709edd442a23f59581d3fe42

    None of these seem to be the problem. It's getting harder to do the
    reverts, and I'm away this week, so I haven't got any further yet.

    > I apparently have the same problem with my RS690. I noticed it after
    > upgrading from 2.6.25 to 2.6.26, alongside xorg-server (1.4.99.904 to
    > 1.4.99.905) and Mesa (7.1-rc1 to 7.1-rc3). The ATI driver is 6.9.0.
    >
    > Here it always freezes in a few minutes or less than an hour. When it
    > happens, I'm not running any 3D application and the CPU is idle. I may
    > be just typing something in a shell. But it works disabling DRI.


    Likewise, I'm not doing anything 3D related (at least, not consciously).

    J.

    --
    ] http://www.earth.li/~noodles/ [] No program done by a hacker will [
    ] PGP/GPG Key @ the.earth.li [] work unless he is on the system. [
    ] via keyserver, web or email. [] [
    ] RSA: 4DC4E7FD / DSA: 5B430367 [] [

  7. Re: X "Hangs" with RS690 + 2.6.26

    On Sun, 10 Aug 2008, Jonathan McDowell wrote:

    > On Sat, Aug 09, 2008 at 05:47:42AM -0300, Frédéric L. W. Meunier wrote:
    >> On Fri, 1 Aug 2008, Jonathan McDowell wrote:
    >>> On Fri, Jul 25, 2008 at 11:10:07AM +0100, Dave Airlie wrote:
    >>>>> I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
    >>>>> The symptoms are that load average goes up, X stops accepting keypresses
    >>>>> or mouse clicks, but the cursor still moves around the screen in
    >>>>> response to the mouse being moved. I can't switch to a VT but can ssh in
    >>>>> remotely to see that things are still running. I don't seem to be able
    >>>>> to kill X but "shutdown -r now" cleanly reboots.
    >>>>>
    >>>>> radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
    >>>>> but I've seen it with older revisions too.
    >>>>>
    >>>>> It can take a couple of days for me to hit the problem, so a git bisect
    >>>>> could be a lengthy process. If anyone has any suggestions about faster
    >>>>> ways to track down the issue I'd like to hear them.
    >>>>
    >>>> git log v2.6.25..v2.6.26 drivers/char/drm
    >>>>
    >>>> 5e35eff13f7dd0f5c1d82b3b4708b2f7a5f44113
    >>>> 5cfb6956073a9e42d44a26790b7800980634d037
    >>>
    >>> No joy.
    >>>
    >>>> d396db321bcaec54345e7e9e87cea8482d6ae3a8
    >>>
    >>> I thought this might be it; nearly 5 days of uptime rather than the
    >>> usual less than 2. But I got the same symptoms today so I'll continue
    >>> working down the list.
    >>>
    >>>> 259434acccbc823ee8bc00b2d2689ccccd25e1fd
    >>>> d7463eb41d88a39de2653fd41857c4ccddb8707b
    >>>> 45e519052e8f583a709edd442a23f59581d3fe42
    >>>> 2735977b12cb0f113aae24afff04747b6d0f5bf1
    >>>> 3722bfc607d46275369865c02fe8694486d640b5
    >>>> fa0d71b967506031f7cb08ced6095d1c4f988594
    >>>> 9f18409ea3d778a171a9505c0a849d846f352bd0
    >>
    >> Any joy ?
    >
    > 259434acccbc823ee8bc00b2d2689ccccd25e1fd
    > d7463eb41d88a39de2653fd41857c4ccddb8707b
    > 45e519052e8f583a709edd442a23f59581d3fe42
    >
    > all don't seem to be the problem. It's getting harder to do the reverts
    > and I'm away this week so I haven't got any further yet.
    >
    >> I apparently have the same problem with my RS690. I noticed it after
    >> upgrading from 2.6.25 to 2.6.26, alongside xorg-server (1.4.99.904 to
    >> 1.4.99.905) and Mesa (7.1-rc1 to 7.1-rc3). The ATI driver is 6.9.0.
    >>
    >> Here it always freezes in a few minutes or less than an hour. When it
    >> happens, I'm not running any 3D application and the CPU is idle. I may
    >> be just typing something in a shell. But it works disabling DRI.
    >
    > Likewise, I'm not doing anything 3D related (at least, not consciously).


    BTW, I forgot to mention: here the motherboard is a
    Gigabyte GA-MA69VM-S2. When it happens and I use SysRq to
    reboot, it doesn't POST; I have to press reset.

  8. Re: X "Hangs" with RS690 + 2.6.26

    On Sun, Aug 10, 2008 at 05:25:55PM -0300, Frédéric L. W. Meunier wrote:
    > On Sun, 10 Aug 2008, Jonathan McDowell wrote:
    > >On Sat, Aug 09, 2008 at 05:47:42AM -0300, Frédéric L. W. Meunier wrote:
    > >>I apparently have the same problem with my RS690. I noticed it after
    > >>upgrading from 2.6.25 to 2.6.26, alongside xorg-server (1.4.99.904 to
    > >>1.4.99.905) and Mesa (7.1-rc1 to 7.1-rc3). The ATI driver is 6.9.0.
    > >>
    > >>Here it always freezes in a few minutes or less than an hour. When it
    > >>happens, I'm not running any 3D application and the CPU is idle. I may
    > >>be just typing something in a shell. But it works disabling DRI.
    > >
    > >Likewise, I'm not doing anything 3D related (at least, not consciously).
    >
    > BTW, I forgot to mention that. Here the motherboard is a
    > Gigabyte GA-MA69VM-S2. When it happens and I use SysRq to
    > reboot, it doesn't post in the BIOS screen. I have to press
    > reset.


    My mobo is an ASUS M2A-VM HDMI and a "shutdown -r now" when X is wedged
    (done over ssh) results in a clean reboot; no need to hard reset.

    J.

    --
    ] http://www.earth.li/~noodles/ [] 101 things you can't have too much [
    ] PGP/GPG Key @ the.earth.li [] of : 38 - clean underwear. [
    ] via keyserver, web or email. [] [
    ] RSA: 4DC4E7FD / DSA: 5B430367 [] [

  9. Re: X "Hangs" with RS690 + 2.6.26

    On Fri, Jul 25, 2008 at 11:10:07AM +0100, Dave Airlie wrote:
    > > I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
    > > The symptoms are that load average goes up, X stops accepting keypresses
    > > or mouse clicks, but the cursor still moves around the screen in
    > > response to the mouse being moved. I can't switch to a VT but can ssh in
    > > remotely to see that things are still running. I don't seem to be able
    > > to kill X but "shutdown -r now" cleanly reboots.
    > >
    > > radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
    > > but I've seen it with older revisions too.
    > >
    > > It can take a couple of days for me to hit the problem, so a git bisect
    > > could be a lengthy process. If anyone has any suggestions about faster
    > > ways to track down the issue I'd like to hear them.
    >
    > git log v2.6.25..v2.6.26 drivers/char/drm

    ....
    > not sure if you wanna try reverting some of those and seeing which is the
    > cause maybe..


    I never figured out which of these caused the issue, but here is a
    further data point for anyone else suffering from it: 2.6.27-rc
    kernels appear to fix (or at least significantly ease) the problem.
    I managed a 23-day uptime on 2.6.27-rc5 with, I think, one X freeze
    during that period, which cleared up after a Ctrl-Alt-Backspace. I
    haven't seen the same thing at all on 2.6.27-rc7 (though I only ran
    it for 14 days before rebooting into 2.6.27 proper).

    J.

    --
    Web [ Can I trade this job for what's behind door 2? ]
    site: http:// [ ] Made by
    www.earth.li/~noodles/ [ ] HuggieTag 0.0.23
