[9fans] More venti sync woes. - Plan9


  1. [9fans] More venti sync woes.

    I've had a cpu server running off a non-venti-backed fossil for a few
    weeks now. the same machine has also been running venti (but the
    fossil wasn't talking to it, intentionally). I'd confirmed the venti
    was working by doing direct dumps and mounting the results from vacfs.
    All was well.

    Yesterday I modified my fossil config to use the venti. Edited the
    config with fossil/conf, rebooted, and all was well. At boot time, the
    "sync..." message stayed for about 10 seconds (I didn't time it, but
    that's the right order), as it had been on every previous reboot
    (before fossil was using it), and then it moved on and booted as
    normal.

    Last night something outside my house got struck by lightning and we
    lost power for a few seconds. On boot, it hung at the "sync..."
    message. It's now been double-digit hours. The disk is slowish, and
    lacks supported DMA, but that still seems ridiculous, especially on a
    system with just one day's worth of dumps so far (and less than 50MB of
    data beyond the stock plan9 install).

    On the up side, my microwave, which has been broken for months, is now
    working properly again. Go figure.

    So I've got questions. First, I was under the impression that venti's
    structure made it more or less immune to abrupt shutdown. In that
    case, assuming no damage to the actual hardware, is it safe to factor
    the power outage out of the equation and just treat this as a reboot?

    And the big one: what's going on? I've had this sync issue in a couple
    different setups. In the earlier ones, I wrote it off to having
    re-used oventi partitions and that confusing nventi. But this has been
    all nventi throughout. A handful of folks on IRC have observed
    indefinite stalls at the same place. Aside from the clock time theory
    proposed just a little bit ago (which is not the case for me; I
    checked), I've not heard any good working theories.

    My next step is going to be to try booting off some other medium and
    rebuild the index partitions, assuming the actual arenas are unharmed.
    Any bets on whether that's likely to pay off?

    Anthony

  2. Re: [9fans] More venti sync woes.

    I have had similar problems but not had the time or enthusiasm to look
    for it - I found disabling one of the CPUs in my twin cpu box solved this
    for me, not ideal but it got me going.

    I am running nventi on oventi partitions, and it works fine with one CPU.

    -Steve

  3. Re: [9fans] More venti sync woes.

    > I have had similar problems but not had the time or enthusiasm to look
    > for it - I found disabling one of the CPUs in my twin cpu box solved this
    > for me, not ideal but it got me going.
    >
    > I am running nventi on oventi partitions, and it works fine with one CPU.
    >
    > -Steve


    since venti is a user-level process, i would think that rather than
    solving the problem, disabling a cpu makes it less likely.

    - erik

  4. Re: [9fans] More venti sync woes.

    On 9/26/07, erik quanstrom wrote:

    // since venti is a user-level process, i would think that rather than
    // solving the problem, disabling a cpu makes it less likely.

    agree.

    also certainly not the problem in any of the cases where I've
    encountered this, as it's all been on one of two (significantly
    different) single-proc systems.

    a

  5. Re: [9fans] More venti sync woes.

    i don't understand: what printed "sync...", venti or fossil?

    boot from a boot cd and then run venti/venti -c config
    by hand, and then you can run ps and acid to see what
    it is spending its time doing.

    i half-doubt that venti is actually the one sitting around.
    it could be that fossil is trying to write the initial set of
    blocks out to venti, and that that's what is taking a while.
    but booting from a cd and being able to run ps, etc.
    will tell you for sure.

    finally, don't underestimate the slowness of non-dma disks.
    venti is *very* disk-intensive. i was measuring some
    changes i had made to venti recently and was very surprised
    that i was only getting under 1MB/s, and then i realized
    that dma was off. it matters.

    what's your venti config?

    russ

  6. Re: [9fans] More venti sync woes.

    it's almost certainly venti sitting there; i don't think fossil is
    even running yet. the last two lines on my screen are:

    root is from (tcp, local)[local!#S/sdC0/fossil]: time...
    venti...2007/0926 17:57:23 venti: conf...httpd tcp!127.1!8000...init...sync...

    that sequence matches my read of the venti source. also, ^t^tp shows a
    few venti procs that just keep racking up cpu time:

    8: venti pc f0100366 dbgpc 2557f Rendez (Running) ut 1547792
        st 2058589 bss 6650000 qpc 0 nl 1 nd 0 lpc f01bb8ec pri 2
    14: venti pc 2557f dbgpc 2557f Rendez (Ready) ut 2477463
        st 2354871 bss 6650000 qpc 0 nl 0 nd 0 lpc f01c7bc7 pri 2
    16: venti pc f01c88ae dbgpc 1f141 Pread (Ready) ut 467438
        st 1254296 bss 6650000 qpc f01c35e5 nl 0 nd 0 lpc f01bb8ec pri 1

    they're definitely increasing more-or-less regularly. everything else
    (well, i can't see above 8) had 0-2 for ut and st.

    as for my venti config: i'll confirm when i get this thing booted off
    another medium, but from memory: i've got a ~30GB fossil partition, a
    64MB bloom filter, a 5-10GB index, and a ~120GB arenas partition.
    there's also a 9fat and swap in there somewhere. it's all on one disk.

    i certainly appreciate the fact that non-dma disks can be dog slow
    under load. but at this point, whatever it's doing it's been doing for
    over 24 hours; even a factor of 50 puts that at just under half an
    hour to reboot, which seems like an unreasonable amount of time for a
    server to spin on an unclean reboot. what was the rough improvement
    factor you observed?

    this machine has no optical drive, so i'll have to make a floppy and
    get it to boot off the net. i'll report back when i have info from
    that.

  7. Re: [9fans] More venti sync woes.

    > as for my venti config: i'll confirm when i get this thing booted off
    > another medium, but from memory: i've got a ~30GB fossil partition, a
    > 64MB bloom filter, a 5-10GB index, and a ~120GB arenas partition.
    > there's also a 9fat and swap in there somewhere. it's all on one disk.


    > i certainly appreciate the fact that non-dma disks can be dog slow
    > under load. but at this point, whatever it's doing it's been doing for
    > over 24 hours; even a factor of 50 puts that at just under half an
    > hour to reboot, which seems like an unreasonable amount of time for a
    > server to spin on an unclean reboot. what was the rough improvement
    > factor you observed?


    generally, a sata disk is good for 30-50 MB/s using sequential
    IDE dma transfers on the outer tracks. /non-sequential access
    can be as slow as non-dma access./

    (i fixed a similar problem recently with the on-disk cache of ken's fs.
    throughput went up by a factor of 20 or so.)

    russ would have to answer questions about disk access patterns with
    venti, but you could be generating constant seeks between the various
    partitions.

    - erik


  8. Re: [9fans] More venti sync woes.

    > then i realized
    > that dma was off. it matters.


    it matters greatly.
    in some cases, if there's a lot to do, don't even bother until dma is on.
    otherwise, it takes simply ages.


  9. Re: [9fans] More venti sync woes.

    > it matters greatly.
    > in some cases, if there's a lot to do, don't even bother until dma is on.
    > otherwise, it takes simply ages.


    regardless of the need to get things onto disk due to memory pressure,
    the data's not safe from a momentary power problem until it hits
    the disk.

    my experience has been that even with upses, power is less reliable than
    disks.

    - erik


  10. Re: [9fans] More venti sync woes.

    > my experience has been that even with upses, power is less reliable than
    > disks.


    you're suggesting that power corrupts?


  11. Re: [9fans] More venti sync woes.

    >> my experience has been that even with upses, power is less reliable than
    >> disks.

    >
    > you're suggesting that power corrupts?


    lack of power corrupts dram; absolute lack of power corrupts dram absolutely.

    hardware has always been a bit odd.

    - erik


  12. Re: [9fans] More venti sync woes.

    dma is worth around 10x, certainly less than 50.
    i agree that your venti server is taking a very long
    time to come back. i reboot mine all the time
    and don't have this problem.

    i am at a loss for what could be taking it so long.
    it's probably not going to hurt any to stop it.
    it could take forever -- maybe it's looping!

    when you manage to boot in other means,
    it would be nice to see what ps -a|grep venti
    says. venti sets its proc args that show up in ps -a
    to tell you what each proc does.

    the new venti is very careful both about the
    consistency of what is stored on disk and about
    recovering quickly after a disk failure
    (there's not a lot to do -- just pick up the unindexed
    arena entries from the arena tocs and toss them
    back into the index write buffer where they were
    when you restarted the system).

    what you're describing could happen if you were
    running a new venti (which buffers index updates
    quite aggressively) and then on reboot managed
    to start an old venti (which would then process the
    unindexed new blocks one at a time instead of
    buffering the updates, with about 3 seeks per block).

    without more information i'm afraid i have no good answers.

    russ

  13. Re: [9fans] More venti sync woes.

    absurd .. a UPS is your friend. there is something else wrong.

    brucee

    On 9/28/07, erik quanstrom wrote:
    > my experience has been that even with upses, power is less reliable than
    > disks.


  14. Re: [9fans] More venti sync woes.

    Russ Cox wrote:

    >dma is worth around 10x, certainly less than 50.
    >i agree that your venti server is taking a very long
    >time to come back. i reboot mine all the time
    >and don't have this problem.
    >
    >i am at a loss for what could be taking it so long.
    >it's probably not going to hurt any to stop it.
    >it could take forever -- maybe it's looping!
    >
    >

    It is...

    while(1){
    proc main: kick icache
    work icachewritecoord: start
    proc icachewritecoord: icachewritecoord kick dcache
    work flushproc: start
    proc flushproc: build t=131
    proc flushproc: writeblocks t=991
    proc flushproc: writeblocks.1 t=1632
    proc flushproc: writeblocks.2 t=2296
    proc flushproc: writeblocks.3 t=2944
    proc flushproc: undirty.4 t=3564
    work flushproc: finish
    proc icachewritecoord: kick dcache
    proc icachewritecoord: icachewritecoord kicked dcache
    proc icachewritecoord: icachewritecoord start flush
    proc icachewritecoord: icachedirty enter
    proc icachewritecoord: icachedirty exit
    proc icachewritecoord: icachewritecoord sleep
    proc main: kick icache
    }

    the main proc loops in icachealloc():

    while(icache.ndirty == icache.entries){
        /*
         * This is a bit suspect. Kickicache will wake up the
         * icachewritecoord, but if all the index entries are for
         * unflushed disk blocks, icachewritecoord won't be
         * able to do much. It always rewakes everyone when
         * it thinks it is done, though, so at least we'll go around
         * the while loop again. Also, if icachewritecoord sees
         * that the disk state hasn't change at all since the last
         * time around, it kicks the disk. This needs to be
         * rethought, but it shouldn't deadlock anymore.
         */
        kickicache();
        rsleep(&icache.full);
    }

    but icache.ndirty never changes... so it hangs forever in
    "sync..." because it can't allocate ientries.




  15. Re: [9fans] More venti sync woes.

    agreed. 'ps -a | grep venti' shows the following after about 15 minutes:

    glenda 198 3:04 3:44 104508K Rendez venti [main]
    glenda 199 0:00 0:00 104508K Rendez venti
    glenda 200 0:00 0:00 104508K Sleep venti
    glenda 201 0:00 0:00 104508K Rendez venti [icachewriteproc:/dev/sdC0/isect]
    glenda 202 4:49 4:23 104508K Rendez venti [icachewritecoord]
    glenda 203 0:00 0:00 104508K Sleep venti [delaykickproc icache]
    glenda 204 0:23 1:11 104508K Rendez venti [flushproc]
    glenda 205 0:00 0:00 104508K Rendez venti [delaykickproc dcache]
    glenda 206 0:00 0:00 104508K Rendez venti
    glenda 206 0:00 0:00 104508K Rendez venti [bloomwriteproc]

    once it hits "sync...", load, context, and syscall are pegged in stats;
    memory ramps up a bit over the first ~ half minute, but levels out.
    For the big three processes, here's everything over 3% in tprof:

    :; tprof 198
    total: 3040
    TEXT 00001000
    ms % sym
    480 15.7 _tas
    240 7.8 runthread
    230 7.5 lock
    180 5.9 _threadrendezvous
    170 5.5 rendezvous
    140 4.6 qlock
    130 4.2 _sched
    110 3.6 trace
    110 3.6 _threadready
    100 3.2 waitforkick
    100 3.2 icachewritecoord

    :; tprof 202
    total: 7570
    TEXT 00001000
    ms % sym
    1090 14.3 _tas
    520 6.8 runthread
    500 6.6 rendezvous
    490 6.4 _threadrendezvous
    490 6.4 lock
    290 3.8 icachewritecoord
    280 3.6 _sched
    260 3.4 qlock
    240 3.1 trace
    230 3.0 _threadready

    :; tprof 204
    total: 14040
    TEXT 00001000
    ms % sym
    2020 14.3 _tas
    1010 7.1 _threadrendezvous
    950 6.7 rendezvous
    930 6.6 runthread
    920 6.5 lock
    590 4.2 icachewritecoord
    540 3.8 trace
    510 3.6 _sched
    470 3.3 _threadready

    tight loops with most of its time in the thread library. poking around
    with acid now to get more info.

    On 9/28/07, Kernel Panic wrote:
    > the main proc loops in icachealloc():
    > [...]
    > but icache.ndirty never changes... so it hangs forever in
    > "sync..." because it can't allocate ientries.

