Re: Intermittent RWSCS state - VMS



Thread: Re: Intermittent RWSCS state

  1. Re: Intermittent RWSCS state

    johnwallace4@yahoo.co.uk wrote on 09/11/2008 03:15:59 PM:

    > On Sep 11, 5:19 pm, Marty Kuhrt wrote:
    > > John Santos wrote:
    > > > norm.raph...@metso.com wrote:
    > >
    > > >> Marty Kuhrt wrote on 09/09/2008 12:38:20 PM:
    > >
    > > >> > Since the VAX in OP's question is probably only talking to the
    > > >> > cluster via its 10M network cable, it might be as simple as that.
    > >
    > > >> No such luck. Talking on FDDI for SCS traffic. 10Mb network for
    > > >> other.
    > >
    > > > Are you certain? IIRC, the default cluster configuration is to
    > > > enable all SCS-capable circuits, and normally all the traffic would
    > > > end up on the fastest one (FDDI), but if there was a momentary
    > > > failure or excessive congestion on the FDDI, it might have failed
    > > > over to the Ethernet, thus hitting the VAX's 10Mb bottleneck, and
    > > > then never failed back. I think the SHOW CLUSTER circuit counters
    > > > should reveal if this has happened. (I think the 2nd example shows
    > > > circuit counters by circuit, but not circuit names, so I can't tell
    > > > which is which, though possibly a cluster expert could.)
    > >
    > > > There is a way to force it to use *only* the FDDI, and I think
    > > > there's a way to force it to fail back to FDDI if for some reason
    > > > it has failed over to the Ethernet.
    > >
    > > > HTH.
    > >
    > > Now that I think on it, was there a FDDI interconnect for VAXen? I
    > > vaguely remember that Nemonix was making an after-market one, but I
    > > don't remember a "native" one. Of course, that doesn't mean too much,
    > > since I occasionally forget I have my glasses on my head. ;^)
    >
    > There was FDDI from DEC for TURBOCHANNEL (the DEFTA). And there were
    > VAXes (eg VAXstation 4000s?) with TURBOCHANNEL. I'm pretty sure there
    > was DEFTA support, on VAX, in at least some versions of VMS, though I
    > don't recall actually ever seeing that combination (whereas I knew of
    > lots of DEC 3000s with FDDI, especially where resilience was/is of
    > interest). A more definitive answer would be the VAX/VMS SPDs
    > themselves.


    To elaborate on my earlier post: this is a Nemonix-accelerated
    VAXstation-90 used as a server, with memory expanded to 512MB. The
    supported configuration is 128MB. It was necessary to modify the DEFTA
    board to add the two missing memory lines, as otherwise the FDDI ring
    (in high memory) was incompatible with the memory capacity
    modifications. So the DEFTA is also unsupported. Nemonix, however,
    supports all of it and is geographically near the site.
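
    For what it's worth, the SHOW CLUSTER check John suggested can be done
    interactively from any cluster member; a sketch (CIRCUITS and COUNTERS
    are standard SHOW CLUSTER classes):

        $ SHOW CLUSTER/CONTINUOUS
        Command> ADD CIRCUITS
        Command> ADD COUNTERS
        Command> EXIT

    Watching which circuit's counters are actually incrementing should show
    whether SCS traffic is riding the FDDI or has quietly fallen back to
    the 10Mb Ethernet.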

  2. Re: Intermittent RWSCS state

    For a sanity-check, you might run LOCKTIME.COM from the V6 Freeware CD
    directory [KP_LOCKTOOLS] to see what the average lock-request time is from
    each of the nodes to each of the other nodes, to see if a node or path
    seems unusually slow.
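
    Running it is just a matter of invoking the procedure on each node;
    for example, assuming it has been copied from the Freeware CD into the
    current directory:

        $ @LOCKTIME

    Any node or path whose average lock-request time sits well above its
    peers is the one to dig into.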

    If you run LOCK_ACTV_*.COM (obtain that from the same place) you'll get
    a cluster-wide summary of the locking activity. Look for cases where the
    node with the highest locking activity rate on a busy tree is not the
    master node (the master node is indicated with an asterisk). VMS tries
    to keep the lock tree mastered on the node with the highest activity
    level at any given point in time, but use of non-zero PE1 values or an
    imbalance of LOCKDIRWT values between nodes (or saturation of a CPU in
    interrupt state) can prevent lock mastership from being on the optimal node.
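
    Both PE1 and LOCKDIRWT are SYSGEN parameters, so checking them on each
    node is straightforward (PE1 is dynamic; a sketch):

        $ MCR SYSGEN
        SYSGEN> SHOW PE1
        SYSGEN> SHOW LOCKDIRWT
        SYSGEN> EXIT

    As I recall, a non-zero PE1 restricts lock-tree remastering (negative
    disables it; positive limits the size of trees that can move), and
    unequal LOCKDIRWT values bias where the lock directory, and indirectly
    mastership, ends up.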

    If you run LCKQUE.COM from the same place you can detect cases where
    processes are forced to wait their turn for locks because of slowness.
    Availability Manager's Lock Contention data collection can also help
    with this.

    Saturation of a CPU in interrupt state could cause it to respond slowly
    to SCS messages (including lock requests) and thus cause RWSCS states on
    other nodes. I'd check T4 data (or use $ MONITOR
    MODES/ALL/INTERVAL=1/CPU=(0,1,2...)), looking for interrupt-state time
    above 80% or so on any CPU in the box.
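
    Spelled out as a command for, say, a three-CPU box (adjust the CPU list
    to what the machine actually has):

        $ MONITOR MODES/ALL/INTERVAL=1/CPU=(0,1,2)

    In the per-CPU displays it's the interrupt-state figure (shown as
    Interrupt Stack on VAX) that matters here; a CPU sitting above roughly
    80% there may be too busy to service SCS messages promptly.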

+ Reply to Thread