Re: Intermittent RWSCS state - VMS
-
Re: Intermittent RWSCS state
johnwallace4@yahoo.co.uk wrote on 09/11/2008 03:15:59 PM:
> On Sep 11, 5:19 pm, Marty Kuhrt wrote:
> > John Santos wrote:
> > > norm.raph...@metso.com wrote:
> >
> > >> Marty Kuhrt wrote on 09/09/2008 12:38:20 PM:
> >
> > >> > Since the VAX in OP's question is probably only talking to the
> > >> > cluster via its 10M network cable, it might be as simple as that.
> >
> > >> No such luck. Talking on FDDI for SCS traffic. 10MB network for
> > >> other.
> >
> > > Are you certain? IIRC, the default cluster configuration is to
> > > enable all SCS-capable circuits, and normally all the traffic would
> > > end up on the fastest one (FDDI), but if there was a momentary
> > > failure or excessive congestion on the FDDI, it might have failed
> > > over to the ethernet, thus hitting the VAX's 10Mb bottleneck, and
> > > then never failed back. I think the show cluster circuit counters
> > > should reveal if this has happened. (I think the 2nd example shows
> > > circuit counters by circuit, but not circuit names, so I can't tell
> > > which is which, though possibly a cluster expert could.)
> >
> > > There is a way to force it to use *only* the FDDI, and I think
> > > there's a way to force it to fail back to FDDI if for some reason
> > > it has failed over to the Ethernet.
> >
> > > HTH.
> >
> > Now that I think on it, was there a FDDI interconnect for VAXen? I
> > vaguely remember that Nemonix was making an after market one, but I
> > don't remember a "native" one. Of course, that doesn't mean too much,
> > since I occasionally forget I have my glasses on my head. ;^)
>
> There was FDDI from DEC for TURBOCHANNEL (the DEFTA). And there were
> VAXes (eg VAXstation 4000s?) with TURBOCHANNEL. I'm pretty sure there
> was DEFTA support, on VAX, in at least some versions of VMS, though I
> don't recall actually ever seeing that combination (whereas I knew of
> lots of DEC 3000s with FDDI, especially where resilience was/is of
> interest). A more definitive answer would be the VAX/VMS SPDs
> themselves.
To elaborate on my earlier post: this is a Nemonix-accelerated
VAXstation-90 used as a server, with memory expanded to 512MB. The
supported configuration is 128MB. It was necessary to modify the DEFTA
board to add the two missing memory lines, as otherwise the FDDI ring
(in high memory) was incompatible with the memory capacity
modifications. So the DEFTA is also unsupported. Nemonix, however,
supports all of it and is geographically near the site.
-
Re: Intermittent RWSCS state
For a sanity check, you might run LOCKTIME.COM from the V6 Freeware CD
directory [KP_LOCKTOOLS] to see what the average lock-request time is
from each of the nodes to each of the other nodes, and whether a node or
path seems unusually slow.
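If you want something more mechanical than eyeballing the numbers, here is a rough Python sketch of the "unusually slow path" check. It does not parse LOCKTIME.COM output; it assumes you have copied the per-path averages into a dictionary by hand. The node names, the timings, the microsecond unit and the 3x-median cutoff are all made up for illustration.

```python
from statistics import median


def slow_paths(avg_us, factor=3.0):
    """Return (source, target, time) for every path whose average
    lock-request time exceeds `factor` times the median of all paths.

    avg_us maps (source_node, target_node) -> average time. The unit and
    the cutoff factor are assumptions, not anything LOCKTIME.COM defines.
    """
    if not avg_us:
        return []
    baseline = median(avg_us.values())
    return sorted(
        (src, dst, t)
        for (src, dst), t in avg_us.items()
        if t > factor * baseline
    )


# Hypothetical figures typed in by hand, not real measurements.
sample = {
    ("ALPHA1", "ALPHA2"): 205.0,
    ("ALPHA2", "ALPHA1"): 210.0,
    ("ALPHA2", "VAX1"): 220.0,
    ("VAX1", "ALPHA2"): 230.0,
    ("ALPHA1", "VAX1"): 1750.0,
    ("VAX1", "ALPHA1"): 1900.0,
}

for src, dst, t in slow_paths(sample):
    print(f"{src} -> {dst}: {t:.0f} (well above the median)")
```

A median baseline is used so that one or two bad paths do not drag the reference value up with them; with only a handful of nodes you may prefer to compare against the fastest path instead.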
If you run LOCK_ACTV_*.COM (obtain that from the same place) you'll get
a cluster-wide summary of the locking activity. Look for cases where the
node with the highest locking activity rate on a busy tree is not the
master node (the master node is indicated with an asterisk). VMS tries
to keep the lock tree mastered on the node with the highest activity
level at any given point in time, but use of non-zero PE1 values or an
imbalance of LOCKDIRWT values between nodes (or saturation of a CPU in
interrupt state) can prevent lock mastership from being on the optimal node.
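The check described above (busiest node on a tree is not the master) can be sketched in a few lines of Python. This is only an illustration: it assumes you have transcribed the per-node activity rates and the asterisked master for each tree from the LOCK_ACTV output yourself, and the tree names, node names and rates below are hypothetical.

```python
def misplaced_masters(trees):
    """Return (tree, master, busiest_node) for each tree whose busiest
    node is strictly busier than the current master.

    trees maps tree name -> {"master": node, "rates": {node: rate}}.
    This layout is an assumption for the sketch, not a real file format.
    """
    findings = []
    for name, info in trees.items():
        rates = info["rates"]
        master = info["master"]
        busiest = max(rates, key=rates.get)
        # A tie with the master does not count as misplaced.
        if busiest != master and rates[busiest] > rates.get(master, 0.0):
            findings.append((name, master, busiest))
    return findings


# Hypothetical numbers, hand-copied for illustration.
trees = {
    "TREE_1": {"master": "NODEA", "rates": {"NODEA": 40.0, "NODEB": 900.0}},
    "TREE_2": {"master": "NODEB", "rates": {"NODEA": 10.0, "NODEB": 300.0}},
}

for tree, master, busiest in misplaced_masters(trees):
    print(f"{tree}: mastered on {master}, but {busiest} is busiest")
```

Any tree it reports is a candidate for a closer look at PE1 and LOCKDIRWT on the nodes involved.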
If you run LCKQUE.COM from the same place you can detect cases where
processes are forced to wait for their turn at locks because of slowness.
Availability Manager's Lock Contention data collection can also help
with this.
Saturation of a CPU in interrupt state could cause it to respond slowly
to SCS messages (including lock requests) and thus cause RWSCS states on
other nodes. I'd check T4 data (or use $MONITOR
MODES/ALL/INTERVAL=1/CPU=(0,1,2...)), looking for interrupt state time
above 80% or so on any CPU in the box.
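If you have a lot of samples to wade through, the 80% rule of thumb is easy to apply with a small Python sketch. It assumes you have already pulled the per-CPU interrupt-state percentages out of T4 or MONITOR by whatever means you like; the dictionary layout and the sample values are hypothetical, and nothing here reads T4 or MONITOR files directly.

```python
def saturated_cpus(samples, threshold=80.0):
    """Return {cpu_id: peak_percent} for CPUs whose interrupt-state time
    exceeded `threshold` percent in at least one sampling interval.

    samples maps cpu_id -> list of interrupt-state percentages, one per
    interval. The 80% default is the rough figure from the post above.
    """
    return {
        cpu: max(values)
        for cpu, values in samples.items()
        if values and max(values) > threshold
    }


# Hypothetical per-interval readings, not real data.
samples = {
    0: [12.0, 95.5, 30.0],
    1: [5.0, 8.0, 7.5],
    2: [],
}

for cpu, peak in saturated_cpus(samples).items():
    print(f"CPU {cpu}: interrupt state peaked at {peak:.1f}%")
```

It reports the peak rather than the average on purpose, since a short burst of interrupt-state saturation is enough to delay SCS responses.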