HP-UX 11.x and Legato Networker 5.5 - HP UX
-
HP-UX 11.x and Legato Networker 5.5
Hello,
I work for a company that uses Legato Networker to back up its HP-UX
systems.
The problem is Legato. On one particular server, Legato seems to fail
at least once a week. I support multiple sites single-handedly, and I
have spent many weeks restoring Legato directories when it crashes or
encounters weird problems, then updating the indexes and checking the
settings to make sure nothing has changed (online and offline notes
are kept on the settings for each server).
What makes this difficult is that I am not allowed to install software
such as snort, sudo or tripwire. Furthermore, although only a limited
number of people know the root password (i.e., one or two people plus
myself, as I am root), certain things are happening that would require
the root password. Physical security is not optimal, and I do have to
sleep a few hours to keep my sanity; otherwise I would be onsite or
connected all the time.
A recent issue that concerns me is the number of times its primary
daemons disappear from the process table, forcing me to go in, restart
them manually and watch the system. I had written a script to check
and restart the processes automatically, but it didn't work correctly,
so it was removed from cron.
I have found no signs of external intrusion, so it seems to me this is
either a system issue or someone with internal console access who, for
some odd reason, has been killing only this application every night
this week.
Here is the script that was written:
#!/bin/sh
outfile="/nsr_up.txt"
touch $outfile
echo "Number of processes on host total: `ps -ef | grep nsr | wc -l`" >> $outfile
ps -ef | grep nsrd
if [ "$?" -eq 0 ]; then
    echo "Exit value of last command was unsuccessful" >> $outfile
    /opt/networker/bin/nsrd
    /opt/networker/bin/nsrexecd
    echo "" >> $outfile
    echo "NSR processes started at `date`" >> $outfile
    echo "Number of NSR process running on host total: `ps -ef | grep nsr | wc -l`" >> $outfile
    ps -ef | grep nsr >> $outfile
    sleep 1
fi
mailx root@myhost.ext < $outfile
mailx me@externaladdr.ext < $outfile
rm $outfile
Can someone point out:
1. Why these Legato issues are occurring
2. Why this script is overloading processes (when started, nsrd loads
at least one nsrexecd and two nsrexex processes)
Thanks.
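The script above has two bugs that together explain the overloading in question 2.

First, its test is inverted. `ps -ef | grep nsrd` exits 0 when it finds a match, and `if [ "$?" -eq 0 ]` then starts nsrd and nsrexecd. So the restart branch runs when nsrd is already up.

Second, `ps -ef` shows each process's arguments, so the `grep nsrd` process always matches itself. The test is therefore true on every run, and each cron run launched another set of daemons.

Below is a minimal sketch of a corrected watchdog for the server. It assumes:
- HP-UX's POSIX /bin/sh.
- The binary path and mail addresses from the post.
- nsrexecd is started before nsrd, as typical NetWorker startup scripts do. This ordering is an assumption, not confirmed for 5.5.

The script name, the `run` argument, the log path /var/adm/nsr_watch.log and the mail subject are hypothetical.

```shell
#!/bin/sh
# nsr_watch.sh: sketch of a cron watchdog for the NetWorker server daemons.
# Hypothetical (not from the post): the script name, the "run" argument,
# the log file and the mail subject. NSR_BIN and the two addresses are the
# ones used in the original script.

NSR_BIN=/opt/networker/bin
MAILTO="root@myhost.ext me@externaladdr.ext"
LOG=/var/adm/nsr_watch.log

# count_procs NAME: read "ps -e" output on stdin and print how many
# processes have a command name (last column) exactly equal to NAME.
# "ps -e" shows only the command name, so a "grep nsrd" process shows up
# as "grep" and can never be counted as nsrd.
count_procs() {
    awk -v name="$1" '$NF == name { n++ } END { print n + 0 }'
}

# start_if_missing NAME: start $NSR_BIN/NAME only if no process with that
# exact name is running. Returns 0 if it started it, 1 if it was running.
start_if_missing() {
    n=$(ps -e | count_procs "$1")
    if [ "$n" -eq 0 ]; then
        echo "$(date): $1 not running, starting it" >> "$LOG"
        "$NSR_BIN/$1" < /dev/null >> "$LOG" 2>&1
        return 0
    fi
    return 1
}

# check_and_restart: check each daemon separately, so a healthy system is
# left alone and no duplicates are created. nsrexecd goes first, as in
# typical NetWorker startup scripts (assumed, not confirmed, for 5.5).
check_and_restart() {
    restarted=no
    start_if_missing nsrexecd && restarted=yes
    start_if_missing nsrd && restarted=yes
    if [ "$restarted" = yes ]; then
        sleep 5
        summary="nsrd=$(ps -e | count_procs nsrd) nsrexecd=$(ps -e | count_procs nsrexecd)"
        echo "$(date): after restart: $summary" >> "$LOG"
        for addr in $MAILTO; do
            echo "NetWorker daemons restarted on $(hostname): $summary" |
                mailx -s "NetWorker watchdog" "$addr"
        done
    fi
}

# Act only when called as "nsr_watch.sh run" (e.g. from cron); otherwise
# only define the functions, which keeps them testable on their own.
if [ "$1" = run ]; then
    check_and_restart
fi
```

To run it every 15 minutes from root's crontab, use a line like the one below. The script path is hypothetical. Classic cron, as on HP-UX, needs the minutes listed explicitly rather than a */15 step.

    0,15,30,45 * * * * /usr/local/bin/nsr_watch.sh run

The script checks each daemon by its exact command name instead of grepping `ps -ef`. That is what keeps repeated runs from stacking up extra nsrd and nsrexecd processes.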
-
Re: HP-UX 11.x and Legato Networker 5.5
BTW, the primary shell is /bin/sh.
A few other problems I've encountered:
Legato continually has issues with cleaning tapes and predefined slots.
Read error: no such device or address.
One week, the client decided it was a server, resulting in a license
issue that has since been resolved.
Thanks in advance.
-
Re: HP-UX 11.x and Legato Networker 5.5
Hi James,
We are running NetWorker 7.1.3 on HP-UX Itanium, and apart from the lack
of a persistent binding solution for tape devices, it is running well.
Version 5.5 seems rather old, since 7.3.1 is the current release. I am
sorry that I can't help with your specific problem, but I would like to
give you a link to a mailing list where many Legato NetWorker users have
given really helpful tips. I expect you will find a helpful answer
there. HTH
http://www.lsoft.com/scripts/wl.exe?...ERV.TEMPLE.EDU
Ines
On 18.07.2006 you wrote:
>James wrote:
>>
>> Can someone point out why:
>> 1. These Legato issues are occurring
>> 2. Why this script is overloading processes (nsrd loads at least one
>> nsrexecd and two nsrexex processes when started)
>>
>> Thanks.
>BTW, the primary shell is /bin/sh.
>A few other notes that I've encountered.
>Legato continually has issues with Cleaning Tapes and predefined slots,
>Read error: no such device or addess,
>The client one week decides it's a server resulting in a license issue
>that was resolved
-
Re: HP-UX 11.x and Legato Networker 5.5
James, I have to agree with Ines that you're on a dangerously old
version of NetWorker. We're running the PA-RISC version of 7.10 under
HP-UX 11i and it's absolutely fine. I don't think NetWorker 5.5 has been
supported for quite a long time now (somebody correct me if I'm wrong).
On Digital UNIX, NetWorker 5.5 (right around the time Legato brought the
Digital code base in house) was the absolute buggiest version of
NetWorker I've ever encountered; other people's experiences may vary.
Even a 5.x NetWorker storage node we had running under HP-UX at the same
time was buggy as all heck. If memory serves, it would constantly fill
up the /opt file system with nsrjb core dumps. There may also be a bug
in the HP-UX 5.5 version that, under certain circumstances, stops you
from doing parallel saves on a single client.
Seriously, guy, look at upgrading NetWorker, and also make sure you're
reasonably up to date on your HP-UX patch levels. See what the good
people at EMC/Legato are saying about whatever HP-UX patch levels you're
on (presumably while you upgrade NetWorker), because I doubt they'll
even want to talk to you about version 5.5 unless you shovel out extra
dollars.
Speaking of which, have you looked to see if there were any patchsets
for 5.5 that may have addressed your problems?
Charles R. Whealton
Charles Whealton @ pleasedontspam.com
-
Re: HP-UX 11.x and Legato Networker 5.5
Chuck and Ines, thank you for your replies. I would have replied
sooner, but I had personal issues to deal with over the past couple of
weeks.
Chuck, I had never thought about the parallelism issue. I do know core
files were being created, and I should really pay more attention to
/opt as a location for core files as well (I usually found them in
nsr/cores, although not all of those files are bad core files per se).
Legato is not patched up to date, since it is unsupported and the
company I work for is not big on installing the latest and greatest
patches. I can see the rationale behind their decision, but if it were
up to me, patches would be installed immediately.
The problems I have seen are multiple:
If the index files grow too big too quickly, jobs start failing.
If someone swaps autoloader tapes without unmounting them first, it
causes a variety of problems, usually read/open errors or "no such
device" messages.
The cleaning tape is very particular, and on occasion it has had to be
suppressed, leading to tape problems.
Daemons stop working unexpectedly, causing automated jobs to fail
(however, the check script I had working was commented out of cron
because it started too many processes).
Clients would fail with authentication problems, even when on the same
host.
One client suddenly acted as a server, and with no enabler code for it,
stopped acting as a server (luckily, client backups of it from the real
server still worked).
I have corrected these issues, but some of them recur on occasion,
creating headaches (bosses yelling about it).
Thanks for your replies; I will take a closer look at the patchsets.
-
Re: HP-UX 11.x and Legato Networker 5.5
James:
Hi, real quickly... The index problems you mentioned SHOULD be dealt
with in a later version of NetWorker. I >THINK< it was 6.x when they
started significantly trimming down index sizes, but it may have been
7.x. Probably doesn't matter since I >BELIEVE< 6.x is now unsupported
as well!
It's just another reason to get on a later, supported version.
I understand the rationale behind your management not wanting to always
apply the latest patches, as I go through it myself. However, in this
case I'd have to say that in my OWN OPINION (use your own judgement),
the problems with the application keeping your data safe (which one
would assume is your management's priority) are more significant at
this point; I mean, you're operating on a long unsupported version of
NetWorker now.
Your data won't be backed up if your backup application dies in the
middle of the night or is still waiting for a tape mount the next
morning during prime time hours. That should also play into your
management's decision.
Good luck!
Charles R. Whealton
Charles Whealton @ pleasedontspam.com
-
Re: HP-UX 11.x and Legato Networker 5.5
James... HP-UX 11.x includes a "template" script in /sbin/init.d that
you can use to put together a decent Legato NetWorker startup script.
Don't forget to add the run-level links (under /sbin/rc*.d) to make it
work.
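Following that suggestion, here is a minimal sketch of such a startup script in the style of the HP-UX rc scripts. It accepts the start_msg, stop_msg, start and stop arguments. It uses the exit codes 0 (OK), 1 (FAIL) and 2 (skipped, shown as N/A), following the conventions described in /sbin/init.d/template.

The following are assumptions to check against your system and the NetWorker 5.5 manual pages:
- the file name;
- the NETWORKER control variable in /etc/rc.config.d/networker;
- the link sequence numbers;
- the nsr_shutdown flags.

```shell
#!/sbin/sh
# /sbin/init.d/networker: sketch of an HP-UX rc script for NetWorker,
# following the layout of /sbin/init.d/template.
# Hypothetical: this file name, the NETWORKER control variable, the
# /etc/rc.config.d/networker file and the link numbers below, e.g.
#   ln -s /sbin/init.d/networker /sbin/rc2.d/S900networker
#   ln -s /sbin/init.d/networker /sbin/rc1.d/K100networker

PATH=/usr/sbin:/usr/bin:/sbin
export PATH
NSR_BIN=/opt/networker/bin      # path used in the original post

rc_main() {
    rval=0
    case "$1" in
    start_msg)
        echo "Start Legato NetWorker daemons"
        ;;
    stop_msg)
        echo "Stop Legato NetWorker daemons"
        ;;
    start)
        # Optional on/off switch, read the way HP-UX rc scripts usually do.
        if [ -f /etc/rc.config.d/networker ]; then
            . /etc/rc.config.d/networker
        fi
        if [ "${NETWORKER:-1}" != 1 ]; then
            rval=2                      # 2 = skipped ("N/A" at boot)
        else
            "$NSR_BIN/nsrexecd" || rval=1
            # nsrd exists only on the NetWorker server.
            if [ -x "$NSR_BIN/nsrd" ]; then
                "$NSR_BIN/nsrd" || rval=1
            fi
        fi
        ;;
    stop)
        # ASSUMPTION: -a (all daemons, including nsrexecd) and -q (no
        # confirmation prompt) are taken from later nsr_shutdown manuals;
        # verify them against the 5.5 man page before relying on this.
        "$NSR_BIN/nsr_shutdown" -a -q || rval=1
        ;;
    *)
        echo "usage: $0 {start|stop|start_msg|stop_msg}"
        rval=1
        ;;
    esac
    return $rval
}

# The rc sequencer always passes one argument; with none, only define rc_main.
if [ $# -gt 0 ]; then
    rc_main "$1"
    exit $?
fi
```

The start link sits in a run level and the matching kill link one level lower. Here the kill number is 1000 minus the start number, which is the usual HP-UX pairing. The numbers themselves are only an example.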
As for your NetWorker processes, I don't have one of my company's HP-UX
systems in front of me, but to the best of my memory it SHOULD be:
Server - 1 nsrd process, 2 nsrexecd processes
Client - 2 nsrexecd processes
Remember, I haven't used 5.5 for years, so take that with a grain of
salt.
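Keeping that caveat in mind, a host's process table could be checked against the recalled counts with a sketch like the one below. The expected numbers are Chuck's recollection rather than documented 5.5 behavior, and the function name audit_nsr is hypothetical. It reads `ps -e` output, so it matches exact command names instead of grepping `ps -ef`, which would also match the grep itself.

```shell
#!/bin/sh
# audit_nsr ROLE: read "ps -e" output on stdin and compare the number of
# nsrd and nsrexecd processes with the counts recalled above for ROLE
# ("server" or "client"). Those counts are unverified for NetWorker 5.5.
# Prints the counts; returns 0 on a match, 1 on a mismatch, 2 on bad usage.
audit_nsr() {
    case "$1" in
    server) want_nsrd=1; want_exec=2 ;;
    client) want_nsrd=0; want_exec=2 ;;
    *) echo "usage: audit_nsr server|client" >&2; return 2 ;;
    esac
    # Match exact command names in the last column, so the process running
    # this check (or any grep) is never counted.
    awk -v wd="$want_nsrd" -v we="$want_exec" '
        $NF == "nsrd"     { d++ }
        $NF == "nsrexecd" { e++ }
        END {
            printf "nsrd=%d (expected %d) nsrexecd=%d (expected %d)\n", d, wd, e, we
            if (d + 0 == wd + 0 && e + 0 == we + 0) exit 0
            exit 1
        }'
}
```

On the server, the following command prints the counts and returns 0 when they match the recalled numbers; a mismatch (return 1) is worth a closer look rather than proof of a problem.

    ps -e | audit_nsr server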
Good luck!
Charles R. Whealton
Charles Whealton @ pleasedontspam.com