Tru64 w command reports an idle user which doesn't exist ? Is utmp corrupt ? - Unix




  1. Tru64 w command reports an idle user which doesn't exist ? Is utmp corrupt ?

    System is Tru64 5.1B running on a cluster.

    We have monitoring scripts checking for idle users which picked up the
    following:

    [4]swips2:/usr/users/jamesb # w |egrep "[d]ays|User"
    11:58 up 29 days, 2:44, 69 users, load average: 3.10, 3.09, 3.04
    User tty from login@ idle JCPU PCPU what
    YENQP1 pts/162 192.168.16.60 23:06 11days

    This suggests there should be an idle process for this connection, but
    there isn't:

    [4]swips2:/usr/users/jamesb # ps -ef | grep [Y]ENQ
    [4]swips2:/usr/users/jamesb #


    I think this means that the utmp file is incorrect. I cannot reboot
    this machine (it's a 24/7 service and this is only a minor irritation),
    so is there a way to refresh/rebuild utmp or fix this another way?

    Thanks.

  2. Re: Tru64 w command reports an idle user which doesn't exist ? Is utmp corrupt ?

    James Blackmore wrote:
    >
    > We have monitoring scripts checking for idle users which picked up the
    > following:
    >
    > [4]swips2:/usr/users/jamesb # w |egrep "[d]ays|User"
    > 11:58 up 29 days, 2:44, 69 users, load average: 3.10, 3.09, 3.04
    > User tty from login@ idle JCPU PCPU what
    > YENQP1 pts/162 192.168.16.60 23:06 11days


    Not a particularly good method to use, as you have
    discovered. What you have encountered is common to
    all versions of Unix and Unix-alikes as far as I know.

    > This suggests there should be an idle process for this connection, but
    > there isn't:
    >
    > [4]swips2:/usr/users/jamesb # ps -ef | grep [Y]ENQ
    > [4]swips2:/usr/users/jamesb #
    >
    > I think this means that the utmp file is incorrect


    Define "correct". Login sessions log in utmp. Login
    sessions that exit gracefully log in utmp. Login
    sessions that are killed ungracefully do not log in
    utmp.

    The most typical way these ghosts are created is someone
    exiting their windowing session without exiting their
    login sessions first. It used to happen under various
    X11 window managers but eventually they switched to
    more graceful kill methods. It still happens when
    folks exit their Windows login while a Unix window is
    open.

    > I cannot reboot
    > this machine (it's a 24/7 service and this is only a minor irritation),
    > so is there a way to refresh/rebuild utmp or fix this another way ?


    First thing, you already know this is an issue so modify
    your script. If there are no processes, move on to the
    next user.

    Next thing, if there are plenty of logins on the host
    they eventually clean up on their own. Each login
    session takes an unused pty, and when there are enough
    logins to reach the pty with the ghost, the ghost goes
    away.

    Last thing, if you really want to clean up utmp, there
    are various programs on various freeware sites. Look
    for "fix utmp" and so on on your favorite freeware
    site.
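
A minimal sketch of the first suggestion above (skip any user whose tty has no processes), in POSIX shell. The function name drop_ghost_sessions and the file of live tty names are hypothetical, and the ps invocation mentioned in the comment is an assumption that should be checked against the ps on the system in question before relying on it.

```shell
#!/bin/sh
# Print only the w(1)-style lines whose tty still owns at least one process.
#   $1    : file listing the tty names that currently have processes, one per
#           line (hypothetical input; on a live system it might be built with
#           something like: ps -e -o tty= | awk '{print $1}' | sort -u)
#   stdin : session lines with the user in column 1 and the tty in column 2
drop_ghost_sessions() {
    live_ttys=$1
    while read -r user tty rest; do
        # -x: whole-line match, -F: fixed string, -q: no output
        if grep -qxF -- "$tty" "$live_ttys"; then
            printf '%s %s %s\n' "$user" "$tty" "$rest"
        fi
    done
}
```

A monitoring script could pipe its filtered w output through this function, so that an entry such as the pts/162 one is dropped when no process is attached to that tty.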


  3. Re: Tru64 w command reports an idle user which doesn't exist ? Is utmp corrupt ?

    Thanks Doug,

    > Define "correct". Login sessions log in utmp. Login
    > sessions that exit gracefully log in utmp. Login
    > sessions that are killed ungracefully do not log in
    > utmp.


    Thanks, I didn't realise this. This script has been running for 2
    years and this is the first time this has occurred, so I can only assume
    Wintegrate/Powerterm (the common clients here) are reasonably good at
    exiting cleanly even when the window is closed.

    > First thing, you already know this is an issue so modify
    > your script. If there are no processes, move on to the
    > next user.


    Thanks, I will do this. I might even drop the w | grep days altogether
    and use STIME on a ps -ef, something like:

    ps -ef | grep "`date +'%b'`" | egrep -v "`date +'%b %e'`"
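
The same idea can be wrapped in a small function so the date strings are passed in and the filter can be tried on saved ps output. This is only a sketch: the function name started_before_today is hypothetical, and it assumes that ps prints a start date as a month abbreviation followed by a day for processes started before today. It also misses anything started in a previous month, and a month abbreviation appearing elsewhere on a line will match too.

```shell
#!/bin/sh
# Keep ps -ef lines for processes started this month but not today.
#   $1    : month abbreviation, e.g. "Mar"   (defaults to date +%b)
#   $2    : today's date as ps would print it (defaults to date '+%b %e')
#   stdin : ps -ef output
started_before_today() {
    month=${1:-$(date +%b)}
    today=${2:-$(date '+%b %e')}
    grep -- "$month" | grep -v -- "$today"
}
```

On a live system this would be used as ps -ef | started_before_today, after checking that the default date format matches what ps prints in its STIME column.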

    > Next thing, if there are plenty of logins on the host
    > they eventually clean up on their own. Each login
    > session takes an unused pty, and when there are enough
    > logins to reach the pty with the ghost, the ghost goes
    > away.


    I don't understand why this didn't happen then, as we have several
    hundred logins a day, so surely it should have been re-used. From
    midnight to 9am today we have already had 200 logins, and the busy
    time starts at 9am, so in 11 days we should have had several thousand
    logins ?

    [4]swips2:/ # date
    Fri Mar 18 08:47:16 GMT 2005
    [4]swips2:/ # last | tail -1
    wtmp begins Fri Mar 18 00:02
    [4]swips2:/ # last | grep -v ftp | wc
    197 1956 14481

    I wonder if the pty is not properly returned to 'free list' in this
    'unclean exit' case, or this would have been cleaned up in 11 days I
    think.

    > Last thing, if you really want to clean up utmp, there
    > are various programs on various freeware sites. Look
    > for "fix utmp" and so on on your favorite freeware
    > site.


    Thanks, but user accounting information is not too critical, so once I
    was sure this was just an 'incorrect' utmp file I flushed it with
    logclean.

    All users are kicked out for a nightly 2am backup anyway, so I can
    easily check for any 'idle' sessions manually from before then which
    stayed up, and the accounting info will be correct from now on (till
    the next time).

    Thanks for the response though, all very useful info !

    James.

  4. Re: Tru64 w command reports an idle user which doesn't exist ? Is utmp corrupt ?

    James Blackmore wrote:
    > Doug Freyubrger wrote:
    >
    > > Login sessions log in utmp. Login
    > > sessions that exit gracefully log in utmp. Login
    > > sessions that are killed ungracefully do not log in
    > > utmp.

    >
    > Thanks, I didn't realise this. This script has been running for 2
    > years and this is the first time this has occurred, so I can only assume
    > Wintegrate/Powerterm (the common clients here) are reasonably good at
    > exiting cleanly even when the window is closed.


    Sounds like it. If you're using W2K, most of the
    time folks will log out, and it appears that is
    being handled gracefully. It looks like someone
    powered off without logging out or some sort of
    application crash happened.

    > > Next thing, if there are plenty of logins on the host
    > > they eventually clean up on their own. Each login
    > > session takes an unused pty, and when there are enough
    > > logins to reach the pty with the ghost, the ghost goes
    > > away.

    >
    > I don't understand why this didn't happen then, as we have several
    > hundred logins a day, so surely it should have been re-used. From
    > midnight to 9am today we have already had 200 logins, and the busy
    > time starts at 9am, so in 11 days we should have had several thousand
    > logins ?


    It isn't quite just the number of logins that
    determines pty recycling. Each session tends
    to use the lowest available numbered pty,
    though occasionally a race condition will have
    a session skip a couple. So what really counts
    for reclaiming these ghosts is the peak number
    of sessions, not the raw number.

    > I wonder if the pty is not properly returned to 'free list' in this
    > 'unclean exit' case, or this would have been cleaned up in 11 days I
    > think.


    It's just a missing entry in utmp and ownerships
    of the device pair in /dev. Not all that much
    to the clean-up involved. A process no longer
    has the device open so a scan will show it
    available.

    So what I think happened: your app is usually good
    about exiting gracefully, so the logout gets logged in
    utmp. On this occasion there was an application
    crash, or a kill -9 rather than -15, or a power-off
    without logout, or similar. It happened to be a
    session with a high pty number because the login
    happened during a monthly peak.
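
The "lowest available pty" behaviour described above can be illustrated with a toy model in POSIX shell. This is not how the kernel allocates ptys; next_pty is a hypothetical helper that only shows why the peak number of concurrent sessions, not the total number of logins, decides whether a stale entry on a high-numbered pty gets overwritten.

```shell
#!/bin/sh
# Toy model: print the lowest pty number that is not currently in use.
#   $1 : space-separated list of pty numbers held open by live sessions.
#        A ghost is not in this list, because no process has its device
#        open; only its stale utmp entry remains.
next_pty() {
    n=0
    while :; do
        case " $1 " in
            *" $n "*) n=$((n + 1)) ;;        # busy, try the next number
            *) echo "$n"; return 0 ;;        # first free number
        esac
    done
}
```

In this model, with ptys 0 to 99 busy the next login gets 100, so a ghost recorded on pts/162 is left alone however many logins come and go. Only when 0 to 161 are all busy at once does a new login land on 162 and replace the stale entry.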

    > > Last thing, if you really want to clean up utmp, there
    > > are various programs on various freeware sites. Look
    > > for "fix utmp" and so on on your favorite freeware
    > > site.

    >
    > Thanks, but user accounting information is not too critical, so once I
    > was sure this was just an 'incorrect' utmp file I flushed it with
    > logclean.


    Yup. Logclean is just fine for utmp clean-up.
    As long as there aren't any processes you know
    it is really available.

