Terrible performance with very large /etc/group file - Aix

  1. Terrible performance with very large /etc/group file

    Hi

    I'm implementing a clustered AIX 5.3 ML04 server system to host an
    Oracle RAC application. The Oracle application is quite old and uses
    external (operating system) authentication for the application user
    accounts.
    This means that all of our users will require an AIX password and
    associated group entries. The system will have approximately 10000
    users, all of whom will need to be members of core AIX groups.

    Being aware of the limits of the /etc/group file (maximum characters
    per line, etc.), I have implemented, with the approval of IBM
    consulting, a solution that I have seen referenced previously in the
    newsgroups.

    Each user is, or may be, a member of 10 groups.
    The application makes a logical separation of the users into 500
    "sites". I have used this logical separation to create 500 entries in
    the /etc/group file for each of the 10 groups every user may need to
    belong to. This amounts to 5000 individual group entries in the group
    file.
    However, for each of the 10 groups in each logical "site", I have
    amended the group name to make it unique, but have MAINTAINED the
    GID (see snippet).

    -----------------------------------------------------------------------------
    system:!:0:root,mailgway,sdagent,csmftapi,oracle,gifted,ss3
    staff:!:1:oracle,sfa,unicom,netinst,pwrchute,gifted,ss3
    bin:!:2:root,bin
    sys:!:3:root,bin,sys
    adm:!:4:bin,adm
    uucp:!:5:uucp
    mail:!:6:
    security:!:7:root,gifted,ss3
    cron:!:8:root,gifted,ss3
    printq:!:9:gifted,ss3
    audit:!:10:root,gifted,ss3
    shutdown:!:21:gifted,ss3
    ecs:!:28:
    nobody:!:4294967294:nobody,lpd
    usr:!:100:
    backup:!:11:oracle,gifted,ss3
    unicom:!:200:unicom,root,gifted,ss3
    sfa:!:201:sfa
    dba:!:202:oracle,root,gifted,ss3
    opadmin:!:204:opspool,icl9dlm,icl9dls,public,opadmin,gifted,ss3
    iclmail:!:205:iclmail
    icl9sch:!:206:icl9sch
    other:!:207:
    lp:!:12:gifted,ss3
    sshd:!:13:sshd

    staff-100901:!:1:smcadmin
    printq-100901:!:9:smcadmin
    lp-100901:!:12:
    system-100901:!:0:smcadmin
    security-100901:!:7:smcadmin
    cron-100901:!:8:smcadmin
    audit-100901:!:10:smcadmin
    shutdown-100901:!:21:smcadmin
    backup-100901:!:11:smcadmin
    unicom-100901:!:200:smcadmin
    dba-100901:!:202:smcadmin
    sfa-100901:!:201:smcadmin
    opadmin-100901:!:204:smcadmin

    staff-200101:!:1:
    printq-200101:!:9:
    lp-200101:!:12:
    system-200101:!:0:
    security-200101:!:7:
    cron-200101:!:8:
    audit-200101:!:10:
    shutdown-200101:!:21:
    backup-200101:!:11:
    unicom-200101:!:200:
    dba-200101:!:202:
    sfa-200101:!:201:
    opadmin-200101:!:204:

    staff-200102:!:1:
    printq-200102:!:9:
    lp-200102:!:12:
    system-200102:!:0:
    security-200102:!:7:
    cron-200102:!:8:
    audit-200102:!:10:
    shutdown-200102:!:21:
    backup-200102:!:11:
    unicom-200102:!:200:
    dba-200102:!:202:
    sfa-200102:!:201:
    opadmin-200102:!:204:

    etc
    -----------------------------------------------------------
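    A minimal sketch of how per-site entries like those in the snippet
    could be generated, assuming the site IDs (such as 100901 or 200101)
    are known up front. BASE_GROUPS copies the names and GIDs from the
    snippet; site_entries() is a hypothetical helper that only produces
    text lines, it is not an AIX tool and does not call mkgroup:

```python
# Hypothetical generator for per-site /etc/group lines that reuse the GID
# of the matching "master" group. Names and GIDs are copied from the
# snippet; site IDs are illustrative.
BASE_GROUPS = {
    "staff": 1, "printq": 9, "lp": 12, "system": 0, "security": 7,
    "cron": 8, "audit": 10, "shutdown": 21, "backup": 11,
    "unicom": 200, "dba": 202, "sfa": 201, "opadmin": 204,
}


def site_entries(site_id, members=None):
    """Return one 'name-site:!:gid:members' line per base group.

    members optionally maps a base group name to a list of user names;
    groups missing from the mapping get an empty member list.
    """
    members = members or {}
    return [
        f"{name}-{site_id}:!:{gid}:{','.join(members.get(name, []))}"
        for name, gid in BASE_GROUPS.items()
    ]


if __name__ == "__main__":
    for line in site_entries("200101"):
        print(line)
```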

    When adding users to the system, we will add them to the relevant
    group line for their "site". In practice, however, the GID will
    equate to the "MASTER" GID for that system-wide group.
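    The intended effect can be sketched as follows, assuming, as the post
    describes, that every line carrying a given GID contributes its
    members to that one system-wide group. members_by_gid() is a
    hypothetical helper that models this intent; it is not a description
    of how AIX's own lookup routines work internally:

```python
from collections import defaultdict


def members_by_gid(group_lines):
    """Union the member lists of all /etc/group lines that share a GID."""
    merged = defaultdict(set)
    for raw in group_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        # Fields: name:password:gid:comma-separated members
        _name, _password, gid, members = line.split(":", 3)
        merged[int(gid)].update(m for m in members.split(",") if m)
    return dict(merged)


sample = [
    "staff:!:1:oracle,sfa",
    "staff-100901:!:1:smcadmin",
    "dba-100901:!:202:smcadmin",
]
print(members_by_gid(sample)[1])  # set order may vary
```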

    The benefit of this is that it allows many more users to be group
    members, avoiding the 2048-character limit on each group line.
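    One way to confirm that no single line is approaching the limit is to
    measure every line. A minimal sketch, taking the 2048-character figure
    from the post as an assumption (check the documented limit for your
    AIX level); overlong_lines() is a hypothetical helper:

```python
LINE_LIMIT = 2048  # per-line limit as quoted in the post; an assumption here


def overlong_lines(group_lines, limit=LINE_LIMIT):
    """Return (line_number, length) for each line longer than limit."""
    result = []
    for lineno, raw in enumerate(group_lines, start=1):
        length = len(raw.rstrip("\n"))
        if length > limit:
            result.append((lineno, length))
    return result


if __name__ == "__main__":
    # A single "master" line holding 1000 six-character user names, versus
    # a short line: only the first exceeds the limit.
    big = "staff:!:1:" + ",".join(f"u{i:05d}" for i in range(1000))
    print(overlong_lines([big, "bin:!:2:root,bin"]))  # prints [(1, 7009)]
```

    Spreading those users across per-site lines keeps each line short,
    which is the point of the scheme above.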

    Here's the problem.

    During development, we've been working happily with 32 logical "sites"
    and groups set up as above, with no obvious issues.
    We have now implemented the full 500 "sites" and have immediately hit
    a performance problem.

    ((( It might be worth mentioning that the group file actually lives
    on a GPFS shared filesystem and is symbolically linked to /etc/group
    on each of the clustered servers.)))

    On server startup, telnetd takes ages to start. Overall CPU usage goes
    up to 100%, with an average run queue of 90+!

    Telnet never actually starts properly, so I can only run commands from
    the console, and something like "id user" takes 2 minutes to return
    the correct answer.

    When I truss "id", it looks like the group file is being searched
    sequentially. Is this what is taking the time?
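    To see why a sequential search hurts at this size, here is a minimal
    sketch of what a flat-file lookup has to do: every line must be read
    to find all groups naming the user, so each call costs time
    proportional to the file length (5000+ lines here).
    groups_for_user_linear() is hypothetical and only models the access
    pattern seen in truss, not the actual system library code:

```python
def groups_for_user_linear(group_lines, user):
    """Return (gids, lines_read) after scanning every line of the file."""
    gids = set()
    lines_read = 0
    for raw in group_lines:
        lines_read += 1  # a flat file offers no way to skip ahead
        line = raw.strip()
        if not line:
            continue
        _name, _password, gid, members = line.split(":", 3)
        if user in members.split(","):
            gids.add(int(gid))
    return gids, lines_read


if __name__ == "__main__":
    # 5000 per-site entries plus one line that actually names the user.
    lines = [f"site{i}:!:{1000 + i}:" for i in range(5000)]
    lines.append("staff-100901:!:1:smcadmin")
    print(groups_for_user_linear(lines, "smcadmin"))  # prints ({1}, 5001)
```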

    How can I apply an index to the group file?
    Could there be locking issues?

    Any suggestions welcome!

    Help!

    Rob


  2. Re: Terrible performance with very large /etc/group file

    I have improved performance by indexing the group file
    using:

    mkpasswd -f

    This creates a set of indexes for passwd, group, lastlog and other
    security related files.
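    Conceptually, an index trades one full pass over the file for cheap
    lookups afterwards. The sketch below is only an analogy using an
    in-memory dictionary; the index files that mkpasswd -f writes have
    their own on-disk format, which is not reproduced here.
    build_member_index() is a hypothetical helper:

```python
from collections import defaultdict


def build_member_index(group_lines):
    """Single pass: map each user name to the set of GIDs that list it."""
    index = defaultdict(set)
    for raw in group_lines:
        line = raw.strip()
        if not line:
            continue
        _name, _password, gid, members = line.split(":", 3)
        for user in members.split(","):
            if user:
                index[user].add(int(gid))
    return index


if __name__ == "__main__":
    idx = build_member_index([
        "staff:!:1:oracle,sfa",
        "dba:!:202:oracle,root",
        "staff-100901:!:1:smcadmin",
    ])
    # Each lookup is now a dictionary access instead of a file scan.
    print(sorted(idx["oracle"]))  # prints [1, 202]
```

    Like any derived index, this one goes stale if the flat file is edited
    by hand afterwards, so it would need to be rebuilt.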

    I hadn't come across this before, and because I was searching for
    group issues, the mkpasswd command didn't show up in my searches.

    Anyway, it improved the response time of "id smcadmin" from 1 minute
    48 seconds to 1.5 seconds.

    Cheers

    Rob



  3. Re: Terrible performance with very large /etc/group file

    openstream rob wrote:
    > Hi
    >
    > I'm implementing a clustered server AIX 5.3ML04 system to host an
    > Oracle RAC application. The oracle application is quite old and uses
    > external (operating system) authentication for the application user
    > accounts.
    > What this amounts to is that all of our users will require an AIX
    > password and associated group entries. The system will have
    > approximately 10000 users, which will all need to be members of core
    > AIX groups.
    > [SNIP]


    Hi Rob!

    I know you have solved your problem with mkpasswd -f, but I have a
    question: why don't you use the built-in LDAP server to maintain your
    users and groups? You can cluster LDAP, and it uses DB2 as its
    backend, so it could be fast.

    I am asking because I would like to implement three LDAP servers with
    peer-to-peer replication in a small environment (about 50 users and
    20 groups). Have you identified any showstoppers?

    With best Regards
    Dieter



  4. Re: Terrible performance with very large /etc/group file

    Hi Dieter

    I have not evaluated the use of LDAP at this stage. Our project is an
    evolution of a distributed system of 500 servers where the use of
    simple password files has been fine. To avoid recoding the system
    administration elements (and other significant dependencies), we are
    going to release our initial build using the same mechanism. We did
    evaluate Oracle SSO with Kerberos, but our client's infrastructure
    suppliers were unable to provide the environment required to make
    this a success.

    For future releases, we will need to investigate other options, such
    as LDAP, particularly if performance is severely impacted.

    Regards

    Rob


  5. Re: Terrible performance with very large /etc/group file

    openstream rob wrote:
    > Hi Dieter
    >
    > I have not evaluated the use of LDAP at this stage. [SNIP]

    Thx for the explanation!
    Dieter

  6. Re: Terrible performance with very large /etc/group file

    openstream rob wrote:
    > I have improved the performance issues by indexing the group file
    > using:
    >
    > mkpasswd -f
    >
    > This creates a set of indexes for passwd, group, lastlog and other
    > security related files.


    FYI, I believe I've run into a performance regression in the lastlog
    index handling related to 5.3, FTP and a large number of accounts
    (20k+). Basically, with FTP sessions, the per-session I/O to the
    lastlog index file (/etc/security/lastlog.idx, IIRC) is MASSIVE, to
    the point of saturating the boot disks. The I/O has been orders of
    magnitude higher than with either 5.3 without indexing or 5.2 with
    indexing; timings were roughly 8 seconds under 5.2, ~50 seconds for
    5.3 without indexing, and ~2 MINUTES for 5.3 with indexing, using
    identical testing across the platforms. I'm currently working the
    issue with IBM, but it's going slowly, and it's blocking our OS update
    to 5.3 on a particular host.

    Just so you know, you may not want to count on mkpasswd to speed up
    your auth info handling (at least at present, though I think something
    is seriously broken).

    -r

  7. Re: Terrible performance with very large /etc/group file

    On Sep 7, 12:34 am, no body wrote:
    > openstream rob wrote:
    > > I have improved the performance issues by indexing the group file
    > > using:
    > >
    > > mkpasswd -f
    >
    > fyi.. i believe i've run into a performance regression with the lastlog
    > index handling related to 5.3, ftp and a large number of accounts
    > (20k+). [SNIP]


    Due to massive numbers of FTP logins, we had login problems because
    /var/adm/wtmp was getting pretty large.

    Might be worth a look.

    regards
    Hajo


  8. Re: Terrible performance with very large /etc/group file

    Hajo Ehlers wrote:
    > Due to massive ftp logins we had problems with logins because the
    > /var/adm/wtmp was getting pretty large.
    >
    > Might worth a look.


    We got past those kinds of problems a while ago; we're managing the
    wtmp file daily to prevent that. filemon was showing the lastlog index
    taking the brunt: the I/O count and volume to the / filesystem for 5.2
    with indexing and for 5.3 without indexing were far less than for 5.3
    with indexing, on the order of 100x to 1000x less.
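    The post does not say how the daily wtmp management is done. As a
    hedged sketch only, assuming a simple copy-then-truncate scheme,
    rotate_if_large() is a hypothetical helper that keeps one previous
    copy. It treats the file as opaque bytes, and any records written
    between the copy and the truncate would be lost:

```python
import os
import shutil


def rotate_if_large(path, max_bytes):
    """Copy path to path + '.old' and truncate it if it exceeds max_bytes.

    Returns True if a rotation happened, False otherwise.
    """
    if os.path.getsize(path) <= max_bytes:
        return False
    # Keep one previous generation, preserving timestamps and mode bits.
    shutil.copy2(path, path + ".old")
    # Truncate in place so the file keeps its inode, owner and permissions.
    with open(path, "r+b") as f:
        f.truncate(0)
    return True


if __name__ == "__main__":
    # Hypothetical daily job; the 50 MB threshold is illustrative only.
    print(rotate_if_large("/var/adm/wtmp", 50 * 1024 * 1024))
```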

    -r

  9. Re: Terrible performance with very large /etc/group file

    You can configure AIX to use LDAP, and then you DON'T need to change
    or reconfigure your apps. I've set up AIX servers as LDAP clients and
    as NIS clients myself, and everything works.

