maintenance downtime on our supercomputer - VMS



Thread: maintenance downtime on our supercomputer

  1. maintenance downtime on our supercomputer

    Below is a message from the administrator of our supercomputer, www.bris.ac.uk.
    Clearly, VMS cluster rolling-update features would be very useful here,
    although I understand that a full cluster shutdown is also necessary in
    some cases.

    [...]

    You have received this email because you have an account on bluecrystal,
    a beowulf cluster managed by the Advanced Computing Research Centre,
    Bristol University.

    Greetings,

    It is necessary to schedule some maintenance downtime on bluecrystal
    to update the parallel filesystem software, amongst other things.
    The system will be shut down on Wednesday 26th September (obviously
    all running jobs will be killed when this happens). Hopefully this
    work will be completed on the same day.

    We apologise for any inconvenience caused by this essential
    maintenance work.

    [...]

    --
    Anton Shterenlikht
    Room 2.6, Queen's Building
    Mech Eng Dept
    Bristol University
    University Walk, Bristol BS8 1TR, UK
    Tel: +44 (0)117 928 8233
    Fax: +44 (0)117 929 4423

  2. Re: maintenance downtime on our supercomputer

    On 09/13/07 10:31, Anton Shterenlikht wrote:
    > Below is a message from the administrator of our supercomputer, www.bris.ac.uk.
    > Clearly, VMS cluster rolling update features would be very useful here.
    > Although I understand that cluster shutdown is also necessary in some
    > cases.
    >
    > [...]
    >
    > You have received this email because you have an account on bluecrystal,
    > a beowulf cluster managed by the Advanced Computing Research Centre,
    > Bristol University.
    >
    > Greetings,
    >
    > It is necessary to schedule some maintenance downtime on bluecrystal
    > to update the parallel filesystem software, amongst other things.


    How do Beowulf clusters sync file accesses? Thru the master node?

    > The system will be shutdown on Wednesday 26th September (obviously
    > all running jobs will be killed when this happens). Hopefully this
    > work will be completed on the same day.
    >
    > We apologise for any inconvenience caused by this essential
    > maintenance work.


    --
    Ron Johnson, Jr.
    Jefferson LA USA

    Give a man a fish, and he eats for a day.
    Hit him with a fish, and he goes away for good!

  3. Re: maintenance downtime on our supercomputer

    Anton Shterenlikht wrote:
    > Below is a message from the administrator of our supercomputer, www.bris.ac.uk.


    Oh, that is one system Mr Vaxman probably wouldn't want to get near :-)
    :-) :-) :-) :-) :-)

  4. Re: maintenance downtime on our supercomputer

    On Thu, Sep 13, 2007 at 11:43:31AM -0500, Ron Johnson wrote:
    > On 09/13/07 10:31, Anton Shterenlikht wrote:
    > >
    > > It is necessary to schedule some maintenance downtime on bluecrystal
    > > to update the parallel filesystem software, amongst other things.

    >
    > How do Beowulf clusters sync file accesses? Thru the master node?


    You mean several processes accessing the same file? As far as I
    understand, that is not easy; or rather, it is discouraged because of
    the significant overhead. Therefore one typically has a separate file
    for each copy of the program (for each core). In cases where
    synchronised file access is required, it will probably be handled by
    the master node.
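    As a toy illustration of the one-file-per-copy idea (all the names and
    paths here are made up, not taken from any real cluster setup), each
    process can derive a private file name from its own rank number, as MPI
    programs often do:

```python
import os

def output_path(base_dir, rank):
    # Each copy of the program (rank) gets its own file, so no two
    # processes ever need synchronised access to the same one,
    # e.g. rank 3 -> base_dir/chunk_0003.dat
    return os.path.join(base_dir, "chunk_%04d.dat" % rank)

if __name__ == "__main__":
    # In a real run, each of the N processes would open only its own path.
    for rank in range(4):
        print(output_path("/tmp/run", rank))
```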

    One example: a parallel matrix operation. At the start, the master
    node reads the matrix data from a file and splits it into N chunks
    according to the number of cores used in the analysis. The code
    executed on each node might, if necessary, create a temporary file
    used exclusively by that node to store intermediate data. (Obviously
    it is best to keep all the data in RAM, or better still in cache, if
    it fits.) When the computation is complete, all the slave nodes pass
    their data back to the master, which reassembles the matrix and
    writes it to the file.

    However, I might be wrong; you probably know this area better than I do.
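    For what it's worth, the split/compute/combine pattern above might be
    sketched like this in Python. This is only a single-machine stand-in
    using multiprocessing (a real Beowulf code would use MPI across nodes),
    and all names and the trivial "computation" are illustrative:

```python
from multiprocessing import Pool

def work_on_chunk(rows):
    # Stand-in for the per-node computation, here just doubling each entry.
    return [[2 * x for x in row] for row in rows]

def split(matrix, n):
    # The master splits the matrix into n roughly equal chunks of rows.
    k, r = divmod(len(matrix), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(matrix[start:end])
        start = end
    return chunks

def parallel_scale(matrix, n_workers=4):
    chunks = split(matrix, n_workers)              # master scatters the data
    with Pool(n_workers) as pool:
        results = pool.map(work_on_chunk, chunks)  # "slave" processes compute
    # Master gathers the chunks and reassembles the matrix.
    return [row for chunk in results for row in chunk]

if __name__ == "__main__":
    m = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
    print(parallel_scale(m))
    # -> [[2, 4], [6, 8], [10, 12], [14, 16], [18, 20]]
```

    Only the master ever touches the input and output files here; the
    workers see nothing but their own chunk, which is the point.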

    --
    Anton Shterenlikht
    Room 2.6, Queen's Building
    Mech Eng Dept
    Bristol University
    University Walk, Bristol BS8 1TR, UK
    Tel: +44 (0)117 928 8233
    Fax: +44 (0)117 929 4423
