is this a race condition? - VMS

  1. is this a race condition?

    23 years ago, fresh out of university, my first job was maintaining
    FORTRAN on a VAX.
    Last month I started a new job, and I am . . . maintaining FORTRAN on
    a VAX. (sigh)
    The operating-system part is nice, the language is not. Check this
    out:

    SUBROUTINE FOREVER

    IMPLICIT NONE

    INTEGER*4 I
    INTEGER*4 STATUS
    INTEGER*2 IOSB(4)
    INTEGER*2 READCHAN
    INTEGER*2 WRITECHAN
    LOGICAL*1 MORE
    BYTE READBUFFER(64)
    BYTE WRITEBUFFER(32)

    STATUS = SYS$CREMBX(,READCHAN,%VAL(64),%VAL(128),,,'MYMBX')
    IF (.NOT. STATUS) THEN LIB$SIGNAL(%VAL(STATUS), %VAL(0))
    STATUS = SYS$ASSIGN('YOURMBX',WRITECHAN)
    IF (.NOT. STATUS) THEN LIB$SIGNAL(%VAL(STATUS), %VAL(0))

    WHILE (.TRUE.)
      STATUS = SYS$QIOW(,%VAL(READCHAN),%VAL(IO$_READVBLK),IOSB,,,
                        %REF(READBUFFER),%VAL(64),,,,)
      IF (STATUS) THEN
        IF (IOSB(1)) THEN
          MORE = .TRUE.
          WHILE (MORE)
            FOR I=1,READBUFFER(4)
              IF (something) THEN MORE = .FALSE.
            END
          END
          STATUS = SYS$QIO(,%VAL(WRITECHAN),
                           %VAL(IO$_WRITEVBLK .OR. IO$M_NOW),IOSB,,,
                           %REF(WRITEBUFFER),%VAL(32),,,,)
          IF (.NOT. STATUS) THEN LIB$SIGNAL(%VAL(STATUS), %VAL(0))
        ENDIF
      ENDIF
    END


    This particular bit of code went into an infinite loop today.
    I admit I've abstracted a lot of specifics but the gist remains,
    so please don't wonder about the nested WHILE and FOR
    that seem to be pointless, they're actually fine. The crucial
    point is that the inner test is assumed to be guaranteed ---
    i.e., eventually the "something" will be true, causing the WHILE
    to exit. Well, today it didn't, and my main suspicion is a
    possible race condition resulting from two perfectly valid I/Os.
    Does anyone else see it?

    ok
    dpm


  2. Re: is this a race condition?

    On Aug 22, 9:01 pm, "dpm_goo...@myths.com"
    wrote:
    > 23 years ago, fresh out of university, my first job was maintaining
    > FORTRAN on a VAX.
    > Last month I started a new job, and I am . . . maintaining FORTRAN on
    > a VAX. (sigh)
    > The operating-system part is nice, the language is not. Check this
    > out:
    > [...]


    dpm,

    I would not consider the language an issue. The features that attract
    my eye on first glance would be essentially the same in any language
    (down to and including MACRO-32).

    In addition to the QIO/QIOW problem cited earlier in this thread, I
    would be concerned about the use of a default event flag. In
    particular, WAIT operations are on an event flag, not anything else.
    If two things use the event flag, then they will both release. In this
    case, I wonder if the QIO is resulting in the event flag being tripped
    at an odd moment.

    Other things may have been deleted from the code during the
    abstraction process, but that does look suspicious to my eye (fair
    warning: I am writing this following a 16+hour day).

    - Bob Gezelter, http://www.rlgsc.com


  3. Re: is this a race condition?

    On Aug 22, 9:55 pm, Bob Gezelter wrote:

    > I would not consider the language an issue. The features that attract
    > my eye on first glance would be essentially the same in any language
    > (down to and including MACRO-32).


    I will not rant about C being better than FORTRAN.
    I will not rant about C being better than FORTRAN.
    I will not rant about C being better than FORTRAN.

  4. Re: is this a race condition?

    On Aug 22, 10:04 pm, "dpm_goo...@myths.com"
    wrote:
    > On Aug 22, 9:55 pm, Bob Gezelter wrote:
    >
    > > I would not consider the language an issue. The features that attract
    > > my eye on first glance would be essentially the same in any language
    > > (down to and including MACRO-32).

    >
    > I will not rant about C being better than FORTRAN.
    > I will not rant about C being better than FORTRAN.
    > I will not rant about C being better than FORTRAN.
    > .
    > .
    > .
    >
    > > In addition to the QIO/QIOW problem cited earlier in this thread, I
    > > would be concerned about the use of a default event flag. In
    > > particular, WAIT operations are on an event flag, not anything else.
    > > If two things use the event flag, then they will both release. In this
    > > case, I wonder if the QIO is resulting in the event flag being tripped
    > > at an odd moment.

    >
    > That is _exactly_ what I'm thinking. Both operations are using the
    > same event flag (zero) and the same area of memory as their I/O
    > status block, so if the WRITEVBLK happens to delay completing until
    > the READVBLK has been queued, the QIO() will cause the QIOW() to
    > return even though no read has actually occurred. At that point the
    > data in the read-buffer being used for loop control will be zero,
    > meaning the FOR loop is never entered, causing MORE to remain
    > .TRUE. . . . voila! infinite loop.
    >
    > It's easy enough to fix, of course, if that's the real problem. I
    > can't think of a way to test it, though.
    >
    > Thanks for the help, Bob and Steven.
    >
    > ok
    > dpm


    In addition... in the code as posted, if the status from the first QIO
    is ever false (bad channel, quota), it will also loop forever, because
    the WHILE (.TRUE.) condition never changes.

    Of course, that could easily be a consequence of trimming the code too
    aggressively for posting here and leaving out a signal on the ELSE
    branch.

    fwiw,
    Hein.



  5. Re: is this a race condition?

    Using no event flag (= event flag 0) for QIO + QIOW is suspect. Try with
    event flag EFN$C_ENF.

    regards
    Walter

    wrote in message
    news:1187830870.915352.211590@e9g2000prf.googlegroups.com...
    > 23 years ago, fresh out of university, my first job was maintaining
    > FORTRAN on a VAX.
    > Last month I started a new job, and I am . . . maintaining FORTRAN on
    > a VAX. (sigh)
    > The operating-system part is nice, the language is not. Check this
    > out:
    > [...]




  6. Re: is this a race condition?

    On Aug 23, 12:09 am, Hein RMS van den Heuvel
    wrote:

    > In addition... in the code as posted, if status from the first qio is
    > ever false (bad channel, quota) then it will also loop as true remains
    > true.


    Actually that loop is supposed to be infinite --- the process is
    a daemon, running until halted via $ STOP /EXIT (thus the name of
    the subroutine, "FOREVER").

    But yes, the code certainly needs to be beefed up with regard to
    error reporting. As with most code that's been around for over
    a decade, some parts are good and others are horrible. Here's
    one of the latter:

    IF (.NOT. IOSB(1)) THEN
    TYPE *, 'READ TIMED OUT!'
    GOTO 100
    ENDIF

    Sigh.

    ok
    dpm


  7. Re: is this a race condition?

    On Aug 23, 3:44 am, "Walter Kuhn" wrote:
    > Using no event flag (= event flag 0) for QIO + QIOW is suspect.


    Absolutely. In fact, I use stronger words ;-)

    > Try with event flag EFN$C_ENF.


    Or, on older systems, LIB$GET_EF() and LIB$FREE_EF().
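
    For illustration, here is a minimal sketch of what separate event
    flags and separate I/O status blocks might look like. It assumes
    OpenVMS Fortran with the FORSYSDEF include modules available; the
    subroutine name ONECYCLE and the variables READEFN, WRITEEFN,
    READIOSB, and WRITEIOSB are made up for this example and are not
    from the original program.

    ```fortran
          SUBROUTINE ONECYCLE(READCHAN, WRITECHAN)
    C     Hypothetical sketch: one read/write cycle with a private event
    C     flag and a private IOSB for each I/O. Assumes OpenVMS Fortran.
          IMPLICIT NONE
          INCLUDE '($IODEF)'
          INCLUDE '($SYSSRVNAM)'
          INCLUDE '(LIB$ROUTINES)'
          INTEGER*2 READCHAN, WRITECHAN
          INTEGER*4 STATUS, READEFN, WRITEEFN
          INTEGER*2 READIOSB(4), WRITEIOSB(4)
          BYTE READBUFFER(64), WRITEBUFFER(32)
          VOLATILE WRITEIOSB

    C     Reserve two local event flags so neither I/O uses flag 0.
          STATUS = LIB$GET_EF(READEFN)
          IF (.NOT. STATUS) CALL LIB$SIGNAL(%VAL(STATUS))
          STATUS = LIB$GET_EF(WRITEEFN)
          IF (.NOT. STATUS) CALL LIB$SIGNAL(%VAL(STATUS))

    C     Synchronous read on its own flag and status block.
          STATUS = SYS$QIOW(%VAL(READEFN), %VAL(READCHAN),
         1                  %VAL(IO$_READVBLK), READIOSB, , ,
         2                  %REF(READBUFFER), %VAL(64), , , , )
          IF (.NOT. STATUS) CALL LIB$SIGNAL(%VAL(STATUS))

    C     ... examine READBUFFER and fill WRITEBUFFER here ...

    C     Asynchronous write on a different flag and status block.
          STATUS = SYS$QIO(%VAL(WRITEEFN), %VAL(WRITECHAN),
         1                 %VAL(IOR(IO$_WRITEVBLK, IO$M_NOW)),
         2                 WRITEIOSB, , ,
         3                 %REF(WRITEBUFFER), %VAL(32), , , , )
          IF (.NOT. STATUS) CALL LIB$SIGNAL(%VAL(STATUS))

    C     Wait for the write before WRITEIOSB or its flag is reused.
          STATUS = SYS$SYNCH(%VAL(WRITEEFN), WRITEIOSB)
          IF (.NOT. STATUS) CALL LIB$SIGNAL(%VAL(STATUS))

          STATUS = LIB$FREE_EF(READEFN)
          STATUS = LIB$FREE_EF(WRITEEFN)
          RETURN
          END
    ```

    In a long-running daemon the two flags would more likely be
    reserved once at startup and kept for the life of the process,
    rather than allocated and freed on every cycle as in this sketch.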

    ok
    dpm



  8. Re: is this a race condition?

    In article <1187830870.915352.211590@e9g2000prf.googlegroups.com>, "dpm_google@myths.com" writes:
    > 23 years ago, fresh out of university, my first job was maintaining
    > FORTRAN on a VAX.
    > Last month I started a new job, and I am . . . maintaining FORTRAN on
    > a VAX. (sigh)
    > The operating-system part is nice, the language is not. Check this
    > out:
    >


    I see nothing inherently wrong with the two I/Os as written,
    but the logic seems faulty. If the IO$_READVBLK fails to submit
    or fails to complete (STATUS is bad or IOSB(1) is bad) then the
    outer loop will never terminate. Both of these should be terminal
    conditions.

    I'd add an ELSE to both of those tests and call LIB$SIGNAL just
    as the program does if the IO$_WRITEVBLK fails.
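
    A minimal sketch of those two extra checks, written as a separate
    routine so it stands alone. The name CHKREAD and the work variable
    IOSTAT are invented for this example; STATUS and IOSB are the
    values the posted code gets back from the SYS$QIOW read.

    ```fortran
          SUBROUTINE CHKREAD(STATUS, IOSB)
    C     Hypothetical helper: signal a condition if the read failed to
    C     queue or failed to complete. Assumes OpenVMS Fortran.
          IMPLICIT NONE
          INTEGER*4 STATUS, IOSTAT
          INTEGER*2 IOSB(4)
          IF (.NOT. STATUS) THEN
    C       The $QIOW request itself was rejected.
            CALL LIB$SIGNAL(%VAL(STATUS))
          ELSE IF (.NOT. IOSB(1)) THEN
    C       The request was queued but completed with an error. IOSB(1)
    C       holds the low-order word of the condition value.
            IOSTAT = IOSB(1)
            CALL LIB$SIGNAL(%VAL(IOSTAT))
          ENDIF
          RETURN
          END
    ```

    It would be called right after the read, for example as
    CALL CHKREAD(STATUS, IOSB). LIB$STOP could be substituted for
    LIB$SIGNAL if either failure should always end the image.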


  9. Re: is this a race condition?

    In article <1187834140.971153.295890@q4g2000prc.googlegroups.com>, Bob Gezelter writes:
    > On Aug 22, 9:01 pm, "dpm_goo...@myths.com"
    > wrote:
    >> 23 years ago, fresh out of university, my first job was maintaining
    >> FORTRAN on a VAX.
    >> Last month I started a new job, and I am . . . maintaining FORTRAN on
    >> a VAX. (sigh)
    >> The operating-system part is nice, the language is not. Check this
    >> out:
    >> [...]
    >
    > dpm,
    >
    > I would not consider the language an issue. The features that attract
    > my eye on first glance would be essentially the same in any language
    > (down to and including MACRO-32).
    >
    > In addition to the QIO/QIOW problem cited earlier in this thread, I
    > would be concerned about the use of a default event flag. In


    It was my impression that $QIOW is not internally implemented as
    a $QIO followed by a $WAITFR. It's implemented as $QIO followed
    by $SYNCH. So the use of the default event flag is safe (albeit
    not necessarily desirable).

    Wait a bit... Not only are we re-using the event flag. We're re-using
    the IOSB. If that second $QIO completes asynchronously for any reason
    (not high probability given IO$M_NOW) then the $SYNCH on the
    next read will complete spuriously.

    How does IO$M_NOW interact with a mailbox full condition?

  10. Re: is this a race condition?

    Bob Koehler wrote:

    > In article <1187830870.915352.211590@e9g2000prf.googlegroups.com>,
    > "dpm_google@myths.com" writes:
    > > 23 years ago, fresh out of university, my first job was maintaining
    > > FORTRAN on a VAX.
    > > Last month I started a new job, and I am . . . maintaining FORTRAN
    > > on a VAX. (sigh)
    > > The operating-system part is nice, the language is not. Check this
    > > out:
    > >

    >
    > I see nothing inherently wrong with the two I/Os as written,
    > but the logic seems faulty. If the IO$_READVBLK fails to submit
    > or fails to complete (STATUS is bad or IOSB(1) is bad) then the
    > outer loop will never terminate. Both of these should be terminal
    > conditions.
    >
    > I'd add an ELSE to both of those tests and call LIB$SIGNAL just
    > as the program does if the IO$_WRITEVBLK fails.


    Nor was there a check on the IOSB length word to ensure the
    expected/sufficient data was read.

    --
    Cheers - Dave

  11. Re: is this a race condition?

    "dpm_google@myths.com" wrote:
    >
    > 23 years ago, fresh out of university, my first job was maintaining
    > FORTRAN on a VAX. Last month I started a new job, and I am . . .
    > maintaining FORTRAN on a VAX. (sigh)
    >
    > The operating-system part is nice, the language is not. Check this
    > out:
    > [...]


    This is curious code in that the second QIO does not wait for
    completion.

    I wonder if that was intentional. A SYS$SYNCH call before the
    _first_ QIO is entered is in order if there is some reason for
    asynchronous completion [make sure everything is correct for
    the first time through of course], since as others have pointed
    out the IOSB is shared.

    Beware of Fortran optimization, by the way. I don't have a VAX handy
    to try it out, but you will find that Fortran is entitled, unless
    you specify otherwise, to assume that no entity external to the
    current routine will be modifying variables, or at least locally
    defined variables, and will make no particular exception for I/O
    calls.

    The following bit of code, on Alpha at least,

    ! VCHECK.FOR GFC 20070829 Check volatility of ordinary variables
          integer*8 iosb/0/
          call iostrt(iosb)
      100 if ( iosb.eq.0 ) then
             goto 100
          endif
          end

    will go into an infinite loop waiting for the IOSB variable to change
    if the called routine issues an asynchronous I/O call that has not
    completed by the time the loop at 100 is entered.

    Here's the pertinent machine code:

             0000 VCHECK$MAIN::                             ; 000003
    23DEFFE0 0000     LDA    SP, -32(SP)
    221B0028 0004     LDA    R16, 40(R27)
    B75E0008 0008     STQ    R26, 8(SP)
    47E03419 000C     MOV    1, R25
    A75B0030 0010     LDQ    R26, 48(R27)
    B77E0000 0014     STQ    R27, (SP)
    B45E0010 0018     STQ    R2, 16(SP)
    B7BE0018 001C     STQ    FP, 24(SP)
    47FE041D 0020     MOV    SP, FP
    47FB0402 0024     MOV    R27, R2
    A7620038 0028     LDQ    R27, 56(R2)
    6B5A4000 002C     JSR    R26, DFOR$SET_REENTRANCY   ; R26, R26
    A7420040 0030     LDQ    R26, 64(R2)
    A6020020 0034     LDQ    R16, 32(R2)
    A7620048 0038     LDQ    R27, 72(R2)
    47E03419 003C     MOV    1, R25
    6B5A4000 0040     JSR    R26, IOSTRT                ; R26, R26
    A4020020 0044     LDQ    R0, 32(R2)                 ; 000004
    A4000000 0048     LDQ    R0, IOSB                   ; R0, (R0)
    43E003A0 004C     CMPULT R31, R0, R0
             0050 .100:
    E41FFFFF 0050     BEQ    R0, .100

    Again, on VAX, or with appropriate qualifiers at compile time, this
    may not be a problem, but be aware that the VOLATILE declaration exists
    for a reason in current compilers.
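
    As a sketch of the VOLATILE approach, the same test could be
    written as follows. This assumes a compiler that accepts the
    VOLATILE statement; IOSTRT is the external routine from the
    example above and is not shown here.

    ```fortran
          PROGRAM VCHECK
    C     Same test as above, with IOSB declared VOLATILE. IOSTRT is the
    C     external routine from the original example and is not shown.
          INTEGER*8 IOSB
          VOLATILE IOSB
          IOSB = 0
          CALL IOSTRT(IOSB)
    C     VOLATILE asks the compiler to reload IOSB from memory on each
    C     pass rather than reuse a value held in a register.
      100 IF (IOSB .EQ. 0) GOTO 100
          END
    ```

    With the declaration in place, the load of IOSB should appear
    inside the loop in the generated code rather than once before it.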

    --
    George Cornelius cornelius at mayo.edu
    cornelius at eisner.decus.org

  12. Re: is this a race condition?

    In article , koehler@eisner.nospam.encompasserve.org (Bob Koehler) writes:
    > In article <1187830870.915352.211590@e9g2000prf.googlegroups.com>, "dpm_google@myths.com" writes:
    >> 23 years ago, fresh out of university, my first job was maintaining
    >> FORTRAN on a VAX. Last month I started a new job, and I am maintaining
    >> FORTRAN on a VAX (sigh). The operating-system part is nice, the language
    >> is not. Check this out:

    [...]
    >
    > I see nothing inherently wrong with the two I/Os as written,
    > but the logic seems faulty. If the IO$_READVBLK fails to submit
    > or fails to complete (STATUS is bad or IOSB(1) is bad) then the
    > outer loop will never terminate. Both of these should be terminal
    > conditions.
    >
    > I'd add an ELSE to both of those tests and call LIB$SIGNAL just
    > as the program does if the IO$_WRITEVBLK fails.


    Faulty logic seems to be the crux of the matter. That plus the sharing
    of the IOSB and the use of asynchronous completion in a QIO where it
    may not even have been intended.

    I should mention that I posted a comment about Fortran optimization and
    asynchronous I/O from another Usenet account, and while it did seem to
    show up on some news servers it is not present on my usual server, so
    I'll state this once more: asynchronous I/O generally involves passing
    addresses to subroutines that then return while retaining that information
    in order to later modify the associated data. Standard compiler behavior
    is to assume no asynchronous modification of local variables will occur.

    Current compilers include a VOLATILE directive to address this, just as
    in other languages such as C. You either need to use something of that
    sort, or you need a command line switch, perhaps /NOOPTIMIZE, if you want
    to properly deal with asynchronous I/O completion in Fortran [not to
    mention the usual reentrancy considerations if you choose to field ASTs].

    If you don't take these steps you're likely to be rather frustrated by
    your program's inability to behave in a predictable / logical manner.

    --
    George Cornelius cornelius(at)eisner.decus.org
    cornelius(at)mayo.edu

