Signal dispositions - Unix


Thread: Signal dispositions

  1. Re: Signal dispositions

    Leet Jon wrote in
    news:slrnfiof1f.pft.nospam@nospam.com:

    >
    > Perhaps you are unaware that some C code is run in
    > safety-critical environments - having a program that dumps
    > core at the drop of a hat rather than carrying on running
    > could literally be the difference between life and death. OK,
    > so *maybe* the error condition causing the SIGSEGV will
    > propagate and bring the program down later, but taking that
    > chance is a better option than immediately failing.
    >


    If that is the perspective you are coming from, then I think
    your intentions are good but either your thinking is misguided
    or you are being too vague about what you mean by "carrying on".

    As I pointed out in another post, an application running in a
    POSIX-/SUS-conforming environment cannot just ignore a SIGSEGV
    or just catch it and resume what it was doing. (Perhaps it
    could, but only if the specific environment makes additional
    guarantees that this will work. If that is the case, then please
    mention what environment we are dealing with.)

    So, at best, what the application can do is catch the signal and
    perhaps use a longjmp or similar mechanism to cause the
    application to reinitialize itself. Personally, I would not
    trust that for reasons that have already been mentioned many
    times in this thread: I would not trust anything about the state
    of the application after catching a SIGSEGV or similar signal.
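
    For readers who have not seen the catch-and-longjmp mechanism, here
    is a minimal sketch of it in C. It is an illustration only, not a
    recommendation: the caveat above about untrustworthy state after a
    SIGSEGV applies to it in full. The names run_guarded, on_segv,
    recovery_point and the two demo tasks are hypothetical, and the
    fault is simulated with raise() so that the example stays within
    defined behaviour.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf recovery_point;                 /* hypothetical name */
static volatile sig_atomic_t recovery_armed = 0;

/* SIGSEGV handler: jump back to the recovery point if one is armed,
 * otherwise restore the default disposition and let the process die. */
static void on_segv(int signo)
{
    if (recovery_armed) {
        recovery_armed = 0;
        siglongjmp(recovery_point, 1);
    }
    signal(signo, SIG_DFL);
    raise(signo);
}

/* Runs task() with the handler installed.
 * Returns 0 if task() completed, -1 if a SIGSEGV was caught,
 * -2 if the handler could not be installed. */
int run_guarded(void (*task)(void))
{
    struct sigaction sa, old;
    int rc;

    assert(task != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -2;

    /* savemask = 1 so that siglongjmp restores the signal mask */
    if (sigsetjmp(recovery_point, 1) == 0) {
        recovery_armed = 1;
        task();
        rc = 0;
    } else {
        /* We arrive here from the handler. Nothing task() touched
         * can be trusted; a real program would reinitialize here. */
        rc = -1;
    }
    recovery_armed = 0;
    sigaction(SIGSEGV, &old, NULL);
    return rc;
}

/* Demo tasks (hypothetical): one well behaved, one simulating a fault. */
void demo_ok_task(void) { }
void demo_faulting_task(void) { raise(SIGSEGV); }
```

    Calling run_guarded(demo_faulting_task) returns -1 instead of
    killing the process. What the sketch cannot do is tell you which
    data the faulting task damaged before the jump.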

    The pattern that I have seen (and used) most frequently is
    to have an external "monitor" process that watches over all
    the processes in the application and restarts any that exit
    unexpectedly. Even that is not a 100% guarantee of success,
    especially if there is any persistent or shared state that might
    have been corrupted leading up to the SIGSEGV.
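
    A rough sketch of that monitor pattern in C, assuming a single
    worker and a fixed restart budget. The names supervise and
    demo_flaky_worker are hypothetical; a real monitor would also log,
    rate-limit restarts and deal with shared state.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs worker(attempt) in a child process and restarts it whenever it
 * dies from a signal or exits with a nonzero status.
 * Returns the number of restarts that were needed before a clean exit,
 * -1 if the restart budget was exhausted, -2 on fork/waitpid failure. */
int supervise(int (*worker)(int attempt), int max_restarts)
{
    assert(worker != NULL);
    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        pid_t pid = fork();
        if (pid < 0)
            return -2;
        if (pid == 0)
            _exit(worker(attempt));      /* child: run the worker */

        int status = 0;
        pid_t w;
        do {
            w = waitpid(pid, &status, 0);
        } while (w < 0 && errno == EINTR);
        if (w < 0)
            return -2;

        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return attempt;              /* clean exit: done */
        /* Otherwise the worker crashed or failed; loop to restart. */
    }
    return -1;
}

/* Demo worker (hypothetical): simulates a crash on its first run only. */
int demo_flaky_worker(int attempt)
{
    if (attempt == 0) {
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);
        return 1;                        /* only reached if not killed */
    }
    return 0;
}
```

    With the demo worker, supervise(demo_flaky_worker, 3) returns 1:
    one crash, one restart, then a clean exit. Each restart begins from
    a fresh process image, which is why this is easier to trust than
    resuming inside the faulted process.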

    But that is only one (very small) aspect of properly designing
    a safety-critical system.

    MV

    P.S. I agree that there are "recoverable" conditions from which
    it is better for an application running in a "high
    availability" context to try and carry on rather than giving up
    and exiting. It's just that SIGSEGV is generally not one of
    those.

    --
    I do not want replies; please follow-up to the group.

  2. Re: Signal dispositions

    Leet Jon wrote:
    > On 2 Nov 2007 at 20:34, Al Balmer wrote:
    >>> I can understand that for debugging purposes you might want to have
    >>> SIGSEGV etc. generate a core file, but in production code the default
    >>> should be for these signals to be ignored.

    >> In production code, those signals should never be generated. If they
    >> are, they should crash, so that the user can complain, and someone can
    >> fix it.

    >
    > Perhaps you are unaware that some C code is run in safety-critical
    > environments - having a program that dumps core at the drop of a hat
    > rather than carrying on running could literally be the difference
    > between life and death. OK, so *maybe* the error condition causing the
    > SIGSEGV will propagate and bring the program down later, but taking that
    > chance is a better option than immediately failing.
    >
    > ~Jon~
    >


    The error handler, which you must provide, will bring whatever it is
    controlling into a normal or safe state that won't cause death, ring a
    klaxon, turn on a warning light, and then crash. It should arrange to make
    itself not restartable until it has been pulled from the working environment
    and placed on a test bench! Anything else would be criminal.
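
    As a hedged sketch of that "make safe, then crash" shape in C: the
    handler below writes one byte to a file descriptor, which stands in
    for driving the outputs to a safe state and raising the alarm, and
    then re-raises the signal with the default disposition so that the
    process still dies. The names install_fail_safe, fail_safe_handler
    and demo_fail_safe are hypothetical, and the fault is simulated
    with raise().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the actuator/alarm output (assumption for the demo). */
static volatile sig_atomic_t safe_state_fd = -1;

static void fail_safe_handler(int signo)
{
    static const char safe_marker = 'S';
    /* Only async-signal-safe calls are used in here. */
    if (safe_state_fd >= 0) {
        ssize_t n = write((int)safe_state_fd, &safe_marker, 1);
        (void)n;
    }
    signal(signo, SIG_DFL);   /* do not carry on: crash as usual */
    raise(signo);
}

int install_fail_safe(int fd)
{
    struct sigaction sa;

    assert(fd >= 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fail_safe_handler;
    sigemptyset(&sa.sa_mask);
    safe_state_fd = fd;
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Demo: returns 1 if a child process latched the safe state AND then
 * died from SIGSEGV, 0 if either part is missing, -1 on setup errors. */
int demo_fail_safe(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        if (install_fail_safe(fds[1]) != 0)
            _exit(2);
        raise(SIGSEGV);       /* simulated fault */
        _exit(3);             /* not reached when the handler works */
    }

    close(fds[1]);
    char c = 0;
    ssize_t n = read(fds[0], &c, 1);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return n == 1 && c == 'S'
        && WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

    The point of the ordering is that the handler never returns to the
    code that faulted. Keeping the unit out of service until it has been
    inspected would be the job of something outside this process.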

    If that means that the anti-lock brake system light stays on with the check
    vehicle light flashing and you are operating on analog backup only then that is
    what it means! (To put this into context)


    You might obtain a copy of National Bureau of Standards (NBS) Computer Science
    and Technology series, Special Publication 500-75, February 1981 "Validation,
    Verification, and Testing of Computer Software" by W. Richards Adrion, Martha A.
    Branstad, John C. Cherniavsky. Library of Congress Card Number 80-600199. I'm
    sure other publications have followed this, but you will get a sense of what the
    responsibility of the programmer is to design a test suite to prove the program
    works as expected under all conditions expected and unexpected.


  3. Re: Signal dispositions

    On Fri, 2 Nov 2007 21:16:31 +0100 (CET), Leet Jon wrote:
    >On 2 Nov 2007 at 19:10, Keith Thompson wrote:
    >> However, letting a program continue running by default after a
    >> catastrophic data-corrupting failure would not be a good idea. If a
    >> program dies immediately after "an invalid access to storage" (which
    >> is all the C standard says about SIGSEGV), then you have a good
    >> chance of diagnosing and correcting the problem before putting the
    >> code into production. If the error is ignored, the program will very
    >> likely continue to corrupt your data in subtle ways; tracking it down
    >> and fixing it is going to be difficult if the error occurs at a
    >> customer site, or even during an important demo.

    >
    > I believe you are completely wrong on this point. Very often a SIGSEGV
    > will be caused by (say) a single bad array access - the consequences
    > will be highly localized, and carrying on with the program will not
    > cause any significant problems.


    So the program may 'think' it has saved the important dataset from a
    medical patient's important test, but the data has disappeared because
    it was written ... well, nowhere in particular.

    Do you *really* want this program to go on?


  4. Re: Signal dispositions

    Leet Jon wrote:
    > On 2 Nov 2007 at 20:34, Al Balmer wrote:
    >>> I can understand that for debugging purposes you might want to have
    >>> SIGSEGV etc. generate a core file, but in production code the default
    >>> should be for these signals to be ignored.

    >> In production code, those signals should never be generated. If they
    >> are, they should crash, so that the user can complain, and someone can
    >> fix it.

    >
    > Perhaps you are unaware that some C code is run in safety-critical
    > environments - having a program that dumps core at the drop of a hat
    > rather than carrying on running could literally be the difference
    > between life and death.


    If a SIGSEGV can be the difference between life and death, then such
    code has *no* *right* to ever *cause* a SIGSEGV, regardless of how the
    system is going to respond to the SIGSEGV (ignoring it and letting
    the program continue, or aborting it).

    There are several solutions that could be appropriate here:
    (1) Keep the code simple enough that you can use mathematics to
    prove it correct. This has been done successfully with
    some designs. It's not easy, but then we're talking about a
    life or death situation here.
    (2) Exhaustively test the code. Sometimes this is not possible
    due to exponential explosion of test cases, but sometimes
    it actually is.
    (3) Nearly-exhaustively test the code. Maybe testing every possible
    program path isn't possible, but very thorough test coverage
    (not just of lines of code, but of "interesting" combinations
    of inputs) is possible. That might be acceptable if combined
    with other quality efforts.
    (4) Use a system where, on a *local* basis, *individual* faults can
    be determined to be harmless and the program can proceed.
    Notice that this is not the same thing as ignoring SIGSEGV
    for the entire program and assuming all invalid memory
    accesses are OK. Instead, what I'm talking about is a
    system where you can say "if THIS block of code goes
    outside the bounds of THAT array, then THAT ONE THING
    should not be a fatal error, and here is the routine that
    will do the error handling and keep the system in a known
    good state".
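
    To make option (4) a little more concrete, here is a small sketch
    in C of a fault handled locally, at one named array, with an
    explicit fallback. It checks the index before the access instead of
    relying on a signal, which is an assumption on my part about how
    such a system could look. The type guarded_array and the function
    guarded_read are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One specific array whose out-of-bounds reads are declared non-fatal. */
typedef struct {
    const int *data;
    size_t     len;
    unsigned   faults;   /* how many bad accesses were absorbed */
} guarded_array;

/* Reads a->data[index]. An out-of-range index is recorded and answered
 * with safe_default, so the caller stays in a known good state. */
int guarded_read(guarded_array *a, size_t index, int safe_default)
{
    assert(a != NULL && a->data != NULL);
    if (index >= a->len) {
        a->faults++;
        return safe_default;
    }
    return a->data[index];
}
```

    Every other array in the program keeps the normal rule: a bad
    access there is still a bug and still fatal. The faults counter is
    there so that absorbed errors can be reported, not hidden.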

    Of course, it's silly to be having a discussion about safety-critical
    software in comp.unix.programmer. Maybe there's one that I don't know
    about, but as far as I know, there isn't a version of Unix that is
    meant to be used in an environment like that. In fact, where I've
    checked, license agreements often specifically exclude the use of the
    software in such an environment. And for good reason: a system that
    can get somebody killed needs to use software that's simpler than Unix.

    - Logan

  5. Re: Signal dispositions

    Logan Shaw wrote:



    You might consider dropping c.l.c. from the cross-post and perhaps
    replace it with comp.programming and set followups to the same.


  6. Re: Signal dispositions

    Leet Jon wrote:
    > On 2 Nov 2007 at 20:34, Al Balmer wrote:
    >>> I can understand that for debugging purposes you might want to have
    >>> SIGSEGV etc. generate a core file, but in production code the default
    >>> should be for these signals to be ignored.

    >> In production code, those signals should never be generated. If they
    >> are, they should crash, so that the user can complain, and someone can
    >> fix it.

    >
    > Perhaps you are unaware that some C code is run in safety-critical
    > environments - having a program that dumps core at the drop of a hat
    > rather than carrying on running could literally be the difference
    > between life and death. OK, so *maybe* the error condition causing the
    > SIGSEGV will propagate and bring the program down later, but taking that
    > chance is a better option than immediately failing.
    >
    > ~Jon~
    >


    Continue after error
    http://www.netcomp.monash.edu.au/cpe...~tgallagh.html

    yeah right.


  7. Re: Signal dispositions

    Leet Jon wrote:
    >

    .... snip ...
    >
    > Perhaps you are unaware that some C code is run in safety-critical
    > environments - having a program that dumps core at the drop of a
    > hat rather than carrying on running could literally be the
    > difference between life and death. OK, so *maybe* the error
    > condition causing the SIGSEGV will propagate and bring the program
    > down later, but taking that chance is a better option than
    > immediately failing.


    Apparently you are unaware that such a program has absolutely no
    business running in such a 'safety critical environment'.

    --
    Chuck F (cbfalconer at maineline dot net)



  8. Re: Signal dispositions

    Rainer Weikusat writes:
    >
    >That these signals cannot 'just be ignored' by the program. All of them
    >are essentially hardware exceptions which occur because the program
    >attempted to do an "illegal" operation, like accessing memory which
    >does not exist.
    >
    >Since programs never intentionally do something like this (except for


    While I do agree with you regarding the signals, I'll point out that
    the original Bourne shell used SIGSEGV to extend its heap. Not that
    I would advocate such a thing myself....

    scott

  9. Re: Signal dispositions

    scott@slp53.sl.home (Scott Lurndal) writes:
    > Rainer Weikusat writes:
    >>That these signals cannot 'just be ignored' by the program. All of them
    >>are essentially hardware exceptions which occur because the program
    >>attempted to do an "illegal" operation, like accessing memory which
    >>does not exist.
    >>
    >>Since programs never intentionally do something like this (except for

    >
    > While I do agree with you regarding the signals, I'll point out that
    > the original Bourne shell used SIGSEGV to extend its heap.


    IMHO it would be unwise to mention actual uses of SIGSEGV-handlers in the
    context of a thread where people hold the opinion that terminating a
    process because of invalid memory accesses should be avoided for the
    benefit of 'safety critical systems' :->.

  10. Re: Signal dispositions

    Golden California Girls writes:
    > Leet Jon wrote:
    >> On 2 Nov 2007 at 20:34, Al Balmer wrote:
    >>>> I can understand that for debugging purposes you might want to have
    >>>> SIGSEGV etc. generate a core file, but in production code the default
    >>>> should be for these signals to be ignored.
    >>> In production code, those signals should never be generated. If they
    >>> are, they should crash, so that the user can complain, and someone can
    >>> fix it.

    >>
    >> Perhaps you are unaware that some C code is run in safety-critical
    >> environments - having a program that dumps core at the drop of a hat
    >> rather than carrying on running could literally be the difference
    >> between life and death. OK, so *maybe* the error condition causing the
    >> SIGSEGV will propagate and bring the program down later, but taking that
    >> chance is a better option than immediately failing.
    >>
    >> ~Jon~
    >>

    >
    > Continue after error
    > http://www.netcomp.monash.edu.au/cpe...~tgallagh.html
    >
    > yeah right.


    This omits an important part of the real horror:

    The software for the Therac-25 was put together by cannibalizing the
    control software for two earlier models (PDP-11 assembly) by someone
    who was not the original code author while the company chose to remove
    most of the safety-protection hardware of those earlier models at the
    same time.

    It was decided to put the blame for the following disaster mainly onto
    the shoulders of the person who originally wrote the Therac-20 and
    Therac-6 code.



    Or in other words: After having accomplished an undertaking whose
    attendant circumstances were so completely insane that one would expect
    anyone having a clue wrt assembly programming to run away screaming
    and harvesting the to-be-expected results, it was decreed that the
    people who came up with this infernal procedure were apparently
    somewhat unlucky when selecting a suitable demonstration object.

  11. Re: Signal dispositions

    On Sat, 03 Nov 2007 04:14:45 GMT, almond@brothers.orgy (Almond) wrote:

    >Send any feedback, ideas, suggestions, test results to
    >

    Here's some feedback: Your advertising, release notes, and privacy
    policy are inappropriate here, even in a sig block.

    Limit your signature to three or four lines, which is plenty of space
    to include your URL.

    --
    Al Balmer
    Sun City, AZ

  12. Re: Signal dispositions

    On Sat, 3 Nov 2007 10:15:52 +0100 (CET), Leet Jon
    wrote:

    >On 2 Nov 2007 at 20:34, Al Balmer wrote:
    >>>I can understand that for debugging purposes you might want to have
    >>>SIGSEGV etc. generate a core file, but in production code the default
    >>>should be for these signals to be ignored.

    >>
    >> In production code, those signals should never be generated. If they
    >> are, they should crash, so that the user can complain, and someone can
    >> fix it.

    >
    >Perhaps you are unaware that some C code is run in safety-critical
    >environments


    That's pretty funny, considering I wrote safety-critical code for the
    process control industry for over twenty years. Food, petroleum,
    polymers, paper, you name it.

    If the coolant control program on a PVC reactor crashes, you don't
    ignore it and keep cooking. You kill not only the program, but the
    process. Otherwise, you kill people.

    >- having a program that dumps core at the drop of a hat
    >rather than carrying on running could literally be the difference
    >between life and death. OK, so *maybe* the error condition causing the
    >SIGSEGV will propagate and bring the program down later, but taking that
    >chance is a better option than immediately failing.
    >

    Don't bother applying for a job here. We don't insist that all
    new-hires be expert, but we do want them to be trainable.

    --
    Al Balmer
    Sun City, AZ

  13. Re: Signal dispositions

    Leet Jon writes:
    > On 2 Nov 2007 at 20:34, Al Balmer wrote:
    >>>I can understand that for debugging purposes you might want to have
    >>>SIGSEGV etc. generate a core file, but in production code the default
    >>>should be for these signals to be ignored.

    >>
    >> In production code, those signals should never be generated. If they
    >> are, they should crash, so that the user can complain, and someone can
    >> fix it.

    >
    > Perhaps you are unaware that some C code is run in safety-critical
    > environments - having a program that dumps core at the drop of a hat


    This opens two important questions:

    - If the person responsible for the segfaulting code knows
    what the problem is, why doesn't he just fix it to avoid
    segfaulting?

    - Assuming the person does not know what the problem is, how
    can the same person possibly know that its eventual
    consequences will be harmless?

  14. Re: Signal dispositions

    ["Followup-To:" header set to comp.unix.programmer.]
    On 2007-11-03, Leet Jon wrote:
    > On 2 Nov 2007 at 20:34, Al Balmer wrote:
    >>>I can understand that for debugging purposes you might want to have
    >>>SIGSEGV etc. generate a core file, but in production code the default
    >>>should be for these signals to be ignored.

    >>
    >> In production code, those signals should never be generated. If they
    >> are, they should crash, so that the user can complain, and someone can
    >> fix it.

    >
    > Perhaps you are unaware that some C code is run in safety-critical
    > environments - having a program that dumps core at the drop of a hat
    > rather than carrying on running could literally be the difference
    > between life and death. OK, so *maybe* the error condition causing the


    Allowing a process with corrupted data to continue running can also
    cause death.

    > SIGSEGV will propagate and bring the program down later, but taking that
    > chance is a better option than immediately failing.
    >
    > ~Jon~
    >





  15. Re: Signal dispositions

    On 3 Nov, 09:15, Leet Jon wrote:
    > On 2 Nov 2007 at 20:34, Al Balmer wrote:


    > >>I can understand that for debugging purposes you might want to have
    > >>SIGSEGV etc. generate a core file, but in production code the default
    > >>should be for these signals to be ignored.

    >
    > > In production code, those signals should never be generated. If they
    > > are, they should crash, so that the user can complain, and someone can
    > > fix it.

    >
    > Perhaps you are unaware that some C code is run in safety-critical
    > environments


    OHMYGOD

    *please* tell me you don't write safety critical code!



    > - having a program that dumps core at the drop of a hat
    > rather than carrying on running could literally be the difference
    > between life and death. OK, so *maybe* the error condition causing the
    > SIGSEGV will propagate and bring the program down later, but taking that
    > chance is a better option than immediately failing.




    --
    Nick Keighley


  16. Re: Signal dispositions

    Jim Cochrane wrote On 11/06/07 04:33,:
    > ["Followup-To:" header set to comp.unix.programmer.]
    > On 2007-11-03, Leet Jon wrote:
    >
    >>On 2 Nov 2007 at 20:34, Al Balmer wrote:
    >>
    >>>>I can understand that for debugging purposes you might want to have
    >>>>SIGSEGV etc. generate a core file, but in production code the default
    >>>>should be for these signals to be ignored.
    >>>
    >>>In production code, those signals should never be generated. If they
    >>>are, they should crash, so that the user can complain, and someone can
    >>>fix it.

    >>
    >>Perhaps you are unaware that some C code is run in safety-critical
    >>environments - having a program that dumps core at the drop of a hat
    >>rather than carrying on running could literally be the difference
    >>between life and death. OK, so *maybe* the error condition causing the

    >
    >
    > Allowing a process with corrupted data to continue running can also
    > cause death.


    The phrase that came instantly to my mind is
    "Controlled flight into terrain."

    --
    Eric.Sosman@sun.com


  17. Re: Signal dispositions

    Nick Keighley wrote:
    > On 3 Nov, 09:15, Leet Jon wrote:
    >> On 2 Nov 2007 at 20:34, Al Balmer wrote:

    >
    >>>> I can understand that for debugging purposes you might want to have
    >>>> SIGSEGV etc. generate a core file, but in production code the default
    >>>> should be for these signals to be ignored.
    >>> In production code, those signals should never be generated. If they
    >>> are, they should crash, so that the user can complain, and someone can
    >>> fix it.

    >> Perhaps you are unaware that some C code is run in safety-critical
    >> environments

    >
    > OHMYGOD
    >
    > *please* tell me you don't write safety critical code!
    >
    >
    >
    >> - having a program that dumps core at the drop of a hat
    >> rather than carrying on running could literally be the difference
    >> between life and death. OK, so *maybe* the error condition causing the
    >> SIGSEGV will propagate and bring the program down later, but taking that
    >> chance is a better option than immediately failing.

    >
    >
    >
    > --
    > Nick Keighley
    >



    I suspect he did, until his boss found out what he was planning and canned his
    ass. Now he's looking to prove his boss wrong. That or he's a troll.


  18. Re: Signal dispositions

    "Al Balmer" wrote in message news:
    r62ni39mk469csqb43n83qu3c6jfqvgfe7@4ax.com...
    > On Fri, 2 Nov 2007 21:16:31 +0100 (CET), Leet Jon
    > wrote:
    >
    >>On 2 Nov 2007 at 19:10, Keith Thompson wrote:
    >>> However, letting a program continue running by default after a
    >>> catastrophic data-corrupting failure would not be a good idea. If a
    >>> program dies immediately after "an invalid access to storage" (which
    >>> is all the C standard says about SIGSEGV), then you have a good chance
    >>> of diagnosing and correcting the problem before putting the code into
    >>> production. If the error is ignored, the program will very likely
    >>> continue to corrupt your data in subtle ways; tracking it down and
    >>> fixing it is going to be difficult if the error occurs at a customer
    >>> site, or even during an important demo.

    >>
    >>I believe you are completely wrong on this point. Very often a SIGSEGV
    >>will be caused by (say) a single bad array access - the consequences
    >>will be highly localized, and carrying on with the program will not
    >>cause any significant problems.

    >
    > How on earth would you know what the consequences might be? If the
    > program in question is calculating my paycheck, I don't want any bad
    > array access to be ignored.


    Someone else might want to check first if the error is worth such a drastic
    treatment.

    With your suggested behaviour, the paycheck is not printed, and who knows
    when the problem will be fixed... If you can wait for your paycheck, you'll
    be OK, else too bad.

    Alternately, let it print the damn check, there is a good chance the check
    will be correct and arrive in time. There is some possibility that the
    error is so small as to not be worth reporting. If the error is large, then
    you can complain and have it fixed... Or you will not complain and wait for
    the bank to figure where these millions came from ;-)

    If you are the payer, you probably want the process to stop. If you are the
    payee, it is not so obvious.

    > What kind of programs do you write? Games?
    >>
    >>Who wants their customer to run their program and have it just crash
    >>with a segfault? That hardly comes across as professional.

    >
    > What's not professional is writing code that causes segfaults.
    >
    >> Better to try
    >>your best to carry on and weather the storm than to just dump the user
    >>with a crash.
    >>
    >>I can understand that for debugging purposes you might want to have
    >>SIGSEGV etc. generate a core file, but in production code the default
    >>should be for these signals to be ignored.

    >
    > In production code, those signals should never be generated. If they
    > are, they should crash, so that the user can complain, and someone can
    > fix it.


    If they are, they should be logged and reported, yet best efforts should be
    extended to minimize the impact on the user. Warning the user of potential
    malfunction and requesting urgent attention may be more appropriate than a
    core dump with no warning and no restart. Use common sense to determine what
    would least impact the user. When the oil gauge trips, the dashboard turns a
    light on; it does not immediately block the engine, fire the ejector seats
    and vaporize the contents of the trunk.

    --
    Chqrlie.



  19. Re: Signal dispositions

    On Wed, 7 Nov 2007 15:38:28 +0100, "Charlie Gordon"
    wrote:

    >"Al Balmer" wrote in message news:
    >r62ni39mk469csqb43n83qu3c6jfqvgfe7@4ax.com...
    >> On Fri, 2 Nov 2007 21:16:31 +0100 (CET), Leet Jon
    >> wrote:
    >>
    >>>On 2 Nov 2007 at 19:10, Keith Thompson wrote:
    >>>> However, letting a program continue running by default after a
    >>>> catastrophic data-corrupting failure would not be a good idea. If a
    >>>> program dies immediately after "an invalid access to storage" (which
    >>>> is all the C standard says about SIGSEGV), then you have a good chance
    >>>> of diagnosing and correcting the problem before putting the code into
    >>>> production. If the error is ignored, the program will very likely
    >>>> continue to corrupt your data in subtle ways; tracking it down and
    >>>> fixing it is going to be difficult if the error occurs at a customer
    >>>> site, or even during an important demo.
    >>>
    >>>I believe you are completely wrong on this point. Very often a SIGSEGV
    >>>will be caused by (say) a single bad array access - the consequences
    >>>will be highly localized, and carrying on with the program will not
    >>>cause any significant problems.

    >>
    >> How on earth would you know what the consequences might be? If the
    >> program in question is calculating my paycheck, I don't want any bad
    >> array access to be ignored.

    >
    >Someone else might want to check first if the error is worth such a drastic
    >treatment.
    >
    >With your suggested behaviour, the paycheck is not printed, and who knows
    >when the problem will be fixed... If you can wait for your paycheck, you'll
    >be OK, else too bad.


    And if the error is in a control process that blows up a reactor and
    kills a few people? How do you correct that mistake?

    I think your point is that the problem analysis should take account of
    the consequences of an error - that's obvious. Basic systems
    engineering. I'm not advocating that the only possible way to treat a
    segfault is to stop the program, though in a properly designed control
    system, it's usually the best way.
    >
    >Alternately, let it print the damn check, there is a good chance the check
    >will be correct and arrive in time. There is some possibility that the
    >error is so small as to not be worth reporting. If the error is large, then
    >you can complain and have it fixed... Or you will not complain and wait for
    >the bank to figure where these millions came from ;-)


    All of which will cause more problems, eventually, both to the payer
    and the payee. If the system stops, it *will* get fixed. People in
    data processing take payroll runs *very* seriously. Did you imagine
    that they would just not pay anybody else, and hope for a better run
    next week?
    >
    >If you are the payer, you probably want the process to stop. If you are the
    >payee, it is not so obvious.
    >
    >> What kind of programs do you write? Games?
    >>>
    >>>Who wants their customer to run their program and have it just crash
    >>>with a segfault? That hardly comes across as professional.

    >>
    >> What's not professional is writing code that causes segfaults.
    >>
    >>> Better to try
    >>>your best to carry on and weather the storm than to just dump the user
    >>>with a crash.
    >>>
    >>>I can understand that for debugging purposes you might want to have
    >>>SIGSEGV etc. generate a core file, but in production code the default
    >>>should be for these signals to be ignored.

    >>
    >> In production code, those signals should never be generated. If they
    >> are, they should crash, so that the user can complain, and someone can
    >> fix it.

    >
    >If they are, they should be logged and reported yet best efforts should be
    >extended to minimize the impact on the user. Warning the user of potential
    >malfunction, requesting urgent attention may be more appropriate than a core
    >dump with no warning and no restart.


    How do you warn of a segfault before it happens?

    > Use common sense to determine what would
    >least impact the user. When the oil gauge trips, the dashboard turns a
    >light on, it does not immediately block the engine, fire the ejector seats
    >and vaporize the contents of the trunk.


    Not "common sense." Systems analysis.

    --
    Al Balmer
    Sun City, AZ

  20. Re: Signal dispositions

    James Kuyper writes:
    > Charlie Gordon wrote:


    [...]

    >> Alternately, let it print the damn check, there is a good chance the
    >> check will be correct and arrive in time. There is some possibility
    >> that the error is so small as to not be worth reporting. If the
    >> error is large, then you can complain and have it fixed... Or you
    >> will not complain and wait for the bank to figure where these
    >> millions came from ;-)

    >
    > Everything about that paragraph is wrong. The chances are not good
    > that the paycheck will be correct and arrive on time. There's a large
    > probability that the error will be a big one. There is no error so
    > small that it's not worth reporting; tax auditors tend to get very
    > concerned about even small errors, because they think they might be
    > signs of something more serious (and they are right to think that). If
    > the error is large, fixing it can be very expensive for the payer, and
    > a lot of hassle for the payee.


    I think this way of discussing the issue subtly misses the point. A
    segmentation fault occurs if the MMU has been asked to translate an
    address for which it doesn't have a valid translation or if it has
    detected a memory access running afoul of the memory access permissions
    for the address that should have been accessed. A CPU exception will
    be raised because of this and the fault handler in the kernel takes
    control. The information available to this handler will usually be the
    reason for the fault, the type of intended access and the faulting
    address. So, what to do now? Unless something in the MMU setup is
    changed, the faulting instruction cannot be restarted, because this
    would just cause it to fault again, turning the process that contained
    it into a really expensive CPU hog. Since the kernel has no
    information regarding what the purpose of the access was supposed to
    be, let alone any information regarding what the process causing the
    fault is trying to accomplish or how, it cannot possibly decide what a
    sensible 'other restart point' could be. The system may allow
    SIGSEGV to be handled. If so, running the handler and restarting the
    instruction afterwards can be tried, guarding against an infinite
    series of faults produced this way. Otherwise, the only available
    option is to terminate the process.
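
    On systems that pass this information on to userspace, a handler
    installed with SA_SIGINFO receives the faulting address in si_addr.
    The following is a sketch of that in C, under some assumptions: it
    relies on mmap with MAP_ANONYMOUS (not part of base POSIX), it
    registers for SIGBUS as well because some systems report a
    protection violation that way, and the names report_fault,
    guarded_page and demo_fault_address are hypothetical. The handler
    exits instead of returning, since returning would only restart the
    faulting instruction.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose MAP_ANONYMOUS (assumption) */
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

static void *volatile guarded_page;   /* hypothetical name */

/* Compares the reported faulting address with the page we protected
 * and encodes the result in the exit status (42 = match). */
static void report_fault(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    _exit(info->si_addr == guarded_page ? 42 : 43);
}

/* Demo: a child touches a page mapped with PROT_NONE. Returns the
 * child's exit status (42 when si_addr named the page), -1 on error. */
int demo_fault_address(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        long pagesize = sysconf(_SC_PAGESIZE);
        assert(pagesize > 0);
        void *p = mmap(NULL, (size_t)pagesize, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            _exit(1);
        guarded_page = p;

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = report_fault;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
            sigaction(SIGBUS, &sa, NULL) != 0)
            _exit(1);

        volatile char c = *(volatile char *)p;  /* access is not permitted */
        (void)c;
        _exit(44);                              /* no fault was raised */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    Knowing the address still says nothing about what the access was
    for, which is the limitation described above.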

    The reason the MMU is programmed this way is because it is its purpose
    to ensure that different processes are isolated from each other,
    except insofar as the processes themselves arrange otherwise. Because a
    process must not be able to arbitrarily write to the memory of another
    process, the MMU must cause a trap if it attempts to access something
    in a way it is not supposed to access it and the kernel must then
    terminate the 'offender' for the reasons given in the previous
    paragraph. If the access is an intended one, the hardware must be
    informed of this explicitly, because it cannot possibly 'just know
    it'.

    Taking this requirement into account, the assumption that an access
    the information available to the hardware causes to be flagged as
    'invalid' is actually unintended and caused by a programming error is
    a sensible one. And even if the hardware had a way of knowing that the
    consequences of the programming error will be harmless, and it hasn't,
    it still cannot decide on what to do with the faulting process
    without its explicit cooperation.

    [...]

    >>> In production code, those signals should never be generated. If they
    >>> are, they should crash, so that the user can complain, and someone can
    >>> fix it.

    >> If they are, they should be logged and reported yet best efforts
    >> should be extended to minimize the impact on the user. Warning the
    >> user of potential malfunction, requesting urgent attention may be
    >> more appropriate than a core dump with no warning and no restart.

    >
    > The core dump IS your warning, and restart should NOT be attempted
    > until the problem has been resolved, otherwise you could easily add to
    > the damage created by the first run of the program.


    For a rarely triggered bug, restarting the offending process is a
    common choice. But this can again only be accomplished by 'other
    userspace software'.
