refclock use causes core dump of ntpd - NTP

This is a discussion on refclock use causes core dump of ntpd - NTP ; Personally, I think it is Wrong for a refclock to call abort(), but I reserve the right to change my mind if I get my way and implement "design by contract" throughout the codebase. The problem is that there is ...

+ Reply to Thread
Page 2 of 2 FirstFirst 1 2
Results 21 to 30 of 30

Thread: refclock use causes core dump of ntpd

  1. Re: refclock use causes core dump of ntpd

    Personally, I think it is Wrong for a refclock to call abort(), but I
    reserve the right to change my mind if I get my way and implement "design by
    contract" throughout the codebase.

    The problem is that there is no maintainer for that refclock and I do not
    want to make things worse (and I do not want to increase my workload).

    H

  2. Re: refclock use causes core dump of ntpd

    Roger,

    The way to run ntpd under gdb is with the -d flag given to ntpd so it will
    not fork.

    H

  3. Re: refclock use causes core dump of ntpd

    Roger,

    'info gdb' could be your friend, or I suspect there is either a .ps or a
    ..pdf of the gdb documentation in the gdb distrubution. The online help is
    also pretty good.

    H

  4. Re: refclock use causes core dump of ntpd

    "wa6zvp" wrote:

    > ** Yes. I remembered correctly. up-type = t_unknown, up->state =
    > s_Base, and event = e_Poll.
    > Very bizarre. See below.


    That is odd; as I said before the first thing that should happen
    is for it to get e_Init from true_start() and go into s_InqGOES.

    > * Despite the fact that doevent is entered with valid params for type
    > and state,
    > that is, t_unknown and t_Base, it still gets to the abort statement at
    > the end
    > of the function whenever its called with event = e_Poll.
    >
    > Nowhere in this function is this event handled or used. Not even
    > mentioned.


    Sure it is, line 629 in "case t_goes:".

    It struck me last night (duh) that this unexpected code flow could be
    an effect of optimisation: if gcc recognises abort() and knows that it
    will not return, it could convert all the earlier calls to it into
    a jump to this one. I'm assuming ntpd is being compiled with gcc -O2
    or similar, so if you want to persist you could try taking that out.

    > So, I backed out my hack from last night, and now have
    > simply commented out the one and only line that calls
    > doevent with event=e_Poll. Line 540 in function true_receive.


    And your clock works with that?

    The e_Poll presumably needs to be there for t_goes, so the proper
    solution to this would seem to be to have e_Poll ignored where not
    expected. It only seems to be t_unknown where that doesn't happen
    currently, so a simple fix like this might do the job:

    --- refclock_true.c.orig Wed Feb 25 05:58:17 2004
    +++ refclock_true.c Fri Feb 23 11:18:07 2007
    @@ -703,6 +703,8 @@
    }
    break;
    case t_unknown:
    + if (event == e_Poll)
    + break;
    switch (up->state) {
    case s_Base:
    if (event != e_Init)

    (Saves dealing with it in every case s_* within t_unknown)

    On the other hand, converting every abort() to break as you did
    to start with could be the right answer after all :-/

    --
    Ronan Flood

  5. Re: refclock use causes core dump of ntpd

    On Feb 23, 3:43 am, Ronan Flood wrote:
    > "wa6zvp" wrote:
    > > ** Yes. I remembered correctly. up-type = t_unknown, up->state =
    > > s_Base, and event = e_Poll.
    > > Very bizarre. See below.

    >
    > That is odd; as I said before the first thing that should happen
    > is for it to get e_Init from true_start() and go into s_InqGOES.


    * Oh, it does pass through doevent a couple of times before the crash.
    The above are the values on the call that will end up at abort().

    > > * Despite the fact that doevent is entered with valid params for type
    > > and state,
    > > that is, t_unknown and t_Base, it still gets to the abort statement at
    > > the end
    > > of the function whenever its called with event = e_Poll.

    >
    > > Nowhere in this function is this event handled or used. Not even
    > > mentioned.

    >
    > Sure it is, line 629 in "case t_goes:".


    * Doh! I missed that.

    > It struck me last night (duh) that this unexpected code flow could be
    > an effect of optimisation: if gcc recognises abort() and knows that it
    > will not return, it could convert all the earlier calls to it into
    > a jump to this one. I'm assuming ntpd is being compiled with gcc -O2
    > or similar, so if you want to persist you could try taking that out.


    * I remembered yesterday as well that optimisation can mess up
    debugging so
    I removed the -O2 from the cflags. Made no change to the symptoms at
    all.

    > > So, I backed out my hack from last night, and now have
    > > simply commented out the one and only line that calls
    > > doevent with event=e_Poll. Line 540 in function true_receive.

    >
    > And your clock works with that?


    * Yea, it works fine.

    > The e_Poll presumably needs to be there for t_goes, so the proper
    > solution to this would seem to be to have e_Poll ignored where not
    > expected. It only seems to be t_unknown where that doesn't happen
    > currently, so a simple fix like this might do the job:
    >
    > --- refclock_true.c.orig Wed Feb 25 05:58:17 2004
    > +++ refclock_true.c Fri Feb 23 11:18:07 2007
    > @@ -703,6 +703,8 @@
    > }
    > break;
    > case t_unknown:
    > + if (event == e_Poll)
    > + break;
    > switch (up->state) {
    > case s_Base:
    > if (event != e_Init)
    >
    > (Saves dealing with it in every case s_* within t_unknown)


    * Yea, that looks simple enough. I'll try that today.

    > On the other hand, converting every abort() to break as you did
    > to start with could be the right answer after all :-/
    >
    > --
    > Ronan Flood


    * Heh, that was effective too.

    I am also of the opinion that refclocks should not call abort (or
    exit).
    Simply not getting data for the clock should be indication enough that
    something is wrong.

    Thanks for your assist in this, Ronan.

    Roger


  6. Re: refclock use causes core dump of ntpd

    On Feb 22, 9:44 pm, Harlan Stenn wrote:
    > Roger,
    >
    > The way to run ntpd under gdb is with the -d flag given to ntpd so it will
    > not fork.
    >
    > H


    I'm using the -n flag, which does what I need too.

    Roger


  7. Re: refclock use causes core dump of ntpd

    "wa6zvp" wrote:

    > * Oh, it does pass through doevent a couple of times before the crash.
    > The above are the values on the call that will end up at abort().


    Ah, OK.

    > * I remembered yesterday as well that optimisation can mess up
    > debugging so
    > I removed the -O2 from the cflags. Made no change to the symptoms at
    > all.


    Huh. (or indeed e_Huh ...)

    > > case t_unknown:
    > > + if (event == e_Poll)
    > > + break;


    > * Yea, that looks simple enough. I'll try that today.


    Failing that (!), change true_receive() around line 540 so that e_Poll
    is only called for t_goes:

    if (up->type == t_goes)
    true_event(peer, e_Poll);

    > I am also of the opinion that refclocks should not call abort (or
    > exit).
    > Simply not getting data for the clock should be indication enough that
    > something is wrong.


    Indeed -- for those who check with ntpq, which not everyone does.

    > Thanks for your assist in this, Ronan.


    No problem.

    --
    Ronan Flood

  8. Re: refclock use causes core dump of ntpd

    On Feb 23, 3:43 am, Ronan Flood wrote:
    >

    [snip]
    > The e_Poll presumably needs to be there for t_goes, so the proper
    > solution to this would seem to be to have e_Poll ignored where not
    > expected. It only seems to be t_unknown where that doesn't happen
    > currently, so a simple fix like this might do the job:
    >
    > --- refclock_true.c.orig Wed Feb 25 05:58:17 2004
    > +++ refclock_true.c Fri Feb 23 11:18:07 2007
    > @@ -703,6 +703,8 @@
    > }
    > break;
    > case t_unknown:
    > + if (event == e_Poll)
    > + break;
    > switch (up->state) {
    > case s_Base:
    > if (event != e_Init)
    >
    > (Saves dealing with it in every case s_* within t_unknown)
    >
    > On the other hand, converting every abort() to break as you did
    > to start with could be the right answer after all :-/
    >
    > --
    > Ronan Flood


    * This patch works fine. Thanks Ronan.

    * I've decided to bite the bullet and modify the source to correctly
    handle the TL-3 TrueTime receiver.
    I've got a good enough handle on this driver now.

    The TL-3 box can be placed in the 1-per-second timecode at will,
    either from the serial port or via
    the control panel. When this driver encounters a _valid_ TrueTime
    timecode on the serial port before it
    has identifed the device, the state machine portion of the code
    (true_doentry) can, and does get extremely confused. It can actually
    end up
    in several different steady states, that after preventing aborts with
    Ronan's patch, the driver will actually produce valid time to ntpd.
    Right now, the driver actually thinks its an Omega receiver.

    It should be able to identify the TL-3 if the auto output is not yet
    turned on, and then proceed to turn on that mode and be happy.

    I'm not sure I'll be able to unravel the unsolicited issue, but I'm
    sure going to try. It should be possible to continue the Inquiry
    portion of
    the code, while dealing with the ongoing incoming data. Time will
    tell.

    * Attention Harlan, since there is no maintainer of this driver, is it
    still possible for me to submit the changed code and get it into the
    distribution?

    73, Roger


  9. Re: refclock use causes core dump of ntpd

    Roger,

    When you are ready, please open a bugzilla ticket for this problem at
    http://bugs.ntp.isc.org and after you have created the ticket you can upload
    your unified diff patch(es) as an attachment.

    And if you'd like to be the maintainer of this refclock I'm fine with that,
    too!

    H

  10. Re: refclock use causes core dump of ntpd

    On Feb 19, 8:21 pm, "wa6zvp" wrote:
    > Hello all
    >
    > I have just built a new FreeBSD 6.2 box with ntp=dev-4.2.5p8 to test
    > some refclocks I have.
    >
    > One of the clocks is a TrueTime TL-3, a semi-consumer WWV based
    > clock. Its data output format matches
    > some/most of the other TT clocks that use the refclock_true driver.
    > (driver 5). 99 times out of 100, ntpd
    > dumps core when the serial line is attached. I did get it to stay up
    > exactly once, and the driver worked


    > Roger, Harvey Mudd College


    * I have submitted this as a bug. https://ntp.isc.org/bugs/show_bug.cgi?id=792
    I have attached the patch file that adds full support fo the TL-3, so
    perhaps it
    will make it into the distribution.

    Roger



+ Reply to Thread
Page 2 of 2 FirstFirst 1 2