Y2038 bug strikes early - NTP
-
Y2038 bug strikes early
From the latest RISKS Digest:
Date: Thu, 29 Jun 2006 13:38:25 -0700
From: Conrad Heiney
Subject: Y2038 bug strikes early
Starting on May 12, 2006, many installations of the AOLServer web server
failed. Not all versions or all configurations failed, but the ones that did
became unusable. On start, the server would eat virtual memory and then
terminate with a memory allocation error. Discussion on the mailing list
revealed the starting date of the problem, indicating that some part of the
software had a clock issue. On careful inspection it was discovered that
database threads were a common factor. It was then noted by a perceptive
person that the servers all began failing at almost exactly one billion seconds
before the end of the Unix epoch in 2038. Many installations had very long
database timeouts, which caused the software to look ahead and see the End
of Time. Adjusting the timeouts stopped the crashes.
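
The arithmetic behind that observation can be checked directly. What follows is a minimal Python sketch, not AOLServer's actual code: it assumes a signed 32-bit time_t and a hypothetical timeout of exactly one billion seconds, and the names INT32_MAX, first_failing_time, and wrap_int32 are invented for illustration.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1            # largest value a signed 32-bit time_t can hold
TIMEOUT = 1_000_000_000          # hypothetical timeout in seconds (assumption)


def first_failing_time(timeout_seconds):
    """Earliest 'now' for which now + timeout no longer fits in 32 bits."""
    return INT32_MAX - timeout_seconds + 1


def wrap_int32(value):
    """Reduce an integer to the value a signed 32-bit variable would hold."""
    return (value + 2**31) % 2**32 - 2**31


threshold = first_failing_time(TIMEOUT)
when = datetime.fromtimestamp(threshold, tz=timezone.utc)

print(threshold)                        # Unix seconds at which trouble starts
print(when.isoformat())                 # the same instant as a UTC date
print(wrap_int32(threshold + TIMEOUT))  # the "deadline" as 32-bit code sees it
```

Under these assumptions the threshold falls on 13 May 2006 at 01:27:28 UTC, which is still the evening of May 12 in North American time zones. One second earlier the computed deadline still fits; from that instant on it wraps to a large negative number.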
The risk of the known clock bug striking 32 years early indicates there may
be other "pre-problems" lurking in software that will show up long before
the date we have comfortably set as the deadline.
The thread discussing the problem and its resolution is here:
http://www.mail-archive.com/aolserve.../msg09812.html
-
Re: Y2038 bug strikes early
> From: Conrad Heiney
> Starting on May 12, 2006, many installations of the AOLServer web
> server failed. Not all versions or all configurations failed, but
> the ones that did became unusable. On start, the server would eat
> virtual memory and then terminate with a memory allocation error.
I would have expected that one server would have failed with an error
message and millions of others followed with the message "ME TOO".
-wolfgang
-
Re: Y2038 bug strikes early
Marc,
Unix doesn't have to have a 2038 rollover problem, just as NTP doesn't
have a 2036 rollover problem. Evidence for this assertion has been
reported in recent messages to this list and the hackers@ntp.org support
group. It's all in the carefully designed 64-bit two's-complement
calculations that determine the relative date and time, as long as the
clock is set first within 68 years of the actual calendar date. See
http://www.eecis.udel.edu/~mills/y2k.html.
Dave
-
Re: Y2038 bug strikes early
David L. Mills wrote:
>Unix doesn't have to have a 2038 rollover problem, just as NTP doesn't
>have a 2036 rollover problem. Evidence to this assertion has been
>reported in recent messages to this list and the hackers@ntp.org support
>group. It's all in the carefully designed 64-bit twos complement
>calculations that determine the relative date and time
I'd like to see some evidence of these Unix(R) systems of which you
speak, with "carefully designed 64-bit twos complement calculations".
If you adhere to the Single UNIX Specification, your date and time
representation is determined by a formula in the POSIX standard.[1]
The result of evaluating that formula will exceed 2**31 in January,
2038 -- end of story. One can hope that all systems in use by then
will have settled on a time_t type wider than that (or even better,
that time_t becomes an ill-remembered historical relic), and that all
applications which store times on disk or transmit them over the
network have done likewise, but I'm not counting on it.
-GAWollman
[1] This formula is highly unlikely to change in the ongoing POSIX
revision, even though its representation of leap seconds is ambiguous.
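
For reference, the formula in question is the "Seconds Since the Epoch" expression from the POSIX Base Definitions. The Python transcription below is only a sketch: the function name posix_seconds_since_epoch is invented here, the arguments follow struct tm conventions (tm_year counts from 1900, tm_yday from 0), and it assumes dates from 1970 onward so that Python's floor division matches C integer division.

```python
def posix_seconds_since_epoch(tm_year, tm_yday, tm_hour, tm_min, tm_sec):
    """Evaluate the POSIX 'Seconds Since the Epoch' expression.

    tm_year is years since 1900 and tm_yday is days since January 1,
    as in C's struct tm. Assumes tm_year >= 70.
    """
    return (
        tm_sec
        + tm_min * 60
        + tm_hour * 3600
        + tm_yday * 86400
        + (tm_year - 70) * 31536000
        + ((tm_year - 69) // 4) * 86400      # leap days every 4 years
        - ((tm_year - 1) // 100) * 86400     # minus century years
        + ((tm_year + 299) // 400) * 86400   # plus years divisible by 400
    )


# 2038-01-19 03:14:08 UTC: tm_year = 138, tm_yday = 18
rollover = posix_seconds_since_epoch(138, 18, 3, 14, 8)
print(rollover, rollover == 2**31)
```

The last second that fits in a signed 32-bit time_t is 03:14:07 UTC on 19 January 2038; the formula yields exactly 2**31 one second later.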
--
Garrett A. Wollman | As the Constitution endures, persons in every
wollman@csail.mit.edu | generation can invoke its principles in their own
Opinions not those | search for greater freedom.
of MIT or CSAIL. | - A. Kennedy, Lawrence v. Texas, 539 U.S. 558 (2003)
-
Re: Y2038 bug strikes early
Garrett,
The issue has nothing to do with Unix or POSIX. It has to do with NTP
timestamps. There are two sources of evidence: the page I referenced,
and an actual test with Solaris 10 and the current NTP daemon, ntpd. Set the
Unix clock in the server to early 2037; set the client to the current
date and start with the -g option. Try this the other way around as
well. All this proves only that the NTP rollover will be transparent as
long as Unix is transparent beyond 2038.
It is important to note that NTP calculations never assume an absolute
value, only an offset relative to 136-year eras. Native Unix timekeeping
could do the same thing, with the result that calculations spanning Unix
eras would be unambiguous as long as the difference between two timestamps
did not exceed 34 years (because Unix seconds are signed).
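
The era-relative idea can be sketched in a few lines of Python. This is an illustration of modular (wraparound) subtraction in general, not code taken from ntpd; the function name era_relative_diff and the sample values are invented here. For a 32-bit seconds field the era is 2**32 seconds (about 136 years), and this particular scheme gives the right answer whenever the true difference is under half an era.

```python
def era_relative_diff(a, b, bits=32):
    """Signed difference a - b between two wrapping seconds counters.

    Both inputs are raw field values in the range 0 .. 2**bits - 1.
    The result is correct whenever the true difference lies within
    half an era, even if the counter rolled over between the readings.
    """
    modulus = 1 << bits
    half = modulus >> 1
    return (a - b + half) % modulus - half


before_rollover = 2**32 - 5   # five seconds before the field wraps
after_rollover = 5            # five seconds after it wraps

print(era_relative_diff(after_rollover, before_rollover))   # forward in time
print(era_relative_diff(before_rollover, after_rollover))   # backward in time
```

The two readings straddle the rollover, yet the computed offsets are +10 and -10 seconds. A true difference larger than half an era would be misreported, which is why the clock has to be set near the correct date first.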
Modern kernels I have seen represent seconds as a 64-bit two's-complement
signed integer, which is the same as in the NTP date format. While the
base era for NTP is 1900 and for Unix is 1970, the 64-bit signed seconds
field can represent seconds since before the big bang until after the
Sun grows cold.
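
Both numbers in that paragraph are easy to check. The sketch below is only an illustration: the constant and function names are invented, a year is taken as 365.25 days, and the age of the universe is the commonly cited rough estimate of 13.8 billion years.

```python
from datetime import datetime, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Seconds between the NTP base date (1900) and the Unix base date (1970).
NTP_UNIX_OFFSET = int((UNIX_EPOCH - NTP_EPOCH).total_seconds())

SECONDS_PER_YEAR = 365.25 * 86400   # Julian year, good enough for a sketch
AGE_OF_UNIVERSE_YEARS = 13.8e9      # commonly cited rough estimate

# Reach of a signed 64-bit seconds field in each direction, in years.
range_years = 2**63 / SECONDS_PER_YEAR


def unix_to_ntp_seconds(unix_seconds):
    """Shift a Unix seconds count onto the 1900-based NTP scale."""
    return unix_seconds + NTP_UNIX_OFFSET


print(NTP_UNIX_OFFSET)
print(f"{range_years:.3e} years in each direction")
print(range_years > AGE_OF_UNIVERSE_YEARS)
```

The offset comes out as 2,208,988,800 seconds, and the 64-bit range is on the order of 2.9e11 years either side of the base date, comfortably more than the estimated age of the universe.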
Dave
-
Re: Y2038 bug strikes early
David L. Mills wrote:
>The issue has nothing to do with Unix or POSIX. It has to do with NTP
>timestamps.
That's great for NTP but isn't responsive to your original claim:
>>>Unix doesn't have to have a 2038 rollover problem, just as NTP doesn't
>>>have a 2036 rollover problem.
UNIX brand operating systems are not (under current standards)
permitted to do the sort of epoch-windowing you describe and NTP
implements. Thus, the only solution to the Y2038 problem which
comports with the requirements of the standard is to make time_t be a
wider type.[1] As you note, many operating systems are now using a
64-bit type internally, but applications and protocols have failed to
keep up. The concern for the industry is, will we find and fix all
those systems in time? (I hope, given how much the pace of change has
increased, that this will be a non-issue in thirty years' time.)
-GAWollman
[1] Every time in recent memory that the POSIX committee has tried to
tackle the leap-second bug, someone always pops up and insists that
the Y2038 problem must also be solved at the same time (by increasing
the required range of time_t). This then gets tangled up with the
issue of finer-resolution file timestamps and the whole mess goes into
a rathole, not to be seen until the next round of revisions.
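
The point about applications and protocols lagging behind the kernel can be illustrated with a stored timestamp. The Python sketch below assumes a hypothetical on-disk or on-wire format with a little-endian signed seconds field; the helper names pack_timestamp_32 and pack_timestamp_64 are invented for this example.

```python
import struct


def pack_timestamp_32(seconds):
    """Encode seconds as a 4-byte signed field (hypothetical legacy format)."""
    return struct.pack("<i", seconds)


def pack_timestamp_64(seconds):
    """Encode seconds as an 8-byte signed field (hypothetical widened format)."""
    return struct.pack("<q", seconds)


first_bad_second = 2**31   # 2038-01-19 03:14:08 UTC

try:
    pack_timestamp_32(first_bad_second)
    fits_in_32_bits = True
except struct.error:
    # Python refuses the out-of-range value; C code would typically
    # truncate or wrap silently instead.
    fits_in_32_bits = False

print(fits_in_32_bits)
print(len(pack_timestamp_64(first_bad_second)), "bytes in the widened field")
```

Even when the operating system's own time_t is 64 bits wide, a record layout like the 4-byte one above still cannot hold a 2038 date, so every such file format and protocol needs its own fix.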
-
Re: Y2038 bug strikes early
Garrett Wollman wrote:
> David L. Mills wrote:
>
>
>>The issue has nothing to do with Unix or POSIX. It has to do with NTP
>>timestamps.
>
>
> That's great for NTP but isn't responsive to your original claim:
>
>
>>>>Unix doesn't have to have a 2038 rollover problem, just as NTP doesn't
>>>>have a 2036 rollover problem.
>
>
> UNIX brand operating systems are not (under current standards)
> permitted to do the sort of epoch-windowing you describe and NTP
> implements. Thus, the only solution to the Y2038 problem which
> comports with the requirements of the standard is to make time_t be a
> wider type.[1] As you note, many operating systems are now using a
> 64-bit type internally, but applications and protocols have failed to
> keep up. The concern for the industry is, will we find and fix all
> those systems in time? (I hope, given how much the pace of change has
> increased, that this will be a non-issue in thirty years' time.)
>
The available evidence (Y2K) suggests that the problem will not be
addressed until 2036 at the earliest. "It's not going to break in my
working lifetime so why should I fix it?"!!!!! As I recall the Y2K
problem was first noted sometime in the 1970s. It didn't need to be
fixed for 25 years so nobody worried about it.
-
Re: Y2038 bug strikes early
"Richard B. Gilbert" writes:
>The available evidence (Y2K) suggests that the problem will not be
>addressed until 2036 at the earliest. "It's not going to break in my
>working lifetime so why should I fix it?"!!!!! As I recall the Y2K
>problem was first noted sometime in the 1970s. It didn't need to be
>fixed for 25 years so nobody worried about it.
OSes are moving to 64 bitness rapidly and as such we hope
to see fewer and fewer 32 bit UNIX OSes and programs with
bad timestamp handling.
Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
-
Re: Y2038 bug strikes early
In article <8dudnf3focSOlk3ZnZ2dnUVZ_qydnZ2d@comcast.com>,
Richard B. Gilbert wrote:
> ... As I recall the Y2K
>problem was first noted sometime in the 1970s. It didn't need to be
>fixed for 25 years so nobody worried about it.
Not true, in the 1980's mortgage software had to deal with dates
beyond 1999.
--
-- Rod --
rodd(at)polylogics(dot)com
-
Re: Y2038 bug strikes early
Rod Dorman wrote:
> In article <8dudnf3focSOlk3ZnZ2dnUVZ_qydnZ2d@comcast.com>,
> Richard B. Gilbert wrote:
>
>> ... As I recall the Y2K
>>problem was first noted sometime in the 1970s. It didn't need to be
>>fixed for 25 years so nobody worried about it.
>
>
> Not true, in the 1980's mortgage software had to deal with dates
> beyond 1999.
>
Picky, picky, picky.
All right, so almost nobody addressed the problem. The point was that
people were aware as early as the mid 1970's that the two digit years
that had been widely used to save space on 80 column punched cards and
never changed to four digit years were going to be a problem. It wasn't
so much the old data as the programs that were written to handle two
digit years and just assumed that the years involved were all 19xx.
New software generally used four digit years but there was an awful lot
of legacy stuff that had been around since the 1960's and 70's that
needed to be fixed. Nobody wanted to spend money fixing something that
wouldn't break for twenty-five years, or twenty years, or ten or five.
The effort to test and clean up all the application software didn't
really get under way until around 1998 or, in many cases, 1999.