weird, boot CPU (#0) not listed by the BIOS. - Linux
-
weird, boot CPU (#0) not listed by the BIOS.
Newly built machine. Tyan S2927 mainboard. A pair of dual-core AMD Opteron
model 2220 processors. BIOS and kernel both reported the expected count of
4 processors. Then things get weird. There are what appear to be strange
CPU numbers involved. These strange numbers do not get a processor response
so nothing more is activated and the system runs with just the initial CPU
indicated as #0. Kernel is 2.6.23.12 compiled with max CPUS at 4.
The message buffer has these interesting parts:
[ 0.000000] Intel MultiProcessor Specification v1.1
[ 0.000000] Virtual Wire compatibility mode.
[ 0.000000] OEM ID: TEMPLATE Product ID: ETEMPLATE APIC at: 0xFEE00000
[ 0.000000] Processor #67 15:1 APIC version 16
[ 0.000000] Processor #68 15:1 APIC version 16
[ 0.000000] Processor #69 15:1 APIC version 16
[ 0.000000] Processor #70 15:1 APIC version 16
[ 0.000000] Enabling APIC mode: Flat. Using 0 I/O APICs
[ 0.000000] Processors: 4
....
[ 56.272441] Freeing SMP alternatives: 16k freed
[ 56.272519] CPU0: AMD Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
[ 56.272672] weird, boot CPU (#0) not listed by the BIOS.
[ 56.272736] Booting processor 1/67 eip 2000
[ 56.272796] APIC error on CPU0: 00(04)
[ 56.282784] APIC error on CPU0: 00(04)
[ 56.283286] APIC error on CPU0: 00(04)
[ 61.277842] Not responding.
[ 61.277893] Inquiring remote APIC #67...
[ 61.277946] ... APIC #67 ID: failed
[ 61.278129] ... APIC #67 VERSION: failed
[ 61.278312] ... APIC #67 SPIV: failed
[ 61.278496] CPU #67 not responding - cannot use it.
....
Then the last 10 messages repeat 3 more times in the context of CPU numbers
68, 69, and 70. Why these numbers? Corrupt APIC data? BIOS error?
I'll look into the kernel code handling this tomorrow to see what it might
be expecting. Maybe I could make a patch and fake it for this particular
machine for now, assuming the correct numbers should be 0, 1, 2, and 3.
Might that work? Or is this something to contact Tyan tech support on?
The full dmesg buffer is at:
http://phil.ipal.org/usenet/colds/20...ird-cpus-0.txt
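For anyone wanting to check their own boot log for the same symptom, here is a minimal Python sketch. It only pulls the "Processor #N" IDs out of the log and checks whether the boot CPU's ID is among them. The function name `summarize` and the regular expressions are made up for illustration, and the sample lines are copied from the excerpt above. It does not say what the correct IDs should be.

```python
import re

# Sample lines copied from the dmesg excerpt in this post.
DMESG = """\
[ 0.000000] Processor #67 15:1 APIC version 16
[ 0.000000] Processor #68 15:1 APIC version 16
[ 0.000000] Processor #69 15:1 APIC version 16
[ 0.000000] Processor #70 15:1 APIC version 16
[ 56.272519] CPU0: AMD Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
[ 56.272672] weird, boot CPU (#0) not listed by the BIOS.
[ 56.272736] Booting processor 1/67 eip 2000
"""

# Hypothetical patterns, written to match the message text shown above.
PROC_RE = re.compile(r"Processor #(\d+)\b")
BOOT_RE = re.compile(r"boot CPU \(#(\d+)\) not listed by the BIOS")


def summarize(log):
    """Return (IDs listed by the BIOS, boot CPU ID reported as unlisted or None)."""
    listed = [int(m.group(1)) for m in PROC_RE.finditer(log)]
    m = BOOT_RE.search(log)
    unlisted_boot = int(m.group(1)) if m else None
    return listed, unlisted_boot


if __name__ == "__main__":
    listed, boot = summarize(DMESG)
    print("IDs listed by the BIOS:", listed)
    if boot is not None and boot not in listed:
        print("boot CPU ID", boot, "is missing from that list")
```

To run it against a real log, read the text from a saved dmesg file instead of the `DMESG` sample.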
--
|---------------------------------------/----------------------------------|
| Phil Howard KA9WGN (ka9wgn.ham.org) / Do not send to the address below |
| first name lower case at ipal.net / spamtrap-2008-01-02-0029@ipal.net |
|------------------------------------/-------------------------------------|
-
Re: weird, boot CPU (#0) not listed by the BIOS.
phil-news-nospam@ipal.net burped up warm pablum in
news:flfcvr02v22@news1.newsguy.com:
> Newly built machine. Tyan S2927 mainboard. A pair of dual-core AMD Opteron
> model 2220 processors. BIOS and kernel both reported the expected count of
> 4 processors. Then things get weird. There are what appear to be strange
> CPU numbers involved. These strange numbers do not get a processor response
> so nothing more is activated and the system runs with just the initial CPU
> indicated as #0. Kernel is 2.6.23.12 compiled with max CPUS at 4.
....
> Then the last 10 messages repeat 3 more times in the context of CPU numbers
> 68, 69, and 70. Why these numbers? Corrupt APIC data? BIOS error?
Tyan boards seem to have trouble with multiple CPUs while booting. Try this google search:
http://www.google.ca/search?num=30&h...ic+%2367+TYAN&btnG=Search&meta=
--
Tris Orendorff
[ Anyone naming their child should spend a few minutes checking rhyming slang and dodgy
sounding names. Brad and Angelina failed to do this when naming their kid Shiloh Pitt. At some
point, someone at school is going to spoonerise her name.
Craig Stark ]
-
Re: weird, boot CPU (#0) not listed by the BIOS.
On Wed, 02 Jan 2008 16:26:46 GMT Tris Orendorff wrote:
| phil-news-nospam@ipal.net burped up warm pablum in
| news:flfcvr02v22@news1.newsguy.com:
|
|> Newly built machine. Tyan S2927 mainboard. A pair of dual-core AMD Opteron
|> model 2220 processors. BIOS and kernel both reported the expected count of
|> 4 processors. Then things get weird. There are what appear to be strange
|> CPU numbers involved. These strange numbers do not get a processor response
|> so nothing more is activated and the system runs with just the initial CPU
|> indicated as #0. Kernel is 2.6.23.12 compiled with max CPUS at 4.
|
| ...
|
|> Then the last 10 messages repeat 3 more times in the context of CPU numbers
|> 68, 69, and 70. Why these numbers? Corrupt APIC data? BIOS error?
|
| Tyan boards seem to have trouble with multiple CPUs while booting. Try this google search:
| http://www.google.ca/search?num=30&h...ic+%2367+TYAN&
| btnG=Search&meta=
Nice. My post comes up first 
So basically, does this mean Tyan and Linux are incompatible? I saw a few
complaints about various related issues in that search, but no answers. I
did post a ticket with Tyan support.
--
|---------------------------------------/----------------------------------|
| Phil Howard KA9WGN (ka9wgn.ham.org) / Do not send to the address below |
| first name lower case at ipal.net / spamtrap-2008-01-02-1831@ipal.net |
|------------------------------------/-------------------------------------|
-
Re: weird, boot CPU (#0) not listed by the BIOS.
On Wed, 02 Jan 2008 16:26:46 GMT Tris Orendorff wrote:
| phil-news-nospam@ipal.net burped up warm pablum in
| news:flfcvr02v22@news1.newsguy.com:
|
|> Newly built machine. Tyan S2927 mainboard. A pair of dual-core AMD Opteron
|> model 2220 processors. BIOS and kernel both reported the expected count of
|> 4 processors. Then things get weird. There are what appear to be strange
|> CPU numbers involved. These strange numbers do not get a processor response
|> so nothing more is activated and the system runs with just the initial CPU
|> indicated as #0. Kernel is 2.6.23.12 compiled with max CPUS at 4.
|
| ...
|
|> Then the last 10 messages repeat 3 more times in the context of CPU numbers
|> 68, 69, and 70. Why these numbers? Corrupt APIC data? BIOS error?
|
| Tyan boards seem to have trouble with multiple CPUs while booting. Try this google search:
| http://www.google.ca/search?num=30&h...ic+%2367+TYAN&
| btnG=Search&meta=
Well, that search was not helpful.
However, I did, at someone's suggestion, try booting some other distros.
Fedora 8 hangs during kernel probes. Ubuntu 6.06 does come up if I use
the "noapic" option AND ... It recognizes all 4 CPUS correctly! But if I
use the "noapic" option with my 2.6.23.12 kernel, it still has the same
problem where it gets the wrong CPU numbers. Ubuntu 6.06 has kernel 2.6.15.
So it seems one of two things might be the issue:
1. Ubuntu built their kernel with the magic "don't get the CPU numbers
wrong" option (whatever that might be).
2. Somewhere between 2.6.15 and 2.6.23.12 the kernel broke the ability
to see the correct CPU numbers.
Any idea which it is? Whatever it is, it might be something Fedora did
not do, or did wrong.
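One way to narrow down possibility 1 is to diff the two kernel build configurations. The Python sketch below does that for two .config-style texts. The helper names `parse_config` and `diff_configs` are made up, the sample values are purely illustrative (they are NOT the real Ubuntu or 2.6.23.12 settings), and `CONFIG_HYPOTHETICAL_FOO` is an invented option name.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value; options marked 'is not set' map to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    out = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, "n"), b.get(key, "n")
        if va != vb:
            out[key] = (va, vb)
    return out


if __name__ == "__main__":
    # Illustrative fragments only; not taken from any real kernel build.
    mine = parse_config(
        "CONFIG_SMP=y\nCONFIG_NR_CPUS=4\n# CONFIG_HYPOTHETICAL_FOO is not set\n"
    )
    theirs = parse_config(
        "CONFIG_SMP=y\nCONFIG_NR_CPUS=8\nCONFIG_HYPOTHETICAL_FOO=y\n"
    )
    for name, (va, vb) in diff_configs(mine, theirs).items():
        print(f"{name}: mine={va} theirs={vb}")
```

In practice the two inputs would be the .config from the self-built source tree and the config file the distro ships for its kernel (often found under /boot). A diff like this only shows where the builds differ; it cannot by itself distinguish a config difference from a regression between kernel versions.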
--
-----------------------------------------------------------------------------
| Phil Howard KA9WGN | http://linuxhomepage.com/ http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/ http://ka9wgn.ham.org/ |
-----------------------------------------------------------------------------
-
Solved - Was: weird, boot CPU (#0) not listed by the BIOS.
On 2 Jan 2008 07:03:23 GMT phil-news-nospam@ipal.net wrote:
| Newly built machine. Tyan S2927 mainboard. A pair of dual-core AMD Opteron
| model 2220 processors. BIOS and kernel both reported the expected count of
| 4 processors. Then things get weird. There are what appear to be strange
| CPU numbers involved. These strange numbers do not get a processor response
| so nothing more is activated and the system runs with just the initial CPU
| indicated as #0. Kernel is 2.6.23.12 compiled with max CPUS at 4.
|
| The message buffer has these interesting parts:
|
| [ 0.000000] Intel MultiProcessor Specification v1.1
| [ 0.000000] Virtual Wire compatibility mode.
| [ 0.000000] OEM ID: TEMPLATE Product ID: ETEMPLATE APIC at: 0xFEE00000
| [ 0.000000] Processor #67 15:1 APIC version 16
| [ 0.000000] Processor #68 15:1 APIC version 16
| [ 0.000000] Processor #69 15:1 APIC version 16
| [ 0.000000] Processor #70 15:1 APIC version 16
| [ 0.000000] Enabling APIC mode: Flat. Using 0 I/O APICs
| [ 0.000000] Processors: 4
|
| ...
|
| [ 56.272441] Freeing SMP alternatives: 16k freed
| [ 56.272519] CPU0: AMD Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
| [ 56.272672] weird, boot CPU (#0) not listed by the BIOS.
| [ 56.272736] Booting processor 1/67 eip 2000
| [ 56.272796] APIC error on CPU0: 00(04)
| [ 56.282784] APIC error on CPU0: 00(04)
| [ 56.283286] APIC error on CPU0: 00(04)
| [ 61.277842] Not responding.
| [ 61.277893] Inquiring remote APIC #67...
| [ 61.277946] ... APIC #67 ID: failed
| [ 61.278129] ... APIC #67 VERSION: failed
| [ 61.278312] ... APIC #67 SPIV: failed
| [ 61.278496] CPU #67 not responding - cannot use it.
|
| ...
|
| Then the last 10 messages repeat 3 more times in the context of CPU numbers
| 68, 69, and 70. Why these numbers? Corrupt APIC data? BIOS error?
|
| I'll look into the kernel code handling this tomorrow to see what it might
| be expecting. Maybe I could make a patch and fake it for this particular
| machine for now, assuming the correct numbers should be 0, 1, 2, and 3.
| Might that work? Or is this something to contact Tyan tech support on?
|
| The full dmesg buffer is at:
| http://phil.ipal.org/usenet/colds/20...ird-cpus-0.txt
It turns out the problem is that SMP, at least on an ACPI machine, requires
that ACPI be enabled under power management in the source tree configuration.
Having seen a few computers that don't play well with Linux where ACPI is
involved, I usually leave ACPI disabled. IMHO, ACPI is one of the many ways
to totally bastardize the "PC architecture". It is overly complicated for
what it provides, and is too unstable. Nevertheless, it actually works in
the case of the Tyan S2927 mainboard. So my guess is the SMP code detected
the machine was ACPI, and attempted to look at the ACPI information, which
had probably not been gathered since ACPI was not enabled, and merely picked
up garbage left over from other uses, or used an invalid pointer that happened
to not crash things.
If it makes sense on some (non-ACPI) machines to have SMP enabled (SMP did
exist before ACPI, so this must be a yes), then perhaps the SMP code itself
needs to be made to correctly detect whether ACPI is truly enabled in the
kernel (don't assume so just because the hardware/BIOS has it), and not
attempt to use ACPI if not, and revert to using older methods to detect CPUs
(like guess the CPU numbers sequentially, stop when one doesn't work, and
don't even try for pluggable CPU sets).
Better documentation in the kernel would also help.
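Until the documentation improves, a quick check of the build configuration can flag the combination that bit me here. The Python sketch below is only that, a sketch: it assumes the power-management option in question is `CONFIG_ACPI` (the option found under the power management menu) and that SMP is `CONFIG_SMP`. The helper names `is_enabled` and `smp_without_acpi` are made up, and the sample config text is illustrative.

```python
def is_enabled(config_text, option):
    """True if `option` is set to y or m in kernel .config-style text."""
    for line in config_text.splitlines():
        if line.strip() in (f"{option}=y", f"{option}=m"):
            return True
    return False


def smp_without_acpi(config_text):
    """Flag a config that enables SMP but leaves ACPI off (assumed option names)."""
    return is_enabled(config_text, "CONFIG_SMP") and not is_enabled(
        config_text, "CONFIG_ACPI"
    )


if __name__ == "__main__":
    # Illustrative fragment resembling the setup described in this post.
    sample = "CONFIG_SMP=y\nCONFIG_NR_CPUS=4\n# CONFIG_ACPI is not set\n"
    if smp_without_acpi(sample):
        print("SMP is enabled but ACPI is not; CPU detection may go wrong "
              "on boards like this one")
```

This only reports the combination; whether it causes trouble depends on the board and BIOS, as the Tyan S2927 case shows.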
--
-----------------------------------------------------------------------------
| Phil Howard KA9WGN | http://linuxhomepage.com/ http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/ http://ka9wgn.ham.org/ |
-----------------------------------------------------------------------------