What still uses the block layer? - Kernel



  1. Re: What still uses the block layer?

    Greg KH wrote:
    > On Mon, Oct 15, 2007 at 03:36:15AM -0500, Rob Landley wrote:
    >> The point I was trying to make is that it seems to me like it would be
    >> possible to keep the namespace separate here, and thus reduce the enumeration
    >> problems to the point where common cases (like my laptop) aren't impacted by
    >> them during early boot.

    >
    > Proposals on how to do this would be gladly reviewed.


    Agreed.


    > But again, please remember that these USB devices are really SCSI
    > devices. Same for SATA devices. There is a reason they are using the
    > SCSI layer, and it isn't just because the developers felt like it


    /somewhat/ true I'm afraid: libata uses the SCSI layer for ATAPI
    devices because they are essentially bridges to SCSI devices. It uses
    the SCSI layer for ATA devices because the SCSI layer provided a huge
    amount of infrastructure that would need to have been otherwise
    duplicated, /then/ massaged into coordinating between [a new ATA layer] and [SCSI] when dealing with ATAPI.

    There is also a detail that was of /huge/ value when introducing a new
    device class: distro installers automatically work, if you use SCSI.
    If you use a new block device type, that behaves differently from other
    types and is on a different major, you have to poke the distros into
    action or do it yourself.

    IOW, it was the high Just Works(tm) value of the SCSI layer when it came
    to ATA (not ATAPI) devices.

    For the future, ATA will eventually be more independent (though the SCSI
    simulator will be available as an option, for compat), but the value is
    big enough to put that task on the back-burner.

    Jeff



    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: What still uses the block layer?

    On Mon, Oct 15, 2007 at 07:00:22AM -0700, Arjan van de Ven wrote:

    > that's a choice Ubuntu made in their udev scripts... if you don't like
    > it, complain to them.


    Keeping the naming as hda while changing the semantics (such as the
    reduced number of partitions) would have been differently confusing. We
    did look into keeping compatibility symlinks, but decided to just
    transition everything to UUIDs instead.

    --
    Matthew Garrett | mjg59@srcf.ucam.org

  3. Re: What still uses the block layer?

    On Mon, Oct 15, 2007 at 04:26:04AM -0500, Rob Landley wrote:
    > To clarify, I think that merging ide, sata, usb, firewire, and others into a
    > single device namespace causes each type of device to inherit that
    > namespace's cumulative ordering issues, which is a bad thing. I have no real
    > attachment to the underlying scsi or block layers. I've never seriously
    > worked on either (although I'm trying to understand both).
    >
    > For example, usb devices are never easy to order. IDE devices (back when they
    > had their own namespace) were trivial to order back when /dev/hda couldn't
    > move without use of a screwdriver. USB and IDE devices are very different in
    > that it's not possible to plug a USB device into an IDE controller (not
    > without one _heck_ of an adapter) and vice versa. USB devices usually live
    > outside the computer's case, and IDE devices inside the case. They're not
    > the same thing.
    >
    > Combining USB and IDE into the same /dev/sd? namespace makes enumerating the
    > IDE devices much harder than in the traditional "/dev/hdb doesn't move
    > without a screwdriver" model.


    I have udev here, and it generates several useful symlinks.
    /dev/disk/by-path/pci-0000:00:1f.1-scsi-0:0:0:0-part2 will always point
    to the second primary partition of the IDE master on the first IDE
    channel here, be there as many USB sticks as there may.

    (But still I'd like it if it wasn't named "scsi-0:0:0:0", because the
    "0:0:0:0" part could change.)
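    A minimal Python sketch of what such a by-path link name encodes, assuming the "pci-<PCI address>-scsi-<host>:<channel>:<target>:<lun>[-partN]" shape of the names quoted in this thread (other buses such as USB are not handled). The PCI address follows the controller's physical location, while the "0:0:0:0" part is the SCSI host/channel/target/LUN tuple, which is the piece Wilfried worries could change. The `parse_by_path` helper and `ByPathName` type are hypothetical, not part of udev.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape, taken from the link names quoted in this thread, e.g.
# "pci-0000:00:1f.1-scsi-0:0:0:0-part2". Other bus types are not handled.
BY_PATH_RE = re.compile(
    r"^pci-(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"-scsi-(?P<host>\d+):(?P<channel>\d+):(?P<target>\d+):(?P<lun>\d+)"
    r"(?:-part(?P<part>\d+))?$"
)


class ByPathName(NamedTuple):
    pci_address: str          # controller location on the PCI bus
    scsi_hctl: tuple          # (host, channel, target, lun)
    partition: Optional[int]  # None when the link names the whole disk


def parse_by_path(name: str) -> ByPathName:
    """Split a pci-...-scsi-... by-path link name into its components."""
    m = BY_PATH_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised by-path name: {name!r}")
    hctl = tuple(int(m.group(k)) for k in ("host", "channel", "target", "lun"))
    part = int(m.group("part")) if m.group("part") else None
    return ByPathName(m.group("pci"), hctl, part)


if __name__ == "__main__":
    print(parse_by_path("pci-0000:00:1f.1-scsi-0:0:0:0-part2"))
    # ByPathName(pci_address='0000:00:1f.1', scsi_hctl=(0, 0, 0, 0), partition=2)
```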

    > The merger creates a new problem for IDE, one
    > which didn't exist before: the addition or removal of other unrelated types
    > of devices may change this device's location next boot. It may be possible
    > to add additional complication to the system to compensate, but what was the
    > advantage of merging the namespaces in the first place?


    I don't think there was any intent to merge namespaces. It "just happened"
    as a byproduct of having sata/pata use the scsi subsystem.

    Wilfried
    --
    There's always something...

  4. Re: What still uses the block layer?

    On Monday 15 October 2007 5:32:32 am Loïc Grenié wrote:
    > You are really looking like you are out for a fight.

    ...
    > Your objection is interesting. It is lost in the middle of e-mails
    > which, to the untrained eye, look like you are trying to fight everyone and
    > everybody.

    ...
    > ...holy external disk...
    > ...holy external hard...

    ...
    > You would probably have received more interesting answers and less
    > insults.

    ...
    > Once again. You are so aggressive in your asking that it does not
    > lead to an interesting discussion.

    ...
    > Out for a fight ?


    This is where I hit my ad hominem attack quota and stopped reading.

    Rob
    --
    "One of my most productive days was throwing away 1000 lines of code."
    - Ken Thompson.

  5. Re: What still uses the block layer?

    On Monday 15 October 2007 6:19:58 am Neil Brown wrote:
    > On Monday October 15, rob@landley.net wrote:
    > > This is my objection. Even when enumerating multiple devices of the same
    > > type is tricky, enumerating multiple devices of _different_ types should
    > > not be. There's a great big type indicator that is being _deliberately_
    > > ignored, and large classes of devices (millions of laptops) where you
    > > know there's only going to be _one_ instance of a given type.

    >
    > My perspective is different.
    >
    > The range of addressing options for "all disk devices" is far too rich
    > to be able to assign a stable device number to every device: there are
    > multiple, multi-dimensional addressing schemes, and some devices might
    > not even have a stable address at all (e.g. USB?).
    > So the reality of dealing with disk devices is that you cannot provide
    > a stable single-number naming scheme for all devices on all machines.


    Sure.

    > Therefore it is best to not have stable single-number naming schemes
    > for any devices on any machines. Why? Because it ensures there will
    > not be any second class citizens.


    This is where we disagree. The existence of devices you cannot stably
    enumerate does not eliminate the existence of devices you trivially can.

    Pulling out the "IBM numa cluster with multiple SAS enclosures _and_ firewire"
    infrastructure to find the root partition on my hard drive may be good for
    the IBM numa clusters, but only at the expense of complicating this part of
    my laptop's infrastructure by an order of magnitude, and making embedded
    systems nearly impossible to put together. If "one size fits all" were true,
    my cell phone would be running Red Hat Enterprise.

    > If some devices that are even reasonably common (e.g. IDE drives) are
    > stable, then some application developers or system integrators will
    > work under the assumption of stability and whatever they build will
    > break when you try it on different hardware.


    So you break the IDE drives to get laptop users to debug the Niagara set? The
    solution is to make the easy cases hard?

    > This happened during the
    > early days of SCSI support - code assumed the stability of
    > major/minor numbers and so did not work properly with SCSI which
    > cannot provide that stability in general.


    In this case, I ripped the relevant infrastructure out by hand so fstab again
    has /dev/sda. I can do it again on future systems. I'd just really rather
    not have to.

    > Having a totally uniform approach makes development and testing a lot
    > easier - there are fewer special cases.


    There are actually more special cases, you just expose more people to them.

    > I would prefer that 'total uniformity' went even further than
    > /dev/sd?? to /dev/disk??. i.e. Anything that is or behaves
    > substantially like a disk drive should be "/dev/diskXX", where numbers
    > are assigned sequentially on discovery. (I wonder why we need
    > /dev/scdX to be separate from /dev/sdX).


    It's /dev/srX here, and I have no idea.

    I believe merging these namespaces invents problems, and was a bad idea. I
    understand your reasoning, but imposing the problems of mainframes onto
    laptops does not strike me as an improvement for laptops.

    > Note that stable names are still a very real option. udev provides
    > several. /dev/disk/by-path/XXX will be stable for lots of "screwed
    > in" devices. /dev/disk/by-id will be stable for devices that report a
    > unique id. etc.


    Here it's

    ls /dev/disk/by-path/
    pci-0000:00:1f.2-scsi-0:0:0:0 pci-0000:00:1f.2-scsi-0:0:0:0-part4
    pci-0000:00:1f.2-scsi-0:0:0:0-part1 pci-0000:00:1f.2-scsi-0:0:0:0-part5
    pci-0000:00:1f.2-scsi-0:0:0:0-part2 pci-0000:00:1f.2-scsi-0:0:0:0-part6
    pci-0000:00:1f.2-scsi-0:0:0:0-part3 pci-0000:00:1f.2-scsi-1:0:0:0

    And this is an improvement?

    > The difference between IDE, SATA, SCSI and even USB is peripheral for
    > the large majority of uses, and I think maintaining the distinction in
    > the major/minor number or in the "primary" /dev name is - for the
    > above reasons - more of a cost than a value.


    Is your definition of "the large majority of uses" where ncr Voyager, the
    Amiga, and current macintosh laptops are all one use each, or is your
    definition of "the large majority of uses" the one where each "use" is an
    installation, of which there are millions of PCs (and even more ARM cell
    phones), and something like three instances of Voyager?

    I realize that both views are valid. This is why the US has a house and a
    senate, and filters things through both views. My gripe is that forcing my
    laptop to look at my USB devices to find my SATA hard drive is aligned with
    only one of those viewpoints, and completely opposed to the other.

    An approach that makes things much easier on laptops is seen to hurt big iron,
    not because the approach itself has a direct negative impact on big iron,
    but only because then laptops are not saddled with the problems of big iron.

    Why do you allow uni-processor kernel builds then?

    > NeilBrown


    Rob
    --
    "One of my most productive days was throwing away 1000 lines of code."
    - Ken Thompson.

  6. Re: What still uses the block layer?

    Rob Landley wrote:
    > I realize that both views are valid. This is why the US has a house and a
    > senate, and filters things through both views. My gripe is that forcing my
    > laptop to look at my USB devices to find my SATA hard drive is aligned with
    > only one of those viewpoints, and completely opposed to the other.
    >
    > An approach that makes things much easier on laptops is seen to hurt big iron,
    > not because the approach itself has a direct negative impact on big iron,
    > but only because then laptops are not saddled with the problems of big iron.



    And we are telling you that, in a modern hotplug world -- yes even on a
    laptop -- you are clinging too much to assumptions that were never 100%
    true in the first place, and much less so on today's laptops.

    When you can unplug a SATA drive from a laptop, and plug it back in via
    USB, you can see how unwise it is to hardcode device names into your fstab.

    We invented udev, sysfs, mount-by-label, mount-by-uuid, LVM and all
    sorts of other gadgets to make this problem go away.

    If you ignore the solutions that exist to solve these problems, then of
    course annoyances will persist as the state of hardware marches forward.

    Jeff



  7. Re: What still uses the block layer?

    > This is where we disagree. The existence of devices you cannot stably
    > enumerate does not eliminate the existence of devices you trivially can.


    "trivially"

    You are, I assume, familiar in full with EDD 3.0, EDD 1.x and the Ralf
    Brown documentation on the BIOS drive mappings and tables for different
    BIOSes?

    If you are, then you could add EDD 1.x support, FADT parsing and update the
    EDD driver to produce links to the drives in BIOS map order. Would be
    quite useful but very few people on the planet actually know all the
    arcana to do this.

    Alan

  8. Re: What still uses the block layer?

    On Monday 15 October 2007 8:10:49 am James Bottomley wrote:
    > OK, so could we get back to the original discussion? The question I
    > think you meant to ask is "does SCSI use the block layer, and if so;
    > how?"
    >
    > The answer is yes (just do an ls /sys/block on any scsi machine). The
    > how is that it basically uses the block layer as a service library (i.e.
    > most SCSI services are built on top of those already provided by block).
    > The email you cited was basically from our one area of confusion: SCSI
    > and block both provide services to decode the SG_IO ioctl. This is
    > partly historical; block and SCSI are very much intertwined; so much so
    > that they both tend to drive each other's development. The programme
    > over the last few years has been to identify features in SCSI that
    > should be more generic (and hence moved to block). SG_IO is one of
    > these, so we end up with the situation where Block provides this as a
    > service (and sr, st and sd make use of it) while the sg driver still
    > doesn't use what the block layer provides but rolls its own. I think
    > the layout of how all this works is illustrated at a reasonably high
    > level here on slide 15:
    >
    > http://licensing.steeleye.com/suppor...005_slides.pdf


    Thanks, that's exactly what I wanted to know.

    > > However, the response to my attempts to express this dissatisfaction on
    > > the SCSI list a few months ago came too close to a flamewar for me to
    > > consider continuing it productive. I'd still love to update the "2.4
    > > scsi howto" and corresponding sg howto, but lack the expertise. The SCSI
    > > layer really isn't my area, and I was much happier back when I could
    > > avoid using it at all.

    >
    > That was because your initial inquiry came across as "I'm trying to
    > document this, and by the way it's rubbish".


    Sorry about that. Not my intent. I was aiming more at "I'm trying to
    document this and I don't understand how it works at all, or why it does
    things this way. It seems backwards from what I would expect."

    Rob
    --
    "One of my most productive days was throwing away 1000 lines of code."
    - Ken Thompson.

  9. Re: What still uses the block layer?

    On Monday 15 October 2007 12:25:13 pm Greg KH wrote:
    > Oh, and this seems like a very Ubuntu specific rant, might I suggest you
    > contact the Ubuntu developers about this? The kernel doesn't dictate
    > that the distro has to use these long identifiers, and there is nothing
    > we can do about it.


    I was just trying to use the strangeness in a large distributor's first
    attempt at this functionality as evidence that it's apparently not trivial
    to get even the common cases right under the new model, while the common
    cases used to be trivial to get right under the old model. (Or at least it
    seemed so to me.)

    I think I've exhausted this line of argument, though, and will stop now.

    Rob
    --
    "One of my most productive days was throwing away 1000 lines of code."
    - Ken Thompson.

  10. Re: What still uses the block layer?

    On Monday October 15, rob@landley.net wrote:
    > > Therefore it is best to not have stable single-number naming schemes
    > > for any devices on any machines. Why? Because it ensures there will
    > > not be any second class citizens.

    >
    > This is where we disagree. The existence of devices you cannot stably
    > enumerate does not eliminate the existence of devices you trivially can.


    No, but it dramatically reduces the value of being able to enumerate
    those devices.

    >
    > Pulling out the "IBM numa cluster with multiple SAS enclosures _and_ firewire"
    > infrastructure to find the root partition on my hard drive may be good for
    > the IBM numa clusters, but only at the expense of complicating this part of
    > my laptop's infrastructure by an order of magnitude, and making embedded
    > systems nearly impossible to put together. If "one size fits all" were true,
    > my cell phone would be running Red Hat Enterprise.
    >
    > > If some devices that are even reasonably common (e.g. IDE drives) are
    > > stable, then some application developers or system integrators will
    > > work under the assumption of stability and whatever they build will
    > > break when you try it on different hardware.

    >
    > So you break the IDE drives to get laptop users to debug the Niagara set? The


    Breaking old behaviour is always bad... My computers with IDE
    interfaces still see stable "/dev/hda" devices. Are you saying the
    devices that used to be "hda" are now "sdb" ?? Maybe there is a
    .config option...

    > solution is to make the easy cases hard?


    Is it really that hard?

    > > Note that stable names are still a very real option. udev provides
    > > several. /dev/disk/by-path/XXX will be stable for lots of "screwed
    > > in" devices. /dev/disk/by-id will be stable for devices that report a
    > > unique id. etc.

    >
    > Here it's
    >
    > ls /dev/disk/by-path/
    > pci-0000:00:1f.2-scsi-0:0:0:0 pci-0000:00:1f.2-scsi-0:0:0:0-part4
    > pci-0000:00:1f.2-scsi-0:0:0:0-part1 pci-0000:00:1f.2-scsi-0:0:0:0-part5
    > pci-0000:00:1f.2-scsi-0:0:0:0-part2 pci-0000:00:1f.2-scsi-0:0:0:0-part6
    > pci-0000:00:1f.2-scsi-0:0:0:0-part3 pci-0000:00:1f.2-scsi-1:0:0:0
    >
    > And this is an improvement?


    Depends on your metric.

    "Easy to type" - I guess /dev/hda1 wins hands down.
    "Can be used in a script or config file and is guaranteed always to
    work until a screwdriver is used to change that device or its
    controller"
    I think
    /dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0-part1
    is quite acceptable.
    What is your metric?
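    To make the "usable in a script or config file" metric concrete, here is a small Python sketch that maps each link under /dev/disk/by-path to the kernel node it currently resolves to. It relies only on the symlinks udev creates; the `stable_links` helper name is illustrative, and on a system without that directory it simply returns an empty mapping.

```python
import os
from typing import Dict


def stable_links(link_dir: str = "/dev/disk/by-path") -> Dict[str, str]:
    """Map each udev link name in link_dir to the device node it points at now."""
    if not os.path.isdir(link_dir):
        return {}
    return {
        name: os.path.realpath(os.path.join(link_dir, name))
        for name in sorted(os.listdir(link_dir))
    }


if __name__ == "__main__":
    # The link name on the left is what a script or fstab would use;
    # only the right-hand side (sda1, sdb1, ...) may differ between boots.
    for name, node in stable_links().items():
        print(f"{name} -> {node}")
```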


    >
    > > The difference between IDE, SATA, SCSI and even USB is peripheral for
    > > the large majority of uses, and I think maintaining the distinction in
    > > the major/minor number or in the "primary" /dev name is - for the
    > > above reasons - more of a cost than a value.

    >
    > Is your definition of "the large majority of uses" where ncr Voyager, the
    > Amiga, and current macintosh laptops are all one use each, or is your
    > definition of "the large majority of uses" the one where each "use" is an
    > installation, of which there are millions of PCs (and even more ARM cell
    > phones), and something like three instances of Voyager?


    My definition of "the large majority of uses" is "mkfs, fsck, mount,
    fdisk, system-install-process".

    Different people differentiate devices in different ways. A system
    integrator might know about the hardware path. An end user might know
    about drive brands or sizes. A casual user might just think "internal
    or external". The kernel cannot support all these different
    approaches to naming. It really is best if it uses arbitrary names,
    and provides access to descriptions that the user can choose between.
    udev facilitates this with links in /dev/disk/. A system install can
    facilitate this even more by reporting size/manufacturer information etc.

    >
    > I realize that both views are valid. This is why the US has a house and a
    > senate, and filters things through both views. My gripe is that forcing my
    > laptop to look at my USB devices to find my SATA hard drive is aligned with
    > only one of those viewpoints, and completely opposed to the other.


    I'm guessing you are talking about mount-by-uuid? This effectively has
    to look at the filesystem of all devices to discover which one has the
    correct UUID, though it can cache the information for efficiency.

    Maybe it is just an implementation issue. Suppose that every time a
    device were discovered, it were examined to see what was stored on it,
    and this information was stored in a cache.
    Then to find a particular filesystem to mount, you just look in the
    cache and if the info isn't there yet, just wait or fail as
    appropriate.
    Then we don't "look at my USB devices to find my SATA hard drive" but
    rather "look at each device as it is attached to find out what is in
    it", which seems like a sensible thing to do...
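    Neil's "probe once on arrival, then look it up" idea can be sketched in Python as a cache keyed by filesystem UUID. This is only an illustration under stated assumptions: `FsCache`, `device_added` and `find` are hypothetical names, probing the device contents is reduced to a UUID string supplied by the caller, and the device names and UUIDs are made up.

```python
import threading
from typing import Dict, Optional


class FsCache:
    """Hypothetical cache: filesystem UUID -> device node, filled as devices appear."""

    def __init__(self) -> None:
        self._by_uuid: Dict[str, str] = {}
        self._cond = threading.Condition()

    def device_added(self, devnode: str, fs_uuid: Optional[str]) -> None:
        # Called once per device when it is discovered (e.g. a hotplug event).
        # fs_uuid stands in for whatever probing the device's contents found.
        if fs_uuid is None:
            return
        with self._cond:
            self._by_uuid[fs_uuid] = devnode
            self._cond.notify_all()

    def find(self, fs_uuid: str, timeout: float) -> Optional[str]:
        # Consult only the cache; never rescan other devices.
        # Wait up to `timeout` seconds for the filesystem to appear, else None.
        with self._cond:
            self._cond.wait_for(lambda: fs_uuid in self._by_uuid, timeout=timeout)
            return self._by_uuid.get(fs_uuid)


if __name__ == "__main__":
    cache = FsCache()
    cache.device_added("/dev/sdb1", "usb-stick-uuid")  # USB stick, probed on arrival
    # The root disk shows up a little later, from another thread.
    threading.Timer(0.1, cache.device_added, ("/dev/sda2", "root-uuid")).start()
    print(cache.find("root-uuid", timeout=2.0))
```

    The lookup itself never touches any device; the timeout is where "just wait or fail as appropriate" becomes a policy decision.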

    >
    > An approach that makes things much easier on laptops is seen to hurt big iron,
    > not because it the approach itself has a direct negative impact on big iron,
    > but only because then laptops are not saddled with the problems of big iron.


    I think your "laptops vs big iron" contrast is making the gap seem
    bigger than it really is. Naming issues are present in laptops and
    easily get significant in modest servers.

    >
    > Why do you allow uni-processor kernel builds then?


    Funny you should suggest that...
    I don't think OpenSuSE10.3 includes any UP kernels. There is code in
    the kernel which detects the single processor case and removes some of
    the more expensive "LOCK" operations to reduce the cost of using an SMP
    kernel on a UP computer.
    There is real value in reducing the number of options, and people have
    obviously put work into making that a cost-effective proposition.

    NeilBrown

  11. Re: What still uses the block layer?

    [adding back CCs which were dropped because I'm stupid - sorry!]

    On 10/16/07, Rob Landley wrote:
    > On Monday 15 October 2007 5:27:55 am Julian Calaby wrote:
    > > On 10/15/07, Rob Landley wrote:
    > > > On Monday 15 October 2007 4:06:20 am Julian Calaby wrote:
    > > > > On 10/15/07, Rob Landley wrote:
    > > > > > I note that the eth0 and eth1 names are dynamically assigned on a
    > > > > > first come first serve basis (like scsi). This never causes me a
    > > > > > problem because the driver loading order is constant, and once you
    > > > > > figure out that eth0 is gigabit and eth1 is the 802.11g it _stays_
    > > > > > that way across reboots, reliably. Yeah, it's a heuristic. Hands up
    > > > > > everybody relying on such a heuristic in the real world.
    > > > >
    > > > > Umm, not quite, from my experiences with pre-production wireless
    > > > > drivers, (another story, another time) fancy stuff is being done in
    > > > > udev to make sure that your gigabit card is always assigned to eth0.
    > > >
    > > > I remember building a 2.4 kernel, statically linking in all the drivers,
    > > > and getting the ethernet devices showing up in a reliable order for
    > > > years. Where does the need for fancy stuff come in?

    > >
    > > I remember that too. In fact, I have had no issues with network card
    > > enumeration order, outside my own inexperience and stupidity.
    > >
    > > However, this sort of thing is needed now because of the various types
    > > of hotpluggable networking devices, e.g. USB 802.11 cards, USB
    > > ethernet cards, PCMCIA, etc.

    >
    > I thought the strategy was just to scan the hotpluggable busses after the
    > non-hotpluggable busses.


    My (practical) experience is that I couldn't guarantee which card was
    which. (I remember once where it changed over a kernel re-compile) So
    my solution, before Debian's persistent naming scheme appeared, was to
    check it after every new kernel and make sure my config matched up
    with the names of the physical interfaces.

    > > And yes, PCMCIA worked fine for ages, but
    > > usually you'd never have more than one PCMCIA network card.

    >
    > Still don't, but presumably the slots are scanned in a reliable order so if
    > the cards are always present on bootup in the same slots, they'd stay in that
    > order.


    Well, yes and no. My gut feeling is that it's probed like PCI cards
    are. They're initialised when the drivers are loaded, and not before,
    as such, there are no guarantees which card will be initialised first.
    - and anyway, what happens if you plug them in in a different order?

    > > Personally, I use 2 different usb network cards, and I'm quite
    > > comforted to know that the 802.11a one is always wlan0, and the
    > > 802.11b/g one is always wlan1.

    >
    > So if I have a USB 100baseT adapter, and I boot with it plugged in, it'll
    > potentially come before my built-in wireless card due to ordering based on
    > device type?


    Ok, firstly the 100baseT adapter will be named something like ethX,
    the wireless card will most likely be named something like wlanX.

    Now let's say your laptop has a built in ethernet card.

    So, we'll assume a modular kernel, with the module "usbnet" for the
    usb card and "e100" for the onboard card:

    If the "usbnet" module is loaded first, then initially, according to
    the kernel, the usb card will be eth0 and the built in one eth1.

    Now let's assume that, on the PCI bus, the USB controller is in a
    lower slot number than the network card. (highly likely, given that
    the network card is most likely external to the chipset of the laptop)
    It's pretty likely that the USB controller will have its module
    loaded first, before the built-in network card. At this point, it'll
    send out hotplug events for all its children (root hubs, etc.) and
    eventually an event will be sent out for the usb network card. Now, at
    this point, it's impossible to say which one will claim eth0 first.

    Now, in my case, with my two wireless cards, what happens if I plug
    the 802.11b/g one in first? If this fancy renaming didn't happen, it'd
    end up with the name wlan0 and, hence, try to connect to the network
    which the 802.11a one is supposed to connect to.

    This is not a good thing.
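    The ordering race above can be shown with a toy Python model: kernel-style naming hands out eth0, eth1, ... in registration order, while a persistent rule table (keyed here on MAC address, an assumption made for illustration) gives the same card the same name whichever driver wins. The functions and MAC addresses are hypothetical.

```python
from typing import Dict, List


def first_come_names(arrivals: List[str], prefix: str = "eth") -> Dict[str, str]:
    """Kernel-style naming: the Nth interface to register gets prefixN."""
    return {mac: f"{prefix}{i}" for i, mac in enumerate(arrivals)}


def persistent_names(arrivals: List[str], rules: Dict[str, str],
                     prefix: str = "eth") -> Dict[str, str]:
    """Known MACs get their recorded name; unknown ones get the next unused name."""
    names: Dict[str, str] = {}
    used = set(rules.values())  # names reserved by rules, even if absent now
    next_free = 0
    for mac in arrivals:
        if mac in rules:
            names[mac] = rules[mac]
        else:
            while f"{prefix}{next_free}" in used:
                next_free += 1
            names[mac] = f"{prefix}{next_free}"
            used.add(names[mac])
    return names


if __name__ == "__main__":
    ONBOARD = "00:11:22:33:44:55"  # made-up MAC of the built-in card
    USB = "66:77:88:99:aa:bb"      # made-up MAC of the USB adapter
    rules = {ONBOARD: "eth0", USB: "eth1"}
    for order in ([ONBOARD, USB], [USB, ONBOARD]):
        # Name the built-in card gets under each scheme.
        print(first_come_names(order)[ONBOARD], persistent_names(order, rules)[ONBOARD])
    # eth0 eth0
    # eth1 eth0
```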

    I also have to make the point that this has been happening all over
    the kernel, well before I started using it. Video4Linux and DVB
    devices can be USB, and the order the /dev/videoX nodes appear in is
    determined by the plugging order. IRDA cards, sound cards, usb
    devices, framebuffers, mice, keyboards, loopback devices, etc. all
    have the same "issue". (and annoyingly, they all have different ways
    of getting around it, or not)

    And to make one final point, getting right back to the initial parts
    of the discussion, at the end of the day, your SATA disk, IDE disk,
    USB disk and the CF card in your camera are all mass storage devices -
    they all work in a fairly similar way. You want to mount filesystems
    from all of them, and when you run low level tools, like parted or
    whatever, you want them all to behave in the same way. If the kernel
    can abstract away the nastinesses of talking SATA, IDE, USB mass storage,
    or CF - and hence, make them all behave in the same, standard, way,
    why the hell not?

    Thanks,

    --

    Julian Calaby

    Email: julian.calaby@gmail.com

  12. Re: What still uses the block layer?

    On Tue, 16 Oct 2007, Neil Brown wrote:

    > On Monday October 15, rob@landley.net wrote:
    >>> Therefore it is best to not have stable single-number naming schemes
    >>> for any devices on any machines. Why? Because it ensures there will
    >>> not be any second class citizens.

    >>
    >> This is where we disagree. The existence of devices you cannot stably
    >> enumerate does not eliminate the existence of devices you trivially can.

    >
    > No, but it dramatically reduces the value of being able to enumerate
    > those devices.


    This is the point of disagreement. The devices you can trivially enumerate
    can be handled easily and trivially; the ones that you can't may require
    more complex things to handle them, but that depends on the situation. If
    you only have one USB drive on a system you don't need to worry about what
    order USB hotplug events come in if you can just say 'the first USB
    drive'. Mixing the different types of devices into one namespace
    complicates things in a couple of ways.

    1. Devices that used to have stable names no longer have stable names
    without extra effort.

    2. Having multiple separate unstable namespaces with one name in each of
    them looks to the user like a stable namespace, since the instability
    never comes into play. Combining these into a single namespace loses
    this stability.
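    A toy simulation can make point 2 concrete. It assumes a hypothetical helper, `assign_names`, that hands out letters first come first served in detection order, either per device type or in one merged `sd` namespace. The device labels and the per-type prefixes are made up for illustration; this is not kernel code.

```python
import string

def assign_names(detected, merged):
    """Assign first-come-first-served names in detection order.

    detected: list of (kind, label) pairs, e.g. ("hd", "internal-disk").
    merged:   if True, every kind shares one "sd" namespace; otherwise
              each kind uses its own (hypothetical) prefix and counter.
    Returns a dict mapping label -> assigned name.
    """
    next_index = {}  # namespace prefix -> next unused letter index
    names = {}
    for kind, label in detected:
        prefix = "sd" if merged else kind
        i = next_index.get(prefix, 0)
        next_index[prefix] = i + 1
        names[label] = prefix + string.ascii_lowercase[i]
    return names

# The same two devices, detected in both possible orders.
orders = [
    [("hd", "internal-disk"), ("usb", "usb-stick")],
    [("usb", "usb-stick"), ("hd", "internal-disk")],
]

for merged in (False, True):
    seen = {assign_names(order, merged)["internal-disk"] for order in orders}
    print("merged" if merged else "separate", sorted(seen))
```

    With one device per type, the internal disk gets the same name whichever device is detected first; in the merged namespace it becomes `sda` or `sdb` depending on detection order.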

    >>
    >> Pulling out the "IBM numa cluster with multiple SAS enclosures _and_ firewire"
    >> infrastructure to find the root partition on my hard drive may be good for
    >> the IBM numa clusters, but only at the expense of complicating this part of
    >> my laptop's infrastructure by an order of magnitude, and making embedded
    >> systems nearly impossible to put together. If "one size fits all" were true,
    >> my cell phone would be running Red Hat Enterprise.
    >>
    >>> If some devices that are even reasonably common (e.g. IDE drives) are
    >>> stable, then some application developers or system integrators will
    >>> work under the assumption of stability and whatever they build will
    >>> break when you try it on different hardware.

    >>
    >> So you break the IDE drives to get laptop users to debug the Niagara set? The

    >
    > Breaking old behaviour is always bad... My computers with IDE
    > interfaces still see stable "/dev/hda" devices. Are you saying the
    > devices that used to be "hda" are now "sdb" ?? Maybe there is a
    > .config option...


    Yes, this changed. If you run your IDE drives with the PATA drivers of
    libata they show up as sdX, and are subject to the same detection-order
    issues as any other sd device.

    >> solution is to make the easy cases hard?

    >
    > Is it really that hard?
    >
    >>> Note that stable names are still a very real option. udev provides
    >>> several. /dev/disk/by-path/XXX will be stable for lots of "screwed
    >>> in" devices. /dev/disk/by-id will be stable for devices that report a
    >>> unique id. etc.

    >>
    >> Here it's
    >>
    >> ls /dev/disk/by-path/
    >> pci-0000:00:1f.2-scsi-0:0:0:0 pci-0000:00:1f.2-scsi-0:0:0:0-part4
    >> pci-0000:00:1f.2-scsi-0:0:0:0-part1 pci-0000:00:1f.2-scsi-0:0:0:0-part5
    >> pci-0000:00:1f.2-scsi-0:0:0:0-part2 pci-0000:00:1f.2-scsi-0:0:0:0-part6
    >> pci-0000:00:1f.2-scsi-0:0:0:0-part3 pci-0000:00:1f.2-scsi-1:0:0:0
    >>
    >> And this is an improvement?

    >
    > Depends on your metric.
    >
    > "Easy to type" - I guess /dev/hda1 wins hands down.
    > "Can be used in a script or config file and is guaranteed always to
    > work until a screwdriver is used to change that device or its
    > controller"
    > I think
    > /dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0-part1
    > is quite acceptable.
    > What is your metric?


    Does it have to be one or the other? /dev/hda1 succeeded on both metrics.
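    To see what a by-path name actually encodes, here is a minimal Python sketch that splits one into its parts. The layout it expects (`pci-<PCI address>-scsi-<host:channel:target:lun>`, optionally followed by `-partN`) is inferred from the listing quoted above, and `parse_by_path` is a hypothetical helper, not part of udev.

```python
import re

# Assumed layout, inferred from the listing quoted above:
#   pci-<domain:bus:slot.func>-scsi-<host:channel:target:lun>[-part<N>]
BY_PATH_RE = re.compile(
    r"^pci-(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"-scsi-(?P<scsi>\d+:\d+:\d+:\d+)"
    r"(?:-part(?P<part>\d+))?$"
)

def parse_by_path(name):
    """Split a /dev/disk/by-path entry into PCI address, SCSI address, partition."""
    m = BY_PATH_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised by-path name: {name!r}")
    host, channel, target, lun = (int(x) for x in m.group("scsi").split(":"))
    part = int(m.group("part")) if m.group("part") else None
    return {
        "pci": m.group("pci"),
        "scsi": (host, channel, target, lun),
        "partition": part,  # None means the whole disk
    }

print(parse_by_path("pci-0000:00:1f.2-scsi-0:0:0:0-part1"))
print(parse_by_path("pci-0000:00:1f.2-scsi-1:0:0:0"))
```

    The name records where the device is attached rather than what it is, which is why it stays put across reboots and also why it is so unpleasant to type.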


    >>> The difference between IDE, SATA, SCSI and even USB is peripheral for
    >>> the large majority of uses, and I think maintaining the distinction in
    >>> the major/minor number or in the "primary" /dev name is - for the
    >>> above reasons - more of a cost than a value.

    >>
    >> Is your definition of "the large majority of uses" where NCR Voyager, the
    >> Amiga, and current Macintosh laptops are all one use each, or is your
    >> definition of "the large majority of uses" the one where each "use" is an
    >> installation, of which there are millions of PCs (and even more ARM cell
    >> phones), and something like three instances of Voyager?

    >
    > My definition of "the large majority of uses" is "mkfs, fsck, mount,
    > fdisk, system-install-process".
    >
    > Different people differentiate devices in different ways. A system
    > integrator might know about the hardware path. An end user might know
    > about drive brands or sizes. A casual user might just think "internal
    > or external". The kernel cannot support all these different
    > approaches to naming. It really is best if it uses arbitrary names,
    > and provides access to descriptions that the user can choose between.
    > udev facilitates this with links in /dev/disk/. A system install can
    > facilitate this even more by reporting size/manufacturer information etc.


    But is the possibility of wanting different options really sufficient
    reason to eliminate every stable option? Right now the /dev names are
    essentially random without external help. Why couldn't they be stable (in
    all cases where that is possible), letting people who are happy with the
    defaults skip the external helpers while leaving them as options for
    people who do want things to be different?

    >>
    >> I realize that both views are valid. This is why the US has a house and a
    >> senate, and filters things through both views. My gripe is that forcing my
    >> laptop to look at my USB devices to find my SATA hard drive is aligned with
    >> only one of those viewpoints, and completely opposed to the other.

    >
    > I'm guessing you are talking about mount-by-uuid? This effectively has
    > to look at the filesystem of all devices to discover which one has the
    > correct UUID, though it can cache the information for efficiency.
    >
    > Maybe it is just an implementation issue. Suppose that every time a
    > device were discovered, it were examined to see what was stored on it,
    > and this information was stored in a cache.
    > Then to find a particular filesystem to mount, you just look in the
    > cache and if the info isn't there yet, just wait or fail as
    > appropriate.
    > Then we don't "look at my USB devices to find my SATA hard drive" but
    > rather "look at each device as it is attached to find out what is in
    > it", which seems like a sensible thing to do...


    This would still require spinning up every drive and looking at it to find
    the UUID.
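    Neil's proposal (probe each device once as it appears, record what is on it, and answer mount requests from that record) can be sketched as a small cache. Everything here is hypothetical: `DeviceCache`, `on_device_added`, `find`, and the UUID strings are illustrative names and values, not udev or mount interfaces, and reading the UUID from the superblock is stubbed out.

```python
import threading

class DeviceCache:
    """Hypothetical discovery-time cache mapping filesystem UUID -> device node."""

    def __init__(self):
        self._by_uuid = {}
        self._cond = threading.Condition()

    def on_device_added(self, node, uuid):
        # Called once per device as it is discovered; a real system would
        # read the filesystem superblock here to learn the UUID.
        with self._cond:
            self._by_uuid[uuid] = node
            self._cond.notify_all()

    def find(self, uuid, timeout):
        """Return the node holding uuid, waiting up to timeout seconds; None on failure."""
        with self._cond:
            self._cond.wait_for(lambda: uuid in self._by_uuid, timeout=timeout)
            return self._by_uuid.get(uuid)

cache = DeviceCache()
cache.on_device_added("/dev/sda2", "1234-abcd")
print(cache.find("1234-abcd", timeout=0.1))  # already cached
print(cache.find("ffff-0000", timeout=0.1))  # never attached: gives up
```

    As David points out, this does not avoid probing: every device is still read once when it appears. The cache only moves that cost from mount time to discovery time.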

    >>
    >> An approach that makes things much easier on laptops is seen to hurt big iron,
    >> not because the approach itself has a direct negative impact on big iron,
    >> but only because then laptops are not saddled with the problems of big iron.

    >
    > I think your "laptops vs big iron" contrast is making the gap seem
    > bigger than it really is. Naming issues are present in laptops and
    > easily get significant in modest servers.


    Maybe it's because I've been using Linux for so long (since before 1.0),
    but I have not been seeing the same thing. It's possible that none of the
    several hundred servers I've built and managed have been big enough to
    have the problems that you describe, but the recent 'fixes' for these
    problems have been more painful for me than the original problems.

    Yes, I have had kernel upgrades that changed the link order of drivers and
    I've had to deal with that, but I still have that problem today, with udev
    and friends involved. I recently was installing Linux onto machines with
    multiple SCSI controllers and had all sorts of fun because the install
    disk detection order wasn't the same as the installed kernel detection
    order, causing the installer to decide the wrong drive was the boot drive
    and put the boot loader in the wrong place (and this happened for multiple
    distros). To get things working I finally did the install, then dug up my
    old Slackware boot disks to get into the system and manually install the
    boot loader to fix things up.

    I've also had problems with distro boot systems not working with labels
    because there were too many drives in the system and it gave up before
    checking far enough to find the root partition (on that machine the root
    partition was sdr2).

    >> Why do you allow uni-processor kernel builds then?

    >
    > Funny you should suggest that...
    > I don't think OpenSuSE10.3 includes any UP kernels. There is code in
    > the kernel which detects the single processor case and removes some of
    > the more expensive "LOCK" operations to reduce the cost of using an SMP
    > kernel on a UP computer.
    > There is real value in reducing the number of options, and people have
    > obviously put work into making that a cost-effective proposition.


    But there's a huge difference between a distro deciding to not include UP
    kernels and removing the option to build a UP kernel from the kernel
    entirely. Nobody is saying that Ubuntu (or any other distro) should be
    prohibited from making everything SMP, or i686; we are just saying that
    the option to compile something UP or i486 should not be removed just
    because distros don't choose to use them much. (Has the i386 option been
    completely eradicated yet, or is it still hanging on?)

    David Lang


  13. Re: What still uses the block layer?

    On Mon, 15 Oct 2007, Theodore Tso wrote:

    > On Mon, Oct 15, 2007 at 03:04:00AM -0500, Rob Landley wrote:
    >
    >>> just
    >>> as Ethernet and PPP interfaces really are fundamentally the same
    >>> thing.

    >>
    >> They're the same thing?
    >>
    >> Do you mean that on a system with both, going:
    >> ifconfig eth1 66.92.53.140
    >> ifconfig ppp 192.168.0.42
    >>
    >> Would be functionally equivalent to:
    >> ifconfig eth1 192.168.0.42
    >> ifconfig ppp 66.92.53.140

    >
    > No, of course not. But we don't have separate IP stacks for ethernet
    > and ppp devices. And how we connect to a host via ssh makes no
    > difference whether we accessed it via Ethernet or PPP. And I would
    > argue that how we address a filesystem should also make no difference
    > depending on the path to the hard drive.


    I think a close analogy would be that after a partition is mounted you
    don't need to know the path to the hard drive, and that is already true
    today. When you mount a drive (or assign an IP address to a network
    interface) the path to the device not only matters, it's critical.

    >> By the way, ethernet cards contain a unique MAC address. Hard
    >> drives do not seem to, or if they do it's not being consistently
    >> exposed in a way I can find.

    >
    > You can pull a Model and Serial number via hdparm -i, but it's not as
    > easy to manipulate as a fixed-length MAC address. That's why people
    > tend to use filesystem UUIDs.
    >
    >>> More to the point, with SATA, hot plugging has been designed in, so
    >>> probing order is not going to be well defined,

    >>
    >> The spec may define the capability to hotplug, but your average
    >> laptop doesn't offer the capability to hotplug anything into its
    >> SATA controllers. The hard drive is screwed in (due to the
    >> portability part of laptopness), all the controllers wired onto the
    >> motherboard are accounted for, none are exposed externally. What
    >> _is_ exposed externally is USB, and if you want to add an extra hard
    >> drive you can buy a cheap USB one at Fry's.

    >
    > That may be true for laptops today, but Linux doesn't run just on
    > laptops. You can easily get home servers with hot-swap SATA bays. My
    > home fileserver, which is a white box I purchased on my own nickel,
    > NOT IBM big iron, has 3TB of raw storage and cost less than $10,000 a year
    > ago. Today, that amount of home storage with hot-swap SATA drives and
    > a battery-backed hardware RAID controller could probably be purchased
    > for about half that price.


    I also have a 3TB RAID I built at home; it uses 3ware cards and a dozen
    300G IDE drives. Since the 3ware driver is classified as SCSI, if a drive
    fails all the other drives get renumbered on the next boot and it's
    painful to figure out which drive has a problem. I have to reboot and go
    into the 3ware BIOS to figure out which drive isn't reporting. This system
    also has an Adaptec RAID card in it and an Adaptec regular SCSI card. The
    fact that these three cards take different drivers, and so the order of
    detection changes the drive numbering, is a real pain when I'm installing a
    new distro onto it. Once I get it installed I compile my own monolithic
    kernel and this problem stops, because the kernel linking order determines
    the detection order.

    This replaced a 1.2TB RAID that I had just about filled up and that then
    started having age-related drive failures. It used 8 160G IDE drives, and
    when I had problems with a drive it was easy to see that /dev/hdk was
    missing from the set, and I was still able to have a removable drive bay for
    /dev/hdc that I could hook my TiVo drive into (on a reboot for safety) and
    not have things go haywire if I left the bay empty (or switched off) when
    I booted.

    This may not be hundreds of drives, but it should be enough to show that I
    have experienced the pain that some people claim is the reason all of this
    must be dynamic with a userspace helper to sort it all out. My take is
    that adding the userspace helper and not enumerating things that are easy
    to enumerate is making things worse, not better.

    > And even for laptops, if you need the performance, you can get Cardbus
    > cards that will allow you to connect eSATA drives to your laptop at
    > Fry's.
    >
    > So even if you ignore "big data center" interconnects like FC, this
    > problem exists even for commodity grade SATA devices.


    But these are separate SATA buses. While you could run into ordering
    issues if you hook multiple devices to one bus, you should be able to have
    no ordering issues if you don't have more than one device of a type on any
    one bus (you could have a SATA hard drive on the internal PCI controller,
    and another one on the Cardbus controller, but if you always order
    directly connected devices before Cardbus-connected devices they will
    always show up in the same order).
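    David's ordering rule can be sketched as a sort key: rank directly attached controllers ahead of Cardbus ones, then order by a fixed bus address. The `connection` labels, the rank table, and the PCI addresses below are hypothetical; the sketch only shows that names derived from where a device is attached do not depend on which device the kernel happens to find first.

```python
import string

# Hypothetical ranking: lower ranks are named first.
CONNECTION_RANK = {"onboard": 0, "cardbus": 1}

def stable_names(devices, prefix="sd"):
    """Name devices by attachment point rather than by detection order.

    devices: iterable of dicts with keys "connection" and "bus"; "bus" is
    assumed to be a fixed address such as a PCI slot string.
    """
    ordered = sorted(devices, key=lambda d: (CONNECTION_RANK[d["connection"]], d["bus"]))
    return {d["bus"]: prefix + string.ascii_lowercase[i] for i, d in enumerate(ordered)}

internal = {"connection": "onboard", "bus": "0000:00:1f.2"}
external = {"connection": "cardbus", "bus": "0000:03:00.0"}

# Same result whichever device is detected first.
print(stable_names([external, internal]))
print(stable_names([internal, external]))
```

    This only holds while there is at most one device per bus; two disks behind the same controller would still need some other tie-breaker, which is David's point about category #1 versus #2.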

    >> It's necessary for IBM big iron to do this. It's generally not
    >> necessary for laptops or embedded systems to do this if they
    >> distinguish between _types_ of devices, which is something they
    >> until recently did for the types of devices I was interested in, and
    >> something they _stopped_ doing when everything got merged into the
    >> scsi layer, and I consider this a regression.

    >
    > As another example, it's easy to see a home media server running Linux
    > which doesn't have any expansion bays for additional hard drives, so
    > the only way a user could expand their storage is by using one or more
    > permanently connected USB disks. So we do need to solve the general
    > device enumeration problem in the general case; it's not just the case
    > of IBM "big iron" as you seem to think.


    There are two separate problems here.

    1. How to enumerate devices that have a repeatable, stable address.

    2. How to enumerate devices that do not.

    Nobody is saying that there are no cases of #2 and that there is no need
    to address that problem. What I, and I think others, are saying is that the
    solutions to #2 are not perfect, and while they are a reasonable fit for
    that case, they are in many ways inferior to simple enumeration for
    devices in category #1.

    >> No, distinguishing between types of devices is not a perfect
    >> solution to device enumeration, but it was sufficient for all my use
    >> cases for many years, and would still be if the kernel still did it,
    >> and I'm not alone here.

    >
    > News flash! The kernel wasn't built just for you, and over time, more
    > and more people will have multiple disk drives of the same type, so we
    > will need to solve the device naming problem sooner or later. Why not
    > solve it sooner, especially given that a number of companies (not just
    > IBM) that fund the organization paying *your* salary are
    > interested in solving the general case?


    The kernel wasn't just built for people who have dozens or hundreds of
    devices on busses that make enumeration impossible either, so why should
    their requirements be the only ones considered?

    (By the way, I think the crack about who is paying Rob's salary is a
    little below the belt.)

    > Furthermore, I've already pointed a number of situations where the
    > home user might have multiple USB devices on their system today, and
    > that is probably going to go up over time, not down. Have you seen
    > how cheap 500GB USB disks are at Costco? And for a typical
    > unsophisticated user, plugging in another 500G USB disk when they need
    > more storage is a lot easier than cracking open the computer case and
    > futzing with screws and disk cables and power connectors.


    So let USB devices use 'best guess' naming and let other devices use
    names based on their fixed addresses/hardware paths.

    You could use the suggestion made by Stefan Richter in Message-ID:
    <47139F15.7050702@s5r6.in-berlin.de> that lets the driver suggest a name
    if the system hasn't chosen to override it. Since distros look for
    /dev/sd* it should even be able to work without breaking new installs (the
    transition would break existing installs, so it would need to be optional).

    David Lang


  14. Re: What still uses the block layer?

    On Mon, 15 Oct 2007, Greg KH wrote:

    > On Mon, Oct 15, 2007 at 05:08:36AM -0500, Rob Landley wrote:
    >> On Monday 15 October 2007 4:06:20 am Julian Calaby wrote:
    >>> On 10/15/07, Rob Landley wrote:
    >>>> I note that the eth0 and eth1 names are dynamically assigned on a first
    >>>> come first serve basis (like scsi). This never causes me a problem
    >>>> because the driver loading order is constant, and once you figure out
    >>>> that eth0 is gigabit and eth1 is the 80211g it _stays_ that way across
    >>>> reboots, reliably. Yeah, it's a heuristic. Hands up everybody relying on
    >>>> such a heuristic in the real world.
    >>>
    >>> Umm, not quite, from my experiences with pre-production wireless
    >>> drivers, (another story, another time) fancy stuff is being done in
    >>> udev to make sure that your gigabit card is always assigned to eth0.

    >>
    >> I remember building a 2.4 kernel, statically linking in all the drivers, and
    >> getting the ethernet devices showing up in a reliable order for years. Where
    >> does the need for fancy stuff come in?

    >
    > Because PCI devices reorder their bus numbers all the time. And we have
    > ethernet devices hanging off of USB connections now (yes, even built-in
    > to the machine), and we have network connections on other hot-pluggable
    > busses (remember, PCI is hot pluggable.)


    Do PCI devices reorder their bus numbers spontaneously, or only if you
    change the hardware?

    > So, the distros need to name network devices in a persistent way, that
    > is why the distros now do this. If you don't like the distro doing it,
    > complain to them, it's not a kernel issue


    I have; at least the response was to tell me how to kill this 'feature',
    even if they won't change it.

    David Lang


  15. Re: What still uses the block layer?

    On Mon, 15 Oct 2007, Stefan Richter wrote:

    > Subject: Re: What still uses the block layer?
    >
    > Matthew Wilcox wrote:
    >> On Mon, Oct 15, 2007 at 04:26:04AM -0500, Rob Landley wrote:
    >>> Combining USB and IDE into the same /dev/sd? namespace makes enumerating the
    >>> IDE devices much harder than in the traditional "/dev/hdb doesn't move
    >>> without a screwdriver" model. The merger creates a new problem for IDE, one
    >>> which didn't exist before: the addition or removal of other unrelated types
    >>> of devices may change this device's location next boot. It may be possible
    >>> to add additional complication to the system to compensate, but what was the
    >>> advantage of merging the namespaces in the first place?

    >>
    >> It's not something anyone particularly set out to do, it's just how
    >> it worked out. It was justified by saying "ok, this goes from a 99%
    >> solution to a 96% solution, but there's a 100% solution called uuids".
    >> I don't particularly agree with this line of argumentation, but it did
    >> hold sway.

    >
    > Low-level networking drivers suggest a default interface name (per
    > interface or as a template like eth%d into which the networking core
    > inserts the lowest spare number). Userspace can rename interfaces, but
    > nevertheless it's nice to have different default kernel names for
    > ethernet, wlan etc..
    >
    > Could low-level SCSI drivers provide similar name templates which give a
    > hint on the transport involved? It's a bit more difficult than with
    > networking interfaces though because
    > - SCSI devices can have sd, sr, st, osst, ch, sg interfaces,
    > - SCSI device files share a namespace with all other device files.
    >
    > E.g.
    > /dev/sd-ide-b - second IDE HDD,
    > /dev/sd-iscsi-e - fifth iSCSI direct access device,
    > /dev/sr-sata-0 - first SATA CD-ROM,
    > /dev/sr-usb-0 - a USB CD-ROM,
    > /dev/st-fw-0 - a FireWire tape drive,
    > /dev/sda - a device whose transport driver didn't propose a name
    >
    > Of course the really interesting names will still be provided by
    > udev-generated symlinks.


    This is a nice option, and since most of the existing userspace code is
    looking for /dev/sd*, /dev/sr*, etc. this should be able to work for new
    installs with no userspace changes. Since it would break existing installs
    it would need to be optional.
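    Stefan's template idea borrows the rule he describes for network interfaces: fill a template such as `eth%d` with the lowest number not already taken. A minimal sketch of that rule applied to his proposed transport-tagged names; `alloc_name` is a hypothetical helper, not the networking core's implementation.

```python
def alloc_name(template, in_use):
    """Fill a printf-style template such as "eth%d" with the lowest spare number."""
    n = 0
    while template % n in in_use:
        n += 1
    return template % n

in_use = set()
for template in ("sr-sata-%d", "sr-usb-%d", "sr-sata-%d"):
    name = alloc_name(template, in_use)
    in_use.add(name)
    print(name)  # sr-sata-0, then sr-usb-0, then sr-sata-1

# When a device goes away, its number becomes the lowest spare one again.
in_use.discard("sr-sata-0")
print(alloc_name("sr-sata-%d", in_use))  # sr-sata-0
```

    Because each transport gets its own counter, plugging in a USB CD-ROM no longer shifts the number of a SATA one, although two devices on the same transport still depend on detection order.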

    One other option that could be considered (and I do realize I'm bringing
    up flame-bait here) is that drivers that have fixed addresses could offer
    up a device name that includes that address.
    For example, depending on the config option a device could show up as either sda,
    sd-scsi-a, sd-scsi-0:0:0:0, or even sd-scsi-

    If the driver or bus doesn't have a real numbering, it wouldn't invent a
    fake one (which is a big problem with most of the prior suggestions that
    have tried to offer a numbering option), it would just offer the most
    specific information it has.
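    The "offer the most specific information it has, never invent an address" rule can be sketched as a small naming policy. The `policy` values stand in for the hypothetical config option, and `propose_name` and its parameters are illustrative only, not an existing kernel interface.

```python
import string

def propose_name(kind, transport, index, scsi_address=None, policy="address"):
    """Propose a device name; index is the first-come-first-served slot.

    policy "classic"   -> sda, sdb, ...
    policy "transport" -> sd-scsi-a, sd-usb-b, ...
    policy "address"   -> sd-scsi-0:0:0:0 when a fixed address is known,
                          otherwise fall back to the transport form rather
                          than inventing a fake address.
    """
    letter = string.ascii_lowercase[index]
    if policy == "classic":
        return f"{kind}{letter}"
    if policy == "address" and scsi_address is not None:
        return f"{kind}-{transport}-{scsi_address}"
    return f"{kind}-{transport}-{letter}"

print(propose_name("sd", "scsi", 0, policy="classic"))        # sda
print(propose_name("sd", "scsi", 0, policy="transport"))      # sd-scsi-a
print(propose_name("sd", "scsi", 0, scsi_address="0:0:0:0"))  # sd-scsi-0:0:0:0
print(propose_name("sd", "usb", 1))                           # sd-usb-b (no fixed address)
```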

    David Lang

  16. Re: OOM killer gripe (was Re: What still uses the block layer?)

    Nick Piggin writes:

    > On Monday 15 October 2007 18:04, Rob Landley wrote:
    >> On Sunday 14 October 2007 8:45:03 pm Theodore Tso wrote:

    >
    >> > > excuse for conflating different categories of devices in the first
    >> > > place.
    >> >
    >> > See the thinkpad Ultrabay drive example above.

    >>
    >> Last week I drove my laptop so deep into swap (with a "make -j" on qemu)
    >> that after half an hour trying to repaint my kmail window, it locked solid.
    >> Again. You'd think the oom killer would come to the rescue, but it didn't.
    >> Maybe Ubuntu disabled it. I have _2_gigs_ of ram in this sucker, on a
    >> stock Ubuntu 7.04 install (with the "upgrade all" tab pressed a few times),
    >> and yet I managed to make it swap itself to death one more time.
    >>
    >> Virtual memory isn't perfect. I've _always_ been able to come up with
    >> examples where it just doesn't work for me. This doesn't mean VM
    >> overcommit should be abolished, because it's useful more often than not.

    >
    > I hate to go completely offtopic here, but disks are so incredibly
    > slow when compared to RAM that there is really nothing the kernel
    > can do about this. Presumably the job will finish, given infinite
    > time.
    >
    > How much swap do you have configured? You really shouldn't configure
    > so much unless you do want the kernel to actually use it all, right?


    No.

    There are three basic swapping scenarios.
    - Pushing unused data out of ram
    - Swapping
    - Thrashing

    To effectively swap you need SWAP > RAM because after a little while of
    swapping all of your pages in RAM should be assigned a location in the
    page cache.

    I have not heard of many people swapping and not thrashing lately.
    I think part of the problem is that we do random access to the swap
    partition, which makes us seek limited. And since the number of
    seeks per unit time has been increasing at a linear or slower rate,
    if we are doing random disk I/O then the amount we can use
    the disk for is very limited. I wonder if we could figure out
    how to push and pull 1M or bigger chunks into and out of swap?
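    A back-of-the-envelope calculation shows why page-sized random swap I/O is seek limited and why bigger chunks help. The disk figures (100 random seeks per second, 50 MB/s sequential transfer) are assumptions chosen for round arithmetic, not measurements from this thread, and the model ignores rotational and queueing effects.

```python
def swap_throughput(seeks_per_sec, chunk_bytes, seq_bytes_per_sec):
    """Bytes/second if every chunk costs one seek plus its transfer time."""
    time_per_chunk = 1.0 / seeks_per_sec + chunk_bytes / seq_bytes_per_sec
    return chunk_bytes / time_per_chunk

SEEKS = 100          # assumed random seeks per second
SEQ = 50 * 10**6     # assumed sequential transfer rate, bytes/s

for chunk in (4 * 1024, 1024 * 1024):
    mb_s = swap_throughput(SEEKS, chunk, SEQ) / 10**6
    print(f"{chunk // 1024:5d} KiB chunks: {mb_s:.2f} MB/s")
```

    With these assumed figures 1 MiB chunks move roughly eighty times more data per second than 4 KiB pages, which is Eric's point; Nick's caveat further down is that the extra data only helps if much of it is actually used.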

    I don't know if swap has actually worked since vmscan stopped
    going over the virtual addresses.

    > Because if we're not really conservative about OOM killing, then the
    > user who actually really did want to use all the swap they configured
    > gets angry when we kill their jobs without using it all.


    I totally agree. The fact that the OOM killer started is a sign that
    the system was completely overwhelmed and nothing better could happen.

    In this case my gut feel says limiting the total number of processes
    would have been much more effective than anything at all to do with
    swap. make -j reminds me of the classic fork bomb.

    > Would an oom-kill-someone-now sysrq be of help, I wonder?


    Well, we have SAK, which should kill everything on your current VT,
    which should include X and all of its children.

    Eric

  17. Re: OOM killer gripe (was Re: What still uses the block layer?)

    On Mon, 15 Oct 2007, Eric W. Biederman wrote:

    > Nick Piggin writes:
    >
    >> How much swap do you have configured? You really shouldn't configure
    >> so much unless you do want the kernel to actually use it all, right?

    >
    > No.
    >
    > There are three basic swapping scenarios.
    > - Pushing unused data out of ram
    > - Swapping
    > - Thrashing
    >
    > To effectively swap you need SWAP > RAM because after a little while of
    > swapping all of your pages in RAM should be assigned a location in the
    > page cache.


    On some kernel versions you are correct about needing swap > ram, but on
    current versions you are not. The swap space gets allocated as needed, and
    re-used as needed (I don't know the mechanism of this, but I remember the
    last time this changed from vm=max(ram,swap) to vm=ram+swap).
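    The two accounting rules David recalls can be compared with a trivial calculation. The formulas are taken from his description rather than from kernel source, and the 2 GB RAM / 1 GB swap figures are arbitrary examples.

```python
def old_limit(ram, swap):
    # Older rule, as David recalls it: vm = max(ram, swap)
    return max(ram, swap)

def new_limit(ram, swap):
    # Current rule, as David recalls it: vm = ram + swap
    return ram + swap

ram_gb, swap_gb = 2, 1
print(old_limit(ram_gb, swap_gb), new_limit(ram_gb, swap_gb))  # 2 3
```

    Under the old rule swap smaller than RAM adds nothing to the total, which is consistent with the old SWAP > RAM advice; under the newer rule any amount of swap adds to it.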

    > I have not heard of many people swapping and not thrashing lately.
    > I think part of the problem is that we do random access to the swap
    > partition which makes us seek limited. And since the number of
    > seeks per unit time has been increasing at a linear or slower rate
    > that if we are doing random disk I/O then the amount we can use
    > the disk for is very limited. I wonder if we could figure out
    > how to push and pull 1M or bigger chunks into and out of swap?


    It has been noted by many people that Linux is very slow to pull things
    back into RAM from swap, significantly slower than simple seek limiting
    would seem to account for.

    David Lang

  18. Re: What still uses the block layer?

    On Mon, Oct 15, 2007 at 07:54:22PM -0700, david@lang.hm wrote:
    > do PCI devices reorder their bus numbers spontaniously, or only if you
    > change the hardware?


    The only system I've had that reordered PCI bus numbers was when I had a
    partitionable system and changed the partitioning. Not quite "change
    the hardware", but neither was it "spontaneous". It was certainly
    unexpected (for me).

    Greg probably has quite different examples.

    --
    Intel are signing my paycheques ... these opinions are still mine
    "Bill, look, we understand that you're interested in selling us this
    operating system, but compare it to ours. We can't possibly take such
    a retrograde step."

  19. Re: OOM killer gripe (was Re: What still uses the block layer?)

    On Tuesday 16 October 2007 13:55, Eric W. Biederman wrote:
    > Nick Piggin writes:


    > > How much swap do you have configured? You really shouldn't configure
    > > so much unless you do want the kernel to actually use it all, right?

    >
    > No.
    >
    > There are three basic swapping scenarios.
    > - Pushing unused data out of ram
    > - Swapping
    > - Thrashing
    >
    > To effectively swap you need SWAP > RAM because after a little while of
    > swapping all of your pages in RAM should be assigned a location in the
    > page cache.


    I don't follow your logic. We don't need SWAP > RAM in order to swap
    effectively, IMO.


    > I have not heard of many people swapping and not thrashing lately.
    > I think part of the problem is that we do random access to the swap
    > partition, which makes us seek limited. And since the number of
    > seeks per unit time has been increasing at a linear or slower rate,
    > if we are doing random disk I/O then the amount we can use


    I don't know if there is a causal relationship there. I mean, I
    think it's been a long time since thrashing was ever a viable mode
    of operation, right?

    Maybe desktops just have less need for swapping now, so nobody sees
    it much until something goes _really_ bad. When I'm using my 256MB
    machine, unused stuff goes to swap.


    > the disk for is very limited. I wonder if we could figure out
    > how to push and pull 1M or bigger chunks into and out of swap?


    Pulling in 1MB pages can really easily end up compounding the
    thrashing problem unless you're very sure a significant amount
    of it will be used.


    > I don't know if swap has actually worked since vmscan stopped
    > going over the virtual addresses.


    I do, and it does.


    > > Because if we're not really conservative about OOM killing, then the
    > > user who actually really did want to use all the swap they configured
    > > gets angry when we kill their jobs without using it all.

    >
    > I totally agree. The fact that the OOM killer started is a sign that
    > the system was completely overwhelmed and nothing better could happen.
    >
    > In this case my gut feel says limiting the total number of processes
    > would have been much more effective than anything at all to do with
    > swap. make -j reminds me of the classic fork bomb.


    Yep.


    > > Would an oom-kill-someone-now sysrq be of help, I wonder?

    >
    > Well, we have SAK, which should kill everything on your current VT,
    > which should include X and all of its children.


    Which is exactly what you don't want to do if you've just forkbombed
    yourself. I missed the fact that we now have a manual oom kill...

  20. Re: What still uses the block layer?

    On Mon, 15 Oct 2007 22:04:01 -0600
    Matthew Wilcox wrote:

    > On Mon, Oct 15, 2007 at 07:54:22PM -0700, david@lang.hm wrote:
    > > do PCI devices reorder their bus numbers spontaniously, or only if
    > > you change the hardware?

    >
    > The only system I've had that reordered PCI bus numbers was when I
    > had a partitionable system and changed the partitioning. Not quite
    > "change the hardware", but neither was it "spontaneous". It was
    > certainly unexpected (for me).
    >


    A very common one is booting your laptop docked (a real dock, not just
    a port extender) versus non-docked.
