
Thread: Rearranging layout of code in the scheduler

  1. Rearranging layout of code in the scheduler

    Hello,

    Before I dive in, I should probably justify my motivations for writing
    this email. I'm working away on implementing an EDF scheduler for real
    time tasks in the kernel. This again leads to hacking at the existing
    source as I'm not about to toss out the entire scheduler - just replace
    (by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at
    EDF, I think the answer to that is a bit too long (and not appropriate
    for this email anyway) so I'll leave that part out.

    However, what I do mean to discuss is the current state of the scheduler
    code. Working through the code, I must say I'm really impressed. The
    code is clean, it is well thought out and the new approach with
    sched_class and sched_entity makes it very modular. However, digging
    deeper, I find myself turning more and more desperate, looking over my
    shoulder for a way out.

    Now, I'm in no doubt that the code *is* modular, that it *is* clean and
    tidy, but coming from outside, it is not that easy to grasp it all. And,
    it is not just the sheer size and complexity of the scheduler itself,
    but also a lot with how the code is arranged.

    For instance, functions like free_fair_sched_group,
    alloc_fair_sched_group etc - do IMHO belong in sched_fair.c and not in
    sched.c. The same goes for several rt-functions and structs.

    So, if one drew up a list over all events that would cause functions in
    sched.c to be called, this could be used to make a minimized 'interface'
    and then let the scheduler call the appropriate function for the given
    class in the same fashion sched_tick is used today.

    What I would like, is to rip out all the *actual* scheduling logic and
    put this in sched_[fair|rt].c and let sched be purely event-driven
    (which would be a nice design goal in itself). This would also lead to
    sched_[fair|rt].h, where the structs, macros, defines etc can be
    defined. Today these are defined just about everywhere, making the
    code unnecessarily complicated (I'm not going to say messy since I'm not
    *that* senior to kernel coding :-))

    Why not use the sched_class for all that it's worth - make the different
    classes implement a set of functions and let sched.c be oblivious to
    what's going on other than turning the machinery around?
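
    [ Illustration only: a deliberately simplified sketch of the kind of
      per-class hook table meant here. The names and signatures below are
      invented and much smaller than the real struct sched_class; the point
      is just that sched.c would only ever call through such a table and
      never peek inside a class. ]

        struct rq;
        struct task_struct;

        struct sched_class_ops {
                void (*enqueue_task)(struct rq *rq, struct task_struct *p);
                void (*dequeue_task)(struct rq *rq, struct task_struct *p);
                struct task_struct *(*pick_next_task)(struct rq *rq);
                void (*put_prev_task)(struct rq *rq, struct task_struct *p);
                void (*task_tick)(struct rq *rq, struct task_struct *p);
        };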

    Is this something worth pursuing? I mean, the scheduler *does* work, and
    if it ain't broken, don't fix it. However, I have a strong feeling that
    this can be done a lot cleaner, not to mention, changing from one type
    of scheduler to another will be much easier. :-)

    --
    med Vennlig Hilsen - Yours Sincerely
    Henrik Austad



  2. Re: Rearranging layout of code in the scheduler

    On Tue, 2008-10-28 at 16:34 +0100, Henrik Austad wrote:
    > Hello,
    >
    > Before I dive in, I should probably justify my motivations for writing
    > this email. I'm working away on implementing an EDF scheduler for real
    > time tasks in the kernel. This again leads to hacking at the existing
    > source as I'm not about to toss out the entire scheduler - just replace
    > (by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at
    > EDF, I think the answer to that is a bit too long (and not appropriate
    > for this email anyway) so I'll leave that part out.


    You and a few other folks. The most interesting part of EDF is not the
    actual scheduler itself (although there are fun issues with that as
    well), but extending the Priority Inheritance framework to deal with all
    the fun cases that come with EDF.

    > However, what I do mean to discuss is the current state of the scheduler
    > code. Working through the code, I must say I'm really impressed. The
    > code is clean, it is well thought out and the new approach with
    > sched_class and sched_entity makes it very modular. However, digging
    > deeper, I find myself turning more and more desperate, looking over my
    > shoulder for a way out.
    >
    > Now, I'm in no doubt that the code *is* modular, that it *is* clean and
    > tidy, but coming from outside, it is not that easy to grasp it all. And,
    > it is not just the sheer size and complexity of the scheduler itself,
    > but also a lot with how the code is arranged.
    >
    > For instance, functions like free_fair_sched_group,
    > alloc_fair_sched_group etc - do IMHO belong in sched_fair.c and not in
    > sched.c. The same goes for several rt-functions and structs.
    >
    > So, if one drew up a list over all events that would cause functions in
    > sched.c to be called, this could be used to make a minimized 'interface'
    > and then let the scheduler call the appropriate function for the given
    > class in the same fashion sched_tick is used today.


    I'd start out small by moving the functions to the right file. After
    that you could look at providing methods in the sched_class.

    > What I would like, is to rip out all the *actual* scheduling logic and
    > put this in sched_[fair|rt].c and let sched be purely event-driven
    > (which would be a nice design goal in itself). This would also lead to
    > sched_[fair|rt].h, where the structs, macros, defines etc can be
    > defined. Today these are defined just about everywhere, making the
    > code unnecessarily complicated (I'm not going to say messy since I'm not
    > *that* senior to kernel coding :-))


    You might need to be careful there, or introduce sched_(fair|rt).h for
    those.

    > Why not use the sched_class for all that it's worth - make the different
    > classes implement a set of functions and let sched.c be oblivious to
    > what's going on other than turning the machinery around?


    Sounds good, it's been on the agenda for a while, but nobody ever got
    around to it.

    Other cleanups that can be done are:
    - get rid of all the load balance iterator stuff and move
    that all into sched_fair

    - extract the common sched_entity members and create:

    struct {
            struct sched_entity_common common;
            union {
                    struct sched_entity fair;
                    struct sched_rt_entity rt;
            }
    }
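
    [ Hypothetically fleshed out, that could end up looking something like
      the sketch below -- the container and field names here are invented
      for illustration, only sched_entity and sched_rt_entity exist today: ]

        struct sched_entity_common {
                unsigned int            on_rq;
                unsigned long long      exec_start;
                unsigned long long      sum_exec_runtime;
        };

        struct sched_entity_union {
                struct sched_entity_common common;
                union {
                        struct sched_entity     fair;
                        struct sched_rt_entity  rt;
                };
        };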

    > Is this something worth pursuing? I mean, the scheduler *does* work, and
    > if it ain't broken, don't fix it. However, I have a strong feeling that
    > this can be done a lot cleaner, not to mention, changing from one type
    > of scheduler to another will be much easier. :-)


    Well, adding a sched_class, no need to replace anything besides that.




  3. Re: Rearranging layout of code in the scheduler

    On Tuesday 28 October 2008 17:50:27 Peter Zijlstra wrote:
    > On Tue, 2008-10-28 at 16:34 +0100, Henrik Austad wrote:
    > > Hello,
    > >
    > > Before I dive in, I should probably justify my motivations for writing
    > > this email. I'm working away on implementing an EDF scheduler for real
    > > time tasks in the kernel. This again leads to hacking at the existing
    > > source as I'm not about to toss out the entire scheduler - just replace
    > > (by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at
    > > EDF, I think the answer to that is a bit too long (and not appropriate
    > > for this email anyway) so I'll leave that part out.

    >
    > You and a few other folks. The most interesting part of EDF is not the
    > actual scheduler itself (although there are fun issues with that as
    > well), but extending the Priority Inheritance framework to deal with all
    > the fun cases that come with EDF.


    Well, yes, you trade priority inversion for deadline inversion. Probably a
    few other exotic issues as well :-)

    > > However, what I do mean to discuss is the current state of the scheduler
    > > code. Working through the code, I must say I'm really impressed. The
    > > code is clean, it is well thought out and the new approach with
    > > sched_class and sched_entity makes it very modular. However, digging
    > > deeper, I find myself turning more and more desperate, looking over my
    > > shoulder for a way out.
    > >
    > > Now, I'm in no doubt that the code *is* modular, that it *is* clean and
    > > tidy, but coming from outside, it is not that easy to grasp it all. And,
    > > it is not just the sheer size and complexity of the scheduler itself,
    > > but also a lot with how the code is arranged.
    > >
    > > For instance, functions like free_fair_sched_group,
    > > alloc_fair_sched_group etc - do IMHO belong in sched_fair.c and not in
    > > sched.c. The same goes for several rt-functions and structs.
    > >
    > > So, if one drew up a list over all events that would cause functions in
    > > sched.c to be called, this could be used to make a minimized 'interface'
    > > and then let the scheduler call the appropriate function for the given
    > > class in the same fashion sched_tick is used today.

    >
    > I'd start out small by moving the functions to the right file. After
    > that you could look at providing methods in the sched_class.


    Yes, I was thinking something along those lines, slowly moving things out. I
    just wanted to clear the air before I got started in case someone had some
    real issues with this approach. After all, if it's never going to get merged,
    there's no point doing it.

    > > What I would like, is to rip out all the *actual* scheduling logic and
    > > put this in sched_[fair|rt].c and let sched be purely event-driven
    > > (which would be a nice design goal in itself). This would also lead to
    > > sched_[fair|rt].h, where the structs, macros, defines etc can be
    > > defined. Today these are defined just about everywhere, making the
    > > code unnecessarily complicated (I'm not going to say messy since I'm not
    > > *that* senior to kernel coding :-))

    >
    > You might need to be careful there, or introduce sched_(fair|rt).h for
    > those.


    This is a final goal, the light at the end of the tunnel, if you like.

    > > Why not use the sched_class for all that it's worth - make the different
    > > classes implement a set of functions and let sched.c be oblivious to
    > > what's going on other than turning the machinery around?

    >
    > Sounds good, it's been on the agenda for a while, but nobody ever got
    > around to it.
    >
    > Other cleanups that can be done are:
    > - get rid of all the load balance iterator stuff and move
    > that all into sched_fair


    Ah, yes. I can give that a go

    > - extract the common sched_entity members and create:
    >
    > struct {
    >         struct sched_entity_common common;
    >         union {
    >                 struct sched_entity fair;
    >                 struct sched_rt_entity rt;
    >         }
    > }
    >
    > > Is this something worth pursuing? I mean, the scheduler *does* work, and
    > > if it ain't broken, don't fix it. However, I have a strong feeling that
    > > this can be done a lot cleaner, not to mention, changing from one type
    > > of scheduler to another will be much easier. :-)

    >
    > Well, adding a sched_class, no need to replace anything besides that.


    I wasn't really aiming at replacing anything (besides the rt-stuff, but as
    said earlier, that's not the issue now). I just want to move things to make
    it a bit clearer.

    --
    med Vennlig Hilsen - Yours Sincerely
    Henrik Austad



  4. Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

    On Thu, 2008-10-30 at 17:49 +0100, faggioli@gandalf.sssup.it wrote:
    > Quoting Peter Zijlstra :
    > >> Before I dive in, I should probably justify my motivations for writing
    > >> this email. I'm working away on implementing an EDF scheduler for real
    > >> time tasks in the kernel. This again leads to hacking at the existing
    > >> source as I'm not about to toss out the entire scheduler - just replace
    > >> (by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at
    > >> EDF, I think the answer to that is a bit too long (and not appropriate
    > >> for this email anyway) so I'll leave that part out.

    > Well, I understand that, but it could be interesting... At least to me. :-)
    >
    > > You and a few other folks.

    > Yes, here we are! :-)
    >
    > We also have some code, but it still is highly experimental and we are
    > deeply rearranging it.
    >
    > > The most interesting part of EDF is not the
    > > actual scheduler itself (although there are fun issues with that as
    > > well), but extending the Priority Inheritance framework to deal with all
    > > the fun cases that come with EDF.

    > The main problem is that, especially to deal with SMP systems, we also
    > need to investigate theoretical issues and find out what the best
    > approach could be.
    >
    > > Well, adding a sched_class, no need to replace anything besides that.
    > >

    > I'm not saying anything against possible sched.c and sched_{fair|rt}.c
    > code rearranging, I only wonder why replace fixed priority RT
    > scheduling with EDF.
    >
    > I think they both could be in... Maybe we can discuss on where, I mean
    > on which position, in the linked list of scheduling classes, to put
    > each of them.


    Right, ideally I'd like to see 2 EDF classes on top of FIFO, so that we
    end up with the following classes

    hedf - hard EDF
    sedf - soft EDF (bounded tardiness)
    fifo/rr - the current static priority scheduler
    fair - the current proportional fair scheduler
    idle - the idle scheduler

    The two edf classes must share some state, so that the sedf class knows
    about the utilisation consumed by hedf, and the main difference between
    these two classes is the schedulability test.

    [ NOTE: read EDF as any deadline scheduler, so it might as well be
    pfair PD^2 scheduler. ]
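
    [ A rough sketch of the shared-state idea, with invented names. The
      admission test shown is a simple UP-style utilisation check (per-mille
      instead of floating point); real SMP tests would be stricter: ]

        struct dl_bandwidth {
                unsigned int hedf_util;         /* admitted to hard EDF */
                unsigned int sedf_util;         /* admitted to soft EDF */
        };

        static struct dl_bandwidth dl_bw;

        /* hard EDF: refuse anything that could break existing guarantees */
        static int hedf_admit(unsigned int util)
        {
                if (dl_bw.hedf_util + util > 1000)
                        return 0;
                dl_bw.hedf_util += util;
                return 1;
        }

        /* soft EDF only gets what hard EDF left over, hence the shared state */
        static int sedf_admit(unsigned int util)
        {
                if (dl_bw.hedf_util + dl_bw.sedf_util + util > 1000)
                        return 0;
                dl_bw.sedf_util += util;
                return 1;
        }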

    The few problems this gives are things like kstopmachine and the
    migration threads, which should run at the max priority available on the
    system.

    [ NOTE: although possibly we could make an exception for the migration
    threads, as we generally don't need to migrate running RT tasks]

    Perhaps we can introduce another class on top of hedf which will run
    just these two tasks and is not exposed to userspace (yes, I understand
    it will ruin just about any schedulability analysis).

    Which leaves us with the big issue of priority inversion ;-)

    We can do deadline inheritance and bandwidth inheritance by changing
    plist to a rb-tree/binary heap and mapping the static priority levels
    somewhere at the back and also propagating the actual task responsible
    for the boost down the chain (so as to be able to do bandwidth
    inheritance).

    From what I gather the sssup folks are doing that, although they
    reported that DI between disjoint schedule domains (partitions) posed an
    interesting problem.

    Personally I'd like to see the full priority inversion issue solved by
    something like the proxy execution protocol, however the SMP extension
    thereof seems to be a tad expensive - found a book on graph theory, all
    that remains is finding time to read it :-)

    The advantage of proxy execution is that it's fully invariant to the
    schedule function and thus even works for proportional fair schedulers
    and any kind of scheduler hierarchy.



  5. Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

    On Thu, 2008-10-30 at 18:17 +0100, Peter Zijlstra wrote:

    > Personally I'd like to see the full priority inversion issue solved by
    > something like the proxy execution protocol, however the SMP extension
    > thereof seems to be a tad expensive - found a book on graph theory, all
    > that remains is finding time to read it :-)
    >
    > The advantage of proxy execution is that its fully invariant to the
    > schedule function and thus even works for proportional fair schedulers
    > and any kind of scheduler hierarchy.


    Basically the problem that I'm looking at is:

    Given a directed acyclic graph G, with entry nodes E (those who don't
    have an inbound edge) and exit nodes X (those without an outbound edge)
    then given an exit node x of X, split the graph into G1 and G2 so that
    G1 contains x and all paths leading to it.
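
    [ Put differently: G1 is x together with every node from which x is
      reachable. A minimal sketch of computing that set by walking the
      inbound edges backwards (data structure names invented): ]

        struct node {
                struct node **preds;    /* nodes with an edge into this one */
                int nr_preds;
                int in_g1;              /* set if the node belongs to G1 */
        };

        /* mark x and everything that can reach it; the rest forms G2 */
        static void mark_g1(struct node *x)
        {
                int i;

                if (x->in_g1)
                        return;
                x->in_g1 = 1;
                for (i = 0; i < x->nr_preds; i++)
                        mark_g1(x->preds[i]);
        }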



  6. Re: Rearranging layout of code in the scheduler

    Quoting Peter Zijlstra :
    >> Before I dive in, I should probably justify my motivations for writing
    >> this email. I'm working away on implementing an EDF scheduler for real
    >> time tasks in the kernel. This again leads to hacking at the existing
    >> source as I'm not about to toss out the entire scheduler - just replace
    >> (by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at
    >> EDF, I think the answer to that is a bit too long (and not appropriate
    >> for this email anyway) so I'll leave that part out.

    Well, I understand that, but it could be interesting... At least to me. :-)

    > You and a few other folks.

    Yes, here we are! :-)

    We also have some code, but it still is highly experimental and we are
    deeply rearranging it.

    > The most interesting part of EDF is not the
    > actual scheduler itself (although there are fun issues with that as
    > well), but extending the Priority Inheritance framework to deal with all
    > the fun cases that come with EDF.

    The main problem is that, especially to deal with SMP systems, we also
    need to investigate theoretical issues and find out what the best
    approach could be.

    > Well, adding a sched_class, no need to replace anything besides that.
    >

    I'm not saying anything against possible sched.c and sched_{fair|rt}.c
    code rearranging, I only wonder why replace fixed priority RT
    scheduling with EDF.

    I think they both could be in... Maybe we can discuss on where, I mean
    on which position, in the linked list of scheduling classes, to put
    each of them.

    Regards,
    Dario

    Thanks for the Cc, Peter. I also added Fabio and Michael who, you
    know, are working on this thing. :-)


  7. Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

    On Thursday 30 October 2008 18:17:14 Peter Zijlstra wrote:
    > On Thu, 2008-10-30 at 17:49 +0100, faggioli@gandalf.sssup.it wrote:
    > > Quoting Peter Zijlstra :
    > > >> Before I dive in, I should probably justify my motivations for writing
    > > >> this email. I'm working away on implementing an EDF scheduler for real
    > > >> time tasks in the kernel. This again leads to hacking at the existing
    > > >> source as I'm not about to toss out the entire scheduler - just
    > > >> replace (by some Kconfig switch) the RR/FIFO classes. As to why I'm
    > > >> looking at EDF, I think the answer to that is a bit too long (and not
    > > >> appropriate for this email anyway) so I'll leave that part out.

    > >
    > > Well, I understand that, but it could be interesting... At least to me.
    > > :-)


    ok, simply put:
    * give each task a relative deadline (will probably introduce a new syscall,
    please don't shoot me).
    * when the task enters TASK_RUNNING, set absolute deadline to time_now +
    rel_deadline.
    * insert task in rq, sorted by abs_deadline
    * pick leftmost task and run it
    * when task is done, pick next task

    that's it.

    Then, of course, you have to add all the logic to make the thing work
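
    [ Purely to illustrate those steps -- a toy, UP-only sketch with no
      locking and invented names, nothing like mergeable kernel code: ]

        struct edf_task {
                unsigned long long abs_deadline;
                struct edf_task *next;
        };

        /* runqueue kept sorted by absolute deadline, earliest first */
        static struct edf_task *edf_rq;

        static void edf_enqueue(struct edf_task *t, unsigned long long now,
                                unsigned long long rel_deadline)
        {
                struct edf_task **pp = &edf_rq;

                t->abs_deadline = now + rel_deadline;
                while (*pp && (*pp)->abs_deadline <= t->abs_deadline)
                        pp = &(*pp)->next;
                t->next = *pp;                  /* sorted insert */
                *pp = t;
        }

        static struct edf_task *edf_pick_next(void)
        {
                return edf_rq;                  /* leftmost == earliest deadline */
        }

        /* the running task is always the head, so completion just drops it */
        static void edf_task_done(struct edf_task *t)
        {
                edf_rq = t->next;
                t->next = NULL;
        }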

    > >
    > > > You and a few other folks.

    > >
    > > Yes, here we are! :-)
    > >
    > > We also have some code, but it still is highly experimental and we are
    > > deeply rearranging it.


    I have a very clear idea about *what* the scheduler should do, the problem is
    how to get it to work :-)

    Things would be a lot easier if code in the scheduler was a bit more
    separated. I have started separating things and moving it to separate files.
    I'll ship off a few patches for comments if it's interesting?

    > >
    > > > The most interesting part of EDF is not the
    > > > actual scheduler itself (although there are fun issues with that as
    > > > well), but extending the Priority Inheritance framework to deal with
    > > > all the fun cases that come with EDF.


    Well, I find EDF interesting because it is so blissfully simple. :-)

    > > The main problem is that, especially to deal with SMP systems, we also
    > > need to investigate theoretical issues and find out what the best
    > > approach could be.


    Yes, well, EDF is not optimal for SMP systems, only for single core. However,
    you can do a pretty good attempt by assigning tasks to cores in a greedy
    fashion (simply put the next task at the CPU with the lowest load).

    As a further optimization, I guess you could do the whole sched-domain thing to
    minimize the search space.
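
    [ The greedy step itself would be tiny -- a sketch with invented names,
      ignoring locking and hotplug: ]

        /* pick the CPU whose admitted EDF utilisation is currently lowest */
        static int edf_pick_cpu(const unsigned int *cpu_util, int nr_cpus)
        {
                int cpu, best = 0;

                for (cpu = 1; cpu < nr_cpus; cpu++)
                        if (cpu_util[cpu] < cpu_util[best])
                                best = cpu;
                return best;
        }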

    > > > Well, adding a sched_class, no need to replace anything besides that.

    > >
    > > I'm not saying anything in possible sched.c and sched_{fair|rt}.c code
    > > rearranging, I also only wonder why replacing fixed priority RT
    > > scheduling with EDF.
    > >
    > > I think they both could be in... Maybe we can discuss on where, I mean
    > > on which position, in the linked list of scheduling classes, to put
    > > each of them.


    No. You should have *either* FIFO/RR *or* EDF, not both at the same time. If
    you absolutely require both, you should at least separate them on a per-core
    basis. If you mix them, they need to be aware of the other in order to make
    the right decision, and that is not good.

    > Right, ideally I'd like to see 2 EDF classes on top of FIFO, so that we
    > end up with the following classes
    >
    > hedf - hard EDF
    > sedf - soft EDF (bounded tardiness)
    > fifo/rr - the current static priority scheduler
    > fair - the current proportional fair scheduler
    > idle - the idle scheduler
    >
    > The two edf classes must share some state, so that the sedf class knows
    > about the utilisation consumed by hedf, and the main difference between
    > these two classes is the schedulability test.


    Well.. why not just treat *all* RT-tasks as *either* FIFO/RR or EDF? Having
    fifo and edf together will complicate things. And, people looking for edf,
    will not use fifo/rr anyway (famous last words).

    Furthermore, hard/firm/soft could be treated as one class, but missed
    deadlines should be handled differently for each:
    * Soft: nothing. Scheduled at best effort; when the deadline passes, prioritize
    other tasks to avoid a cascading effect of deadline misses
    * Firm: keep some statistics that the user can modify and a possible
    event-handler when limits are exceeded
    * Hard: immediately call a user registered function, or send a signal to notify
    the task.

    > [ NOTE: read EDF as any deadline scheduler, so it might as well be
    > pfair PD^2 scheduler. ]


    Well, the nice thing about EDF is that it is provably optimal for any
    feasible schedule, IOW, if all the tasks are schedulable, you can have 100%
    utilization of the CPU.
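
    [ The textbook test behind that claim, for independent periodic tasks
      with deadline == period on a single CPU -- shown only as a sketch,
      using scaled integers instead of floating point: ]

        struct task_params {
                unsigned long wcet;     /* worst case execution time */
                unsigned long period;   /* also the relative deadline here */
        };

        /* feasible under UP EDF iff the summed utilisation is <= 100% */
        static int edf_feasible_up(const struct task_params *t, int n)
        {
                unsigned long util = 0;         /* per-mille */
                int i;

                for (i = 0; i < n; i++)
                        util += t[i].wcet * 1000 / t[i].period;

                return util <= 1000;
        }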

    On a multicore, EDF is not optimal, as rearranging the tasks becomes NP-hard
    (basically a knapsack problem) for several backpacks :-)

    Besides that, EDF is the simplest, most brain-dead scheduler you can imagine.
    Basically you want to add the deadline to the tasks, put them in a sorted list
    and pick the leftmost task every time until it completes.

    > The few problems this gives are things like kstopmachine and the
    > migration threads, which should run at the max priority available on the
    > system.
    >
    > [ NOTE: although possibly we could make an exception for the migration
    > threads, as we generally don't need to migrate running RT tasks]
    >
    > Perhaps we can introduce another class on top of hedf which will run
    > just these two tasks and is not exposed to userspace (yes, I understand
    > it will ruin just about any schedulability analysis).
    >
    > Which leaves us with the big issue of priority inversion ;-)


    Couldn't the above idea solve a bit of this? I have some papers on deadline
    inheritance lying around somewhere; I can have a look at that, I think there
    was a fairly elegant solution to some of these issues there.

    >
    > We can do deadline inheritance and bandwidth inheritance by changing
    > plist to a rb-tree/binary heap and mapping the static priority levels
    > somewhere at the back and also propagating the actual task responsible
    > for the boost down the chain (so as to be able to do bandwidth
    > inheritance).


    IMHO, you are complicating things unnecessarily. EDF is simple. Why not go for
    a *very* basic system and not offer things that will be very difficult to
    guarantee?

    > From what I gather the sssup folks are doing that, although they
    > reported that DI between disjoint schedule domains (partitions) posed an
    > interesting problem.
    >
    > Personally I'd like to see the full priority inversion issue solved by
    > something like the proxy execution protocol, however the SMP extension
    > thereof seems to be a tad expensive - found a book on graph theory, all
    > that remains is finding time to read it :-)
    >
    > The advantage of proxy execution is that its fully invariant to the
    > schedule function and thus even works for proportional fair schedulers
    > and any kind of scheduler hierarchy.




    --
    -> henrik

  8. Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

    On Thu, 2008-10-30 at 22:44 +0100, Henrik Austad wrote:
    > On Thursday 30 October 2008 18:17:14 Peter Zijlstra wrote:
    > > On Thu, 2008-10-30 at 17:49 +0100, faggioli@gandalf.sssup.it wrote:
    > > > Quoting Peter Zijlstra :
    > > > >> Before I dive in, I should probably justify my motivations for writing
    > > > >> this email. I'm working away on implementing an EDF scheduler for real
    > > > >> time tasks in the kernel. This again leads to hacking at the existing
    > > > >> source as I'm not about to toss out the entire scheduler - just
    > > > >> replace (by some Kconfig switch) the RR/FIFO classes. As to why I'm
    > > > >> looking at EDF, I think the answer to that is a bit too long (and not
    > > > >> appropriate for this email anyway) so I'll leave that part out.
    > > >
    > > > Well, I understand that, but it could be interesting... At least to me.
    > > > :-)

    >
    > ok, simply put:
    > * give each task a relative deadline (will probably introduce a new syscall,
    > please don't shoot me).


    We call that sys_sched_setscheduler2().

    > * when the task enters TASK_RUNNING, set absolute deadline to time_now +
    > rel_deadline.
    > * insert task in rq, sorted by abs_deadline
    > * pick leftmost task and run it
    > * when task is done, pick next task
    >
    > that's it.
    >
    > Then, of course, you have to add all the logic to make the thing work


    Well, yes, I know how to do EDF, and it's trivially simple - on UP.
    Deadline scheduling on SMP otoh is not.

    > > >
    > > > > You and a few other folks.
    > > >
    > > > Yes, here we are! :-)
    > > >
    > > > We also have some code, but it still is highly experimental and we are
    > > > deeply rearranging it.

    >
    > I have a very clear idea about *what* the scheduler should do, the problem is
    > how to get it to work :-)
    >
    > Things would be a lot easier if code in the scheduler was a bit 'more
    > separated. I have started separating things and moving it to separate files.
    > I'll ship off a few patches for comments if it's interesting?


    Sure, but implementing an EDF class isn't really all that hard - esp if
    you only want UP.

    The real fun is in the PI stuff and schedulability tests on SMP.

    > > >
    > > > > The most interesting part of EDF is not the
    > > > > actual scheduler itself (although there are fun issues with that as
    > > > > well), but extending the Priority Inheritance framework to deal with
    > > > > all the fun cases that come with EDF.

    >
    > Well, I find EDF interesting because it is so blissfully simple. :-)


    Yes it is, until you do SMP :-)

    > > > The main problem is that, especially to deal with SMP systems, we also
    > > > need to investigate theoretical issues and find out what the best
    > > > approach could be.

    >
    > Yes, well, EDF is not optimal for SMP systems, only for single core. However,
    > you can do a pretty good attempt by assigning tasks to cores in a greedy
    > fashion (simply put the next task at the CPU with the lowest load).
    >
    > As a further optimization, I guess you could do the whole sched-domain thing to
    > minimize the search space.


    The problem with greedy binpacking heuristics is that your schedulability
    tests are out the window, making the whole thing useless.

    > > > > Well, adding a sched_class, no need to replace anything besides that.
    > > >
    > > > I'm not saying anything in possible sched.c and sched_{fair|rt}.c code
    > > > rearranging, I also only wonder why replacing fixed priority RT
    > > > scheduling with EDF.
    > > >
    > > > I think they both could be in... Maybe we can discuss on where, I mean
    > > > on which position, in the linked list of scheduling classes, to put
    > > > each of them.

    >
    > No. You should have *either* FIFO/RR *or* EDF, not both at the same time. If
    > you absolutely require both, you should at least separate them on a per-core
    > basis. If you mix them, they need to be aware of the other in order to make
    > the right decision, and that is not good.


    We _have_ to have both. It's that simple.

    POSIX mandates we have SCHED_FIFO/RR, there is tons and tons of
    userspace that uses it, we cannot just replace it with a deadline
    scheduler.

    Also, start a linux-rt kernel and look at all the RT threads you have
    (and that's only going to get worse when we go to threads per interrupt
    handler instead of threads per interrupt source).

    Who is going to assign useful and meaningful periods, deadlines and
    execution times to all those threads?

    So the only other option is to add a deadline scheduler and allow people
    to use it.

    When you do that, you'll see that you have to add it above the FIFO/RR
    class because otherwise your schedulability is out the window again.

    > > Right, ideally I'd like to see 2 EDF classes on top of FIFO, so that we
    > > end up with the following classes
    > >
    > > hedf - hard EDF
    > > sedf - soft EDF (bounded tardiness)
    > > fifo/rr - the current static priority scheduler
    > > fair - the current proportional fair scheduler
    > > idle - the idle scheduler
    > >
    > > The two edf classes must share some state, so that the sedf class knows
    > > about the utilisation consumed by hedf, and the main difference between
    > > these two classes is the schedulability test.

    >
    > Well.. why not just treat *all* RT-tasks as *either* FIFO/RR or EDF? Having
    > fifo and edf together will complicate things. And, people looking for edf,
    > will not use fifo/rr anyway (famous last words).


    errr, not so, see above. FIFO/RR are static priority schedulers, whereas
    deadline schedulers are job-static or dynamic priority schedulers. You
    cannot simply map FIFO onto them, you need more information and who is
    going to provide that?

    Also, with the above mentioned requirement that we must have FIFO/RR,
    there really is no choice.

    > Furthermore, hard/firm/soft could be treated as one class, but it should be
    > treated differently at missed deadlines.
    > * Soft: nothing. Scheduled at best effort, when deadline passes, prioritize
    > other tasks to avoid a cascading effect of deadline misses
    > * Firm: keep some statistics that the user can modify and a possible
    > event-handler when limits are exceeded
    > * Hard: immediately call a user registered function, or send a signal to notify
    > task.


    Thing is, you have to run hard tasks first, and schedule weaker forms
    in their slack time, otherwise you cannot guarantee anything.

    And the easiest way to do that slack time stealing, is with such a
    hierarchy. Sure, you can share a lot of code etc. but you need separate
    classes.

    > > [ NOTE: read EDF as any deadline scheduler, so it might as well be
    > > pfair PD^2 scheduler. ]

    >
    > Well, the nice thing about EDF is that it is provably optimal for any
    > feasible schedule, IOW, if all the tasks are schedulable, you can have 100%
    > utilization of the CPU.


    On UP - which is not interesting on a general purpose kernel that runs
    on machines with up to 4096 CPUs.

    > On a multicore, EDF is not optimal, as rearranging the tasks becomes NP-hard
    > (basically a knapsack problem) for several backpacks :-)


    That is partitioned EDF or PEDF for short. There are many other EDF
    variants which do not suffer this, for example global EDF (GEDF).

    GEDF can guarantee a utilization bound of 50% for hard-rt and is soft-rt
    up to 100%.

    Then there are the pfair class of scheduling algorithms which can
    theoretically yield up to 100% utilization on SMP systems.

    > Besides that, EDF is the simplest, most brain-dead scheduler you can imagine.
    > Basically you want to add the deadline to the tasks, put it in a sorted list
    > and pick the leftmost task every time until it completes.


    Sure, and all that is useless without schedulability tests.

    > > The few problems this gives are things like kstopmachine and the
    > > migration threads, which should run at the max priority available on the
    > > system.
    > >
    > > [ NOTE: although possibly we could make an exception for the migration
    > > threads, as we generally don't need to migrate running RT tasks]
    > >
    > > Perhaps we can introduce another class on top of hedf which will run
    > > just these two tasks and is not exposed to userspace (yes, I understand
    > > it will ruin just about any schedulability analysis).
    > >
    > > Which leaves us with the big issue of priority inversion ;-)

    >
    > Couldn't above idea solve a bit of this? I have some papers on deadline
    > inheritance lying around somewhere, I can have a look at that, I think it
    > was a fairly elegant solution to some of these issues there.


    Yes, it's brilliant, on UP :-)

    But it should work on SMP as well with a bit more effort. But you really
    need bandwidth inheritance as well, which is yet another bit of effort.

    But the Pisa folks mentioned some issues wrt partitioned EDF; Michael,
    was that because the deadline clock wasn't synchronized or something? -
    Could you explain that again, my brain seems to have mis-filed it.

    This is relevant, because even if you do GEDF or better, linux allows
    you to break your system into partitions using cpusets.

    > > We can do deadline inheritance and bandwidth inheritance by changing
    > > plist to a rb-tree/binary heap and mapping the static priority levels
    > > somewhere at the back and also propagating the actual task responsible
    > > for the boost down the chain (so as to be able to do bandwidth
    > > inheritance).

    >
    > IMHO, you are complicating things unnecessarily. EDF is simple. Why not go for a
    > *very* basic system, not offer things that will be very difficult to
    > guarantee.


    If you want a system that can guarantee anything, you must solve the
    priority inversion issue, and for deadline scheduling that means both DI
    and BI, even if you do PEDF.

    Because even if you do not allow your tasks to migrate, there will be
    shared resources (if not in userspace, then at least in the kernel), and
    as long as you have that....



  9. Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

    On Fri, Oct 31, 2008 at 10:03:52AM +0100, Peter Zijlstra wrote:
    > On Thu, 2008-10-30 at 22:44 +0100, Henrik Austad wrote:
    > > On Thursday 30 October 2008 18:17:14 Peter Zijlstra wrote:
    > > > On Thu, 2008-10-30 at 17:49 +0100, faggioli@gandalf.sssup.it wrote:
    > > > > Quoting Peter Zijlstra :
    > > > > >> Before I dive in, I should probably justify my motivations for writing
    > > > > >> this email. I'm working away on implementing an EDF scheduler for real
    > > > > >> time tasks in the kernel. This again leads to hacking at the existing
    > > > > >> source as I'm not about to toss out the entire scheduler - just
    > > > > >> replace (by some Kconfig switch) the RR/FIFO classes. As to why I'm
    > > > > >> looking at EDF, I think the answer to that is a bit too long (and not
    > > > > >> appropriate for this email anyway) so I'll leave that part out.
    > > > >
    > > > > Well, I understand that, but it could be interesting... At least to me.
    > > > > :-)

    > >
    > > ok, simply put:
    > > * give each task a relative deadline (will probably introduce a new syscall,
    > > please don't shoot me).

    >
    > We call that sys_sched_setscheduler2().


    Ah, ok, I thought introducing new syscalls was *really* frowned upon.

    >
    > > * when the task enters TASK_RUNNING, set absolute deadline to time_now +
    > > rel_deadline.
    > > * insert task in rq, sorted by abs_deadline
    > > * pick leftmost task and run it
    > > * when task is done, pick next task
    > >
    > > that's it.
    > >
    > > Then, of course, you have to add all the logic to make the thing work

    >
    > Well, yes, I know how to do EDF, and its trivially simple - on UP.
    > Deadline scheduling on SMP otoh is not.


    True.

    > > > >
    > > > > > You and a few other folks.
    > > > >
    > > > > Yes, here we are! :-)
    > > > >
    > > > > We also have some code, but it still is highly experimental and we are
    > > > > deeply rearranging it.

    > >
    > > I have a very clear idea about *what* the scheduler should do, the problem is
    > > how to get it to work :-)
    > >
    > > Things would be a lot easier if code in the scheduler was a bit 'more
    > > separated. I have started separating things and moving it to separate files.
    > > I'll ship off a few patches for comments if it's interesting?

    >
    > Sure, but implementing an EDF class isn't really all that hard - esp if
    > you only want UP.
    >
    > The real fun is in the PI stuff and schedulability tests on SMP.


    As a start, that is the approach at least I would like to take. Once you have a
    proven, functional EDF on a single core, you can extend that to handle several
    cores, if you really want to.

    > > > > > The most interesting part of EDF is not the
    > > > > > actual scheduler itself (although there are fun issues with that as
    > > > > > well), but extending the Priority Inheritance framework to deal with
    > > > > > all the fun cases that come with EDF.

    > >
    > > Well, I find EDF interesting because it is so blissfully simple. :-)

    >
    > Yes it is, until you do SMP :-)


    Well, I have a simple solution to that too

    > > > > The main problem is that, especially to deal with SMP systems, we also
    > > > > need to investigate theoretical issues and find out what the best
    > > > > approach could be.

    > >
    > > Yes, well, EDF is not optimal for SMP systems, only for single core. However,
    > > you can do a pretty good attempt by assigning tasks to cores in a greedy
    > > fashion (simply put the next task at the CPU with the lowest load).
    > >
    > > As a further optimization, I guess you could do the whole sched-domain thing to
    > > minimize the search space.

    >
    > The problem with greedy binpacking heuristics is that your schedulability
    > tests are out the window, making the whole thing useless.


    Well, not really. I mean, to be optimal, you should also consider WCET, but
    then, that's really not interesting as IMHO that's the userspace-programmer's
    responsibility. If the user wants to add tasks that sum up to 210% utilization,
    there's really not much we can do anyway. You certainly wouldn't want the kernel to
    stop accepting new jobs.

    So, keep the kernel logic as simple as possible and move the job to the user.
    By keeping the kernel logic simple - we make the job easier for the end-users. A
    very complex EDF-scheduler will make the testing very difficult.

    If, on the other hand, we *know* that the scheduler is not optimal, but that it
    behaves in a predictable manner, the end users have a simpler task of finding
    out why something bad happened.

    Because, no matter *what* you do, and *how* you implement it, with *whatever*
    features, there will be cases when things fall apart, and having a simple,
    predictable scheduler will be necessary to figure it out.

    > > > > > Well, adding a sched_class, no need to replace anything besides that.
    > > > >
    > > > > I'm not saying anything in possible sched.c and sched_{fair|rt}.c code
    > > > > rearranging, I also only wonder why replacing fixed priority RT
    > > > > scheduling with EDF.
    > > > >
    > > > > I think they both could be in... Maybe we can discuss on where, I mean
    > > > > on which position, in the linked list of scheduling classes, to put
    > > > > each of them.

    > >
    > > No. You should have *either* FIFO/RR *or* EDF, not both at the same time. If
    > > you absolutely require both, you should at least separate them on a per-core
    > > basis. If you mix them, they need to be aware of the other in order to make
    > > the right decision, and that is not good.

    >
    > We _have_ to have both. Its that simple.


    No, we do not. Or, at least not at the same time (see below)

    > POSIX mandates we have SCHED_FIFO/RR, there is tons and tons of userspace that
    > uses it, we cannot just replace it with a deadline scheduler.


    I didn't mean to rip the whole fifo/rr out of the kernel, but adding a switch at
    compile-time so that you could choose *either* normal, static RT *or* EDF. Then
    we could, at least for the first few versions, have it depend on !SMP to avoid
    the whole SMP-non-optimal-mess.

    > Also, start a linux-rt kernel and look at all the RT threads you have (and
    > that's only going to get worse when we go to threads per interrupt handler
    > instead of threads per interrupt source).


    yes, that would certainly be a problem. But, if we can configure in either EDF
    or FIFO/RR, adding this to the kernel-rt-threads should be possible.

    > Who is going to assign useful and meaningful periods, deadlines and execution
    > times to all those threads?


    Well, periods are not that difficult. Basically you treat everything as
    asynchronous events and put them in the runqueue when they become runnable. And
    the application itself should set a relative deadline via the syscall to tell
    the kernel "when I become runnable, I must finish before time_now +
    rel_deadline".

    RT-tasks that run forever will certainly be a problem, but what about something
    like marking them as soft/firm and, every time a deadline is missed, having the
    trigger function set a new deadline? It would mean you have to rewrite a whole
    lot of apps though.
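
    [ Roughly like this, as a sketch with invented names -- on a miss the
      absolute deadline is simply pushed one relative deadline into the
      future so the task stays runnable as best effort: ]

        struct soft_task {
                unsigned long long rel_deadline;
                unsigned long long abs_deadline;
                unsigned long misses;           /* statistics for firm tasks */
        };

        /* called when 'now' passes abs_deadline and the task is not done */
        static void deadline_miss_trigger(struct soft_task *t,
                                          unsigned long long now)
        {
                t->misses++;
                t->abs_deadline = now + t->rel_deadline;
        }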

    > So the only other option is to add a deadline scheduler and allow people to
    > use it.
    >
    > When you do that, you'll see that you have to add it above the FIFO/RR class
    > because otherwise your schedulability is out the window again.
    >
    > > > Right, ideally I'd like to see 2 EDF classes on top of FIFO, so that we
    > > > end up with the following classes
    > > >
    > > > hedf - hard EDF
    > > > sedf - soft EDF (bounded tardiness)
    > > > fifo/rr - the current static priority scheduler
    > > > fair - the current proportional fair scheduler
    > > > idle - the idle scheduler
    > > >
    > > > The two edf classes must share some state, so that the sedf class knows
    > > > about the utilisation consumed by hedf, and the main difference between
    > > > these two classes is the schedulability test.

    > >
    > > Well.. why not just treat *all* RT-tasks as *either* FIFO/RR or EDF? Having
    > > fifo and edf together will complicate things. And, people looking for edf,
    > > will not use fifo/rr anyway (famous last words).

    >
    > errr, not so, see above. FIFO/RR are static priority schedulers, whereas
    > deadline schedulers are job-static or dynamic priority schedulers. You
    > cannot simply map FIFO onto them, you need more information and who is
    > going to provide that?


    I'm not going to map anything anywhere. EDF is very different from priority
    based scheduling and we shouldn't try to map something back (as priorities,
    strictly speaking, are a set of deadlines analyzed and mapped to priorities
    so that the deadlines are met).

    > Also, with the above mentioned requirement that we must have FIFO/RR,
    > there really is no choice.
    >
    > > Furthermore, hard/firm/soft could be treated as one class, but it should be
    > > treated differently at missed deadlines.
    > > * Soft: nothing. Scheduled at best effort, when deadline passes, prioritize
    > > other tasks to avoid a cascading effect of deadline misses
    > > * Firm: keep some statistics that the user can modify and a possible
    > > event-handler when limits are exceeded
    > > * Hard: immediately call a user registered function, or send a signal to notify
    > > task.

    >
    > Thing is, you have to run hard tasks first, and schedule weaker forms
    > in their slack time, otherwise you cannot guarantee anything.


    Well, then you suddenly introduce priorities to the deadlines, and that is not
    good. A hard task is not more important than a soft, but the effect of missing
    the deadline is. If the schedule is infeasible, it really doesn't matter what
    you do, as you will miss deadlines, and if you prioritize hard tasks, you will
    end up starving firm and soft tasks.

    Before you go on and tell me how wrong I am, note that I don't disagree with
    you, I think choosing hrt before the others is the best solution from an
    implementation point of view.

    > And the easiest way to do that slack time stealing, is with such a
    > hierarchy. Sure, you can share a lot of code etc. but you need separate
    > classes.
    >
    > > > [ NOTE: read EDF as any deadline scheduler, so it might as well be
    > > > pfair PD^2 scheduler. ]

    > >
    > > Well, the nice thing about EDF is that it is provably optimal for any
    > > feasible schedule, IOW, if all the tasks are schedulable, you can have 100%
    > > utilization of the CPU.

    >
    > On UP - which is not interesting on a general purpose kernel that runs
    > on machines with up to 4096 CPUs.


    But, and pardon my ignorance, will an EDF-scheduler be interesting for such a
    large system? From what I've gathered, small systems are the ones that could
    benefit from an EDF as you can analyze and predict behaviour, and then, since
    EDF is optimal, tune the CPU-freq down and still know that things will work.

    Some embedded people can probably provide a lot better input here than me, as
    this is just a general idea I snapped up 'somewhere' (where somewhere is an
    element of the set of all places I've been the last 6 months).

    > > On a multicore, EDF is not optimal, as rearranging the tasks becomes NP-hard
    > > (basically a knapsack problem) for several backpacks :-)

    >
    > That is partitioned EDF or PEDF for short. There are many other EDF
    > variants who do not suffer this, for example global EDF (GEDF).
    >
    > GEDF can guarantee a utilization bound of 50% for hard-rt and is soft-rt
    > up to 100%.
    >
    > Then there are the pfair class of scheduling algorithms which can
    > theoretically yield up to 100% utilization on SMP systems.


    Do you know about any practical attempts at this, and what kind of results they
    produced?

    >
    > > Besides that, EDF is the simplest, most brain-dead scheduler you can imagine.
    > > Basically you want to add the deadline to the tasks, put it in a sorted list
    > > and pick the leftmost task every time until it completes.

    >
    > Sure, and all that is useless without schedulability tests.


    Yes, but should the kernel do the schedulability test? Or should the ball be
    passed on to userspace? To analyze the schedulability, you would need the worst
    case execution time (WCET) of the process, and if the kernel/scheduler should
    start trying to estimate that...

    So, as a start, why not just 'ignore' WCET in the first versions, and that can
    be added later on, if necessary.

    > > > The few problems this gives are things like kstopmachine and the
    > > > migration threads, which should run at the max priority available on the
    > > > system.
    > > >
    > > > [ NOTE: although possibly we could make an exception for the migration
    > > > threads, as we generally don't need to migrate running RT tasks]
    > > >
    > > > Perhaps we can introduce another class on top of hedf which will run
    > > > just these two tasks and is not exposed to userspace (yes, I understand
    > > > it will ruin just about any schedulability analysis).


    Or, have the task run with a minimal deadline, and then set it to
    TASK_INTERRUPTIBLE, use a timer to awaken it again in a short time, run it,
    yield and so on. Make a periodic tick that will preempt whatever task is
    running?


    > > >
    > > > Which leaves us with the big issue of priority inversion ;-)

    > >
    > > Couldn't above idea solve a bit of this? I have some papers on deadline
    > > inheritance lying around somewhere, I can have a look at that, I think it
    > > was a fairly elegant solution to some of these issues there.

    >
    > Yes, its brilliant, on UP :-)
    >
    > But it should work on SMP as well with a bit more effort. But you really
    > need bandwidth inheritance as well, which is yet another bit of effort.
    >
    > But the Pisa folks mentioned some issues wrt partitioned EDF; Michael,
    > was that because the deadline clock wasn't synchronized or something? -
    > Could you explain that again, my brain seems to have mis-filed it.
    >
    > This is relevant, because even if you do GEDF or better, linux allows
    > you to break your system into partitions using cpusets.
    >
    > > > We can do deadline inheritance and bandwidth inheritance by changing
    > > > plist to a rb-tree/binary heap and mapping the static priority levels
    > > > somewhere at the back and also propagating the actual task responsible
    > > > for the boost down the chain (so as to be able to do bandwidth
    > > > inheritance).

    > >
    > > IMHO, you are complicating things unnecessarily. EDF is simple. Why not go for a
    > > *very* basic system, not offer things that will be very difficult to
    > > guarantee.

    >
    > If you want a system that can guarantee anything, you must solve the
    > priority inversion issue, and for deadline scheduling that means both DI
    > and BI, even if you do PEDF.
    >
    > Because even if you do not allow your tasks to migrate, there will be
    > shared resources (if not in userspace, then at least in the kernel), and
    > as long as you have that....


    A lot of good points, and I certainly see your side of it. However (and yes, I
    have to argue a bit more), I don't think an EDF-scheduler should contain a
    lot of features.

    If you want to use the EDF, why not give the user a list of consequenses like
    - Only a single core
    - No other RT-scheduler; if other userspace programs break, so be it, the user
    has been warned.
    - Best effort only
    - Provide handlers for a given set of signals that will be sent to any
    application missing a deadline
    - no cpu-scaling
    - ... keep going, basically strip away every piece of dynamic behaviour and
    complex scheduling code

    henrik

  10. Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

    On Fri, 2008-10-31 at 13:09 +0100, Henrik Austad wrote:

    > Ah, ok, I thought introducing new syscalls was *really* frowned upon.


    We prefer not to, but sometimes there just isn't any other option.
    If we want to extend struct sched_param, we need 2 new syscalls.
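
    [ Purely hypothetical illustration of what such an extended parameter
      block and the two syscalls might look like -- apart from the
      sys_sched_setscheduler2() name mentioned above, everything here is
      made up: ]

        struct sched_param2 {
                int sched_priority;                     /* FIFO/RR compat */
                unsigned long long sched_runtime;       /* budget per period, ns */
                unsigned long long sched_deadline;      /* relative deadline, ns */
                unsigned long long sched_period;        /* period, ns */
        };

        long sys_sched_setscheduler2(pid_t pid, int policy,
                                     struct sched_param2 *param);
        long sys_sched_getparam2(pid_t pid, struct sched_param2 *param);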

    > > Sure, but implementing an EDF class isn't really all that hard - esp if
    > > you only want UP.
    > >
    > > The real fun is in the PI stuff and schedulability tests on SMP.

    >
    > As a start, that is the approach at least I would like to take. Once you have a
    > proven, functional EDF on a single core, you can extend that to handle several
    > cores, if you really want to.


    Well, you're of course free to do so, but I don't think it's a very
    interesting thing to do.

    > > > > > The main problem is that, especially to deal with SMP systems, we also
    > > > > > need to investigate theoretical issues and find out what the best
    > > > > > approach could be.
    > > >
    > > > Yes, well, EDF is not optimal for SMP systems, only for single core. However,
    > > > you can do a pretty good attempt by assigning tasks to cores in a greedy
    > > > fashion (simply put the next task at the CPU with the lowest load).
    > > >
    > > > As a further optimization, I guess you could do the whole sched-domain thing to
    > > > minimize the search space.

    > >
    > > The problem with greedy binpacking heuristics is that your schedulability
    > > tests are out the window, making the whole thing useless.

    >
    > Well, not really. I mean, to be optimal, you should also consider WCET, but
    > then, that's really not interesting as IMHO that's the userspace-programmer's
    > responsibility. If the user wants to add tasks that sum up to 210% utilization,
    > there's really not much we can do anyway. You certainly wouldn't want the kernel to
    > stop accepting new jobs.
    >
    > So, keep the kernel logic as simple as possible and move the job to the user.
    > By keeping the kernel logic simple - we make the job easier for the end-users. A
    > very complex EDF-scheduler will make the testing very difficult.
    >
    > If, on the other hand, we *know* that the scheduler is not optimal, but that it
    > behaves in a predictable manner, the end users have a simpler task of finding
    > out why something bad happened.
    >
    > Because, no matter *what* you do, and *how* you implement it, with *whatever*
    > features, there will be cases when things fall apart, and having a simple,
    > predictable scheduler will be necessary to figure it out.


    I agree that the scheduler should be simple, and even something like
    PD^2 is relatively simple.

    But I disagree that we should not do schedulability tests. Doing those,
    and esp. enforcing tasks to their given limits increases the QoS for
    others in the presence of faulty/malicious tasks.

    Also, WCET is still the user's responsibility.

    If, for each deadline task, you specify a period, a deadline and a budget,
    then the WCET computation is reflected in the budget.

    By enforcing the schedulability test and execution budget you raise the
    quality of service, because even in the presence of a mis-behaving task
    only that task will be impacted. The other tasks will still meet their
    deadlines.
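
    To make the period/deadline/budget idea concrete, here is a minimal
    userspace sketch of such a utilisation-based admission test (the struct
    and function names are made up for illustration; this is not the kernel
    interface being discussed):

        #include <stdbool.h>
        #include <stdio.h>

        /* Hypothetical per-task deadline parameters (illustrative only). */
        struct dl_params {
            unsigned long long budget_ns;   /* execution budget per period */
            unsigned long long deadline_ns; /* relative deadline */
            unsigned long long period_ns;   /* minimum inter-arrival time */
        };

        /*
         * Utilisation-based admission test: accept the new task only if the
         * summed utilisation (budget/period) of all admitted tasks stays at
         * or below one CPU.  A real SMP test, or one for deadline < period,
         * would have to be more careful than this.
         */
        static bool admit_task(const struct dl_params *admitted, int nr,
                               const struct dl_params *new_task)
        {
            double total = (double)new_task->budget_ns / new_task->period_ns;
            int i;

            for (i = 0; i < nr; i++)
                total += (double)admitted[i].budget_ns / admitted[i].period_ns;

            return total <= 1.0;
        }

        int main(void)
        {
            struct dl_params running[] = {
                { 2000000, 10000000, 10000000 },   /* 20% */
                { 5000000, 20000000, 20000000 },   /* 25% */
            };
            struct dl_params newcomer = { 6000000, 10000000, 10000000 }; /* 60% */

            /* 20% + 25% + 60% > 100%, so this one is rejected. */
            printf("admit: %s\n", admit_task(running, 2, &newcomer) ? "yes" : "no");
            return 0;
        }

    Enforcement would then mean charging each task's actual runtime against
    its budget and throttling it once the budget is gone, so only the
    offending task misses its deadline.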

    > > > No. You should have *either* FIFO/RR *or* EDF, not both at the same time. If
    > > > you absolutely require both, you should at least separate them on a per-core
    > > > basis. If you mix them, they need to be aware of the other in order to make
    > > > the right decision, and that is not good.

    > >
    > > We _have_ to have both. It's that simple.

    >
    > No, we do not. Or, at least not at the same time (see below)
    >
    > > POSIX mandates we have SCHED_FIFO/RR, there is tons and tons of userspace that
    > > uses it, we cannot just replace it with a deadline scheduler.

    >
    > I didn't mean to rip the whole fifo/rr out of the kernel, but adding a switch at
    > compile-time so that you could choose *either* normal, static RT *or* EDF. Then
    > we could, at least for the first few versions, have it depend on !SMP to avoid
    > the whole SMP-non-optimal-mess.


    But _why_? Why not leave FIFO/RR in? There is absolutely no downside to
    keeping it around.

    > > Thing is, you have to run hard tasks first, and schedule weaker forms
    > > in their slack time, otherwise you cannot guarantee anything.

    >
    > Well, then you suddenly introduce priorities to the deadlines, and that is not
    > good. A hard task is not more important than a soft one, but the effect of missing
    > the deadline is. If the schedule is infeasible, it really doesn't matter what
    > you do, as you will miss deadlines, and if you prioritize hard tasks, you will
    > end up starving firm and soft tasks.
    >
    > Before you go on and tell me how wrong I am, note that I don't disagree with
    > you, I think choosing hrt before the others, is the best solution from an
    > implementation point of view.


    That is, if you make the soft-deadline class aware of the hard-deadline
    class's tasks and schedulability constraints, then you can keep the
    soft-rt class schedulable too.

    So srt is in no way less important; it just has fewer restrictions on
    the schedule, therefore we can run it in the hrt slack/idle time.

    And adding the schedulability test in the kernel avoids these starvation
    issues, because you just cannot admit an infeasible task set.

    > > On UP - which is not interesting on a general purpose kernel that runs
    > > on machines with up to 4096 CPUs.

    >
    > But, and pardon my ignorance, will an EDF-scheduler be interesting for such a
    > large system? From what I've gathered, small systems are the ones that could
    > benefit from an EDF as you can analyze and predict behaviour, and then, since
    > EDF is optimal, tune the CPU-freq down and still know that things will work.
    >
    > Some embedded people can probably provide a lot better input here than me, as
    > this is just a general idea I snapped up 'somewhere' (where somewhere is an
    > element of the set of all places I've been the last 6 months).


    Not that large indeed, but people are interested in running RT workloads
    on machines in the 32/64 CPU range.

    And even the embedded folks are now staring quad-core ARM11 chips in the
    face, wondering how to do things.

    > > Then there are the pfair class of scheduling algorithms which can
    > > theoretically yield up to 100% utilization on SMP systems.

    >
    > Do you know about any practical attempts on this, and what kind of results that
    > produced?


    Fairly decent, http://www.cs.unc.edu/~anderson/papers/rtss08b.pdf

    > > > Besides that, EDF is the simplest, most brain-dead scheduler you can imagine.
    > > > Basically you want to add the deadline to the tasks, put it in a sorted list
    > > > and pick the leftmost task every time until it completes.

    > >
    > > Sure, and all that is useless without schedulability tests.

    >
    > Yes, but should the kernel do the schedulability test? Or should the ball be
    > passed on to userspace? To analyze the schedulability, you would need the worst
    > case execution time (WCET) of the process, and if the kernel/scheduler should
    > start trying to estimate that...
    >
    > So, as a start, why not just 'ignore' WCET in the first versions, and that can
    > be added later on, if necessary.


    Like said above, WCET is represented in the execution budget.

    > A lot of good points, and I certainly see your side of it. However (and yes, I
    > have to argue a bit more), I don't think an EDF-scheduler
    > lot of features.
    >
    > If you want to use the EDF, why not give the user a list of consequences
    > - Only a single core


    There won't be a single core machine left soon ;-)

    > - No other RT-scheduler, if another userspace program breaks, so be it, the
    > has been warned.


    That's a no-go, and I don't see why you would need that.

    > - Best effort only


    That's pretty useless imho. Best-effort and RT are a bit contradictory.

    > - Provide handlers for a given set of signals that will be sent to any
    > application missing a deadline


    Yeah, the idea was to send SIGXCPU to tasks who exceed their budget (and
    thus will miss their deadline).
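
    On the application side that could be as simple as installing a SIGXCPU
    handler; a minimal sketch (the delivery-on-budget-overrun semantics are
    the proposal above, the handler installation itself is just standard
    POSIX signal handling):

        #include <signal.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        /* Called when we are told we overran our budget. */
        static void overrun_handler(int sig)
        {
            /* Only async-signal-safe calls belong in here. */
            static const char msg[] = "budget exceeded, degrading gracefully\n";
            (void)sig;
            write(STDERR_FILENO, msg, sizeof(msg) - 1);
        }

        int main(void)
        {
            struct sigaction sa;

            memset(&sa, 0, sizeof(sa));
            sa.sa_handler = overrun_handler;
            sigaction(SIGXCPU, &sa, NULL);

            /* ... periodic real-time work would go here ... */
            pause();
            return 0;
        }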

    > - no cpu-scaling
    > - ... keep going, basically strip away every piece of dynamic behaviour and
    > complex scheduling code


    I'm thinking there's little useful left after all that ;-)

  11. Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

    On Friday 31 October 2008 14:30:08 Peter Zijlstra wrote:
    > On Fri, 2008-10-31 at 13:09 +0100, Henrik Austad wrote:
    > > Ah, ok, I thought introducing new syscalls was *really* frowned upon.

    >
    > We prefer not to, but sometimes there just isn't any other option.
    > If we want to extend struct sched_param, we need 2 new syscalls.
    >
    > > > Sure, but implementing an EDF class isn't really all that hard - esp if
    > > > you only want UP.
    > > >
    > > > The real fun is in the PI stuff and schedulability tests on SMP.

    > >
    > > As a start, that is the approach at least I would like to take. Once you
    > > have a proven, functional EDF on a single core, you can extend that to
    > > handle several cores, if you really want to.

    >
    > Well, you're of course free to do so, but I don't think it's a very
    > interesting thing to do.
    >
    > > > > > > The main problem is that, especially to deal with SMP systems, we
    > > > > > > also need to investigate theoretical issues and find out what the
    > > > > > > best approach could be.
    > > > >
    > > > > Yes, well, EDF is not optimal for SMP systems, only for single core.
    > > > > However, you can do a pretty good attempt by assigning tasks to cores
    > > > > in a greedy fashion (simply put the next task at the CPU with the
    > > > > lowest load).
    > > > >
    > > > > As a further optimization, I guess you could do the whole sced-domain
    > > > > thing to minimize the search space.
    > > >
    > > > The problem with greedy bin-packing heuristics is that your
    > > > schedulability tests are out the window, making the whole thing useless.

    > >
    > > Well, not really. I mean, to be optimal, you should also consider WCET,
    > > but then, that's really not interesting as IMHO that's the
    > > userspace-programmer's responsibility. If the user wants to add tasks
    > > that sum up to 210% utilization, there's really not much we can do anyway.
    > > You certainly wouldn't want the kernel to stop accepting new jobs.
    > >
    > > So, keep the kernel logic as simple as possible and move the job to the
    > > user. By keeping the kernel logic simple - we make the job easier for the
    > > end-users. A very complex EDF-scheduler will make the testing very
    > > difficult.
    > >
    > > If, on the other hand, we *know* that the scheduler is not optimal, but
    > > that it behaves in a predictable manner, the end users have a simpler
    > > task of finding out why something bad happened.
    > >
    > > Because, no matter *what* you do, and *how* you implement it, with
    > > *whatever* features, there will be cases when things fall apart, and
    > > having a simple, predictable scheduler will be necessary to figure it
    > > out.

    >
    > I agree that the scheduler should be simple, and even something like
    > PD^2 is relatively simple.
    >
    > But I disagree that we should not do schedulability tests. Doing those,
    > and esp. enforcing tasks to their given limits increases the QoS for
    > others in the presence of faulty/malicious tasks.
    >
    > Also, WCET is still the user's responsibility.
    >
    > If, for each deadline task, you specify a period, a deadline and a budget,
    > then the WCET computation is reflected in the budget.
    >
    > By enforcing the schedulability test and execution budget you raise the
    > quality of service, because even in the presence of a mis-behaving task
    > only that task will be impacted. The other tasks will still meet their
    > deadlines.


    Ah, ok. I see.

    >
    > > > > No. You should have *either* FIFO/RR *or* EDF, not both at the same
    > > > > time. If you absolutely require both, you should at least separate
    > > > > them on a per-core basis. If you mix them, they need to be aware of
    > > > > the other in order to make the right decision, and that is not good.
    > > >
    > > > We _have_ to have both. It's that simple.

    > >
    > > No, we do not. Or, at least not at the same time (see below)
    > >
    > > > POSIX mandates we have SCHED_FIFO/RR, there is tons and tons of
    > > > userspace that uses it, we cannot just replace it with a deadline
    > > > scheduler.

    > >
    > > I didn't mean to rip the whole fifo/rr out of the kernel, but adding a
    > > switch at compile-time so that you could choose *either* normal, static
    > > RT *or* EDF. Then we could, at least for the first few versions, have it
    > > depend on !SMP to avoid the whole SMP-non-optimal-mess.

    >
    > But _why_? Why not leave FIFO/RR in? There is absolutely no downside to
    > keeping it around.


    My motivation was the increased complexity in meeting deadlines etc. By having
    only EDF (or only RR/FIFO), things would be a lot simpler. Life is not simple,
    it seems :-)

    >
    > > > Thing is, you have to run hard tasks first, and schedule weaker forms
    > > > in their slack time, otherwise you cannot guarantee anything.

    > >
    > > Well, then you suddenly introduce priorities to the deadlines, and that
    > > is not good. A hard task is not more important than a soft one, but the
    > > effect of missing the deadline is. If the schedule is infeasible, it
    > > really doesn't matter what you do, as you will miss deadlines, and if you
    > > prioritize hard tasks, you will end up starving firm and soft tasks.
    > >
    > > Before you go on and tell me how wrong I am, note that I don't disagree
    > > with you, I think choosing hrt before the others, is the best solution
    > > from an implementation point of view.

    >
    > That is, if you make the soft-deadline class aware of the hard-deadline
    > class's tasks and schedulability constraints, then you can keep the
    > soft-rt class schedulable too.
    >
    > So srt is in no way less important; it just has fewer restrictions on
    > the schedule, therefore we can run it in the hrt slack/idle time.
    >
    > And adding the schedulability test in the kernel avoids these starvation
    > issues, because you just cannot admit an infeasible task set.
    >
    > > > On UP - which is not interesting on a general purpose kernel that runs
    > > > on machines with up to 4096 CPUs.

    > >
    > > But, and pardon my ignorance, will an EDF-scheduler be interesting for
    > > such a large system? From what I've gathered, small systems are the ones
    > > that could benefit from an EDF as you can analyze and predict behaviour,
    > > and then, since EDF is optimal, tune the CPU-freq down and still know
    > > that things will work.
    > >
    > > Some embedded people can probably provide a lot better input here than
    > > me, as this is just a general idea I snapped up 'somewhere' (where
    > > somewhere is an element of the set of all places I've been the last 6
    > > months).

    >
    > Not that large indeed, but people are interested in running RT workloads
    > on machines in the 32/64 CPU range.
    >
    > And even the embedded folks are now staring quad-core ARM11 chips in the
    > face, wondering how to do things.


    Hmm, I must admit, that is a very good point.

    >
    > > > Then there are the pfair class of scheduling algorithms which can
    > > > theoretically yield up to 100% utilization on SMP systems.

    > >
    > > Do you know about any practical attempts on this, and what kind of results
    > > that produced?

    >
    > Fairly decent, http://www.cs.unc.edu/~anderson/papers/rtss08b.pdf


    Thanks!

    >
    > > > > Besides that, EDF is the simplest, most brain-dead scheduler you can
    > > > > imagine. Basically you want to add the deadline to the tasks, put it
    > > > > in a sorted list and pick the leftmost task every time until it
    > > > > completes.
    > > >
    > > > Sure, and all that is useless without schedulability tests.

    > >
    > > Yes, but should the kernel do the schedulability test? Or should the ball
    > > be passed on to userspace? To analyze the schedulability, you would need
    > > the worst case execution time (WCET) of the process, and if the
    > > kernel/scheduler should start trying to estimate that...
    > >
    > > So, as a start, why not just 'ignore' WCET in the first versions, and
    > > that can be added later on, if necessary.

    >
    > Like said above, WCET is represented in the execution budget.
    >
    > > A lot of good points, and I certainly see your side of it. However (and
    > > yes, I have to argue a bit more), I don't think an EDF-scheduler
    > > should contain a lot of features.
    > >
    > > If you want to use the EDF, why not give the user a list of consequences
    > > like - Only a single core

    >
    > There won't be a single core machine left soon ;-)
    >
    > > - No other RT-scheduler, if another userspace program breaks, so be it, the
    > > user has been warned.

    >
    > That's a no go, and I don't see why you would need that.
    >
    > > - Best effort only

    >
    > That's pretty useless imho. Best-effort and RT are a bit contradictory.
    >
    > > - Provide handlers for a given set of signals that will be sent to any
    > > application missing a deadline

    >
    > Yeah, the idea was to send SIGXCPU to tasks who exceed their budget (and
    > thus will miss their deadline).
    >
    > > - no cpu-scaling
    > > - ... keep going, basically strip away every piece of dynamic behaviour
    > > and complex scheduling code

    >
    > I'm thinking there's little useful left after all that ;-)


    Bah, you just picked all my arguments apart and convinced me I'm wrong. I
    guess it's back to the drawing board then, but now I have a better
    understanding at least.

    Thanks for all the feedback, guys, especially Peter! I really appreciate this!

    --
    med Vennlig Hilsen - Yours Sincerely
    Henrik Austad

  12. Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

    On Thu, 30 October 2008 10:44 pm, Henrik Austad wrote:
    >> As to why I'm
    >> looking at EDF, I think the answer to that is a bit too long (and
    >> not
    >> appropriate for this email anyway) so I'll leave that part out.
    >> >
    >> > Well, I understand that, but it could be interesting... At least to
    >> > me.

    >
    > ok, simply put:
    > * give each task a relative deadline (will probably introduce a new syscall,
    > please don't shoot me).
    > * when the task enters TASK_RUNNING, set absolute deadline to time_now +
    > rel_deadline.
    > * insert task in rq, sorted by abs_deadline
    > * pick leftmost task and run it
    > * when task is done, pick next task
    >
    > that's it.
    >

    Ok, that is how EDF works, and I know it... I was asking something
    different... But never mind! :-D
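
    For reference, the steps quoted above really do fit in a few lines; a
    minimal userspace sketch, with a sorted array standing in for the rb-tree
    a real runqueue would use (all names are illustrative, nothing here is
    kernel API):

        #include <stdio.h>
        #include <stdlib.h>

        struct edf_task {
            const char *name;
            long long rel_deadline;  /* set once per task, e.g. via a new syscall */
            long long abs_deadline;  /* computed when the task becomes runnable   */
        };

        /* Earliest absolute deadline first ("leftmost"). */
        static int by_deadline(const void *a, const void *b)
        {
            const struct edf_task *ta = a, *tb = b;
            return (ta->abs_deadline > tb->abs_deadline) -
                   (ta->abs_deadline < tb->abs_deadline);
        }

        /* On wakeup: absolute deadline = now + relative deadline, then keep
         * the runqueue sorted so picking the next task is trivial. */
        static void wake_all(struct edf_task *rq, int n, long long now)
        {
            int i;

            for (i = 0; i < n; i++)
                rq[i].abs_deadline = now + rq[i].rel_deadline;
            qsort(rq, n, sizeof(rq[0]), by_deadline);
        }

        int main(void)
        {
            struct edf_task rq[] = {
                { "video",  40000, 0 },
                { "audio",  10000, 0 },
                { "logger", 90000, 0 },
            };

            wake_all(rq, 3, 0);
            printf("run first: %s\n", rq[0].name);  /* audio, earliest deadline */
            return 0;
        }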

    >> > > The most interesting part of EDF is not the
    >> > > actual scheduler itself (although there are fun issues with that as
    >> > > well), but extending the Priority Inheritance framework to deal with
    >> > > all the fun cases that come with EDF.

    >
    > Well, I find EDF intersting because it is so blissfully simple. :-)
    >

    I agree, EDF is very simple and has a lot of very nice properties...
    The problem is to decide how to assign a deadline to a task, if it is not a
    classical soft or hard real-time one! :-O

    But you're not talking about things like that, are you?

    > Yes, well, EDF is not optimal for SMP systems, only for single core.
    > However,
    > you can do a pretty good attempt by assigning tasks to cores in a greedy
    > fashion (simply put the next task at the CPU with the lowest load).
    >

    I definitely agree that hard real-time workloads are better handled by
    partitioned EDF, but for soft ones, it would be sad to suffer from the possible
    CPU utilization loss it entails.

    Also, what about resources shared by different tasks in different
    CPU/partitions? And if you avoid sharing resources between tasks in
    different partitions (is that acceptable?), what about system resources,
    which are shared by _all_ tasks in the system by definition?

    Sorry about asking so many questions, but these are the issues we are
    trying to address, and I'm quite interested in knowing if you have any
    idea about them. :-)

    > No. You should have *either* FIFO/RR *or* EDF, not both at the same time.
    >

    Oh... Why?

    > If
    > you absolutely require both, you should at least separate them on a
    > per-core
    > basis. If you mix them, they need to be aware of the other in order to
    > make
    > the right descision, and that is not good.
    >

    Well, obviously it's something that we have to think about carefully, but I
    can't see any harmful situation in having both a deadline-based and a
    fixed-priority-based scheduling class from which to pick tasks in (sorry)
    priority order.

    > Well.. why not just treat *all* RT-tasks as *either* FIFO/RR or EDF?
    > Having
    > fifo and edf together will complicate things. And, people looking for edf,
    > will not use fifo/rr anyway (famous last words).
    >

    Ok, maybe it's a matter of personal feelings, but I think that such a
    design, even if more complicated, could be very nice and useful.

    >> Which leaves us with the big issue of priority inversion ;-)

    >
    > Couldn't the above idea solve a bit of this? I have some papers on deadline
    > inheritance lying around somewhere, I can have a look at that, I think it
    > was a fairly elegant solution to some of these issues there.
    >

    Well, I personally think that partitioning _raises_ issues about resource
    sharing instead of alleviating them... In an OS like Linux, at least...
    :-O

    Regards,
    Dario Faggioli

    PS. Sorry for the webmail... I'm abroad and I don't have my laptop with me :-(


  13. Re: Deadline scheduling (was: Re: Rearranging layout of code in the scheduler)

    On Thu, 30 October 2008 6:17 pm, Peter Zijlstra wrote:
    > Right, ideally I'd like to see 2 EDF classes on top of FIFO, so that we
    > end up with the following classes
    >
    > hedf - hard EDF
    > sedf - soft EDF (bounded tardiness)
    > fifo/rr - the current static priority scheduler
    > fair - the current proportional fair scheduler
    > idle - the idle scheduler
    >

    Oh, so two classes? Well, yes, could be nice.

    > The two edf classes must share some state, so that the sedf class knows
    > about the utilisation consumed by hedf, and the main difference between
    > these two classes is the schedulability test.
    >

    Yep. Actually I think that the schedulability test could be an issue as well,
    especially if we like a group/hierarchical approach, since the known
    hierarchical admission tests are quite complex to implement and, probably,
    time consuming if performed on-line in a highly dynamic (with respect
    to task arrival and leaving) system.
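
    The shared state mentioned above could be little more than a common
    utilisation counter, with the soft class admitting a task only into
    whatever bandwidth the hard class has not claimed. A minimal sketch of
    that flat, single-CPU case (made-up names, utilisation-based tests only;
    the hierarchical tests would clearly be much more involved):

        #include <stdbool.h>
        #include <stdio.h>

        /* Bandwidth bookkeeping shared by the two deadline classes. */
        struct dl_bandwidth {
            double hedf_util;   /* utilisation admitted to the hard class */
            double sedf_util;   /* utilisation admitted to the soft class */
        };

        /* Hard admission: hard tasks only compete with other hard tasks. */
        static bool hedf_admit(struct dl_bandwidth *bw, double util)
        {
            if (bw->hedf_util + util > 1.0)
                return false;
            bw->hedf_util += util;
            return true;
        }

        /* Soft admission: soft tasks fit into whatever the hard class left over. */
        static bool sedf_admit(struct dl_bandwidth *bw, double util)
        {
            if (bw->hedf_util + bw->sedf_util + util > 1.0)
                return false;
            bw->sedf_util += util;
            return true;
        }

        int main(void)
        {
            struct dl_bandwidth bw = { 0.0, 0.0 };

            printf("hard 40%%: %d\n", hedf_admit(&bw, 0.40));  /* 1: accepted */
            printf("hard 30%%: %d\n", hedf_admit(&bw, 0.30));  /* 1: accepted */
            printf("soft 40%%: %d\n", sedf_admit(&bw, 0.40));  /* 0: only 30% slack left */
            printf("soft 25%%: %d\n", sedf_admit(&bw, 0.25));  /* 1: accepted */
            return 0;
        }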

    > The few problems this gives are things like kstopmachine and the
    > migration threads, which should run at the max priority available on the
    > system.
    >

    Yeah, that's exactly what we were thinking too.

    Actually, I was also thinking that having fixed priority scheduling
    _before_ EDF could be of some benefit if you have to be sure that a task
    is going to be executed at a very precise instant in time, but I have no
    precise idea about that yet.

    > Perhaps we can introduce another class on top of hedf which will run
    > just these two tasks and is not exposed to userspace (yes, I understand
    > it will ruin just about any schedulability analysis).

    Well, it could be a solution... And if this is only used for such kinds of
    special tasks, maybe it is not impossible to bound or account for their
    scheduling contribution... But I'm just shooting in the dark here, sorry
    about that! :-P

    > We can do deadline inheritance and bandwidth inheritance by changing
    > plist to a rb-tree/binary heap and mapping the static priority levels
    > somewhere at the back and also propagating the actual task responsible
    > for the boost down the chain (so as to be able to do bandwidth
    > inheritance).
    >
    > From what I gather the sssup folks are doing that, although they
    > reported that DI between disjoint schedule domains (partitions) posed an
    > interesting problem.
    >

    Yes, that's right, this is what we are investigating and trying to do
    these days (... or weeks... or months!).
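
    As a sketch of the data structure change quoted above (purely
    illustrative, none of this is existing kernel code): one deadline-ordered
    key can cover both kinds of waiters if static priorities are mapped to
    synthetic deadlines behind all real ones, and each node carries the task
    that is actually responsible for the boost:

        #include <limits.h>
        #include <stdio.h>

        struct task;    /* opaque here */

        /* Hypothetical waiter node for a deadline-ordered PI structure
         * (an rb-tree or binary heap instead of today's plist). */
        struct pi_waiter {
            long long key;               /* sort key, see below */
            struct task *boost_source;   /* task really responsible for the boost,
                                          * carried down the chain so bandwidth
                                          * inheritance knows whose budget to charge */
        };

        /* Deadline waiters sort by their absolute deadline directly. */
        static long long deadline_key(long long abs_deadline)
        {
            return abs_deadline;
        }

        /* Static-priority waiters are mapped "at the back", best priority first,
         * so any real deadline sorts ahead of any FIFO/RR waiter. */
        static long long static_prio_key(int prio)
        {
            return (LLONG_MAX / 2) + prio;
        }

        int main(void)
        {
            printf("deadline beats prio 0: %d\n",
                   deadline_key(5000000) < static_prio_key(0));
            return 0;
        }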

    > Personally I'd like to see the full priority inversion issue solved by
    > something like the proxy execution protocol, however the SMP extension
    > thereof seems to be a tad expensive - found a book on graph theory, all
    > that remains is finding time to read it :-)
    >

    Wow... So, good luck for that! :-)

    Maybe it's my fault, but I see some issues with proxy execution and
    similar protocols.
    That is, if you have, let's say, task A blocked on task B, blocked on task
    C, and you are using proxy execution, that means that you have not
    dequeued A and B when they blocked, but that you, for example, filled a
    pointer that reminds you, when you schedule them, that you have to
    actually run C, am I right?

    Now, what happens if C blocks on a non-rt mutex lock, or if it simply goes
    to sleep? Is it acceptable to track the blocking chain in order to
    actually dequeue A and B as well, and to requeue them again when C wakes
    up?

    Forgive me if that's a stupid point... :-(

    > The advantage of proxy execution is that it's fully invariant to the
    > schedule function and thus even works for proportional fair schedulers
    > and any kind of scheduler hierarchy.
    >

    Yes, I agree and I like it very much too. If you go for it, you could also
    add bandwidth inheritance (e.g., for group scheduling) and things like
    that almost for free (if wanted! :-))

    Regards,
    Dario Faggioli


