[Q] Why does dma_alloc_coherent() of ia64 GFP_DMA? - Kernel


Thread: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

  1. [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?


    Hello.

    I have a (maybe dumb) question about dma_alloc_coherent() on ia64.


    Why does dma_alloc_coherent() on ia64 still force GFP_DMA?
    And why is swiotlb_dma_alloc_coherent() the default routine for
    platform_dma_alloc_coherent()?

    -------
    #define dma_alloc_coherent(dev, size, handle, gfp) \
    platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA)
    --------

    Even if a device can address memory above 4G and the driver doesn't specify
    GFP_DMA, dma_alloc_coherent() returns memory below 4G.
    I guess many people think this is not a big issue because drivers generally
    need very little memory.

    However, I think this can deliver the finishing blow that triggers an OOM.
    For example,

    1) Page cache occupies the normal zone, and the DMA zone is free.
    2) A user's application requests a few GB of memory and mlocks it.
       The whole DMA zone ends up occupied by it.
    3) A device that can address memory above 4GB is hot-added.
       But dma_alloc_coherent() tries to allocate from the DMA zone.
       Then an OOM occurs because there are no freeable pages.
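
The three steps above can be mimicked with a toy two-zone model in user-space C. This is only a sketch: `struct zone_model`, `zone_take_page()`, `alloc_page_model()` and the page counts are made up for illustration and do not correspond to the kernel's real allocator or reclaim logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a memory zone; not a kernel structure. */
struct zone_model {
    size_t free_pages;        /* immediately available */
    size_t reclaimable_pages; /* e.g. clean page cache, can be dropped */
    size_t pinned_pages;      /* e.g. mlock()ed, cannot be reclaimed */
};

/* Take one page from a zone, "reclaiming" one page-cache page if needed. */
static bool zone_take_page(struct zone_model *z)
{
    assert(z != NULL);
    if (z->free_pages == 0 && z->reclaimable_pages > 0) {
        z->reclaimable_pages--;
        z->free_pages++;
    }
    if (z->free_pages == 0)
        return false;         /* nothing free and nothing reclaimable */
    z->free_pages--;
    return true;
}

/*
 * force_dma models "(gfp) | GFP_DMA": only the DMA zone may be used.
 * Otherwise the normal zone is tried first, then the DMA zone.
 */
static bool alloc_page_model(struct zone_model *dma, struct zone_model *normal,
                             bool force_dma)
{
    if (!force_dma && zone_take_page(normal))
        return true;
    return zone_take_page(dma);
}
```

With a DMA zone holding only pinned pages and a normal zone holding only reclaimable pages, the forced-DMA request fails while the unrestricted one succeeds by reclaiming page cache.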

    I have heard that some users need to mlock a few GB.
    There has been similar trouble in the past.


    If GFP_DMA were removed from the above definition of dma_alloc_coherent(),
    what would happen?

    If that is not allowed, how about the following?

    dma_alloc_coherent()
      -> platform_dma_alloc_coherent()
        -> normal_alloc_coherent()
           {
               if (dma_mask allows access above 4G) {
                   ret = __get_free_pages();
                     :
                   (validate the returned address)
                     :
               } else {
                   ret = swiotlb_alloc_coherent();
               }
           }
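
One way to read the proposal above is "allocate anywhere if the mask allows it, check the result against the mask, and otherwise use the low allocator". The user-space sketch below models only that control flow. `fake_alloc_anywhere()` and `fake_alloc_low()` are hypothetical stand-ins for `__get_free_pages()` and `swiotlb_alloc_coherent()` that return fixed addresses, and falling back to the low path when the check fails is an assumption, since the proposal does not say what happens in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two allocators; they return fixed addresses. */
static uint64_t fake_alloc_anywhere(void) { return 0x180000000ULL; } /* 6 GB   */
static uint64_t fake_alloc_low(void)      { return 0x10000000ULL; }  /* 256 MB */

/* True if the whole buffer [addr, addr + size - 1] is reachable with mask. */
static bool buffer_fits_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
    assert(size > 0);
    if (addr > mask)
        return false;
    return size - 1 <= mask - addr;   /* written this way to avoid overflow */
}

/* Model of the proposed normal_alloc_coherent() decision. */
static uint64_t normal_alloc_coherent_model(uint64_t dma_mask, uint64_t size)
{
    if (dma_mask > UINT32_MAX) {
        uint64_t addr = fake_alloc_anywhere();

        if (addr != 0 && buffer_fits_mask(addr, size, dma_mask))
            return addr;
        /* Assumption: a real version would free addr before falling back. */
    }
    return fake_alloc_low();
}
```

A device with a full 64-bit mask keeps the 6 GB buffer, while a 32-bit device, or one whose mask ends below the returned address, gets the low buffer instead.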

    If I am misunderstanding something, please let me know.


    Thanks.

    --
    Yasunori Goto


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    Yasunori Goto writes:
    >
    > However, I think this has the possibility of a finishing blow of OOM.
    > For example,
    >
    > 1) Page caches occupy normal zone, and DMA zone is free.
    > 2) A user's application requires a few GB memory and mlock it.
    > All DMA zone is occupied by it.


    The VM has special "lower zone protection" to protect against these
    kinds of deadlocks. They can be circumvented, but it takes effort.

    > 3) A device which allows over 4GB is hot-added.
    > But dma_alloc_coherent() try to allocate DMA zone.
    > Then OOM occurs because there is no freeable pages.
    >
    > I heard there are some users who require a few GB mlock.


    Normally mlock is limited to half the memory exactly to avoid
    such problems.

    Also, I believe there are some issues with non-contiguous memory on
    some IA64 systems -- e.g. Altixes, iirc, have the requirement
    that you use the IOMMU for higher memory. So it's probably
    not easy to change.

    -Andi

    --
    ak@linux.intel.com

  3. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    On Mon, Nov 10, 2008 at 01:59:57PM +0100, Andi Kleen wrote:
    > Yasunori Goto writes:


    > > 3) A device which allows over 4GB is hot-added.
    > > But dma_alloc_coherent() try to allocate DMA zone.
    > > Then OOM occurs because there is no freeable pages.
    > >
    > > I heard there are some users who require a few GB mlock.

    >
    > Normally mlock is limited to half the memory exactly to avoid
    > such problems.
    >
    > Also I believe there are some issues with non continuous memory on
    > some IA64 systems -- e.g. Altixes iirc have the requirement
    > that you use the IOMMU for higher memory. So it's probably
    > not easy to change.


    I am not sure what is being referred to here, but all of an Altix's memory is
    DMA capable with the exception of the stuff covered by the MSPEC driver
    (that is uncached memory). There are certainly all sorts of special
    requirements for doing transfers on Altix to eliminate memory ordering
    problems, but nothing specific that I recall related to address ranges
    and DMA.

    I apologize in advance if I am answering a different question than
    was asked.

    Thanks,
    Robin

  4. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    On Mon, 10 Nov 2008, Robin Holt wrote:

    > I am not sure what is be referred to here, but all of an Altix's memory is
    > DMA capable with the exception of the stuff covered by the MSPEC driver
    > (that is uncached memory). There are certainly all sort of special
    > requirements for doing transfers on Altix to eliminate memory ordering
    > problems, but nothing specific that I recall related to address ranges
    > and DMA.


    But then ZONE_DMA has nothing to do with memory being dmaable or not.
    ZONE_DMA is for legacy devices that cannot do DMA to all of memory.
    I vaguely remember having stuffed all the memory into ZONE_NORMAL at some
    point. ZONE_DMA vanishes for Altix configurations.


  5. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    On Mon, 10 Nov 2008, Yasunori Goto wrote:

    > Even if a device allows over 4G access and the driver doesn't specify
    > GFP_DMA, dma_alloc_coherent() returns under 4G area.


    GFP_DMA can become 0 for configurations that have
    !CONFIG_ZONE_DMA. Then all of memory is available.

    The call is subarch specific. So, e.g., Altix's sn_dma_alloc_coherent does
    not set __GFP_DMA.

    If you have an IA64 arch that only supports 32-bit I/O, then __GFP_DMA in
    dma_alloc_coherent makes sense.
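
The point about GFP_DMA becoming 0 can be illustrated with a small user-space model. The macro names and values below (`MODEL_ZONE_DMA`, `MODEL_GFP_DMA`, `MODEL_GFP_KERNEL`) are stand-ins chosen for this sketch and are not the kernel's definitions. The sketch only shows that OR-ing in a flag that is defined as 0 leaves the caller's flags unchanged.

```c
#include <assert.h>

/* Set MODEL_ZONE_DMA to 1 to mimic CONFIG_ZONE_DMA=y (assumed off here). */
#ifndef MODEL_ZONE_DMA
#define MODEL_ZONE_DMA 0
#endif

#if MODEL_ZONE_DMA
#define MODEL_GFP_DMA 0x01u    /* a real zone bit */
#else
#define MODEL_GFP_DMA 0x00u    /* no DMA zone: the flag is a no-op */
#endif

#define MODEL_GFP_KERNEL 0xd0u /* arbitrary non-zone bits for the demo */

/* Mirrors the shape of the ia64 macro: always OR in the DMA flag. */
static unsigned int force_dma_flag(unsigned int gfp)
{
    unsigned int out = gfp | MODEL_GFP_DMA;

    assert((out & gfp) == gfp);  /* the caller's bits are never dropped */
    return out;
}
```

Built as is, `force_dma_flag(MODEL_GFP_KERNEL)` returns its argument untouched. Building with `-DMODEL_ZONE_DMA=1` makes the same call set the zone bit.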






  6. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    > Yasunori Goto writes:
    > >
    > > However, I think this has the possibility of a finishing blow of OOM.
    > > For example,
    > >
    > > 1) Page caches occupy normal zone, and DMA zone is free.
    > > 2) A user's application requires a few GB memory and mlock it.
    > > All DMA zone is occupied by it.

    >
    > The VM has special "lower zone protection" to protect against these
    > kinds of deadlocks. They can be circumvented, but it takes effort.


    I wrote the documentation for lowmem_reserve_ratio, which is the current name
    of lower_zone_protection, in Documentation/filesystems/proc.txt.
    But I really doubt that many users understand how to estimate
    its value.
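
For readers trying to estimate the value, the rule described in the kernel documentation for lowmem_reserve_ratio can be sketched as follows: the pages a lower zone holds back from an allocation that could have used higher zones equal the size of those higher zones divided by the lower zone's ratio. The function name `lowmem_reserve_pages()` and the array layout are assumptions of this sketch, which is a user-space model, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pages of zone i held back from allocations that could have used
 * zones up to j (i < j): the pages in zones i+1..j divided by ratio[i].
 */
static uint64_t lowmem_reserve_pages(const uint64_t *zone_pages,
                                     const uint64_t *ratio,
                                     unsigned int i, unsigned int j)
{
    uint64_t higher = 0;
    unsigned int k;

    assert(i < j && ratio[i] > 0);
    for (k = i + 1; k <= j; k++)
        higher += zone_pages[k];
    return higher / ratio[i];
}
```

As a worked example under assumed numbers: with two zones of 262144 pages each (4 GB per zone at a 16 KB page size) and a ratio of 256 for the DMA zone, the DMA zone holds back 1024 pages, which is 16 MB, from allocations that could have been served from the normal zone.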


    >
    > > 3) A device which allows over 4GB is hot-added.
    > > But dma_alloc_coherent() try to allocate DMA zone.
    > > Then OOM occurs because there is no freeable pages.
    > >
    > > I heard there are some users who require a few GB mlock.

    >
    > Normally mlock is limited to half the memory exactly to avoid
    > such problems.


    Half? I don't know which documentation describes that. :-P
    Most users probably don't know it either...

    To be honest, I can understand that kernel hackers hope there is NO user who
    mlocks several GB of memory. However, reality is relentless.
    There were real users who tried it. I remember a user who had a box with
    8GB of memory and mlocked 5GB.
    But even if they had mlocked only 4GB, the OOM would still have occurred.

    (In addition, Fujitsu PrimeQuest reserves an area of about 2GB for the
    maximum I/O equipment. So Zone DMA is only 2GB..... (Ueeeeeeeep!))


    Anyway, I don't want to discuss mlock's specification, because users
    can understand the side effects of mlock when an OOM occurs.
    But they can't understand why Zone DMA is used even though the driver
    doesn't require it. From the user's point of view it simply looks like a BUG
    when it was the finishing blow for the OOM. I have to explain why this BUG(?)
    still remains. But I'm a newbie in this area, so I would like to know.


    Thanks.

    --
    Yasunori Goto



  7. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    > On Mon, 10 Nov 2008, Yasunori Goto wrote:
    >
    > > Even if a device allows over 4G access and the driver doesn't specify
    > > GFP_DMA, dma_alloc_coherent() returns under 4G area.

    >
    > GFP_DMA can become 0 for configurations that have
    > !CONFIG_ZONE_DMA. Then all of memory is available.
    >
    > The call is subarch specific. So f.e. Altix sn_dma_alloc_coherent does
    > not set __GFP_DMA.


    I heard that Altix has an IOMMU, so __GFP_DMA is not necessary.

    > If you have an IA64 arch that only support 32bit I/O then __GFP_DMA in
    > dma_alloc_coherent makes sense.


    Agreed.
    But our box supports both 32-bit and 64-bit I/O without an IOMMU.
    Is that an abnormal platform? Does our box need a new interface, like Altix?


    Bye.

    --
    Yasunori Goto



  8. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    On Mon, 10 Nov 2008 13:59:57 +0100
    Andi Kleen wrote:

    > Also I believe there are some issues with non continuous memory on
    > some IA64 systems -- e.g. Altixes iirc have the requirement
    > that you use the IOMMU for higher memory. So it's probably
    > not easy to change.


    Are you talking about this odd dma requirement?

    http://www.gelato.unsw.edu.au/archiv...0305/5604.html

  9. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    On Mon, 10 Nov 2008 13:47:51 +0900
    Yasunori Goto wrote:

    > I have a (may be dumb) question about dma_alloc_coherent() for ia64.
    >
    >
    > Why does dma_alloc_coherent() of ia64 force GFP_DMA yet?
    > And why is swiotlb_dma_alloc_coherent() default routine of
    > platform_dma_alloc_coherent()?
    >
    > -------
    > #define dma_alloc_coherent(dev, size, handle, gfp) \
    > platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA)
    > --------
    >
    > Even if a device allows over 4G access and the driver doesn't specify
    > GFP_DMA, dma_alloc_coherent() returns under 4G area.
    > I guess many people think this is not so big issue because drivers require
    > very small memory generally.
    >
    > However, I think this has the possibility of a finishing blow of OOM.
    > For example,
    >
    > 1) Page caches occupy normal zone, and DMA zone is free.
    > 2) A user's application requires a few GB memory and mlock it.
    > All DMA zone is occupied by it.
    > 3) A device which allows over 4GB is hot-added.
    > But dma_alloc_coherent() try to allocate DMA zone.
    > Then OOM occurs because there is no freeable pages.
    >
    > I heard there are some users who require a few GB mlock.
    > There are similar trouble in past.
    >
    >
    > If GFP_DMA is removed from above definition of dma_alloc_coherent(),
    > what will happen?


    It would probably break swiotlb for devices whose coherent_dma_mask
    is not DMA_64BIT_MASK.


    > If it is not allowed, how is followings?
    >
    > dma_alloc_coherent()
    > -> platform_dma_alloc_coherent()
    > -> normal_alloc_coherent()
    > {
    > if (dma_mask allow over 4G)
    > ret = __get_free_pages();
    > :
    > (check validation of returned address)
    > :
    > else
    > swiotlb_alloc_coherent();
    > }
    > }
    >
    > If I'm something misunderstanding, please let me know.


    Hmm, isn't platform_dma_alloc_coherent supposed to handle multiple dma
    ops: swiotlb, hardware IOMMUs like VT-d and sba, etc.?

    On IA64, I think the gfp zone flag matters only for swiotlb.


    =
    From: FUJITA Tomonori
    Subject: [PATCH] IA64: use GFP_DMA in dma_alloc_coherent only when necessary

    For swiotlb, we need to set GFP_DMA if a device doesn't have
    DMA_64BIT_MASK coherent_dma_mask. hardware IOMMUs like VT-d and sba
    should ignore gfp zone flag.

    diff --git a/arch/ia64/include/asm/dma-mapping.h b/arch/ia64/include/asm/dma-mapping.h
    index bbab7e2..d4de41b 100644
    --- a/arch/ia64/include/asm/dma-mapping.h
    +++ b/arch/ia64/include/asm/dma-mapping.h
    @@ -52,7 +52,9 @@ extern struct ia64_machine_vector ia64_mv;
     extern void set_iommu_machvec(void);

     #define dma_alloc_coherent(dev, size, handle, gfp) \
    -    platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA)
    +    platform_dma_alloc_coherent(dev, size, handle, \
    +        (dev)->coherent_dma_mask != DMA_64BIT_MASK ? \
    +        (gfp) | GFP_DMA : gfp)

     /* coherent mem. is cheap */
     static inline void *
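
The decision made by the patched macro can be written out as a plain function to make the behaviour easier to test in user space. This is a sketch, not part of the patch: `coherent_gfp_model()`, `MODEL_DMA_64BIT_MASK` and `MODEL_GFP_DMA` are names and values invented here, and the precondition that callers do not pass the zone flag themselves is a simplification of this model.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_DMA_64BIT_MASK 0xffffffffffffffffULL
#define MODEL_GFP_DMA        0x01u  /* stand-in value for the zone flag */

/* Same decision as the patched macro, written as a function. */
static unsigned int coherent_gfp_model(uint64_t coherent_dma_mask,
                                       unsigned int gfp)
{
    /* Simplification of this sketch: callers do not set the zone flag. */
    assert((gfp & MODEL_GFP_DMA) == 0);

    return coherent_dma_mask != MODEL_DMA_64BIT_MASK
        ? (gfp | MODEL_GFP_DMA)   /* any narrower mask: stay in the DMA zone */
        : gfp;                    /* full 64-bit mask: any zone will do */
}
```

Because the comparison is an exact equality test, a device with, say, a 40-bit mask still gets the zone flag in this model. Only a mask equal to the full 64-bit value leaves the flags unchanged.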

  10. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    On Mon, 10 Nov 2008 16:06:52 -0600 (CST)
    Christoph Lameter wrote:

    > On Mon, 10 Nov 2008, Yasunori Goto wrote:
    >
    > > Even if a device allows over 4G access and the driver doesn't specify
    > > GFP_DMA, dma_alloc_coherent() returns under 4G area.

    >
    > GFP_DMA can become 0 for configurations that have
    > !CONFIG_ZONE_DMA. Then all of memory is available.
    >
    > The call is subarch specific. So f.e. Altix sn_dma_alloc_coherent does
    > not set __GFP_DMA.


    Is it because it does some kinda address translation
    (provider->dma_map_consistent) later? The zone flag is meaningless if
    you do sorta address translation (e.g. hardware IOMMU like VT-d).

  11. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    > On Mon, 10 Nov 2008 13:47:51 +0900
    > Yasunori Goto wrote:
    >
    > > I have a (may be dumb) question about dma_alloc_coherent() for ia64.
    > >
    > >
    > > Why does dma_alloc_coherent() of ia64 force GFP_DMA yet?
    > > And why is swiotlb_dma_alloc_coherent() default routine of
    > > platform_dma_alloc_coherent()?
    > >
    > > -------
    > > #define dma_alloc_coherent(dev, size, handle, gfp) \
    > > platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA)
    > > --------
    > >
    > > Even if a device allows over 4G access and the driver doesn't specify
    > > GFP_DMA, dma_alloc_coherent() returns under 4G area.
    > > I guess many people think this is not so big issue because drivers require
    > > very small memory generally.
    > >
    > > However, I think this has the possibility of a finishing blow of OOM.
    > > For example,
    > >
    > > 1) Page caches occupy normal zone, and DMA zone is free.
    > > 2) A user's application requires a few GB memory and mlock it.
    > > All DMA zone is occupied by it.
    > > 3) A device which allows over 4GB is hot-added.
    > > But dma_alloc_coherent() try to allocate DMA zone.
    > > Then OOM occurs because there is no freeable pages.
    > >
    > > I heard there are some users who require a few GB mlock.
    > > There are similar trouble in past.
    > >
    > >
    > > If GFP_DMA is removed from above definition of dma_alloc_coherent(),
    > > what will happen?

    >
    > Probably, it breaks swiotlb with devices that don't have
    > DMA_64BIT_MASK coherent_dma_mask.
    >
    >
    > > If it is not allowed, how is followings?
    > >
    > > dma_alloc_coherent()
    > > -> platform_dma_alloc_coherent()
    > > -> normal_alloc_coherent()
    > > {
    > > if (dma_mask allow over 4G)
    > > ret = __get_free_pages();
    > > :
    > > (check validation of returned address)
    > > :
    > > else
    > > swiotlb_alloc_coherent();
    > > }
    > > }
    > >
    > > If I'm something misunderstanding, please let me know.

    >
    > Hmm, platform_dma_alloc_coherent is supposed to handle multiple dma
    > ops, swiotlb, hardware IOMMUs like VT-d and sba, etc?
    >
    > In IA64, the gfp zone flag matters for only swiotlb, I think.
    >
    >
    > =
    > From: FUJITA Tomonori
    > Subject: [PATCH] IA64: use GFP_DMA in dma_alloc_coherent only when necessary
    >
    > For swiotlb, we need to set GFP_DMA if a device doesn't have
    > DMA_64BIT_MASK coherent_dma_mask. hardware IOMMUs like VT-d and sba
    > should ignore gfp zone flag.
    >
    > diff --git a/arch/ia64/include/asm/dma-mapping.h b/arch/ia64/include/asm/dma-mapping.h
    > index bbab7e2..d4de41b 100644
    > --- a/arch/ia64/include/asm/dma-mapping.h
    > +++ b/arch/ia64/include/asm/dma-mapping.h
    > @@ -52,7 +52,9 @@ extern struct ia64_machine_vector ia64_mv;
    > extern void set_iommu_machvec(void);
    >
    > #define dma_alloc_coherent(dev, size, handle, gfp) \
    > - platform_dma_alloc_coherent(dev, size, handle, (gfp) | GFP_DMA)
    > + platform_dma_alloc_coherent(dev, size, handle, \
    > + (dev)->coherent_dma_mask != DMA_64BIT_MASK ? \
    > + (gfp) | GFP_DMA : gfp)
    >
    > /* coherent mem. is cheap */
    > static inline void *


    Wao! Seems reasonable! Ack Ack!

    Acked-by: Yasunori Goto

    Thanks a lot!

    --
    Yasunori Goto



  12. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    On Tue, 11 Nov 2008, Yasunori Goto wrote:

    > But our box supports both of 32bit I/O and 64bit I/O without IOMMU.
    > Is it abnormal platform? New interface is necessary for our box like Altix?


    No, it's like x86 with the GFP_DMA zone for < 16M addresses. The special
    memory creates an imbalance that sometimes leads to weird VM behavior. I'd
    make sure to set GFP_DMA only for devices that actually require < 4GB
    memory, and only use it if no IOMMU-like stuff is available. It's best
    not to use GFP_DMA.



  13. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    On Tue, 11 Nov 2008, FUJITA Tomonori wrote:

    > Is it because it does some kinda address translation
    > (provider->dma_map_consistent) later? The zone flag is meaningless if
    > you do sorta address translation (e.g. hardware IOMMU like VT-d).


    Yes, it can do address translation. Therefore a < 4G address can show up at
    any 64-bit address, so there is no need for a special DMA zone. The same is
    true for the x86_64 platforms that have an IOMMU.
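
Why translation removes the need for a low zone can be seen in a toy model: the device only ever sees a bus address below 4 GB, while the page behind it may live anywhere in physical memory. Everything in this sketch is hypothetical (`struct toy_iommu`, the slot count, the 4 KB slot size and the bus window base); real IOMMU drivers are far more involved.

```c
#include <assert.h>
#include <stdint.h>

#define IOMMU_SLOTS 4
#define SLOT_SIZE   0x1000ULL      /* one 4 KB page per slot (assumption) */
#define BUS_BASE    0x80000000ULL  /* bus window below 4 GB (assumption) */

/* Hypothetical translation table: slot n maps bus page n to phys[n]. */
struct toy_iommu {
    uint64_t phys[IOMMU_SLOTS];
    unsigned int used;
};

/* Map a page-aligned physical address; returns the bus address, 0 if full. */
static uint64_t toy_iommu_map(struct toy_iommu *mmu, uint64_t phys)
{
    uint64_t bus;

    assert((phys % SLOT_SIZE) == 0);
    if (mmu->used == IOMMU_SLOTS)
        return 0;
    mmu->phys[mmu->used] = phys;
    bus = BUS_BASE + SLOT_SIZE * mmu->used;
    mmu->used++;
    return bus;
}

/* What the hardware does on a device access: bus address -> physical. */
static uint64_t toy_iommu_translate(const struct toy_iommu *mmu, uint64_t bus)
{
    uint64_t slot = (bus - BUS_BASE) / SLOT_SIZE;

    assert(bus >= BUS_BASE && slot < mmu->used);
    return mmu->phys[slot] + (bus - BUS_BASE) % SLOT_SIZE;
}
```

Mapping a page at physical address 0x200000000 (8 GB) yields a bus address inside the window below 4 GB, so a 32-bit device can reach it without the page having come from a low zone.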



  14. Re: [Q] Why does dma_alloc_coherent() of ia64 GFP_DMA?

    On Tue, 11 Nov 2008 14:32:46 -0600 (CST)
    Christoph Lameter wrote:

    > On Tue, 11 Nov 2008, FUJITA Tomonori wrote:
    >
    > > Is it because it does some kinda address translation
    > > (provider->dma_map_consistent) later? The zone flag is meaningless if
    > > you do sorta address translation (e.g. hardware IOMMU like VT-d).

    >
    > Yes it can do address translation. Therefore a < 4G address can show up at
    > any 64 bit address. So no need for a special DMA zone. The same is true
    > for more x86_64 platforms that have an IOMMU.


    Yes, with address translation hardware such as an IOMMU, the zone is
    meaningless. The IOMMU drivers ignore the zone flag
    (e.g. intel_alloc_coherent).

    But the GFP_DMA in IA64's platform_dma_alloc_coherent() is still
    necessary for swiotlb with devices whose coherent_dma_mask is not
    DMA_64BIT_MASK. They need a < 4G address.

    This is exactly what x86 and x86_64 do; see dma_alloc_coherent in
    arch/x86/include/asm/dma-mapping.h. It sets GFP_DMA and GFP_DMA32 for
    swiotlb and pci-nommu.c when necessary.