Apache 2 + perl UTF-8 problem - modperl



  1. Apache 2 + perl UTF-8 problem

    Hi.

    I apologise if this is not really a mod_perl problem, but this list
    might be my best chance to find the competences required for some tips.

    Platform : SunOS 5.8 (Solaris 8)
    Apache : Apache/2.0.52
    Perl : v5.8.5 built for sun4-solaris
    CGI.pm : 3.37

    I have a perl cgi-bin script which handles a POST from a form, using
    CGI.pm to retrieve the POSTed values via param().

    In this form, a Java applet picks up the values of some input fields
    from the <form>, and sends them as a POST to the cgi-bin script, along
    with some other parameters (values) created in the applet itself.
    The html form is UTF-8 encoded, and has (in addition) a meta tag which
    says so. The browser "knows" it is UTF-8 (verified).
    The <form> itself has an accept-charset=utf-8 attribute.
    Among the values sent by the applet in the POST are some path names
    of files on the workstation (Windows).

    The html contains a special field for testing, whose value is "Üñicôdé"
    encoded as UTF-8, which allows me, in the form-handling cgi-bin script,
    to really check what I am receiving, in terms of charset.
    (And I hope you see this correctly in this email, otherwise the
    comprehension may suffer; it is the word Unicode, but modified so that
    some characters would be "accented" and thus fall in a 2-byte-per-char
    range).

    In the cgi-bin script, I expected to receive a string marked by perl as
    utf-8, for which length(string) would be 7 and the "utf8 flag" would
    be on.
    It isn't so. I appear to receive a string not marked as utf8, whose
    length tests as 11 (the number of bytes of the UTF-8 encoding).
    I can properly Encode::decode it from UTF-8 though, and then it matches
    what is sent by the form.
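
    The 11-versus-7 observation above can be reproduced outside the web
    server. The following is a minimal sketch, assuming only the core Encode
    module; the variable names are made up for illustration, and the test
    word is written as byte escapes so that the encoding of the script file
    does not matter.

    ```perl
    use strict;
    use warnings;
    use Encode qw(decode);

    # The raw bytes a script would see for the test field: "Üñicôdé"
    # encoded as UTF-8 (4 two-byte characters plus 3 ASCII characters).
    my $raw = "\xC3\x9C\xC3\xB1ic\xC3\xB4d\xC3\xA9";

    # Undecoded: perl treats every byte as one character.
    my $raw_len  = length $raw;
    my $raw_flag = utf8::is_utf8($raw) ? 1 : 0;

    # Decoded: perl now holds characters and sets the internal utf8 flag.
    # FB_CROAK makes decode() die on malformed input instead of guessing;
    # LEAVE_SRC keeps $raw unmodified.
    my $text      = decode('UTF-8', $raw, Encode::FB_CROAK | Encode::LEAVE_SRC);
    my $text_len  = length $text;
    my $text_flag = utf8::is_utf8($text) ? 1 : 0;

    printf "raw:     length=%d utf8-flag=%d\n", $raw_len,  $raw_flag;
    printf "decoded: length=%d utf8-flag=%d\n", $text_len, $text_flag;
    ```

    The raw string reports length 11 with the flag off, and the decoded
    string reports length 7 with the flag on, which matches the behaviour
    described for the value returned by param().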

    Still following ?

    Now, the first thing I would like to understand is why this is so.
    Since this is a POST, and since the browser knows that "everything" is
    UTF-8, I would expect it to send the proper multipart POST, with each
    item marked as UTF-8. So why does my cgi-bin script not see it as such ?

    The second part : some of the POSTed values sent by the applet are "file
    upload" objects, which include a path. This path also can contain
    accented characters (for a German filename e.g.).
    When that is the case, then it seems that what I am getting in the
    cgi-bin script is (for the file path) a string where accented characters
    have been replaced by question marks "?". This path string is also not
    marked utf-8 for perl.


    The last question is that I have configured Apache with the following
    run-time directives :
    ScriptLog /var/something/log/scripts.log
    ScriptLogBuffer 32765
    (the real POST is very small)

    The directory is writeable by the user-id under which Apache is running,
    but there is no log to be found. If I create the file in advance with
    proper user and permission, the file stays desperately empty.
    (I am trying to do that to see the content of the real POST, before
    CGI.pm grabs it.)
    Does anyone have an idea why this log does not show up ?


    Does anyone have any idea that may help me on any of the above, and in
    the general search of the truth ?


    Thanks in advance,
    André


  2. Re: Apache 2 + perl UTF-8 problem

    On Sun 22 Jun 2008, André Warnier wrote:
    > Now, the first thing I would like to understand is why this is so.
    > Since this is a POST, and since the browser knows that "everything" is
    > UTF-8, I would expect it to send the proper multipart POST, with each
    > item marked as UTF-8. So why does my cgi-bin script not see it as such ?


    Yes, that is the current state. Neither CGI nor libapreq2 does that conversion
    for you, afaik. You have to do it yourself.

    > (I am trying to do that to see the content of the real POST, before
    > CGI.pm grabs it.)


    You could write a small mod_perl input filter. That's not complicated in your
    case. With luck one of the examples on perl.apache.org does fit your needs.

    Torsten

    --
    Need professional mod_perl support?
    Just hire me: torsten.foertsch@gmx.net


  3. Re: Apache 2 + perl UTF-8 problem

    André Warnier wrote:
    > Hi.
    >
    > I apologise if this is not really a mod_perl problem, but this list
    > might be my best chance to find the competences required for some tips.
    >
    > Platform : SunOS 5.8 (Solaris 8)
    > Apache : Apache/2.0.52
    > Perl : v5.8.5 built for sun4-solaris
    > CGI.pm : 3.37


    That version of CGI.pm has support for what you need:

    use CGI qw( -utf8 );

    Although the documentation warns it will interfere with file uploads.

    As an alternative, below is a customization I've been using that tries to keep
    file uploads intact. It's been running live for almost 3 years now. The code
    looks pretty similar to what's in CGI 3.37, so maybe that warning is just FUD.
    I suggest you test either solution before believing me ;-)

    Usage: Add it to your startup.pl, or add a "use CGI::as_utf;". It assumes you
    always use the object interface.

    Rhesa


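    package CGI::as_utf;

    BEGIN
    {
        use strict;
        use warnings;
        use CGI;
        use Encode;

        {
            no warnings 'redefine';
            # Keep a reference to the original CGI::param before overriding it.
            my $param_org = \&CGI::param;

            # Decode a value as UTF-8 unless it is empty or an upload filehandle;
            # fall back to the raw value if decoding fails.
            my $might_decode = sub {
                my $p = shift;
                return ( !$p || ( ref $p && fileno($p) ) )
                    ? $p
                    : eval { decode_utf8($p) } || $p;
            };

            *CGI::param = sub {
                my $q = $_[0];    # assume object calls always
                my $p = $_[1];

                # Anything other than $q->param($name) goes to the original.
                goto &$param_org if scalar @_ != 2;

                return wantarray
                    ? map { $might_decode->($_) } $q->$param_org($p)
                    : $might_decode->( $q->$param_org($p) );
            };
        }
    }


    1;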


  4. Re: Apache 2 + perl UTF-8 problem


    Torsten Foertsch wrote:
    > On Sun 22 Jun 2008, André Warnier wrote:
    >> Now, the first thing I would like to understand is why this is so.
    >> Since this is a POST, and since the browser knows that "everything" is
    >> UTF-8, I would expect it to send the proper multipart POST, with each
    >> item marked as UTF-8. So why does my cgi-bin script not see it as such ?

    >
    > Yes that is the current state. Neither CGI nor libapreq2 does that conversion
    > for you, afaik. You have to do it yourself.
    >

    Thanks.
    For the moment, I am dealing with CGI.pm, without mod_perl or libapreq2.
    I'll deal with those afterward.

    I see a problem though : as far as I can tell, CGI.pm does not offer any
    way to find out the "charset" header with which each POST parameter was
    sent. Or am I missing something ?

    André
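
    On the per-parameter charset question: in a multipart/form-data body,
    each part carries its own small header block, and a part may include a
    Content-Type header with a charset parameter. The following is a minimal
    sketch, not part of CGI.pm's API, that pulls the charset out of one
    part's header block by hand. The helper name part_charset and the sample
    headers are made up for illustration. Many browsers send no per-part
    charset for plain text fields, in which case the helper returns undef
    and the script has to assume the encoding of the form.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: given the raw header block of one multipart part,
    # return the charset named in its Content-Type header (lower-cased),
    # or undef if there is none. Folded (continued) header lines are not
    # handled in this sketch.
    sub part_charset {
        my ($headers) = @_;
        for my $line (split /\r?\n/, $headers) {
            next unless $line =~ /^Content-Type\s*:/i;
            return lc $1 if $line =~ /;\s*charset\s*=\s*"?([^";\s]+)"?/i;
        }
        return;
    }

    # Made-up header blocks, as they might appear in a POST body.
    my $with_charset = join "\r\n",
        'Content-Disposition: form-data; name="test"',
        'Content-Type: text/plain; charset=UTF-8';
    my $without_charset = 'Content-Disposition: form-data; name="test"';

    my $found   = part_charset($with_charset);
    my $missing = part_charset($without_charset);

    print "with header:    ", ( defined $found   ? $found   : 'none' ), "\n";
    print "without header: ", ( defined $missing ? $missing : 'none' ), "\n";
    ```

    Seeing these headers at all requires access to the raw POST body, for
    example through a log or filter placed in front of the script.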

