offset-based hash table for ASCII data - Unix

Thread: offset-based hash table for ASCII data

  1. offset-based hash table for ASCII data

    I'm looking for an offset-based data structure to hold character data.
    Background: I'm working with an app whose server is written in Java
    while the client is in C. The server needs to package up a bunch of data
    and ship it to the client where it will be traversed, read-only, many
    many times. There's actually a working implementation which sends the
    data in an XML format, but it's too slow and profiling reveals that it
    spends almost all its time in XML parsing, so I need to replace the XML
    with something more natural to the purpose.

    The client starts thousands of child processes which each start by
    parsing the XML document in order to place its data in hash tables,
    after which it is accessed via the hash table only. So the XML is very
    much an intermediate format, and the obvious win would be to have the
    Java server package the data in a form which is directly usable by the
    clients. Presumably the data would arrive from the server and be stored
    in a file: each client would then map the file into its address space
    and treat it as a hash table (or similar data structure) right away.

    The traditional issue with transportable data structures is that since
    the client can't reliably control what address the data is mapped to,
    all addressing must be made relative to the starting point. Does anyone
    know of an implementation of such a format which can be generated in
    Java and consumed in C code?

    TIA,
    RM
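
A minimal sketch of the consumer side described in this post, assuming a POSIX client: the file is mapped read-only wherever the kernel chooses, and every reference inside it is resolved as an offset from the mapping's base. The names `map_image` and `at_offset` are hypothetical and do not come from any existing library.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns the base address, or NULL on
 * failure; the file length is stored in *len_out. */
static const unsigned char *map_image(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return base;
}

/* Turn an offset stored in the image into an address in this process. */
static const void *at_offset(const unsigned char *base, size_t len, size_t off)
{
    assert(off < len);  /* an offset outside the image means a corrupt file */
    return base + off;
}
```

Because nothing in the image depends on the mapping address, every child can call `map_image` independently and share the same physical pages.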

  2. Re: offset-based hash table for ASCII data

    On Apr 15, 8:15 am, Rex Mottram wrote:
    > I'm looking for an offset-based data structure to hold character data.
    > Background: I'm working with an app whose server is written in Java
    > while the client is in C. The server needs to package up a bunch of data
    > and ship it to the client where it will be traversed, read-only, many
    > many times. There's actually a working implementation which sends the
    > data in an XML format, but it's too slow and profiling reveals that it
    > spends almost all its time in XML parsing, so I need to replace the XML
    > with something more natural to the purpose.
    >
    > The client starts thousands of child processes which each start by
    > parsing the XML document in order to place its data in hash tables,
    > after which it is accessed via the hash table only. So the XML is very
    > much an intermediate format, and the obvious win would be to have the
    > Java server package the data in a form which is directly usable by the
    > clients. Presumably the data would arrive from the server and be stored
    > in a file: each client would then map the file into its address space
    > and treat it as a hash table (or similar data structure) right away.
    >
    > The traditional issue with transportable data structures is that since
    > the client can't reliably control what address the data is mapped to,
    > all addressing must be made relative to the starting point. Does anyone
    > know of an implementation of such a format which can be generated in
    > Java and consumed in C code?
    >
    > TIA,
    > RM


    Can the client parse the XML and share it with the child processes
    (thus avoiding having each child parse the XML)?

    Also, if the XML parsing is only done once by each child process, why
    is the parsing performance so important?

  3. Re: offset-based hash table for ASCII data

    shakahshakah@gmail.com wrote:
    > Can the client parse the XML and share it with the child processes
    > (thus avoiding having each child parse the XML)?


    Yes, that's also an option but it doesn't change the underlying
    complication much, while leaving in an extra stage which would be better
    avoided.

    The fundamental need is for an offset-based data structure since the
    child processes can't be guaranteed of being able to map the file at a
    constant location. This would still be a problem between the client and
    its children. So given that someone needs to place the data in an
    offset-based table, it would be far preferable to do the ADT work in
    Java and send it straight to the children. Yes, the server could send
    XML and the client could rework it to a native format and pass it on to
    the children but there's just more moving parts to break that way, not
    to mention more code written in C to core dump. In general I, along with
    I think most people in the situation, prefer to push as much coding as
    possible over to the Java side where you have garbage collection,
    exceptions, and a really nice debugger. Not to mention that the server
    process typically runs on a "server" (bigger, faster) machine.

    > Also, if the XML parsing is only done once by each child process, why
    > is the parsing performance so important?


    The short and somewhat obnoxious answer would be that "why" doesn't
    matter - profiling has revealed that XML is the major cost and profiling
    doesn't lie. The more substantive response is that many of these child
    processes will end up doing either very little or nothing at all;
    basically they start up, parse the XML data, and compare current reality
    against what the server says it should be. In the majority of cases
    current reality is fine, in which case the process can exit immediately.
    Thus the constant factor of XML processing can become the dominant part
    of child performance when multiplied by thousands of them.

    RM

  4. Re: offset-based hash table for ASCII data

    On Apr 15, 9:59 am, Rex Mottram wrote:
    > shakahsha...@gmail.com wrote:
    > > Can the client parse the XML and share it with the child processes
    > > (thus avoiding having each child parse the XML)?

    >
    > Yes, that's also an option but it doesn't change the underlying
    > complication much, while leaving in an extra stage which would be better
    > avoided.
    >
    > The fundamental need is for an offset-based data structure since the
    > child processes can't be guaranteed of being able to map the file at a
    > constant location. This would still be a problem between the client and
    > its children. So given that someone needs to place the data in an
    > offset-based table, it would be far preferable to do the ADT work in
    > Java and send it straight to the children. Yes, the server could send
    > XML and the client could rework it to a native format and pass it on to
    > the children but there's just more moving parts to break that way, not
    > to mention more code written in C to core dump. In general I, along with
    > I think most people in the situation, prefer to push as much coding as
    > possible over to the Java side where you have garbage collection,
    > exceptions, and a really nice debugger. Not to mention that the server
    > process typically runs on a "server" (bigger, faster) machine.
    >
    > > Also, if the XML parsing is only done once by each child process, why
    > > is the parsing performance so important?

    >
    > The short and somewhat obnoxious answer would be that "why" doesn't
    > matter - profiling has revealed that XML is the major cost and profiling
    > doesn't lie. The more substantive response is that many of these child
    > processes will end up doing either very little or nothing at all;
    > basically they start up, parse the XML data, and compare current reality
    > against what the server says it should be. In the majority of cases
    > current reality is fine, in which case the process can exit immediately.
    > Thus the constant factor of XML processing can become the dominant part
    > of child performance when multiplied by thousands of them.
    >
    > RM


    Fair enough.

    I guess I read your OP as describing a client launching many
    long-lived (and busy) children, where the initial XML parse wouldn't
    matter so much.

  5. Re: offset-based hash table for ASCII data

    Rex Mottram writes:

    > I'm working with an app whose server is written in
    > Java while the client is in C.


    > The traditional issue with transportable data structures is that since
    > the client can't reliably control what address the data is mapped to,
    > all addressing must be made relative to the starting point. Does
    > anyone know of an implementation of such a format which can be
    > generated in Java and consumed in C code?


    This is done a lot by Remote Procedure Call systems (though your
    format may be more complex than they can handle) but I would look
    at using some form of RPC, or at least the library that marshals the
    data. Since Sun used RPC a lot for its networking, I'd imagine there is
    good Java support/bindings (but that is just a wild guess).

    If you fancy standards, there is always ASN.1. Old, and it used to
    be thought of as rather "heavy" but that was before everyone started
    using XML for everything.

    --
    Ben.

  6. Re: offset-based hash table for ASCII data

    Maybe a binary XML format would be the simplest way to improve the
    performance. For example, there is wbxml4j for Java and libbxml for C. I
    have never used them, but they seem worth a try.
    If your XML structure is not too complex, you could also implement your
    own simple binary format, so that the clients just read a kind of
    nested struct from shared memory that uses offsets to point to children
    and siblings.
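
One possible shape for such nested structs, as a sketch only. The layout, the field names, and the convention that offset 0 means "no link" are all assumptions made for illustration; the image is assumed to be 4-byte aligned and written in the client's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One node of a tree stored in a flat image. All links are byte offsets
 * from the start of the image; 0 means "no such node", which works
 * because offset 0 is always occupied by the root itself. */
struct node {
    uint32_t name_off;      /* offset of a NUL-terminated name */
    uint32_t first_child;   /* offset of the first child node, or 0 */
    uint32_t next_sibling;  /* offset of the next sibling node, or 0 */
};

static const struct node *node_at(const void *image, uint32_t off)
{
    assert(off % 4 == 0);  /* nodes must stay naturally aligned */
    return (const struct node *)((const unsigned char *)image + off);
}

/* Count the direct children of the node at offset `off`. */
static size_t count_children(const void *image, uint32_t off)
{
    size_t n = 0;
    uint32_t child = node_at(image, off)->first_child;

    while (child != 0) {
        n++;
        child = node_at(image, child)->next_sibling;
    }
    return n;
}
```

The traversal never stores a pointer back into the image, so the same bytes work at whatever address each process maps them.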

  7. Re: offset-based hash table for ASCII data

    Rex Mottram wrote:
    > I'm working with an app whose server is written in Java
    > while the client is in C. The server needs to package up a bunch of data
    > and ship it to the client where it will be traversed, read-only, many
    > many times. There's actually a working implementation which sends the
    > data in an XML format, but it's too slow and profiling reveals that it
    > spends almost all its time in XML parsing, so I need to replace the XML
    > with something more natural to the purpose.


    Have you considered JSON?

    --
    RGB

  8. Re: offset-based hash table for ASCII data

    Ben Bacarisse writes:
    > Rex Mottram writes:
    >> I'm working with an app whose server is written in
    >> Java while the client is in C.

    >
    >> The traditional issue with transportable data structures is that since
    >> the client can't reliably control what address the data is mapped to,
    >> all addressing must be made relative to the starting point. Does
    >> anyone know of an implementation of such a format which can be
    >> generated in Java and consumed in C code?

    >
    > This is done a lot by Remote Procedure Call systems (though your
    > format may be more complex than they can handle) but I would look
    > at using some form of RPC, or at least the library that marshals the
    > data.


    The easy way to solve such a problem is

    a) get over the completely insane idea that designing data
    exchange formats would be something network programmers should
    not care about.

    b) define a (presumably simple) binary message format
    containing the information you want to communicate.

    Regarding b, if you want to make your life easy, avoid numbers not
    composed of an integral number of octets and try to preserve 'natural
    alignment' of the members, ie let four-octet quantities (32-bit
    integers) start on a four-octet boundary relative to the start of the
    message.
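
A small sketch of the alignment rule, assuming the message is built in a zero-filled byte buffer and that multi-octet integers travel most significant octet first (a choice made here for illustration, not something stated in the thread). `align_up` and `put_u32` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round `off` up to the next multiple of `align` (a power of two). */
static size_t align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Store a 32-bit value at the next four-octet boundary at or after `off`,
 * most significant octet first, and return the offset just past it.
 * The caller must make sure the buffer is large enough. */
static size_t put_u32(unsigned char *msg, size_t off, uint32_t v)
{
    off = align_up(off, 4);
    msg[off + 0] = (unsigned char)(v >> 24);
    msg[off + 1] = (unsigned char)(v >> 16);
    msg[off + 2] = (unsigned char)(v >> 8);
    msg[off + 3] = (unsigned char)v;
    return off + 4;
}
```

Keeping every 32-bit field on a four-octet boundary relative to the message start means a suitably aligned buffer can later be read with ordinary loads on the receiving side.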

    For a structure which absolutely must contain a lot of pointers,
    introducing a 'fixup' step could be sensible: Store the offsets of all
    contained pointers in some array, traverse this array and, for each
    offset in it, add the start address of the buffer the message is
    contained in to the value stored at the location start + offset in
    the buffer.
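
The fixup pass could look roughly like this. It is a sketch under several assumptions: the producer writes each pointer slot as a pointer-sized offset in the client's byte order, the fixup table holds 32-bit slot offsets, and the buffer is writable. `apply_fixups` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each entry of `fixups` is the offset of a pointer-sized slot inside the
 * buffer. Before the pass a slot holds an offset from the buffer start;
 * afterwards it holds a real address, valid in this process only. */
static void apply_fixups(unsigned char *start, size_t len,
                         const uint32_t *fixups, size_t nfixups)
{
    for (size_t i = 0; i < nfixups; i++) {
        assert(fixups[i] + sizeof(uintptr_t) <= len);

        uintptr_t value;
        /* memcpy sidesteps alignment and aliasing problems */
        memcpy(&value, start + fixups[i], sizeof value);
        assert(value < len);  /* the stored offset must lie inside the buffer */
        value += (uintptr_t)start;
        memcpy(start + fixups[i], &value, sizeof value);
    }
}
```

Note the trade-off: the pass writes to the buffer, so a file shared read-only between many children cannot be fixed up in place. Each process would need a private copy or a private (copy-on-write) mapping.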

    > If you fancy standards, there is always ASN.1. Old, and it used to
    > be thought of as rather "heavy" but that was before everyone started
    > using XML for everything.


    ASN.1 means 'abstract syntax notation one' and it is (mostly) a
    standardized high-level language for declaring structured data types.
    It is not a definition of an actual 'wire format' of data. That would
    be the purpose of some set of encoding rules for ASN.1, eg BER (basic
    encoding rules) or PER (packed encoding rules). BER encoding is
    already fairly byzantine, and the fact that it transmits an encoded
    length and an encoded type for each 'information quantity' is useless
    for applications where producer and consumer 'know' how the messages
    are supposed to be structured.

    According to one of the people responsible for the PER-rules, these
    were initially conceived by a couple of frustrated people busy with
    getting drunk, with the frustration resulting from the fact that too
    many members of some committee had provided negative, constructive
    feedback regarding their previous attempt at defining an 'improved
    BER'. PER was designed to address this problem and the person I am
    referring to (author of a freely downloadable book on ASN.1)
    recommends that people don't even try to understand the
    specification.

  9. Re: offset-based hash table for ASCII data

    Rex Mottram wrote:
    > I'm looking for an offset-based data structure to hold character data.


    I'm not sure what you mean by "offset based data structure." I think I
    missed that chapter when doing my school work, where the official offset
    based data structure that all languages inter-operate with was described.

    It sounds like you want just plain old binary data. This is a huge
    mistake, imo, since your client and server combination is going to be a
    lot less modifiable than just one system by itself. Better to send the
    XML to the client, parse the whole thing once into some binary format,
    then spawn the child processes. Getting two systems to work together in
    a binary format is going to be very hard to maintain.

    But anyway, look at DataOutputStream for Java. Send the info to a
    network stream, or you can buffer it to memory by hooking the
    DataOutputStream to a ByteArrayOutputStream, and then send the buffer
    at your leisure.
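
For the C side of such a stream: `DataOutputStream.writeInt` emits four bytes, high byte first, and `writeUTF` emits a two-byte big-endian length followed by the string in modified UTF-8, which for plain ASCII without embedded NULs is simply the ASCII bytes. A sketch of a reader under those assumptions follows; the function names are hypothetical, and the caller is assumed to have checked that enough input bytes are available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a value written by DataOutputStream.writeInt (big-endian). */
static int32_t read_java_int(const unsigned char *p)
{
    assert(p != NULL);
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
               | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return (int32_t)v;
}

/* Copy a string written by DataOutputStream.writeUTF into `out` and
 * NUL-terminate it. Returns the number of input bytes consumed, or 0 if
 * `out` is too small. Only correct for ASCII payloads. */
static size_t read_java_utf(const unsigned char *p, char *out, size_t outsz)
{
    assert(p != NULL && out != NULL);
    size_t n = ((size_t)p[0] << 8) | p[1];  /* two-byte length prefix */
    if (n + 1 > outsz)
        return 0;
    memcpy(out, p + 2, n);
    out[n] = '\0';
    return n + 2;
}
```

This still copies every string once, so it is a decoder for a simple binary format, not a structure that can be used in place.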

    Think very hard about how you will modify that binary data in the future
    when you implement this.

    If we knew a little more about what was being parsed here, we might be
    able to help you further. My advice is "don't" but you seem wedded to
    the idea, so hopefully we can save you from disaster. It's still really
    unclear to me how sending data in binary is going to present some huge
    savings on the client. Networks are slow, CPUs are fast. Servers also
    tend to be heavily loaded. Parsing once in C on the client seems like
    the obvious answer and much less headache prone.

    Anyway, good luck.


    > The traditional issue with transportable data structures is that since
    > the client can't reliably control what address the data is mapped to,
    > all addressing must be made relative to the starting point. Does anyone
    > know of an implementation of such a format which can be generated in
    > Java and consumed in C code?


    Re-reading this, I'm still totally unclear on what you are actually
    asking here. You need "addressing?" Are you saying you don't know what
    indirection is in C? Ever hear of a look-up table? How about a hash table?

    That may come off as condescending, but that's what your description
    sounds like to me. Maybe a clearer explanation of how you think the
    data will be parsed/accessed in binary will help. I think part of the
    problem is you are a bit unclear yourself.

  10. Re: offset-based hash table for ASCII data

    Mark Space writes:
    > Rex Mottram wrote:
    >> I'm looking for an offset-based data structure to hold character
    >> data.

    >
    > I'm not sure what you mean by "offset based data structure."


    This fairly obviously means 'replacing pointers by offsets relative to
    the start of the data structure' (because pointer values usually
    cannot even be meaningfully communicated between different processes
    running on the same machine, let alone processes running on different
    machines).

    [...]

    > It sounds like you want just plain old binary data.


    That was indeed what was intended: Use a data format which can be
    'readily prepared' on the server for use as-is, because the overhead
    of parsing an XML-steganogram of a structured message into a
    structured message had been _proven_ to be too expensive.

    > This is a huge mistake, imo, since your client and server
    > combination is going to be a lot less modifiable than just one
    > system by itself.


    At worst, it will need twice the effort for modifying one system.

    > Better to send the XML to the client, parse the whole thing once
    > into some binary format, then spawn the child processes. Getting
    > two systems to work together in a binary format is going to be very
    > hard to maintain.


    Why would it? The code itself isn't structurally different: For the
    most general case, you have a 'native' representation on system #1,
    which is encoded into a transport representation by an encoding
    routine, transmitted to system #2 and then decoded into a native
    representation by a decoding routine.

    [...]

    > If we knew a little more about what was being parsed here, we might be
    > able to help you further. My advice is "don't" but you seem wedded to
    > the idea, so hopefully we can save you from disaster. It's still
    > really unclear to me how sending data in binary is going to present
    > some huge savings on the client.


    Did it occur to you that your lack of understanding in this respect
    means that you are a less-than-ideal person regarding having a
    sensible opinion on the topic?

    The original problem was that _measurements_ had proven that the need
    to parse an XML-representation was too expensive for this particular
    application. Assuming this as true, not parsing the data on the
    client, or at least decoding something significantly less noisy than
    XML (eg a binary interchange format or a simpler text format) is an
    obvious solution. It will reduce the load on the server, too
    (constructing XML is not cheaper than deconstructing XML) and will
    even lead to less bandwidth waste during transmission, assuming the
    'other message format' is designed with a better SDU/PDU ratio than
    XML can provide (which will be easy[*]).

    [*] No, compression is not the solution: It's a resource-intensive
    workaround for the original problem.

  11. Re: offset-based hash table for ASCII data

    On Apr 15, 2:30 pm, Rainer Weikusat wrote:
    >
    > The original problem was that _measurements_ had proven that the need
    > to parse an XML-representation was too expensive for this particular
    > application. ...


    Slight quibble here in that while measurements may have
    shown parsing to be too expensive for a parse-in-every-child
    implementation, the same might not hold
    true in an implementation where parsing is done once on the
    client and then shared with the children (particularly
    if those are child processes in the UNIX sense). The existence
    of "a working implementation which sends the data
    in an XML format" (implying existing C parsing code and internal
    representation) would only make me look further in that direction.
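
A sketch of the parse-once variant, which only applies if the children are plain `fork()` children that do not `exec` a new program: they then inherit the parent's already-parsed structures, pointers included, because the address space is duplicated copy-on-write. `struct table`, `check_nonempty`, and `run_children` are hypothetical stand-ins for the real parsed data and the real per-child work.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for whatever the parent built by parsing the XML once. */
struct table {
    const char *key;
    const char *value;
};

/* Example per-child check: exit status 0 means "nothing to do". */
static int check_nonempty(const struct table *tab)
{
    return (tab->key != NULL && tab->value != NULL) ? 0 : 1;
}

/* Fork up to `nchildren` processes that each run `check` on the inherited
 * table, then reap them. Returns how many children exited with status 0. */
static int run_children(int nchildren, const struct table *tab,
                        int (*check)(const struct table *))
{
    int started = 0, ok = 0;

    assert(nchildren >= 0);
    for (; started < nchildren; started++) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* stop forking, still reap what we started */
        if (pid == 0)
            _exit(check(tab));  /* child: no parsing, pointers still valid */
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

If the children are separate executables started with `exec`, nothing is inherited and the data has to live in a file or shared memory segment, which brings the offset question back.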

  12. Re: offset-based hash table for ASCII data

    Rainer Weikusat wrote:
    > Mark Space writes:
    >> Rex Mottram wrote:
    >>> I'm looking for an offset-based data structure to hold character
    >>> data.

    >> I'm not sure what you mean by "offset based data structure."

    >
    > This fairly obviously means 'replacing pointers by offsets relative to
    > the start of the data structure' (because pointer values usually
    > cannot even be meaningfully communicated between different processes
    > running on the same machine, let alone processes running on different
    > machines).


    Is that all you need? The Java version should be a tad easier, actually.
    Note the last loop uses nothing but offsets from the buffer to print
    out the data.

    If you were hoping for the magic library that does this for you, I
    think it's called "hands on keyboard."


    lut_test filename a 12 longer_string_test C

    Total buffer size: 63
    Number of entries: 5
    String 0: filename
    String 1: a
    String 2: 12
    String 3: longer_string_test
    String 4: C

    /*
     * File: lut_test.c
     *
     * Created on April 15, 2008, 12:43 PM
     */

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Header of the relocatable buffer; the string data follows indexes[]. */
    struct pre_parsed {
        int size;       /* total size of the buffer in bytes */
        int length;     /* number of strings stored */
        int indexes[];  /* byte offset of each string from the buffer start */
    };

    int main(int argc, char** argv) {

        struct pre_parsed *buffer;
        char ** argv_copy = argv;

        if( argc < 2 )
        {
            fprintf( stderr, "Usage: lut_test outfile string_list\n");
            return (EXIT_FAILURE);
        }

        // Add up the space needed for all strings, terminators included
        size_t total_string_size = 0;
        while( *++argv )
        {
            total_string_size += strlen( *argv ) + 1;
        }

        // Build the buffer: header, offset table, then the string data
        size_t table_size = sizeof (struct pre_parsed)
                + sizeof (int) * (argc-1);
        size_t buffer_size = table_size + total_string_size;

        buffer = malloc( buffer_size );
        if( buffer == NULL )
        {
            perror( "malloc" );
            return (EXIT_FAILURE);
        }

        buffer->length = argc - 1;
        buffer->size = (int) buffer_size;
        int index = 0;
        size_t offset = table_size;
        while( *++argv_copy )
        {
            buffer->indexes[index] = (int) offset;
            // Offsets are in bytes, so do the arithmetic on a char pointer;
            // adding to a struct pointer would scale by the struct size
            strcpy( (char*)buffer + offset, *argv_copy );
            offset += strlen( *argv_copy ) + 1;
            index++;
        }

        // Read back from the buffer using nothing but offsets
        printf( "Total buffer size: %d\n", buffer->size );
        printf( "Number of entries: %d\n", buffer->length );
        int i;
        for( i = 0; i < buffer->length; i++ )
        {
            printf( "String %d: %s\n", i,
                    (char*)buffer + buffer->indexes[i] );
        }

        free( buffer );
        return (EXIT_SUCCESS);
    }

  13. Re: offset-based hash table for ASCII data

    On Tue, 15 Apr 2008 08:15:06 -0400, Rex Mottram wrote:

    >I'm looking for an offset-based data structure to hold character data.
    >Background: I'm working with an app whose server is written in Java
    >while the client is in C. The server needs to package up a bunch of data
    >and ship it to the client where it will be traversed, read-only, many
    >many times.

    If it is read-only then why do you need to parse it many times? Parse
    it once at the client end and let all the child processes access the
    result.

    >There's actually a working implementation which sends the
    >data in an XML format, but it's too slow and profiling reveals that it
    >spends almost all its time in XML parsing, so I need to replace the XML
    >with something more natural to the purpose.

    Is that actually CPU use in parsing or in network IO? If IO then
    compression might be faster - probably a long shot.


    >
    >The client starts thousands of child processes which each start by
    >parsing the XML document in order to place its data in hash tables,
    >after which it is accessed via the hash table only. So the XML is very
    >much an intermediate format, and the obvious win would be to have the
    >Java server package the data in a form which is directly usable by the
    >clients. Presumably the data would arrive from the server and be stored
    >in a file: each client would then map the file into its address space
    >and treat it as a hash table (or similar data structure) right away.

    Why are you hashing to memory addresses (or pointers)? Why not hash
    to array indexes which immediately makes the whole structure
    relocatable in memory since everything is indexed from the start of
    the array. You would need fixed-length data for this. Would
    fixed-length data be a problem? A C union could handle otherwise
    different types in a single array.
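
A sketch of hashing to array indexes with fixed-length records. The slot sizes, the FNV-1a hash, and linear probing are choices made here for illustration; the producer would have to fill the array with the same hash function. `table_put` is shown in C only to make the layout concrete, since in the scenario discussed the server would write the array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32
#define VAL_LEN 32

/* Fixed-length slot; an empty slot has key[0] == '\0'. It contains only
 * char arrays, so there are no pointers, padding or byte-order issues. */
struct slot {
    char key[KEY_LEN];
    char value[VAL_LEN];
};

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t hash_key(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Insert with linear probing. Returns 0 on success, -1 if the table is
 * full or a string does not fit in its fixed-length field. */
static int table_put(struct slot *tab, uint32_t nslots,
                     const char *key, const char *value)
{
    assert(nslots > 0);
    if (key[0] == '\0' || strlen(key) >= KEY_LEN || strlen(value) >= VAL_LEN)
        return -1;

    uint32_t i = hash_key(key) % nslots;
    for (uint32_t probes = 0; probes < nslots; probes++) {
        if (tab[i].key[0] == '\0' || strcmp(tab[i].key, key) == 0) {
            strcpy(tab[i].key, key);
            strcpy(tab[i].value, value);
            return 0;
        }
        i = (i + 1) % nslots;
    }
    return -1;
}

/* Look a key up; returns the value or NULL if the key is absent. */
static const char *table_get(const struct slot *tab, uint32_t nslots,
                             const char *key)
{
    assert(nslots > 0);
    uint32_t i = hash_key(key) % nslots;
    for (uint32_t probes = 0; probes < nslots; probes++) {
        if (tab[i].key[0] == '\0')
            return NULL;
        if (strcmp(tab[i].key, key) == 0)
            return tab[i].value;
        i = (i + 1) % nslots;
    }
    return NULL;
}
```

The whole table is one contiguous array, so it can be written to a file byte for byte and used directly after mapping, at the cost of wasted space for short strings and a hard limit on long ones.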

    Would comma separated be simpler to parse than XML? It does not have
    as much overhead and is generally less verbose.

    >
    >The traditional issue with transportable data structures is that since
    >the client can't reliably control what address the data is mapped to,
    >all addressing must be made relative to the starting point. Does anyone
    >know of an implementation of such a format which can be generated in
    >Java and consumed in C code?

    Heap, sorted array + binary search, hash to an array address are all
    possibilities.

    I suspect that the biggest saving would be parse once rather than
    parse many. If you only parse once then you can do a lot of
    complicated setup at the client end and the work is amortised over the
    many children.

    rossum

    >
    >TIA,
    >RM



  14. Re: offset-based hash table for ASCII data

    rossum wrote:

    > Would comma separated be simpler to parse than XML? It does not have
    > as much overhead and is generally less verbose.


    That's actually a pretty good idea. Much simpler and more portable.

  15. Re: offset-based hash table for ASCII data

    Mark Space wrote:
    > Rex Mottram wrote:
    >> I'm looking for an offset-based data structure to hold character data.

    > It sounds like you want just plain old binary data.


    Possibly you missed the phrase "ASCII data" cleverly hidden in the
    subject line, and likewise "character data" in the first sentence? :-)

    Anyway, in response to you and a few other posters, yes, parsing the XML
    once into some other format at the client end would indeed solve the
    problem mentioned. But although you argue that would be more flexible, I
    think it would result in a Rube Goldberg type of system with two
    different representations needing to be kept in sync. Not very robust
    programming practice IMHO.

    Also, again in response to you and a few others, network bandwidth is
    not an issue. I.e. moving the XML document, even if it's large (and it
    often is) is not a significant cost, and for the record it's already
    transported in compressed form. It's the multi-parsing that's a problem.

    I could of course write up my own index-based ADT in C, but would much
    prefer not to. There are a number of free ADT packages out there - I'm
    currently using one called Kazlib which is very nice and I have notes on
    6-10 others which also look good. But all these use pointers, with the
    cited problem.

    There is actually an open source C library which was written to be
    offset-based. I remember seeing it a few years ago. I haven't gone
    digging for it yet because I was hoping to find something I could build
    up from the Java side. Will go see if I can find that next.

    RM

  16. Re: offset-based hash table for ASCII data

    Rex Mottram wrote:

    > I could of course write up my own index-based ADT in C, but would much
    > prefer not to. There are a number of free ADT packages out there - I'm
    > currently using one called Kazlib which is very nice and I have notes on
    > 6-10 others which also look good. But all these use pointers, with the
    > cited problem.



    Ah ha. And specifically you want to serialize and deserialize your
    ADTs. Between C and Java. Nothing like picking the implementation
    first then going looking for solutions.

    What you're asking for is called pointer swizzling, btw, and the general
    case seems to be very hard, which may be why you aren't finding any libraries.

    And no I don't know any Java packages for that. Java programmers seem
    to prefer general solutions that always work (XML) to lightweight
    solutions that may impose some restrictions. Sorry about that.

    If you find anything interesting, let us know.

    You might try some custom de-/serialization on the Java side, so you can
    control the format, and produce something easier to parse in C. Just a
    thought.

  17. Re: offset-based hash table for ASCII data

    Rex Mottram wrote:
    > I'm looking for an offset-based data structure to hold character data.
    > Background: I'm working with an app whose server is written in Java
    > while the client is in C. The server needs to package up a bunch of data
    > and ship it to the client where it will be traversed, read-only, many
    > many times. There's actually a working implementation which sends the
    > data in an XML format, but it's too slow and profiling reveals that it
    > spends almost all its time in XML parsing, so I need to replace the XML
    > with something more natural to the purpose.


    Let me see if I can understand the problem statement. It sounds like
    you do not need a hash table specifically, but what you actually need
    is a way to map string values to other string values. As far as I know,
    you do not say you need to be able to iterate over the keys, although
    it's a common requirement, so you might need it.

    If it is true that that is all you actually need, removing the artificial
    constraint of a hash might open up some implementation possibilities.

    For example, you could send a sorted list of pairs of C strings concatenated
    together, by which I mean 8-bit bytes terminated by null bytes. The client
    side would be very easy to implement:

    * load the entire image (i.e. message) into a contiguous area of memory
    * make a pass through it to count pairs of strings (which is equivalent
    to counting the zero bytes in the block of memory, then dividing by two)
    * allocate an array of pairs of pointers (of type "const char *")
    * make another pass and write the memory address of every string into
    the array of pointers
    * do key lookup via binary search

    Note that this approach does not require encoding any integers or offsets
    at all into the message. It is O(N) to parse it, but that is no worse
    asymptotically than the time required to receive it over the network.
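
The steps listed above can be sketched as follows, assuming the image has the form `key\0value\0key\0value\0...` and that the server sorted the keys in byte order (which matches `strcmp` for plain ASCII). `struct pair`, `index_pairs`, and `lookup` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct pair {
    const char *key;
    const char *value;
};

/* Index the image. Returns a malloc'd array of pairs and stores the pair
 * count in *npairs, or returns NULL on a malformed image. */
static struct pair *index_pairs(const char *image, size_t len, size_t *npairs)
{
    assert(image != NULL && npairs != NULL);

    size_t zeros = 0;
    for (size_t i = 0; i < len; i++)   /* pass 1: count the strings */
        if (image[i] == '\0')
            zeros++;
    if (len == 0 || image[len - 1] != '\0' || zeros % 2 != 0)
        return NULL;

    struct pair *pairs = malloc((zeros / 2) * sizeof *pairs);
    if (pairs == NULL)
        return NULL;

    const char *p = image;             /* pass 2: record the addresses */
    for (size_t i = 0; i < zeros / 2; i++) {
        pairs[i].key = p;
        p += strlen(p) + 1;
        pairs[i].value = p;
        p += strlen(p) + 1;
    }
    *npairs = zeros / 2;
    return pairs;
}

static int cmp_pair(const void *a, const void *b)
{
    return strcmp(((const struct pair *)a)->key,
                  ((const struct pair *)b)->key);
}

/* Binary search for `key`; returns its value or NULL if absent. */
static const char *lookup(const struct pair *pairs, size_t npairs,
                          const char *key)
{
    struct pair probe = { key, NULL };
    const struct pair *hit =
        bsearch(&probe, pairs, npairs, sizeof *pairs, cmp_pair);
    return hit ? hit->value : NULL;
}
```

The pointer array is private to each process, but it is cheap to build compared with XML parsing, and the string data itself can stay in a shared read-only mapping.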

    Another alternative would be to build a trie on the server side, where
    states in the trie's state machine (I think of a trie as a finite state
    machine that recognizes strings) are represented by byte offsets from the
    beginning of the image. The client could then do lookups on the structure
    directly (except that to move to the next state, it would have to add an
    offset, but that is no big deal).
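
One way the offset-based trie might be laid out, as a sketch. The node format is invented for illustration: 32-bit fields in the client's byte order, the root at offset 0, and `value_off == 0` meaning "this state is not a key". A real format would probably pack edges more tightly and search them faster than linearly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct trie_edge {
    uint32_t ch;      /* input character that selects this edge */
    uint32_t target;  /* byte offset of the next state in the image */
};

struct trie_node {
    uint32_t value_off;  /* offset of the value string, 0 if not a key */
    uint32_t nedges;     /* number of outgoing edges */
    struct trie_edge edges[];
};

/* Walk the trie in place; returns the value for `key` or NULL. */
static const char *trie_lookup(const void *image, const char *key)
{
    assert(image != NULL && key != NULL);

    const unsigned char *base = image;
    const struct trie_node *n = (const struct trie_node *)base;

    for (; *key != '\0'; key++) {
        const struct trie_node *next = NULL;

        for (uint32_t i = 0; i < n->nedges; i++) {
            if (n->edges[i].ch == (unsigned char)*key) {
                /* moving to the next state is just base + offset */
                next = (const struct trie_node *)(base + n->edges[i].target);
                break;
            }
        }
        if (next == NULL)
            return NULL;
        n = next;
    }
    return n->value_off ? (const char *)(base + n->value_off) : NULL;
}
```

Nothing is allocated or rewritten on the client, so the lookup works directly on a shared read-only mapping.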

    - Logan


  18. Re: offset-based hash table for ASCII data

    Mark Space wrote:

    > You might try some custom de-/serialization on the Java side, so you can
    > control the format, and produce something easier to parse in C. Just a
    > thought.


    Oh, and the hash table itself in a Java hash map isn't serializable,
    iirc. So if you're expecting to send the hash table over directly,
    rather than just the objects it contains, you'll have to roll your own
    implementation or do something similar.

  19. Re: offset-based hash table for ASCII data

    Mark Space wrote:
    > Mark Space wrote:
    >
    >> You might try some custom de-/serialization on the Java side, so you
    >> can control the format, and produce something easier to parse in C.
    >> Just a thought.

    >
    > Oh, and the hash table itself in a Java hash map isn't serializable, iirc.


    You don't.

    public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

    > So if you're expecting to send the hash table over directly,
    > rather than just the objects it contains, you'll have to roll your own
    > implementation or do something similar.


    Or just use the HashMap built-in implementation of Serializable.

    Every listed subclass of AbstractMap except WeakHashMap is Serializable.

    --
    Lew

  20. Re: offset-based hash table for ASCII data

    Rex Mottram writes:
    > Mark Space wrote:
    >> Rex Mottram wrote:
    >>> I'm looking for an offset-based data structure to hold character
    >>> data.

    >> It sounds like you want just plain old binary data.

    >
    > Possibly you missed the phrase "ASCII data" cleverly hidden in the
    > subject line, and likewise "character data" in the first sentence?
    > :-)


    Possibly, you missed the 'hash table' cleverly hidden in the subject
    line. This means 'plain old binary data', ie a table of pointers.

    > Anyway, in response to you and a few other posters, yes, parsing the
    > XML once into some other format at the client end would indeed solve
    > the problem mentioned. But although you argue that would be more
    > flexible, I think it would result in a Rube Goldberg type of system
    > with two different representations needing to be kept in sync.


    If it would, then this would be the type of system you already have,
    because you already have one representation on the server, a transport
    representation and a representation on the client. That you
    (presently) construct identical client representations x times in
    parallel and manage to overwhelm the 'client machine' with this
    nonsense doesn't change the structure of the application, just its
    "simple-mindedness". Parsing the XML-steganogram once and re-using it
    x - 1 times would be by far the simplest solution from a coding
    standpoint (as someone else already wrote).

    > Also, again in response to you and a few others, network bandwidth is
    > not an issue. I.e. moving the XML document, even if it's large (and it
    > often is) is not a significant cost,


    .... because, as you need to understand, this is MY PRIVATE GIG-E LAN
    which has been provided solely to transport MY XML (and it has excess
    bandwidth for that) ...

    It is not possible to assess the significance of the 'cost' of a
    particular communication in isolation. Networks are usually used by
    multiple (and often, many) computers for multiple (often many)
    applications in parallel.

    > and for the record it's already transported in compressed form.


    If network usage is not of concern, why do you bother to compress the
    XML? Not doing so would be another easy way to make the processing on
    both client and server much simpler.

    > It's the multi-parsing that's a problem.


    Well, then don't.
