Reading sequential files in COBOL vs SQL - IBM AS400



Thread: Reading sequential files in COBOL vs SQL

  1. Reading sequential files in COBOL vs SQL


    Hi everyone,

    I'm writing an OPM COBOL program that needs to process a large number
    of sequential files (around 4,000 records). It's a parsing algorithm
    that needs to find a few records at an unpredictable location in the
    file. Obviously in COBOL I'd have to allow for reading a high
    percentage of the records in each of the files.

    Alternatively, I could embed the SQL code inside the COBOL.

    Which would be faster? Is the answer obvious, or should I test it both
    ways? Would the relative speed depend on other factors, e.g. size of
    machine, memory, disk, and OS/400 version? The program will be run
    on many different machines.

    Thanks

    Elliot


  2. Re: Reading sequential files in COBOL vs SQL


    "Elliot" wrote in message news:1188399391.593464.227160@r34g2000hsd.googlegr oups.com...
    |
    | Hi everyone,
    |
    | I'm writing an OPM COBOL program that needs to process a large number
    | of sequential files (around 4,000 records). It's a parsing algorithm
    | that needs to find a few records at an unpredictable location in the
    | file. Obviously in COBOL I'd having to allow for reading a high
    | percentage of the records in each of the files.
    |
    | Alternativey I could embed the SQL.code inside of the COBOL.
    |
    | Which would be faster? Is the answer obvious or should I test it both
    | ways. Would the relative speed depend on other factors, ie size of
    | computer, memory, hard disk and OS400 version? The program will be run
    | on many different machines.
    |
    | Thanks
    |
    | Elliot
    |


    Elliot,

    I think any experiments on any hardware / software configuration will yield results like those I predict below.

    If the files are already in the database and have an index that would allow keyed retrieval of the records you seek, then SQL would
    be much faster than a COBOL or RPG sequential search.

    If the data is already in database files but there are no useful indexes, an SQL search might be a little faster, because it uses
    lower-level file access code than COBOL sequential processing, but it will still have to read the whole file to find all the records.

    If the files are in the IFS and would need to be copied into a database for an SQL search, then a sequential search of them where
    they sit will be about twice as fast as a copy to DB2 followed by a search.

    I hope this helps,

    Mike Sicilian




  3. Re: Reading sequential files in COBOL vs SQL


    "Mike" wrote in message news:UygBi.30925$RX.30834@newssvr11.news.prodigy.n et...
    |
    | "Elliot" wrote in message news:1188399391.593464.227160@r34g2000hsd.googlegr oups.com...
    ||
    || Hi everyone,
    ||
    || I'm writing an OPM COBOL program that needs to process a large number
    || of sequential files (around 4,000 records). It's a parsing algorithm
    || that needs to find a few records at an unpredictable location in the
    || file. Obviously in COBOL I'd having to allow for reading a high
    || percentage of the records in each of the files.
    ||
    || Alternativey I could embed the SQL.code inside of the COBOL.
    ||
    || Which would be faster? Is the answer obvious or should I test it both
    || ways. Would the relative speed depend on other factors, ie size of
    || computer, memory, hard disk and OS400 version? The program will be run
    || on many different machines.
    ||
    || Thanks
    ||
    || Elliot
    ||
    |
    |
    | Elliot,
    |
    | I think any experiments on any hardware / software configuration will yield results like those I predict below.
    |
    | If the files are already in the database and have an index that would allow keyed retrieval of the records you seek then SQL would
    | be much faster than a COBOL or RPG sequential search.
    |
    | If the data is already in database files but there are no useful indexes, an SQL search might be a little faster because it uses
    | lower level file access code than COBOL sequential processing but it will still have to read the whole file to find all the
    records.
    |
    | If the files are in the IFS and would need to be copied into a database for an SQL search, then a sequential search of them where
    | they sit will be about twice as fast as a copy to DB2 followed by a search.
    |
    | I hope this helps,
    |
    | Mike Sicilian

    Elliot,

    One additional thought. If your program can quit the search as soon as you find the record you are looking for, then a COBOL
    sequential search will be almost twice as fast as SQL because on average you will quit after reading half the file and SQL will
    always read the whole file.
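
    In native COBOL that early exit is just a read loop with a found flag. A minimal sketch (the file INFILE, its record layout, and the target key value are all made up for illustration):

    ```cobol
           IDENTIFICATION DIVISION.
           PROGRAM-ID. FINDREC.
           ENVIRONMENT DIVISION.
           INPUT-OUTPUT SECTION.
           FILE-CONTROL.
               SELECT INFILE ASSIGN TO "INFILE"
                   ORGANIZATION IS SEQUENTIAL.
           DATA DIVISION.
           FILE SECTION.
           FD  INFILE.
           01  IN-REC.
               05  IN-KEY      PIC X(10).
               05  IN-DATA     PIC X(70).
           WORKING-STORAGE SECTION.
           01  WS-FLAGS.
               05  WS-EOF      PIC X VALUE 'N'.
               05  WS-FOUND    PIC X VALUE 'N'.
           01  WS-TARGET-KEY   PIC X(10) VALUE "ABC1234567".
           PROCEDURE DIVISION.
          *    Read sequentially and stop as soon as the wanted record
          *    turns up, so on average only about half the file is read.
               OPEN INPUT INFILE.
               PERFORM UNTIL WS-FOUND = 'Y' OR WS-EOF = 'Y'
                   READ INFILE
                       AT END MOVE 'Y' TO WS-EOF
                       NOT AT END
                           IF IN-KEY = WS-TARGET-KEY
                               MOVE 'Y' TO WS-FOUND
                           END-IF
                   END-READ
               END-PERFORM.
               CLOSE INFILE.
               STOP RUN.
    ```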

    Mike Sicilian





  4. Re: Reading sequential files in COBOL vs SQL

    "Mike" writes:

    > sequential search will be almost twice as fast as SQL because on
    > average you will quit after reading half the file and SQL will
    > always read the whole file.


    Some databases allow a clause stating how many results you need, so
    that it can terminate when those have been generated.

    Would DB2/400 have this?
    --
    Thorbjørn Ravn Andersen

  5. Re: Reading sequential files in COBOL vs SQL

    Mike wrote:
    > [earlier quoting snipped]
    >
    > Elliot,
    >
    > One additional thought. If your program can quit the search as soon as you find the record you are looking for, then a COBOL
    > sequential search will be almost twice as fast as SQL because on average you will quit after reading half the file and SQL will
    > always read the whole file.
    >
    > Mike Sicilian


    Why will SQL "always read the whole file"? If the program declares an
    SQL cursor and does FETCH requests (for example), it should be possible
    to quit before fetching all rows.

    The OP's problem details are a little sketchy, but the best alternative
    may depend on whether SQL can be used to some advantage. If the records
    of interest are "at unpredictable locations" in the file, such that by
    definition rows must be read sequentially from beginning of file as
    input to the parsing algorithm, there may not be much advantage to
    fetching rows via SQL vs native I/O. If OTOH part or all of the
    algorithm could be implemented by SQL (eg the WHERE clause of a SELECT),
    it may be more efficient because SQL could filter out extraneous
    rows/columns and only return pertinent data to the program.

    If the data resides in keyed physical files, or in PFs having associated
    keyed logical files, keyed access may be possible using native I/O (vs
    sequential reads).

    Re SQL clauses to specify needed results, these might apply:
    FETCH FIRST n ROWS
    OPTIMIZE FOR n ROWS
    http://publib.boulder.ibm.com/infoce...zmsth2clcu.htm
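
    As a sketch of how those clauses might look from embedded SQL in COBOL (the table MYLIB/MYFILE, columns REC_TYPE and REC_DATA, and host variable WS-REC-DATA are all hypothetical names):

    ```cobol
          *    FETCH FIRST n ROWS ONLY lets the engine stop once enough
          *    rows are found; OPTIMIZE FOR n ROWS hints the optimizer
          *    that only a few rows will actually be fetched.
               EXEC SQL
                   DECLARE C1 CURSOR FOR
                   SELECT REC_DATA
                     FROM MYLIB/MYFILE
                    WHERE REC_TYPE = 'X'
                    FETCH FIRST 5 ROWS ONLY
                    OPTIMIZE FOR 5 ROWS
               END-EXEC.
               EXEC SQL OPEN C1 END-EXEC.
               PERFORM UNTIL SQLCODE NOT = 0
                   EXEC SQL FETCH C1 INTO :WS-REC-DATA END-EXEC
               END-PERFORM.
               EXEC SQL CLOSE C1 END-EXEC.
    ```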

    --
    Karl Hanson


  6. Re: Reading sequential files in COBOL vs SQL


    "Karl Hanson" wrote in message news:46d5fa6a$1@kcnews01...
    | Mike wrote:
    | > "Mike" wrote in message news:UygBi.30925$RX.30834@newssvr11.news.prodigy.n et...
    | > |
    | > | "Elliot" wrote in message news:1188399391.593464.227160@r34g2000hsd.googlegr oups.com...
    | > ||
    | > || Hi everyone,
    | > ||
    | > || I'm writing an OPM COBOL program that needs to process a large number
    | > || of sequential files (around 4,000 records). It's a parsing algorithm
    | > || that needs to find a few records at an unpredictable location in the
    | > || file. Obviously in COBOL I'd having to allow for reading a high
    | > || percentage of the records in each of the files.
    | > ||
    | > || Alternativey I could embed the SQL.code inside of the COBOL.
    | > ||
    | > || Which would be faster? Is the answer obvious or should I test it both
    | > || ways. Would the relative speed depend on other factors, ie size of
    | > || computer, memory, hard disk and OS400 version? The program will be run
    | > || on many different machines.
    | > ||
    | > |
    | > | Elliot,
    | > |
    | > | I think any experiments on any hardware / software configuration will yield results like those I predict below.
    | > |
    | > | If the files are already in the database and have an index that would allow keyed retrieval of the records you seek then SQL
    would
    | > | be much faster than a COBOL or RPG sequential search.
    | > |
    | > | If the data is already in database files but there are no useful indexes, an SQL search might be a little faster because it
    uses
    | > | lower level file access code than COBOL sequential processing but it will still have to read the whole file to find all the
    | > records.
    | > |
    | > | If the files are in the IFS and would need to be copied into a database for an SQL search, then a sequential search of them
    where
    | > | they sit will be about twice as fast as a copy to DB2 followed by a search.
    | > |
    | > | I hope this helps,
    | > |
    | > | Mike Sicilian
    | >
    | > Elliot,
    | >
    | > One additional thought. If your program can quit the search as soon as you find the record you are looking for, then a COBOL
    | > sequential search will be almost twice as fast as SQL because on average you will quit after reading half the file and SQL will
    | > always read the whole file.
    | >
    | > Mike Sicilian
    | >
    |
    | Why will SQL "always read the whole file"? If the program declares an
    | SQL cursor and does FETCH requests (for example), it should be possible
    | to quit before fetching all rows.
    |
    | The OP's problem details are a little sketchy, but the best alternative
    | may depend on whether SQL can be used to some advantage. If the records
    | of interest are "at unpredictable locations" in the file, such that by
    | definition rows must be read sequentially from beginning of file as
    | input to the parsing algorithm, there may not be much advantage to
    | fetching rows via SQL vs native I/O. If OTOH part or all of the
    | algorithm could be implemented by SQL (eg the WHERE clause of a SELECT),
    | it may be more efficient because SQL could filter out extraneous
    | rows/columns and only return pertinent data to the program.
    |
    | If the data resides in keyed physical files, or in PFs having associated
    | keyed logical files, keyed access may be possible using native I/O (vs
    | sequential reads).
    |
    | Re SQL clauses to specify needed results, these might apply:
    | FETCH FIRST n ROWS
    | OPTIMIZE FOR n ROWS
    | http://publib.boulder.ibm.com/infoce...zmsth2clcu.htm
    |
    | --
    | Karl Hanson
    |

    Karl.

    It appears I was wrong and I apologize to you and the group and thank you for helping me understand what really happens.

    Based on somewhat old information from when I looked into a similar question, I had the impression that on the System i, when you
    execute a SELECT statement and there is no suitable index, the system builds an index, which involves processing the whole file, and
    then the index is read to satisfy the SELECT search criteria. I think this is / was done on the assumption that further SELECT
    requests will be made, which makes it an efficient approach.

    However, I agree that your reference to the OPTIMIZE clause implies that the optimizer may be smart enough to handle this special
    case. Does anyone know if it is really that smart?

    Thanks for waking me up to this possibility.

    Mike Sicilian











  7. Re: Reading sequential files in COBOL vs SQL

    And SQL can also take advantage of the parallel read algorithms
    built into the DB2 for i5/OS query engine to obtain results
    more quickly, whereas in the non-SQL case any parallel read actions
    would have to be coded into the application itself. So just as it is
    generally considered easier to code the SQL request, it is also
    easier for SQL to take advantage of the optimizer and underlying
    query features.
    If the files are all small [it is unclear whether the 4,000 records
    figure describes each file, or whether there are 4,000 files each
    holding an unknown number of records], then the SETOBJACC command may
    prove beneficial for either record-level or SQL access.
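
    For reference, a minimal example of that CL command (library, file, and pool names are placeholders) to pin a file's data into a shared memory pool:

    ```
    SETOBJACC OBJ(MYLIB/MYFILE) OBJTYPE(*FILE) POOL(*SHRPOOL1)
    ```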

    Regards, Chuck
    --
    All comments provided "as is" with no warranties of any kind
    whatsoever and may not represent positions, strategies, nor views of my
    employer
