Literate Programming under CP/M - CP/M



Thread: Literate Programming under CP/M

  1. Literate Programming under CP/M

    LITPROG.WS4 by Emmanuel ROCHE
    -----------

    Several months ago, I published a message on the comp.os.cpm
    Newsgroup explaining what "Literate Programming" is.

    I have finally been able to implement this concept under CP/M.

    As far as I know, this is the first time that this has been done...

    Until now, Literate Programming was the domain of "Mainframes".

    This concept was invented by one Donald Knuth, well known in the
    USA.

    I hope that this CP/M implementation will generate as much
    development as the mainframe one.

    Don Knuth dixit (1984): "WEB itself is chiefly a combination of
    two other languages: (1) a document formatting language and (2)
    a programming language. My prototype WEB system uses TEX as the
    document formatting language and Pascal as the programming
    language, but the same principles would apply equally well if
    other languages were substituted. Instead of TEX, one could use
    a language like Scribe or Troff; instead of Pascal, one could
    use ADA, ALGOL, LISP, COBOL, FORTRAN, APL, C, etc., or even
    assembly language."

    The fundamental idea of Literate Programming is that programs
    should be generated from one single source file. Here,
    "programs" mean all the files that have been proved to be
    necessary to write, debug, and maintain a program: its source
    code in a particular programming language, its user's manual,
    and its programmer's manual.

    Instead of having at least 3 different files, with the inherent
    difficulties in synchronizing them, there should be only one
    source file, with a way to generate (at least) a document
    explaining the program, and the file used to generate the
    executable with a compiler or assembler.

    Above, Knuth mentions Scribe and Troff, which are "document
    formatting languages". Before formatting, we must create the
    "document". Under CP/M, the standard word processor used to
    create "documents" (note that Micro**** kept using the DOC
    filetype, after leaving CP/M) is WordStar. At the beginning, WS
    was a screen-oriented version of ED. Like ED, it relied on
    (Digital Research) TEX to format its files for the printer. It
    did this through "dot commands" placed at the beginning of
    separate lines, which were commands to TEX, not to ED.

    When WordStar met with success, the dot commands of TEX were
    incorporated into it, along with many new commands, in
    particular for "merge printing" (which generates several files
    from one "document" file and a data file -- like an address
    book). The merge printing capacity of WordStar is amazing, and
    is more powerful than a Tiny BASIC! But I decided to keep things
    simple, and so not to use it.

    Regarding programming languages, Don Knuth does not mention
    BASIC but, under CP/M and microcomputers, BASIC is the most
    often used language, by far.

    So, since I want to keep using CP/M(-86) Plus, I decided to
    implement "Literate Programming" under CP/M, using WordStar and
    BASIC.

    Unfortunately, BASIC happens to be one of the least suitable
    languages for Literate Programming. Why? Because, (1) when you
    explain a program, you don't always do it from beginning to end,
    and (2) when you develop a program, you keep changing it. And
    the problem is that BASIC uses "line numbers" for 2 purposes:
    (1) editing, and (2) destinations for conditional GOTO
    statements. As a consequence, BASIC is obliged to have a
    "RENUMber" command, used every time you insert/delete one or more
    line(s). But WordStar has no such command and, if you mention
    one line number in your explanation, it is highly probable that
    sooner or later, the explanation will be out of sync with the
    program, precisely the thing that Literate Programming is
    supposed to prevent!

    One way, for BASIC fans, to use Literate Programming is to use
    REM keywords like this:

    330 REM Pstring Subroutine/Function/etc

    Then, in the WS4 file, you should only refer to "Pstring", not
    to the line number (330). And, in the BASIC program, each time
    you use one line number to jump, you must follow the line number
    by its name-in-full. Example:

    450 IF echo THEN GOTO 330 ' Pstring

    This works fine for BASICs using line numbers. Some BASICs, like
    CBASIC and CB-8x, don't use line numbers, but labels. Other
    descendants from BASIC, like COMAL, use line numbers only for
    editing, but save program code without line numbers, and use
    labels for all their conditional statements.
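
    The naming convention above can be checked mechanically. What
    follows is only a sketch, written in Python rather than BASIC,
    and it is not part of LITPROG.BAS. The function name
    check_labels and the exact comment format it accepts are
    assumptions made for illustration, based on the two example
    lines shown above.

```python
import re

# Assumed conventions (taken from the two example lines in the post):
#   definition:  330 REM Pstring ...
#   reference:   ... GOTO 330 ' Pstring      (or GOSUB)
DEFINITION = re.compile(r"^\s*(\d+)\s+REM\s+(\w+)")
REFERENCE = re.compile(r"\b(?:GOTO|GOSUB)\s+(\d+)(?:\s*'\s*(\w+))?")


def check_labels(program: str) -> list[str]:
    """Return complaints about jumps whose trailing name is missing or wrong."""
    lines = program.splitlines()

    # First pass: remember the name given to each REM-labelled line number.
    names = {}
    for line in lines:
        m = DEFINITION.match(line)
        if m:
            names[m.group(1)] = m.group(2)

    # Second pass: every GOTO/GOSUB must repeat the name of its target.
    problems = []
    for line in lines:
        for target, name in REFERENCE.findall(line):
            expected = names.get(target)
            if expected is None:
                problems.append(f"line {target} has no REM name: {line.strip()}")
            elif name != expected:
                problems.append(
                    f"jump to {target} should be named {expected}: {line.strip()}"
                )
    return problems


sample = """330 REM Pstring Subroutine/Function/etc
450 IF echo THEN GOTO 330 ' Pstring
460 GOSUB 330
"""
for problem in check_labels(sample):
    print(problem)
```

    With this sample, only line 460 is reported, because its GOSUB
    does not repeat the name "Pstring".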

    All of the above explains what to do for a "straight-line" program
    using line numbers. But sometimes you don't want to explain your
    program from beginning to end. Or, like Pascal, the programming
    language obliges you to declare all the variables used, then the
    procedures, then finally the "main" program. (Pascal does this
    because it is a single-pass compiler: as a result, by the time
    the compiler reaches "main", it must know all the procedures
    used, and the procedures must know the variables used.)

    In order to keep things simple, this first implementation of
    Literate Programming for CP/M does not manage the handling and
    re-arranging of separate procedures or modules. The program is
    assumed to proceed logically from beginning to end.

    However, note that some CP/M programming languages, like Forth,
    LISP, Logo, APL, use the "workspace" concept and, thus, don't
    care in which order the elements of a program are loaded. Forth
    uses a "dictionary" to store its "words", and does not mind in
    which order they are stored. Logo uses a "workspace" containing
    all its procedures. Each procedure can call the others. For
    Logo, each procedure can be a program. Digital Research Dr. Logo
    even has some "primitives" enabling it to re-arrange the order
    in which procedures are listed, when you display everything that
    is present in memory.

    Conclusion: the fewer line numbers and the more labels used, the
    better for Literate Programming.

    But how to implement this concept under CP/M, using only
    WordStar and BASIC?

    10 REM LITPROG.BAS by Emmanuel ROCHE
    20 :

    Like my WordStar files, all my BASIC programs start with a name
    (with my name, shame on me! I don't follow the "egoless
    programming" principles of Gerald Weinberg). All my WS4 files
    also end with an "EOF". It is only later that I discovered that
    PL/M was doing the same. I don't know why, but it just seems
    more logical to end files like this. Maybe it is because I
    started with paper tapes on an ASR-33 Teletype?

    30 PRINT
    40 INPUT "LitProg> Enter WS4 File Name: " ; file$
    50 PRINT
    60 file1$ = file$ + ".WS4"
    70 nofile$ = FIND$ (file1$)
    80 IF nofile$ = "" THEN PRINT CHR$ (7) "File not found." : PRINT : END
    90 OPEN "R", #1, file1$, 1
    100 FIELD #1, 1 AS byte$
    110 :

    As can be seen, the above lines deal with the WS4 file, get its
    name from the user, make sure the correct filetype is used,
    check that it is present on the disk, open it, and get one byte.

    120 ln = 0
    130 echo = 0 ' = FALSE

    Some variables. LN = Line Number, echo = echo to screen switch.
    (0 = FALSE, NOT FALSE = TRUE = -1)

    140 ' Trick if we use a WHILE NOT EOF...
    150 GET #1
    160 GOSUB 450
    170 :
    180 WHILE NOT EOF (1)
    190 GET #1
    200 IF ASC (byte$) = &H1A THEN GOTO 770
    210 GOSUB 450
    220 WEND
    230 :

    This is getting technical. Random access files, contrary to
    sequential files, can start with a 1A hex byte, which happens to
    be the byte used by CP/M to signal the end of a file... Since
    most people (including me) find it more logical to use a WHILE
    NOT EOF statement to loop, we are obliged to read the first byte
    (GET #1) BEFORE using the WHILE... That's all! This also obliges
    us to use a GOSUB, which makes the logic of the program clearer.
    (Notice that this first implementation starts by not conforming
    to its principles... According to the rules, each GOSUB 450
    should be followed by a comment explaining the purpose of the
    subroutine... It is a nice demonstration that a working program
    can be insufficiently documented.)
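
    The "read once, then loop" pattern described above can be
    sketched as follows. This is Python, not the BASIC of the
    listing, and the names read_until_cpm_eof and CPM_EOF are
    invented for the illustration. It only shows the shape of the
    loop: one priming read before the test, one read per pass, and
    an explicit stop on the 1A byte.

```python
import io

CPM_EOF = 0x1A  # Ctrl-Z, the byte CP/M uses to mark the end of a text file


def read_until_cpm_eof(stream):
    """Collect bytes one at a time, stopping at the first 1A hex byte."""
    collected = bytearray()
    chunk = stream.read(1)          # priming read, like the GET #1 before WHILE
    while chunk:                    # an empty result means physical end of file
        if chunk[0] == CPM_EOF:
            break                   # logical end of file, like the GOTO 770
        collected.append(chunk[0])
        chunk = stream.read(1)      # next read, like the GET #1 inside the loop
    return bytes(collected)


record = b"10 PRINT\r\n" + bytes([CPM_EOF]) * 5   # text followed by 1A bytes
print(read_until_cpm_eof(io.BytesIO(record)))      # b'10 PRINT\r\n'
```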

    240 ' Echo ASCII char to screen?
    250 line$ = line$ + STRIP$ (byte$)
    260 RETURN
    270 :
    280 ' Echo char > ASCII to screen?
    290 GET #1
    300 line$ = line$ + byte$
    310 GET #1
    320 RETURN
    330 :

    Here, to explain why we are using 2 separate subroutines, it
    must be known that WordStar uses 2 different ways of storing
    characters in its files. Each word is ended by setting the Most
    Significant Bit (to 1) of its last character. That's why the
    STRIP$ keyword is needed, to display the ASCII character
    correctly. As for characters above the ASCII set, they are
    preceded by a 1Bh byte (hence the first GET #1), then followed
    by a 1Ch byte (hence the second GET #1). The byte read in
    between can be any value, even ASCII. This is the mechanism used
    by WordStar to deal with characters outside the ASCII set
    (control characters, and the 128 characters above the ASCII
    set).
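
    The two storage rules just described can be sketched in a few
    lines. This is a Python illustration, not the author's BASIC,
    and the name ws_to_ascii is invented here. It covers only the
    two rules (clear the high bit; keep the byte wrapped between 1B
    and 1C unchanged) and assumes that the byte following the
    wrapped one is always the 1C closer. It also accepts 9B as an
    opener, since LITPROG.BAS treats 1B and 9B the same way. It does
    not drop the other control characters that LITPROG.BAS filters
    out.

```python
ESC_OPEN = (0x1B, 0x9B)   # 1B, or 1B with its most significant bit set


def ws_to_ascii(data: bytes) -> bytes:
    """Apply the two WordStar storage rules to a run of bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b in ESC_OPEN:
            if i + 1 < len(data):
                out.append(data[i + 1])   # the wrapped byte, kept unchanged
            i += 3                        # skip opener, wrapped byte, 1C closer
        else:
            out.append(b & 0x7F)          # clear the most significant bit
            i += 1
    return bytes(out)


print(ws_to_ascii(b"Hell\xef world"))     # b'Hello world'
print(ws_to_ascii(b"x\x1b\x85\x1cy"))     # b'x\x85y'
```

    In the first example the "o" ending "Hello" is stored as EF hex
    (6F + 80); in the second, the byte 85 hex passes through intact
    because it sits between 1B and 1C.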

    Why? Because WordStar started as an American program, and the
    Americans only use a 7-bit character set (known as "USASCII").
    One reason was that all the communication lines (in the USA)
    were using 7 bits for data, and 1 bit for checking. So, the
    creator of WordStar was setting some most significant bit to 1,
    since he assumed that they would never be used... Some years
    later, many new users needed to use more characters. At the
    beginning, it was hoped that "stacking" characters upon each
    other (like on a printer) would be enough, hence the existence
    of several characters (like the famous "tilde") but, in the end,
    the "powers that be" decided to use more characters, so the IBM
    Clown (for example) was designed for an 8-bit character set.
    Hence the need for this (inefficient) way for WordStar to deal
    with them, by surrounding them with 2 other (control)
    characters...

    To finish the explanation, we should say that the "Echo ASCII
    char to screen?" comment was relevant to a previous version of
    the program (WS4ASC), which contained PRINT byte$ commands. That
    is to say: it (WS4ASC) was working at the byte level, whereas
    LITPROG now works at the line level. That's why it is adding
    characters into a line$. Another example of "legacy software"
    (as the Americans say) where the comments no longer match the
    logic of a program...

    340 ' Get rid of WS4 internal commands.
    350 i$ = CHR$(9)+CHR$(10)+CHR$(13)+CHR$(27)+CHR$(155)
    360 i = INSTR (i$, byte$)
    370 REM Bytes: 09 0A 0D 1B 9B
    380 ON i GOSUB 250, 250, 250, 290, 290
    390 IF ASC (byte$) = &H82 THEN RETURN
    400 IF ASC (byte$) > &H1F THEN GOSUB 250
    410 IF ASC (byte$) = &HA THEN GOSUB 690 : line$ = ""
    420 IF ASC (byte$) = &H1A THEN GOTO 770
    430 RETURN
    440 :

    Obviously a very complex loop, whose logic was found online, at
    the terminal. I don't remember how I programmed WS4ASC but, upon
    finding that the ASCII TAB (09), LF (0A), and CR (0D) characters
    needed to be displayed, I added an INSTR instruction to deal
    with them thanks to the above-mentioned subroutine. (I could
    have added another subroutine just to display ASCII characters,
    but decided to use this subroutine making sure that the most
    significant bit is cleared.) Later tests showed that WordStar
    was, sometimes, setting the most significant bit of 1B, hence
    the 9B (9B = 1B + 80 hex). Re-read the explanation of WordStar's
    use of 2 different ways to store characters.

    The remaining IF-THEN lines were added while testing the
    program. WS4ASC did not have the IF ASC (byte$) = &H1A THEN GOTO
    770, but it was found that, sometimes, the program was looping
    forever at the end of a file. This line corrected the problem.
    You will note that the above 3 lines are in descending order
    (82, 1F, 0A -- for some unknown reasons, my BASIC does not want
    to display &H0A, and keeps on removing the "0", hence the
    resulting &HA. This is an example of a language-specific
    quirk!).

    Another thing must be explained: The &HA value corresponds to
    0A, that is to say: ASCII LF (Line Feed), which happens to be
    the character ending each line... Hence the jump to handle the
    finished line$, and its re-initialisation afterwards (else, the
    256 character limit for strings is quickly reached!). This line
    was not present in WS4ASC, and was enough to change the logic of
    the program, from byte level to line level. As usual, we put
    aside, in a subroutine, the logic needed to implement the action
    required when the end of the line is reached.

    450 IF byte$ = "." THEN GOTO 550
    460 ' WS4 text
    470 GOSUB 350
    480 WHILE ASC (byte$) <> &HA
    490 GET #1
    500 GOSUB 350
    510 IF ASC (byte$) = &H8A THEN RETURN
    520 WEND
    530 :
    540 ' Dot commands
    550 WHILE ASC (byte$) <> &HA
    560 GET #1
    570 GOSUB 350
    580 IF ASC (byte$) = &H8A THEN RETURN
    590 WEND
    600 RETURN
    610 :

    Finally the main loop! Again, we prefer to use WHILE loops to
    govern the actions of the program. Notice how many times LF
    (&HA) is involved (and 8A = 0A + 80 hex). In case the WS4 file
    starts with a "dot command", we test for it, then process the
    following characters, thinking that they will most probably be
    lines of text, or dot commands followed by their parameters. The
    difference being that text is displayed, while dot commands are
    not. Hence the absence of a GOSUB 350 in the latter case. (Again,
    note the lack of a comment after the line number... I should
    really rewrite this program!)

    620 ' Process ..TY dot command
    630 type$ = RIGHT$ (line$, 5)
    640 type$ = LEFT$ (type$, 3) ' Remove CR/LF
    650 file2$ = file$ + "." + type$
    660 IF ln = 0 THEN OPEN "O", #2, file2$
    670 RETURN
    680 :
    690 IF LEFT$ (UPPER$ (line$), 3) = ".TY" THEN GOSUB 630
    700 ln = ln + 1
    710 IF LEFT$ (line$, 3) = ".*/" THEN echo = 0
    720 IF LEFT$ (line$, 3) = "./*" THEN echo = NOT echo : GOTO 750
    730 IF echo THEN PRINT line$ ;
    740 IF echo THEN PRINT #2, line$ ;
    750 RETURN
    760 :

    The lines added to WS4ASC to make it a LITPROG utility.

    As can be seen, there are (at present) 3 LEFT$ lines. They are
    processing the 3 commands used (at present) to implement
    Literate Programming with WordStar.

    You may remember that WordStar uses "dot commands" for some of
    its work. Since WordStar is really a good program, its
    designer(s) provided for the need of commenting WS4 files. This
    can be done by either ".IG" or ".." dot commands. So, I decided
    to use ".." commands for Literate Programming, leaving "."
    commands to WordStar. This way, Literate Programming dot
    commands appear as comments to WordStar, which leaves them
    alone.

    You may remember that, at the start, the program contains
    several lines dealing with the (input) WS4 file. I could have
    "hard-coded" the (output) filetype used (for example: BAS) but,
    then, I would have needed to have a LPBAS program, then a LPLOGO
    program, then a LPASM program, etc. One for each programming
    language used. NO. I thought it much simpler to let the program
    select "automatically" the filetype to be used.

    By the way, one more explanation: WordStar, by setting the most
    significant bit of the last character of some words, or of some
    control characters, is no longer producing an ASCII 7-bit file,
    but an 8-bit file. But, for historical reasons, most compilers
    were designed to use ASCII files as their input. I thought about
    patching the compilers that I use to be able to read WordStar
    files but, then, I would need to patch all the compilers that I
    use and they are NEVER provided with their source code...

    So, the more logical way is to first separate the documentation
    from the source code, then compile it. This is precisely the
    "Literate Programming" concept. In addition, it speeds
    compilation.

    In the case of BASIC, BASIC stores its program in a "packed"
    format, using the BAS filetype. But it can save and can load
    ASC(ii) files. So, we translate WordStar lines into ASCII, then
    put source code lines in an ASC file (in the case of BASIC).
    Back to the above lines.

    We start by looking for ".TY" at the start of the line. ("TY" is
    standing, of course, for "fileTYpe".) Why ".TY", and not "..TY"?
    Because the main loop has already "eaten" the first dot, to
    decide what to do.

    Since this is the "..TY" command, we get the filetype to be used
    (3 chars max). To do that, we get the last 5 characters of the
    line. Why 5? Because each line ends with a CR/LF pair: 3 + 2 =
    5. This way of doing things could cope with a "..TYPE", or even
    "..TYPE=asc".
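
    The "last 5 characters, then first 3 of those" arithmetic can be
    tried out with a short Python sketch (an illustration only; the
    function name filetype_from_ty_line is invented here). It
    assumes, as the text does, that the line ends with CR/LF and
    that the filetype is exactly 3 characters long.

```python
def filetype_from_ty_line(line: str) -> str:
    """Mimic RIGHT$ (line$, 5) followed by LEFT$ (type$, 3)."""
    last_five = line[-5:]     # 3-character filetype + CR + LF
    return last_five[:3]      # drop the CR/LF pair


for line in (".TY asc\r\n", ".TYPE asc\r\n", ".TYPE=asc\r\n"):
    print(repr(line), "->", filetype_from_ty_line(line))
```

    All three spellings give "asc". Note that, under this scheme, a
    filetype shorter than 3 characters would pick up the characters
    in front of it: ".TY c" followed by CR/LF yields "Y c".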

    Once we have the filetype, we construct the output file, not
    forgetting the "dot" separating the filename from the filetype.

    We now have the correct output filespec. But there is a problem:
    here, this subroutine is entered on EVERY line! So, we must find a
    way to know that "we" (the program) are dealing with a
    Literate Programming source code. I could have used a switch
    which could have been located anywhere inside the file, but I
    decided that it was simpler to locate the "..TY" dot command at
    the first line of the WS4 file. However, its absence needs to be
    checked. We will do that when we close the file, after having
    reached its end.

    For the time being, we have a proper Literate Programming
    WordStar file, with a "..TY" dot command as its first line, so
    we open the output file, then increment the line number counter,
    so that we don't erase and open the output file at each line.
    Finally, since this code should, normally, be used only once,
    for the first line, we put it in a separate subroutine, so the
    code is cleaner, and all 3 dot commands are easily seen. (The
    line number counter could display its total at the end of the
    run, if someone wanted to know how long is the source code.)

    This was the first Literate Programming dot command.

    Now, how do we tell the program to start/stop copying the source
    code of a programming language, and not (for example) this
    documentation?

    After some head scratching, I decided to keep it simple and
    re-use some symbols that any programmer will recognize, even at
    3 a.m.... So, I ended up selecting "../*" (= BEGIN) and "..*/"
    (= END). They are also two characters long, like most WordStar
    dot commands (but I don't care).

    Once we encounter a "begin echoing text to ASCII file", we do
    it. The PRINT line$ shows on screen what is happening, and can
    be removed for batch operation. The only difficulty was that the
    first "../*" was echoed but, since there is nothing interesting
    at the end of the line, we simply RETURN to the main loop,
    thanks to the RETURN of this subroutine.
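
    The effect of the "../*" and "..*/" commands can be summed up by
    a small Python sketch (again an illustration, not LITPROG.BAS
    itself; the name tangle is borrowed from Knuth's WEB vocabulary
    and the sample document is invented). It works on whole lines as
    they are typed in the WS4 file, with both dots present, and it
    simply switches the echo on at "../*" instead of toggling it as
    the BASIC listing does.

```python
def tangle(lines):
    """Return only the lines found between "../*" and "..*/" markers."""
    echo = False
    code = []
    for line in lines:
        if line.startswith("..*/"):
            echo = False          # END: stop copying
        elif line.startswith("../*"):
            echo = True           # BEGIN: start copying, marker itself is skipped
        elif echo:
            code.append(line)     # source code line, goes to the output file
    return code


document = [
    "..TY asc",
    "Some explanation of the program.",
    "../*",
    '10 PRINT "HELLO"',
    "..*/",
    "More explanation.",
    "../*",
    "20 END",
    "..*/",
]
print(tangle(document))   # ['10 PRINT "HELLO"', '20 END']
```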

    770 IF type$ = "" THEN PRINT CHR$ (7) "This WS4 file is not a Literate Programming source file."
    780 PRINT
    790 CLOSE
    800 END

    As can be seen, this is the closing code. Technically, I could
    have put it anywhere in the BASIC program, but decided that it
    would be more appropriate at the end of the file (CB-8x, if I
    remember correctly, insists that the only END should be on the
    last line. But, as can be seen in this BASIC program, there is
    an END when the WS4 file is not found, at the beginning of the
    program -- again, I cannot give a line number, and there is no
    label up there...)

    Conclusion: We have seen several cases where the program could
    be improved, mainly by adding appropriate "labels" in comments.

    This shows that a working program is not enough to talk
    correctly about its inner workings.

    So, this first Literate Programming utility under CP/M seems to
    prove the value of the concept.

    I hope that the other remaining CP/M programmers will, thus,
    improve their programs.

    The great beauty of this concept is that it can be used with
    almost any combination of word processor / programming language,
    provided they are powerful enough, and CP/M programs are
    powerful...

    The proof? You are reading this!


    Yours Sincerely,
    Mr Emmanuel Roche


    EOF


  2. Re: Literate Programming under CP/M

    *roche182* wrote on Fri, 08-01-18 10:45:
    >And the problem is that BASIC uses "line numbers" for 2
    >purposes: (1) editing, and (2) destinations for conditional
    >GOTO statements.


    None of the more advanced dialects does this anymore.

