Re: WANTED: Volunteer to Scan Old Programs - CP/M


  1. Re: WANTED: Volunteer to Scan Old Programs

    Emmanuel!

    I and we all have tolerated a lot from you but when you start spamming
    the group with binary and HTML junk you have passed the limit. This is
    just not on.

    Axel


  2. Re: WANTED: Volunteer to Scan Old Programs

    Hello, Axel!

    > I and we all have tolerated a lot from you but when you start spamming
    > the group with binary and HTML junk you have passed the limit. This is
    > just not on.


    All I want is:

    1) to be able to use the "Box drawing" characters that are present on
    all my IBM PCs and PC printers.

    2) to be able to use a "fixed-width" font, since all my WS4 files are
    78 columns wide.

    By the way, Axel, while searching for how to display the "extended"
    characters correctly on the comp.os.cpm Newsgroup, I found the answer
    in a file explaining how to display all 256 characters of the Atari
    ST... So, just changing the values in my BASIC program would let you
    display the 256 characters of your Atari ST correctly on any computer
    or program using the "UTF-8" encoding. (It has nothing to do with
    HTML: it is a way to encode "extended" characters, said to be
    compatible with ASCII -- it was done by an American.)

    (I am rewriting the program from scratch, now that I have finally
    understood how Google displays characters. After a few more tests, I
    will publish a message explaining how it works. Meanwhile, if someone
    knows more about UTF-8 and HTML and CSS than me, I still have a few
    questions.)

    Yours Sincerely,
    Mr Emmanuel Roche


  3. Re: WANTED: Volunteer to Scan Old Programs

    "Mr Emmanuel Roche, France" writes:

    >Hello, Axel!


    >> I and we all have tolerated a lot from you but when you start spamming
    >> the group with binary and HTML junk you have passed the limit. This is
    >> just not on.


    >All I want is:


    >1) to be able to use the "Box drawing" characters that are present on
    >all my IBM PCs and PC printers.


    You can - but *not* in a newsgroup..!

    >2) to be able to use a "fixed-width" font, since all my WS4 files are
    >78 columns wide.


    All of us do.
    But no other (regular) writer uses the web interface
    of Google to post...

    >By the way, Axel, while searching for how to display the "extended"
    >characters correctly on the comp.os.cpm Newsgroup


    You can't. So: Stop trying!

    >(I am rewriting the program from scratch, now that I have finally
    >understood how Google displays characters.


    Just use a 'real' Newsreader.


    > After a few more tests, I


    *If* you have to 'test' please use a group dedicated to "test".


    >will publish a message explaining how it works. Meanwhile, if someone
    >knows more about UTF-8 and HTML and CSS than me, I still have a few
    >questions.)


    Amicalement, Holger

  4. Re: WANTED: Volunteer to Scan Old Programs

    *Mr Emmanuel Roche, France* wrote on Wed, 08-04-02 09:39:
    >All I want is:


    Use a mail program - any mail program. The following is a quote from
    your last post exactly as I came to see it:

    *Mr Emmanuel Roche, France* wrote on Tue, 08-04-01 10:07:
    >4pSA4pSA4pSA4pSs4pSA4pSA4pSA4pSs4pSA4pSA4pSA4pSs4pSA4pSA4pSA4pSs4pSA4pSA4p
    >4pSs4pSA4pSA4pSA4pSs4pSA4pSA4pSA4pSs4pSA4pSA4pSA4pSs4pSA4pSA4pSA4pSs4pSA4p
    >4pSA4pSs4pSA4pSA4pSA4pSs4pSA4pSA4pSA4pSs4pSA4pSA4pSA4pSs4pSA4pSA4pSA4pSs4p
    >4pSA4pSA4pSs4pSA4pSA4pSA4pSQDQoyIOKUgiAgIOKUgiAhIOKUgiAiIOKUgiAjIOKUgiAkIO
    >giAlIOKUgiAmIOKUgiAnIOKUgiAoIOKUgiApIOKUgiAqIOKUgiArIOKUgiAsIOKUgiAtIOKUgi
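    The quoted lines consist only of letters, digits, "+" and "/", which
    suggests base64: presumably the web interface sent the message body
    encoded instead of as plain text. A minimal Python check, assuming the
    first 16 characters form complete base64 groups (the stray spaces of
    the archive removed), decodes them to UTF-8 box-drawing characters:

    ```python
    import base64

    # First 16 characters of the quoted post (assumed to be base64).
    quoted = "4pSA4pSA4pSA4pSs"

    raw = base64.b64decode(quoted)   # 16 base64 characters -> 12 bytes
    text = raw.decode("utf-8")       # interpret those bytes as UTF-8

    print(raw.hex(" ").upper())      # the undecoded bytes, in hex
    print(text)                      # three horizontal lines and a "T" junction
    ```

    So the characters were transmitted intact; the reader's software
    simply never decoded them.
    
    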



  5. Re: WANTED: Volunteer to Scan Old Programs

    Here is a table explaining which "Code Page" Micro**** uses:

    Language               Postoffice   Windows NT
    ----------------------------------------------
    Chinese (Simplified)   936          936
    Chinese (Traditional)  950          950
    Czech                  852          1250
    Danish                 850          1252
    Dutch                  850          1252
    English                850          1252
    Finnish                850          1252
    French                 850          1252
    German                 850          1252
    Greece                 737          1252
    Hungarian              852          1250
    Italian                850          1252
    Japanese (3.0)         932          932
    Japanese               932          932
    Korean                 949          949
    Norwegian              850          1252
    Polish                 852          1250
    Portuguese             850          1252
    Russian                866          1251
    Spanish                850          1252
    Swedish                850          1252
    Turkey                 737          1252

    So, the English, French, and German MS-DOS computers used "Code Page
    850" but, when Micro**** introduced WinDoze, it changed to "Code Page
    1252". Hence the problems when printing MS-DOS / CP/M-86 files on this
    cybercafe computer running under WinDoze.

    What my program does is convert those "Code Page 850" characters into
    something called "UTF-8", which now seems to be widely used following
    the internationalization of the Internet (web browsers now have to
    display characters from a lot of countries correctly).
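    A minimal Python sketch of the mismatch described above, using
    Python's built-in "cp850" and "cp1252" codecs. The two sample bytes
    are chosen only for illustration:

    ```python
    # Two bytes from a DOS-era file written under code page 850.
    raw = bytes([0xC4, 0x82])   # CP850: horizontal box line, then "e acute"

    as_dos = raw.decode("cp850")       # what the DOS machine meant
    as_windows = raw.decode("cp1252")  # what a code page 1252 program shows
    print(as_dos, as_windows)

    # Re-encoding the intended text as UTF-8 removes the ambiguity:
    # any UTF-8-aware program reads these bytes the same way.
    utf8 = as_dos.encode("utf-8")
    print(utf8.hex(" ").upper())
    ```

    The same two bytes give a box line and "é" under code page 850, but
    "Ä" and a low quotation mark under 1252. Once stored as UTF-8 (five
    bytes here), there is only one way to read them.
    
    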

    That's all.

    (If you are a Chinese / Japanese / Korean still using WordMaster under
    CP/M 2.2, simply changing the values of a table in my program will
    enable you to convert your WordMaster files into UTF(-8) files.)

    Yours Sincerely,
    Mr Emmanuel Roche


  6. Re: WANTED: Volunteer to Scan Old Programs

    WS4UTF.WS4
    ----------

    A WS4-to-UTF(-8) File Converter

    Before computers, telecommunication people used Baudot code.
    Circa 1963, the (American) Teletype company created its "Model
    33" (as you can see, there had been 32 models before it...),
    using a code that was soon after normalised as the US-ASCII
    (with 2 modifications). It is this famous ASR-33 Teletype that
    was the standard "terminal" when the first microcomputers were
    created.

    Now, it is essential to understand that the ASR-33 Teletype was
    the latest in a long line of teletypes: that's why it was not
    totally compatible with the ASCII code.

    +-------+-----+-----+-----+-----+-----+-----+-----+-----+
    |b7 --->|  0  |  0  |  0  |  0  |  1  |  1  |  1  |  1  |
    |b6 --->|  0  |  0  |  1  |  1  |  0  |  0  |  1  |  1  |
    |b5 --->|  0  |  1  |  0  |  1  |  0  |  1  |  0  |  1  |
    +-------+-----+-----+-----+-----+-----+-----+-----+-----+
    |b b b b|     |     |     |     |     |     |     |     |
    |4 3 2 1|     |     |     |     |     |     |     |     |
    || | | ||     |     |     |     |     |     |     |     |
    |V V V V|     |     |     |     |     |     |     |     |
    +-------+-----+-----+-----+-----+-----+-----+-----+-----+
    |0 0 0 0| NUL | DC0 |     |  0  |  @  |  P  |  ^  |  |  |
    |0 0 0 1| SOH | DC1 |  !  |  1  |  A  |  Q  |  |  |  U  |
    |0 0 1 0| EOA | DC2 |  "  |  2  |  B  |  R  |  |  |  N  |
    |0 0 1 1| EON | DC3 |  #  |  3  |  C  |  S  |  U  |  A  |
    |0 1 0 0| EOT | DC4 |  $  |  4  |  D  |  T  |  N  |  S  |
    |0 1 0 1| WRU | ERR |  %  |  5  |  E  |  U  |  A  |  S  |
    |0 1 1 0| RU  | SYNC|  &  |  6  |  F  |  V  |  S  |  I  |
    |0 1 1 1| BEL | LEM |  '  |  7  |  G  |  W  |  S  |  G  |
    |1 0 0 0| FE  | S0  |  (  |  8  |  H  |  X  |  I  |  N  |
    |1 0 0 1| [1] | S1  |  )  |  9  |  I  |  Y  |  G  |  E  |
    |1 0 1 0| LF  | S2  |  *  |  :  |  J  |  Z  |  N  |  D  |
    |1 0 1 1| TAB | S3  |  +  |  ;  |  K  |  [  |  E  |  |  |
    |1 1 0 0| FF  | S4  |  ,  |  <  |  L  |  \  |  D  | ACK |
    |1 1 0 1| CR  | S5  |  -  |  =  |  M  |  ]  |  |  | [2] |
    |1 1 1 0| SO  | S6  |  .  |  >  |  N  | [3] |  |  | ESC |
    |1 1 1 1| SI  | S7  |  /  |  ?  |  O  | [4] |  V  | DEL |
    +-------+-----+-----+-----+-----+-----+-----+-----+-----+

    Notes:

    [1] = MT/SK
    [2] = ALT MODE
    [3] = Up Arrow
    [4] = Left Arrow

    As can be seen, there were 4 "control codes" located at the end
    of the allowable characters, where, in the ASCII code, only
    "DEL" remains. (The vertical "UNASSIGNED" labels mean that there
    were no lowercase characters: the ASR-33 TTY was uppercase only.
    That's why Altair BASIC -- until Version 4 -- (also known as
    "MITS 4K BASIC") was uppercase only.) Please note, especially,
    that "ESC" and "ACK" were located there, not in the first 32
    "control codes". Of course, since this ASR-33 Teletype was
    pre-ASCII, the names of the "control codes" that it used were
    also different. Also, note the famous "ALT MODE", which was also
    a relic of previous codes. Before, when telecommunication devices
    used 6 bits, "ALT MODE" was used, for example, to switch from
    characters to numbers, or from black ink to red ink (that's why
    Altair BASIC tests it), depending upon the teletypewriter used
    (there were several teletype makers, besides Teletype).

    Finally, note the mentions of an "up arrow" and a "left arrow",
    where the ASCII code uses "^" and "_" (caret and underline).
    Some programming languages used "left arrow" as their symbols
    for "assignment" (i.e., the equivalent of "LET" in BASIC).

    Also, note that those 2 arrows were pointing left and up, that
    is to say: the right and down arrows were missing... (Several
    ASR-33 users complained about this!)

    Lastly, note again that the ASR-33 Teletype was uppercase only:
    there were no lowercase characters, and none of the { | } ~
    characters used (among others) by the C language.

    So, circa 1964, the US-ASCII became a standard.

    run"ascii

       0 1 2 3 4 5 6 7 8 9 A B C D E F

    2:   ! " # $ % & ' ( ) * + , - . /
    3: 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
    4: @ A B C D E F G H I J K L M N O
    5: P Q R S T U V W X Y Z [ \ ] ^ _
    6: ` a b c d e f g h i j k l m n o
    7: p q r s t u v w x y z { | } ~ DEL

    Ok

    This is a 7-bit code (previous codes were often 6 bits, 5 bits,
    etc.).
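    The source of the "ascii" BASIC program is not given. A hypothetical
    Python equivalent that rebuilds the same 20h-7Fh grid from the code
    values themselves (row = high nibble, column = low nibble) might look
    like this:

    ```python
    def ascii_grid():
        """Rebuild the 20h-7Fh US-ASCII grid, one row per high nibble."""
        lines = ["   " + " ".join("0123456789ABCDEF")]
        for row in range(2, 8):
            cells = []
            for col in range(16):
                code = row * 16 + col
                # 7Fh is the DEL control code, which has no printable glyph.
                cells.append("DEL" if code == 0x7F else chr(code))
            lines.append(f"{row}: " + " ".join(cells))
        return lines


    for line in ascii_grid():
        print(line)
    ```

    Since the highest code in the grid is 7Fh (111$1111b), every value
    fits in 7 bits, which is the point made above.
    
    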

    WordStar, the word processor that I use, being made by an
    American, is also 7 bits. Later versions of WordStar, like
    Version 4.0, allow 8-bit codes.

    I wanted to show you, here, the characters from 80h to 0ffh but,
    for portability reasons, finally decided not to. You will have to
    dig out your old copy of the "IBM PC Technical Reference
    Manual (1981)", and open it to Section 7: "Characters,
    Keystrokes, and Colors". That's the reference that I used.

    This WS4 file could have contained all the characters used by
    my IBM PC (from 20h to 0ffh; it is also possible to display the
    values corresponding to the "control chars", but I decided not to
    implement it). I have 2 copies of the WordStar 4 manuals: one
    "CP/M edition" (which lists only the ASCII set) and one
    "Professional" (which lists the "ASCII character codes and
    extended character set"). I checked those characters, one by
    one: the shapes of B0, B1, and B2 are horizontal, rather than
    vertical; the shapes of DB, DC, DD, DE, and DF seem to be
    compressed vertically; in my opinion, the Greek char that they
    used for E8 is the uppercase version, but I could be wrong,
    since I have not read Greek since University...

    Those were the characters used by the IBM PC in 1981. Those are
    the characters still displayed on the screen of my IBM PC in
    2008, and printed on my (PC) printer. They were so widely used
    that they became a "de facto" standard, normalized as "ISO 646"
    in 1988, 7 years after the creation of the IBM PC.

    Yet, when I go to a cybercafe and publish a text on the
    Internet, the "extended characters" are different, preventing me
    from using the "Box drawing" characters... What went wrong with
    the "IBM Clown"?

    According to my various docs, when Micro**** introduced WinDoze
    (thus dropping MS-DOS), it switched to something called "Windows
    Latin 1" (also named "ISO Latin 1", or ISO-8859-1). This
    character set is well known in France, since 2 common French
    characters are missing from it, preventing you from writing words
    like "eye" and "beef" with proper French characters! ("This is
    not a bug, this is a feature!") The "extended chars" of this set
    are almost only combinations of various European characters
    (which were previously available by printing over a previous
    character: each combination is now a single character. This is
    reminiscent of the Ethiopian alphabet, which used to have separate
    vowels. Then, during the Middle Ages, people started writing
    those vowels at the bottom of the consonants -- as if you wrote
    a,e,o,i,u under the previous consonant -- until they were
    familiar with this "ba", "ca", "da" system. As a result, the
    Ethiopian alphabet now has 114 characters, rather than the
    original 26-or-so characters). For some unknown reason, there
    are no "Box drawing" characters under WinDoze. You need a
    desktop publishing program to make a table.

    In reaction, a consortium of Adobe, Apple, IBM, Sun, and Xerox,
    called "UNICODE", made the ISO adopt, one year later, a code
    also called "UNICODE", which is 16 bits wide, so able to contain
    64K different characters.

    Thinking that 65,536 characters were not enough, ISO then made
    its own standard, "ISO 10646" (note the 646), which is 32 bits
    wide, hence able to define 4,294,967,296 characters... (All that
    because a people in Lebanon used 24 characters (inspired by the
    alphabet used in Iraq) 2,300 years ago! The Greeks (who were in
    commercial relations with them) adopted those characters... The
    rest is history!)

    Now, since 65,536 characters do not fit in 7-bit ASCII, how is
    it possible that UNICODE became a world-wide standard for
    Internet applications? The main reason is that an American,
    Kenneth Thompson (together with Rob Pike), invented a way,
    compatible with US-ASCII, to code those 65,536 characters, named
    UTF-8.

    US-ASCII is 7 bits, so the MSBit of each char is 0: 0xxx$xxxxb.
    (I am using the CP/M ASM "blank char" $ to separate the
    nibbles.) Their values range from 00h to 7fh.

    Now, one of the difficulties of UTF-8 is that it jumps from one
    size to another at unusual values. Explanation: when you count
    7, 8, 9, when you reach 10, you now have 2 digits. For 25+
    years, I was used to a value "jumping" from one byte at 0ffh to
    two bytes at 01$00h. (Here, I use the $ to separate the bytes.)

    With UTF-8, it is different, because the 2nd and 3rd bytes start
    with 10b. But, since it is a value greater than 7fh, the first
    byte of a two-byte value must also start with a 1... It is here
    that things get a little bit tricky, so I suggest examining the
    following lines:

    1 byte:  0000-007f: 0xxx$xxxxb
    2 bytes: 0080-07ff: 110x$xxxxb 10xx$xxxxb
    3 bytes: 0800-ffff: 1110$xxxxb 10xx$xxxxb 10xx$xxxxb

    (The UTF-8 standard also allows 4-byte values but, since my
    WS4-to-UTF(-8) File Converter doesn't use them, I will limit my
    explanation to values of up to 3 bytes, which are also the most
    often used.)
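    The three patterns above translate almost directly into shifts and
    masks. A minimal Python sketch (the function name is made up for
    illustration; it deliberately stops at 3 bytes, like the converter,
    and does not reject the surrogate range D800-DFFF):

    ```python
    def utf8_encode(cp):
        """Encode one code point from 0000 to FFFF with the patterns above."""
        if cp <= 0x7F:                            # 0xxx$xxxx
            return bytes([cp])
        if cp <= 0x7FF:                           # 110x$xxxx 10xx$xxxx
            return bytes([0xC0 | (cp >> 6),
                          0x80 | (cp & 0x3F)])
        if cp <= 0xFFFF:                          # 1110$xxxx 10xx$xxxx 10xx$xxxx
            return bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        raise ValueError("4-byte values are not covered by this sketch")


    # Cross-check against Python's own UTF-8 encoder.
    for cp in (0x41, 0xE9, 0x2500, 0x20A7):
        assert utf8_encode(cp) == chr(cp).encode("utf-8")
    ```

    The boundaries 7f/80 and 7ff/800 are exactly where the output grows
    by one byte, which is the "unusual" jump described above.
    
    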

    If you look at the above 3 lines, something should be clear: the
    xxxx bits are always preceded by a 0.

    In the case of a byte, this 0 is the MSBit.

    In the case of a 2-byte value, this 0 is preceded by 11 in the
    MSByte, and by 1 in the LSByte.

    In the case of a 3-byte value, this 0 is preceded by 111 in the
    MSByte, and by 1 in the MidByte and LSByte (the following bytes
    of 4-byte values work the same way).

    The trick is that 2-byte values start with 11: that is to say,
    "two ones" means that this is a 2-byte value.

    3-byte values start with 111: that is to say, "three ones"
    means that this is a 3-byte value.

    (4-byte values start with... Guess what? a 1111.)

    So, if you see a byte starting with a 0, you know that it is a
    byte (a 1-byte value).

    If you see a byte starting with a 110, you know that this is a
    word (a 2-byte value).

    If you see a byte starting with a 1110, you know that this is a
    3-byte value.

    (If you see a byte starting with a 10, you know that it is a
    "following byte". You loop until you encounter another byte
    starting with 0, 110, 1110, etc.)
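    The rules above are enough to cut a UTF-8 byte stream into characters
    without decoding them. A minimal Python sketch (hypothetical helper
    names; only the lead bytes are checked, so it is not a full
    validator):

    ```python
    def sequence_length(lead):
        """Bytes in a UTF-8 sequence, judged by its first byte (0 = following byte)."""
        if (lead & 0x80) == 0x00:   # 0xxx$xxxx: plain US-ASCII
            return 1
        if (lead & 0xC0) == 0x80:   # 10xx$xxxx: a "following byte"
            return 0
        if (lead & 0xE0) == 0xC0:   # 110x$xxxx
            return 2
        if (lead & 0xF0) == 0xE0:   # 1110$xxxx
            return 3
        if (lead & 0xF8) == 0xF0:   # 1111$0xxx
            return 4
        raise ValueError(f"invalid lead byte {lead:02X}")


    def split_sequences(data):
        """Cut UTF-8 bytes into one chunk per character (following bytes not validated)."""
        chunks, i = [], 0
        while i < len(data):
            n = sequence_length(data[i])
            if n == 0:
                raise ValueError(f"following byte without a lead byte at offset {i}")
            chunks.append(data[i:i + n])
            i += n
        return chunks


    print(split_sequences("A\u00e9\u20a7".encode("utf-8")))
    ```
    
    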

    Let us now examine an example, so you see the point.

    When creating my WS4-to-UTF(-8) File Converter, I had some
    difficulty with the "currency symbol" for "Pesetas" (the Spanish
    coins). Opening the UNICODE book, I found that the only Peseta
    symbol was 20A7 (hex).

    Now, this would be too simple: you don't simply DOKE the value
    20A7 into your WordStar file! First, it has to be "surrounded"
    by 2 characters used internally by WordStar to know when it is
    dealing with "Extended characters".

    Second, and more important, it has to be encoded in UTF-8, since
    that is what allows us to display the character set of the IBM
    PC ("Code Page 850", but this could be another one) correctly to
    people using Internet programs which, being world-wide, need to
    display a lot of characters from foreign countries correctly.

    UTF-8, being compatible with US-ASCII for byte values, is widely
    used. In practice, 65536 characters is enough to deal with a lot
    of foreign countries.

    So, back to our example: how to convert the value 20A7, defined
    in the UNICODE standard as the symbol for the currency
    "Peseta"(s), using the UTF-8 standard?

    20A7 = 0010$0000$1010$0111b

    20A7 is bigger than 7f. It is also bigger than the range 0080-
    07ff used by 2-byte values. So, it must be a 3-byte value (0800-
    ffff)? Yes.

    So, the first byte will start with a 1110, saying that this is
    the start of a 3-byte value, and will be followed by 2 bytes
    starting with 10, saying that they are the following bytes.

    Let us try to represent this with only the ASCII characters:

    +----+----+ +----+----+ +----+----+
    |1110|xxxx| |10xx|xxxx| |10xx|xxxx|
    +----+----+ +----+----+ +----+----+
         |Nib4|   |Nib3|NB|   |NA|Nib1|
         +----+   +----+--+   +--+----+

    Remember that 20A7 = 0010$0000$1010$0111b ?

    So, we take its 4 nibbles (in the case of the 2nd Nibble, it is
    further divided into a "Nibble A" and "Nibble B" for lack of a
    better name) and insert them in the following "drawing":

           2        0       A      7
         +----+   +----+--+   +--+----+
         |Nib4|   |Nib3|NB|   |NA|Nib1|
         +----+   +----+--+   +--+----+
         |0010|   |0000|10|   |10|0111|
         +----+   +----+--+   +--+----+
    +----+----+ +----+----+ +----+----+
    |1110|0010| |1000|0010| |1010|0111|
    +----+----+ +----+----+ +----+----+
      E    2      8    2      A    7

    That is to say, during the UTF-8 encoding, our 20A7 value has
    become a E2.82.A7 triplet...

    (Note that the first number, 2, and the last number, 7, remain
    the same. In practice, they are the only digits guaranteed not to
    change. A quick look at the above drawing will explain why:
    Nib4 and Nib1 are copied vertically without change, while Nib3
    and the two halves of Nib2 ("NB" and "NA") are shifted and
    preceded by the "following byte" markers (10).)
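    Going the other way, from E2.82.A7 back to 20A7, just means keeping
    the payload bits of each byte and gluing them together again. A
    minimal Python sketch for the 3-byte case only (the function name is
    hypothetical):

    ```python
    def utf8_decode3(b1, b2, b3):
        """Reassemble a code point from a 3-byte UTF-8 sequence."""
        nib4 = b1 & 0x0F    # 1110$xxxx -> 4 payload bits (Nib4)
        mid = b2 & 0x3F     # 10xx$xxxx -> 6 payload bits (Nib3 + NB)
        low = b3 & 0x3F     # 10xx$xxxx -> 6 payload bits (NA + Nib1)
        return (nib4 << 12) | (mid << 6) | low


    cp = utf8_decode3(0xE2, 0x82, 0xA7)
    print(f"{cp:04X}")      # 20A7
    ```

    Because Nib4 is the low nibble of the first byte and Nib1 the low
    nibble of the last byte, those two hex digits always survive
    unchanged, as noted above.
    
    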

    This is how UTF-8 works.

    Now, I did not understand it instantly...

    You may remember that I published an HTML file in the
    comp.os.cpm Newsgroup (for which I was flamed...). I was
    expecting Google Groups (the newsreader that I use) to recognize
    the HTML commands and display correctly the "extended
    characters" that I had coded using the &#x1234; scheme.
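    HTML numeric character references have the form &#xHHHH; (for
    example, &#x1234; is rendered by a browser as U+1234, an Ethiopic
    syllable). They only work when the receiving program treats the text
    as HTML; a newsreader showing plain text displays them literally. A
    minimal Python sketch using the standard `html` module (the `to_ncr`
    helper is made up for illustration):

    ```python
    import html


    def to_ncr(text):
        """Replace every non-ASCII character with a hexadecimal HTML reference."""
        return "".join(ch if ord(ch) < 0x80 else f"&#x{ord(ch):04X};" for ch in text)


    print(to_ncr("\u2554\u2550\u2557"))   # &#x2554;&#x2550;&#x2557;
    print(html.unescape("&#x1234;"))      # only an HTML-aware program does this step
    ```
    
    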

    Instead, the full HTML source code appeared on the screen... It
    was only when I "clicked" on the screen that a new "window"
    opened, containing all the characters in their grid. What had
    happened?

    I was puzzled, to say the least.

    As usual, impossible to find anybody with the answer.

    The breakthrough happened one day, when I decided to dump the
    display and examine it at home, on my computer.

    So, I "selected" the contents of the window, then saved it to my
    floppy disk. When asked for a file name, I chose ASCIGRID.TXT.
    Surprise! WinDoze opened a warning window, telling me that the
    file contained UTF-8 characters, and that I would lose them if I
    did not use the "File Format" pull-down menu.

    So, I followed its suggestion (it must be the first time that I
    experienced a WinDoze computer being useful!), and saved the
    mysterious file on my floppy.

    Back home, I dumped and printed the contents of the file.
    Opening WordStar, I could see the grid containing all the
    characters on the screen. All I had to do was to mark, on the
    printout, what was in hex for each character. It is then that I
    understood that some characters were coded with one byte, some
    with two bytes, and all the "Box drawing" characters with three
    bytes.

    Since this file contained absolutely no HTML commands, I wrote
    the WS4UTF.BAS program that follows. Every day, I went to the
    cybercafe, printed the UTF file that it had created, then went
    back home to examine, one by one, the 224 characters printed.

    When there was a problem, I also spent many, many hours searching
    the UNICODE web site for the missing characters.

    Since I see no more problems, I am finally releasing it. Of course,
    if you see any problem, let me know.

    The only known problems are:

    1) The char for DEL (7F) used to be a "triangle" on the IBM PC and
    on those of my CP/M computers that were able to display it.
    However, all my IBM Clowns now display it as something called the
    "home symbol". Since this is also the graphic printed by all my
    printers, this is the char I had to find.

    2) The symbol for "Peseta" (9E) was "Pt". But the only Peseta
    symbol I was able to find on the UNICODE web site is "Pts"...

    3) Chars A6 and A7 (superscript lowercase "a" and "o") are
    underlined in all the references that I have, all the printers
    that I have, all the computers that I have... except when I
    print them under WinDoze at the cybercafe! In addition, the
    UNICODE web site displays them (in a PDF file) as underlined!!!
    So, could someone tell Micro**** that they are, indeed,
    underlined, and have been so since (at least) 1981? (Else,
    WinDoze would not be IBM PC-compatible...)

    4) I had quite a lot of trouble finding a corresponding
    character for A9, but managed, eventually.

    5) I am not much impressed by the Greek chars displayed by
    WinDoze. In particular, I think that E2 is too similar to a "Box
    drawing" character.

    6) Finally, I had much difficulty with EE. I interpret it as the
    math symbol "belongs to". I found 2 such symbols in the UNICODE
    web site (a small and a big) but they never displayed or printed
    on the WinDoze computer of the cybercafe. As a desperate
    solution, I am using "epsilon", instead.

    7) I have trouble seeing the difference between F9 and FA, which
    were, as far as I can see, different in the "IBM PC Technical
    Reference Manual".

    Conclusion: Despite all those little problems, this is (as far
    as I know) the first time that a "Code Page 850" to UTF-8 file
    converter is offered (I was totally unable to find any table
    explaining this conversion, despite long searches with Google).
    Even better: this version is able to convert the files produced
    by the WordStar word processor that I have been using for the
    last 20 years. (According to my computers, I have 800+ WS4
    files. Now, every time I check a WS4 file and notice that it
    contains "ASCII graphics" (usually tables), I will convert it to
    those "Extended characters" that were used, in the good old
    days, to produce such tables with phototypesetters... At the
    cybercafe, the laser printer prints at 600 DPI: I am unable, at
    my age, to see any difference between its output and a book.
    That's enough for me.)

    Ha! By the way, would you be interested in the program?

    10 REM WS4UTF.BAS by Emmanuel ROCHE
    20 :
    30 PRINT
    40 INPUT "WS4-to-UTF> Enter WS4 File Name: " ; file$
    50 PRINT
    60 WHILE FIND$ ("*.WS4") <> ""
    70 found$ = FIND$ (file$ + ".WS4")
    80 IF found$ = "" THEN GOTO 170
    90 ordinal = ordinal + 1
    100 file1$ = FIND$ (file$ + ".WS4", ordinal)
    110 file2$ = LEFT$ (file1$, 8) + ".UTF"
    120 IF file1$ = "" THEN GOTO 150
    130 GOSUB 190
    140 WEND
    150 END
    160 :
    170 PRINT CHR$ (7) "File not found." : PRINT : END
    180 :
    190 OPEN "R", #1, file1$, 1
    200 FIELD #1, 1 AS byte$
    210 OPEN "O", #2, file2$
    220 :
    230 PRINT #2, CHR$ (&HEF) ; ' Byte
    240 PRINT #2, CHR$ (&HBB) ; ' Order
    250 PRINT #2, CHR$ (&HBF) ; ' Mark ("BOM")
    260 :
    270 OPTION BASE 0
    280 DIM hexa$ (255), ncr1$ (255), ncr2$ (255), ncr3$ (255)
    290 FOR i = 0 TO 126
    300 ncr1$ (i) = ""
    310 ncr2$ (i) = ""
    320 ncr3$ (i) = ""
    330 NEXT i
    340 RESTORE 940
    350 FOR i = 127 TO 255
    360 READ hexa$ (i)
    370 READ ncr1$ (i)
    380 READ ncr2$ (i)
    390 READ ncr3$ (i)
    400 NEXT i
    410 :
    420 ' Trick if we use a WHILE NOT EOF...
    430 GET #1
    440 GOSUB 780
    450 :
    460 WHILE NOT EOF (1)
    470 GET #1
    480 IF ASC (byte$) = &H1A THEN PRINT #2, CHR$ (&H1A) ;
    490 IF ASC (byte$) = &H1A THEN CLOSE : RETURN
    500 GOSUB 780
    510 WEND
    520 RETURN
    530 :
    540 ' WS4 text
    550 PRINT #2, STRIP$ (byte$) ;
    560 RETURN
    570 :
    580 ' Extended chars (+DEL)
    590 GET #1
    600 PRINT #2, CHR$ (VAL ("&H" + (ncr1$ (ASC (byte$))))) ;
    610 IF ncr2$ (ASC (byte$) ) = "0" THEN GOTO 650
    620 PRINT #2, CHR$ (VAL ("&H" + (ncr2$ (ASC (byte$))))) ;
    630 IF ncr3$ (ASC (byte$) ) = "0" THEN GOTO 650
    640 PRINT #2, CHR$ (VAL ("&H" + (ncr3$ (ASC (byte$))))) ;
    650 GET #1
    660 RETURN
    670 :
    680 ' Get rid of WS4 internal commands.
    690 i$ = CHR$(9)+CHR$(10)+CHR$(13)+CHR$(27)+CHR$(155)
    700 i = INSTR (i$, byte$)
    710 REM Bytes: 09 0A 0D 1B 9B
    720 ON i GOSUB 550, 550, 550, 590, 590
    730 IF ASC (byte$) = &H82 THEN RETURN
    740 IF ASC (byte$) > &H1F THEN GOSUB 550
    750 IF ASC (byte$) = &H1A THEN CLOSE : RETURN
    760 RETURN
    770 :
    780 IF byte$ = "." THEN GOTO 880
    790 ' WS4 text
    800 GOSUB 690
    810 WHILE ASC (byte$) <> &HA
    820 GET #1
    830 GOSUB 690
    840 IF ASC (byte$) = &H8A THEN RETURN
    850 WEND
    860 :
    870 ' Dot commands
    880 WHILE ASC (byte$) <> &HA
    890 GET #1
    900 IF ASC (byte$) = &H8A THEN RETURN
    910 WEND
    920 RETURN
    930 :
    940 DATA 7F, E2, 8C, 82
    950 DATA 80, C3, 87, 0
    960 DATA 81, C3, BC, 0
    970 DATA 82, C3, A9, 0
    980 DATA 83, C3, A2, 0
    990 DATA 84, C3, A4, 0
    1000 DATA 85, C3, A0, 0
    1010 DATA 86, C3, A5, 0
    1020 DATA 87, C3, A7, 0
    1030 DATA 88, C3, AA, 0
    1040 DATA 89, C3, AB, 0
    1050 DATA 8A, C3, A8, 0
    1060 DATA 8B, C3, AF, 0
    1070 DATA 8C, C3, AE, 0
    1080 DATA 8D, C3, AC, 0
    1090 DATA 8E, C3, 84, 0
    1100 DATA 8F, C3, 85, 0
    1110 DATA 90, C3, 89, 0
    1120 DATA 91, C3, A6, 0
    1130 DATA 92, C3, 86, 0
    1140 DATA 93, C3, B4, 0
    1150 DATA 94, C3, B6, 0
    1160 DATA 95, C3, B2, 0
    1170 DATA 96, C3, BB, 0
    1180 DATA 97, C3, B9, 0
    1190 DATA 98, C3, BF, 0
    1200 DATA 99, C3, 96, 0
    1210 DATA 9A, C3, 9C, 0
    1220 DATA 9B, C2, A2, 0
    1230 DATA 9C, C2, A3, 0
    1240 DATA 9D, C2, A5, 0
    1250 DATA 9E, E2, 82, A7
    1260 DATA 9F, C6, 92, 0
    1270 DATA A0, C3, A1, 0
    1280 DATA A1, C3, AD, 0
    1290 DATA A2, C3, B3, 0
    1300 DATA A3, C3, BA, 0
    1310 DATA A4, C3, B1, 0
    1320 DATA A5, C3, 91, 0
    1330 DATA A6, C2, AA, 0
    1340 DATA A7, C2, BA, 0
    1350 DATA A8, C2, BF, 0
    1360 DATA A9, E2, 8C, 90
    1370 DATA AA, C2, AC, 0
    1380 DATA AB, C2, BD, 0
    1390 DATA AC, C2, BC, 0
    1400 DATA AD, C2, A1, 0
    1410 DATA AE, C2, AB, 0
    1420 DATA AF, C2, BB, 0
    1430 DATA B0, E2, 96, 91
    1440 DATA B1, E2, 96, 92
    1450 DATA B2, E2, 96, 93
    1460 DATA B3, E2, 94, 82
    1470 DATA B4, E2, 94, A4
    1480 DATA B5, E2, 95, A1
    1490 DATA B6, E2, 95, A2
    1500 DATA B7, E2, 95, 96
    1510 DATA B8, E2, 95, 95
    1520 DATA B9, E2, 95, A3
    1530 DATA BA, E2, 95, 91
    1540 DATA BB, E2, 95, 97
    1550 DATA BC, E2, 95, 9D
    1560 DATA BD, E2, 95, 9C
    1570 DATA BE, E2, 95, 9B
    1580 DATA BF, E2, 94, 90
    1590 DATA C0, E2, 94, 94
    1600 DATA C1, E2, 94, B4
    1610 DATA C2, E2, 94, AC
    1620 DATA C3, E2, 94, 9C
    1630 DATA C4, E2, 94, 80
    1640 DATA C5, E2, 94, BC
    1650 DATA C6, E2, 95, 9E
    1660 DATA C7, E2, 95, 9F
    1670 DATA C8, E2, 95, 9A
    1680 DATA C9, E2, 95, 94
    1690 DATA CA, E2, 95, A9
    1700 DATA CB, E2, 95, A6
    1710 DATA CC, E2, 95, A0
    1720 DATA CD, E2, 95, 90
    1730 DATA CE, E2, 95, AC
    1740 DATA CF, E2, 95, A7
    1750 DATA D0, E2, 95, A8
    1760 DATA D1, E2, 95, A4
    1770 DATA D2, E2, 95, A5
    1780 DATA D3, E2, 95, 99
    1790 DATA D4, E2, 95, 98
    1800 DATA D5, E2, 95, 92
    1810 DATA D6, E2, 95, 93
    1820 DATA D7, E2, 95, AB
    1830 DATA D8, E2, 95, AA
    1840 DATA D9, E2, 94, 98
    1850 DATA DA, E2, 94, 8C
    1860 DATA DB, E2, 96, 88
    1870 DATA DC, E2, 96, 84
    1880 DATA DD, E2, 96, 8C
    1890 DATA DE, E2, 96, 90
    1900 DATA DF, E2, 96, 80
    1910 DATA E0, CE, B1, 0
    1920 DATA E1, CE, B2, 0
    1930 DATA E2, CE, 93, 0
    1940 DATA E3, CF, 80, 0
    1950 DATA E4, CE, A3, 0
    1960 DATA E5, CF, 83, 0
    1970 DATA E6, C2, B5, 0
    1980 DATA E7, CF, 84, 0
    1990 DATA E8, CE, A6, 0
    2000 DATA E9, CE, 98, 0
    2010 DATA EA, CE, A9, 0
    2020 DATA EB, CE, B4, 0
    2030 DATA EC, E2, 88, 9E
    2040 DATA ED, CF, 86, 0
    2050 DATA EE, CE, B5, 0
    2060 DATA EF, E2, 88, A9
    2070 DATA F0, E2, 89, A1
    2080 DATA F1, C2, B1, 0
    2090 DATA F2, E2, 89, A5
    2100 DATA F3, E2, 89, A4
    2110 DATA F4, E2, 8C, A0
    2120 DATA F5, E2, 8C, A1
    2130 DATA F6, C3, B7, 0
    2140 DATA F7, E2, 89, 88
    2150 DATA F8, C2, B0, 0
    2160 DATA F9, E2, 88, 99
    2170 DATA FA, C2, B7, 0
    2180 DATA FB, E2, 88, 9A
    2190 DATA FC, E2, 81, BF
    2200 DATA FD, C2, B2, 0
    2210 DATA FE, E2, 96, A0
    2220 DATA FF, C2, A0, 0

    (The only thing not explained is the "BOM" bytes, which I
    discovered in my dump. Only a few days later, while reading
    about UTF-8, did I see a passing reference to a "Byte Order
    Mark"... That's all I needed to know. As long as this program
    translates my WS4 files into files displayed and printed
    correctly by Internet programs, I am happy.)
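    For readers who would rather cross-check the DATA table than print
    224 characters, here is a minimal Python sketch. It is not the
    author's program: it ignores WordStar's internal codes entirely and
    relies on Python's built-in "cp437" codec (the Unicode mapping of the
    original IBM PC character set). It prefixes the same three BOM bytes
    (EF BB BF) that WS4UTF.BAS writes, and compares a few DATA entries
    with the codec. Of the entries sampled, only 7F (the "home" symbol,
    which the codec leaves as the DEL control code) and E1 (beta in the
    table, German sharp s in the codec) disagree. Incidentally, the
    table's values match code page 437 rather than 850: in code page 850,
    byte 9E is the multiplication sign, not the Peseta symbol.

    ```python
    import codecs

    # A few entries copied from the DATA lines above: IBM PC byte -> UTF-8 bytes.
    DATA_SAMPLES = {
        0x7F: "E2 8C 82",
        0x82: "C3 A9",
        0x9E: "E2 82 A7",
        0xC4: "E2 94 80",
        0xE1: "CE B2",
        0xEE: "CE B5",
    }


    def cp437_to_utf8(raw, bom=True):
        """Sketch: IBM PC bytes -> UTF-8 through Python's cp437 codec."""
        data = raw.decode("cp437").encode("utf-8")
        return codecs.BOM_UTF8 + data if bom else data   # BOM_UTF8 is EF BB BF


    # Bytes where the author's table and the codec disagree.
    differences = {
        code for code, expected in DATA_SAMPLES.items()
        if cp437_to_utf8(bytes([code]), bom=False) != bytes.fromhex(expected)
    }
    print(sorted(f"{code:02X}" for code in differences))   # ['7F', 'E1']
    ```

    Python's "utf-8-sig" codec would add the same BOM automatically when
    encoding.
    
    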


    Yours Sincerely,
    Mr Emmanuel Roche


    EOF


  7. Re: WANTED: Volunteer to Scan Old Programs

    HOPPER.WS4
    ----------

    Grace Hopper, in

    - "History of Programming Languages"
    Edited by Richard Wexelblat
    Academic Press, 1981

    (Retyped by Emmanuel ROCHE.)

    said, page 19: "... before 1956, the concentration was on
    meeting user needs, the concentration was not on the languages:
    it was on building the tools which later began to support the
    languages: the compilers, the generators, the supporting
    elements which now back up our definitions of languages.
    Languages came into use. People began to use them. There was
    another stage that had to occur. I think that, to some extent,
    we had paid little attention to it in the early days. And that
    was that the implementers interpreted things differently. This
    particularly occurred in the case of COBOL. The Navy became very
    much interested in trying to make all the COBOL compilers give
    the same answers, even though they were on different computers.
    And it was for that reason that I was called back and in the
    late 1960s at the Navy Department. A set of programs was built,
    which would validate a COBOL compiler. They would compare the
    execution of the compiler against the standard, and monitor the
    behavior of the actions of the compiler. It was the first set of
    programs that was built to try to use software to check
    software.

    I think this is an important element we have omitted. If we are
    going to have a language, it certainly should have the same
    answers on all the different computers. The set of COBOL
    validation routines was the first such set of routines to prove
    whether or not a compiler did in fact correctly implement the
    standard. I have the papers that were published on the "Federal
    COBOL Compiler Testing System". Recently, they have also
    produced a set of routines for testing FORTRAN.

    I think this is something we overlooked in our development of
    our languages. We overlooked the fact that we must see to it
    that those compilers properly implemented the language; that
    they did give common answers on different computers. A language
    is not very useful if you get different answers on different
    computers. At least it isn't to an organization like the Navy
    which, at any given moment, has at least one of every kind of
    computer, and we *would* like to get the same answers from the
    same program on the different computers."


    EOF

