TODATA.TXT
----------

"To add or to remove a line number, that is the subject."
by Emmanuel ROCHE

Tonight, there is a storm where I live on the Atlantic Coast, so
it is difficult to sleep.

This week-end, while trying to put some order into my stuff, I
encountered the listing of an old BASIC program to transfer
files between computers. I had a look. Since the programmer was
using a line editor, he was expecting, of course, that the
recipient would send him, too, files with line numbers at the
beginning of each line. This was perfectly normal, back then.

However, in case the sender had used ED or WordStar to write his
message, there was an option, in this old comm program in BASIC,
to add line numbers in front of each line. This got me
thinking. This is almost exactly what I do from time to time,
and I have not found a nice, simple way of doing it. (PIP has an
option (in fact, 2) to add line numbers to a file, but they are
not of the form that I need.)

For example, a few months ago, when explaining the chain of
thoughts that went into the creation of "Computer-Aided Family
Trees (CAFT)", I wrote: "I used one of my old BASIC programs,
TODATA.BAS, to add a number and a DATA statement in front of
each line. (Since this program is ugly, I decided not to show
it to you. It is old, and should be rewritten, but I only use
it from time to time, so have never invested the time to improve
it. As long as it works without error, that's the most
important thing, before being readable or pretty.)"

Is it, finally, the opportunity to improve it? Time will tell.
Anyway, I have nothing to read, and the storm is noisy.
Programming will concentrate my thoughts so much that I will no
longer hear it.

The old comm program in BASIC used the STR$ function. The least
that can be said is that I very rarely use this function. It
would be interesting to check, among the two dozen BASIC
programs that I have published on the comp.os.cpm Newsgroup, how
many times I have used it.

According to my manual, "The value of the numeric-expression is
converted to a decimal string in the same form as used in a
PRINT command. Note that positive values yield a string with a
leading space, whereas negative values have a leading minus
sign."

It is precisely this leading space that has often bothered me.
For instance, in a dump program like the recently-published
ENTCOM.BAS, you need to print in a nice column the (hex)
address, then row after row of (hex) bytes, neatly arranged
vertically, while BASIC, it is said, "works in decimal"...

Well, this is not exactly true because, right from the start,
the "math packages" found inside BASIC interpreters separated
them into several kinds: those able to compute only byte values
(that is to say: only up to 255, like the original Tiny BASIC!);
most, which used "integers" (that is to say: one sign bit and a
15-bit number, enabling them to compute up to 32,767; beyond
that, for example if you wanted to reach the top of the memory,
you had to do some strange math operation); some that used
fixed-point maths; some that used BCD; and finally one
well-known BASIC interpreter that used not one, but two
different formats of floating-point numbers. (Since then, some
BASIC interpreters, while still being compatible with the
"standard", also use UNT values internally; that is to say:
"Unsigned iNTeger", or 16-bit values: no more strange math
operation to access the full 64K of memory of an 8-bit system.)

So, in short, while BASIC appears to input and output decimal
numbers, internally it uses at least 4 (very) different formats
of those numbers, with a number of subroutines to convert from
one format to the next, and the problems of rounding and
truncation that can occur while doing those conversions.

We are only concerned with the format used to output numbers,
but the above explanation was made to show that, internally,
things can be pretty complex. To give you an idea, the "Falconer
Floating-Point Package" that I published on the comp.os.cpm
Newsgroup takes 2.5K... 2.5 KiloBytes: that is to say: as big as
a fully working Tiny BASIC interpreter! All that just to print
the "correct" value on screen...

Anyway, let us go back to the problem: displaying a number
on screen. According to the manual, "Positive numbers are
preceded by a space; negative numbers by a minus sign. All
numbers are followed by a space."

Well, a quick session with BASIC in command mode should confirm
this:

? "|"8"|"
| 8 |

Yes, indeed, BASIC adds a space BEFORE and AFTER the number.

A long, long time ago, when I was printing a hex number, I first
checked if it was above 0FH (that is to say: was printed with 2
hex digits). Else, I first printed a leading "0" before the
single hex digit, since the famous BASIC had no option
specifying the width of printed hex numbers.

Later, I found the PRINT USING command, but settled on a
variant (that must be portable, since I have used it ever
since). For example, to print an address:

PRINT RIGHT$ ("000" + HEX$ (adr), 4) "| " etc

At first, it looks strange, but the hex address is correctly
displayed using 4 hex digits.
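
The two approaches described above can be put side by side. This
is only a minimal sketch, assuming an MBASIC-compatible
interpreter providing HEX$ and RIGHT$; the variable names (b and
adr) and the sample values are chosen purely for illustration.

```basic
10 REM Two ways of printing zero-padded hex values (sketch)
20 b = 10 : adr = 256
30 REM Old way: print a leading "0" when one hex digit is enough
40 IF b < 16 THEN PRINT "0";
50 PRINT HEX$ (b)
60 REM RIGHT$ way: pad on the left, then keep the last 4 digits
70 PRINT RIGHT$ ("000" + HEX$ (adr), 4)
80 END
```

With those sample values, this should display 0A on one line,
then 0100 on the next. The RIGHT$ form needs no test at all,
which is probably why it is so convenient.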

Finally, I encountered Mallard BASIC, probably... No! Certainly
the best MBASIC-compatible interpreter ever written. It is a
pleasure to use it. Its line editor, being designed for a
screen, is much simpler than MBASIC's line editor, which was
designed for an ASR-33 Teletype, but was never modified, even
when sold for the IBM Clown...

Regarding HEX$, Mallard BASIC has a very nice option:

PRINT HEX$ (adr, 4)

will print the hex address using 4 digits. (When listing memory
above 9FFFH (like the BDOS), I use 5 digits, to get a
PL/M-compatible hex address, which I end, of course, with an
"H".)

Now, let us see the action of STR$:

? "|"STR$(8)"|"
| 8|

Ha? There is a change: no space after the number.

Just to be sure, I rechecked the subroutine used in the old comm
program in BASIC: it output a space before each line number.
Apparently, this old BASIC had no trouble reading lines starting
with spaces. However, I find them quite unpleasant: BASIC and
ED, and all the line editors that I know, display the line
number starting at column one, without any leading space. So,
how to print this number without this leading space?

Suddenly, I had a flash of insight: STR$ converts a number into
a string! So, instead of adding PRINT USING or PRINT RIGHT$
commands, why not treat the number as a string, and print it not
from the first character (the space) but from the second (the
first digit of the number)?

Of course, I knew the answer, but I checked the manual
nevertheless. The most often used string functions are LEFT$,
RIGHT$, and MID$. Since we have no idea how long a string will
be (for instance, when getting a line from a file), we cannot
use LEFT$ or RIGHT$, which both need a character count. Only
MID$ remains. Its form is: MID$ (string, start-position,
optional-sub-string-length). In our case, we don't need the
optional sub-string length (without it, MID$ returns everything
up to the end of the string), and we already have a string (our
number): only start-position remains.

"The start-position is an integer-expression which specifies the
character in string which is to be the first character of the
sub-string. The integer-expression must yield a value in the
range 1 to 255." That should be enough. My widest daisy-wheel
printer is 136-columns wide, and I only know of one printer (for
a Mainframe computer) that was 256-columns wide. Later, it is
also written: "The first character of the original string is at
position 1." That is to say: the leading space is at position 1,
so the first digit of the number is at position 2... Rush to
the BASIC interpreter:

? "|" MID$ (STR$ (8), 2) "|"
|8|

Wahoo! No more leading space!

? "|" MID$ (STR$ (88), 2) "|"
|88|

? "|" MID$ (STR$ (888), 2) "|"
|888|

(etc.)

It works!
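
One caveat: according to the manual quoted above, a negative
value has a leading minus sign instead of a space, so starting
at position 2 would drop the sign. Line numbers are never
negative, so this does not matter here, but for general use the
trick could be wrapped in a user-defined function. The following
is only a sketch: the name FNnum$ is hypothetical, and it
assumes that a true comparison yields -1, as in MBASIC.

```basic
10 REM Sketch: number to string without the leading space
20 REM Assumes that a true comparison yields -1, so that the
30 REM start position is 2 for n >= 0, and 1 for n < 0
40 DEF FNnum$(n) = MID$ (STR$ (n), 2 + (n < 0))
50 PRINT "|" FNnum$(8) "|"
60 PRINT "|" FNnum$(-8) "|"
70 END
```

Under that assumption, this should display |8| and then |-8|.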

So, I quickly patched TODATA.BAS and decided to make a good
test. I copied my biggest ASM file into the RAMdisk, then
changed the default input filetype to be ASM (instead of ASC),
then typed "run":

TODATA> Enter ASC File Name: ? asm86

ASM86.DAT is 19582-lines long.

Using WordStar 4, I then opened the DAT file, then jumped to the
last line. No doubt about it: the DAT file was really 19582
lines long!

One last thing. Line editors used to be standard on computers.
That's why BASIC has one, since screens did not exist when it
was created (in 1964, long before microcomputers, when the
ASR-33 Teletype was the standard terminal, hence the word
"PRINT" to... really, physically print characters on the paper
roll of the Teletype). The only difference between a line
editor and a full-screen file editor is that the line editor
puts a line number at the beginning of each line. Once you know
this, line editors are perfectly usable. But, what size of file
can they edit? To answer this question, I loaded Good Old
Mallard BASIC Version 1.29, as originally discovered on the
Amstrad PCW8256:

M>basic

Mallard-80 BASIC with Jetsam Version 1.29
(c) Copyright 1984 Locomotive Software Ltd
All rights reserved

30061 free bytes

Ok

I then tried to load the above DAT file, knowing full well that
it was too big:

load"asm86.dat
Memory full
Ok

and saved the portion that it had been able to load. It was 30K
big, and the last DATA line number was 11670: that is to say:
this 8-bit BASIC had loaded the first 1167 lines of an ASM file
which was chosen at random to be typical of ASM files.

Now, one page, in the USA, happens to be 55 lines tall, so 1167
/ 55 = about 21 pages. Now, 21 pages (and 30K) may not seem much
today, but, on an 8-bit CP/M system, this also happens to be
the size of the source code of Palo Alto Tiny BASIC... That is
to say: using BASIC as an editor, you can write the source code
of a COMmand file as powerful as a 3K Tiny BASIC. (Generally
speaking, a COM file is 10 to 20 times smaller than its ASM
file, depending upon the quantity of comments.)

(Of course, if 21 pages are not enough, you can use INCLUDE or
MACLIB pseudo-ops in your ASM file. That's why so many old
programs of the CP/M User Group are split into several files.)

When using BASIC to create source code files for an assembler,
you then need a program to remove the line numbers and REM (or '
character). Assembly language programs have (almost universally)
the ASM file type. It would be better if our assembler
understood several file types, thus enabling us to write source
code with BASIC, ED, or WordStar. Unfortunately, this is not the
case.

For historical reasons, ASM and MAC accept line numbers and "*"
in column one as indicating a comment -- they were used by the
Processor Technology "Software Package #1" assembler -- but
they only accept an ASM file type. Of course, if we wrote an
8080 assembler in BASIC, it would be simpler if it could
assemble source code written for ASM (for compatibility) and
files with line numbers created with BASIC. Let us call this
hypothetical "BASIC assembler" BSM.

To edit a file with BASIC, it needs to have line numbers. To
assemble a file with ASM, it needs to have the line numbers
removed. So, we need at least 2 small utilities. Let us call
them TOBSM, which adds line numbers, and TOASM, which removes
them.

10 REM TOBSM.BAS by Emmanuel ROCHE
20 :
30 ' BSM line = 1(2345)0 + " '" + ASM line
40 :
50 PRINT
60 INPUT "TOBSM> Enter ASM File Name: " ; file$
70 PRINT
80 file1$ = file$ + ".ASM"
90 OPEN "I", #1, file1$
100 file2$ = file$ + ".BSM"
110 OPEN "O", #2, file2$
120 li = 0
130 ' lin = 0
140 WHILE NOT EOF (1)
150 li = li + 1
160 LINE INPUT #1, line$
170 ' PRINT MID$ (STR$ (li), 2) "0 '" LEFT$ (line$, 68)
180 PRINT #2, MID$ (STR$ (li), 2) "0 '" line$
190 ' lin = lin + 1
200 ' IF lin = 23 THEN lin = 0 : PRINT : PRINT "Press RETURN to Continue " ; : WHILE INKEY$ = "" : WEND : PRINT
210 WEND
220 CLOSE
230 ' PRINT
240 PRINT UPPER$ (file2$) " is" STR$ (li) "-lines long."
250 PRINT
260 END


10 REM TOASM.BAS by Emmanuel ROCHE
20 :
30 ' BSM line = 1(2345)0 + " '" + ASM line
40 :
50 PRINT
60 INPUT "TOASM> Enter BSM File Name: " ; file$
70 PRINT
80 file1$ = file$ + ".BSM"
90 OPEN "I", #1, file1$
100 file2$ = file$ + ".ASM"
110 OPEN "O", #2, file2$
120 li = 0
130 ' lin = 0
140 WHILE NOT EOF (1)
150 LINE INPUT #1, line$
160 ' PRINT LEFT$ (line$, 78)
170 li = li + 1
180 ' lin = lin + 1
190 ptr = INSTR (line$, " '")
200 line2$ = MID$ (line$, ptr+2)
210 ' PRINT LEFT$ (line2$, 78)
220 PRINT #2, line2$
230 ' IF lin = 23 THEN lin = 0 : PRINT : PRINT "Press RETURN to Continue " ; : WHILE INKEY$ = "" : WEND : PRINT
240 WEND
250 CLOSE
260 ' PRINT
270 PRINT UPPER$ (file2$) " is" STR$ (li) "-lines long."
280 PRINT
290 END
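
To see what these 2 programs do to one line, here is a small
sketch that applies the TOBSM expression, then the TOASM
expressions, to a single string. The sample ASM line and the
variable names src$, bsm$ and back$ are made up for the
occasion; only the MID$, STR$ and INSTR expressions come from
the listings above.

```basic
10 REM Sketch: what TOBSM and TOASM do to a single line
20 src$ = "        ORG     0100H"
30 li = 1
40 REM TOBSM side: line count, then "0 '", then the text
50 bsm$ = MID$ (STR$ (li), 2) + "0 '" + src$
60 PRINT bsm$
70 REM TOASM side: the first " '" always follows the number
80 ptr = INSTR (bsm$, " '")
90 back$ = MID$ (bsm$, ptr + 2)
100 IF back$ = src$ THEN PRINT "Round trip OK"
110 END
```

This should display the sample line preceded by 10 ' and then
"Round trip OK". Since the line number contains only digits, the
first " '" found by INSTR is always the one added by TOBSM, even
if the ASM line itself contains quotes.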

The proper functioning of these 2 BASIC programs was checked by
comparing the output file with the original file (ASM --> BSM
--> BIS). My COMPARE.BAS program found them (ASM and BIS) to be
identical, no matter how long the files were.
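
For readers without such a tool, a line-by-line comparison can
be sketched in a few lines of BASIC. This is NOT the COMPARE.BAS
program mentioned above (which is not shown here), just a
minimal illustration assuming MBASIC-style sequential files; all
the names are hypothetical.

```basic
10 REM Sketch of a line-by-line file comparison
20 INPUT "First file: " ; a$
30 INPUT "Second file: " ; b$
40 OPEN "I", #1, a$
50 OPEN "I", #2, b$
60 n = 0 : same = -1
70 WHILE same AND NOT EOF (1) AND NOT EOF (2)
80 LINE INPUT #1, x$
90 LINE INPUT #2, y$
100 n = n + 1
110 IF x$ <> y$ THEN same = 0 : PRINT "Difference at line" n
120 WEND
130 REM Same lines so far, but one file may have lines left
140 IF same AND (EOF (1) <> EOF (2)) THEN same = 0 : PRINT "Different lengths"
150 CLOSE
160 IF same THEN PRINT "The 2 files are identical."
170 END
```

It stops at the first difference, which is enough to check a
round trip like the one above.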

By the way, for the record, the Amstrad PCW8256 was supplied
with only one BASIC program: RPED (later, I was told that this
is the acronym of "Roland Perry's EDitor"). This BASIC program
defined a 200-line array, where the user could load/save any
ASCII file. 200 lines is roughly 4 pages, while we have already
seen that 1167 lines are 21 pages...

(And TOASM is as long as RPED... but RPED was "full-screen",
while TOASM is line-oriented. I have never seen any program
like TOASM published in the English magazines catering to the
PCW, but I only read a few from time to time, since I was not a
subscriber to them.)

Ha! By the way, here is TODATA, which provided the impetus for
all those thoughts.

10 REM TODATA.BAS by Emmanuel ROCHE
20 :
30 ' DATA line = 1(2345)0 + " DATA " + line read
40 :
50 PRINT
60 INPUT "TODATA> Enter ASC File Name: " ; file$
70 PRINT
80 file1$ = file$ + ".ASC"
90 OPEN "I", #1, file1$
100 file2$ = file$ + ".DAT"
110 OPEN "O", #2, file2$
120 li = 0
130 ' lin = 0
140 WHILE NOT EOF (1)
150 li = li + 1
160 LINE INPUT #1, line$
170 ' PRINT MID$ (STR$ (li), 2) "0 DATA " LEFT$ (line$, 68)
180 PRINT #2, MID$ (STR$ (li), 2) "0 DATA " line$
190 ' lin = lin + 1
200 ' IF lin = 23 THEN lin = 0 : PRINT : PRINT "Press RETURN to Continue " ; : WHILE INKEY$ = "" : WEND : PRINT
210 WEND
220 CLOSE
230 ' PRINT
240 PRINT UPPER$ (file2$) " is" STR$ (li) "-lines long."
250 PRINT
260 END

system

A>That's all, folks! (Time to go to bed...)

Yours Sincerely,
"French Luser"


EOF