collapseos/notes.txt

Collapse OS' Forth implementation notes

*** EXECUTION MODEL

After having read a line through readln, we want to interpret it. As a general
rule, we go like this:

1. read single word from line
2. Can we find the word in dict?
3. If yes, execute that word, goto 1
4. Is it a number?
5. If yes, push that number to PS, goto 1
6. Error: undefined word.
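
The steps above can be sketched in Python as follows. This is only a rough
model under stated assumptions, not the actual implementation: the dictionary
is a plain Python dict of callables, numbers are parsed with int(), and the
names interpret_line, WORDS, add and dup are made up for the illustration.

```python
def interpret_line(line, dictionary, ps):
    """Interpret one input line; `dictionary` maps names to callables taking PS."""
    for word in line.split():                 # step 1: read a single word
        if word in dictionary:                # steps 2-3: found in dict, execute
            dictionary[word](ps)
            continue
        try:
            ps.append(int(word))              # steps 4-5: a number, push it to PS
        except ValueError:
            # step 6: neither a known word nor a number
            raise LookupError(f"undefined word: {word}")


def add(ps):
    b, a = ps.pop(), ps.pop()
    ps.append(a + b)


def dup(ps):
    ps.append(ps[-1])


# Hypothetical two-word dictionary, just enough for the example.
WORDS = {"+": add, "DUP": dup}

stack = []
interpret_line("2 3 + DUP +", WORDS, stack)
print(stack)  # [10]
```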

*** EXECUTING A WORD

At its core, executing a word means pushing the wordref on PS and calling
EXECUTE. Then, we let the word do its thing. Some words are special, but most
of them are of the compiledWord type, and it is their execution that we
describe here.

First of all, at all times during execution, the Interpreter Pointer (IP)
points to the wordref we're executing next.

When we execute a compiledWord, the first thing we do is push IP to the Return
Stack (RS). Therefore, RS' top of stack will contain a wordref to execute next,
after we EXIT.

At the end of every compiledWord is an EXIT. This pops RS, sets IP to it, and
continues.
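
The interplay between IP, RS and EXIT can be modeled with a small Python
sketch. It is a simplification under assumptions: compiled words are Python
lists of wordrefs, native words are Python callables, and IP is an
(atom list, index) pair instead of a memory address. The names VM, exit_, lit,
SQUARE and MAIN are hypothetical.

```python
class VM:
    def __init__(self):
        self.ps = []      # Parameter Stack
        self.rs = []      # Return Stack
        self.ip = None    # (atom_list, index) of the wordref to execute next

    def run(self, word):
        """Execute `word` and everything it calls until we EXIT back out."""
        self.ip = None
        self.execute(word)
        while self.ip is not None:
            atoms, i = self.ip
            self.ip = (atoms, i + 1)   # IP now points to the following wordref
            self.execute(atoms[i])

    def execute(self, word):
        if callable(word):             # native word: just run it
            word(self)
        else:                          # compiled word: a list of wordrefs
            self.rs.append(self.ip)    # push IP to RS so EXIT can resume there
            self.ip = (word, 0)


def exit_(vm):
    vm.ip = vm.rs.pop()                # pop RS, set IP to it, and continue


def lit(n):
    return lambda vm: vm.ps.append(n)


def dup(vm):
    vm.ps.append(vm.ps[-1])


def mul(vm):
    b, a = vm.ps.pop(), vm.ps.pop()
    vm.ps.append(a * b)


SQUARE = [dup, mul, exit_]               # every compiled word ends with EXIT
MAIN = [lit(3), SQUARE, SQUARE, exit_]

vm = VM()
vm.run(MAIN)
print(vm.ps)  # [81]
```

When SQUARE is entered from MAIN, RS' top of stack holds the position in MAIN
right after the call, which is where execution resumes after EXIT.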

*** Stack management

The Parameter stack (PS) is maintained by SP and the Return stack (RS) is
maintained by IX. This allows us to generally use push and pop freely because PS
is the most frequently used. However, this causes a problem with routine calls:
because in Forth, the stack isn't balanced within each call, our return offset,
when placed by a CALL, messes everything up. This is one of the reasons why we
need stack management routines below. IX always points to RS' Top Of Stack (TOS).

The return stack contains "Interpreter pointers", that is, pointers to the
address of a word, as seen in a compiled list of words.

*** Dictionary

A dictionary entry has this structure:

- Xb name. Arbitrarily long number of characters (but can't be bigger than
  the input buffer, of course). Not null-terminated.
- 2b prev offset
- 1b size + IMMEDIATE flag
- 2b code pointer
- Parameter field (PF)

The prev offset is the number of bytes between the prev field and the previous
word's code pointer.

The size + flag byte indicates the size of the name field, with the 7th bit
being the IMMEDIATE flag.
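
To make the layout concrete, here is a Python sketch that builds two entries in
a byte buffer and walks the chain backwards. Several things here are
assumptions, not facts from the implementation: 2-byte fields are stored
little-endian, the "7th bit" is taken to be mask 0x80, a prev offset of 0 marks
the end of the chain, and the helper names add_entry and walk are made up.

```python
import struct

IMMEDIATE = 0x80  # assumption: "7th bit" counted from 0, i.e. the top bit


def add_entry(mem, name, prev_wordref, code, immediate=False):
    """Append an entry to `mem`; return its wordref (address of the code pointer)."""
    mem.extend(name.encode("ascii"))              # Xb name, not null-terminated
    prev_field = len(mem)
    # Bytes between this prev field and the previous word's code pointer.
    # Assumption: 0 means "no previous word".
    offset = 0 if prev_wordref is None else prev_field - prev_wordref
    mem.extend(struct.pack("<H", offset))         # 2b prev offset
    mem.append(len(name) | (IMMEDIATE if immediate else 0))  # 1b size + flag
    wordref = len(mem)
    mem.extend(struct.pack("<H", code))           # 2b code pointer
    return wordref


def walk(mem, wordref):
    """Yield (name, immediate) for each entry, newest first."""
    while True:
        size_flag = mem[wordref - 1]
        size = size_flag & 0x7F
        prev_field = wordref - 3
        name = mem[prev_field - size:prev_field].decode("ascii")
        yield name, bool(size_flag & IMMEDIATE)
        (offset,) = struct.unpack_from("<H", mem, prev_field)
        if offset == 0:
            return
        wordref = prev_field - offset


mem = bytearray()
first = add_entry(mem, "DUP", None, 0x17)
second = add_entry(mem, "IF", first, 0x0E, immediate=True)
print(list(walk(mem, second)))  # [('IF', True), ('DUP', False)]
```

Because the name sits before the fixed-size fields, finding it requires reading
the size byte first and then stepping backwards from the prev field.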

The code pointer points to "word routines". These routines expect to be called
with IY pointing to the PF. They themselves are expected to end by jumping to
the address at (IP). They will usually do so with "jp next".

That's for "regular" words (words that are part of the dict chain). There are
also "special words", for example NUMBER, LIT, FBR, that have a slightly
different structure. They're also a pointer to an executable, but as for the
other fields, the only one they have is the "flags" field.

*** System variables

Some core variables are referred to directly by their address in memory
throughout the code. The place where they live is configurable by the RAMSTART
constant in conf.fs, but their relative offset is not. In fact, they're mostly
referred to directly by their numerical offset, along with a comment indicating
what this offset refers to.

This system is a bit fragile because every time we change those offsets, we
have to be careful to adjust all system variable offsets, but thankfully,
there aren't many system variables. Here's a list of them:

RAMSTART   INITIAL_SP
+02        CURRENT
+04        HERE
+06        IP
+08        FLAGS
+0a        PARSEPTR
+0c        CINPTR
+0e        WORDBUF
+2e        BOOT C< PTR
+4e        INTJUMP
+51        CURRENTPTR
+53        readln's IN(
+55        readln's IN)
+57        readln's IN>
+59        z80a's ORG
+5b        z80a's L1
+5d        z80a's L2
+5f        z80a's L3
+61        z80a's L4
+63        z80a's L5
+65        z80a's L6
+67        FUTURE USES
+70        DRIVERS
+80        RAMEND

INITIAL_SP holds the initial Stack Pointer value so that we know where to reset
it on ABORT.

CURRENT points to the last dict entry.

HERE points to current write offset.

IP is the Interpreter Pointer.

FLAGS holds global flags. Only used for prompt output control for now.

PARSEPTR holds the address of the routine called on (parse).

CINPTR holds the address of the routine called on C<.

WORDBUF is the buffer used by WORD.

BOOT C< PTR is used when Forth boots from in-memory source. See "Initialization
sequence" below.

INTJUMP is the target of all RST offsets (well, not *all* at this moment, I
still have to free those slots...) in boot binaries: they are made to jump to
this address. If you use one of those slots for an interrupt, write a jump to
the appropriate offset in that RAM location.

CURRENTPTR points to current CURRENT. The Forth CURRENT word doesn't return
RAM+2 directly, but rather the value at this address. Most of the time, it
points to RAM+2, but sometimes, when maintaining alternative dicts (during
cross compilation for example), it can point elsewhere.

FUTURE USES section is unused for now.

DRIVERS section is reserved for recipe-specific drivers. Here is a list of
known usages:

* 0x70-0x78: ACIA buffer pointers in RC2014 recipes.
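
As an illustration of how these offsets are used, the following Python sketch
computes absolute addresses from RAMSTART and models the CURRENTPTR
indirection. The RAMSTART value is hypothetical (the real one comes from
conf.fs), little-endian 16-bit storage is assumed, only a subset of the table
is listed, and the helper names addr, peek16, poke16 and current are made up.

```python
import struct

RAMSTART = 0x8000  # hypothetical value, for illustration only

# Subset of the offset table above.
SYSVARS = {
    "INITIAL_SP": 0x00, "CURRENT": 0x02, "HERE": 0x04, "IP": 0x06,
    "FLAGS": 0x08, "PARSEPTR": 0x0A, "CINPTR": 0x0C, "WORDBUF": 0x0E,
    "CURRENTPTR": 0x51,
}


def addr(name):
    """Absolute address of a system variable."""
    return RAMSTART + SYSVARS[name]


def peek16(mem, address):
    return struct.unpack_from("<H", mem, address)[0]


def poke16(mem, address, value):
    struct.pack_into("<H", mem, address, value)


def current(mem):
    """Model of the Forth CURRENT word: return the value stored at CURRENTPTR."""
    return peek16(mem, addr("CURRENTPTR"))


mem = bytearray(0x10000)
poke16(mem, addr("CURRENTPTR"), addr("CURRENT"))   # the usual case: RAM+2
print(hex(current(mem)))  # 0x8002
poke16(mem, addr("CURRENTPTR"), 0x9000)            # hypothetical alternative dict
print(hex(current(mem)))  # 0x9000
```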

*** Word routines

This is the description of all the word routines you can encounter in this
Forth implementation. That is, a wordref will always point to a memory offset
containing one of these numbers.

0x17: nativeWord. This word's PFA contains native binary code and is jumped to
directly.

0x0e: compiledWord. This word's PFA contains an atom list and its execution is
described in "EXECUTING A WORD" above.

0x0b: cellWord. This word is usually followed by a 2-byte value in its PFA.
Upon execution, the *address* of the PFA is pushed to PS.

0x2b: doesWord. This word is created by "DOES>" and is followed by a 2-byte
value as well as the address where "DOES>" was compiled. At that address is an
atom list exactly like in a compiled word. Upon execution, after having pushed
its cell addr to PS, it executes its reference exactly like a compiledWord.

0x20: numberWord. No word is actually compiled with this routine, but atoms are.
Atoms with a reference to the numberWord routine are followed, *in the atom
list*, by a 2-byte number. Upon execution, that number is fetched and IP is
advanced by an extra 2 bytes.

0x24: addrWord. Exactly like a numberWord, except that it is treated
differently by meta-tools.

0x22: litWord. Similar to a numberWord, except that instead of being followed
by a 2-byte number, it is followed by a null-terminated string. Upon execution,
the address of that null-terminated string is pushed on PS and IP is
advanced to the address following the null.
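
The way numberWord and litWord consume data placed inline in the atom list can
be sketched in Python. This models only those two routines, and it rests on
assumptions: atoms and routine numbers are 2-byte little-endian values, and the
memory layout in the example (routine cells at addresses 0 and 2, atom list at
4) is invented for the illustration, as is the function name step.

```python
import struct

NUMBER, LIT = 0x20, 0x22   # routine numbers from the list above


def step(mem, ip, ps):
    """Execute the atom at `ip` and return the new IP."""
    (wordref,) = struct.unpack_from("<H", mem, ip)
    ip += 2                                   # IP moves past the 2-byte atom
    (routine,) = struct.unpack_from("<H", mem, wordref)
    if routine == NUMBER:
        ps.append(struct.unpack_from("<H", mem, ip)[0])
        return ip + 2                         # skip the inline 2-byte number
    if routine == LIT:
        ps.append(ip)                         # address of the inline string
        return mem.index(0, ip) + 1           # resume right after the null
    raise NotImplementedError(hex(routine))


mem = bytearray(32)
struct.pack_into("<HH", mem, 0, NUMBER, LIT)  # hypothetical routine cells at 0, 2
struct.pack_into("<HHH", mem, 4, 0, 42, 2)    # atoms: number 42, then a literal
mem[10:13] = b"hi\x00"                        # the literal's inline string

ps = []
ip = step(mem, 4, ps)     # numberWord: pushes 42
ip = step(mem, ip, ps)    # litWord: pushes the string's address
print(ps, ip)  # [42, 10] 13
```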

*** Initialization sequence

On boot, we jump to the "main" routine in boot.fs which does very few things.
It sets up the SP register, sets CURRENT and HERE to LATEST (saved in stable
ABI), then looks for the BOOT word and calls it.

In a normal system, BOOT is in icore and does a few things:

1. Find "(parse)" and set "(parse*)" to it.
2. Find "(c<)" and set CINPTR to it (what C< calls).
3. Write LATEST in SYSTEM SCRATCHPAD (see below).
4. Find "INIT". If found, execute it. Otherwise, execute "INTERPRET".

On a bare system (only boot+icore), this sequence will result in "(parse)"
reading only decimals and "(c<)" reading characters from memory starting from
CURRENT (this is why we put CURRENT in SYSTEM SCRATCHPAD: it tracks the current
position).

This means that you can put initialization code in source form right into your
binary, right after your last compiled dict entry and it's going to be executed
as such until you set a new (c<).
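
The idea of reading source straight from memory can be pictured with this
Python sketch. It is a loose model, not the real "(c<)": the class
MemoryReader, the function read_word and the 8 zero bytes standing in for the
compiled dictionary are all hypothetical.

```python
class MemoryReader:
    """Stand-in for the bare system's (c<): one character from memory per call."""

    def __init__(self, mem, start):
        self.mem = mem
        self.ptr = start          # plays the role of the position kept in RAM

    def read(self):
        ch = self.mem[self.ptr]
        self.ptr += 1             # the pointer tracks the current position
        return chr(ch)


def read_word(reader):
    """Skip leading whitespace, then collect characters up to the next whitespace."""
    ch = reader.read()
    while ch.isspace():
        ch = reader.read()
    word = ""
    while not ch.isspace():
        word += ch
        ch = reader.read()
    return word


# 8 placeholder bytes for the compiled dict, followed by source text.
image = bytes(8) + b"42 EMIT "
reader = MemoryReader(image, 8)   # start right after the last dict entry
print(read_word(reader), read_word(reader))  # 42 EMIT
```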

Note that there is no EMIT in a bare system. You have to take care of supplying
one before you load core.fs and its higher levels.

Also note that this initialization code is fighting for space with HERE: new
entries to the dict will overwrite that code! Also, because we're bare-bones, we
can't have comments. This leads to peculiar code in this area. If you see weird
whitespace usage, it's probably because leaving that whitespace out would
result in dict entry creation overwriting the code before it has the chance to
be interpreted.