Collapse OS' Forth implementation notes

*** EXECUTION MODEL

After having read a line through readln, we want to interpret it. As a general
rule, we go like this:

1. Read single word from line
2. Can we find the word in dict?
3. If yes, execute that word, goto 1
4. Is it a number?
5. If yes, push that number to PS, goto 1
6. Error: undefined word.
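
The six steps above can be sketched in a few lines of Python. This is only a
model of the control flow, not the actual implementation: interpret_line, WORDS
and the use of a Python dict and list for the dictionary and PS are all
hypothetical stand-ins, and int() is more permissive than the real number
parser.

```python
def interpret_line(line, dictionary, ps):
    """Interpret one line of input, following the six steps in order."""
    for word in line.split():          # 1. read single word from line
        if word in dictionary:         # 2. can we find the word in dict?
            dictionary[word](ps)       # 3. if yes, execute it, goto 1
            continue
        try:
            number = int(word)         # 4. is it a number?
        except ValueError:
            # 6. neither a word nor a number
            raise LookupError("undefined word: " + word)
        ps.append(number)              # 5. push that number to PS, goto 1
    return ps


# Hypothetical dictionary: each word is a function operating on PS.
WORDS = {
    "+": lambda ps: ps.append(ps.pop() + ps.pop()),
    "*": lambda ps: ps.append(ps.pop() * ps.pop()),
    "DUP": lambda ps: ps.append(ps[-1]),
}
```

Note that the dictionary lookup comes before the number check, so a word named
like a number would shadow that number.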

*** EXECUTING A WORD

At its core, executing a word is pushing the wordref on PS and calling EXECUTE.
Then, we let the word do its thing. Some words are special, but most of them
are of the compiledWord type, and it is their execution that we describe here.

First of all, at all times during execution, the Interpreter Pointer (IP) points
to the wordref we're executing next.

When we execute a compiledWord, the first thing we do is push IP to the Return
Stack (RS). Therefore, RS' top of stack will contain a wordref to execute next,
after we EXIT.

At the end of every compiledWord is an EXIT. This pops RS, sets IP to it, and
continues.
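
The interplay between IP, RS and EXIT can be modeled as below. This is a sketch
under simplifying assumptions: words are Python tuples rather than memory
addresses, IP is an (atom list, index) pair, and execute() takes the wordref
as an argument instead of popping it from PS. The names Machine, native,
compiled, SQUARE and FOURTH are made up for the example.

```python
EXIT = ("exit", None)  # sentinel atom ending every compiled word


def native(fn):
    return ("native", fn)


def compiled(*atoms):
    return ("compiled", list(atoms) + [EXIT])


class Machine:
    def __init__(self):
        self.ps = []    # Parameter Stack
        self.rs = []    # Return Stack
        self.ip = None  # (atom list, index) of the wordref to execute next


def execute(m, wordref):
    m.ip = None  # at top level there is nothing to come back to
    current = wordref
    while current is not None:
        kind, payload = current
        if kind == "native":
            payload(m)
        elif kind == "compiled":
            m.rs.append(m.ip)    # push IP to RS
            m.ip = (payload, 0)  # continue with the word's first atom
        else:  # EXIT
            m.ip = m.rs.pop()    # pop RS, set IP to it
        # "next": fetch the wordref at IP, then advance IP past it
        if m.ip is None:
            current = None
        else:
            atoms, index = m.ip
            current = atoms[index]
            m.ip = (atoms, index + 1)


DUP = native(lambda m: m.ps.append(m.ps[-1]))
MUL = native(lambda m: m.ps.append(m.ps.pop() * m.ps.pop()))
SQUARE = compiled(DUP, MUL)
FOURTH = compiled(SQUARE, SQUARE)
```

Running FOURTH with 3 on PS goes two levels deep into RS and comes back with RS
empty, which is the balance that EXIT is there to guarantee.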

*** Stack management

The Parameter stack (PS) is maintained by SP and the Return stack (RS) is
maintained by IX. This allows us to generally use push and pop freely because PS
is the most frequently used. However, this causes a problem with routine calls:
because in Forth, the stack isn't balanced within each call, our return offset,
when placed by a CALL, messes everything up. This is one of the reasons why we
need stack management routines below. IX always points to RS' Top Of Stack (TOS).

This return stack contains "Interpreter pointers", that is, pointers to the
address of a word, as seen in a compiled list of words.

*** Dictionary

A dictionary entry has this structure:

- Xb name. Arbitrarily long sequence of characters (but can't be bigger than
  the input buffer, of course). Not null-terminated.
- 2b prev offset
- 1b size + IMMEDIATE flag
- 2b code pointer
- Parameter field (PF)

The prev offset is the number of bytes between the prev field and the previous
word's code pointer.

The size + flag byte indicates the size of the name field, with the 7th bit
being the IMMEDIATE flag.
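
To make the layout concrete, here is a sketch that builds entries into a
bytearray and walks the chain backwards through the prev offsets. Several
things are assumptions, not facts from these notes: a wordref is taken to be
the address of the code pointer, 16-bit fields are little-endian, "7th bit" is
read as bit 7 (mask 0x80), and a prev offset of 0 marks the end of the chain.
add_entry, read_entry and find are hypothetical helpers.

```python
import struct

IMMEDIATE = 0x80  # assumption: "7th bit" means bit 7


def add_entry(mem, name, code_ptr, pf=b"", prev_wordref=None, immediate=False):
    """Append an entry to mem; return its wordref (address of code pointer)."""
    raw = name.encode("ascii")
    mem.extend(raw)                                  # Xb name
    prev_field = len(mem)
    # Distance from the prev field back to the previous word's code pointer.
    offset = 0 if prev_wordref is None else prev_field - prev_wordref
    mem.extend(struct.pack("<H", offset))            # 2b prev offset
    mem.append(len(raw) | (IMMEDIATE if immediate else 0))  # 1b size + flag
    wordref = len(mem)
    mem.extend(struct.pack("<H", code_ptr))          # 2b code pointer
    mem.extend(pf)                                   # parameter field
    return wordref


def read_entry(mem, wordref):
    """Return (name, immediate, previous wordref or None)."""
    size_flags = mem[wordref - 1]
    length = size_flags & 0x7F
    prev_field = wordref - 3
    (offset,) = struct.unpack_from("<H", mem, prev_field)
    name = bytes(mem[prev_field - length:prev_field]).decode("ascii")
    prev = None if offset == 0 else prev_field - offset
    return name, bool(size_flags & IMMEDIATE), prev


def find(mem, current, name):
    """Walk the chain from the last entry; return the wordref or None."""
    wordref = current
    while wordref is not None:
        entry_name, _, prev = read_entry(mem, wordref)
        if entry_name == name:
            return wordref
        wordref = prev
    return None
```

Because the name sits before the fixed-size fields and carries no terminator,
the size byte is the only way to locate where the name starts.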

The code pointer points to "word routines". These routines expect to be called
with IY pointing to the PF. They themselves are expected to end by jumping to
the address at (IP). They will usually do so with "jp next".

That's for "regular" words (words that are part of the dict chain). There are
also "special words", for example NUMBER, LIT, FBR, that have a slightly
different structure. They are also pointers to an executable routine, but of
the other fields, the only one they have is the "flags" field.

*** System variables

There are some core variables in the system that are referred to directly
by their address in memory throughout the code. The place where they live is
configurable by the RAMSTART constant in conf.fs, but their relative offset is
not. In fact, they're mostly referred to directly as their numerical offset
along with a comment indicating what this offset refers to.

This system is a bit fragile because every time we change those offsets, we
have to be careful to adjust all system variable offsets, but thankfully,
there aren't many system variables. Here's a list of them:

RAMSTART  INITIAL_SP
+02       CURRENT
+04       HERE
+06       IP
+08       FLAGS
+0a       PARSEPTR
+0c       CINPTR
+0e       WORDBUF
+2e       BOOT C< PTR
+4e       INTJUMP
+51       CURRENTPTR
+53       readln's variables
+55       adev's variables
+57       blk's variables
+59       z80a's variables
+5b       FUTURE USES
+70       DRIVERS
+80       RAMEND
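
As an illustration of how these fixed offsets turn into addresses, the table
can be written down as data. The offsets are copied from the list above; the
RAMSTART value used in the usage line and the helper names sysvar_addr and
region_sizes are hypothetical.

```python
# Offsets relative to RAMSTART, in the order of the table above.
SYSVAR_OFFSETS = {
    "INITIAL_SP": 0x00,
    "CURRENT": 0x02,
    "HERE": 0x04,
    "IP": 0x06,
    "FLAGS": 0x08,
    "PARSEPTR": 0x0A,
    "CINPTR": 0x0C,
    "WORDBUF": 0x0E,
    "BOOT C< PTR": 0x2E,
    "INTJUMP": 0x4E,
    "CURRENTPTR": 0x51,
    "readln": 0x53,
    "adev": 0x55,
    "blk": 0x57,
    "z80a": 0x59,
    "FUTURE USES": 0x5B,
    "DRIVERS": 0x70,
    "RAMEND": 0x80,
}


def sysvar_addr(ramstart, name):
    """Absolute address of a system variable for a given RAMSTART."""
    return ramstart + SYSVAR_OFFSETS[name]


def region_sizes():
    """Bytes available to each slot, i.e. the gap up to the next offset."""
    names = list(SYSVAR_OFFSETS)
    return {
        a: SYSVAR_OFFSETS[b] - SYSVAR_OFFSETS[a]
        for a, b in zip(names, names[1:])
    }
```

For example, with a made-up RAMSTART of 0x8000, sysvar_addr(0x8000, "HERE")
gives 0x8004, and region_sizes() shows that WORDBUF spans 0x20 bytes while
INTJUMP spans 3.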

INITIAL_SP holds the initial Stack Pointer value so that we know where to reset
it on ABORT.

CURRENT points to the last dict entry.

HERE points to the current write offset.

IP is the Interpreter Pointer.

FLAGS holds global flags. Only used for prompt output control for now.

PARSEPTR holds the address of the routine called on (parse).

CINPTR holds the address of the routine called on C<.

WORDBUF is the buffer used by WORD.

BOOT C< PTR is used when Forth boots from in-memory source. See "Initialization
sequence" below.

INTJUMP: All RST offsets (well, not *all* at this moment, I still have to free
those slots...) in boot binaries are made to jump to this address. If you use
one of those slots for an interrupt, write a jump to the appropriate offset in
that RAM location.

CURRENTPTR points to the current CURRENT. The Forth CURRENT word doesn't return
RAM+2 directly, but rather the value at this address. Most of the time, it
points to RAM+2, but sometimes, when maintaining alternative dicts (during
cross compilation for example), it can point elsewhere.
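
The extra level of indirection behind CURRENTPTR can be shown with a tiny
memory model. Everything concrete here is assumed for the example: the
RAMSTART value, little-endian 16-bit cells, the alternate dict address 0x9000,
and the helper names.

```python
RAMSTART = 0x8000                  # hypothetical value
CURRENT_ADDR = RAMSTART + 0x02     # the usual CURRENT variable
CURRENTPTR_ADDR = RAMSTART + 0x51  # holds the address of the active CURRENT


def read16(mem, addr):
    return mem[addr] | (mem[addr + 1] << 8)


def write16(mem, addr, value):
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


def current_word(mem):
    """What the Forth CURRENT word yields: the value stored at CURRENTPTR."""
    return read16(mem, CURRENTPTR_ADDR)


def latest_entry(mem):
    """Last dict entry of whichever dict is active right now."""
    return read16(mem, current_word(mem))
```

Redirecting CURRENTPTR to another cell switches which dict chain the system
sees without touching the value stored at RAM+2.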

FUTURE USES section is unused for now.

DRIVERS section is reserved for recipe-specific drivers. Here is a list of
known usages:

* 0x70-0x78: ACIA buffer pointers in RC2014 recipes.

*** Word routines

This is the description of all word routines you can encounter in this Forth
implementation. That is, a wordref will always point to a memory offset
containing one of these numbers.

0x17: nativeWord. This word's PFA contains native binary code and is jumped to
directly.

0x0e: compiledWord. This word's PFA contains an atom list and its execution is
described in "EXECUTING A WORD" above.

0x0b: cellWord. This word is usually followed by a 2-byte value in its PFA.
Upon execution, the *address* of the PFA is pushed to PS.

0x2b: doesWord. This word is created by "DOES>" and is followed by a 2-byte
value as well as the address where "DOES>" was compiled. At that address is an
atom list exactly like in a compiled word. Upon execution, after having pushed
its cell addr to PS, it executes its reference exactly like a compiledWord.

0x20: numberWord. No word is actually compiled with this routine, but atoms are.
Atoms with a reference to the numberWord routine are followed, *in the atom
list*, by a 2-byte number. Upon execution, that number is fetched and IP is
advanced by an extra 2 bytes.

0x24: addrWord. Exactly like a numberWord, except that it is treated
differently by meta-tools.

0x22: litWord. Similar to a numberWord, except that instead of being followed
by a 2-byte number, it is followed by a null-terminated string. Upon execution,
the address of that null-terminated string is pushed on PS and IP is
advanced to the address following the null.
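
How numberWord and litWord consume inline data from the atom list can be
sketched as follows. To keep it short, the sketch cheats in one way that should
be kept in mind: each atom here holds the routine number directly, whereas a
real atom is a wordref pointing at a location containing that number. Cells
are assumed little-endian, numberWord is assumed to push the fetched number to
PS, and step and ATOMS are hypothetical names.

```python
import struct

NUMBER = 0x20  # numberWord routine number, from the list above
LIT = 0x22     # litWord routine number, from the list above


def read16(mem, addr):
    return mem[addr] | (mem[addr + 1] << 8)


def step(mem, ip, ps):
    """Execute the atom at ip and return the new ip."""
    routine = read16(mem, ip)
    ip += 2                        # IP now points past the atom itself
    if routine == NUMBER:
        ps.append(read16(mem, ip))
        ip += 2                    # skip the inline 2-byte number
    elif routine == LIT:
        ps.append(ip)              # address of the inline string
        ip = mem.index(0, ip) + 1  # address following the null
    else:
        raise ValueError("routine not modeled: 0x%02x" % routine)
    return ip


# A two-atom list: the number 42, then the string "hi".
ATOMS = struct.pack("<HH", NUMBER, 42) + struct.pack("<H", LIT) + b"hi\x00"
```

In both cases the routine itself is responsible for moving IP past its inline
data, otherwise the data would be executed as if it were the next atom.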

*** Initialization sequence

On boot, we jump to the "main" routine in boot.fs which does very few things:

1. Sets SP to 0x10000-6.
2. Sets HERE to RAMEND (RAMSTART+0x80).
3. Sets CURRENT to the value of the LATEST field in the stable ABI.
4. Looks for the word "BOOT" and calls it.

In a normal system, BOOT is in icore and does a few things:

1. Find "(parse)" and set "(parse*)" to it.
2. Find "(c<)" and set CINPTR to it (what C< calls).
3. Write LATEST in SYSTEM SCRATCHPAD (see below).
4. Find "INIT". If found, execute. Otherwise, execute "INTERPRET".

On a bare system (only boot+icore), this sequence will result in "(parse)"
reading only decimals and (c<) reading characters from memory starting from
CURRENT (this is why we put CURRENT in SYSTEM SCRATCHPAD, it tracks current
pos).

This means that you can put initialization code in source form right into your
binary, right after your last compiled dict entry and it's going to be executed
as such until you set a new (c<).
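
The idea of a (c<) that feeds the interpreter from memory can be modeled like
this. It is a sketch only: the image contents are invented, the tracked
position lives in a Python closure instead of RAM, and treating every byte up
to 0x20 as whitespace is an assumption. make_mem_reader, read_word and IMAGE
are hypothetical names.

```python
def make_mem_reader(mem, start):
    """Return a (c<)-like function reading mem byte by byte from start."""
    pos = start  # stands in for the tracked position kept in RAM

    def c_in():
        nonlocal pos
        ch = mem[pos]
        pos += 1
        return ch

    return c_in


def read_word(c_in):
    """Skip leading whitespace, then collect characters up to the next one."""
    ch = c_in()
    while ch <= 0x20:
        ch = c_in()
    chars = []
    while ch > 0x20:
        chars.append(chr(ch))
        ch = c_in()
    return "".join(chars)


# Made-up image: 4 bytes standing in for compiled dict entries, then source.
IMAGE = b"\xc3\x00\x00\xc9" + b"42 CONSTANT ANSWER "
reader = make_mem_reader(IMAGE, 4)
FIRST_WORDS = [read_word(reader) for _ in range(3)]
```

From the interpreter's point of view nothing distinguishes these words from
typed input, which is why swapping in a new (c<) is enough to switch sources.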

Note that there is no EMIT in a bare system. You have to take care of supplying
one before you load core.fs and its higher levels.

In the "/emul" binaries, "HERE" is readjusted to "CURRENT @" so that we don't
have to relocate compiled dicts. Note that in this context, the initialization
code is fighting for space with HERE: New entries to the dict will overwrite
that code! Also, because we're bare-bones, we can't have comments. This can lead
to peculiar code in this area where we try to "waste" space in initialization
code.