documentation of some value
This commit is contained in:
parent
84a5d503dc
commit
900d7ecf7e
@ -1,11 +1,11 @@
|
||||
# Abstraction
|
||||
+---------------------+
|
||||
| |
|
||||
| |
|
||||
| State register |
|
||||
| |
|
||||
| |
|
||||
+---------------------+
|
||||
+---------------------+
|
||||
| |
|
||||
| |
|
||||
| State register |
|
||||
| |
|
||||
| |
|
||||
+---------------------+
|
||||
|
||||
|
||||
+---------------------------------+
|
||||
@ -16,3 +16,87 @@
|
||||
+---------------------------------+
|
||||
| Fallback transition table |
|
||||
+---------------------------------+
|
||||
|
||||
---
|
||||
State transition table look up
|
||||
+ success --> continue
|
||||
+ fail --> look up fallback table
|
||||
+ success --> continue
|
||||
+ fail --> return
|
||||
? EOS --> look up fallback table
|
||||
+ success --> is 0 width?
|
||||
+ success --> continue
|
||||
+ fail --> return
|
||||
+ fail --> return
|
||||
---
|
||||
##### HALT\_AND\_CATCH\_FIRE
|
||||
H&C is a special state signalling that we have hit a dead end.
|
||||
The reason why need it and we cant just instanly quick is backtracking.
|
||||
|
||||
---
|
||||
##### [^example]
|
||||
This is a negative range.
|
||||
```
|
||||
let myNegativeRange = {'e', 'x', 'a', 'm', 'p', 'l'}
|
||||
```
|
||||
None of the characters in $myNegativeRange must be accepted.
|
||||
The way this is a compiled is that we first hook all chars in $myNegativeRange to H&C,
|
||||
then define an OFFSHOOT of width 1.
|
||||
Put differently:
|
||||
if we read something illegal we abort this branch,
|
||||
if what we read was not illegal, we deduct that it must have been legal and we continue.
|
||||
|
||||
Handling "negatives" this way allows us to be "alphabet agnostic" in a sense.
|
||||
Many implementations will presume ASCII, with its fixed 7/8 bit width
|
||||
and create look up tables.
|
||||
Which is fast and cute, but this strategy becomes a giant memory hog
|
||||
if we ever wanted to use it on, say UTF-8 (from 256 te/c (table entries per char) to 4'294'967'295 te/c).
|
||||
|
||||
|
||||
#### .
|
||||
This is the dot operator.
|
||||
It matches any 1 char.
|
||||
|
||||
Similar how negative ranges are implemented,
|
||||
it takes advantage of the fallback table.
|
||||
It simply ignores the state transition table and rather unconditionally hooks itself to the next state.
|
||||
|
||||
|
||||
#### ^
|
||||
This is the carrot operator.
|
||||
It matches the SOS (start of the string).
|
||||
|
||||
For explanation purposes multilining (match '\n') is irrelevant.
|
||||
That behaves just like a literal.
|
||||
|
||||
What is more interesting is how SOS is recognized.
|
||||
Since `regex_assert()` is recursive the current state is continuesly passed along,
|
||||
however at out first frame, it's not just always 0.
|
||||
`regex_match()` decides depending on the current position of the string.
|
||||
Basically we have the first 2 states (0, 1) reserved and always missing from the state transmission table.
|
||||
+ 0 - SOS
|
||||
+ 1 - !SOS
|
||||
Normally both are _hooked_ to state 2,
|
||||
and we pretend nothing has ever happened.
|
||||
But when carrot operator is compiled, it sets a special compiler flag FORCE\_START\_OF\_STRING,
|
||||
which forbids the hooking of state 1 to 2,
|
||||
therefor when `regex_match()` calls from, say position 2,
|
||||
it passes in 1 as the starting state,
|
||||
no state transition table entry will be found since thats forbidden to begin with,
|
||||
no jumps are found(!),
|
||||
the machine checks whether the current state (1) is the accepting state (>=2)
|
||||
and finally returns failiour.
|
||||
|
||||
|
||||
#### \<
|
||||
This is the SOW (start of word) operator.
|
||||
SOW must match:
|
||||
```
|
||||
^myword
|
||||
[^\h]myword
|
||||
```
|
||||
Not only that, this combination is key,
|
||||
either it has to be the start of the string
|
||||
or there has to be at least something which is not a symbol char.
|
||||
With out the last condition "eexample" would match "\\\<exaplme\\\>"
|
||||
as the iteration of `regex_match()` reaches "example".
|
||||
|
Loading…
Reference in New Issue
Block a user