The Kollos code
Table of contents
* [About Kollos](#about-kollos)
* [Kollos semantics](#kollos-semantics)
* [VM operations](#vm-operations)
* [VM debug operation](#vm-debug-operation)
* [VM no-op operation](#vm-no-op-operation)
* [VM bail operation](#vm-bail-operation)
* [VM result operations](#vm-result-operations)
* [VM "result is undef" operation](#vm-result-is-undef-operation)
* [VM "result is token value" operation](#vm-result-is-token-value-operation)
* [VM "result is N of RHS" operation](#vm-result-is-n-of-rhs-operation)
* [VM "result is N of sequence" operation](#vm-result-is-n-of-sequence-operation)
* [VM operation: result is constant](#vm-operation-result-is-constant)
* [Operation of the values array](#operation-of-the-values-array)
* [VM "push undef" operation](#vm-push-undef-operation)
* [VM "push one" operation](#vm-push-one-operation)
* [Find current token literal](#find-current-token-literal)
* [VM "push values" operation](#vm-push-values-operation)
* [VM operation: push start location](#vm-operation-push-start-location)
* [VM operation: push length](#vm-operation-push-length)
* [VM operation: push G1 start location](#vm-operation-push-g1-start-location)
* [VM operation: push G1 length](#vm-operation-push-g1-length)
* [VM operation: push constant onto values array](#vm-operation-push-constant-onto-values-array)
* [VM operation: set the array blessing](#vm-operation-set-the-array-blessing)
* [VM operation: result is array](#vm-operation-result-is-array)
* [VM operation: callback](#vm-operation-callback)
* [Run the virtual machine](#run-the-virtual-machine)
* [Find and perform the VM operations](#find-and-perform-the-vm-operations)
* [VM-related utilities for use in the Perl code](#vm-related-utilities-for-use-in-the-perl-code)
* [Return operation key given its name](#return-operation-key-given-its-name)
* [Return operation name given its key](#return-operation-name-given-its-key)
* [Register a constant](#register-a-constant)
* [Register semantics for a token](#register-semantics-for-a-token)
* [Register semantics for a nulling symbol](#register-semantics-for-a-nulling-symbol)
* [Register semantics for a rule](#register-semantics-for-a-rule)
* [Return the top index of the stack](#return-the-top-index-of-the-stack)
* [Return the value of a stack entry](#return-the-value-of-a-stack-entry)
* [Set the value of a stack entry](#set-the-value-of-a-stack-entry)
* [The Kollos valuator](#the-kollos-valuator)
* [Initialize a valuator](#initialize-a-valuator)
* [Reset a valuator](#reset-a-valuator)
* [Preliminaries to the main code](#preliminaries-to-the-main-code)
bf2a558f14c663091120f585ef578ac9
This is the code for Kollos, the "middle layer" of Marpa.
Below it is Libmarpa, a library written in
the C language which contains the actual parse engine.
Above it is code in a higher level language -- at this point Perl.
This document is evolving. Most of the "middle layer" is still
in the Perl code or in the Perl XS, and not represented here.
This document only contains those portions converted to Lua or
to Lua-centeric C code.
0588207666332b8ab79ee6ce927acf39
```
-- miranda: section C function declarations
struct kollos_extraspace {
int ref_count;
int (*warn)(const char* format, ...);
};
```
1173d082650b8da97d45d0ceae562b13
A Kollos object is a Lua interpreter.
It keeps its own reference count, in its Lua registry.
`kollos_newstate()`,
the constructor, creates one reference
and gives its caller ownership.
When
the reference count falls to zero,
the interpreter (Kollos object) is destroyed.
```
f762992f41110c0c078f3c1c436e9e91
```
Add a recce to the Kollos object, returning its
"lua_id".
The inner SLR C structure is passed in for now,
because it uses a lot of PERL/XS data structures.
```
-- miranda: section+ C function declarations
#define MT_NAME_RECCE "Marpa_recce"
int kollos_recce_new(lua_State* L, void* slr);
-- miranda: section+ Lua interpreter management
int kollos_recce_new(lua_State* L, void* slr)
{
int lua_id;
const int base_of_stack = marpa_lua_gettop(L);
marpa_lua_checkstack(L, 20);
/* Lua stack: [] */
/* Create a table for this recce */
marpa_lua_newtable(L);
/* Lua stack: [ recce_table ] */
/* No lock held -- SLR must delete recce table in its */
/* destructor. */
/* Set the metatable for the recce table */
marpa_luaL_setmetatable(L, MT_NAME_RECCE);
/* Lua stack: [ recce_table ] */
6eedf5417da179eb71571c000aa3d4b2
```
```
-- miranda: section+ Lua interpreter management
static int default_warn(const char *format, ...)
{
va_list args;
va_start (args, format);
vfprintf (stderr, format, args);
va_end (args);
fputs("\n", stderr);
return 1;
}
```
`kollos_refinc()`
creates a new reference
to a Kollos interpreter,
and takes ownership of it.
```
c931b63385dafd21a62eb0e8709756a7
```
Give up ownership of a reference to a Kollos interpreter.
Deletes the interpreter if the reference count drops to zero.
```
cb713645523ea88be87f42361e0dba6a
```
Set the warning function of the Kollos interpreter.
```
528da25dd58dfd12adaaaced9e09c9fa
```
Write a warning message using Kollos's warning handler.
```
019847ce5a4f72f384e52f423c3fb2ab
```
`kollos_tblrefinc()`
creates a new reference
to a Kollos interpreter,
and takes ownership of it.
```
09b07510ddc72177a3a3120856377a8c
```
Give up ownership of a reference to a Kollos interpreter.
Deletes the interpreter if the reference count drops to zero.
```
ec06c93253d8e2aea2bfb67899873444
```
5b9a9112f57b37f75fd55930c211df74
Initially, Marpa's semantics were performed using a VM (virtual machine)
of about two dozen
operations. I am converting them to Lua, one by one. Once they are in
Lua, the flexibility in defining operations becomes much greater than when
they were in C/XS. The set of operations which can be defined becomes
literally open-ended.
With Lua replacing C, the constraints which dictated the original design
of this VM are completely altered.
It remains an open question what becomes of this VM and its operation
set as Marpa evolves.
For example,
at the extreme end, every program in the old VM could be replaced with
one that is a single instruction long, with that single instruction
written entirely in Lua.
If this were done, there no longer would be a VM, in any real sense of the
word.
4ae16c025d4dc446cd1f3a6f1caf8f52
A return value of -1 indicates this should be the last VM operation.
A return value of 0 or greater indicates this is the last VM operation,
and that there is a Perl callback with the contents of the values array
as its arguments.
A return value of -2 or less indicates that the reading of VM operations
should continue.
Note the use of tails calls in the Lua code.
Maintainters should be aware that these are finicky.
In particular, while `return f(x)` is turned into a tail call,
`return (f(x))` is not.
2f313f5ac2063d40305558f28f692d57
Was used for development.
Perhaps I should delete this.
```
-- miranda: section VM operations
c4904e2f72cc7db901151f0779f32e68
```
44a4004e70e89025a1c728637ca1d67f
This is to be kept after development,
even if not used.
It may be useful in debugging.
```
-- miranda: section+ VM operations
9722a601245229d5e245e63bc47d98ff
```
e3f53a85136413a47ccd6a60af813a59
This is to used for development.
Its intended use is as a dummy argument,
which, if it is used by accident
as a VM operation,
fast fails with a clear message.
```
-- miranda: section+ VM operations
b2ddb036f551a285ac82152397e4ae2f
```
34aa84263d21d3e220f21ceec0d7a2c7
If an operation in the VM returns -1, it is a
"result operation".
The actual result is expected to be in the stack
at index `recce.v.step.result`.
The result operation is not required to be the
last operation in a sequence,
and
a sequence of operations does not have to contain
a result operation.
If there are
other operations after the result operation,
they will not be performed.
If a sequence ends without encountering a result
operation, an implicit "no-op" result operation
is assumed and, as usual,
the result is the value in the stack
at index `recce.v.step.result`.
32f82d9c075283bbb3cb4174a8c16470
Perhaps the simplest operation.
The result of the semantics is a Perl undef.
```
-- miranda: section+ VM operations
f705c48395e0763e5c2ecd34571c941f
```
fe90344d864bba24a43f406f6d3af84d
The result of the semantics is the value of the
token at the current location.
It's assumed to be a MARPA_STEP_TOKEN step --
if not the value is an undef.
```
-- miranda: section+ VM operations
6c1506ea8934f792b7481a36a07bb608
```
3b34f3dc418a7f6e4b2205e6c5422596
```
-- miranda: section+ VM operations
function op_fn_result_is_n_of_rhs(recce, rhs_ix)
if recce.v.step.type ~= 'MARPA_STEP_RULE' then
return op_fn_result_is_undef(recce)
end
local stack = recce.v.stack
local result_ix = recce.v.step.result
repeat
if rhs_ix == 0 then break end
local fetch_ix = result_ix + rhs_ix
if fetch_ix > recce.v.step.arg_n then
stack[result_ix] = marpa.sv.undef()
break
end
stack[result_ix] = stack[fetch_ix]
until 1
return -1
end
```
cd80be994a4857cc082c703e5179040d
In `stack`,
set the result to the `item_ix`'th item of a sequence.
`stack` is a 0-based Perl AV.
Here "sequence" means a sequence in which the separators
have been kept.
For those with separators discarded,
the "N of RHS" operation should be used.
```
-- miranda: section+ VM operations
function op_fn_result_is_n_of_sequence(recce, item_ix)
if recce.v.step.type ~= 'MARPA_STEP_RULE' then
return op_fn_result_is_undef(recce)
end
local result_ix = recce.v.step.result
local fetch_ix = result_ix + item_ix * 2
if fetch_ix > recce.v.step.arg_n then
return op_fn_result_is_undef(recce)
end
local stack = recce.v.stack
if item_ix > 0 then
stack[result_ix] = stack[fetch_ix]
end
return -1
end
```
40e7018a85d565854fdb79cd9ef85870
Returns a constant result.
```
-- miranda: section+ VM operations
function op_fn_result_is_constant(recce, constant_ix)
local constants = recce:constants()
local constant = constants[constant_ix]
local stack = recce.v.stack
local result_ix = recce.v.step.result
stack[result_ix] = constant
if recce.trace_values > 0 and recce.v.step.type == 'MARPA_STEP_TOKEN' then
local top_of_queue = #recce.trace_values_queue
recce.trace_values_queue[top_of_queue+1] =
{ "valuator unknown step", recce.v.step.type, recce.token, constant}
-- io.stderr:write('valuator unknown step: ', inspect(recce))
end
return -1
end
```
213d43d4daa1b8176e29a5e16b995b06
The following operations add elements to the `values` array.
This is a special array which may eventually be the result of the
sequence of operations.
c97efefd1aeba097cce106af5310b8ac
Push an undef on the values array.
```
-- miranda: section+ VM operations
58963e0854b0c375e2c6b40b308b2992
```
3105ae29bd58291bc331b2d8337f713f
Push one of the RHS child values onto the values array.
```
-- miranda: section+ VM operations
c094031e860a42f1c89b5355fd3aa8a5
```
607f9d33c8519236b071d773a415bdc3
`current_token_literal` return the literal
equivalent of the current token.
It assumes that there *is* a current token,
that is,
it assumes that the caller has ensured that
`recce.v.step.type ~= 'MARPA_STEP_TOKEN'`.
```
-- miranda: section+ VM operations
function current_token_literal(recce)
if recce.token_is_literal == recce.v.step.value then
local start_es = recce.v.step.start_es_id
local end_es = recce.v.step.es_id
return recce:literal_of_es_span(start_es, end_es)
end
return recce.token_values[recce.v.step.value]
end
```
9eac4c7ab8443d2b081c4a567504ce44
Push the child values onto the `values` list.
If it is a token step, then
the token at the current location is pushed onto the `values` list.
If it is a nulling step, the nothing is pushed.
Otherwise the values of the RHS children are pushed.
`increment` is 2 for sequences where separators must be discarded,
1 otherwise.
```
-- miranda: section+ VM operations
7c77d53f881ebede53b16ce6e2a313fe
```
bd96c22212c42eeb154d27741a7c3907
The current start location in input location terms -- that is,
in terms of the input string.
```
-- miranda: section+ VM operations
function op_fn_push_start(recce)
local values = recce:values()
local start_es = recce.v.step.start_es_id
local end_es = recce.v.step.es_id
local next_ix = marpa.sv.top_index(values) + 1;
local start, l = recce:span(start_es, end_es)
values[next_ix], _ = recce:span(start_es, end_es)
return -2
end
```
9881048d9a6628ba3918cec5c11799a7
The length of the current step in input location terms --
that is, in terms of the input string
```
-- miranda: section+ VM operations
function op_fn_push_length(recce)
local values = recce:values()
local start_es = recce.v.step.start_es_id
local end_es = recce.v.step.es_id
local next_ix = marpa.sv.top_index(values) + 1;
_, values[next_ix] = recce:span(start_es, end_es)
return -2
end
```
5060000ef023dbe5de55262804395b4e
The current start location in G1 location terms -- that is,
in terms of G1 Earley sets.
```
-- miranda: section+ VM operations
function op_fn_push_g1_start(recce)
local values = recce:values()
local next_ix = marpa.sv.top_index(values) + 1;
values[next_ix] = recce.v.step.start_es_id
return -2
end
```
a076ffdd247bf6503b3cefd8f3e7aa9a
The length of the current step in G1 terms --
that is, in terms of G1 Earley sets.
```
-- miranda: section+ VM operations
function op_fn_push_g1_length(recce)
local values = recce:values()
local next_ix = marpa.sv.top_index(values) + 1;
values[next_ix] = (recce.v.step.es_id
- recce.v.step.start_es_id) + 1
return -2
end
```
20c9eb68722cd214070737fa5ca649a5
```
-- miranda: section+ VM operations
function op_fn_push_constant(recce, constant_ix)
local constants = recce:constants()
-- io.stderr:write('constants: ', inspect(constants), "\n")
-- io.stderr:write('constant_ix: ', constant_ix, "\n")
-- io.stderr:write('constants top ix: ', marpa.sv.top_index(constants), "\n")
e5b1392dc0c6abacb78b2d015aba714f
```
24ea833cc539cfcdcd2eaa2cc836ac32
The blessing is registered in a constant, and this operation
lets the VM know its index. The index is cleared at the beginning
of every sequence of operations
```
-- miranda: section+ VM operations
function op_fn_bless(recce, blessing_ix)
recce.v.step.blessing_ix = blessing_ix
return -2
end
```
c9752cce1826bbf8bd52c0d93bc788b6
This operation tells the VM that the current `values` array
is the result of this sequence of operations.
```
-- miranda: section+ VM operations
function op_fn_result_is_array(recce)
local blessing_ix = recce.v.step.blessing_ix
local values = recce:values()
if blessing_ix then
local constants = recce:constants()
local blessing = constants[blessing_ix]
marpa.sv.bless(values, blessing)
end
local stack = recce.v.stack
local result_ix = recce.v.step.result
stack[result_ix] = values
return -1
end
```
e38e80875d78f9d94f7aea75359401de
Tells the VM to create a callback to Perl, with
the `values` array as an argument.
The return value of 3 is a vestige of an earlier
implementation, which returned the size of the
`values` array.
```
-- miranda: section+ VM operations
function op_fn_callback(recce)
local step_type = recce.v.step.type
if step_type ~= 'MARPA_STEP_RULE'
and step_type ~= 'MARPA_STEP_NULLING_SYMBOL'
then
io.stderr:write(
'Internal error: callback for wrong step type ',
step_type
)
os.exit(false)
end
local blessing_ix = recce.v.step.blessing_ix
if blessing_ix then
local values = recce:values()
local constants = recce:constants()
local blessing = constants[blessing_ix]
marpa.sv.bless(values, blessing)
end
return 3
end
```
73a6d1e740a3922624acc288aa423f48
```
-- miranda: section+ VM operations
function do_ops(recce, ops)
local op_ix = 1
while op_ix <= #ops do
local op_code = ops[op_ix]
if op_code == 0 then return -1 end
if op_code ~= op_lua then
end
-- io.stderr:write('op_code: ', inspect(op_code), '\\n')
-- io.stderr:write('op_lua: ', inspect(op_lua), '\\n')
local fn_key = ops[op_ix+1]
-- io.stderr:write('ops: ', inspect(ops), '\\n')
-- io.stderr:write('fn_key: ', inspect(fn_key), '\\n')
-- io.stderr:write('fn name: ', recce.op_fn_key[fn_key], '\\n')
local arg = ops[op_ix+2]
-- io.stderr:write('arg: ', inspect(arg), '\\n')
if recce.trace_values >= 3 then
local queue = recce.trace_values_queue
local tag = 'starting lua op'
queue[#queue+1] = {'starting op', recce.v.step.type, 'lua'}
queue[#queue+1] = {tag, recce.v.step.type, recce.op_fn_key[fn_key]}
-- io.stderr:write('starting op: ', inspect(recce))
end
op_fn = recce[fn_key]
result = op_fn(recce, arg)
if result >= -1 then return result end
op_ix = op_ix + 3
end
return -1
end
```
6140eb2bf9dee8f905549d49d3ea6377
Determine the appropriate VM operations for this
step, and perform them.
Return codes are
c774493032c8e76e2ac94bf262124d1c
The mnemonic for these codes is
that they represent the size of the list returned to Perl,
with "trace" and "do not return" being special cases.
```
-- miranda: section+ VM operations
function find_and_do_ops(recce)
local ops = {}
recce:step()
if recce.v.step.type == 'MARPA_STEP_INACTIVE' then
return 0
end
if recce.v.step.type == 'MARPA_STEP_RULE' then
ops = recce.rule_semantics[recce.v.step.rule]
if not ops then
ops = recce.rule_semantics.default
end
goto DO_OPS
end
if recce.v.step.type == 'MARPA_STEP_TOKEN' then
ops = recce.token_semantics[recce.v.step.symbol]
if not ops then
ops = recce.token_semantics.default
end
goto DO_OPS
end
if recce.v.step.type == 'MARPA_STEP_NULLING_SYMBOL' then
ops = recce.nulling_semantics[recce.v.step.symbol]
if not ops then
ops = recce.nulling_semantics.default
end
goto DO_OPS
end
if true then return 1 end
::DO_OPS::
if not ops then
error(string.format('No semantics defined for %s', recce.v.step.type))
end
local do_ops_result = do_ops(recce, ops)
local stack = recce.v.stack
-- truncate stack
local above_top = recce.v.step.result + 1
for i = above_top,#stack do stack[i] = nil end
if do_ops_result > 0 then return 3 end
if #recce.trace_values_queue > 0 then return -1 end
return -2
end
```
794c0d2d40add6139468882ae5dbcaf1
The following operations are used by the higher-level Perl code
to set and discover various Lua values.
0024e3157be76a7c016404cb7d408f19
```
-- miranda: section Utilities for Perl code
function get_op_fn_key_by_name(recce, op_name_sv)
local op_name = tostring(op_name_sv)
return recce.op_fn_key[op_name]
end
```
8a24df87bd9b12aa7ec6c4cddddbb11e
```
-- miranda: section+ Utilities for Perl code
function get_op_fn_name_by_key(recce, op_key_sv)
local op_key = op_key_sv + 0
return recce.op_fn_key[op_key]
end
```
ba87cf8544db8bb5e9bcdb3dd936c060
Register a constant, returning its key.
```
-- miranda: section+ Utilities for Perl code
function constant_register(recce, constant_sv)
local constants = recce:constants()
local next_constant_key = marpa.sv.top_index(constants) + 1
constants[next_constant_key] = constant_sv
return next_constant_key
end
```
f1c0d0d498b68bbd08dfdaf213f231ff
Register the semantic operations, `ops`, for the token
whose id is `id`.
```
-- miranda: section+ Utilities for Perl code
function token_register(...)
local args = {...}
local recce = args[1]
local id = args[2]+0
local ops = {}
for ix = 3, #args do
ops[#ops+1] = args[ix]+0
end
recce.token_semantics[id] = ops
end
```
51dda6c2d8175b580a563748f50efaa2
Register the semantic operations, `ops`, for the nulling symbol
whose id is `id`.
```
-- miranda: section+ Utilities for Perl code
function nulling_register(...)
local args = {...}
local recce = args[1]
local id = args[2]+0
local ops = {}
for ix = 3, #args do
ops[#ops+1] = args[ix]+0
end
recce.nulling_semantics[id] = ops
end
```
e1bc5b0da11d1f83680c5c20e585e76d
Register the semantic operations, `ops`, for the rule
whose id is `id`.
```
-- miranda: section+ Utilities for Perl code
function rule_register(...)
local args = {...}
local recce = args[1]
local id = args[2]+0
local ops = {}
for ix = 3, #args do
ops[#ops+1] = args[ix]+0
end
recce.rule_semantics[id] = ops
end
```
657ce8475e95a259230444aeea05d9c3
```
-- miranda: section+ Utilities for Perl code
function stack_top_index(recce)
return recce.v.step.result
end
```
f776cb87d6732a84fa565bcb393aefec
```
-- miranda: section+ Utilities for Perl code
function stack_get(recce, ix)
local stack = recce.v.stack
return stack[ix+0]
end
```
5d2cf9e4a56c7aa8cd3b2600a5bbc0c5
```
-- miranda: section+ Utilities for Perl code
function stack_set(recce, ix, v)
local stack = recce.v.stack
stack[ix+0] = v
end
```
93bd16199227703f4bb77f151470ab35
The "valuator" portion of Kollos produces the
value of a
Kollos parse.
f68f883f3bab97e21a4e28c792ff41fc
Called when a valuator is set up.
```
-- miranda: section value_init()
2674ae7e02b665352a038689838411a1
```
e0b664f8a57aa6845fb38fa50f79792b
A function to be called whenever a valuator is reset.
```
2a6a5a35b18fed4f18b29869200ac0c6
```
0bb205e52ab5d9d6f92cfbf378ea5a5c
```
-- miranda: section main
-- miranda: insert preliminaries to main
-- miranda: insert VM operations
-- miranda: insert value_init()
-- miranda: insert value_reset()
-- miranda: insert Utilities for Perl code
1d4f7117746e39d7c6b8c5915a3e863b
```
008c8d0e2055f208839d69c9a1299458
Licensing, etc.
```
ecb25d23c5db0f269c2310a1776e5ac5
```
b9aa31f9c830844a9f7ecc8b53ff2e58
```
-- miranda: section kollos_c
-- miranda: language c
-- miranda: insert preliminaries to the c library code
-- miranda: insert kollos Lua library
-- miranda: insert Lua interpreter management
/* vim: set expandtab shiftwidth=4: */
```
```
-- miranda: section kollos Lua library
static int marpa_luaopen_kollos(lua_State *L)
{
/* Create the main kollos object */
marpa_lua_newtable(L);
/* [ kollos ] */
return 1;
}
```
c1dc4fef417f78f9c22d924760102c3e
```
-- miranda: section preliminaries to the c library code
/*
** Permission is hereby granted, free of charge, to any person obtaining
** a copy of this software and associated documentation files (the
** "Software"), to deal in the Software without restriction, including
** without limitation the rights to use, copy, modify, merge, publish,
** distribute, sublicense, and/or sell copies of the Software, and to
** permit persons to whom the Software is furnished to do so, subject to
** the following conditions:
**
** The above copyright notice and this permission notice shall be
** included in all copies or substantial portions of the Software.
**
** THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
** EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
** MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
** IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
** CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
** TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
** SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
**
** [ MIT license: http://www.opensource.org/licenses/mit-license.php ]
*/
6f532d11c2dd03308f7b1a7755dbb97f
```
3e8cf17bd7fc69b11cbb03bc92fa958a
```
-- miranda: section kollos_h
-- miranda: language c
-- miranda: insert preliminary comments of the c header file
63307eac5875bc8ac69a0b2da5e32644
```
6cde91623079bafdce3fae120cee966a
```
-- miranda: section preliminary comments of the c header file
15bec93404073dd1e5ce3a24f9ac1ebd
```