The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

The Kollos code

Table of contents

* [About Kollos](#about-kollos) * [Kollos semantics](#kollos-semantics) * [VM operations](#vm-operations) * [VM debug operation](#vm-debug-operation) * [VM no-op operation](#vm-no-op-operation) * [VM bail operation](#vm-bail-operation) * [VM result operations](#vm-result-operations) * [VM "result is undef" operation](#vm-result-is-undef-operation) * [VM "result is token value" operation](#vm-result-is-token-value-operation) * [VM "result is N of RHS" operation](#vm-result-is-n-of-rhs-operation) * [VM "result is N of sequence" operation](#vm-result-is-n-of-sequence-operation) * [VM operation: result is constant](#vm-operation-result-is-constant) * [Operation of the values array](#operation-of-the-values-array) * [VM "push undef" operation](#vm-push-undef-operation) * [VM "push one" operation](#vm-push-one-operation) * [Find current token literal](#find-current-token-literal) * [VM "push values" operation](#vm-push-values-operation) * [VM operation: push start location](#vm-operation-push-start-location) * [VM operation: push length](#vm-operation-push-length) * [VM operation: push G1 start location](#vm-operation-push-g1-start-location) * [VM operation: push G1 length](#vm-operation-push-g1-length) * [VM operation: push constant onto values array](#vm-operation-push-constant-onto-values-array) * [VM operation: set the array blessing](#vm-operation-set-the-array-blessing) * [VM operation: result is array](#vm-operation-result-is-array) * [VM operation: callback](#vm-operation-callback) * [Run the virtual machine](#run-the-virtual-machine) * [Find and perform the VM operations](#find-and-perform-the-vm-operations) * [VM-related utilities for use in the Perl code](#vm-related-utilities-for-use-in-the-perl-code) * [Return operation key given its name](#return-operation-key-given-its-name) * [Return operation name given its key](#return-operation-name-given-its-key) * [Register a constant](#register-a-constant) * [Register semantics for a token](#register-semantics-for-a-token) * [Register semantics for a nulling symbol](#register-semantics-for-a-nulling-symbol) * [Register semantics for a rule](#register-semantics-for-a-rule) * [Return the top index of the stack](#return-the-top-index-of-the-stack) * [Return the value of a stack entry](#return-the-value-of-a-stack-entry) * [Set the value of a stack entry](#set-the-value-of-a-stack-entry) * [The Kollos valuator](#the-kollos-valuator) * [Initialize a valuator](#initialize-a-valuator) * [Reset a valuator](#reset-a-valuator) * [Preliminaries to the main code](#preliminaries-to-the-main-code) bf2a558f14c663091120f585ef578ac9 This is the code for Kollos, the "middle layer" of Marpa. Below it is Libmarpa, a library written in the C language which contains the actual parse engine. Above it is code in a higher level language -- at this point Perl. This document is evolving. Most of the "middle layer" is still in the Perl code or in the Perl XS, and not represented here. This document only contains those portions converted to Lua or to Lua-centeric C code. 0588207666332b8ab79ee6ce927acf39 ``` -- miranda: section C function declarations struct kollos_extraspace { int ref_count; int (*warn)(const char* format, ...); }; ``` 1173d082650b8da97d45d0ceae562b13 A Kollos object is a Lua interpreter. It keeps its own reference count, in its Lua registry. `kollos_newstate()`, the constructor, creates one reference and gives its caller ownership. When the reference count falls to zero, the interpreter (Kollos object) is destroyed. ``` f762992f41110c0c078f3c1c436e9e91 ``` Add a recce to the Kollos object, returning its "lua_id". The inner SLR C structure is passed in for now, because it uses a lot of PERL/XS data structures. ``` -- miranda: section+ C function declarations #define MT_NAME_RECCE "Marpa_recce" int kollos_recce_new(lua_State* L, void* slr); -- miranda: section+ Lua interpreter management int kollos_recce_new(lua_State* L, void* slr) { int lua_id; const int base_of_stack = marpa_lua_gettop(L); marpa_lua_checkstack(L, 20); /* Lua stack: [] */ /* Create a table for this recce */ marpa_lua_newtable(L); /* Lua stack: [ recce_table ] */ /* No lock held -- SLR must delete recce table in its */ /* destructor. */ /* Set the metatable for the recce table */ marpa_luaL_setmetatable(L, MT_NAME_RECCE); /* Lua stack: [ recce_table ] */ 6eedf5417da179eb71571c000aa3d4b2 ``` ``` -- miranda: section+ Lua interpreter management static int default_warn(const char *format, ...) { va_list args; va_start (args, format); vfprintf (stderr, format, args); va_end (args); fputs("\n", stderr); return 1; } ``` `kollos_refinc()` creates a new reference to a Kollos interpreter, and takes ownership of it. ``` c931b63385dafd21a62eb0e8709756a7 ``` Give up ownership of a reference to a Kollos interpreter. Deletes the interpreter if the reference count drops to zero. ``` cb713645523ea88be87f42361e0dba6a ``` Set the warning function of the Kollos interpreter. ``` 528da25dd58dfd12adaaaced9e09c9fa ``` Write a warning message using Kollos's warning handler. ``` 019847ce5a4f72f384e52f423c3fb2ab ``` `kollos_tblrefinc()` creates a new reference to a Kollos interpreter, and takes ownership of it. ``` 09b07510ddc72177a3a3120856377a8c ``` Give up ownership of a reference to a Kollos interpreter. Deletes the interpreter if the reference count drops to zero. ``` ec06c93253d8e2aea2bfb67899873444 ``` 5b9a9112f57b37f75fd55930c211df74 Initially, Marpa's semantics were performed using a VM (virtual machine) of about two dozen operations. I am converting them to Lua, one by one. Once they are in Lua, the flexibility in defining operations becomes much greater than when they were in C/XS. The set of operations which can be defined becomes literally open-ended. With Lua replacing C, the constraints which dictated the original design of this VM are completely altered. It remains an open question what becomes of this VM and its operation set as Marpa evolves. For example, at the extreme end, every program in the old VM could be replaced with one that is a single instruction long, with that single instruction written entirely in Lua. If this were done, there no longer would be a VM, in any real sense of the word. 4ae16c025d4dc446cd1f3a6f1caf8f52 A return value of -1 indicates this should be the last VM operation. A return value of 0 or greater indicates this is the last VM operation, and that there is a Perl callback with the contents of the values array as its arguments. A return value of -2 or less indicates that the reading of VM operations should continue. Note the use of tails calls in the Lua code. Maintainters should be aware that these are finicky. In particular, while `return f(x)` is turned into a tail call, `return (f(x))` is not. 2f313f5ac2063d40305558f28f692d57 Was used for development. Perhaps I should delete this. ``` -- miranda: section VM operations c4904e2f72cc7db901151f0779f32e68 ``` 44a4004e70e89025a1c728637ca1d67f This is to be kept after development, even if not used. It may be useful in debugging. ``` -- miranda: section+ VM operations 9722a601245229d5e245e63bc47d98ff ``` e3f53a85136413a47ccd6a60af813a59 This is to used for development. Its intended use is as a dummy argument, which, if it is used by accident as a VM operation, fast fails with a clear message. ``` -- miranda: section+ VM operations b2ddb036f551a285ac82152397e4ae2f ``` 34aa84263d21d3e220f21ceec0d7a2c7 If an operation in the VM returns -1, it is a "result operation". The actual result is expected to be in the stack at index `recce.v.step.result`. The result operation is not required to be the last operation in a sequence, and a sequence of operations does not have to contain a result operation. If there are other operations after the result operation, they will not be performed. If a sequence ends without encountering a result operation, an implicit "no-op" result operation is assumed and, as usual, the result is the value in the stack at index `recce.v.step.result`. 32f82d9c075283bbb3cb4174a8c16470 Perhaps the simplest operation. The result of the semantics is a Perl undef. ``` -- miranda: section+ VM operations f705c48395e0763e5c2ecd34571c941f ``` fe90344d864bba24a43f406f6d3af84d The result of the semantics is the value of the token at the current location. It's assumed to be a MARPA_STEP_TOKEN step -- if not the value is an undef. ``` -- miranda: section+ VM operations 6c1506ea8934f792b7481a36a07bb608 ``` 3b34f3dc418a7f6e4b2205e6c5422596 ``` -- miranda: section+ VM operations function op_fn_result_is_n_of_rhs(recce, rhs_ix) if recce.v.step.type ~= 'MARPA_STEP_RULE' then return op_fn_result_is_undef(recce) end local stack = recce.v.stack local result_ix = recce.v.step.result repeat if rhs_ix == 0 then break end local fetch_ix = result_ix + rhs_ix if fetch_ix > recce.v.step.arg_n then stack[result_ix] = marpa.sv.undef() break end stack[result_ix] = stack[fetch_ix] until 1 return -1 end ``` cd80be994a4857cc082c703e5179040d In `stack`, set the result to the `item_ix`'th item of a sequence. `stack` is a 0-based Perl AV. Here "sequence" means a sequence in which the separators have been kept. For those with separators discarded, the "N of RHS" operation should be used. ``` -- miranda: section+ VM operations function op_fn_result_is_n_of_sequence(recce, item_ix) if recce.v.step.type ~= 'MARPA_STEP_RULE' then return op_fn_result_is_undef(recce) end local result_ix = recce.v.step.result local fetch_ix = result_ix + item_ix * 2 if fetch_ix > recce.v.step.arg_n then return op_fn_result_is_undef(recce) end local stack = recce.v.stack if item_ix > 0 then stack[result_ix] = stack[fetch_ix] end return -1 end ``` 40e7018a85d565854fdb79cd9ef85870 Returns a constant result. ``` -- miranda: section+ VM operations function op_fn_result_is_constant(recce, constant_ix) local constants = recce:constants() local constant = constants[constant_ix] local stack = recce.v.stack local result_ix = recce.v.step.result stack[result_ix] = constant if recce.trace_values > 0 and recce.v.step.type == 'MARPA_STEP_TOKEN' then local top_of_queue = #recce.trace_values_queue recce.trace_values_queue[top_of_queue+1] = { "valuator unknown step", recce.v.step.type, recce.token, constant} -- io.stderr:write('valuator unknown step: ', inspect(recce)) end return -1 end ``` 213d43d4daa1b8176e29a5e16b995b06 The following operations add elements to the `values` array. This is a special array which may eventually be the result of the sequence of operations. c97efefd1aeba097cce106af5310b8ac Push an undef on the values array. ``` -- miranda: section+ VM operations 58963e0854b0c375e2c6b40b308b2992 ``` 3105ae29bd58291bc331b2d8337f713f Push one of the RHS child values onto the values array. ``` -- miranda: section+ VM operations c094031e860a42f1c89b5355fd3aa8a5 ``` 607f9d33c8519236b071d773a415bdc3 `current_token_literal` return the literal equivalent of the current token. It assumes that there *is* a current token, that is, it assumes that the caller has ensured that `recce.v.step.type ~= 'MARPA_STEP_TOKEN'`. ``` -- miranda: section+ VM operations function current_token_literal(recce) if recce.token_is_literal == recce.v.step.value then local start_es = recce.v.step.start_es_id local end_es = recce.v.step.es_id return recce:literal_of_es_span(start_es, end_es) end return recce.token_values[recce.v.step.value] end ``` 9eac4c7ab8443d2b081c4a567504ce44 Push the child values onto the `values` list. If it is a token step, then the token at the current location is pushed onto the `values` list. If it is a nulling step, the nothing is pushed. Otherwise the values of the RHS children are pushed. `increment` is 2 for sequences where separators must be discarded, 1 otherwise. ``` -- miranda: section+ VM operations 7c77d53f881ebede53b16ce6e2a313fe ``` bd96c22212c42eeb154d27741a7c3907 The current start location in input location terms -- that is, in terms of the input string. ``` -- miranda: section+ VM operations function op_fn_push_start(recce) local values = recce:values() local start_es = recce.v.step.start_es_id local end_es = recce.v.step.es_id local next_ix = marpa.sv.top_index(values) + 1; local start, l = recce:span(start_es, end_es) values[next_ix], _ = recce:span(start_es, end_es) return -2 end ``` 9881048d9a6628ba3918cec5c11799a7 The length of the current step in input location terms -- that is, in terms of the input string ``` -- miranda: section+ VM operations function op_fn_push_length(recce) local values = recce:values() local start_es = recce.v.step.start_es_id local end_es = recce.v.step.es_id local next_ix = marpa.sv.top_index(values) + 1; _, values[next_ix] = recce:span(start_es, end_es) return -2 end ``` 5060000ef023dbe5de55262804395b4e The current start location in G1 location terms -- that is, in terms of G1 Earley sets. ``` -- miranda: section+ VM operations function op_fn_push_g1_start(recce) local values = recce:values() local next_ix = marpa.sv.top_index(values) + 1; values[next_ix] = recce.v.step.start_es_id return -2 end ``` a076ffdd247bf6503b3cefd8f3e7aa9a The length of the current step in G1 terms -- that is, in terms of G1 Earley sets. ``` -- miranda: section+ VM operations function op_fn_push_g1_length(recce) local values = recce:values() local next_ix = marpa.sv.top_index(values) + 1; values[next_ix] = (recce.v.step.es_id - recce.v.step.start_es_id) + 1 return -2 end ``` 20c9eb68722cd214070737fa5ca649a5 ``` -- miranda: section+ VM operations function op_fn_push_constant(recce, constant_ix) local constants = recce:constants() -- io.stderr:write('constants: ', inspect(constants), "\n") -- io.stderr:write('constant_ix: ', constant_ix, "\n") -- io.stderr:write('constants top ix: ', marpa.sv.top_index(constants), "\n") e5b1392dc0c6abacb78b2d015aba714f ``` 24ea833cc539cfcdcd2eaa2cc836ac32 The blessing is registered in a constant, and this operation lets the VM know its index. The index is cleared at the beginning of every sequence of operations ``` -- miranda: section+ VM operations function op_fn_bless(recce, blessing_ix) recce.v.step.blessing_ix = blessing_ix return -2 end ``` c9752cce1826bbf8bd52c0d93bc788b6 This operation tells the VM that the current `values` array is the result of this sequence of operations. ``` -- miranda: section+ VM operations function op_fn_result_is_array(recce) local blessing_ix = recce.v.step.blessing_ix local values = recce:values() if blessing_ix then local constants = recce:constants() local blessing = constants[blessing_ix] marpa.sv.bless(values, blessing) end local stack = recce.v.stack local result_ix = recce.v.step.result stack[result_ix] = values return -1 end ``` e38e80875d78f9d94f7aea75359401de Tells the VM to create a callback to Perl, with the `values` array as an argument. The return value of 3 is a vestige of an earlier implementation, which returned the size of the `values` array. ``` -- miranda: section+ VM operations function op_fn_callback(recce) local step_type = recce.v.step.type if step_type ~= 'MARPA_STEP_RULE' and step_type ~= 'MARPA_STEP_NULLING_SYMBOL' then io.stderr:write( 'Internal error: callback for wrong step type ', step_type ) os.exit(false) end local blessing_ix = recce.v.step.blessing_ix if blessing_ix then local values = recce:values() local constants = recce:constants() local blessing = constants[blessing_ix] marpa.sv.bless(values, blessing) end return 3 end ``` 73a6d1e740a3922624acc288aa423f48 ``` -- miranda: section+ VM operations function do_ops(recce, ops) local op_ix = 1 while op_ix <= #ops do local op_code = ops[op_ix] if op_code == 0 then return -1 end if op_code ~= op_lua then end -- io.stderr:write('op_code: ', inspect(op_code), '\\n') -- io.stderr:write('op_lua: ', inspect(op_lua), '\\n') local fn_key = ops[op_ix+1] -- io.stderr:write('ops: ', inspect(ops), '\\n') -- io.stderr:write('fn_key: ', inspect(fn_key), '\\n') -- io.stderr:write('fn name: ', recce.op_fn_key[fn_key], '\\n') local arg = ops[op_ix+2] -- io.stderr:write('arg: ', inspect(arg), '\\n') if recce.trace_values >= 3 then local queue = recce.trace_values_queue local tag = 'starting lua op' queue[#queue+1] = {'starting op', recce.v.step.type, 'lua'} queue[#queue+1] = {tag, recce.v.step.type, recce.op_fn_key[fn_key]} -- io.stderr:write('starting op: ', inspect(recce)) end op_fn = recce[fn_key] result = op_fn(recce, arg) if result >= -1 then return result end op_ix = op_ix + 3 end return -1 end ``` 6140eb2bf9dee8f905549d49d3ea6377 Determine the appropriate VM operations for this step, and perform them. Return codes are c774493032c8e76e2ac94bf262124d1c The mnemonic for these codes is that they represent the size of the list returned to Perl, with "trace" and "do not return" being special cases. ``` -- miranda: section+ VM operations function find_and_do_ops(recce) local ops = {} recce:step() if recce.v.step.type == 'MARPA_STEP_INACTIVE' then return 0 end if recce.v.step.type == 'MARPA_STEP_RULE' then ops = recce.rule_semantics[recce.v.step.rule] if not ops then ops = recce.rule_semantics.default end goto DO_OPS end if recce.v.step.type == 'MARPA_STEP_TOKEN' then ops = recce.token_semantics[recce.v.step.symbol] if not ops then ops = recce.token_semantics.default end goto DO_OPS end if recce.v.step.type == 'MARPA_STEP_NULLING_SYMBOL' then ops = recce.nulling_semantics[recce.v.step.symbol] if not ops then ops = recce.nulling_semantics.default end goto DO_OPS end if true then return 1 end ::DO_OPS:: if not ops then error(string.format('No semantics defined for %s', recce.v.step.type)) end local do_ops_result = do_ops(recce, ops) local stack = recce.v.stack -- truncate stack local above_top = recce.v.step.result + 1 for i = above_top,#stack do stack[i] = nil end if do_ops_result > 0 then return 3 end if #recce.trace_values_queue > 0 then return -1 end return -2 end ``` 794c0d2d40add6139468882ae5dbcaf1 The following operations are used by the higher-level Perl code to set and discover various Lua values. 0024e3157be76a7c016404cb7d408f19 ``` -- miranda: section Utilities for Perl code function get_op_fn_key_by_name(recce, op_name_sv) local op_name = tostring(op_name_sv) return recce.op_fn_key[op_name] end ``` 8a24df87bd9b12aa7ec6c4cddddbb11e ``` -- miranda: section+ Utilities for Perl code function get_op_fn_name_by_key(recce, op_key_sv) local op_key = op_key_sv + 0 return recce.op_fn_key[op_key] end ``` ba87cf8544db8bb5e9bcdb3dd936c060 Register a constant, returning its key. ``` -- miranda: section+ Utilities for Perl code function constant_register(recce, constant_sv) local constants = recce:constants() local next_constant_key = marpa.sv.top_index(constants) + 1 constants[next_constant_key] = constant_sv return next_constant_key end ``` f1c0d0d498b68bbd08dfdaf213f231ff Register the semantic operations, `ops`, for the token whose id is `id`. ``` -- miranda: section+ Utilities for Perl code function token_register(...) local args = {...} local recce = args[1] local id = args[2]+0 local ops = {} for ix = 3, #args do ops[#ops+1] = args[ix]+0 end recce.token_semantics[id] = ops end ``` 51dda6c2d8175b580a563748f50efaa2 Register the semantic operations, `ops`, for the nulling symbol whose id is `id`. ``` -- miranda: section+ Utilities for Perl code function nulling_register(...) local args = {...} local recce = args[1] local id = args[2]+0 local ops = {} for ix = 3, #args do ops[#ops+1] = args[ix]+0 end recce.nulling_semantics[id] = ops end ``` e1bc5b0da11d1f83680c5c20e585e76d Register the semantic operations, `ops`, for the rule whose id is `id`. ``` -- miranda: section+ Utilities for Perl code function rule_register(...) local args = {...} local recce = args[1] local id = args[2]+0 local ops = {} for ix = 3, #args do ops[#ops+1] = args[ix]+0 end recce.rule_semantics[id] = ops end ``` 657ce8475e95a259230444aeea05d9c3 ``` -- miranda: section+ Utilities for Perl code function stack_top_index(recce) return recce.v.step.result end ``` f776cb87d6732a84fa565bcb393aefec ``` -- miranda: section+ Utilities for Perl code function stack_get(recce, ix) local stack = recce.v.stack return stack[ix+0] end ``` 5d2cf9e4a56c7aa8cd3b2600a5bbc0c5 ``` -- miranda: section+ Utilities for Perl code function stack_set(recce, ix, v) local stack = recce.v.stack stack[ix+0] = v end ``` 93bd16199227703f4bb77f151470ab35 The "valuator" portion of Kollos produces the value of a Kollos parse. f68f883f3bab97e21a4e28c792ff41fc Called when a valuator is set up. ``` -- miranda: section value_init() 2674ae7e02b665352a038689838411a1 ``` e0b664f8a57aa6845fb38fa50f79792b A function to be called whenever a valuator is reset. ``` 2a6a5a35b18fed4f18b29869200ac0c6 ``` 0bb205e52ab5d9d6f92cfbf378ea5a5c ``` -- miranda: section main -- miranda: insert preliminaries to main -- miranda: insert VM operations -- miranda: insert value_init() -- miranda: insert value_reset() -- miranda: insert Utilities for Perl code 1d4f7117746e39d7c6b8c5915a3e863b ``` 008c8d0e2055f208839d69c9a1299458 Licensing, etc. ``` ecb25d23c5db0f269c2310a1776e5ac5 ``` b9aa31f9c830844a9f7ecc8b53ff2e58 ``` -- miranda: section kollos_c -- miranda: language c -- miranda: insert preliminaries to the c library code -- miranda: insert kollos Lua library -- miranda: insert Lua interpreter management /* vim: set expandtab shiftwidth=4: */ ``` ``` -- miranda: section kollos Lua library static int marpa_luaopen_kollos(lua_State *L) { /* Create the main kollos object */ marpa_lua_newtable(L); /* [ kollos ] */ return 1; } ``` c1dc4fef417f78f9c22d924760102c3e ``` -- miranda: section preliminaries to the c library code /* ** Permission is hereby granted, free of charge, to any person obtaining ** a copy of this software and associated documentation files (the ** "Software"), to deal in the Software without restriction, including ** without limitation the rights to use, copy, modify, merge, publish, ** distribute, sublicense, and/or sell copies of the Software, and to ** permit persons to whom the Software is furnished to do so, subject to ** the following conditions: ** ** The above copyright notice and this permission notice shall be ** included in all copies or substantial portions of the Software. ** ** THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, ** EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF ** MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. ** IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY ** CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, ** TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE ** SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ** ** [ MIT license: http://www.opensource.org/licenses/mit-license.php ] */ 6f532d11c2dd03308f7b1a7755dbb97f ``` 3e8cf17bd7fc69b11cbb03bc92fa958a ``` -- miranda: section kollos_h -- miranda: language c -- miranda: insert preliminary comments of the c header file 63307eac5875bc8ac69a0b2da5e32644 ``` 6cde91623079bafdce3fae120cee966a ``` -- miranda: section preliminary comments of the c header file 15bec93404073dd1e5ce3a24f9ac1ebd ```