r{min,delta,greedy}The problem with general quantifiers is that their compilation
expands the regex min amount of times in the bytecode. With
an example such as ((a{20}){20}){20}, this generates 20^3
instructions.
Implementing general quantifiers in the PikeVM can be approached from two perspectives, either,
? and ?? quantifiers to the PikeVM
mechanization and prove equivalence between general
quantifiers and a regex AST rewrite + usage of
*/*?/?/??,
orr min times, then doing either of
*/*?/?/??.Since the complexity proof of linearity is in terms of the bytecode length, the second approach would work well. Otherwise, the first approach requires incorporating a separate AST rewrite step. It will be easy to incorporate in proofs if it is done in the compile function. That might be also good grounds for supporting proofs of AST rewrites for optimization.
Adding anchors to the PikeVM was a straightforward change. The following changes were needed:
Add a CheckAnchor bytecode instruction and define
its semantics
PikeVM’s small step semantics were extended to process this
instruction. This is done by handling the CheckAnchor
instruction when computing the epsilon step of a thread. When handling
it, we check if the anchor is satisfied by using the
anchor_satisfied function. It is exactly the same check
performed by the backtracking tree semantics.
Compile the Anchor AST node to a
CheckAnchor instruction. The next instruction to be
executed is stored at the program counter following
CheckAnchor’s.
Add Anchor to the subset of regexes the PikeVM
supports. Add AnchorPass to the subset of backtracking
trees the PikeVM supports.
Handle AnchorPass in the PikeTree algorithm by
simply continuing exploration of the backtracking tree.
UNCLEAR WHAT THIS IS: Add anchor support to
tree_rep in TreeRep.v
State that a representation of the bytecode can start with a CheckAnchor instruction.
Finally, update all proofs to handle the new cases.