2025_10

Collected notes

PikeVM bytecode optimizer

when generating r* we do a ResetRegs. We can remove all ResetRegs where r has no groups.
when generating r* when the minimum matching length of r is 1 or larger, the BeginLoop and EndLoop is not needed (the boolean check will always succeed)

Prefix acceleration once with abstract engines

We define what it means to be an engine. It simply should accept a regex and an input, and return the result of the matching. The only axiom it must follow is that the returned result from the execution is the same as the one from the backtracking tree semantics first_leaf. We additionally allow engines to constrain the supported subset of regexes (for instance, the PikeVM does not support backreferences).

Then, we prove that the generic algorithm of

for i in 0..len(input):
    if match(r, input[i..]):
        return result

gives exactly the same result as

start_pos = input_search(extract_literal(r), input)
if start_pos == None:
    return None

for i in start_pos..len(input):
    if match(r, input[i..]):
        return result

Namely, this talks about allowing the skip some initial input, but also talks about if the literal is not present in the input, then we can skip the search all together.

This proof works for any engine having the match function!

Future work: an additional proof that the for i algorithm is equivalent to just running the .*?r regex (which, might be significantly more performant in some cases) yields the proof that we can run the engine just once.

Implementation of the PikeVM algorithm with prefix acceleration

The previous result is generic to any engine. But it means we can run prefix acceleration only a single time. To run it more than once, we must tightly integrate with the way the PikeVM works.

The idea is two fold:

Simulate the .*? directly in the PikeVM rather than actually compiling .*?
Whenever the active/blocked threads of the PikeVM are in a favorable arrangement, run the prefix acceleration

(see my paper notes in the drawer for a more in depth exploration of this topic)

To discuss

Computing the fuel needed for the functional version, shouldn’t this be the number of steps proved in Complexity? -> Yes, we leave it as an axiom for now. Will be proven later.

Action items

Clean up and finish proof about running prefix acc once
Try to add a proof that lit = Impossible means that there is no match
Implement a non-deterministic tree that simulates .*?r. Then prove that there exists a tree of this sort that computes this .*?r. At a given position when starts_with is true, it should always try r. But if it is false, it should non-deterministically try r or not. Every tree should result in the same first_leaf result.
Start thinking about look-arounds in the PikeVM