Collected notes

Porting Warblre to Rocq 9

Firstly, I must say, it is very nice that “Coq” is not a substring of “Rocq”. A grep for “coq” returns only things that still need to be migrated.

It is quite unfortunate how big an undertaking renaming Coq to Rocq is. Anyway, to port Linden to Rocq (which should be a very small effort), we must first port Warblre. I have sent a PR: https://github.com/LindenRegex/Warblre/pull/6. There are a few blockers, the main one being that alectryon uses SerAPI, which does not work with Rocq 9. Rocq 9 seems to have fixed quite a few bugs that Warblre had hit, which allowed us to clear some LATERs.

□ of the prefix accelerated PikeVM!

PikeTree acceleration

Previously, the PikeTree was missing a step analogous to PikeVM’s _acc step (one that jumps to the next input position where the prefix matches). This step has now been added:

| pts_acc:
  (* if there are no more active or blocked trees and we have some nextt, *)
  (* we accelerate by non-deterministically skipping branches with no results *)
    forall inp best seen nextinp nextt acc t
      (ACC: pike_tree_acc inp nextt nextinp acc t),
      pike_tree_step (PTS inp [] best [] (Some nextt) seen) (PTS nextinp [pike_tree_initial_tree t] best [] (Some acc) initial_seentrees)

pike_tree_acc non-deterministically skips head branches that do not contain a match. It is defined as follows:

Inductive pike_tree_acc : input -> tree -> input -> tree -> tree -> Prop :=
| acc_keep:
    forall inp c next pref nextt t1 t2
    (INPUT: inp = Input (c::next) pref)
    (NEXTT: nextt = Read c
      (Progress
        (Choice
          t1
          (GroupAction (Reset []) t2)))
    ),
    pike_tree_acc inp nextt (Input next (c::pref)) t2 t1
| acc_skip:
    forall inp c next pref nextt t1 t2 nextinp acc t
    (INPUT: inp = Input (c::next) pref)
    (NEXTT: nextt = Read c
      (Progress
        (Choice
          t1
          (GroupAction (Reset []) t2)))
    )
    (LEAF: first_leaf t1 (Input next (c::pref)) = None)
    (TRANS: pike_tree_acc (Input next (c::pref)) t2 nextinp acc t),
    pike_tree_acc inp nextt nextinp acc t.

The transitive rule (acc_skip) can be used only after skipping a head branch: we do not want to skip branches in the middle of the tree. In the end, nextinp is the input position to which we accelerated, acc is the new nextt, and t is the backtracking tree of r at nextinp.

The proof that the _acc step preserves the PikeTree invariant requires induction over pike_tree_acc. We relate pts_acc to the PikeVM by skipping exactly nextprefix head branches. Since nextprefix indicates that there is no prefix at the next nextprefix input positions, we know that first_leaf is None for each skipped branch.
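
One way to picture "skipping exactly nextprefix head branches" is a deterministic toy function in Rocq. This is only a sketch under simplifying assumptions: toy_tree, TNext and skip_heads are hypothetical names that do not exist in Linden, TNext t1 t2 stands in for Read c (Progress (Choice t1 (GroupAction (Reset []) t2))), inputs are dropped entirely, and the first_leaf side condition of acc_skip is left out.

```coq
(* Hypothetical toy model, not the Linden definitions. *)
Inductive toy_tree : Type :=
| TMismatch : toy_tree
| TMatch : toy_tree
| TNext : toy_tree -> toy_tree -> toy_tree.
(* TNext t1 t2 abbreviates
   Read c (Progress (Choice t1 (GroupAction (Reset []) t2))). *)

(* Skip n head branches, then keep the next one.
   Returns (t, acc): the kept branch and the remaining nextt. *)
Fixpoint skip_heads (n : nat) (nextt : toy_tree) {struct n}
  : option (toy_tree * toy_tree) :=
  match nextt with
  | TNext t1 t2 =>
      match n with
      | O => Some (t1, t2)          (* corresponds to acc_keep *)
      | S n' => skip_heads n' t2    (* corresponds to acc_skip *)
      end
  | _ => None                       (* nextt does not have the expected shape *)
  end.

(* Skipping one head branch lands on the second one. *)
Example skip_one :
  skip_heads 1 (TNext TMismatch (TNext TMatch TMismatch))
  = Some (TMatch, TMismatch).
Proof. reflexivity. Qed.
```

In this reading, skip_heads n plays the role of n uses of acc_skip followed by one acc_keep. The real relation additionally demands that every skipped t1 has no first leaf, and the exact correspondence between n and nextprefix is an assumption of the sketch.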

Additionally, the pts_final rule had to be updated:

| pts_final:
  (* moving to a final state when there are no more active or blocked trees *)
    forall inp best nextt seen
      (LEAF: option_flat_map (fun t => first_leaf t inp) nextt = None),
      pike_tree_step (PTS inp [] best [] nextt seen) (PTS_final best)

This rule should be used only when nextt contains no results. We cannot simply say that the rule applies whenever nextt = None, as in the PikeVM. The reason is that nextprefix = None -> nextt = None does not hold; more details follow in the discussion of the PikeEquiv invariant.
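
To see why the LEAF premise generalizes the condition nextt = None, here is a minimal sketch. It assumes that option_flat_map is the usual bind on option; the definition below is a stand-in written for this note and may differ from the one used in the development.

```coq
(* Assumed definition: the usual monadic bind on option. *)
Definition option_flat_map {A B : Type}
  (f : A -> option B) (o : option A) : option B :=
  match o with
  | Some a => f a
  | None => None
  end.

(* nextt = None always satisfies the premise, whatever f is. *)
Lemma option_flat_map_none :
  forall (A B : Type) (f : A -> option B),
    option_flat_map f None = None.
Proof. intros. reflexivity. Qed.

(* nextt = Some a also satisfies it, as long as a yields no leaf. *)
Lemma option_flat_map_some_none :
  forall (A B : Type) (f : A -> option B) (a : A),
    f a = None -> option_flat_map f (Some a) = None.
Proof. intros A B f a H. unfold option_flat_map. exact H. Qed.
```

With f instantiated to fun t => first_leaf t inp, the first lemma covers the PikeVM-like case nextt = None, and the second covers a Some nextt that no longer contains any result.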

PikeEquiv invariant

We want to prove that for every execution of the PikeVM, there exists an execution of the PikeTree that preserves some invariant. Until now, the invariant described how to relate active threads to active trees, blocked threads to blocked trees, etc. Now we must also relate nextt to nextprefix. The intuitive idea is that we want to say two things:

The invariant is split into five cases:

  1. When nextprefix = None, nextt = None

  2. When nextprefix = None, nextt = Some Mismatch (we exhausted nextt)

  3. When nextprefix = None, nextt = Some (Read -> Choice t1 t2) where t1 contains no results

    Cases 2 and 3 are needed because the PikeTree never turns a Some nextt into a None. It skips head branches one by one. On the other hand, if the PikeVM does not find the next position of a prefix, nextprefix turns into None.

  4. When nextprefix = Some 0, nextt = Some (Read -> Choice t1 t2) where t1 and t2 have the correct shape

  5. When nextprefix = Some (S n), nextt = Some (Read -> Choice t1 t2) where t2 has the correct shape and t1 contains no results
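
The five cases can be sketched as an inductive relation between nextprefix and nextt. Everything in this section is a hypothetical placeholder: tree is abstract, next_shape nextt t1 t2 stands for "nextt = Read -> Choice t1 t2", and no_results, correct_head and correct_tail stand for the "contains no results" and "correct shape" conditions, which in the real invariant also depend on the input and the regex.

```coq
Section NexttInvariantSketch.
  (* Hypothetical placeholders, not the Linden definitions. *)
  Variable tree : Type.
  Variable Mismatch : tree.
  Variable next_shape : tree -> tree -> tree -> Prop.
  Variable no_results : tree -> Prop.
  Variable correct_head : tree -> Prop.
  Variable correct_tail : tree -> Prop.

  Inductive nextt_inv : option nat -> option tree -> Prop :=
  | inv_none :                          (* case 1 *)
      nextt_inv None None
  | inv_exhausted :                     (* case 2 *)
      nextt_inv None (Some Mismatch)
  | inv_dead :                          (* case 3 *)
      forall nextt t1 t2,
        next_shape nextt t1 t2 -> no_results t1 ->
        nextt_inv None (Some nextt)
  | inv_here :                          (* case 4 *)
      forall nextt t1 t2,
        next_shape nextt t1 t2 -> correct_head t1 -> correct_tail t2 ->
        nextt_inv (Some 0) (Some nextt)
  | inv_later :                         (* case 5 *)
      forall n nextt t1 t2,
        next_shape nextt t1 t2 -> no_results t1 -> correct_tail t2 ->
        nextt_inv (Some (S n)) (Some nextt).
End NexttInvariantSketch.
```

One constructor per case keeps the preservation proof a plain case analysis, at the cost of three separate constructors for nextprefix = None.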

This invariant is proven to hold at the initial state and to be preserved by every step. This concludes the proof of correctness of the prefix accelerated PikeVM!

Look-arounds?

Part of my thesis was supposed to be about formalizing look-arounds in the PikeVM. There is a bit less than 2 months left in my thesis. I will dedicate a day to thinking about what kind of lemmas we need (https://github.shilangyu.dev/pikevm-systemf/meeting-notes/2025_11_12.html#look-arounds-in-pikevm) and to seeing whether there is some self-contained part that can be tackled. It is unlikely that I can complete the entire formalization in the time that is left.

Instead, I will most likely focus on improving the meta engine. This will keep the topic of the thesis more contained. Here are possible next extensions:

PRs to Linden

Having finished the first big proof of a prefix accelerated PikeVM, we feel more secure about the changes I introduced and we can start merging my work into Linden. The work is split into three parts:

  1. Substring searches, literal extraction and their relation to regex matching
  2. Small meta engine (running prefix acceleration once at the start, handling the Impossible literal)
  3. Prefix accelerated PikeVM and its correctness proof

The first two have now been prepared and submitted for merging:

  1. https://github.com/LindenRegex/Linden/pull/15
  2. https://github.com/LindenRegex/Linden/pull/16

Before the third part is merged, we want to explore some possible refactorings of the proof, mainly around the invariant in PikeEquiv.

To discuss

Action items