Collected notes

StrictOrder solver

As a collaborative effort of the \pi room, we have completed the implementation of a StrictOrder (a relation that is both transitive and irreflexive) solver. It finds proofs by looking for paths where cycles lead to a contradiction.

It has been made public here: https://github.com/epfl-systemf/StrictOrderSolver

Highlights:

Rust regex “filtering”

I continued looking at the generated assembly of the new prefix acceleration strategy to try to find the cause for that 2% performance regression. My understanding is that we very unfortunately hit a limit in some optimization heuristic of the Rust compiler causing one if statement to be compiled differently causing the regression.

I spent a lot of time experimenting with different implementations that would maybe avoid this issue, but none of them worked. Instead, I opted to parametrize the function with a const generic (think of it as a C++ template argument) that would decide which prefix acceleration strategy is used. Then, the compiler will monomorphise (or “specialize”) the function to the two possible strategies, generating optimal code for both. The drawback is that the code looks a bit weird and it generates a slightly larger binary.

The PR to the regex crate is here: https://github.com/rust-lang/regex/pull/1339

Reverse direction PikeVM

I added to the PikeVM a parameter describing the direction in which the PikeVM reads the input. Then, I proved a result connecting the forward and backward direction of the PikeVM:

Lemma pikevm_step_reverse :
  forall c dir pvs pvs_next1 pvs_next2,
    pike_vm_step c dir pvs pvs_next1 ->
    pike_vm_step c (direction_reverse dir) (pvs_reverse pvs) pvs_next2 ->
    pvs_next1 = pvs_reverse pvs_next2.

This will be useful because for lookarounds, we will want to run the PikeVM on a reversed regex and a reversed input. But reversing an input is an expensive operation (imagine reading a 1GB file, reversing it would allocate another 1GB and would take linear time). So instead, we will simply run the PikeVM in the backward direction. Per the lemma, this will return a result that is the same (modulo some mapping at the end) as if we ran the PikeVM forward on the reversed input.

Lookaround phases

To prove lookarounds there will be three phases for the formalization

  1. Add oracle queries to the PikeVM and a one-step lookaround check for PikeTree. We then relate both Pike algorithms. The correctness theorems have to be changed to say that the group map values of the final result is not correct for groups inside of lookarounds
  2. Create the oracles. This requires extending the PikeVM with something that can find all possible end positions for matches
  3. Reconstruct lookaround groups. After we are done matching, we know at which positions the lookarounds matched and thus we re-run the PikeVM at those positions to find the capture group values

To discuss

Action items