2025_09

Collected notes

Leftmost longest vs leftmost greedy

Both prioritize finding a match as early in the haystack as possible. Then, the second thing to consider is how long the match is. Given a choice in the matching (from e1|e2 or e* etc), the longest semantics picks choices such that the final match is the longest possible. For greedy, we pick choices such that will lead to the match the quickest.

Nullable quantifiers

If e can match the empty string, iterations of e{0,n} cannot match the empty string.

Capture reset

After each quantifier iteration, the values of all capture groups inside are reset to undefined. So /(?:(a)|b)*/ on “ab” will match the entirety of “ab”, but the group (a) will be undefined.

Backereferences to a group that did not match anything

Then the backreference matches the empty string. This applied to both backreferences to a group which did not match anything preceding the reference, and to a group following the reference.

To discuss

Getting started with Linden requires installing Warblre, would be good to include in the README. Or, if possible make the dune project reference the github repo with a pinned version

opam pin add warblre https://github.com/LindenRegex/Warblre.git
Does the backtracking tree allow to reason about leftmost longest semantics as well?
- Although, look-arounds in the semantics (namely lk_result) returns the groups of the first branch that matched.
Why the nullable quantifiers are only for min = 0?
Why does the Warblre matching function return the end index only? What about the start? -> see below the “full” algorithm of matching

Action items

Setup VSCode to work with Rocq, talk with Victor
Pin the Warblre version in Linden and link it to the github URL
Continue reading the paper
Start working on anchors in PikeVM
Do the exercise from branch aurele/exercise
Currently, matching is always anchored at some position in the haystack, and the general matching algorithm is defined as:
```
for i in 0..len(s):
    if anchored_match(r, s[i:]):
        return Matched
```
We want to establish an equivalence between the above algorithm an a rewrite of the regex into .*?(r)_#0: the regex is wrapped in a group indexed zero and is prefixed by a lazy dot-star.

Once this is proven, we can prove that the prefix acceleration is also equivalent to this algorithm.