Both prioritize finding a match as early in the haystack as possible.
Then, the second thing to consider is how long the match is. Given a
choice in the matching (from e1|e2 or e* etc),
the longest semantics picks choices such that the final match is the
longest possible. For greedy, we pick choices such that will lead to the
match the quickest.
If e can match the empty string, iterations of
e{0,n} cannot match the empty string.
After each quantifier iteration, the values of all capture groups
inside are reset to undefined. So /(?:(a)|b)*/ on “ab” will
match the entirety of “ab”, but the group (a) will be
undefined.
Then the backreference matches the empty string. This applied to both backreferences to a group which did not match anything preceding the reference, and to a group following the reference.
Getting started with Linden requires installing Warblre, would be good to include in the README. Or, if possible make the dune project reference the github repo with a pinned version
opam pin add warblre https://github.com/LindenRegex/Warblre.git
Does the backtracking tree allow to reason about leftmost longest semantics as well?
lk_result) returns the groups of the first branch that
matched.Why the nullable quantifiers are only for
min = 0?
Why does the Warblre matching function return the end index only? What about the start? -> see below the “full” algorithm of matching
Setup VSCode to work with Rocq, talk with Victor
Pin the Warblre version in Linden and link it to the github URL
Continue reading the paper
Start working on anchors in PikeVM
Do the exercise from branch aurele/exercise
Currently, matching is always anchored at some position in the haystack, and the general matching algorithm is defined as:
for i in 0..len(s):
if anchored_match(r, s[i:]):
return Matched
We want to establish an equivalence between the above algorithm an a
rewrite of the regex into .*?(r)_#0: the regex is wrapped
in a group indexed zero and is prefixed by a lazy dot-star.
Once this is proven, we can prove that the prefix acceleration is also equivalent to this algorithm.