2025_11

Collected notes

PLMW & POPL

POPL 2026 happens in Rennes, which is not that far from Lausanne. Given that I am at the end of my Masters, I believe attending POPL now is the perfect timing. What’s more, PLMW which focuses on mentoring people trying to find their way in the PL space is the exact experience I need. My main issue is that it happens at the end of my thesis, which I imagine will be the most stressful period. I am afraid attending POPL might not be responsible due to this. On the other hand, losing a week of work on my thesis compared to potentially figuring out what to do in my future seems like an easy sacrifice to make.

I have applied to Student Volunteering (which would cover the cost of the conference ticket (~1000 euro!)) and to PLMW funding (which would cover the conference and PLMW ticket).

PikeTree empty trees

To help proving equivalence between the (dot-starred) PikeTree and the prefixed PikeVM, we add a new rule to the PikeTree which allows it to skip trees that do not have any result in them.

This has to maintain the invariant that when the PikeTree performs a step, the final result of the algorithm does not change. Proving so is simple, as it only requires to argue that the result of the PikeTree on some active trees (t, gm) :: rest is the same as the result on rest whenever first_leaf t inp = None.

This extra PikeTree rule should make it easier to argue the prefixed PikeVM which also skips input positions. At these positions inp we know ~ starts_with (prefix (extract_literal r)) inp and thus first_leaf (compute_tree r) inp = None.

Look-arounds in PikeVM

Using ideas from https://aurele-barriere.github.io/papers/linearjs.pdf.

Matching look-arounds is done in 3 stages:

Construct the oracle table which tells us at which position the look-around matches
Match the main expression by consulting the table
Reconstruct capture groups

Look-around matching positions

The goal of the first phase is to find all positions where the look-around matches. To do so we construct a set of input positions P(\ell_i) where the look-around \ell_i matches.

For look-behinds we need to know the ending positions (what is the position of the end of a match of a look-behind)
For look-aheads we need to know the starting positions (what is the position of the start of a match of a look-ahead)

Remember that we wish to preserve linear time complexity. Thus to construct a set P(\ell_i) we cannot run \ell_i at every input position (that takes O(|r| \cdot |s|^2)). We need to differently find all overlapping matches. To do so we will compile each look-around \ell_i separately while replacing it’s Accept instruction with WriteLK i, and replacing it’s inner look-arounds \ell_j with a CheckLK j. Let compileLK \ell_i be such a compilation of a look-around.

We now have two approaches to use the compiled look-arounds to compute all P(\ell_i):

- For look-behinds run compileLK \ell_i prefixed by /.*?/. Upon WriteOracle i, extend P(\ell_i) with the current position (ending position of the look-behind).
- For look-aheads run compileLK rev(\ell_i) prefixed by /.*?/ on the reversed haystack. Upon WriteOracle i, extend P(\ell_i) with the current position (starting position of the look-ahead).
This option allows to skip formalization of rev:
- For look-behinds run compileLK \ell_i wrapped by a new group with index k prefixed by /.*?/. Upon WriteOracle i, extend P(\ell_i) with the current position (ending position of the look-behind), but also store on the side the start position of the group k.
- For look-aheads run compileLK \ell_i wrapped by a new group with index k prefixed by /.*?/. Upon WriteOracle i, extend P(\ell_i) with the start position of group k starting position of the look-ahead), but also store on the side the current position.

We repeat the above for every look-around. Collected group maps are discarded. Now all P(\ell_i) are filled. It is important however to fill P for more nested look-arounds first. The P’s of more inner look-arounds might be queried with CheckLK by their outer look-arounds. A given query CheckLK i succeeds if:

pos(\ell_i) = positive and the current position is in P(\ell_i), or
pos(\ell_i) = negative and the current position is not in P(\ell_i),

where pos(\ell_i) maps a look-around to it’s positivity, that is both /(?=)/ and /(?<=)/ are positive while /(?!)/ and /(?<!)/ are not.

Formalization

Reverse of regexes
Reverse of inputs === backwards direction
forall p in P, r matches at p

Main execution

We run the main regex with top-level look-arounds replaced with CheckLK as described before. We additionally note in last(\ell_i) at which position we last executed CheckLK for each look-around. In the second approach we also store the other position saved on the side.

Formalization

This is quite analogous to step 1, maybe can be generalized to share the idea?

Reconstructing capture groups

Thanks to the capture reset property, we use last(\ell_i) to see where to rerun the look-around this time saving the capture groups. This can be skipped for look-arounds that contain no groups. Depending on the chosen strategy from step 1 we:

- For look-behinds last(\ell_i) is the end position. So we run rev(\ell_i) from position last(\ell_i) on the reversed input
- For look-aheads last(\ell_i) is the start position. So we run \ell_i from position last(\ell_i) on the input
- For look-behinds last(\ell_i) has both the start and end position. So we run \ell_i from position the start position of last(\ell_i) on the input
- For look-aheads last(\ell_i) has both the start and end position. So we run \ell_i from position the start position of last(\ell_i) on the input

While doing the above we might need to execute CheckLK again for more inner look-arounds. We again save the positions in last and reconstruct groups for the inner look-arounds as well.

The produced group maps are merged together with the main regex group map. Step 2 determined the match range, step 3 produces the final group map and thus we matched the entire regex.

Formalization

Capture reset
Running those look-arounds always succeeds
We might need to require that in the regex each group has a unique index

The second strategy is broken

Yeah. No work. Consider /.*?(a*)/. On the string “aaaa” this should fill P with all indices since it matches everywhere. What will be the starting indices though? It will be always say it started one character before. We must record the leftmost first, which this method does not capture.

Linden CI

A barebones CI added to Linden to check if every proof passes: https://github.com/LindenRegex/Linden/pull/13

To discuss

Streaming alg for look-behinds with captures? Just log the group map on every match of the look-behind? -> No, consider /(?<=(a*)(a*))t/ on “aaaaat”. This would register the first group as “aaaaa” and the second as ““. However, the correct semantics is to have the first group be”” and the second “aaaaa”.
we can completely remove input_search from the PikeVM. Using the generic algorithm with prefix acc works! Yes? -> well, yes. But it would be quite difficult to formalize what is a PikeVM that starts a thread at every position but stops working once we run out of threads

Action items

Write down which theorems are needed for the look-around formalization
Implement the new prefix accelerated PikeVM proof idea (more on it in the next weekly notes)