POPL 2026 happens in Rennes, which is not that far from Lausanne. Given that I am at the end of my Masters, I believe attending POPL now is the perfect timing. What’s more, PLMW which focuses on mentoring people trying to find their way in the PL space is the exact experience I need. My main issue is that it happens at the end of my thesis, which I imagine will be the most stressful period. I am afraid attending POPL might not be responsible due to this. On the other hand, losing a week of work on my thesis compared to potentially figuring out what to do in my future seems like an easy sacrifice to make.
I have applied to Student Volunteering (which would cover the cost of the conference ticket (~1000 euro!)) and to PLMW funding (which would cover the conference and PLMW ticket).
To help proving equivalence between the (dot-starred) PikeTree and the prefixed PikeVM, we add a new rule to the PikeTree which allows it to skip trees that do not have any result in them.
This has to maintain the invariant that when the PikeTree performs a
step, the final result of the algorithm does not change. Proving so is
simple, as it only requires to argue that the result of the PikeTree on
some active trees (t, gm) :: rest is the same as the result
on rest whenever first_leaf t inp = None.
This extra PikeTree rule should make it easier to argue the prefixed
PikeVM which also skips input positions. At these positions
inp we know
~ starts_with (prefix (extract_literal r)) inp and thus
first_leaf (compute_tree r) inp = None.
Using ideas from https://aurele-barriere.github.io/papers/linearjs.pdf.
Matching look-arounds is done in 3 stages:
The goal of the first phase is to find all positions where the look-around matches. To do so we construct a set of input positions P(\ell_i) where the look-around \ell_i matches.
Remember that we wish to preserve linear time complexity. Thus to construct a set P(\ell_i) we cannot run \ell_i at every input position (that takes O(|r| \cdot |s|^2)). We need to differently find all overlapping matches. To do so we will compile each look-around \ell_i separately while replacing it’s Accept instruction with WriteLK i, and replacing it’s inner look-arounds \ell_j with a CheckLK j. Let compileLK \ell_i be such a compilation of a look-around.
We now have two approaches to use the compiled look-arounds to compute all P(\ell_i):
/.*?/.
Upon WriteOracle i, extend P(\ell_i) with the current position (ending
position of the look-behind)./.*?/ on the reversed haystack. Upon WriteOracle
i, extend P(\ell_i) with the
current position (starting position of the look-ahead).rev:
/.*?/. Upon
WriteOracle i, extend P(\ell_i) with the current position (ending
position of the look-behind), but also store on the side the start
position of the group k./.*?/. Upon
WriteOracle i, extend P(\ell_i) with the start position of group
k starting position of the look-ahead),
but also store on the side the current position.We repeat the above for every look-around. Collected group maps are discarded. Now all P(\ell_i) are filled. It is important however to fill P for more nested look-arounds first. The P’s of more inner look-arounds might be queried with CheckLK by their outer look-arounds. A given query CheckLK i succeeds if:
where pos(\ell_i) maps a
look-around to it’s positivity, that is both /(?=)/ and
/(?<=)/ are positive while /(?!)/ and
/(?<!)/ are not.
We run the main regex with top-level look-arounds replaced with CheckLK as described before. We additionally note in last(\ell_i) at which position we last executed CheckLK for each look-around. In the second approach we also store the other position saved on the side.
Thanks to the capture reset property, we use last(\ell_i) to see where to rerun the look-around this time saving the capture groups. This can be skipped for look-arounds that contain no groups. Depending on the chosen strategy from step 1 we:
While doing the above we might need to execute CheckLK again for more inner look-arounds. We again save the positions in last and reconstruct groups for the inner look-arounds as well.
The produced group maps are merged together with the main regex group map. Step 2 determined the match range, step 3 produces the final group map and thus we matched the entire regex.
Yeah. No work. Consider /.*?(a*)/. On the string “aaaa”
this should fill P with all indices
since it matches everywhere. What will be the starting indices though?
It will be always say it started one character before. We must record
the leftmost first, which this method does not capture.
A barebones CI added to Linden to check if every proof passes: https://github.com/LindenRegex/Linden/pull/13
/(?<=(a*)(a*))t/ on “aaaaat”. This would register the
first group as “aaaaa” and the second as ““. However, the correct
semantics is to have the first group be”” and the second “aaaaa”.