Formalize substring searches on string. The
definition should be roughly:
str_search : string -> string -> option nat.
This function should follow some axioms. Given
Definition starts_with (p s : string) := p = List.firstn (length p) s.
forall s ss i, str_search s ss = Some i -> starts_with ss (List.skipn i s)forall s ss i, str_search s ss = Some i -> forall i', i' < i -> ~ (starts_with ss (List.skipn i' s))forall s ss, str_search s ss = None -> forall i, i < length s -> ~ (starts_with ss (List.skipn i s))Connect these to operating on input. To do so, a wrapper
function is defined in terms of str_search:
input_search : input -> string -> option input where
the result is an advanced input. The following can be proven about it
using only the axiomatization of str_search:
forall i1 i2 p, input_search i1 p = Some i2 -> strict_suffix i1 i2 forwardforall a1 b1 a2 b2 p, input_search (a1, b1) p = Some (a2, b2) -> starts_with p a2forall a1 b1 a2 b2 p, input_search (a1, b1) p = Some (a2, b2) -> forall s1 s2, a1 = s1 ++ s2 ++ a2 -> ~ (starts_with p s2)forall a1 b1 p, input_search (a1, b1) = None -> forall s1 s2, a1 = s1 ++ s2 -> ~ (starts_with p s2)Some of these axioms might be not needed for the proofs. They encompass everything about the substring search, but tells us more than what we need to know.
Formalize literal extraction. The definition should be roughly:
extract_literal : regex -> literal. Where
Inductive literal : Type :=
| Exact (s : string) (* the entire match is exactly `s` *)
| Prefix (s : string) (* the match starts with `s` *)
| None. (* this indicates the match cannot exist, as opposed to Prefix [] which means we do not know anything about the match *)
Definition prefix (l : literal) := match l with
| Exact s => s
| Prefix s => s
| None => []
end.
This function should follow these theorems:
forall r tree a b result,
priotree_inp r (a, b) tree ->
first_leaf tree (a, b) = Some result ->
starts_with (prefix (extract_literal r)) a
Exact s:admit.
TODO: how to specify it and is it useful?
None literal means there is no match:forall r tree inp,
priotree_inp r inp tree ->
extract_literal r = None ->
first_leaf tree inp = NoneWrite a function prefix_accelerated_pikevm which
completes an entire (unanchored) search. Prove equivalence between the
prefix accelerated version and the full search algorithm.
[^]*?(r)_#0 and the for-loop
algorithm of running r anchored at each index. This has to
be done in Warblre, out of scope for this thesis, assume it is
proven.[^]*?(r)_#0 where
r is restricted to the PikeVM regex features is equivalent
to Warblre’s regex.prefix_accelerated_pikevm to the
[^]*?(r)_#0 regex.To do 3., we first can prove equivalence with respect to
first_leaf of a backtracking tree and a “prefix tree” which
skips branches that will never match due to lacking of a prefix.
Instead of a prefix_accelerated_pikevm I think a more
general full-search algorithm that will be continuously improved is
better. Especially since we want to extract it to a function at some
point. This can be done in the future.
The accelerated version can skip running the pikevm completely if the prefix extraction returns an Exact.