Arithmetic circuits for substring matching (proposed by Duru)
Can you write down an efficient arithmetic circuit to prove that a bunch of strings are substrings of a given string ?
Arithmetic circuits (with nondeterminate witnesses)
We have chosen a large fixed prime number . An arithmetic circuit is a circuit (composed of wires and gates); each wire carries a value in the finite field , and each gate is either addition or multiplication. (Arithmetic circuits are analogous to Boolean circuits, which you may be familiar with.)
A string is a sequence of integers in, say, (note that ).
The arithmetic circuit should take inputs (the characters in ), for and (the characters of the purported substrings; in practice, we have ); and some additional “nondeterminate witnesses” . The circuit should compute a single field element as output.
The circuit should have the following property, for any fixed ’s and ’s:
- There exist such that the circuit evaluates to if and only if each is a substring of .
We wish our arithmetic circuit to have the lowest possible complexity (which is defined as the number of multiplication gates).
Motivation
“Arithmetic circuits” are the basic model of computation used in zero-knowledge proofs. A solution to the problem will let us prove that some strings are substrings of a string — while keeping the string (and potentially the substrings as well) secret.
As a particular application, is a “mobile driver’s license” (MDL) — a serialization of the data on a California driver’s license — and the ’s are substrings that one wants to publicize (e.g. name). Typical lengths for and the ’s are 3000 and 32; the number of substrings might be 15.
Possible solutions
In the specific context of MDL pods, all the small strings are the same length – you can do it with Merkle trees of hashes of all possible substrings of that length
One possible solution: For each , if is its position in , we can shift by in operations to be aligned with and do comparisons. This is costly when and are large: its complexity is .
Merkle tree solution: Here we prove that exists in the Merkle tree of at contiguous indices. It can be done with complexity where is the complexity of a single hash function.
What is a Merkle tree?
_A Merkle tree is a cryptographic binary-tree data structure that allows us to prove that an element exists in a list of size $n$ in complexity $O(\log n)$_ _For a deeper explanation see [this link](https://decentralizedthoughts.github.io/2020-12-22-what-is-a-merkle-tree/)_Extension problem: What if you want subsequences instead of substrings?