PCP – A Toy Protocol

(Ingredient 4 of 5 in the PCP overview.)

A toy PCP protocol for Quad-SAT

We now have enough tools to describe a Quad-SAT protocol that will break the hearts of FedEx drivers everywhere. Here is an overview of the protocol:

Peggy prints $q^E$ phone books, one for each linear combination of the given Quad-SAT equations. We’ll describe the details of the phone book contents later.

Peggy additionally prints the two posters corresponding to a low-degree polynomial extension of $A$ (we describe this exactly in the next section).

Victor picks a random phone book and runs sum-check on it.

Victor runs a low-degree test on the posters.

Victor makes sure that the phone book value he read is consistent with the posters.

Let’s dive in.

Setup

In sum-check, we saw we needed a bijection of $[N]$ into $\mathbb{H}^m$ . So let’s fix this notation now (it is annoying, I’m sorry). We’ll let $\mathbb{H}$ be a set of size $|\mathbb{H}| := \log (N)$ and set $m = \log_{|\mathbb{H}|} N$ . This means we have a bijection from $\{1, \dots, N\} \to \mathbb{H}^m$ , so we can rewrite the type-signature of $A$ to be

$A \colon \mathbb{H}^m \to \mathbb{F}_q.$
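One concrete way to realize this bijection is to write each index in base $|\mathbb{H}|$. The sketch below assumes $\mathbb{H} = \{0, \dots, h-1\}$, that $N = h^m$ exactly, and that indices are 0-based rather than starting at 1; the names `index_to_vec` and `vec_to_index` are made up for illustration.

```python
def index_to_vec(k, h, m):
    """Map k in {0, ..., h**m - 1} to its m base-h digits (most significant first)."""
    if not 0 <= k < h ** m:
        raise ValueError("index out of range")
    digits = []
    for _ in range(m):
        k, d = divmod(k, h)
        digits.append(d)
    return tuple(reversed(digits))


def vec_to_index(vec, h):
    """Inverse map: read the digits back as a base-h number."""
    k = 0
    for d in vec:
        k = k * h + d
    return k
```

Any other bijection works just as well; the only thing that matters is that Peggy and Victor agree on it.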

The contents of the phone books will take us a while to describe, but we can actually describe the posters right now, and we’ll do so. Earlier when describing sum-check, we alluded to the following theorem, but we’ll state it explicitly now:

Theorem (Existence of the low-degree extension)

Suppose $\phi \colon \mathbb{H}^n \to \mathbb{F}_q$ is any function. Then there exists a unique polynomial $\widetilde\phi \colon \mathbb{F}_q^n \to \mathbb{F}_q$ which agrees with $\phi$ on $\mathbb{H}^n$ and has degree at most $|\mathbb{H}|-1$ in each coordinate. Moreover, this polynomial $\widetilde \phi$ can be easily computed given the values of $\phi$.

Proof: Lagrange interpolation and induction on $n$. $\blacksquare$

We saw this earlier in the special case $\mathbb{H}=\{0,1\}$ and $n=3$ , where we constructed the multilinear polynomial $5xyz+9xy+7z+8$ out of eight initial values.
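To make “easily computed” a bit more concrete, here is a minimal sketch of the extension using the tensor-product Lagrange formula. It assumes $q$ is prime, that $\mathbb{H}$ is given as distinct residues mod $q$, and that Python 3.8+ is available for modular inverses via `pow`. The helper names and the choice $q = 101$ are ours, picked only for the example.

```python
from itertools import product


def lagrange_basis(H, a, x, q):
    """Value at x of the polynomial that is 1 at a and 0 on the rest of H (mod prime q)."""
    num, den = 1, 1
    for b in H:
        if b != a:
            num = num * (x - b) % q
            den = den * (a - b) % q
    return num * pow(den, -1, q) % q


def extend(phi, H, n, q):
    """Return a function evaluating the low-degree extension of phi on F_q^n."""
    def phi_tilde(point):
        total = 0
        for h in product(H, repeat=n):
            term = phi[h]
            for a, x in zip(h, point):
                term = term * lagrange_basis(H, a, x, q) % q
            total = (total + term) % q
        return total
    return phi_tilde


# Example: the eight values of 5xyz + 9xy + 7z + 8 on {0,1}^3, with an assumed q = 101.
q = 101
H = (0, 1)
poly = lambda x, y, z: (5 * x * y * z + 9 * x * y + 7 * z + 8) % q
phi = {h: poly(*h) for h in product(H, repeat=3)}
phi_tilde = extend(phi, H, 3, q)

# The extension is unique and the polynomial is multilinear, so they agree everywhere.
assert phi_tilde((2, 3, 4)) == poly(2, 3, 4)
```

This evaluates $\widetilde\phi$ one point at a time in roughly $|\mathbb{H}|^n$ steps, which is fine for a toy but is not meant to be efficient.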

In any case, the posters are generated as follows. Peggy takes her known assignment $A \colon \mathbb{H}^m \to \mathbb{F}_q$ and extends it to a polynomial

$\widetilde A \in \mathbb{F}_q [T_1, \dots, T_m]$

using the above theorem; by abuse of notation, we’ll also write $\widetilde A \colon \mathbb{F}_q^m \to \mathbb{F}_q$. She then prints the two posters we described earlier for the line-versus-point test.

Taking a random linear combination

The first step of the reduction is to try and generate just a single equation to check, rather than have to check all of them. There is a straightforward (but inefficient; we’ll improve it later) way to do this: take a random linear combination of the equations (there are $q^E$ possible combinations).

To be really verbose, if $\mathcal{E}_1$, …, $\mathcal{E}_E$ were the equations, Victor picks random weights $\lambda_1$, …, $\lambda_E$ in $\mathbb{F}_q$ and takes the equation $\lambda_1 \mathcal{E}_1 + \dots + \lambda_E \mathcal{E}_E$. In fact, imagine the title on the cover of the phone book is given by the weights $(\lambda_1, \dots, \lambda_E) \in \mathbb{F}_q^E$. Since both parties know $\mathcal E_1$, …, $\mathcal E_E$, they agree on which equation is referenced by the weights.

We’ll just check one such random linear combination. This is good enough because, in fact, if an assignment of the variables fails even one of the $E$ equations, it will fail the collated equation with probability $1 - 1/q$ — exactly! (To see this, suppose that equation $\mathcal E_1$ was failed by the assignment $A$ . Then, for any fixed choice of $\lambda_2, \dots, \lambda_E$ , there is always exactly one choice of $\lambda_1$ which makes the collated equation true, while the other $q-1$ all fail.)
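If you want to see the $1 - 1/q$ claim with your own eyes, a brute-force count over all weight vectors does it for small $q$ and $E$. In this sketch each equation is summarized by a hypothetical “residual”: the amount by which the assignment misses that equation, as an element of $\mathbb{F}_q$, so a residual of $0$ means the equation holds. The collated equation then holds exactly when the weighted sum of residuals is $0$.

```python
from fractions import Fraction
from itertools import product


def failure_probability(residuals, q):
    """Exact fraction of weight vectors for which the collated equation fails."""
    fails = sum(
        1
        for lam in product(range(q), repeat=len(residuals))
        if sum(l * r for l, r in zip(lam, residuals)) % q != 0
    )
    return Fraction(fails, q ** len(residuals))


# One failed equation out of three, over F_5: fails for exactly 4/5 of the weights.
assert failure_probability([3, 0, 0], 5) == Fraction(4, 5)
# All equations satisfied: the collated equation never fails.
assert failure_probability([0, 0], 5) == 0
```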

To emphasize again: Peggy is printing $q^E$ phone books right now and we only use one. Look, I’m sorry, okay?

Sum-checking the equation (or: how to print the phone book)

Let’s zoom in on one linear combination to use sum-check on. (In other words, pick only one of the phone books at random.) Let’s agree to describe the equation using the notation

$c = \sum_{\vec\imath \in \mathbb{H}^m} \sum_{\vec\jmath \in \mathbb{H}^m} a_{\vec\imath, \vec\jmath} x_{\vec\imath} \cdot x_{\vec\jmath} + \sum_{\vec\imath \in \mathbb{H}^m} b_{\vec\imath} x_{\vec\imath}.$

In other words, we’ve changed notation so both the variables and the coefficients are indexed by vectors in $\mathbb{H}^m$ . When we actually implement this protocol, the coefficients need to be actually computed: they came out of $\lambda_1 \mathcal{E}_1 + \dots + \lambda_E {\mathcal E}_E$ . (So for example, the value of $c$ above is given by $\lambda_1$ times the constant term of $\mathcal E_1$ , plus $\lambda_2$ times the constant term of $\mathcal E_2$ , etc.)

Our sum-check protocol that we talked about earlier used a sequence $(r_1, \dots, r_n) \in \mathbb{F}_q^n$. For our purposes, we have these quadratic equations, and so it’ll be convenient for us if we alter the protocol to use pairs $(\vec\imath, \vec\jmath) \in \mathbb{F}_q^m \times \mathbb{F}_q^m$ instead. In other words, rather than $f(\vec v)$ our variables will be indexed instead in the following way:

$\begin{aligned} f &\colon \mathbb{H}^m \times \mathbb{H}^m \to \mathbb{F}_q \\ f(\vec\imath, \vec\jmath) & := a_{\vec\imath, \vec\jmath} A(\vec\imath) A(\vec\jmath) + \frac{1}{|\mathbb{H}|^m} b_{\vec\imath} A(\vec\imath). \end{aligned}$

Hence Peggy is trying to convince Victor that

$\sum_{\vec\imath \in \mathbb{H}^m} \sum_{\vec\jmath \in \mathbb{H}^m} f(\vec\imath, \vec\jmath) = c.$
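The factor $\frac{1}{|\mathbb{H}|^m}$ is there because the linear term gets counted once for every $\vec\jmath$. Here is a quick numerical sanity check that the double sum of $f$ over $\mathbb{H}^m \times \mathbb{H}^m$ really reproduces the right-hand side of the equation for $c$. The toy sizes and random coefficients are made up, and the sketch assumes $|\mathbb{H}|^m$ is invertible mod $q$ (that is, $q$ does not divide it).

```python
import random
from itertools import product


def collated_rhs(a, b, A, q):
    """Quadratic part plus linear part of the collated equation, evaluated directly."""
    quad = sum(a[i, j] * A[i] * A[j] for i in A for j in A)
    lin = sum(b[i] * A[i] for i in A)
    return (quad + lin) % q


def sum_of_f(a, b, A, q):
    """Double sum of f(i, j) over H^m x H^m, with the 1/|H|^m normalisation."""
    inv_count = pow(len(A), -1, q)  # len(A) = |H|^m, assumed nonzero mod q
    total = 0
    for i in A:
        for j in A:
            total += a[i, j] * A[i] * A[j] + inv_count * b[i] * A[i]
    return total % q


# Toy instance (all values hypothetical): q = 101, H = {0, 1, 2}, m = 2.
q, H, m = 101, (0, 1, 2), 2
rng = random.Random(0)
points = list(product(H, repeat=m))
A = {i: rng.randrange(q) for i in points}
b = {i: rng.randrange(q) for i in points}
a = {(i, j): rng.randrange(q) for i in points for j in points}

assert sum_of_f(a, b, A, q) == collated_rhs(a, b, A, q)
```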

In this modified sum-check protocol, Victor picks the indices two at a time. So in the step where Victor picked $r_1$ in the earlier protocol, he instead picks $i_1$ and $j_1$ at once. Then instead of picking an $r_2$, he picks a pair $(i_2, j_2)$, and so on.

Then, to run the protocol, the entries of the phone book are going to correspond to

$\begin{aligned} P &\in \mathbb{F}_q [T_1, \dots, T_m, U_1, \dots, U_m] \\ P(T_1, \dots, T_m, U_1, \dots, U_m) & := {\widetilde a}(T_1, \dots, T_m, U_1, \dots, U_m) \widetilde A(T_1, \dots, T_m) \widetilde A(U_1, \dots, U_m) \\ &\qquad + \frac{1}{|\mathbb{H}|^m} {\widetilde b}(T_1, \dots, T_m) \widetilde A(T_1, \dots, T_m) \end{aligned}$

in place of what we called $P(x,y,z)$ in the sum-check section.

I want to stress now the tildes above are actually hiding a lot of work. Let’s unpack it a bit: what does $\widetilde a$ mean? After all, when you unwind this notational mess we wrote, we realize that the $a$ ’s and $b$ ’s came out of the coefficients of the original equations $\mathcal E_k$ .

The answer is that both Victor and Peggy have a lot of arithmetic to do. Specifically, when Peggy is printing the phone book for $(\lambda_1, \dots, \lambda_E)$, she needs to apply the extension result three times:

Peggy views $a_{\vec\imath, \vec\jmath}$ as a function $\mathbb{H}^{2m} \to \mathbb{F}_q$ and extends it to a polynomial using the above; this lets us define $\widetilde a \in \mathbb{F}_q [T_1, \dots, T_m, U_1, \dots, U_m]$ as a bona fide $2m$ -variate polynomial.

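Peggy does the same for $b_{\vec\imath}$, viewed as a function $\mathbb{H}^m \to \mathbb{F}_q$, to get $\widetilde b \in \mathbb{F}_q [T_1, \dots, T_m]$.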

Finally, Peggy does the same on $A \colon \mathbb{H}^m \to \mathbb{F}_q$ , extending it to $\widetilde A \in \mathbb{F}_q [T_1, \dots, T_m]$ . (However, this step is the same across all the phone books, so it only happens once.)

Victor has to do the same work for $a_{\vec\imath, \vec\jmath}$ and $b_{\vec\imath}$. He can do this because he picked the $\lambda$’s himself, so he can compute the coefficients of his linear combination too. But Victor does not do the last step of computing $\widetilde A$: for that, he just refers to the poster Peggy gave him, which conveniently happens to have a table of values of $\widetilde A$.

Now we can actually finally describe the full contents of the phone book. It’s not simply a table of values of $P$ ! We saw in the sum-check protocol that we needed a lot of intermediate steps too (like the $23T+46$ , $161U+23$ , $112V+197$ ). So the contents of this phone book include, for every index $k$ , every single possible result that Victor would need to run sum-check at the $k$ th step. That is, the $k$ th part of this phone book is a big directory where, for each possible choice of indices $(i_1, \dots, i_{k-1}, j_1, \dots, j_{k-1})$ , Peggy has printed the two-variable polynomial in $\mathbb{F}_q [T,U]$ that arises from sum-check. (There are two variables rather than one now, because $(i_k, j_k)$ are selected in pairs.)

This gives Victor a non-interactive way to run sum-check. Rather than ask Peggy, consult the already printed phone book. Inefficient? Yes. Works? Also yes.
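The paired rounds described in this section can be sketched as a small simulation. This is a toy under several assumptions: the prover is honest and computes each round polynomial on demand (in the actual protocol Victor would look it up in the phone book), the round polynomial is passed around as a Python callable rather than as a list of coefficients, and `P` is an arbitrary small polynomial we made up rather than one built from a real Quad-SAT instance.

```python
import random
from itertools import product


def round_poly(P, H, m, prev_i, prev_j):
    """Honest prover's round polynomial g_k(T, U), returned as a callable.

    Coordinates 1..k-1 are fixed to Victor's earlier picks, coordinate k is
    left free, and coordinates k+1..m are summed over H."""
    rest = m - len(prev_i) - 1

    def g(t, u):
        total = 0
        for tail_i in product(H, repeat=rest):
            for tail_j in product(H, repeat=rest):
                total += P(prev_i + (t,) + tail_i, prev_j + (u,) + tail_j)
        return total

    return g


def paired_sum_check(P, H, m, q, claim, rng):
    """Run the m rounds against an honest prover; return True if Victor accepts."""
    i, j = (), ()
    for _ in range(m):
        g = round_poly(P, H, m, i, j)
        # Victor's check: summing g over H x H must reproduce the current claim.
        if sum(g(t, u) for t in H for u in H) % q != claim % q:
            return False
        # Victor picks the next pair of indices at once, from all of F_q.
        t, u = rng.randrange(q), rng.randrange(q)
        claim = g(t, u) % q
        i, j = i + (t,), j + (u,)
    # Final check: the last claim must match P at the random point.
    return P(i, j) % q == claim


# Toy data (hypothetical): q = 101, H = {0, 1, 2}, m = 2.
q, H, m = 101, (0, 1, 2), 2
P = lambda i, j: (i[0] * j[1] + 3 * i[1] * i[1] * j[0] + 5) % q
true_claim = sum(
    P(i, j) for i in product(H, repeat=m) for j in product(H, repeat=m)
) % q

assert paired_sum_check(P, H, m, q, true_claim, random.Random(1))
assert not paired_sum_check(P, H, m, q, (true_claim + 1) % q, random.Random(1))
```

With an honest prover, a wrong claim is caught already in the first round. The interesting case is a cheating prover who prints wrong round polynomials, which this sketch does not model.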

Finishing up

Once Victor runs through the sum-check protocol, at the end he has a random $(\vec\imath, \vec\jmath)$ and has read off from the phone book the value of $P(\vec\imath, \vec\jmath)$.

Assuming it checks out, his other task is to verify that the accompanying posters that Peggy sent (that is, the tables of values $B_0$ and $B_1$ associated to $\widetilde A$) look like they mostly come from a low-degree polynomial. Unlike the sum-check step, where we needed to hack the earlier procedure, this step is a direct application of the line-versus-point test, without modification.

Up until now the phone book and posters haven’t interacted. So Victor has to do one more check: he makes sure that the value of $P(\vec\imath, \vec\jmath)$ he got from the phone book in fact matches the value corresponding to the poster $B_0$ . In other words, he does the arduous task of computing the extensions $\widetilde a$ and $\widetilde b$ , and finally verifies that

$P(\vec\imath, \vec\jmath) = {\widetilde a}(\vec\imath, \vec\jmath) B_0[\vec\imath] B_0[\vec\jmath] + \frac{1}{|\mathbb{H}|^m} \widetilde b(\vec\imath) B_0[\vec\imath]$

is actually true.
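Spelled out as code, this last check is a single comparison. The sketch below assumes the poster $B_0$ is a dictionary keyed by points of $\mathbb{F}_q^m$, that Victor already has $\widetilde a$ and $\widetilde b$ as callables, and that $|\mathbb{H}|^m$ is invertible mod $q$. The function name and the toy numbers are ours.

```python
def consistent_with_poster(P_value, a_tilde, b_tilde, B0, i, j, h_pow_m, q):
    """Check that the phone book value P(i, j) matches what the poster B0 predicts."""
    inv = pow(h_pow_m, -1, q)  # 1 / |H|^m in F_q
    expected = (a_tilde(i, j) * B0[i] * B0[j] + inv * b_tilde(i) * B0[i]) % q
    return P_value % q == expected


# Toy data (hypothetical): q = 7, m = 1, |H|^m = 2.
q = 7
B0 = {(x,): (2 * x + 1) % q for x in range(q)}
a_tilde = lambda i, j: (i[0] * j[0]) % q
b_tilde = lambda i: (i[0] + 1) % q

assert consistent_with_poster(6, a_tilde, b_tilde, B0, (2,), (1,), 2, q)
assert not consistent_with_poster(5, a_tilde, b_tilde, B0, (2,), (1,), 2, q)
```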

Soundness analysis

Let’s think through what happens if Peggy tries to cheat. So we have some Quad-SAT instance ($E$ quadratic equations $\mathcal{E}_1, \ldots, \mathcal{E}_E$ in $N$ variables), and Peggy has some tuple $(x_1, \ldots, x_N)$ that’s not a solution. Or… who knows, maybe Peggy doesn’t even have values $(x_1, \dots, x_N)$ at all, just a bunch of phone books and posters full of numbers.

Well, first off, if Peggy doesn’t actually have values $(x_1, \dots, x_N)$, the posters $B_0$ and $B_1$ can’t possibly represent a low-degree polynomial. So the line-versus-point test will fail (with very high probability).

Now assuming the line-versus-point test passes, we can assume Peggy actually has some values $(x_1, \ldots, x_N)$, but they are not a solution. And at least 99% of the values on the poster $B_0$ are honestly the values of the corresponding low-degree polynomial.

Victor chose some random coefficients $\lambda_1, \ldots, \lambda_E$; these tell him which phone book to look at. There is a $1/q$ chance that the random linear combination $\lambda_1 \mathcal{E}_1 + \dots + \lambda_E \mathcal{E}_E$ happens to be satisfied, even though the individual equations aren’t all satisfied. And of course if the combined equation is satisfied, Victor’s check will pass.

If not, Victor is going to run the sum-check protocol on a false claim. We already saw (in the sum-check discussion) why this check is almost certain to fail.

So in summary, if Peggy tries to send Victor anything other than an honest solution, Victor’s check will discover it with very high probability. And this is why our toy PCP protocol works.

Reasons to not be excited by the above protocol

The previous section describes a long procedure that has a PCP flavor, but it suffers from several issues (which is why we call it a toy example).

Amount of reading: The amount of reading on Victor’s part is not $O(1)$ like we promised. The low-degree testing step with the posters used $O(1)$ entries, but the sum-check required reading roughly

$O(|\mathbb H|^2) \cdot (m+O(1)) \approx \frac{(\log N)^3}{\log \log N}$

entries from the phone book. The PCP theorem promises we can get that down to $O(1)$ (and in fact not just $O(1)$ elements of $\mathbb{F}_q$ , but actually $O(1)$ bits) but that’s beyond this post’s scope.
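To get a feel for the size of that estimate, here is a throwaway calculation. It assumes the logarithm in $|\mathbb{H}| = \log N$ is base 2 (the text above doesn’t pin down the base) and drops all the hidden constants, so treat the output as an order of magnitude at best.

```python
import math


def reading_estimate(N):
    """Rough count |H|^2 * m with |H| = log2(N) and m = log_{|H|} N, constants dropped."""
    h = math.log2(N)
    m = math.log(N) / math.log(h)
    return h * h * m


# For N = 2**16 we get |H| = 16 and m = 4, so the estimate is about 16 * 16 * 4.
print(round(reading_estimate(2 ** 16)))
```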

Length of proof: The procedure above involved mailing $q^E$ phone books, which is what we in the business call either “unacceptably inefficient” or “fucking terrible”, depending on whether you’re in polite company or not. The next section will show how to get this down to $O(E)$ .

Time complexity: Even though Victor doesn’t read much, Peggy and Victor both do quite a bit of computation. For example,
- Victor has to compute $\widetilde a_{\vec\imath, \vec\jmath}$ for his one phone book.
- Peggy needs to do it for every phone book.

One other weird thing about this result is that, even though Victor has to read only a small part of Peggy’s proof, he still has to read the entire problem statement, that is, the entire system of equations from the original Quad-SAT. This can feel strange because for Quad-SAT, the problem statement is of similar length to the satisfying assignment!

But there’s nothing to be done about this! Victor has to read the whole problem statement, because there’s no other way for him to know what is actually being proven. Otherwise, a cheating Peggy could change a single symbol in the statement, and turn it into something trivial or irrelevant.
