The sum-check protocol

(Ingredient 1 of 5 in the PCP overview.)

Before worrying about systems of equations, let’s imagine we just have a single sum

$Z_1 + Z_2 + \dots + Z_{\text{big}} = H$

for some values $Z_i$ and $H \in \mathbb{F}_q$ , and assume further that $q$ is not too small.

Imagine a prover Peggy knows all the values $Z_i$ , and is asserting to the verifier Victor that the $Z_i$ ’s sum to $H$ . Victor wants to know that Peggy computed the sum $H$ correctly, but Victor doesn’t want to actually read all the values of $Z_i$ .

Well, at face value, this is an obviously impossible task. Even if Victor knew all but one of Peggy’s $Z_i$ ’s, that wouldn’t be good enough.

So to get anywhere, we need to give Victor at least one magic power.

An oracle to a multilinear polynomial

Assume for convenience that the number of $Z$ ’s happens to be $2^n$ and change notation to a function $f \colon \{0,1\}^n \to \mathbb{F}_q$ , so our equation becomes

$\sum_{\vec v \in \{0,1\}^n} f(\vec v) = H.$

In other words, we have changed notation so that our variables are indexed over a hypercube: from $f(0, \dots, 0)$ to $f(1, \dots, 1)$ .

Now here’s the magic power we’re granting. By polynomial interpolation, no matter what function $f$ we had initially, we can view it as a multilinear polynomial $P \in \mathbb{F}_q [X_1, \dots, X_n]$. For example, suppose $n=3$ and the eight (arbitrary) values were given by

$\begin{aligned} f(0,0,0) &= 8 \\ f(0,0,1) &= 15 \\ f(0,1,0) &= 8 \\ f(0,1,1) &= 15 \\ f(1,0,0) &= 8 \\ f(1,0,1) &= 15 \\ f(1,1,0) &= 17 \\ f(1,1,1) &= 29. \end{aligned}$

(So $H = 8+15+8+15+8+15+17+29 = 115$ .) Then we’d be trying to fill in the blanks in the equation

$P(x,y,z) = {\square} + {\square} x + {\square} y + {\square} z + {\square} xy + {\square} yz + {\square} zx + {\square} xyz$

so that $P$ agrees with $f$ on the cube. This comes down to solving a system of linear equations; in this case it turns out that $P(x,y,z) = 5xyz + 9xy + 7z + 8$ works. I’ve cherry-picked the numbers so that a lot of the coefficients work out to $0$ for convenience, but math majors should be able to verify (by induction on $n$) that $P$ exists and is unique no matter which eight initial numbers I had picked.
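The blanks can also be filled in mechanically rather than by solving the linear system by hand. The following is a minimal Python sketch; the function name `multilinear_coefficients` and the dictionary layout are assumptions made for illustration, and the arithmetic is done over the integers (in the protocol one would reduce modulo $q$).

```python
from itertools import product

def multilinear_coefficients(values, n):
    """Return the coefficients of the multilinear polynomial agreeing with `values`.

    `values` maps each point of {0,1}^n (as a tuple) to f(point).  In the
    result, the key (1, 1, 0) holds the coefficient of x*y, the key
    (0, 0, 0) holds the constant term, and so on.
    """
    coeffs = dict(values)
    # One pass per variable: subtract the "variable i switched off" entry.
    # After all n passes each entry is an alternating sum over sub-cubes,
    # which is exactly the monomial coefficient.
    for i in range(n):
        for v in product((0, 1), repeat=n):
            if v[i] == 1:
                lower = v[:i] + (0,) + v[i + 1:]
                coeffs[v] -= coeffs[lower]
    return coeffs

# The eight example values, keyed by (x, y, z).
f = {
    (0, 0, 0): 8, (0, 0, 1): 15, (0, 1, 0): 8, (0, 1, 1): 15,
    (1, 0, 0): 8, (1, 0, 1): 15, (1, 1, 0): 17, (1, 1, 1): 29,
}
coeffs = multilinear_coefficients(f, 3)
nonzero = {k: c for k, c in coeffs.items() if c != 0}
print(nonzero)
```

The surviving entries are the constant $8$, the $z$ coefficient $7$, the $xy$ coefficient $9$ and the $xyz$ coefficient $5$, matching the polynomial above.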

Concretely: we are going to let Victor make one call to a magic oracle that can tell Victor the value of $P(r_1,\dots,r_n)$, for his choice of $(r_1, \dots, r_n) \in \mathbb{F}_q^n$. Note importantly that the $r_i$’s do not have to be $0$ or $1$; in fact, we will say Victor just chooses them randomly from the much larger $\mathbb{F}_q$. But he can only ask the oracle for that single value of $P$, and otherwise has no idea what any of the $Z_i$’s are. The punch line of the protocol is that this single oracle call is good enough: if Victor has this oracle, he only needs to read one value for Peggy to convince him that $H$ was computed correctly.
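To make the oracle less mysterious, here is a sketch of what it would have to compute: the value of the multilinear extension at an arbitrary point, obtained by fixing one variable at a time. The modulus `Q` and the name `mle_eval` are assumptions for illustration; the text only asks that $q$ be not too small.

```python
Q = 2**61 - 1  # assumed prime modulus (a Mersenne prime), chosen only for the sketch

def mle_eval(table, point, q=Q):
    """Evaluate the multilinear extension of `table` at `point` in F_q^n.

    `table` lists f over {0,1}^n in lexicographic order with the first
    variable most significant: [f(0,..,0,0), f(0,..,0,1), ..., f(1,..,1,1)].
    `point` must have exactly n coordinates.
    """
    for r in point:
        half = len(table) // 2
        # Linear interpolation between the "variable = 0" and "variable = 1" halves.
        table = [(lo + r * (hi - lo)) % q
                 for lo, hi in zip(table[:half], table[half:])]
    return table[0]

table = [8, 15, 8, 15, 8, 15, 17, 29]   # the example values of f
on_cube = mle_eval(table, (1, 1, 0))    # a cube point: recovers f(1,1,0)
off_cube = mle_eval(table, (7, 3, -1))  # a point outside the cube
print(on_cube, off_cube)
```

On the cube the function simply returns the stored value of $f$; off the cube it returns a value of $P$ that Victor could not read from any single $Z_i$.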

A playthrough of the sum-check protocol

Let’s use the example above with $n=3$ : Peggy has chosen those eight values with $H = 115$ , and wants to convince Victor without actually sending all eight values. Peggy has done her homework and computed the coefficients of $P$ as well (after all, she chose the values of $f$ ), so Peggy can evaluate $P$ anywhere she wants. But Victor can only ask the oracle about a single value of the polynomial $P$ on a point (probably) outside the hypercube.

Here’s how they do it. (All the information sent by Peggy to Victor is boxed.)

Peggy announces her claim $\boxed{H = 115}$ .
They now discuss the first coordinate:
- Victor asks Peggy to evaluate the linear one-variable polynomial
  
  $g_1(T) := P(T,0,0) + P(T,0,1) + P(T,1,0) + P(T,1,1)$
  
  and send the result. In our example, it equals
  
  $g_1(T) = 8 + 15 + (9T+8) + (14T+15) = \boxed{23T+46}.$
- Victor then checks that this $g_1$ is consistent with the claim $H=115$ ; it should satisfy $H = g_1(0) + g_1(1)$ by definition. Indeed, $g_1(0)+g_1(1) = 46+69 = 115 = H$ .
- Finally, Victor commits to a random choice of $r_1 \in \mathbb{F}_q$ ; let’s say $r_1 = 7$ . From now on, he’ll always use $7$ for the first argument to $P$ .
With the first coordinate fixed at $r_1 = 7$ , they talk about the second coordinate:
- Victor asks Peggy to evaluate the linear polynomial
  
  $g_2(U) := P(7,U,0) + P(7,U,1)$
  
  and send the result. In our example, it equals
  
  $g_2(U) = (63U+8) + (98U+15) = \boxed{161U + 23}.$
- Victor makes sure the claimed $g_2$ is consistent with $g_1$; it should satisfy $g_1(r_1) = g_2(0)+g_2(1)$. Indeed, it does: $g_1(7) = 23 \cdot 7 + 46 = 207 = 23 + 184 = g_2(0) + g_2(1)$.
- Finally, Victor commits to a random choice of $r_2 \in \mathbb{F}_q$; let’s say $r_2 = 3$. From now on, he’ll always use $3$ for the second argument to $P$.
They now settle the last coordinate:
- Victor asks Peggy to evaluate the linear polynomial
  
  $g_3(V) := P(7,3,V)$
  
  and send the result. In our example, it equals
  
  $g_3(V) = \boxed{112V+197}.$
- Victor makes sure the claimed $g_3$ is consistent with $g_2$; it should satisfy $g_2(r_2) = g_3(0)+g_3(1)$. Indeed, it does: $g_2(3) = 161 \cdot 3 + 23 = 506 = 197 + 309 = g_3(0) + g_3(1)$.
- Finally, Victor commits to a random choice of $r_3 \in \mathbb{F}_q$ ; let’s say $r_3 = -1$ .
Victor has picked all three coordinates, and is ready to consult the oracle. He gets $P(7,3,-1) = 85$ . This matches $g_3(-1) = 85$ , and the protocol ends.
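The numbers in this transcript can be reproduced with a few lines of Python. This is only a sketch that hard-codes the example polynomial and Victor's choices $7, 3, -1$; it uses plain integers rather than $\mathbb{F}_q$, and the helper names `line` and `ev` are made up for illustration.

```python
def P(x, y, z):
    # The example polynomial from the text.
    return 5 * x * y * z + 9 * x * y + 7 * z + 8

def line(g_at_0, g_at_1):
    """Coefficients (slope, intercept) of the linear g with the given g(0), g(1)."""
    return (g_at_1 - g_at_0, g_at_0)

def ev(g, t):
    slope, intercept = g
    return slope * t + intercept

bits = (0, 1)
H = sum(P(x, y, z) for x in bits for y in bits for z in bits)
r1, r2, r3 = 7, 3, -1  # Victor's choices in the transcript

# Peggy's three boxed messages, each determined by its values at 0 and 1.
g1 = line(*(sum(P(t, y, z) for y in bits for z in bits) for t in bits))
g2 = line(*(sum(P(r1, t, z) for z in bits) for t in bits))
g3 = line(*(P(r1, r2, t) for t in bits))

# Victor's checks, in order.
assert ev(g1, 0) + ev(g1, 1) == H
assert ev(g2, 0) + ev(g2, 1) == ev(g1, r1)
assert ev(g3, 0) + ev(g3, 1) == ev(g2, r2)
assert ev(g3, r3) == P(r1, r2, r3)  # the single oracle call

print(H, g1, g2, g3, P(r1, r2, r3))
```

Each pair printed is (slope, intercept), so `(23, 46)` stands for $23T + 46$.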

General procedure

The previous transcript should generalize obviously to any $n$, but we spell it out anyway. Peggy has already announced $H$ and pre-computed $P$. Now for $i = 1, \dots, n$:

Victor asks Peggy to compute the univariate polynomial $g_i$ corresponding to the partial sum in which the $i$th argument is a free parameter while $r_1, \dots, r_{i-1}$ have been fixed already.
Victor sanity-checks each of Peggy’s answers by making sure $g_i$ is consistent with $g_{i-1}$ (that is, $g_{i-1}(r_{i-1}) = g_i(0) + g_i(1)$ , or for the edge case $i=1$ that $H = g_1(0) + g_1(1)$ ).
Then Victor chooses a random $r_i \in \mathbb{F}_q$ and moves on to the next coordinate.

Once Victor has decided on every $r_i$ , he asks the oracle for $P(r_1, \dots, r_n)$ and makes sure that it matches the value of $g_n(r_n)$ . If so, Victor believes Peggy.
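The whole procedure fits in a short Python sketch. Several things here are assumptions for illustration rather than part of the text: the prime `Q`, the function names, the choice to transmit each linear $g_i$ as the pair $(g_i(0), g_i(1))$, and the fact that an honest Peggy and Victor are simulated inside one function. The oracle is played by `oracle_eval`, which evaluates $P$ directly from the interpolation formula.

```python
import random
from itertools import product

Q = 2**61 - 1  # assumed prime modulus, chosen only for the sketch

def oracle_eval(table, point, q=Q):
    """Stand-in for Victor's oracle: P(point) = sum_v f(v) * prod_i weight_i."""
    total = 0
    for idx, v in enumerate(product((0, 1), repeat=len(point))):
        weight = 1
        for r, bit in zip(point, v):
            weight = weight * (r if bit else 1 - r) % q
        total = (total + table[idx] * weight) % q
    return total

def sumcheck(table, claimed_sum, q=Q, rng=random):
    """Run the n rounds with an honest Peggy; return True iff Victor accepts.

    `table` lists f over {0,1}^n in lexicographic order (length 2^n).
    """
    original = list(table)
    claim = claimed_sum % q
    point = []
    while len(table) > 1:
        half = len(table) // 2
        # Peggy: the linear g_i, sent as the pair (g_i(0), g_i(1)).
        g0, g1 = sum(table[:half]) % q, sum(table[half:]) % q
        # Victor: consistency with H (first round) or with g_{i-1}(r_{i-1}).
        if (g0 + g1) % q != claim:
            return False
        r = rng.randrange(q)              # Victor's random r_i
        point.append(r)
        claim = (g0 + r * (g1 - g0)) % q  # g_i(r_i), the next target
        # Peggy fixes coordinate i to r_i for the remaining rounds.
        table = [(lo + r * (hi - lo)) % q
                 for lo, hi in zip(table[:half], table[half:])]
    # Final step: one oracle call, compared against g_n(r_n).
    return oracle_eval(original, point, q) == claim

table = [8, 15, 8, 15, 8, 15, 17, 29]
honest = sumcheck(table, 115)
wrong_claim = sumcheck(table, 116)
print(honest, wrong_claim)
```

With an honest Peggy the run accepts the true sum, and a wrong $H$ is rejected at the very first consistency check; a Peggy who also lies about the $g_i$ is the subject of the soundness discussion.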

Up until now, we wrote the sum-check protocol as a sum of a multilinear polynomial over $\{0,1\}^n$ .

There is nothing special about multilinear polynomials. The protocol works equally well with a polynomial $P$ of some small degree $d$ in each variable. The only change is that Peggy’s claimed polynomials $g_i$ would have degree $d$ rather than degree $1$. Everything else stays the same. This will be important for some applications of the sum-check protocol.

In fact, there is also nothing special about $\{0,1\}^n$ and it would work equally well with $\mathbb{H}^n$ for any small finite set $\mathbb{H}$; then instead of the multilinear extension, you would work with polynomials that have degree at most $|\mathbb{H}| - 1$ in each variable, and each consistency check would sum $g_i$ over $\mathbb{H}$ instead of over $\{0,1\}$.
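When the round polynomials have degree $d$ instead of $1$, Victor still needs to evaluate $g_i(r_i)$. One possible encoding (an assumption here; the text does not fix a format) is for Peggy to send the $d+1$ values $g_i(0), \dots, g_i(d)$, from which Victor recovers $g_i(r_i)$ by Lagrange interpolation. The sketch below assumes a prime modulus larger than $d$; the name `eval_from_points` is hypothetical.

```python
Q = 2**61 - 1  # assumed prime modulus, must exceed the degree d

def eval_from_points(evals, r, q=Q):
    """Given evals[k] = g(k) for k = 0..d, return g(r) mod q.

    Uses Lagrange interpolation; pow(x, -1, q) is the modular inverse
    (available in Python 3.8 and later).
    """
    d = len(evals) - 1
    total = 0
    for k, yk in enumerate(evals):
        num, den = 1, 1
        for j in range(d + 1):
            if j != k:
                num = num * (r - j) % q
                den = den * (k - j) % q
        total = (total + yk * num * pow(den, -1, q)) % q
    return total

# Degree 1: the first-round polynomial of the playthrough, g_1(T) = 23T + 46.
linear = eval_from_points([46, 69], 7)
# Degree 2: a made-up example g(T) = 3T^2 + 2T + 5, given by g(0), g(1), g(2).
quadratic = eval_from_points([5, 10, 21], 10)
print(linear, quadratic)
```

For $d = 1$ this reduces to the two-value encoding of a linear polynomial, so the multilinear protocol is the special case.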

Soundness

Why is the sum-check protocol sound? In other words, what goes wrong if Peggy tries to convince Victor of a false claim?

Let’s suppose Peggy claims a false value $H^{\text{false}}$ for the whole sum. When it comes to the first step $g_1(T)$, if she reveals the true sum (as a polynomial in $T$), the consistency check $H^{\text{false}} = g_1(0) + g_1(1)$ won’t pass – and Victor won’t accept. So Peggy is forced to give a false polynomial $g_1^{\text{false}}(T)$ as well.

Now $g_1^{\text{false}}(T)$ and $g_1(T)$ are two different linear polynomials, so there is at most one value of $T$ where $g_1^{\text{false}}(T) = g_1(T)$ . If Peggy is very lucky, Victor’s random choice of $r_1$ will happen to be that one value – and Peggy can continue honestly from here on out.
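To make this concrete with the numbers from the playthrough (the false claim and false polynomial below are made up for illustration and are not part of the original transcript): suppose Peggy claims $H^{\text{false}} = 116$ and sends

$g_1^{\text{false}}(T) = 24T + 46,$

which passes Victor’s check since $g_1^{\text{false}}(0) + g_1^{\text{false}}(1) = 46 + 70 = 116$. Comparing with the true $g_1(T) = 23T + 46$ gives

$g_1^{\text{false}}(T) - g_1(T) = T,$

which vanishes only at $T = 0$. So Peggy gets back on honest footing only if Victor happens to pick $r_1 = 0$; for every other $r_1$ she is left defending the false value $g_1^{\text{false}}(r_1) \neq g_1(r_1)$.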

Most likely, that won’t happen. The function $g_1^{\text{false}}(T)$ will have the wrong value at Victor’s random $r_1$ , so Peggy is still stuck trying to prove a false claim, now about a sum over $n-1$ variables.

Now we come to $g_2$, and Peggy has the same dilemma. She can’t give the true $g_2$, because the check $g_1^{\text{false}}(r_1) = g_2(0) + g_2(1)$ wouldn’t pass. So she has to give a false $g_2^{\text{false}}$. Again, unless Victor’s choice of $r_2$ is very lucky (from Peggy’s point of view), the value of $g_2^{\text{false}}(r_2)$ will again be wrong.

After $n$ steps, Victor will consult the multilinear oracle, and discover that $g_n^{\text{false}}(r_n) \neq P(r_1, \ldots, r_n)$ , and the check will fail – unless, by some very bad luck, he happened to randomly choose the one bad value for some $r_i$ .

There are $q$ possible values for each $r_i$ , so at any step, Victor’s chance of choosing the bad value is at most $1/q$ . Summing it up, since Victor makes $n$ choices, the chance that a cheating Peggy can sneak a false proof past Victor is at most $n/q$ .

Assuming $q$ is very large, this probability is very small. So if the protocol passes, Victor can be confident that Peggy’s calculation was honest.
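The bound can be checked by brute force on a toy instance. The sketch below uses a deliberately tiny field ($q = 31$, an assumption so that all of Victor's choices can be enumerated) and one particular cheating strategy, chosen for illustration and not claimed to be the best possible: in every round Peggy sends a false linear polynomial that passes the consistency check and agrees with the true one at a single guessed point.

```python
from itertools import product

def fold(table, r, q):
    """Fix the next variable to r (interpolate between the two halves)."""
    half = len(table) // 2
    return [(lo + r * (hi - lo)) % q
            for lo, hi in zip(table[:half], table[half:])]

def cheating_run(table, false_sum, rs, guess, q):
    """Return True iff Victor accepts against the 'hope r_i == guess' strategy."""
    claim = false_sum % q
    for r in rs:
        half = len(table) // 2
        t0, t1 = sum(table[:half]) % q, sum(table[half:]) % q  # true g_i(0), g_i(1)
        delta = (claim - t0 - t1) % q  # gap between the running claim and the truth
        # Error term e(T) = b*(T - guess): zero at `guess`, with e(0) + e(1) = delta.
        # pow(x, -1, q) is the modular inverse (Python 3.8+).
        b = delta * pow((1 - 2 * guess) % q, -1, q) % q
        g0 = (t0 - b * guess) % q
        g1 = (t1 + b * (1 - guess)) % q
        assert (g0 + g1) % q == claim     # Victor's consistency check passes by design
        claim = (g0 + r * (g1 - g0)) % q  # g_i^false(r_i)
        table = fold(table, r, q)
    return claim == table[0]  # table[0] is now P(r_1, ..., r_n), the oracle's answer

q, n, guess = 31, 3, 5
table = [8, 15, 8, 15, 8, 15, 17, 29]  # true sum is 115; Peggy claims 116
accepted = sum(
    cheating_run(table, 116, rs, guess, q) for rs in product(range(q), repeat=n)
)
print(accepted, q**n, accepted / q**n, n / q)
```

For this strategy Peggy escapes exactly when some $r_i$ equals her guess, which happens for $q^n - (q-1)^n = 2791$ of the $29791$ possible choices, just under the $n/q$ bound.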
