Benchmarking multilinear Polynomial Commitment Schemes

15 minute read

Acknowledgments

This work would not have been possible without an outstanding effort from my team in the last 2 weeks. Huge thank you to Antonio Mejías Gil for implementing Hyrax and to Hossein Moghaddas for developing Brakedown - you guys are true wizards!

Previously

In the previous post we gave the intuition of why efficient commitments to multilinear polynomials are important in the context of lookups in Lasso. We aim for this entry to be self-contained, although the previous write-up provides useful context and a bigger picture.

This post

Here, we present the actual results of our technical contribution: implementations and benchmarks of four PC schemes:

  • Ligero: native & recursive implementation; benches,
  • Brakedown: native implementation; benches,
  • Hyrax: native implementation; benches,
  • multilinear KZG (PST13): benches.

To our knowledge, there is no single library to date that implements all of the above schemes with the same backend. There exist some academic implementations of e.g. Hyrax in Python, multilinear KZG with the arkworks backend in Rust, and Ligero/Brakedown with a Zcash backend in Rust. As such, comparing the existing implementations for performance is hardly meaningful.

Our core contribution is implementing the first three schemes from scratch in arkworks - and adding benchmarks to the existing KZG module (from the amazing EspressoSystems/jellyfish library, which is also built on the arkworks backend). We also implement a halo2 verifier for Ligero to allow for potential on-chain verification.

What should I aim for?

As hinted at in the previous post, our end goal should drive our design choices: we need to know what the constraints on our resources are. Some objectives we might wish to consider:

  • Prover time
  • Verifier time
  • Proof size
  • On-chain verification
  • Recursion friendliness

Native benchmarks

Of the four chosen schemes, three are interactive by nature. The standard way to render such schemes non-interactive is to apply the Fiat-Shamir transform, which in practice means instantiating them with a cryptographic hash-based sponge. In-circuit hashing (needed for recursive verification) is a few orders of magnitude slower for hashes based on bitwise operations, such as SHA256, than for algebraic hash functions, such as Poseidon. Therefore, for Ligero we consider two variants: an optimized native one with SHA256-based hashing, and a recursion-friendly one (R) with Poseidon. KZG is non-interactive to start with, so no such transformation is necessary. The results for Ligero are strong enough to let us draw some conclusions for non-interactive Brakedown and Hyrax.
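
To make the transform concrete, here is a toy Fiat-Shamir transcript in Rust. The `Transcript` type is a hypothetical illustration, and `DefaultHasher` is *not* cryptographic; real implementations absorb field elements into a SHA256- or Poseidon-based sponge.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy Fiat-Shamir: derive the verifier's "random" challenges by hashing
// the transcript so far, replacing actual interaction. This is a sketch;
// DefaultHasher stands in for a cryptographic sponge.
struct Transcript(Vec<u64>);

impl Transcript {
    fn absorb(&mut self, msg: u64) {
        self.0.push(msg);
    }

    fn challenge(&mut self) -> u64 {
        let mut s = DefaultHasher::new();
        self.0.hash(&mut s);
        let c = s.finish();
        self.0.push(c); // keep the challenge in the transcript for the next round
        c
    }
}
```

Any two parties that absorb the same prover messages derive the same challenges, which is what lets us drop the verifier's messages from the protocol.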

Furthermore, as noted in the previous post, for the specific application to Jolt, where the committed elements are “small”, we consider additional variants of Hyrax and KZG using a modified MSM implementation, hackily optimized for coefficients under 60 bits. We call this variant small (S), to differentiate it from the case of arbitrary (A) coefficients in the commit phase.
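
The intuition behind the small (S) variant can be sketched with a toy double-and-add loop: the work scales with the scalar's bit length, so a <60-bit scalar costs far fewer group operations than a full 254-bit field element. The group below is a toy additive group modulo a prime, not BN254, and the actual optimization in our code is more involved than early loop termination.

```rust
// Toy modulus, NOT the BN254 scalar field.
const P: u64 = 0xffff_ffff_0000_0001;

fn group_add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

// Double-and-add scalar multiplication: the loop runs once per bit of the
// scalar, which is why small scalars are cheaper to commit with.
fn scalar_mul(mut base: u64, mut scalar: u64) -> u64 {
    let mut acc = 0;
    while scalar > 0 {
        if scalar & 1 == 1 {
            acc = group_add(acc, base);
        }
        base = group_add(base, base);
        scalar >>= 1;
    }
    acc
}
```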

Ligero is implemented with the rate $\rho = 0.5$, achieving the fastest prover on FFT-friendly fields. As for Brakedown parameters, we implemented the scheme following the third parameter row from Figure 2 of the Brakedown paper (relative distance $d = 0.04$, $\rho = 0.65$), matching the benchmark presented in the original paper. Compared with Brakedown using linear codes with a larger distance, our version suffers from a larger number of column openings $t$ (hence larger proofs), but benefits from an improved prover time. In both linear-code-based schemes, we assume that our PCS is used within the larger context of a SNARK protocol, where the query point is an output of a random oracle. We are then able to skip the well-formedness check as described in Proximity Testing with Logarithmic Randomness.
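
As a rough illustration of the distance/proof-size tradeoff (not the exact soundness analysis of either paper), one can estimate the number of column openings $t$ needed for $\lambda$ bits of security when each opened column catches a cheating prover with probability roughly the relative distance $d$:

```rust
// Hedged back-of-the-envelope estimate: if each opened column detects a
// cheating prover with probability ~d, then t openings leave soundness
// error ~(1 - d)^t, so we need t >= lambda / -log2(1 - d). The analyses
// in the Ligero and Brakedown papers are considerably more refined.
fn column_openings(security_bits: f64, relative_distance: f64) -> u64 {
    (security_bits / -(1.0 - relative_distance).log2()).ceil() as u64
}
```

A small relative distance, as in Brakedown's $d = 0.04$, forces many more openings than a larger one, which is exactly the proof-size penalty mentioned above.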

The above gives us a total of 7 schemes:

  • KZG (S)
  • KZG (A)
  • Hyrax (S)
  • Hyrax (A)
  • Ligero
  • Ligero (R)
  • Brakedown

The last three are linear-code-based, and they share the same mode of operation up to the choice of the linear code used to encode each row of the coefficient matrix.
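
Concretely, the shared skeleton can be sketched as follows; `commit_rows` and `encode_row` are hypothetical names, and the real implementations additionally hash each column of the encoded matrix into a Merkle tree.

```rust
// Shared structure of the linear-code-based schemes: reshape the 2^n
// coefficients into a (roughly) square matrix and encode each row with a
// scheme-specific linear code (Reed-Solomon for Ligero, an expander-based
// code for Brakedown). Assumes coeffs.len() is a perfect square.
fn commit_rows<F>(coeffs: &[F], encode_row: impl Fn(&[F]) -> Vec<F>) -> Vec<Vec<F>> {
    let rows = (coeffs.len() as f64).sqrt() as usize;
    let cols = coeffs.len() / rows;
    coeffs.chunks(cols).map(|row| encode_row(row)).collect()
}
```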

We start with polynomials that have their coefficients in the scalar field of the BN254 curve. We fix this particular curve to support potential verification on Ethereum.

The security of KZG and Hyrax is tied to the choice of the elliptic curve over which we perform group operations, and more specifically to the hardness of the discrete logarithm problem (DLP) in a certain group associated with it.

Our Ligero and Brakedown implementations are parametrized by the security parameter. We aim for at least 128 bits of security for these schemes.

We note that, although we aim for 128-bit security for the linear-code-based schemes instead of e.g. 110 bits (which would be a fairer comparison with the DLP on BN254), the performance difference from opening slightly fewer columns is only a few percentage points. We intentionally skip these variants to contain our parameter space.

Our contributions for Ligero, Brakedown and Hyrax are PR’ed to https://github.com/arkworks-rs/poly-commit and currently under review.

To run all the benchmarks there, check out the relevant branch {ligero/brakedown/hyrax}-pp from https://github.com/HungryCatsStudio/poly-commit/ and execute: cargo bench.

Our contribution to Espresso Systems’ jellyfish library used for KZG has been successfully merged: https://github.com/EspressoSystems/jellyfish/pull/390.

Run the jellyfish KZG benchmarks with

  • cargo bench --bench="pcs" --features="test-srs" (for time)
  • cargo bench --bench="pcs-size" --features="test-srs" (for size).

Objective 1: Prover compute cost

All values are given in milliseconds. In all the tables that follow, an empty entry means that the benchmark was deliberately not run (e.g. we were only interested in large $n$ for a few schemes). A dash means that we attempted to run the test but ran out of compute resources.

Commit time (ms)

$$ \begin{array}{|l|c|c|c|c|c|c|c|} \hline \text{Scheme \textbackslash Variables} & 12 & 14 & 16 & 18 & 20 & \dots & 28 \\ \hline \text{Hyrax (A)} & 4.54 & 13.26 & 45.84 & 157.19 & 528.57 & & \\ \hline \text{Hyrax (S)} & 1.54 & 5.23 & 16.44 & 44.75 & 144.05 & & \\ \hline \text{KZG (A)} & 4.10 & 13.62 & 43.68 & 136.87 & 474.86 & & \\ \hline \text{KZG (S)} & 2.85 & 8.67 & 22.48 & 84.16 & 307.6 & & \\ \hline \text{Ligero} & 1.61 & 3.63 & 10.29 & 33.18 & 128.65 & \dots & 25790 \\ \hline \text{Ligero (R)} & 139.34 & 535.47 & 2085 & 8233 & 32704 & & \\ \hline \text{Brakedown} & 3.45 & 8.60 & 18.49 & 49.09 & 131.35 & \dots & 23848 \\ \hline \end{array} $$

Ligero wins in terms of commit time for small-to-medium-sized polynomials. This seems to be in line with the benchmarks from the Brakedown paper. Brakedown has asymptotically linear prover time, and indeed we were able to show that the extra $\log(n)$ factor in the Ligero prover becomes a burden for large enough $n$. Specifically, Brakedown’s asymptotic advantage is of practical interest for $n \geq 28$.

Curiously, while Hyrax (S) takes full advantage of the small coefficients (~4x speedup for coefficients roughly a quarter of the original size), the KZG (S) implementation seems to miss out on this optimization. We are investigating this further.

Ligero(R) performs significantly slower than Ligero for committing (and opening & verification), as expected.

Open time (ms)

Our current Hyrax implementation is fundamentally different from the rest of the schemes considered here in the sense that it achieves zero knowledge with respect to the evaluation. This means that at the end of the prover-verifier interaction, the verifier does not learn the evaluation of the committed polynomial at the requested point, but is instead convinced that a certain evaluation commitment is correct. This choice of Hyrax-PCS is motivated by the Hyrax zkSNARK and it is worth noting that there is a performance cost associated with the opening phase for the prover. A non-zk Hyrax-PCS implementation is in progress, stay tuned.

$$ \begin{array}{|l|c|c|c|c|c|} \hline \text{Scheme \textbackslash Variables} & 12 & 14 & 16 & 18 & 20 \\ \hline \text{Hyrax} & 2.01 & 3.93 & 8.69 & 21.51 & 55.86 \\ \hline \text{KZG} & 4.55 & 18.04 & 64.74 & 189.95 & 606.14 \\ \hline \text{Ligero} & 14.02 & 29.26 & 60.53 & 148.21 & 385.83 \\ \hline \text{Ligero (R)} & 147.99 & 547.19 & 2118 & 8327.3 & 32892 \\ \hline \text{Brakedown} & 39.091 & 94.077 & 206.11 & 453.70 & 1006.6 \\ \hline \end{array} $$

For opening, the competition is more even but Ligero emerges victorious here for large enough $n$, narrowly overtaking multilinear KZG (which has the edge for smaller polynomials).

We note that our implementations of the linear-code schemes miss out on a small optimization during the opening phase: the prover shouldn’t need to re-compute the Merkle tree, since they’ve already done so in the commit phase. Our engineer Hossein has already eliminated roughly $n^2$ repeated hash operations (for $n$ columns) by avoiding column re-hashing in this PR to arkworks. The current inability to serialize the entire Merkle tree leaves an extra $2 \cdot n$ hashes to be optimized at a later stage.
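
The remaining optimization amounts to caching the Merkle layers computed at commit time, so that the open phase only reads them back instead of re-hashing. A minimal sketch (with a non-cryptographic stand-in hash; the arkworks implementation uses its own Merkle tree types):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for a cryptographic hash; illustrative only.
fn h(data: &[u64]) -> u64 {
    let mut s = DefaultHasher::new();
    data.hash(&mut s);
    s.finish()
}

// Build all Merkle layers once at commit time. Opening a column later only
// needs to read the stored layers to assemble an authentication path,
// instead of re-hashing the whole encoded matrix.
fn merkle_layers(leaves: Vec<u64>) -> Vec<Vec<u64>> {
    let mut layers = vec![leaves];
    while layers.last().unwrap().len() > 1 {
        let next = layers
            .last()
            .unwrap()
            .chunks(2)
            .map(|pair| h(pair))
            .collect();
        layers.push(next);
    }
    layers
}
```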

Objective 2: Verifier compute cost

Verify time (ms)

$$ \begin{array}{|l|c|c|c|c|c|} \hline \text{Scheme \textbackslash Variables} & 12 & 14 & 16 & 18 & 20 \\ \hline \text{Hyrax} & 1.1638 & 1.3916 & 1.7535 & 2.2111 & 2.9807 \\ \hline \text{KZG} & 2.93 & 3.66 & 3.93 & 3.83 & 3.72 \\ \hline \text{Ligero} & 10.00 & 10.79 & 11.89 & 14.98 & 20.68 \\ \hline \text{Ligero (R)} & 131.09 & 231.49 & 428.38 & 808.24 & 1578.9 \\ \hline \text{Brakedown} & 70.82 & 120.85 & 140.54 & 152.45 & 179.99 \\ \hline \end{array} $$

Ligero is the expected winner among the code-based schemes due to its small number of column openings, owing to its comparatively large code distance. The smaller code distance in Brakedown requires opening many more columns for a comparable level of security. We were not able to observe an asymptotic advantage for Brakedown similar to the one in the commit phase, at least for the values of $n$ we benchmarked. We conjecture that it might become apparent for very large $n$, but such values were impractical to bench.

Objective 3.1: Proof size (bytes)

$$ \begin{array}{|l|c|c|c|c|c|} \hline \text{Scheme \textbackslash Variables} & 12 & 14 & 16 & 18 & 20 \\ \hline \text{Ligero} & 244297 & 369121 & 606329 & 1068305 & 1979817 \\ \hline \text{Brakedown} & 1901009 & 3229073 & 4232905 & 6064001 & 9549721 \\ \hline \text{Hyrax} & 2360 & 4408 & 8504 & 16696 & 33080 \\ \hline \text{KZG} & 776 & 904 & 1032 & 1160 & 1288 \\ \hline \end{array} $$

Clearly, the schemes involving elliptic-curve operations (KZG & Hyrax) beat the rest by orders of magnitude, with KZG the clear winner. This is expected, as the proof size of the linear-code-based PCSs presented here is $\mathcal{O}(2^{n/2})$.

Objective 3.2: Commitment size (bytes)

While not explicitly stated in the problem definition, we believe it is important to take the commitment size into account. Hyrax is the only scheme whose commitment size depends on the size of the polynomial (one elliptic-curve point per row of the coefficient matrix). For all other schemes, this size is a small constant: for the linear-code-based PCSs, it is the size of the hash digest; for KZG, it is a single group element, which for BN254 also happens to fit into 64 bytes.

$$ \begin{array}{|l|c|c|c|c|c|c|c|}\hline \text{Scheme \textbackslash Variables} & 10 & 12 & 14 & 16 & 18 & 20 & 22 \\ \hline \text{Hyrax} & 2056 & 4104 & 8200 & 16392 & 32776 & 65544 & 131080 \\ \hline \text{Ligero} & 64 & 64 & 64 & 64 & 64 & 64 & 64 \\ \hline \text{Brakedown} & 64 & 64 & 64 & 64 & 64 & 64 & 64 \\ \hline \text{KZG} & 64 & 64 & 64 & 64 & 64 & 64 & 64 \\ \hline \end{array} $$
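A quick sanity check of the Hyrax row: assuming one uncompressed 64-byte BN254 point per row of the $2^{n/2} \times 2^{n/2}$ coefficient matrix (the few extra bytes in the table are presumably serialization overhead), the sizes line up. The helper below is a hypothetical name, not part of our codebase.

```rust
// Estimated Hyrax commitment size: one uncompressed BN254 point (64 bytes)
// per matrix row, ignoring serialization overhead. Assumes an even number
// of variables n, giving 2^{n/2} rows.
fn hyrax_commitment_bytes(num_vars: u32) -> u64 {
    let rows = 1u64 << (num_vars / 2); // 2^{n/2} rows
    rows * 64
}
```

For $n = 10$ this gives 2048 bytes, close to the 2056 reported in the table, and the estimate doubles with every two additional variables, matching the row above.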

Recursive benchmarks

When we say that we are using SNARK recursion or composition, we refer to having the verifier of one (inner) scheme represented as a circuit of another (outer) scheme. We first run the prover for the inner circuit to obtain a valid (inner) proof. We then supply this proof as a (usually private) input to the outer circuit, and finally run the prover of the outer circuit, which implicitly means running an accepting inner verifier. This might seem a bit counterproductive, but there can be good reasons for this:

  • We have access to a scheme with a fast prover but large proof sizes, and yet are required to output small proofs. We can “wrap” the fast-prover scheme in a short-proof scheme.
  • We have specific requirements on the verifier logic, e.g. we have an on-chain verifier that only accepts Groth16 proofs.

Now, however, in order to convince the outer verifier of some statement, we have to run both the inner and the outer prover. We must carefully consider whether the balance tips in our favor before choosing such an approach.

Note that for recursive verification, we don’t necessarily care about the “native” verifier time, as the inner-verifier is in a way subsumed into the outer prover.

For the recursive benchmarks, we currently only have empirical results for one scheme, Ligero, which we have implemented using Axiom’s halo2-lib. At the moment, this scheme remains the IP of our friends at Modulus Labs, for whom we implemented the protocol. They will soon open-source the codebase - we ask for your trust in the results for the time being. Modulus Labs have kindly agreed to let us use the code for benchmarking, for which we thank them.

We extrapolate from the number of columns that need to be opened in Ligero to the number opened in Brakedown. We provide very rough estimates for KZG and Hyrax based on the BN254 MSM benchmarks provided in https://github.com/axiom-crypto/halo2-lib#bn254-msm. Here, MSM refers to multi-scalar multiplication (i.e. adding together a number of group elements, each multiplied by a potentially different scalar), which is the core operation in those two PCSs.
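
For concreteness, here is the MSM operation pinned down over a toy additive group (plain integers rather than BN254 points); `msm` is a hypothetical name, and real libraries use bucketing tricks such as Pippenger's algorithm rather than this naive sum.

```rust
// Naive multi-scalar multiplication: sum_i scalar_i * base_i.
// Over an elliptic curve, `*` is scalar multiplication and `+` is point
// addition; here we use i128 arithmetic purely for illustration.
fn msm(bases: &[i128], scalars: &[i128]) -> i128 {
    bases.iter().zip(scalars).map(|(b, s)| b * s).sum()
}
```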

Objective 4.1: Recursive prover compute cost

This is basically the outer-prover cost described above. When implementing schemes for recursive verification, one needs to account for the fact that the inner prover, adapted for recursion-friendliness, pays a large overhead by forgoing efficient hash functions like SHA256 in favor of the circuit-friendly Poseidon.

Proof generation time, in seconds (halo2)

$$ \begin{array}{|l|c|c|c|c|} \hline\text{Scheme \textbackslash Variables} & 14 & 16 & 18 & 20 \\ \hline\text{Hyrax*} & 64.28 & 71.18 & 139.98 & 261.66 \\ \hline\text{KZG** (k=17)} & 88.2 & 102.24 & 115.02 & 127.8 \\ \hline\text{Ligero} & 56.76 & 93.04 & 243.10 & - \\ \hline\text{Brakedown***} & - & - & - & - \\ \hline\end{array} $$

*Hyrax costs are estimated by benchmarking the proving time for two MSMs of size $2^{n/2}$ each (for a multilinear polynomial in $n$ variables). We adapt the code from halo2-lib to perform benchmarks over different numbers of base-scalar pairs.

**KZG costs are estimated by benchmarking a single pairing from halo2-lib and multiplying by $n$. The actual multilinear KZG verification involves a multipairing of $n$ pairs, which can in general be optimized to be faster than the $n$ individual pairings reported here.

***Brakedown was out of reach: we were already unable to run the Ligero verifier for large polynomials on the hardware we selected for benchmarking. This is due to over 250k halo2 advice cells (for $n = 18$), most of which come from the huge amount of hashing required in the non-interactive setting of linear-code-based PC schemes. Brakedown performs at least an order of magnitude more column openings than Ligero (under reasonable parameter choices for both), and since the verifier needs to hash each column, we conclude that it would be infeasible in the given server setting.

In all estimates (Hyrax and KZG), we only focus on the key cost (MSM/pairing) and completely disregard auxiliary costs, e.g. to make the scheme non-interactive.

Objective 4.2: Recursive verifier compute cost

This refers to the outer-verifier costs. As noted above, in the recursive setting, the inner verifier doesn’t matter much on its own (aside from offline sanity checks if anything) - it is never run natively and its computation is subsumed into the outer prover.

For all circuits we ran in halo2, we only ever encountered $<1s$ runtimes. We conclude that even when running this in an EVM verifier, the gas costs will be acceptable, and we don’t delve into the details.

Objective 5: Recursive verifier proof size

The halo2 proof sizes are tightly coupled with the parametrization of the halo2 circuit, namely the (logarithm of the) maximum number of rows $k$ in the advice table. Rather than reporting the proof size for varying $k$, we only present the size for the choice of $k$ that results in the fastest prover, as noted in Objective 4.1: Recursive prover compute cost. The estimates for KZG, Hyrax and Brakedown follow the same methodology.

Proof size in bytes (halo2)

$$ \begin{array}{|l|c|c|c|c|} \hline\text{Scheme \textbackslash Variables} & 14 & 16 & 18 & 20 \\ \hline\text{Hyrax} & 9856^* & 34624 & 67712 & 67264^* \\ \hline\text{KZG (k=17)} & 124096 & 141824 & 159552 & 177280 \\ \hline\text{Ligero} & 26048 & 46112 & 85536 & - \\ \hline\text{Brakedown**} & - & - & - & - \\ \hline\end{array} $$

$*$The fastest Hyrax verifier for $n = 14$ and $n = 20$ uses $k = 20$, whereas the middle values of $n$ are fastest with $k = 19$. This explains why the proof sizes do not roughly double as $n$ increases.

**As with the recursive prover time, we ran out of memory for the largest instance of Ligero, and we expect the same for Brakedown.

Conclusions

Picking the right PCS is hard.

While implementing the above schemes, we realized that there is a plethora of code and parameter optimizations to be made for each, depending on the specific optimization target (prover/verifier time, etc.). The above benchmarks are just the beginning. While many of the opening-proof generation instances could be batched in the case of Lasso, each polynomial has to be committed to separately.

On-chain verification of ZK proofs might be a fun application that has certainly accelerated progress in the space of SNARKs. Nevertheless, my personal take is that the future users of this technology will be off-chain, e.g. an AI judge or proof-of-identity. We need to open up to the possibility that proofs need not be minuscule, that sub-second verification times might be unnecessary, and that the real bottleneck becomes the prover.

Linear-code-based PCSs offer good tradeoffs in this space: high parallelism thanks to their matrix structure; flexibility from choosing the code distance; and some schemes, like Brakedown, are also field-agnostic: no FFTs means fewer restrictions on the multiplicative structure of the field. I’ll be surprised if we don’t see an improvement of at least one order of magnitude, thanks to both theoretical and implementation advances, within the next year or two. Till then, there is no clear winner, although this PCS family seems to be a good choice for the time being.

Notes

All our benchmarks were run on the following dedicated instance:

AMD Ryzen™ 9 7950X3D

  • CPU: 16 cores / 32 threads @ 4.2 GHz

  • Generation: Raphael (Zen 4) with AMD 3D V-Cache™ Technology

  • RAM: 128 GB ECC DDR5 RAM

Future work

  • Zeromorph: transformation from any univariate scheme to multilinear. Does applying Zeromorph to known univariate schemes bring better performance?
  • BaseFold: newest work on multilinear PCS, see https://eprint.iacr.org/2023/1705.pdf.
  • Can we introduce any modifications to linear-code based PCS to benefit from “small” coefficients?
  • As a next step, we would like to provide benchmarks over BLS12-381. This curve was proposed by the Zcash team as a potential replacement for BN254, whose security is estimated to be below the initially assumed 128 bits. There are some projects underway to bring BLS12-381 curve operations as precompiles to Ethereum. The curve’s future on Ethereum is uncertain, but it has already found its way into other blockchains.
  • Upgrade to a newer circuit-friendly hash function. Some of the new doubly-friendly constructions aim for a tradeoff between native and in-circuit efficiency, e.g. https://eprint.iacr.org/2023/1025.
  • Benchmarking batch openings for multiple polynomials and/or multiple queries. Ligero doesn’t have homomorphic commitments, so while opening multiple points of the same polynomial can be achieved by taking linear combinations, opening multiple polynomials at the same (or multiple) point(s) doesn’t have any obvious “shortcuts” in PC schemes using hashes, unlike in EC-based schemes (see our write-up on batching in KZG).
  • More schemes and more variants!