Zock: ZK Proofs in Uqbar
The Zock repo is located here. It is under very active development and subject to rapid change. It will be documented upon completion.
Prelude: Nock
Uqbar's ZK prover (Zock) is designed for Nock, Urbit's assembly language. Those interested in Nock can read a short primer or this detailed tutorial.
At a very high level, Nock operates on trees of numbers ("nouns"). A noun looks like [[20 83] [[10 2] 4]]. Nock programs run by evaluating two nouns: the "subject" and the "formula." The subject can be any arbitrary noun, while the formula must be a noun whose head is a number from 0 to 11, or a tree of such formulas. Nock runs the formula against the subject and returns a new noun. See the primer for examples of nouns.
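As a concrete illustration, a noun can be modeled as a nested tuple, with Nock's axis addressing (slot 1 is the whole noun, slot 2 the head, slot 3 the tail, and slots 2n/2n+1 recursing into slot n) used to reach any leaf. This is a minimal sketch for intuition, not Zock's actual representation:

```python
def slot(noun, axis):
    """Return the sub-noun at the given axis (Nock tree address).

    A noun is either an atom (a non-negative int) or a cell (a pair,
    modeled here as a 2-tuple). Slot 1 is the whole noun; even/odd
    axes descend into the head/tail of the parent slot.
    """
    if axis == 1:
        return noun
    parent = slot(noun, axis // 2)
    if not isinstance(parent, tuple):
        raise ValueError("cannot descend into an atom")
    return parent[axis % 2]  # even axis -> head (index 0), odd -> tail (index 1)

noun = ((20, 83), ((10, 2), 4))   # the example noun [[20 83] [[10 2] 4]]
assert slot(noun, 2) == (20, 83)  # head of the noun
assert slot(noun, 7) == 4         # tail of the tail
```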
Overview
Uqbar validates all transactions using ZK validity proofs, in order to reliably transition state and avoid the complex game theory of execution sharding or optimistic rollups.
We use Starkware's CAIRO language and prover to generate proofs that Nock programs have executed correctly, and those proofs can be cheaply verified on Ethereum and similar chains.
Proofs of transaction execution are currently written to Ethereum, where we will use a smart contract to unify them with the transactions themselves to enforce data availability.
How Zock Uses CAIRO
We currently prove Nock programs using CAIRO, Starkware's general-purpose ZK proving language. We are actively investigating other ZK systems and will use whatever works best. We love Starkware, and CAIRO is well-documented and accessible, which has made it great for initial proofs of concept.
CAIRO Performance Considerations
Programming in CAIRO is slightly unintuitive, because every computational step and memory allocation must be proved.
This creates a few performance challenges:
conventionally cheap operations, like mutating a value in memory, allocate a cell permanently
operations that have many computational steps must prove each of those steps
any pre-existing state must either be fully loaded into the CAIRO program in a deterministic manner (very expensive for large state), or accessed through memory proofs using a Merkle-like construction
CAIRO handles these challenges with "non-deterministic" programming: instead of computing a result step by step inside the prover, the program accepts the result as an externally supplied hint and merely verifies it. This works because it is often far cheaper to verify the result of a computation than to compute it in the first place.
For example, the square root algorithm is relatively long, and computing the square root of 361 takes many steps. However, if you already know the answer is 19, you can prove that square_root(361) = 19 simply by showing that 19*19 = 361, which is just one multiplication operation.
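The square-root example can be sketched as follows. The split between "compute the hint outside the prover" and "verify it inside the circuit" is the key idea; the function names are invented for illustration:

```python
import math

def verify_sqrt_hint(n, hint):
    # The "in-circuit" check: a single multiplication, rather than
    # running a full square-root algorithm step by proven step.
    # (This equality form covers perfect squares; a floor-sqrt check
    # would also need the bound (hint + 1) ** 2 > n.)
    return hint * hint == n

# The hint is computed cheaply *outside* the prover:
hint = math.isqrt(361)
assert hint == 19
assert verify_sqrt_hint(361, hint)
```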
Similarly, imagine that you need to access a leaf in a Nock noun (tree). There are two ways to do this:
Load the whole tree into memory, and navigate to the leaf (very expensive in CAIRO, since each leaf in the tree now must be proven)
Merkelize the noun, and provide a Merkle proof to the leaf. This requires more up-front work, but it can be done outside the prover, which is far, far faster.
Zock relies heavily on this Merkleized memory access technique, along with several related optimizations.
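The second option can be sketched end to end: Merkleize the noun (leaves hash the atoms, internal nodes hash their two children), produce a proof of sibling hashes along the path to a leaf, and verify the leaf against the root. This is an illustrative construction assuming SHA-256 and 8-byte atoms, not Zock's actual hashing scheme:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(noun):
    # Atoms are leaves; internal nodes hash the two child roots.
    if isinstance(noun, int):
        return h(noun.to_bytes(8, "big"))
    return h(merkle_root(noun[0]) + merkle_root(noun[1]))

def merkle_prove(noun, axis):
    """Return (leaf_atom, proof) for the atom at `axis`.

    The proof is the list of (direction, sibling_root) pairs from the
    root down to the leaf: direction 0 means we took the head, 1 the tail.
    """
    path = []
    while axis > 1:
        path.append(axis % 2)
        axis //= 2
    path.reverse()
    proof, cur = [], noun
    for step in path:
        proof.append((step, merkle_root(cur[1 - step])))
        cur = cur[step]
    return cur, proof

def merkle_verify(root, leaf, proof):
    # Recompute the root from the leaf upward; done outside this sketch,
    # this is the only part that would run inside the prover.
    acc = h(leaf.to_bytes(8, "big"))
    for step, sibling in reversed(proof):
        acc = h(sibling + acc) if step == 1 else h(acc + sibling)
    return acc == root

noun = ((20, 83), ((10, 2), 4))
leaf, proof = merkle_prove(noun, 7)        # axis 7 holds the atom 4
assert leaf == 4
assert merkle_verify(merkle_root(noun), leaf, proof)
```

The expensive tree hashing happens in `merkle_prove`, outside the prover; the prover only pays for `merkle_verify`, whose cost is logarithmic in the size of the noun.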
Preprocessing for Non-Determinism
As alluded to above, in order to use non-determinism, we need to know the results of computations in advance. Zock does this by preprocessing all code it needs to prove, and producing hints that are passed to CAIRO. These hints are created for all computations, but the bulk of the preprocessing work is producing Merkle proofs for memory access, as well as detecting code that can be "jetted" (see below).
We use a metacircular Nock interpreter (an interpreter itself written in Nock) to preprocess code and generate hint files, which are just large JSON files fed to CAIRO. The metacircular interpreter allows us to use Urbit's native Nock interpreting capabilities while generating hints along the way.
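The shape of this pipeline can be sketched as follows: run each computation natively (fast), record its result, and emit a JSON hint file for the CAIRO program to consume. The field names and layout below are invented for illustration; Zock's real hint format is internal and far richer (Merkle proofs, jet-detection records, and so on):

```python
import json
import math

def preprocess(computations):
    # Each entry is (name, native_function, args). The native function
    # plays the role of the metacircular interpreter: it computes the
    # answer outside the prover so CAIRO only has to verify it.
    hints = [{"op": name, "args": args, "result": fn(*args)}
             for name, fn, args in computations]
    return json.dumps({"hints": hints}, indent=2)

hint_file = preprocess([
    ("sqrt", math.isqrt, [361]),
    ("dec", lambda n: n - 1, [19]),
])
assert json.loads(hint_file)["hints"][0]["result"] == 19
```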
Jetting & Standard Library
Any reasonably-performing Nock interpreter/prover needs "jets." "Jetting" is when the interpreter recognizes that a piece of code, run against a certain environment ("subject"), can be evaluated more rapidly while producing the same result, and takes that shortcut.
For example, the Nock implementation of decrement runs by starting at 0 and incrementing until reaching the argument to decrement. However, if the interpreter knows that decrement compiles to the code [8 [9 2.398 0 1.023] 9 2 10 [6 7 [0 3] 1 19] 0 2], it can match that pattern when it occurs, see that what's "really" happening is (dec 19), and simply return 18. This is a slight oversimplification, but not much.
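Jet matching can be sketched as template matching over the formula tree. Below, the formula is modeled as a nested tuple that mirrors the printed brackets of the decrement code quoted above (with Hoon's dotted numerals 2.398 and 1.023 written as 2398 and 1023), and a wildcard marks where the jet's argument sits. The template shape and registry are illustrative simplifications, not Zock's actual matcher:

```python
ARG = object()  # wildcard marking where the jet's argument appears

# Decrement formula from the text, argument replaced by the wildcard.
DEC_TEMPLATE = (8, (9, 2398, 0, 1023), 9, 2, 10, (6, 7, (0, 3), 1, ARG), 0, 2)

def match(template, formula):
    """Return (matched, captured_argument) for formula against template."""
    if template is ARG:
        return True, formula
    if isinstance(template, tuple):
        if not isinstance(formula, tuple) or len(formula) != len(template):
            return False, None
        captured = None
        for t, f in zip(template, formula):
            ok, c = match(t, f)
            if not ok:
                return False, None
            if c is not None:
                captured = c
        return True, captured
    return template == formula, None

def eval_with_jets(formula):
    ok, arg = match(DEC_TEMPLATE, formula)
    if ok:
        return arg - 1  # jet hit: native decrement, no counting loop
    raise NotImplementedError("fall back to the slow Nock interpreter")

dec_19 = (8, (9, 2398, 0, 1023), 9, 2, 10, (6, 7, (0, 3), 1, 19), 0, 2)
assert eval_with_jets(dec_19) == 18
```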
In order for jets to be reliable, their code must be committed to (via a hash) in both the preprocessor and CAIRO program. That way, we know that we are actually matching the decrement function, and not some malicious substitute, and can trust our jet recognition.
Because we have to register all jets in the CAIRO code, we are limited to a small standard library. This library is sufficient for smart contracts and Uqbar execution, but leaves out hundreds of functions that are available in the normal Urbit OS. This allows us to significantly limit our auditing surface area, but does mean that it's completely impractical to prove arbitrary Nock code that uses the full range of the actual stdlib (without creating a new Zock that adds the omitted jets).
Caching Memory Lookups
All memory access in Nock has the pseudocode form lookup(noun, memory-slot), where noun is a tree of numbers and memory-slot is a number indicating which leaf in the tree to return. These lookups are completely deterministic and can thus be memoized.
We memoize by recording noun/memory-slot pairs during preprocessing. Any pairs that occur more than once are written to memory in CAIRO, and their index in memory is recorded in the hint for that operation. As a result, subsequent accesses of the value don't require Merkle proofs; we can just verify that the retrieved noun's Merkle root matches the one we are currently evaluating.
We can use this same memoization for subtrees: whenever we encounter a subtree/memory-slot lookup that is in our cache, we can short-circuit and return that value.
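The caching scheme above can be sketched as a table keyed by noun/memory-slot pairs, where only cache misses pay for a Merkle proof. The class and counter names are invented for illustration; in Zock the table lives in CAIRO memory and the hit check compares Merkle roots:

```python
def slot(noun, axis):
    # Deterministic Nock-style leaf lookup on a nested-tuple noun.
    if axis == 1:
        return noun
    return slot(noun, axis // 2)[axis % 2]

class LookupCache:
    def __init__(self):
        self.table = {}          # stands in for the table in CAIRO memory
        self.proofs_needed = 0   # how many full Merkle proofs we paid for

    def lookup(self, noun, axis):
        key = (noun, axis)
        if key in self.table:
            # Cache hit: no Merkle proof; in CAIRO we would only check
            # that the cached noun's Merkle root matches the current one.
            return self.table[key]
        self.proofs_needed += 1  # cache miss: a full Merkle proof is required
        self.table[key] = slot(noun, axis)
        return self.table[key]

cache = LookupCache()
noun = ((20, 83), ((10, 2), 4))
for _ in range(3):
    assert cache.lookup(noun, 7) == 4
assert cache.proofs_needed == 1  # only the first access pays for a proof
```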
There are many other tricks like this that we will likely find in the course of optimizing Zock. Knowing a computation's results before proving it opens up many performance avenues.
Putting It All Together: Proving & Settlement
A Zock sequencer runs a bundle of transactions against our Uqbar execution engine, and submits the resulting program trace (currently) to the Starkware shared prover, SHARP, to be proved.
Once SHARP proves the code's execution, it posts a fact to Ethereum (currently the Goerli testnet). This fact is a hash of our program's code and its output. We write our program so that its output also commits to its input, so we can verify on Ethereum that state1 transitioned to state2 by checking that hash(program_hash, state1, state2) == fact.
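The settlement check reduces to a single hash comparison, which can be sketched as follows. The hash layout and SHA-256 choice are assumptions for illustration; the real fact format is defined by Starkware's verifier contracts:

```python
import hashlib

def fact_of(program_hash: bytes, state1: bytes, state2: bytes) -> bytes:
    # The fact commits to the program and to both states, because the
    # program's output itself commits to its input.
    return hashlib.sha256(program_hash + state1 + state2).digest()

def verify_transition(posted_fact, program_hash, state1, state2):
    # On-chain check: recompute the expected fact and compare.
    return posted_fact == fact_of(program_hash, state1, state2)

program_hash = hashlib.sha256(b"zock-program").digest()
fact = fact_of(program_hash, b"state-1", b"state-2")
assert verify_transition(fact, program_hash, b"state-1", b"state-2")
assert not verify_transition(fact, program_hash, b"state-1", b"bogus")
```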
In the initial live version of Uqbar, using Ethereum for settlement bootstrapping, we will have a state transition contract (STC) that operates as follows:
sequencer executes transaction bundle and sends to Starkware SHARP
SHARP proves and posts a fact on Ethereum
sequencer posts transactions and fact to STC
STC publishes the transactions to Ethereum (for data availability) and updates the state from old-state to new-state, recording the Merkle root of the transaction bundle.
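The four steps above can be modeled in a few lines. The contract interface, hash layout, and fact registry below are invented for illustration; the real STC will be an Ethereum smart contract checking facts posted by Starkware's verifier:

```python
import hashlib

def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class StateTransitionContract:
    def __init__(self, genesis: bytes, program_hash: bytes):
        self.state = genesis
        self.program_hash = program_hash
        self.facts = set()    # stands in for SHARP's on-chain fact registry
        self.tx_roots = []    # published transaction-bundle commitments

    def register_fact(self, fact: bytes):
        # Step 2: SHARP posts a fact after proving the bundle's execution.
        self.facts.add(fact)

    def update(self, txs, new_state: bytes):
        # Steps 3-4: accept the bundle only if a matching fact was posted,
        # then publish the bundle commitment and advance the state.
        fact = sha(self.program_hash + self.state + new_state)
        if fact not in self.facts:
            raise ValueError("no proof for this state transition")
        self.tx_roots.append(sha(b"".join(txs)))  # simplistic stand-in for a Merkle root
        self.state = new_state

# Step 1 happens off-chain: the sequencer executes the bundle, obtains
# new_state, and sends the trace to SHARP.
stc = StateTransitionContract(b"genesis", sha(b"zock-program"))
stc.register_fact(sha(stc.program_hash + b"genesis" + b"state-1"))
stc.update([b"tx-a", b"tx-b"], b"state-1")
assert stc.state == b"state-1"
```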
Note that this is just our initial architecture. We will explore and evaluate other proving systems as well as options for settlement layers and data availability solutions as the project progresses, with the goal of eventually natively unifying everything in Uqbar.