
Idiomatic Noir - Part 1: Collections

Core principles and patterns of working with collections in Noir.

June 20, 2024
Marek Kirejczyk

As we dive deeper into a fast-growing ecosystem and develop our own codebase in Noir, we recognize the need to define and use idiomatic Noir.

Idiomatic code follows the conventions and best practices of a particular programming language. Hence, idiomatic code has a big impact on readability and maintainability.

Noir is inspired by Rust, allowing us to borrow some of its idioms. However, the Noir community will still need to develop certain language constructs and patterns.

This article is just the beginning. We will share more insights from our development team here. Follow us on X so you do not miss the next parts.

Rationale

It is hard to overemphasize the value of readable and maintainable code. Here are the top reasons to consider in the context of Noir:

  1. The language is young and will evolve, so the ecosystem codebases must keep up with its developments.
  2. Security:
    • Readable code is less likely to contain bugs.
    • Bugs in readable code are more likely to be found via audits and bug bounties.
  3. Most idioms are free or very cheap:
    • We assume that Noir, following Rust, inherits the idea of zero-cost abstractions. Hence, many constructs that make code more readable are free.
    • The language will evolve, and some more expensive features may become more efficient over time. Therefore, we should resist the temptation of premature optimization, especially for boilerplate code. However, this doesn't apply to low-level cryptographic or computation-heavy functions, which may necessitate "hacky" code in early versions.

Below is an analysis of a fictional JSON parser implementation, highlighting idioms, patterns, and refactorings encountered while working with collections in Noir.

Simple JSONParser

Consider the following JSON parsing function:

global MAX_INPUT_LEN = 1024;
global MAX_JSON_NODE_LEN = 1024;

struct JsonNode {
  offset: u64,
  len: u64
}

fn parse_json(input: [u8; MAX_INPUT_LEN], input_len: u64) -> ([JsonNode; MAX_JSON_NODE_LEN], u64) {
    ...
}

Note one important limitation that Noir inherits from its PLONK-based proving backend: circuit sizes are fixed at compile time, so array length information usually comes in pairs. First we have the array's MAXIMUM length, and then its actual length. Therefore, we need two arguments:

  • input: the array itself, whose fixed maximum length is set by a global
  • input_len: the actual count of meaningful items.

Also, the return type has to be a tuple (see the sketch after this list):

  • an array whose size is the upper bound on the number of nodes returned
  • the actual collection size, which varies with the input
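
To make this concrete, here is a minimal sketch of a hypothetical call site (the names and the assertion are illustrative): even a tiny JSON document has to arrive as a full MAX_INPUT_LEN-sized array, padded with zeros, with its real length passed alongside.

fn main(input: [u8; MAX_INPUT_LEN], input_len: u64) {
    // input always occupies MAX_INPUT_LEN bytes; only the first input_len are meaningful.
    let (nodes, nodes_len) = parse_json(input, input_len);
    // Likewise, nodes always has MAX_JSON_NODE_LEN entries; nodes_len says how many are real.
    assert(nodes_len <= input_len); // illustrative sanity check: no more nodes than bytes
}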

We can improve the code above by using BoundedVec instead of plain arrays. A BoundedVec stores both the maximum length (as a generic parameter) and the actual length.

Below is the updated function:

fn parse_json(input: BoundedVec<u8, MAX_INPUT_LEN>) -> BoundedVec<JsonNode, MAX_JSON_NODE_LEN> {
    ...
}
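
For intuition, here is a minimal sketch of how a caller could build the input; prepare_input is a hypothetical helper, not part of the standard library. Both pieces of length information now travel inside the value: the maximum as a generic parameter and the dynamic length via len().

fn prepare_input<N>(payload: [u8; N]) -> BoundedVec<u8, MAX_INPUT_LEN> {
    // Copy only the meaningful bytes; BoundedVec tracks the dynamic length for us.
    let mut input: BoundedVec<u8, MAX_INPUT_LEN> = BoundedVec::new();
    input.extend_from_array(payload);
    input
}

fn main() {
    let input = prepare_input("{}".as_bytes()); // a 2-byte JSON document
    let nodes = parse_json(input);
    assert(nodes.len() <= nodes.max_len()); // dynamic count vs. compile-time bound
}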

There is another problem to notice in this code: it only works up to the specific input sizes defined in globals, and regardless of the actual input size, proving always uses the same (maximum) amount of computational power.

To solve this problem, we will introduce generic parameters instead of constants.

Refactoring: Introduce generic parameter

The code after the first iteration of improvement could look like this:

struct JsonNode {
  offset: u64,
  len: u64
}

fn parse_json<MAX_INPUT_LEN, MAX_NODES>(input: BoundedVec<u8, MAX_INPUT_LEN>) -> ([JsonNode; MAX_NODES], u64) {
    ...
}

However, a careful reader might sense something is still off. Let's examine an example of how to use the function above:

  let (nodes, len) = parse_json(input);
  for i in 0..MAX_NODES {
      if i < len {
          do_sth(nodes[i]);
      }
  }

This is a familiar pattern found in code that uses loops: we iterate over a fixed, predefined size, and a nested if-statement restricts the work to the dynamic size.

We can see a leaky abstraction here. A simple dynamic loop takes 5 lines of code. It is neither DRY nor readable. Let's improve that.

Refactoring: Introduce Iterable

A popular way to abstract iteration is to introduce an Iterable trait. Here is the boilerplate code needed to apply the pattern:

trait Iterable<T, MAX> {
    fn len(self) -> u64;
    fn each(self, f: fn(T) -> ());
    fn map<R>(self, f: fn(T) -> R) -> BoundedVec<R, MAX>;
    ...
}

impl<T, MAX> Iterable<T, MAX> for BoundedVec<T, MAX> {
    ...
}

fn parse_json<MAX_INPUT_LEN, MAX_NODES>(input: Iterable<u8, MAX_INPUT_LEN>)
  -> Iterable<JsonNode, MAX_NODES> {
    ...
}

And example usage:


  let nodes = parse_json(input);
  nodes.each(|node| {
      do_sth(node)
  });
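
For intuition, here is a minimal, partial sketch of how each could be implemented for BoundedVec under the trait sketched above (the Iterable trait and each are not part of the standard library; other trait methods are omitted). The fixed-bound loop and the dynamic-length check do not disappear; they are simply hidden in one place, behind the abstraction:

impl<T, MAX> Iterable<T, MAX> for BoundedVec<T, MAX> {
    fn each(self, f: fn(T) -> ()) {
        // Iterate over the fixed upper bound, but only call f for real elements.
        for i in 0..MAX {
            if i < self.len() {
                f(self.get_unchecked(i));
            }
        }
    }
    ...
}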

Generating proofs

Now, let's put our JSON parser into production. Suppose we need to parse content coming from a user, consume API content from a browser, handle OAuth data, or similar tasks. Often, this data comes in various sizes.

Creating a single large circuit capable of handling all sizes would be inefficient. It would require paying the computational costs of processing the largest possible input, even for the smallest practical input sizes.

Instead, we can generate several provers of different sizes, such as S, M, and L.

For example, the distribution of input data from users might resemble this:

  • 80% is S or smaller, proving time 10 sec
  • 18% is M or smaller, proving time 1 min
  • 2% is L or smaller, proving time 10 min

So, I don't want to use an L-sized prover for S-sized input data.

Hence, I need to generate three provers. The only way to do this today is to put the core functionality in a library that uses generic parameters, and then create three projects, each with a main function containing a single line that calls the generic functionality with different parameters.

Lib:

fn do_json_stuff<MAX_INPUT_LEN, MAX_NODES>(input: Iterable<u8, MAX_INPUT_LEN>) {
    ...
}

Main 1:

fn main(input: [u8; 1000]) {
    do_json_stuff::<1000, 10>(input);
}

Main 2:

fn main(input: [u8; 10000]) {
    do_json_stuff::<10000, 50>(input);
}

Main 3:

fn main(input: [u8; 100000]) {
    do_json_stuff::<100000, 100>(input);
}
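
One way to organize this on disk is a Nargo workspace: the generic library plus one thin binary package per prover size. The layout below is hypothetical; the package and directory names are illustrative.

# Nargo.toml at the workspace root
[workspace]
members = ["json_lib", "prover_s", "prover_m", "prover_l"]

# prover_s/Nargo.toml (likewise for prover_m and prover_l)
[package]
name = "prover_s"
type = "bin"

[dependencies]
json_lib = { path = "../json_lib" }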

Note that we assume the two variables MAX_INPUT_LEN and MAX_NODES follow a similar distribution; in practice, those distributions can differ a lot. I could easily imagine a 5x5 matrix of generic parameters (i.e., (XS, XS), (XS, S), (S, XS), ..., (XL, XL)). That would produce 25 provers, which could be a reasonable strategy for optimizing computational resources.

Prover generation tool

We propose implementing a tool, and eventually a language feature, that would make generating multiple provers simpler. It would leverage annotations to generate multiple subprojects.

Here is an example:

#[prover(MAX_INPUT_LEN=1000, MAX_NODES=10)]
#[prover(MAX_INPUT_LEN=10000, MAX_NODES=50)]
#[prover(MAX_INPUT_LEN=100000, MAX_NODES=100)]
fn do_fancy_json_stuff<MAX_INPUT_LEN, MAX_NODES>(input: Iterable<u8, MAX_INPUT_LEN>) {
    ...
}

In this case, Nargo would generate multiple provers, one per generated directory.

Ideal scenario: an external tool is built first, as a proof of concept that generates the subprojects. If the community deems it valuable, the functionality can then be moved into Nargo.

Consequently, a developer needs to manage multiple verification keys, which I believe is an acceptable short-term solution. It would be great to have one "master" verification key in the future.

Summary

The reasoning presented above encapsulates the core principles of working with collections in Noir. We hope it demonstrates how to work efficiently with Noir today and highlights important directions for the language's future development. 

What are your thoughts on our approach? Do you have any suggestions or insights to share? Let us know on X.
