In this blog post series, we are exploring idiomatic Noir code. In case you missed the first part, check it out here.
Today, we will focus on how to deal with data parsing. Sound interesting? Let’s dive in.
A common routine in a circuit is to parse data in a particular format (e.g., Base64, RLP, JSON) coming from an external source (e.g., email, blockchain, OAuth response, web), and then verify its content against cryptographic primitives such as hashes, signatures, or Merkle proofs.
To do this, we usually need to do two things:
1. Decode (or encode) the data between its raw, serialized form and a structured one.
2. Verify the resulting content against the relevant cryptographic primitive.
The majority of ZK projects do all of the above daily. This includes ZK Email, ZK Login, ZK Pass, and many others.
A fun fact is that encoding and decoding are interchangeable in ZK circuits: the circuit ultimately just compares the encoded data against the decoded data and returns a boolean, so it does not matter in which direction the conversion runs.
Therefore, we only need to implement one of the two routines, and we can pick whichever is cheaper or easier to implement. In practice, decoding is usually the preferred option, so that is what this post focuses on.
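To make this interchangeability concrete, here is a minimal sketch. Base64 is used as the example format, and base64_encode/base64_decode are hypothetical helpers with the obvious signatures (they are not from any specific library); both variants boil down to the same equality assertion:

// Hypothetical helpers, assumed for this sketch:
// fn base64_encode<PLAIN_LEN, ENC_LEN>(plain: [u8; PLAIN_LEN]) -> [u8; ENC_LEN]
// fn base64_decode<ENC_LEN, PLAIN_LEN>(encoded: [u8; ENC_LEN]) -> [u8; PLAIN_LEN]

// Option A: encode the claimed plaintext and compare it to the encoded input.
fn check_by_encoding<PLAIN_LEN, ENC_LEN>(
    encoded: [u8; ENC_LEN], claimed_plain: [u8; PLAIN_LEN]
) {
    assert(base64_encode(claimed_plain) == encoded);
}

// Option B: decode the encoded input and compare it to the claimed plaintext.
fn check_by_decoding<PLAIN_LEN, ENC_LEN>(
    encoded: [u8; ENC_LEN], claimed_plain: [u8; PLAIN_LEN]
) {
    assert(base64_decode(encoded) == claimed_plain);
}

Both functions accept exactly the same input/witness pairs, so we are free to implement whichever direction is cheaper.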
Let’s go back to the JSON parser example from part 1 of this blog post:
fn parse_json<MAX_INPUT_LEN, MAX_NODES>(
    input: Iterable<u8, MAX_INPUT_LEN>
) -> Iterable<JsonNode, MAX_NODES> {
    ...
}
We will now focus on what the type of JsonNode should look like. For simplicity, we will start with a function that only parses an array of strings, so we will narrow the inputs down to the following: ["zk", "is", "awesome", ..., "hard"].
With these assumptions, we could rewrite our function to look something like this:
global MAX_STRING_LEN = ...;

type JSONString = [u8; MAX_STRING_LEN];

fn parse_json_string_array<MAX_INPUT_LEN, MAX_NODES>(
    input: Iterable<u8, MAX_INPUT_LEN>
) -> Iterable<JSONString, MAX_NODES> {
    ...
}
However, this introduces a few new problems:
- Every parsed string is copied out of the input into its own array, which costs constraints for each of the MAX_NODES entries.
- MAX_STRING_LEN has to be chosen up front, large enough for the longest possible string, so most entries carry useless padding.
We can improve our example code by introducing a Fragment type that only stores a reference to the original string and the offset and length of the parsed input.
struct JSONFragment<MAX_LEN> {
    offset: u64,
    length: u64,
    data: [u8; MAX_LEN],
}

fn parse_json_string_array<MAX_INPUT_LEN, MAX_NODES>(
    input: Iterable<u8, MAX_INPUT_LEN>
) -> Iterable<JSONFragment<MAX_INPUT_LEN>, MAX_NODES> {
    ...
}
Two things to notice here:
- The parser no longer copies any bytes out of the input: each fragment just records where in the original input it lives, which is why the output type is parameterized by MAX_INPUT_LEN rather than by a separate MAX_STRING_LEN.
- Nothing about the fragment is specific to JSON, or even to u8, which suggests a more general type.
A generalized implementation could look like this:
struct Fragment<T, MAX_LEN> {
    offset: u64,
    length: u64,
    data: [T; MAX_LEN],
}

fn parse_json_string_array<MAX_INPUT_LEN, MAX_NODES>(
    input: Iterable<u8, MAX_INPUT_LEN>
) -> Iterable<Fragment<u8, MAX_INPUT_LEN>, MAX_NODES> {
    ...
}
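To show what "recording an offset and a length instead of copying" could look like in a parser body, here is a simplified sketch of a hypothetical string finder. It uses plain arrays and the standard library BoundedVec instead of the Iterable trait, and it only handles unescaped quotes; everything here beyond the Fragment type above is an illustrative assumption:

fn find_strings<MAX_INPUT_LEN, MAX_NODES>(
    input: [u8; MAX_INPUT_LEN]
) -> BoundedVec<Fragment<u8, MAX_INPUT_LEN>, MAX_NODES> {
    let mut fragments: BoundedVec<Fragment<u8, MAX_INPUT_LEN>, MAX_NODES> =
        BoundedVec::new();
    let mut start: u64 = 0;
    let mut in_string = false;
    for i in 0..MAX_INPUT_LEN {
        if input[i] == 0x22 { // a double quote
            if in_string {
                // Closing quote: record where the string starts and how
                // long it is. Storing `input` in the fragment duplicates
                // wires, not constraints, so no bytes are really copied out.
                fragments.push(Fragment {
                    offset: start,
                    length: (i as u64) - start,
                    data: input,
                });
            } else {
                start = (i as u64) + 1;
            }
            in_string = !in_string;
        }
    }
    fragments
}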
A careful reader might notice that a performance problem has crept in. Consider the following code, which uses the Iterable trait from the previous part:

let input: [u8; 10000] = "...";
let json_nodes: [Fragment<u8, 10000>; 20] = parse_json_string_array(input);
json_nodes[0].each(|c: u8| {
    ...
});

The loop above will now iterate over 10,000 elements, even though the parsed string is usually much shorter than the entire input. Can we do something about this? We can introduce a method that narrows a fragment down to a smaller maximum length, provided that length is known at compile time:
impl<T, MAX_LEN> Fragment<T, MAX_LEN> {
    fn focus<NEW_MAX_LEN>(self) -> Fragment<T, NEW_MAX_LEN> {
        ...
    }
}
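One possible implementation of focus, as a sketch: copy the fragment’s window into a smaller backing array and assert that nothing gets cut off. The bounds guard and the filler element are details of this sketch, not a prescribed design:

impl<T, MAX_LEN> Fragment<T, MAX_LEN> {
    fn focus<NEW_MAX_LEN>(self) -> Fragment<T, NEW_MAX_LEN> {
        // The parsed content must fit into the new, smaller bound.
        assert(self.length <= NEW_MAX_LEN as u64);
        // Fill with an arbitrary existing element, since T has no default.
        let mut data: [T; NEW_MAX_LEN] = [self.data[0]; NEW_MAX_LEN];
        for i in 0..NEW_MAX_LEN {
            let src = self.offset + (i as u64);
            // Guard against reading past the original backing array.
            if src < MAX_LEN as u64 {
                data[i] = self.data[src];
            }
        }
        Fragment { offset: 0, length: self.length, data }
    }
}

The copy loop runs NEW_MAX_LEN times rather than MAX_LEN times, which is the whole point of the narrowing.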
Now, let’s make sure that the inner loop is optimal:

let input: [u8; 10000] = "...";
let json_nodes: [Fragment<u8, 10000>; 20] = parse_json_string_array(input);
let short_str: Fragment<u8, 10> = json_nodes[0].focus();
short_str.each(|c: u8| {
    ...
});

The revised loop above will now iterate over just 10 elements.
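Why does the iteration count follow the type? A sketch of how the each method from part 1 could look for Fragment makes it visible (a non-capturing callback is assumed for simplicity): loops in Noir are unrolled at compile time, so the loop bound has to be the type-level MAX_LEN, and the runtime length can only gate which iterations do real work:

impl<T, MAX_LEN> Fragment<T, MAX_LEN> {
    fn each(self, f: fn(T) -> ()) {
        // Unrolled MAX_LEN times at compile time; `length` only decides
        // which of those iterations actually invoke the callback.
        for i in 0..MAX_LEN {
            if (i as u64) < self.length {
                f(self.data[self.offset + (i as u64)]);
            }
        }
    }
}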
The natural name for a structure that holds a reference to an array plus an offset and a length is a slice. Rust already uses slice for a type with very similar properties. However, the Noir Slice is actually closer to a list: it has cheap operations for adding elements at the beginning, at the end, and in the middle. So we would suggest renaming it to List, freeing up the name Slice for what we call Fragment here. Alternatively, Slice could be extended to match the functionality of Fragment.
Now, we can model a generalized JSON parser function:
global JSON_TYPE_STRING = ...;
global JSON_TYPE_INTEGER = ...;
global JSON_TYPE_ARRAY = ...;
...

struct JSONNode<T, MAX_LEN> {
    fragment: Fragment<T, MAX_LEN>,
    // `type` is a reserved word, so we call the field node_type.
    node_type: u64,
}

fn parse_json<MAX_INPUT_LEN, MAX_NODES>(
    input: Iterable<u8, MAX_INPUT_LEN>
) -> Iterable<JSONNode<u8, MAX_INPUT_LEN>, MAX_NODES> {
    ...
}
Which could be used in the following way:
let nodes = parse_json(...);
nodes.each(|node| {
    if node.node_type == JSON_TYPE_STRING {
        let small_str: Fragment<u8, 10> = node.fragment.focus();
        do_stringy_stuff(small_str);
    } else if node.node_type == JSON_TYPE_INTEGER {
        do_integery_stuff(node.fragment.into());
    } else if node.node_type == JSON_TYPE_ARRAY {
        let inner_nodes = parse_json(node.fragment);
        ...
    }
});
Again, a few things to notice here:
- Strings are narrowed with focus() before further processing, integers are converted with into() (sketched below), and arrays are handled by feeding their fragment back into parse_json recursively.
- The node type is only known at runtime, so the circuit pays for every branch, regardless of which one is actually taken.
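The into() conversion used for integer nodes is doing real work too. Here is a hypothetical sketch for ASCII decimal integers; real JSON numbers (signs, fractions, exponents) would need more cases:

impl<MAX_LEN> Fragment<u8, MAX_LEN> {
    fn into(self) -> u64 {
        let mut value: u64 = 0;
        for i in 0..MAX_LEN {
            if (i as u64) < self.length {
                let byte = self.data[self.offset + (i as u64)];
                // Integer nodes may contain only the digits '0'..'9'.
                assert((byte >= 0x30) & (byte <= 0x39));
                value = value * 10 + ((byte - 0x30) as u64);
            }
        }
        value
    }
}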
A typical circuit design might go something like this:
1. Take the raw encoded input and the expected decoded values as circuit inputs.
2. Parse the raw input into fragments.
3. Verify the cryptographic commitment (e.g., a hash) over the raw input.
4. Assert that each parsed field, after conversion, equals the expected value.
Let’s examine the pseudo-code below:
struct Decoded {
    field_1: u64,
    field_2: [u8; 32],
    ...
    field_n: bool,
    hash: [u8; HASH_LEN],
}

fn main(input: [u8; LEN], expected: Decoded) {
    // One fragment per field of Decoded, in declaration order;
    // N_FIELDS is the number of fields.
    let actual: BoundedVec<Fragment<u8, LEN>, N_FIELDS> = parse_encoded(input);
    assert(hash(input) == expected.hash);
    assert(actual.get(0).into() == expected.field_1);
    assert(actual.get(1).into() == expected.field_2);
    ...
    assert(actual.get(n).into() == expected.field_n);
}
Two things to point out here:
- The hash is computed over the raw input, so every parsed fragment, and therefore every compared field, is bound to the same commitment.
- The circuit never outputs the decoded data; it only asserts equality against the expected values, which is exactly the boolean-style check described at the beginning of this post.
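As a concrete end-to-end sketch, assume a toy format whose whole payload is a single ASCII integer, a hypothetical parse_integer helper returning a Fragment, and the sha256 digest from Noir’s standard library:

global LEN = 8;

// Hypothetical: returns a Fragment covering the ASCII digits in `input`.
// fn parse_integer(input: [u8; LEN]) -> Fragment<u8, LEN>

fn main(input: [u8; LEN], expected_value: u64, expected_hash: pub [u8; 32]) {
    // Bind the raw input to a public commitment.
    assert(std::sha256::digest(input) == expected_hash);
    // Parse, convert, and compare against the expected decoded value.
    let fragment = parse_integer(input);
    assert(fragment.into() == expected_value);
}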
There is still a lot to improve. In this example, we make many decisions during execution; the parser is generic and does not benefit from the specific structure of JSON; and recursive parsing requires multiple passes over the input. This can be mitigated by writing custom parsers for specific use cases, which can still preserve the principles proposed in this post.
Our ultimate goal is to develop serialization and deserialization that would allow parsing directly into Noir structures. This requires a more robust macro system in Noir, which we hope to see in the future.
Let us know your take on that on X.