How to prove that you are 21 and not reveal anything else

Tricks with Merkle Trees

Feb 4, 2022

Classic identity principles

On a sunny Monday morning, Alice decides to go to the store to buy some tequila. Naturally, as soon as she has picked the product and is about to pay, she needs to prove to the seller that she is at least 21 years old. The procedure is simple: Alice shows the seller her passport or ID card (hereinafter “identity”), the seller checks her date of birth, and if everything is in order, sells her the product.

Under the current system, the seller trusts the authority that issued Alice’s passport (let’s call it “Trust”) and, by extension, the date of birth stated in the passport. That information is enough to validate the purchase.

Now let’s look at the problem. Along with her date of birth and photo (the only information needed for the purchase), Alice has also involuntarily disclosed:

her name and surname;
her gender;
possibly, her home address;
other information that the seller doesn’t need to know.

Disclosing such unnecessary information may harm Alice. Imagine that the seller, now knowing her home address, crashes her Friday party or starts stalking her with declarations of love. Or the seller tells her employer that Alice bought several bottles of tequila on a Monday morning, raising suspicion that she is a habitual heavy drinker. Each of these scenarios is harmful to Alice, yet each could easily have been avoided had she disclosed only the data directly relevant to her purchase, that is, her date of birth and photo.

In this scenario, Alice simply has no choice but to disclose the additional data, which creates a dilemma. On the one hand, many countries legally require buyers of age-restricted products to show the seller a valid ID issued by that country’s authorities, because the seller must be able to confirm both the buyer’s identity and the authenticity and integrity of the document (including the issuing authority, i.e. Trust, and the presence of the required watermarks). On the other hand, Alice cannot hide the rest of the data on her ID so that the seller sees only her date of birth, so she inevitably reveals everything else on the document, even though none of it is relevant to the transaction.

Digital identity

Now let’s consider the same situation in the digital world. Suppose Alice wants to subscribe to the newsletter of a club for beautiful adults. To allow this, the site administrator must verify that Alice has reached the required legal age (say, 21). We also assume that Alice has a digital passport (let’s call it Alice’s digital identity) containing data about her. Unlike a physical passport, a digital identity (in the common sense) is just a byte array.

As in the physical world, the administrator must verify that the passport was issued by Trust. In the physical world this is confirmed by Trust’s signature or seal; in the digital world, the signature or seal is replaced by a digital signature that Trust computes over Alice’s digital identity using its own key pair.

Suppose first that Trust hashes the entire byte string, signs the hash with its private key, and passes the signature to Alice.

To prove to the club’s website administrator that she is 21, Alice sends all of her digital identity data along with Trust’s signature. The administrator checks Alice’s data and the validity of the signature (Trust’s public key is publicly known).

We are back to the same problem: Alice has revealed data that was not necessary. Moreover, if she provides only part of the identity data, the administrator cannot verify Trust’s signature, because the signature covers the entire byte string.
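To make the problem concrete, here is a minimal Python sketch of this naive scheme. It assumes a hypothetical JSON encoding of Alice’s identity and uses SHA-256; the signing step itself is left out, since only the hash matters for the argument. Any subset of the fields produces a different digest, so a signature over the full digest cannot be checked against partial data.

```python
import hashlib
import json

# Hypothetical identity record; field names and values are illustrative only.
identity = {
    "name": "Alice",
    "surname": "Black",
    "gender": "F",
    "address": "1 Example St",
    "date_of_birth": "1995-03-14",
}


def digest(record: dict) -> str:
    """Hash a canonical (sorted-key) JSON encoding of the record with SHA-256."""
    data = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# Trust would sign this digest with its private key (signing not shown).
signed_digest = digest(identity)

# Alice discloses only her date of birth...
partial = {"date_of_birth": identity["date_of_birth"]}

# ...but the verifier cannot reproduce the signed digest from it.
print(digest(partial) == signed_digest)   # False
print(digest(identity) == signed_digest)  # True
```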

Going deeper into privacy

Fortunately, the digital world offers quite a few mathematical tricks with very useful properties. Here we will use binary hash trees, proposed by Ralph C. Merkle in 1979 and known simply as Merkle trees*.

Now suppose that Trust does not sign the hash of Alice’s complete byte string, but instead splits the data into several logical components and uses them to build a Merkle tree. Alice’s name, surname and other attributes become the “leaves” of the tree.

Trust signs only the root of the tree (the Merkle Root) and returns the root to Alice along with the signature.
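The tree construction can be sketched in a few lines of Python. This is an illustration, not Trust’s actual procedure: the “key=value” leaf encoding, SHA-256, and the rule of pairing an odd node with a copy of itself are all assumptions made for the example.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256, used both for leaves and for internal nodes (an assumption)."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash every leaf, then hash adjacent pairs level by level until one node is left.
    Assumption: an odd node at the end of a level is paired with a copy of itself."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical leaf encoding: one "key=value" string per attribute.
attributes = [b"name=Alice", b"surname=Black", b"gender=F", b"date_of_birth=1995-03-14"]
root = merkle_root(attributes)  # the only value Trust signs
print(root.hex())
```

Duplicating the last node is only one of several conventions for unbalanced trees; a real system would need to fix one convention and use it consistently.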

How can Alice prove to the administrator that she is 21 using this scheme? She gives the administrator the Merkle Root together with Trust’s signature, the data that confirms her age (her date of birth), and a Merkle Branch proving that this data is part of the tree. The Merkle Branch values are shown in red in the image below.
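A Merkle Branch for one leaf is simply the list of sibling hashes on the path from that leaf up to the root. In this sketch (with an assumed “key=value” leaf encoding, SHA-256, and odd nodes paired with a copy of themselves), each entry also records whether the sibling sits on the left, because the order of concatenation matters when hashing. The function name `merkle_branch` is hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_branch(leaves: list[bytes], index: int) -> tuple[bytes, list[tuple[bytes, bool]]]:
    """Return (root, branch) for leaves[index].
    Each branch entry is (sibling_hash, sibling_is_left), ordered from the leaf upwards.
    Assumption: an odd node at the end of a level is paired with a copy of itself."""
    if not leaves or not 0 <= index < len(leaves):
        raise ValueError("index out of range")
    level = [h(leaf) for leaf in leaves]
    branch = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # neighbour that shares the same parent
        branch.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], branch


attributes = [b"name=Alice", b"surname=Black", b"gender=F", b"date_of_birth=1995-03-14"]
root, branch = merkle_branch(attributes, 3)  # Alice discloses only her date of birth
print(len(branch))  # 2
```

Alice sends `attributes[3]`, `branch` and the signed root; the other three leaves stay with her.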

The administrator can now verify Alice’s age in three stages:

First, the administrator subtracts Alice’s date of birth from the current date; if the result is at least 21 years, the first stage of validation is passed.
Next, the administrator hashes Alice’s date of birth and then, together with the Merkle Branch values, repeatedly concatenates and hashes pairs of values until a single hash value (the Merkle Root) remains. If this value matches the Merkle Root provided by Alice, the second stage is passed.
Finally, the administrator verifies Trust’s signature on the Merkle Root. If it is valid, the validation is complete and Alice can receive the service.
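The first two stages can be sketched as follows. The sketch assumes the leaf is encoded as `date_of_birth=YYYY-MM-DD`, that leaves and nodes are hashed with SHA-256, and that the branch is a list of `(sibling_hash, sibling_is_left)` pairs; all of these, and the helper names, are illustrative choices. Stage 3 is only a comment, because checking Trust’s signature requires an asymmetric signature library and Trust’s actual public key.

```python
import hashlib
from datetime import date


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def age_in_years(birth: date, today: date) -> int:
    """Full years elapsed, taking into account whether this year's birthday has passed."""
    years = today.year - birth.year
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


def root_from_branch(leaf: bytes, branch: list[tuple[bytes, bool]]) -> bytes:
    """Fold the leaf hash with each sibling, keeping the left/right order."""
    node = h(leaf)
    for sibling, sibling_is_left in branch:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node


def verify_age_claim(leaf: bytes, branch, claimed_root: bytes, today: date, min_age: int = 21) -> bool:
    # Stage 1: parse the disclosed date of birth and check the age.
    birth = date.fromisoformat(leaf.decode("utf-8").split("=", 1)[1])
    if age_in_years(birth, today) < min_age:
        return False
    # Stage 2: the disclosed leaf must hash up to the root Alice presented.
    if root_from_branch(leaf, branch) != claimed_root:
        return False
    # Stage 3 (not shown): verify Trust's signature over claimed_root
    # with Trust's public key, using a real signature library.
    return True


# Hypothetical 4-leaf tree; Alice discloses leaf 3 and its branch.
leaves = [b"name=Alice", b"surname=Black", b"gender=F", b"date_of_birth=1995-03-14"]
hashes = [h(x) for x in leaves]
root = h(h(hashes[0] + hashes[1]) + h(hashes[2] + hashes[3]))
branch = [(hashes[2], True), (h(hashes[0] + hashes[1]), True)]
print(verify_age_claim(leaves[3], branch, root, today=date(2022, 2, 4)))  # True
```

Counting full years, rather than dividing a number of days by 365, avoids off-by-one errors around birthdays and leap years.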

There is, however, a problem with this scheme: when a hashed value comes from a small set of possibilities, its preimage is easy to find by enumeration. Consider a simplified example in which Alice’s identity contains only two components, her last name and her date of birth. When Trust signs Alice’s identity, it signs the root of a tree with two leaves: H01, the hash of her last name, and H02, the hash of her date of birth.

If Alice wants to prove that her last name is Black, she gives the verifier her last name, the Merkle Root with Trust’s signature, and the Merkle Branch, which here consists of the single value H02. Verification works exactly as in the example above.

However, a verifier holding H02 can try to recover the value behind it by enumeration: for each possible date of birth, it computes the hash and compares it with H02 until they match. Since there are only a few tens of thousands of plausible birth dates, this takes just a few seconds.
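The attack is easy to demonstrate. This sketch assumes the date-of-birth leaf is just the ISO date string hashed with SHA-256, with no salt; it walks through every date in a range of roughly one hundred years, about 36,500 candidates, until the hash matches H02. The function name `brute_force_dob` is hypothetical.

```python
import hashlib
from datetime import date, timedelta


def leaf_hash(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Assumed encoding of the unsalted date-of-birth leaf.
secret_dob = "1995-03-14"
H02 = leaf_hash(secret_dob)


def brute_force_dob(target: str, start: date, end: date):
    """Try every calendar date in [start, end] until its hash equals target."""
    day = start
    while day <= end:
        if leaf_hash(day.isoformat()) == target:
            return day.isoformat()
        day += timedelta(days=1)
    return None


print(brute_force_dob(H02, date(1922, 1, 1), date(2022, 12, 31)))  # 1995-03-14
```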

Humanity solved this problem long ago, when application servers began storing salted hashes of user passwords instead of the passwords themselves. The same approach applies here: when Alice’s identity is formed, a randomly generated salt is added to each data block before hashing. Trust then signs the root of the tree built from these salted leaves.
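Salting a leaf can be sketched as hashing a random salt concatenated with the value. The 32-byte salt length and the `salt || value` order are assumptions; because the salt has a fixed length, the concatenation is unambiguous. Without the salt, the verifier has nothing to enumerate against: hashing a candidate date on its own no longer reproduces the leaf.

```python
import hashlib
import secrets


def salted_leaf_hash(salt: bytes, value: bytes) -> bytes:
    """Hash salt || value. Alice keeps the salt and reveals it only together
    with the value it protects."""
    return hashlib.sha256(salt + value).digest()


value = b"1995-03-14"
salt = secrets.token_bytes(32)  # fresh 256-bit random salt for this leaf only
leaf = salted_leaf_hash(salt, value)

# Hashing a guessed date without the salt does not reproduce the leaf,
# so enumerating birth dates is useless to the verifier.
print(hashlib.sha256(value).digest() == leaf)  # False
```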

Now, when proving her identity, Alice must also provide the salt used for each data block she discloses. The verifier, however, still learns only the information that Alice chooses to reveal.
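With a disclosed salt, only the first verification step changes: the verifier hashes `salt || value` instead of the bare value, then folds in the Merkle Branch as before. The sketch below fixes the tree at four leaves to stay short and uses SHA-256 and 32-byte salts; the helper names `build_salted_tree` and `verify_disclosure` are hypothetical.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_salted_tree(values: list[bytes]):
    """Return (salts, leaf_hashes, root) for exactly four values (kept small on purpose)."""
    if len(values) != 4:
        raise ValueError("this sketch handles exactly four leaves")
    salts = [secrets.token_bytes(32) for _ in values]
    leaves = [h(s + v) for s, v in zip(salts, values)]
    root = h(h(leaves[0] + leaves[1]) + h(leaves[2] + leaves[3]))
    return salts, leaves, root


def verify_disclosure(salt: bytes, value: bytes, branch, root: bytes) -> bool:
    """Recompute the root from one disclosed (salt, value) pair and its branch."""
    node = h(salt + value)
    for sibling, sibling_is_left in branch:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Hypothetical attributes: name, surname, gender, date of birth.
values = [b"Alice", b"Black", b"F", b"1995-03-14"]
salts, leaves, root = build_salted_tree(values)  # Trust signs `root`

# Alice reveals only leaf 3: its value, its salt and the two sibling hashes.
branch = [(leaves[2], True), (h(leaves[0] + leaves[1]), True)]
print(verify_disclosure(salts[3], values[3], branch, root))  # True
```

The sibling hashes in the branch are themselves salted, so the verifier cannot enumerate the hidden attributes behind them.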

A verifier may also require several items of personal data, for example date of birth, full name and home address. If Alice sends a separate Merkle Branch for each item, some node values of the Merkle tree may repeat across the branches, which is redundant and increases the total proof size. To avoid this, a scheme for authenticating multiple leaves at once, such as Octopus Authentication, can be applied to the Merkle tree. It builds one combined proof for several leaves in which each shared authentication node is sent only once and nodes that the verifier can compute from the disclosed leaves are omitted, which reduces the signature size. This authentication scheme is part of the Gravity-SPHINCS hash-based signature scheme, a heavily optimized version of the original SPHINCS scheme*. Below is a graphic example of Octopus Authentication applied to a Merkle tree.
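The saving can be illustrated by computing which nodes actually have to be sent when several leaves are disclosed at once. This is a simplified sketch of the idea behind Octopus Authentication, not the Gravity-SPHINCS implementation: it assumes a complete tree with `2**depth` leaves and identifies nodes by `(level, position)`, with level 0 at the leaves. A sibling is sent only if the verifier cannot compute it from the disclosed leaves or from nodes it has already derived. The function name `auth_nodes` is hypothetical.

```python
def auth_nodes(indices: set[int], depth: int) -> list[tuple[int, int]]:
    """Return the (level, position) of every node that must be sent so a verifier
    can recompute the root from the disclosed leaves of a tree with 2**depth leaves.
    Level 0 is the leaf level."""
    known = set(indices)  # positions the verifier can compute at the current level
    needed = []
    for level in range(depth):
        for pos in sorted(known):
            sibling = pos ^ 1  # the other child of the same parent
            if sibling not in known:
                needed.append((level, sibling))
        known = {pos // 2 for pos in known}  # parents become computable
    return needed


disclosed = {0, 1, 4}  # e.g. three attribute leaves in an 8-leaf tree
depth = 3
print(auth_nodes(disclosed, depth))  # [(0, 5), (1, 1), (1, 3)]
print(len(disclosed) * depth)        # 9 hashes if each branch were sent separately
```

For these three leaves, separate branches would carry 9 hashes, while the combined proof needs only 3.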

Role in digital identity infrastructure

The goal of a digital identity infrastructure is to replace the single trusted authority (Trust) with a decentralized infrastructure, which, in our opinion, rests on the following principles:

there are no centralized identity providers; instead, every participant in the digital identity infrastructure may be both an identity provider and a user whose data is verified;
multiple identity providers verify the identities and attributes of users;
each identity provider is responsible for the proof of identification it confirms in any identity (data verification) procedure;
all events related to any digital identity can be publicly audited by anyone, while user data itself is kept confidential in a separate secure database whose contents are distributed among independent providers;
all events involving digital identities are time-stamped and recorded in a fixed order, so that their integrity and authenticity are also publicly verifiable;
the more identity providers independently confirm an identity and its time stamp, the better.

In other words, Alice herself is responsible for forming her own identity and all the attributes associated with it. The task of identity providers is only to verify these attributes independently.

At the same time, Alice must be able to give each provider only the attributes she considers necessary. The challenge, however, is to confirm that each of these attributes individually, and all of them together, belong to the same identity.

The advantage of the proposed scheme is that Alice now creates the tree of attributes herself and sends identity providers the root value and the Merkle Branches for the attributes to be verified, without having to provide any data that is unnecessary for a particular transaction.

Identity providers, in turn, issue confirmations for sets of attributes that belong to a single Merkle Root. An identity confirmation transaction should include:

provider’s identifier;
the set of data that has to be confirmed;
expiration date;
provider’s signature.
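A confirmation record with the fields listed above might look like the following sketch. The field names, the JSON encoding of the signed payload and the `same_identity` helper are hypothetical; the sketch also records the Merkle Root the confirmation refers to, since that is what ties separate confirmations to one identity. Producing and checking `signature` is left to a real signature scheme and is not shown.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Confirmation:
    provider_id: str         # provider's identifier
    merkle_root: str         # hex Merkle Root of Alice's attribute tree
    attributes: tuple        # names of the confirmed attributes
    expires: date            # expiration date
    signature: bytes = b""   # provider's signature over signing_payload() (not computed here)

    def signing_payload(self) -> bytes:
        """Canonical bytes the provider would sign; the signature itself is excluded."""
        body = {
            "provider_id": self.provider_id,
            "merkle_root": self.merkle_root,
            "attributes": sorted(self.attributes),
            "expires": self.expires.isoformat(),
        }
        return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")

    def is_current(self, today: date) -> bool:
        return today <= self.expires


def same_identity(confirmations) -> bool:
    """True if every confirmation refers to one and the same Merkle Root."""
    return len({c.merkle_root for c in confirmations}) == 1


# Hypothetical example: two providers confirm different attributes of one tree.
root_hex = "ab" * 32
tax = Confirmation("tax-office", root_hex, ("name", "surname"), date(2023, 1, 1))
uni = Confirmation("university", root_hex, ("degree",), date(2030, 1, 1))
print(same_identity([tax, uni]))         # True
print(tax.is_current(date(2022, 2, 4)))  # True
```

Sorting the attribute names and keys makes the signed bytes independent of the order in which the attributes were listed.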

In practice, such confirmations can come from providers that already carry weight in society. For example, Alice can obtain confirmation of her personal data from the tax office, of her higher education from her university, and of her health status from a clinic. Then, when applying for a new job, Alice can share just the necessary data, and the recipient (e.g., the employer) can check that these attributes of her identity were confirmed by the responsible parties.

This scheme allows Alice to provide only the data needed for confirmation or verification, without disclosing her remaining attributes. Moreover, all confirmations are tied to a single entity that connects the attributes (the Merkle Root), and checking mathematically that a given attribute belongs to this structure is straightforward.

Pavel Kravchenko: pk@distributedlab.com
Oleksandr Kurbatov: ok@distributedlab.com

Written by Distributed Lab Academy