Understanding Key Data Structures in eMRTDs
This document outlines the core data elements and structures involved in the security mechanisms of electronic Machine Readable Travel Documents (eMRTDs), such as ePassports. It details the journey from the physically printed data to the digitally signed components on the chip.
1. Machine Readable Zone (MRZ)
- Definition: The MRZ is the set of lines printed in a machine-readable font on the data page of a travel document. It's designed for optical character recognition (OCR) by document readers.
- Formats (e.g., TD3):
  - ICAO specifies different MRZ formats for various document types. A common one for passports is `TD3`.
  - A `TD3` MRZ consists of two lines, each 44 characters long, totaling 88 characters. These characters encode key biographic and document information.
  - Other formats like `TD1` (e.g., ID cards, 3 lines of 30 characters) and `TD2` (other card-format documents, 2 lines of 36 characters) also exist.
- Structure: Composed of uppercase alphanumeric characters (A-Z, 0-9) and the filler character `<`. The content and position of data elements within these lines are strictly defined by ICAO Doc 9303.
- Purpose: Provides a standardized way for automated systems to quickly capture essential holder and document information.
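Because every field sits at a fixed position, parsing is just slicing. The Python sketch below splits an 88-character TD3 MRZ into its two 44-character lines and extracts a subset of fields by offset, using the TD3 example from the glossary. The helper names and the field subset are illustrative, and check digits are deliberately not validated.

```python
# Characters allowed in an MRZ: A-Z, 0-9 and the filler '<'.
MRZ_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789<")

# Positions (0-based, end-exclusive) of a subset of TD3 fields.
TD3_LINE1_FIELDS = {"document_code": (0, 2), "issuing_state": (2, 5), "name": (5, 44)}
TD3_LINE2_FIELDS = {
    "document_number": (0, 9),
    "nationality": (10, 13),
    "birth_date": (13, 19),   # YYMMDD
    "sex": (20, 21),
    "expiry_date": (21, 27),  # YYMMDD
    "optional_data": (28, 42),
}


def split_td3(mrz: str) -> tuple[str, str]:
    """Split an 88-character TD3 MRZ into its two 44-character lines."""
    if len(mrz) != 88:
        raise ValueError(f"TD3 MRZ must have 88 characters, got {len(mrz)}")
    if not set(mrz) <= MRZ_ALPHABET:
        raise ValueError("MRZ contains characters outside A-Z, 0-9 and '<'")
    return mrz[:44], mrz[44:]


def parse_td3(mrz: str) -> dict[str, str]:
    """Slice fields purely by position; check digits are not validated here."""
    line1, line2 = split_td3(mrz)
    fields = {name: line1[a:b] for name, (a, b) in TD3_LINE1_FIELDS.items()}
    fields.update({name: line2[a:b] for name, (a, b) in TD3_LINE2_FIELDS.items()})
    return fields


mrz = "P<GBRBAGGINS<<FRODO".ljust(44, "<") + "P231458901GBR6709224M2209151ZE184226B<<<<<18"
print(parse_td3(mrz)["birth_date"])  # 670922
```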
2. Data Group 1 (DG1)
- Definition: `DG1` is an elementary file stored on the eMRTD's integrated circuit (chip). It is the electronic representation of the data contained in the physical MRZ.
- Relationship to MRZ: The content of `DG1` directly mirrors the data from the MRZ. For a document with a `TD3` MRZ, `DG1` will store those 88 characters.
- Structure:
  - The core data within `DG1` is typically an ASN.1 `IA5String` containing the concatenated lines of the MRZ.
  - The actual `EF.DG1` file on the chip is ASN.1 encoded with two nested tag-length-value layers: the MRZ string is wrapped in the MRZ data element (tag `5F1F`), which is in turn wrapped in the `DG1` template (tag `61` hex). For an 88-character `TD3` MRZ this gives 1 + 1 + 2 + 1 + 88 = 93 bytes on the chip: `61 5B 5F 1F 58 <88 bytes of MRZ data>`, where `5B` (91) and `58` (88) are the two length fields.
- Purpose: Allows electronic access to the MRZ information, forming a foundational data element for security checks performed by inspection systems.
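The 93-byte arithmetic can be reproduced with a small Python sketch that builds `EF.DG1` from the glossary's TD3 example, using the two tags described above and DER length encoding. The function names are illustrative.

```python
def der_length(n: int) -> bytes:
    """DER length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_dg1(mrz: str) -> bytes:
    """Wrap the MRZ in data element 5F1F, then in the DG1 template 61."""
    mrz_bytes = mrz.encode("ascii")
    mrz_element = b"\x5f\x1f" + der_length(len(mrz_bytes)) + mrz_bytes
    return b"\x61" + der_length(len(mrz_element)) + mrz_element


mrz = "P<GBRBAGGINS<<FRODO".ljust(44, "<") + "P231458901GBR6709224M2209151ZE184226B<<<<<18"
dg1 = encode_dg1(mrz)
print(len(dg1), dg1[:5].hex(" "))  # 93 61 5b 5f 1f 58
```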
3. LDS Security Object (SOd or LDSSecurityObject)
- Definition: The `SOd` (Document Security Object), more formally referred to as `LDSSecurityObject` in some ASN.1 definitions, is a critical data structure stored on the eMRTD chip. Its primary function is to ensure the integrity of the various Data Groups (like `DG1`, `DG2` for the photo, etc.).
- Relationship to DG1 (and other DGs): The `SOd` contains cryptographic hashes (digital fingerprints) of specified Data Groups. For instance, it will include a hash calculated over the entire content of `DG1`.
- Structure (Conceptual ASN.1): The `LDSSecurityObject` is an ASN.1 `SEQUENCE` typically containing:
  - `version`: An integer indicating the version of the `SOd` structure (e.g., 0).
  - `hashAlgorithm`: An `AlgorithmIdentifier` (an OID plus optional parameters) specifying the cryptographic hash algorithm (e.g., SHA-256, SHA-384) used to compute the digests of the Data Groups.
  - `dataGroupHashValues`: A `SEQUENCE OF` structures, often named `DataGroupHash` or `DatagroupDigest`. Each of these structures contains:
    - `dataGroupNumber`: An integer identifying the Data Group (e.g., 1 for `DG1`, 2 for `DG2`).
    - `dataGroupHashValue`: An `OCTET STRING` holding the cryptographic hash (digest) of the content of the specified Data Group.
- Encoding: The entire `LDSSecurityObject` structure is DER (Distinguished Encoding Rules) encoded before being processed further or stored.
- Purpose: To enable verification that the Data Groups stored on the chip have not been altered since the `SOd` was created and signed.
- Important remark: `DG1` always occupies the first entry of `dataGroupHashValues`, so the length of the prefix before its hash is deterministic. The total size depends on the digest size and on how many data groups are present.
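To illustrate the deterministic prefix, here is a Python sketch that DER-encodes a minimal `LDSSecurityObject` and locates DG1's hash. Assumptions: version 0, no `ldsVersionInfo`, and `AlgorithmIdentifier` parameters omitted (documents that encode an explicit NULL shift every later offset by two bytes). Helper names are illustrative and the Data Group contents are placeholders.

```python
import hashlib


def der_length(n: int) -> bytes:
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + der_length(len(value)) + value


def der_int(v: int) -> bytes:
    """Non-negative INTEGER; the extra byte keeps the sign bit clear."""
    return tlv(0x02, v.to_bytes(v.bit_length() // 8 + 1, "big"))


def der_oid(dotted: str) -> bytes:
    arcs = [int(a) for a in dotted.split(".")]
    body = bytearray()
    for value in [40 * arcs[0] + arcs[1], *arcs[2:]]:
        chunk = [value & 0x7F]  # base-128, high bit marks continuation
        value >>= 7
        while value:
            chunk.append(0x80 | (value & 0x7F))
            value >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


SHA_OIDS = {
    "sha224": "2.16.840.1.101.3.4.2.4",
    "sha256": "2.16.840.1.101.3.4.2.1",
    "sha384": "2.16.840.1.101.3.4.2.2",
    "sha512": "2.16.840.1.101.3.4.2.3",
}


def encode_lds(data_groups: dict[int, bytes], hash_name: str = "sha256") -> bytes:
    """Minimal LDSSecurityObject: version 0, entries sorted by DG number."""
    hash_algorithm = tlv(0x30, der_oid(SHA_OIDS[hash_name]))  # no parameters
    entries = b"".join(
        tlv(0x30, der_int(number) + tlv(0x04, hashlib.new(hash_name, content).digest()))
        for number, content in sorted(data_groups.items())
    )
    return tlv(0x30, der_int(0) + hash_algorithm + tlv(0x30, entries))


lds = encode_lds({1: b"<DG1 bytes>", 2: b"<DG2 bytes>"})
print(lds.index(hashlib.sha256(b"<DG1 bytes>").digest()))  # 27
```

In this sketch the offset does not depend on the Data Group contents, only on the hash algorithm and on whether the enclosing lengths need DER's long form, which happens as more Data Groups are added.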
4. SignedAttributes
- Definition: `SignedAttributes` is a specific ASN.1 structure defined within the Cryptographic Message Syntax (CMS, RFC 5652). In the context of eMRTDs, it is a collection of attributes that are DER-encoded and then become the actual data that is digitally signed by the issuing authority, using the private key that corresponds to its Document Signer Certificate (DSC).
- Relationship to `SOd`: The `SignedAttributes` structure includes a cryptographic hash of the entire DER-encoded `SOd`.
- Structure (Conceptual ASN.1, within CMS `SignerInfo`): `SignedAttributes ::= SET SIZE (1..MAX) OF Attribute`
  - Each `Attribute` is a `SEQUENCE` of:
    - `attrType OBJECT IDENTIFIER`: Identifies the type of attribute.
    - `attrValues SET OF AttributeValue`: Contains the value(s) for that attribute.
  - For eMRTD `SOd` signing, critical `SignedAttributes` include:
    - `contentType` (OID: `1.2.840.113549.1.9.3`): The `AttributeValue` for this is an OID. For an eMRTD `SOd`, this value is `id-icao-mrtd-security-ldsSecurityObject` (OID: `2.23.136.1.1.1`), indicating that the signed content's "type" is an ICAO `SOd`.
    - `messageDigest` (OID: `1.2.840.113549.1.9.4`): The `AttributeValue` for this is an `OCTET STRING` containing the cryptographic hash of the entire DER-encoded `SOd`.
    - (Often) `signingCertificateV2` (`id-aa-signingCertificateV2`, OID: `1.2.840.113549.1.9.16.2.47`): An attribute defined in RFC 5035 whose value (a `SigningCertificateV2` structure holding `ESSCertIDv2` entries) contains a hash of the Document Signer Certificate, linking the signature back to the specific certificate.
- Encoding: The collection of these attributes is itself DER-encoded to form the octet string that is input to the digital signature algorithm.
- Purpose:
  - To ensure that the digital signature applies to a specific, well-defined set of information, including the hash of the `SOd` and the type of content.
  - To prevent replay attacks or misinterpretation of the signature's scope. The signature on `SignedAttributes` cryptographically binds these attributes to the `SOd`.
- Important remark: The size of `SignedAttributes` is determined by the digest algorithm size and the algorithm's encoded `OID` length.
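A minimal Python sketch of the bytes that get signed, containing only `contentType` and `messageDigest` (`signingCertificateV2` and any other attributes are omitted; helper names are illustrative). One detail worth showing: RFC 5652 section 5.4 computes the signature over the DER encoding with the `SET OF` tag `0x31`, even though the same bytes are stored inside `SignerInfo` under the `[0] IMPLICIT` tag `0xA0`.

```python
import hashlib


def der_length(n: int) -> bytes:
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + der_length(len(value)) + value


def der_oid(dotted: str) -> bytes:
    arcs = [int(a) for a in dotted.split(".")]
    body = bytearray()
    for value in [40 * arcs[0] + arcs[1], *arcs[2:]]:
        chunk = [value & 0x7F]
        value >>= 7
        while value:
            chunk.append(0x80 | (value & 0x7F))
            value >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


CONTENT_TYPE = "1.2.840.113549.1.9.3"
MESSAGE_DIGEST = "1.2.840.113549.1.9.4"
LDS_SECURITY_OBJECT = "2.23.136.1.1.1"  # id-icao-mrtd-security-ldsSecurityObject


def attribute(oid: str, value: bytes) -> bytes:
    """Attribute ::= SEQUENCE { attrType OID, attrValues SET OF AttributeValue }"""
    return tlv(0x30, der_oid(oid) + tlv(0x31, value))


def encode_signed_attrs(lds: bytes, hash_name: str = "sha256") -> bytes:
    """The DER bytes that are hashed and signed (outer tag 0x31)."""
    attrs = [
        attribute(CONTENT_TYPE, der_oid(LDS_SECURITY_OBJECT)),
        attribute(MESSAGE_DIGEST, tlv(0x04, hashlib.new(hash_name, lds).digest())),
    ]
    # DER sorts SET OF elements by their encodings; byte order suffices here.
    return tlv(0x31, b"".join(sorted(attrs)))


signed_attrs = encode_signed_attrs(b"<DER-encoded LDSSecurityObject>")
stored_form = b"\xa0" + signed_attrs[1:]  # as it appears inside SignerInfo
print(len(signed_attrs))  # 74
```

With SHA-256 this minimal structure is 74 bytes and with SHA-512 it is 106 bytes, with the digest starting at byte 42 in both; extra attributes such as `signingCertificateV2` can change both the sort order and the offsets.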
Summary of Relationships
The data elements are processed and related in a layered manner for security:
- MRZ (e.g., TD3): Printed on the physical document.
- DG1: Electronic copy of the MRZ data, stored on the chip.
- `SOd` (`LDSSecurityObject`): Contains a list of Data Group numbers and their corresponding cryptographic hashes (e.g., hash of DG1, hash of DG2).
- `SignedAttributes`: Contains a hash of the entire `SOd` (after DER encoding), along with other metadata like `contentType`.
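The two hash links in this chain can be checked with a few lines of Python. This sketch (function name hypothetical) assumes the same SHA variant for both links and deliberately skips the signature over `SignedAttributes`; it also uses a simple containment test, whereas a real verifier parses the DER or compares at a known offset.

```python
import hashlib


def verify_hash_chain(dg1: bytes, lds: bytes, signed_attrs: bytes,
                      hash_name: str = "sha256") -> bool:
    """DG1 -> SOd -> SignedAttributes linkage; the signature is not checked."""
    dg1_digest = hashlib.new(hash_name, dg1).digest()
    lds_digest = hashlib.new(hash_name, lds).digest()
    # Containment is a simplification of "at the expected position".
    return dg1_digest in lds and lds_digest in signed_attrs
```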
DG1 → LDS
We first SHA-digest DG1, so that later steps can prove that this digest sits at the expected position in LDS.
This step solely digests the DG1.
Implicit Variance
There needs to be a separate circuit for each (DG1 variant, SHA variant) pair.
| | TD1 | TD2 | TD3 |
|---|---|---|---|
| sha-224 | |||
| sha-256 | ✔️ | ||
| sha-384 | |||
| sha-512 | ✔️ |
Private Inputs
| Input | Type |
|---|---|
| DG1 | Bytes |
Public Output
`carry` is the Poseidon-digest of the SHA-digest of DG1.
`right` is `DigestState.init` with `carry` included.
| Left | Right |
|---|---|
| | Poseidon-digest of DigestState+carry |
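A data-flow sketch of this first step in Python. Poseidon is not in the standard library, so a length-prefixed BLAKE2s hash stands in for it. The `DigestState` fields mirror the Digest State Data Type table, but their byte serialization, the all-zero initial commitment, and `dg1_step` are hypothetical; in the circuits these values are field elements, not byte strings.

```python
import hashlib
from dataclasses import dataclass


def poseidon_standin(*parts: bytes) -> bytes:
    """Hypothetical stand-in for Poseidon (not in the Python stdlib)."""
    h = hashlib.blake2s()
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))  # length prefix avoids ambiguity
        h.update(part)
    return h.digest()


# SHA-256 initial hash value from FIPS 180-4: the state before any block.
SHA256_IV = bytes.fromhex(
    "6a09e667bb67ae853c6ef372a54ff53a510e527f9b05688c1f83d9ab5be0cd19"
)


@dataclass(frozen=True)
class DigestState:
    length: int        # payload bytes absorbed so far
    state: bytes       # SHA chaining value
    commitment: bytes  # running commitment to the absorbed chunks
    carry: bytes       # value forwarded from the previous pipeline step

    @classmethod
    def init(cls, carry: bytes) -> "DigestState":
        return cls(0, SHA256_IV, bytes(32), carry)

    def serialize(self) -> bytes:
        return (self.length.to_bytes(8, "big") + self.state
                + self.commitment + self.carry)


def dg1_step(dg1: bytes) -> tuple[DigestState, bytes]:
    """Returns the fresh DigestState and the step's public `right` output."""
    carry = poseidon_standin(hashlib.sha256(dg1).digest())
    state = DigestState.init(carry)
    return state, poseidon_standin(state.serialize())
```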
LDS → SignedAttrs
This step needs to do/check the following:
- The `DG1` SHA-digest is indeed at the correct place in `LDS`.
- `LDS` is SHA-digested, and the digest is propagated to the next step.
Since LDS can be too large to SHA-digest in a single circuit, we use our multi-circuit digest to SHA-digest it.
We prepare:
- 0 or more update circuits with full blocks
- 1 finalize circuit with a dynamically-sized block
- The actual `LDS` validation circuit described below.
The carry of the digest is the Poseidon-digest of the SHA-digest of DG1. It is checked against the previous step's `out.right` by Poseidon-digesting the SHA-digest of DG1 given as input.
Implicit Variance
There needs to be a separate circuit for each SHA variant.
| SHA variant | |
|---|---|
| sha-224 | |
| sha-256 | ✔️ |
| sha-384 | |
| sha-512 | ✔️ |
Private Inputs
| Input | Type |
|---|---|
| DG1 SHA-digest | Bytes |
| DigestState | DigestState |
| LDS | DynamicBytes |
Public Output
| Left | Right |
|---|---|
| Poseidon-digest of the last DigestState | Poseidon-digest of SHA-digest of LDS |
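The checks of this step, sketched in Python under the same caveats: a BLAKE2s stand-in replaces Poseidon, `lds_step` is a hypothetical name, and SHA over LDS is done in one call here rather than across update and finalize circuits. The key detail is that the DG1 digest is compared at a fixed offset, which is possible because that offset is deterministic.

```python
import hashlib


def poseidon_standin(*parts: bytes) -> bytes:
    """Hypothetical stand-in for Poseidon (not in the Python stdlib)."""
    h = hashlib.blake2s()
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


def lds_step(dg1_sha: bytes, carry: bytes, lds: bytes, dg1_offset: int,
             hash_name: str = "sha256") -> bytes:
    """Checks of the LDS step; returns the `right` output for the next step."""
    # 1. Bind to the previous step: the carry must commit to this DG1 digest.
    if poseidon_standin(dg1_sha) != carry:
        raise ValueError("carry does not commit to the given DG1 digest")
    # 2. The digest must sit at the fixed offset, not merely somewhere in LDS.
    if lds[dg1_offset:dg1_offset + len(dg1_sha)] != dg1_sha:
        raise ValueError("DG1 digest is not at the expected offset in LDS")
    # 3. SHA-digest LDS (split over several circuits in the real pipeline).
    return poseidon_standin(hashlib.new(hash_name, lds).digest())
```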
SHA-digesting SignedAttrs
We prove that the SHA-digest of LDS is at the correct place in the SignedAttrs payload.
We then digest SignedAttrs for signature verification in the next step.
Implicit Variance
The digest algorithm of the DG1 → LDS and LDS → SignedAttrs must be the same SHA size. However, the signing of the SignedAttrs may use a different SHA size.
Therefore there needs to be a separate circuit for each pair of SHA variants.
| lds \ sign | sha-224 | sha-256 | sha-384 | sha-512 |
|---|---|---|---|---|
| sha-224 | ||||
| sha-256 | ✔️ | |||
| sha-384 | ||||
| sha-512 | ✔️ | ✔️ |
The first SHA size (the one used for LDS) determines the size of SignedAttrs, and therefore also the offset of the SHA-digest of LDS within it.
Private Inputs
| Input | Type |
|---|---|
| LDS digest | Bytes |
| SignedAttrs | Bytes |
Public Output
| Left | Right |
|---|---|
| Poseidon-digest of SHA-digest of LDS | Poseidon-digest of SHA-digest of SignedAttrs |
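A Python sketch of this step's two hash parameters (hypothetical names, BLAKE2s standing in for Poseidon). The width of the compared slice comes from the LDS SHA variant, while `sign_hash` is chosen independently, which is why a circuit is needed per pair of variants. The offset is passed in because it depends on the LDS SHA size.

```python
import hashlib


def poseidon_standin(*parts: bytes) -> bytes:
    """Hypothetical stand-in for Poseidon (not in the Python stdlib)."""
    h = hashlib.blake2s()
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


def signed_attrs_step(lds_digest: bytes, signed_attrs: bytes, offset: int,
                      sign_hash: str = "sha256") -> tuple[bytes, bytes]:
    """Returns hypothetical (left, right) public outputs of this step."""
    # First SHA variant: its digest size fixes the width of the compared slice.
    if signed_attrs[offset:offset + len(lds_digest)] != lds_digest:
        raise ValueError("LDS digest is not at the expected offset")
    left = poseidon_standin(lds_digest)
    # Second SHA variant: whatever the signature scheme uses.
    right = poseidon_standin(hashlib.new(sign_hash, signed_attrs).digest())
    return left, right
```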
Verifying Local Signature
Public Output
| Left | Right |
|---|---|
| SHA-digest of SignedAttrs | Poseidon-digest of Pubkey of local certificate in serial form |
Implicit Variance
The first dimension of variance stems from the SHA variant that needs to be used on the left.
The second dimension is the signature algorithm itself. For example, RSA with a 2048-bit key needs one exponentiation proof and one verification proof that implements one of the two supported padding schemes.
The signature verification stage can consist of one or more proofs, depending on the document requirements.
Verify Local - secp256r1
All four SHA variants from the left stage fit this design, and signature verification takes a single proof. The Poseidon-digest of the uncompressed pubkey is on the right.
See the Elliptic-Curve Verification section under Utilities for an explanation of how we handle EC.
| SHA variant | |
|---|---|
| sha-224 | |
| sha-256 | ✔️ |
| sha-384 | |
| sha-512 |
Other SHA variants are trivial to add.
RSA Verification - 2048 @ Local
See the RSA Verification section under Utilities for an explanation of how we handle RSA in general.
Exponentiation Circuit
Private Inputs
| Input | Type |
|---|---|
| modulus | ProvableBigint2048 |
| signature | ProvableBigint2048 |
| exponent | Field |
| Poseidon-digest of SHA-digest of message | Field |
Exp2048_Output
| Fields | Type |
|---|---|
| result | ProvableBigint2048 |
| modulus | ProvableBigint2048 |
| signature | ProvableBigint2048 |
| exponent | Field |
| Poseidon-digest of SHA-digest of message | Field |
Public Output
| Left | Right |
|---|---|
| Poseidon-digest of SHA-digest of message | Poseidon-digest of Exp2048_Output |
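The core computation this circuit proves is `signature^exponent mod modulus`. As a plain-integer illustration (not the custom BigInt, whose limb layout is not shown here), a left-to-right square-and-multiply sketch that also counts operations. The function name is illustrative.

```python
def modexp(base: int, exponent: int, modulus: int) -> tuple[int, int, int]:
    """Left-to-right square-and-multiply.

    Returns (base**exponent mod modulus, number of squarings,
    number of multiplications by the base).
    """
    if exponent < 1:
        raise ValueError("exponent must be positive")
    bits = bin(exponent)[2:]
    result = base % modulus  # consumes the leading 1 bit
    squarings = multiplications = 0
    for bit in bits[1:]:
        result = result * result % modulus
        squarings += 1
        if bit == "1":
            result = result * base % modulus
            multiplications += 1
    return result, squarings, multiplications


# Textbook toy key (n = 61 * 53), far too small for real use.
print(modexp(65, 17, 3233))  # (2790, 4, 1)
```

For the widely used public exponent 65537 = 2^16 + 1 this is 16 squarings and one multiplication regardless of key size; what grows with the key is the cost of each modular product, which is consistent with larger keys needing more circuits.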
Padding & Verification Circuit - PSS
Private Inputs
| Input | Type |
|---|---|
| expOut | Exp2048_Output |
| messageShaDigest | Bytes |
| encodedMessage | Bytes |
| encodedModulus | Bytes |
Public Output
| Left | Right |
|---|---|
| Poseidon-digest of Exp2048_Output | Poseidon-digest of uncompressed Pubkey and Contains.init |
Implicit Variance
| | pss | pkcs |
|---|---|---|
| sha-224 | ||
| sha-256 | ✔️ | |
| sha-384 | ||
| sha-512 |
Covering all of the variants is trivial.
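The PSS check verifies that the exponentiation result, read as a big-endian byte string, is a valid EMSA-PSS encoding of the message's SHA-digest. Below is a plain-Python sketch of EMSA-PSS encoding and verification following RFC 8017 (sections 9.1.1 and 9.1.2). It is not the circuit implementation: function names are illustrative, MGF1 is assumed to use the message hash, and for a 2048-bit modulus `em_bits` is 2047.

```python
import hashlib
import os


def mgf1(seed: bytes, length: int, hash_name: str) -> bytes:
    """MGF1 mask generation function (RFC 8017, appendix B.2.1)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.new(hash_name, seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pss_encode(m_hash: bytes, em_bits: int, salt: bytes, hash_name: str) -> bytes:
    """EMSA-PSS-ENCODE, starting from the message hash."""
    h_len, s_len = len(m_hash), len(salt)
    em_len = (em_bits + 7) // 8
    if em_len < h_len + s_len + 2:
        raise ValueError("encoding error")
    h = hashlib.new(hash_name, b"\x00" * 8 + m_hash + salt).digest()
    db = b"\x00" * (em_len - s_len - h_len - 2) + b"\x01" + salt
    masked_db = bytearray(xor(db, mgf1(h, len(db), hash_name)))
    masked_db[0] &= 0xFF >> (8 * em_len - em_bits)  # clear unused top bits
    return bytes(masked_db) + h + b"\xbc"


def pss_verify(m_hash: bytes, em: bytes, em_bits: int, s_len: int,
               hash_name: str) -> bool:
    """EMSA-PSS-VERIFY: True if `em` encodes `m_hash`."""
    h_len = len(m_hash)
    em_len = (em_bits + 7) // 8
    if len(em) != em_len or em_len < h_len + s_len + 2 or em[-1] != 0xBC:
        return False
    masked_db, h = em[:em_len - h_len - 1], em[em_len - h_len - 1:-1]
    top_mask = 0xFF >> (8 * em_len - em_bits)
    if masked_db[0] & ~top_mask & 0xFF:
        return False
    db = bytearray(xor(masked_db, mgf1(h, len(masked_db), hash_name)))
    db[0] &= top_mask
    ps_len = em_len - h_len - s_len - 2
    if any(db[:ps_len]) or db[ps_len] != 0x01:
        return False
    salt = bytes(db[len(db) - s_len:])
    return hashlib.new(hash_name, b"\x00" * 8 + m_hash + salt).digest() == h


m_hash = hashlib.sha256(b"<DER-encoded SignedAttributes>").digest()
em = pss_encode(m_hash, 2047, os.urandom(32), "sha256")
print(pss_verify(m_hash, em, 2047, 32, "sha256"))  # True
```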
Utilities
Multi-Circuit Digest
The ability to SHA-digest dynamically sized payloads is provided by the excellent Mina Attestations library. We use a fork that exposes some of the needed functionality, with the same underlying code.
SHA-digesting a long payload such as LDS or an x509 certificate does not fit into a single circuit. Therefore, we use an Update & Validate scheme to calculate the digest value.
Poseidon-digesting is much cheaper than SHA-digesting in circuits, which is why the Validate step can afford to Poseidon-digest the whole payload again.
Note: we carry a `carry` field so that this can be plugged into our pipeline, carrying the previous leaf's output.
Digest State Data Type
| Field | Type |
|---|---|
| len | SHA size |
| state | StaticArray(UIntN, M) |
| commitment | Field |
| carry | Field |
Update
In each update step, we digest a chunk of values (a few SHA blocks) with both the SHA and Poseidon algorithms at the same time. The digest state is passed from each update circuit to the next.
Only the current state and the chunk are fed to an update circuit, not the whole payload.
Private Inputs
| Input | Type |
|---|---|
| digest state | DigestState |
| iteration | StaticArray<BlockN> or DynamicArray<BlockN> |

where `iteration` holds the chunk to be digested.
Public Output
| Left | Right |
|---|---|
| Poseidon-digest of DigestState | Poseidon-digest of the next DigestState |
Validate
We give the circuit the whole payload together with the latest digest state, which holds the state of both digest algorithms.
The circuit Poseidon-digests the whole payload again to check that the previous steps processed the correct payload chunks in the correct order. If all is well, it returns the SHA-digest output.
This step happens inside the "caller" circuit, since on its own it does not mean much; it is only meaningful alongside other validations from the application logic.
Required Inputs
| Input | Type |
|---|---|
| digest state | DigestState |
| payload | DynamicBytes |
| carry | Field |
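A plain-Python sketch of the Update & Validate idea, under stated assumptions: a copied `hashlib` object stands in for the SHA chaining value, a chained BLAKE2s hash stands in for the Poseidon commitment, and the chunk size of two SHA-256 blocks is arbitrary. For brevity the dynamically sized tail is absorbed directly in `validate`. The names `init`, `update`, `validate`, and `CHUNK` are hypothetical and are not the Mina Attestations API.

```python
import hashlib
from dataclasses import dataclass
from typing import Any

CHUNK = 128  # bytes per update step: two SHA-256 blocks (illustrative)


def commit(previous: bytes, chunk: bytes) -> bytes:
    """Hypothetical stand-in for the in-circuit Poseidon commitment."""
    return hashlib.blake2s(previous + chunk).digest()


@dataclass
class DigestState:
    length: int       # bytes absorbed by update steps so far
    sha: Any          # hashlib object standing in for the SHA state
    commitment: bytes
    carry: bytes


def init(carry: bytes) -> DigestState:
    return DigestState(0, hashlib.sha256(), bytes(32), carry)


def update(state: DigestState, chunk: bytes) -> DigestState:
    """One update circuit: sees only the current state and one full chunk."""
    if len(chunk) != CHUNK:
        raise ValueError("update steps only absorb full chunks")
    sha = state.sha.copy()
    sha.update(chunk)
    return DigestState(state.length + CHUNK, sha,
                       commit(state.commitment, chunk), state.carry)


def validate(state: DigestState, payload: bytes, carry: bytes) -> bytes:
    """Re-commit the processed prefix, absorb the tail, return SHA-256."""
    if carry != state.carry or state.length > len(payload):
        raise ValueError("state does not belong to this payload")
    expected = bytes(32)
    for i in range(0, state.length, CHUNK):
        expected = commit(expected, payload[i:i + CHUNK])
    if expected != state.commitment:
        raise ValueError("wrong chunks or wrong order")
    sha = state.sha.copy()
    sha.update(payload[state.length:])  # dynamically sized final part
    return sha.digest()


payload = bytes(range(256)) * 3 + b"tail"  # 6 full chunks + 4 bytes
state = init(b"previous leaf output")
for i in range(len(payload) // CHUNK):
    state = update(state, payload[i * CHUNK:(i + 1) * CHUNK])
print(validate(state, payload, b"previous leaf output")
      == hashlib.sha256(payload).digest())  # True
```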
Elliptic-Curve Verification
Private Inputs
| Input | Type |
|---|---|
| SHA-digest of message | Bytes |
| Pubkey in serial form | Bytes |
| Signature | Two tuples of Fields, r and s |
Public Output
| Left | Right |
|---|---|
| Poseidon-digest of SHA-digest of the message | Poseidon-digest of Pubkey in serial or circuit form |
Implementation Details
First, we parse the pubkey in uncompressed serial form into its x and y components, in circuit. For secp256r1, this turns 65 bytes into 6 fields in total.
The in-circuit verification uses the x and y coordinate form, unlike certificates, which use the uncompressed elliptic-curve public key point format.
We take the pubkey in serial form here because later we need to find it in a certificate and must know that both are the same public key. This requires verifying the conversion at least once, and we chose to do it in this step.
The SHA-digest of the message is likewise converted into scalar form, also in circuit.
We verify the signature, which is computationally intensive, and return.
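A Python sketch of the two conversions described above: splitting a 65-byte uncompressed secp256r1 point into 6 limbs, and turning a SHA digest into an ECDSA scalar (leftmost bits kept, then reduced modulo the group order). The 88-bit limb width is an assumption, chosen to mirror o1js's three-limb foreign-field representation; the function names are hypothetical, and the sketch does not check that the point lies on the curve.

```python
LIMB_BITS = 88  # assumption: three 88-bit limbs per 256-bit coordinate
# Order n of the secp256r1 (P-256) group, from SEC 2.
P256_N = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551


def to_limbs(value: int, count: int = 3, bits: int = LIMB_BITS) -> list[int]:
    """Little-endian limbs; rejects values that do not fit."""
    if value >> (bits * count):
        raise ValueError("value too large for the limb layout")
    mask = (1 << bits) - 1
    return [(value >> (bits * i)) & mask for i in range(count)]


def parse_uncompressed_point(point: bytes) -> list[int]:
    """0x04 || X || Y (65 bytes) -> 6 limbs: x0, x1, x2, y0, y1, y2."""
    if len(point) != 65 or point[0] != 0x04:
        raise ValueError("expected an uncompressed 65-byte P-256 point")
    x = int.from_bytes(point[1:33], "big")
    y = int.from_bytes(point[33:], "big")
    return to_limbs(x) + to_limbs(y)


def digest_to_scalar(digest: bytes, order: int = P256_N) -> int:
    """ECDSA hash conversion: keep the leftmost bitlen(n) bits, reduce mod n."""
    e = int.from_bytes(digest, "big")
    excess = 8 * len(digest) - order.bit_length()
    if excess > 0:
        e >>= excess
    return e % order


point = b"\x04" + (1).to_bytes(32, "big") + (2**200 + 5).to_bytes(32, "big")
print(parse_uncompressed_point(point))  # [1, 0, 0, 5, 0, 16777216]
```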
Implicit Variance
There needs to be a circuit per (SHA variant, EC variant) pair.
Also, since the context of the right differs for the Local and Master signature verification, we need to have a circuit for each as well.
| | secp256r1 | secp384r1 | brainpoolP256r1 | brainpoolP384r1 | brainpoolP512r1 |
|---|---|---|---|---|---|
| sha-224 | |||||
| sha-256 | ✔️ | ||||
| sha-384 | |||||
| sha-512 |
Currently, only secp256r1 is implemented, and this will remain so for the near future because the other curves are not supported by o1js yet.
We implemented sha-256 and the others are trivial to add.
RSA Verification
We have covered all of the RSA passports with our approach.
Implicit Variance
The first is the key size, which determines how many proofs exponentiation takes: 2048-bit keys need one circuit, while 4096-bit keys require two.
The second is the padding scheme of the signature; we support two: PSS and PKCS#1 v1.5.
The last variation is an incidental one, the SHA variant used in the previous stage.
Right now, the implementation covers:
| | 2048 | 3072 | 4096 | 6144 |
|---|---|---|---|---|
| sha-224 | ||||
| sha-256 | pss@local | |||
| sha-384 | ||||
| sha-512 | pkcs@master |
Covering all of the variants is trivial.
Implementation Details
We have a custom variable-size BigInt implementation that focuses on efficient exponentiation.
First we do exponentiation in one or more circuits.
Then, we do PSS or PKCS padding (depending on the document) and verification in an additional circuit.
The right output is either the Poseidon-digest of the uncompressed pubkey or the in-circuit pubkey, depending on whether it is local or master verification.
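For the PKCS#1 v1.5 branch, the padding check reduces to comparing the exponentiation result, as a k-byte big-endian string, with a deterministic encoding of the digest. A plain-Python sketch of EMSA-PKCS1-v1_5 from RFC 8017, with the DigestInfo prefixes listed in its section 9.2; function names are illustrative.

```python
import hashlib

# DER DigestInfo prefixes from RFC 8017, section 9.2, note 1.
DIGEST_INFO_PREFIX = {
    "sha224": bytes.fromhex("302d300d06096086480165030402040500041c"),
    "sha256": bytes.fromhex("3031300d060960864801650304020105000420"),
    "sha384": bytes.fromhex("3041300d060960864801650304020205000430"),
    "sha512": bytes.fromhex("3051300d060960864801650304020305000440"),
}


def pkcs1_v15_encode(digest: bytes, hash_name: str, k: int) -> bytes:
    """EMSA-PKCS1-v1_5: 00 01 FF..FF 00 || DigestInfo; k = modulus bytes."""
    if len(digest) != hashlib.new(hash_name).digest_size:
        raise ValueError("digest length does not match the hash algorithm")
    t = DIGEST_INFO_PREFIX[hash_name] + digest
    if k < len(t) + 11:
        raise ValueError("intended encoded message length too short")
    return b"\x00\x01" + b"\xff" * (k - len(t) - 3) + b"\x00" + t


def pkcs1_v15_matches(exp_result: int, digest: bytes, hash_name: str, k: int) -> bool:
    """Compare s^e mod n (the exponentiation output) with the encoding."""
    return exp_result.to_bytes(k, "big") == pkcs1_v15_encode(digest, hash_name, k)


digest = hashlib.sha256(b"<DER-encoded SignedAttributes>").digest()
em = pkcs1_v15_encode(digest, "sha256", 256)  # 2048-bit modulus -> k = 256
print(pkcs1_v15_matches(int.from_bytes(em, "big"), digest, "sha256", 256))  # True
```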
Glossary
- NFC: Near Field Communication
- OCR: Optical Character Recognition
- ASN1: Abstract Syntax Notation One.
- MRZ: Machine Readable Zone (read data with OCR)
  - TD3 example: `P<GBRBAGGINS<<FRODO<<<<<<<<<<<<<<<<<<<<<<<<<` / `P231458901GBR6709224M2209151ZE184226B<<<<<18`
- eMRTD: Electronic Machine Readable Travel Document (read data with NFC)
  - Contains several data groups, each encoded in ASN1 format.
- x509: A standard of certificates for digital signatures and more, uses ASN1 format.
- Masterlist: List of trusted root x509 certificates, issued to countries for eMRTD.
- ICAO: International Civil Aviation Organization, distributes masterlists for eMRTD.
- DG1: Data Group 1 of eMRTD data, exactly the same as MRZ information.
- CMS: Cryptographic Message Syntax, used to represent a payload, signature, and an x509 certificate in ASN1 format.
- SOD: Document Security Object of eMRTD data; contains CMS-formatted data (the enveloped hashes of the data groups) together with an x509 certificate.
- o1js: TypeScript framework for zk-SNARKs and zkApps
- zkProgram: Creates off-chain zero-knowledge proofs with o1js.
- npm: Node Package Manager, a registry for node.js and javascript packages.