key: cord-0059416-fnh825zv
authors: Klinec, Dusan; Matyas, Vashek
title: Privacy-Friendly Monero Transaction Signing on a Hardware Wallet
date: 2020-08-01
journal: ICT Systems Security and Privacy Protection
DOI: 10.1007/978-3-030-58201-2_23
sha: 5e263b0f371784abb9f84bf78baf0b6dc6183c32
doc_id: 59416
cord_uid: fnh825zv

Keeping cryptocurrency spending keys safe and being able to use them when signing a transaction is a well-known problem, addressed by hardware wallets. Our work focuses on a transaction signing process for privacy-centric cryptocurrency Monero, in the hardware wallets. We designed, implemented, and analyzed a privacy-preserving transaction signing protocol that runs on a hardware wallet and protects the spending keys. Moreover, we also implemented a privacy-preserving multi-party version of the Bulletproof zero-knowledge prover algorithm, which runs on a hardware wallet with constant memory. We present the protocols and evaluate their performance on a real hardware wallet.

Cryptocurrencies gained popularity and increased adoption by general public in the recent years. They became a valuable asset worth protecting. In the vast majority of the cryptocurrency designs, the only thing needed to transact (spend) the coins is a cryptographic key (master key). Recently, we have seen several attacks on the software wallets storing the master keys and leading to coin thefts [4, 11] .

Software wallets are inherently vulnerable to malware threats, so users seek better ways to protect their cryptographic assets. One option is to use a dedicated hardware device, the hardware wallet, that stores the master key securely and performs the signature on the transactions specified by the user. The device can be equipped with a display to show the transaction details to the user (e.g., destination and the amount) and buttons to confirm the transaction information. As the hardware wallet (HW) is a special-purpose device, it has a much smaller attack surface than a PC. The HW is limited and usually needs a PC client for operation, e.g., transaction construction, transaction scanning.

Bitcoin, the first massively used cryptocurrency, does not provide much privacy to its users. i.e., the whole transaction history is stored in a public blockchain (append-only ledger), it contains source, destination addresses (cryptographic keys, pseudonymous identifiers), and transacted amounts in a clear form so the attacker can mount several chain analysis techniques to trace the financial flow between the users. On the other hand, creating the signature and implementing the logic in the HW is usually straightforward as it requires just transaction serialization logic and ECDSA signature over the data.

We focus on a privacy-centric cryptocurrency, Monero [1] , which is the most used privacy-centric cryptocurrency 1 . In Monero, all destination addresses are unique, and the amounts being transacted are hidden using Pedersen commitments [9] . The secure implementation of the Monero transaction signing is thus a more challenging task. Moreover, HWs are resource-constrained devices with limited memory and computational power.

For the practical testing, we choose a Trezor HW model T 2 (Trezor for short) as it represents a class of HWs with generally available processors used in the embedded devices, with around 100 kB of RAM. It runs Micropython 3 , a Python version for resource-constrained devices.

Contribution: This work builds on our previous technical report [5] and for an extended version of this paper please refer to the [6] . We designed and implemented a Monero transaction signing protocol for HWs. The protocol is simple, which helps with the security analysis. Moreover, it mimics the already deployed cold-signing protocol [5] used with offline Monero wallets. We implemented the protocol to Trezor and Monero codebases, and it is being used in practice. Moreover, we designed and implemented a secure multi-party protocol (MPC) version of a zero-knowledge proving system called Bulletproof [3] , which uses constant memory and works on HWs. The MPC version protects private values, from PC-based attacker. To the best of our knowledge, this is the first MPC implementation of the Bulletproof runnable on HWs.

General Methods: As the HW is a resource-limited device, the general methods for converting arbitrary protocols into MPC, such as Garbled circuits [10] , are not applicable, as those usually require significantly more memory and running time than we have available. We thus resort to a more effective protocol offloading design tailored specifically to the application domain, to preserve the practical usability of the resulting protocol.

The elementary operation of transacting a particular amount from a sender to a receiver is called transaction. The transaction is an atomic state transition. Transactions are stored in the blocks, the block contains a hash of a previously generated block, thus forming a ledger of blocks, a blockchain. A new block is created every 10 min.

Transaction has |T in | = m inputs T in j , |T out | = p outputs T out t and a fee. The amounts values in the input and the output side have to be equal, i.e., m j=1 v in j = p t=1 v o t + fee, so the transaction is a valid state transition (and no value is lost or created). Let's denote transaction outputs TXOs. Transaction inputs are also called unspent transaction outputs (UTXOs). UTXOs are addresses that have a non-zero balance that can be spent. A balance can be trivially computed by replaying all transactions recorded in the blockchain. Blockchain clients usually track all UTXOs and update the state with each new block.

Transaction construction is controlled from the PC client as it scans the blockchain for UTXOs that can be spent. A user enters the transaction recipient addresses and amounts on the PC. The client then performs the transaction signing protocol with an HW to obtain a signed transaction. The signed transaction is then broadcasted to the cryptocurrency network and eventually added to a block.

The Monero user wallet has two key-pairs, (k v , K v ) and (k s , K s ), the viewkey, and spend-key, respectively. The view-key is used to scan the blockchain for incoming transactions to the user wallet; the spend-key is required to create a signature for spending the incoming coins.

HWs are considered as trusted in all attacker models in this paper, i.e., HW securely stores master keys, and an attacker can gain no knowledge by observing and tampering the HW device. We only focus on an attacker controlling the PC client. The honest-but-curious attacker model is defined by an attacker that obeys the protocol precisely but tries to learn new information observing the protocol transcripts. The malicious attacker model is stronger, an attacker can arbitrarily deviate from the protocol. He can start multiple instances of protocols, interleave protocol runs, send, replay, delay, or drop any protocol messages.

We use a standard notation used in the related literature, such as [1, 3] . Due to the paper application domain in the Monero transactions, we use the elliptic curve (EC) Ed25519 [2] as a specialization of the cyclic group G of the prime order l, let's denote Z l the ring of integers modulo l. The Z * l denotes Z l \{0}. The x $ ← − Z * l denotes a uniform sampling of an element from Z * l . Capital letters represent points on the curve G, lower-case letters represent scalars from Z * l unless said otherwise. We use the EC additive notation for G, e.g., P +Q is a point addition, aP = (P + · · · + P ) is a scalar multiplication, 0P = O, i.e., neutral element, point in infinity. Let F n denote a vector space over F, the a ∈ F n is a vector from the vector space with elements a 0 , . . . , a n−1 ∈ F. The a · b = n−1 i=0 a i b i denotes a dot-product of a, b ∈ F n , the a • b = {(a i b i )} i is element product. We also use Python notation for vector slicing, i.e., a [:l] = (a 0 , . . . , a l−1 ), a [l:] = (a l , . . . , a n ). For k ∈ Z * l , the k n ∈ F n denotes the vector k |i| = k i . The G is a generator of the 

Ring signatures are signatures generated by a single private key k π corresponding to the public key K π which is in the ring of unrelated public keys R = {K 0 , . . . , K π , . . . , K n }. The verifier is not able to tell which K i ∈ R generated the signature. This provides n-anonymity for the signer. Keys K i , i = π are called decoy keys. Let's define n = |R|, i.e., the ring size.

Monero uses Schnorr-style multilayered linkable spontaneous anonymous group signatures (MLSAG) [8] . The linkability is a property that links signatures generated with the same private keys. The linked signatures have the same key image (explained later). Signatures with already seen key images are considered as invalid to protect from double-spending the same UTXO.

Monero uses the Pedersen commitment to conceal the transacted amounts and to prove that transaction input amounts are equal to transaction output amounts. A Monero range proof is a zero-knowledge proof that the TXO amount encoded in the scalar value v ∈ Z * l lies 4 in the interval of allowed values [0, 2 N ], N = 64. The range proof is an essential part of the confidential transactions as it protects from overflows and new coin generation.

The transaction generator takes T in and set of destination addresses and amounts T out and produces a transaction with signature. The value |T in | can be quite high and is limited by the fee the user is willing to spend for the transaction to be added to the blockchain and the current block size. The current Monero protocol version (0.15.0.1), i.e. hard-fork, specifies that for a valid transaction it holds that 2 ≤ |T out | ≤ 16. Thus the |T in | is the main limiting factor for the transaction generator with respect to the memory. The ring size n is fixed to 11 at the current version, but it is likely to increase in the future.

It is not feasible to construct a transaction in the HW in one pass. Thus the building process has to be separated into several steps so it can be computed with limited memory. We designed, implemented, and tested the transaction generation protocol that runs in the HW with unlimited T in (the protocol).

The offloaded signing protocol is described fully in the extended paper [6] . Here we explain only key ideas of the protocol construction.

A state offloading is required to build the transaction incrementally. Some parts of the state are sent to the host for later retrieval during the transaction construction. The protocol uses HMAC to protect the public state parts, e.g., parts of the final transaction, and an authenticated encryption (Chacha20Poly1305) for private state information, e.g., T in private spending keys.

All MLSAG signatures are generated in the HW, the k s never leaves the device, while the k v is exported to the host so it can scan all blockchain transactions to determine whether the funds were sent to the recipient wallet keys. Performing the blockchain scanning with the device would be inefficient.

The transaction is built on the host incrementally, from the information provided by the HW. In general, the host sends initial transaction information to the HW. The user is asked to confirm transaction details on the HW screen, such as destination address, amounts, and transaction fee. If confirmed, the HW generates HMACs for transaction input elements so they cannot be changed later (commitment to values).

Then all T in are sent to the HW one by one; the HW derives required signing keys, serializes T in to blockchain format, incrementally hashes information required for later MLSAG construction. Then destination information is sent one by one; the HW generates T out related information, serializes T out to the blockchain format, range proof generation (see Sect. 3) is handled. Finally, the MLSAG is generated per T in .

Key Construction Scheme: Let's describe HMAC key construction on the T out example, generated after user confirmation. HW returns to the host T out :

where k mac is a random HMAC master key generated per-transaction. The HMAC keys construction prevents from changing destination specification later in the protocol by an attacker. All the offloading keys used in the protocol are generated correspondingly, i.e., the keys are unique per offloaded element to make the protocol strictly commit to the offloaded values and their ordering. The key is derived from the master keys, the domain separator, e.g., "txdest", and the item index.

Encrypt-then-Reveal: In order to limit the attacker's reactivity, MLSAG signatures are returned encrypted to the host. The encryption key is returned to the host after the protocol finishes successfully.

The protocol is based on the cold-signing protocol [5] implemented in Monero codebase, which takes all transaction inputs T in , transaction outputs T out , asks the user to confirm transaction outputs, and a fee and generates a valid Monero transaction. Cold-signing protocol is trivially secure as it is evaluated in a secure environment (offline Monero wallet), and the user confirms all transaction outputs. Our offloaded protocol mimics the cold-signing protocol. It is evaluated in a HW, which is considered a secure environment. The transaction is constructed incrementally, from the basic input blocks T in , T out sent one by one to the HW. The user confirms the T out on the HW, as it has a display and touch screen. After the confirmation is done, the HW generates HMAC for confirmed T out . Thus it cannot be later modified by an attacker.

Each call is guarded by a state automaton, set up in step 1 by the parameters of the transaction. This prevents from calling protocol methods in a different than expected order. Moreover, due to HMAC and encryption key construction, it is not possible to modify, reorder, reply, or drop the offloaded state elements. Protocol aborts if invalid input is provided.

The only place where the k s is used is during spend key computation, during T in construction. The result of the computation is offloaded in an encrypted form, which could only be used as input in the last protocol step, during MLSAG generation per each T in . MLSAG signature is generated over hash commitment m, which hashes the entire transaction specification (T in , T out ). The protocol is thus secure in the malicious attacker model as cheating in each protocol step is detected by HMAC, auth tag, or state transition failure.

The only information the attacker can obtain from the protocol runs (without a need to finish the protocol) is key images corresponding to the T in j . The key images are part of the constructed transaction. As we need the host to sort key images so some kind of order-preserving encryption would have to be used to protect key images from leaking before the protocol finish. However, we do not consider this as a required measure as the key images are computed during the blockchain scanning once the transaction sent to our wallet has been found.

The space complexity is determined by O(n + p), i.e., by a ring size and the number of the transaction outputs, i.e., n = 11 and p ≤ 16 in the current Monero version. The whole ring is needed only for the MLSAG signature, which can be easily extended to support large rings. If the p is increased later, the protocol can be easily changed to offload all outputrelated values. The transaction signing protocol implemented in Micropython Table 1 . Performance of the transaction signing protocol on Trezor HW. The algorithm was tested on the emulator and Trezor T HW. Configuration is a tuple (#inputs, #outputs). The first metric, "Time emu", is a runtime in an emulator, other statistics are from runs on the real hardware. " Steps" is protocol computation time without communication overhead, "rounds" is a total number of message round-trips. Rows with "State" show a maximal state size over the protocol, where "real" is the real size measured in the implementation, "min" is the minimal space required, without Micropython objects overhead. Note that the range-proof is not included in the statistics as it is measured separately in Sect. 3. It is visible that the state size is constant, and timing is linear to the number of inputs. for Trezor HW was tested with various input transactions. Refer to Table 1 for performance overview data.

A range proof is a zero-knowledge proof that the amount encoded in a scalar v ∈ Z * l (256-bit number for Ed25519), lies in the interval [0, 2 64 ), without revealing the amount value. Range proof computations are the most resource expensive operations in the transaction construction (time and memory). Thus it makes sense to offload the computation to the host.

The range proofs make use of commitments

is a mask, and the V is part of the publicly stored information. If the attacker generates the masks in a special way, he can exfiltrate information about the keys or the transaction. From the binding property of the commitment scheme which relies on the discrete logarithm problem, it is infeasible to find a different v , γ , s.t. γ G + v H = γG + vH. The attacker already knows the amount as he observes the transaction construction, but knowing the masks enables the attacker to prove the amount to a third party (e.g., court). This poses a privacy risk as the attacker can prove that he has seen the transaction construction or knows the amount keys. The Bulletproof [3] (BP) is the range proof system used in Monero. The proof size increases logarithmically with respect to the number of statements (transaction outputs). BP can prove statement of the form M = 2 x . We implemented the memory-optimized prover version for the Trezor HW, described in Algorithm 1. (N ), M) ) memory. Table 2 shows the performance of the prover and the verifier implemented on the Trezor.

Due to increasing space complexity, it is not possible to generate BPs with M ≥ 4 on the Trezor. We thus designed a new privacy-preserving secure multiparty computation protocol (MPC) to compute Bulletproofs jointly with the PC host and an HW with a constant memory on the HW side. We do not consider the time and memory requirements of the protocol running on the host in the following 5 . Input: Amounts and masks 2 (V , A, S, T 1, T 2, τx, μ, x, h, l0,i, l1,i, r0,i, r1 

t ← l · r; ; ; x ← Hs(x||x||τx||μ||t) Evaluated with const. memory up to here 6 (L, R, a 0 , b 0 ) ← BulletproofLoop(l, r,G, y −|H| •H , MN, −1, x , x ) 7 return (V , A, S, T 1, T 2, τx, μ, L, R, a 0 , b 0 , t) 8 function BulletproofPrefix(v, γ) varint encodes integer to bytes 9G i∈[0,...,M N ) ← Hp(H("bulletproof"||H||varint(2i + 1))) 10H i∈[0,...,M N ) ← Hp(H("bulletproof"||H||varint(2i))) 11 Vj ← γjG + vjH Compute commitment vector V used on line 18 (a , b , G , H , n , c, w, x ) 29 while n > 1 do 30n ← n ; ; ; n ← n /2; A naïve offloading protocol computes all vector-related operations by chunking, i.e., exporting encrypted vectors to a host, then asks for vector chunks to compute the intermediate results. The vectors l, r and dot-products c L , c R , t can be computed incrementally as t = l i r i . This yields a constant memory protocol, but with high communication overhead.

We present basic offloading techniques in the following paragraphs, which are used to transform the in-memory prover to the privacy-preserving MPC prover with constant memory.

We can evaluate dot-products and foldings on the host privately, using homomorphic property of a blinding. We export the vectors π a a , π b b to the host, where π a , π b $ ← − Z * l 2 are random blinding scalars known only to the HW. The host computes the dot-product of blinded vectors s r = π a a · π b b = π a a i π b b i = π a π b a i b i and returns s r, i.e., blinded value r, to the HW. The HW then unblinds the s r as π −1 a π −1 b s r = r = a · b to get the dot-product.

As the scalars are from Z * l and vector elements are essentially random, the attacker cannot infer a from π a a . The π a a 0 = z does not have a unique factorization, i.e., ∀z, x ∃y : xy = z; y = zx −1 . Thus, a blinded vector is indistinguishable from an unblinded one for an attacker. Moreover, each element is divisor of 1 in Z * l so we cannot extract the blinding masks by GCD(πl 0 , πl 1 ) as it is undefined.

Folding Offloading: A vector folding is defined as a i∈(0,...,n ] = wa i +w −1 a i+n , before folding it holds |a | = 2n , after the fold |a | = n . Computing the folding on the host saves CPU and communication round-trips. Only the w is needed for the host to compute the folding, but it is desired to keep internal constants secret to preserve the privacy-preserving property of the offloading. Thus {w, w −1 } are incorporated into blinding masks.

We have two distinct blinding constants for one vector. One for the lower half (LO), the other for the higher half (HI): {π a LO , π a HI }. The folding is then computed in two parts, as we preserve the LO/HI blinding also for the folding result vector, as shown in Eq. 1, so this blinding scheme is composable.

Let's thus define x LO 

, where θ are randomly generated blinding masks from Z * l for the next round. The φ is constructed so it cancels the blinding mask π, multiplies by w {1,−1} , and multiplies by a new blinding mask θ. It is also easy to observe that folding offloading is compatible with the dot-product offloading, as

The blinding technique differs from the dot-product offloading due to constants {w, w −1 } being used. We need to have distinct blinding masks for each term in the folding sum, so an attacker cannot extract the w from the blinding masks. The folding offloading works in the following way: (1) Initial G , H Folding: The folding of the G , H cannot be performed as defined above as the vectors are protocol constants in the first round (known to attacker), i.e., π G = π H = 1. Thus the attacker could extract w from the φ and unblind the folded vectors. We define folding constants φ (0) for the first round as in the Eq. 2: φ

The host computes the folding with the G , H , φ (0) , the HW then generates a vector of correction points π G LO G LO and returns it to the host so the host can remove extraneous component caused by the additive blinding mask π G LO .

L c , R c Offloading: Observe that L c from line 33 contains 3 independent components:

Each component can be computed by the host with blinded vectors. The c L is offloaded dot-product, host returns π a LO π b HI c L . The sum n i=0 a i G i+n is computed from the blinded vectors in a similar way, the host returns: Ě L cA = π a LO π G HI n i=0 a i G i+n , the other sum is analogical. The HW unblinds the components and computes L c .

The offloaded Bulletproof prover as defined in Algorithm 2 was implemented in the Micropython for the Trezor and performance of the implementation was evaluated. Please refer to Table 3 and Fig. 1 for performance metrics.

Attacker Constraints: As we transformed the Algorithm 1 to the Algorithm 2 using offloading steps that preserve the privacy of the computation, the protocol remains secure in the honest-but-curious attacker model. The malicious attacker could tamper the intermediate results to learn new information or break the security properties of the protocol. However, such manipulation leads to an invalid proof with overwhelming probability due to the use of a cryptographic hash function on line 16. Rejection of an invalid proof leads to attack being detected. We could also run a Bulletproof verifier on the generated proofs, abort the transaction signing, and alert the user on the invalid proof. As in-memory verifier runs in constant memory, the protocol becomes malicious attacker resistant; however, the running time increases. There might be more effective methods to verify intermediate results or catch attacker cheating with high probability. Such extensions are left for future work. 

The work in [7] presents a signature protocol for a Ledger 6 HW (Ledger for short). Ledger is an HW with a secure element (SE). Using the SE and the overall architecture limits the usable RAM to a few tens kB. Thus they had to implement a more low-level protocol with basic operations such as: generate key image, H s (x), xP , get sub-address secret key, etc. Low-level cryptographic operations are computed in the device, acting as crypto proxy. The protocol is tightly integrated into the Monero codebase, and it imposes maintainability challenges as a Monero algorithm change usually requires an HW signing protocol change. The low-level protocol design makes the security analysis difficult as the information flow is quite complicated, and the attacker can call several methods in an arbitrary order, which can lead to information leak and potential vulnerability. To the best of our knowledge, the overall protocol is not documented nor analyzed. The low-level commands are documented in [7] only. The protocol does not use a state machine to guard the command calls.

To the best of our knowledge, there is no other Monero transaction signing protocol published nor used. Our protocol addresses issues with the security analysis, works on higher abstraction level, has very simple interface and thus reduced attack surface, it is more stable over protocol changes, easy to maintain, and needs less message round trips. Moreover, we compute the Bulletproofs in HW thus protecting blinding masks.

We designed, implemented, and tested a secure Monero transaction signing protocol for HWs. We designed and analyzed the memory and time complexity of the zero-knowledge proofs (range-proofs, Bulletproofs [3] ), algorithms focused on low-memory consumption so they can be computed in HW. The memory consumption is linear in the number of inputs/UTXOs. The results can be easily applied to other protocols based on ring signatures, Pedersen commitments, and Bulletproof range proofs.

We also designed, implemented, and tested a privacy-preserving two-party Bulletproofs computation protocol with constant memory, enabling the computation of large input instances securely and in a reasonable time. This is the first privacy-preserving Bulletproof prover implementation running on an HW in constant memory. Techniques used in the protocol are applicable to similar protocols like Bulletproof, and Bulletproof also has several applications outside of Monero.

The implemented protocols are practically usable. The transaction signing protocol has been deployed since Nov. 7, 2018, integrated both to Trezor and Monero codebases. All implemented sources are available online under a permissive open source license at: https://github.com/ph4r05/monero-tx-paper.

Zero to monero

High-speed highsecurity signatures

Bulletproofs: short proofs for confidential transactions and more

Official monero website is hacked to deliver currency-stealing malware

Monero wallet trezor integration

Privacy-friendly monero transaction signing on a hardware wallet, extended version

Ledger device for Monero

Ring signature confidential transactions for monero. Cryptology ePrint Archive

Non-interactive and information-theoretic secure verifiable secret sharing

Protocols for secure computations

Malware steals user funds & bitcoin wallet keys from PCs

We thank our colleagues PetrŠvenda and Marek Sýs, who provided valuable insights and ideas that helped to improve the protocols. Thanks also go to SatoshiLabs employees, Tomáš Sušánka, Jan Pochyla and Ondřej Vejpustek who did the security review of the design and implementation and helped significantly with simplifying the protocol implementation. We also thank the anonymous reviewers for their feedback and suggestions for improvement, and to Daniel Slamanig for shepherding the final revisions of our submission. This work was partly supported by the Czech Science Foundation project 20-03426S. For an extended paper please refer to the [6] .

Bulletproof prover with offloading. b tch is a number of elements to offload in one batch, n thr is a n threshold for in-memory finishSend(y) The host needs y to compute H and Lc, Rc related sumsThe y is public, computable from the final proof 12n ← n ; ; ; n ← n /2; ; ; c ← c + 1 13Receiveif n ≤ n thr then Finish in-memory with original algorithm 18Receive BulletproofLoop(a , b , G , H , n , c, 3] Compute and send blindings 22if c = 0 then 23Compute and send folding correction points for G , H , by chunks 24 π ← θ Update blinding masks for the next round