OUTPUT DISTRIBUTION OBFUSCATION ================================== Gregory Maxwell and Andrew Poelstra July 2014 Introduction =============== Using Bytecoin Ring Signatures (BRS), described at cryptonote.org, it is possible to disguise which of utxos is being referenced in any given transaction input. This is done by simply referencing all the utxos, then ringsigning with a combination of all their associated pubkeys. Double-spending is prevented using a so-called "key image", a novel feature of BRS. The key image is a component of a BRS signature which is determined entirely by the private key used to produce the signature. This means that two signatures which use the same private key, i.e. which try to spend the same coin twice, will have the same key image. Thus preventing double-spending is as easy as ensuring that the same key image never appears twice. (On the other hand, the key image cannot be inverted to obtain the original private key; thus in the absense of double-spending attempts it cannot be used to deanonymize the signature.) The value of the transaction input is the value of each of the referenced outputs, which must all be the same. (Alternately, as suggested by gmaxwell, the outputs might have various values, in which case the input value is their minimum. We will not consider this, except to point out that it doesn't interfere with anything else that will be said.) This uniform-output-value requirement on our ring signatures is a serious burden. It reduces the anonymity set available to the spender of any given output, and makes any minimum anonymity set size requirement unworkable. (We want a minimum anonymity set size to prevent people creating "ring signatures" with only one public key, incontrovertibly spending an output and retroactively removing it from every anonymity set that it had been used in.) To correct this, we propose an output-encoding scheme by which outputs of *every possible size* are created alongside each real output. We further require that the "ghost outputs" are indistinguishable from the real ones to anyone not in possession of the outputs' associated private key. (With ring signatures hiding the exact outputs which are being spent, even spending a real output will not identify it as real.) The Scheme =============== We require a new chain parameter, K, which is some nothing-up-my-sleeve data. Let G be the generator of an EC group. G and the group are also chain params. We also require a function COERCE which will coerce numbers to EC points in an easily invertible way. (We don't need invertibility, but it prevents the discrete logs of the output values from being correlated with the input values, which we do need.) We are using this abstract function COERCE to be group-agnostic. For an EC group, it can be as simple as "(x, y) where x is the input and y is the numerically biggest value for which (x, y) solves the group equation". Then each output is labelled by an EC key P = pG and a total value V. Then for each n in [1, ..., V] and each i in [1, ..., ceil(V/n)], the (n, i)th ghost output has value n (except when i = ceil(V/n), then the value is the remainder so that for each n, all the (n, i)th outputs have total value V). It can be spent by a signature using the public key P + iG + COERCE[ H(K||n) ] Choose h(n) so that h(n)G = COERCE[ H(K||n) ]. Then the "real outputs" are simply those ghost outputs for which the value p + i + h(n) are known. Note that nobody actually knows h(n) --- this would entail solving a discrete log problem. Instead, P is chosen based on COERCE[ H(K||n) ] so that the sum p + i + h(n) is known. Specifically, to construct such an output, the recipient: 0. Chooses a random keypair Q = qG. 1. Chooses the value V. 2. Chooses a random n in [1, ... V]. 3. Computes P = Q - iG - COERCE[ H(K||n) ]. Then labels the output with key P and value V. Since P is forced once n is decided, it is impossible to compute a P for which multiple n's private keys are known. Thus the real outputs all have the same n, and the total value of the real outputs does not exceed V. This way, whenever somebody needs an output of size N to participate in some anonymity set, she can choose literally any output with point P and value V ≥ N, and choose random i, then use the (N, i)th ghost output for P. Anonymity of n =============== Now, since every real output has the same values of n but different i, while ghost outputs will be chosen randomly, it is likely that any two occurences of {P, V, n} with different i belong to the same person, and that this value of n is the real one. Users can therefore improve their anonymity when selecting ghost outputs by trying to reuse n for any given P, V. That way the real outputs (which are forced to reuse n) won't stick out so much. This anonymity risk --- that real outputs share a property while the ghost outputs do not --- will become more serious in the next section. Scripts =============== In fact, we can do one better than simply having every output take every value: we can have every output take every script! The benefit of this is that a scripting system can be added without requiring complex scripts to throw away the ring signatures' anonymity (by insisting that every transaction in an anonymity set have the same script). For standard pay-to-address transactions, we can optimize by having no script at all; the ring signature itself provides the proof-of-knowledge. Even so, these scriptless transactions can join the anonymity set for scriptful ones. The way to do this is simple: replace H(K || n) with H(K || n || script). To spent an output, you must provide the script alongside its satisfying input. I warned in the last section that the anonymity risk would be worsened: now the real outputs share the same n -and- the same script, while ghost outputs might share neither. Since a script must be accompanied by a satisfying input, and for some types of scripts only one person is able to produce such an input, this means that others cannot reuse such scripts even if they wanted to! (Other types of scripts, such as those that simply need a hash preimage to be satisfied, could in principle be reused by unrelated parties, but they'd only be able to create an output requiring this script -after- the preimage had been published, providing an easy way to distinguish the original from the copycats.) The net of this is: 1. If you don't require scripting, i.e. your output can be redeemed by knowing a single secret key, you can spend this output using any outputs as ghosts in the anonymity set. The value of n that you choose is forced when spending the output, so it is better to use ghosts that have already been used with this n (but different i). That way n-reuse is not a good way for an adversary to distinguish parties. 2. If you require scripting, there is no way to avoid n-and-script reuse when spending multiple real outputs. Therefore, you should create your output with n == V so that there is only one real output. Doing this means that any outputs in the anonymity set with n != V must be ghost outputs, so this restricts the anonymity set -- but on the other hand, every output with this V value has a usable ghost output, so this is still potentially a large pool from which to choose an anonymity set. The takeaway is: when using scripting, choose a popular output value. Integer numbers of coins, powers of two, whatever. Security =============== There are two security requirements: 1. That ghost outputs are indistinguishable from real outputs for anyone not in possession of their corresponding private keys. This is true since these keys are not required to produce any part of the blockchain (except ringsignatures with only one key — so forbid these). 2. That the real outputs never have total value ≥ V. This is true in the random oracle model since if an attacker knows both p + i + h(n) and p + i' + h(n') for n ≠ n', it is an easy piece of algebra to compute h(n) - h(n')... so use a standard simulation argument which guesses n and n' and sets COERCE[H(K||n)] - COERCE[H(K||n')] to a discrete log challenge. To do this use invertibility of COERCE. (This is not a proper argument since an attacker can perhaps produce signatures for (p + i + h(n)) and (p + i' + h(n')) without actually knowing these values....but to use such an adversary in a proof, we would have to fix our signature scheme, which is beside the point. For an actual implementation this gap should be filled in.)