--- Log opened Tue Aug 27 00:00:44 2013
10:54 < amiller> yeah so
10:54 < amiller> the only obstacle to implementing a snark verifier within a snark program
10:55 < amiller> is that we don't have any simple C code that implements the bilinear pairing needed to make pinocchio work
12:36 < gmaxwell> So I've been talking with Iddo in private about a bunch of SCIP things and came up with a cute idea you all may enjoy.
12:37 < gmaxwell> In some of the SCIP versions the prover produces a large number of locally testable points, then builds a hash tree over them, and the hash tree root tells them which points to sample to show the verifier.
12:38 < gmaxwell> This can achieve reasonable security because the local tests depend on the other local tests, and a bad one is unlikely to pass with junk inputs.
12:39 < gmaxwell> But you still need many tests to achieve reasonable security, because the prover has a verification oracle (e.g. he can just simulate the oracle and keep trying junk inputs until he finds one that passes, unless you sample many points)
12:39 < gmaxwell> So I suggested this idea: you create the large SCIP proof with all the locally testable points, put its hash root in a transaction, and give the whole big thing to a miner.
12:40 < gmaxwell> The miner uses its own randomness to test it (an interactive proof, no verification oracle)... and, happy that it's valid, the miner mines a block.
12:40 < gmaxwell> Now you use that block hash to ultimately select which parts of the locally testable proof to transmit along with the block.
12:41 < gmaxwell> So a verification oracle now would have to have some large multiple of the whole bitcoin network's computational power.
12:41 < gmaxwell> Moreover, as the block becomes further buried, later blocks can perform additional selection to further trim down the proof.
12:41 < petertodd> define "verification oracle"?
12:41 < gmaxwell> Until the proof is nothing more than the hash root, its security provided by POW burying.
12:42 < gmaxwell> petertodd: A magic black box that you give a proof to, and it tells you if the verifier would accept the proof.
12:42 < petertodd> gmaxwell: Hmm... ok, so you want the verification oracle to have to have hashing power so you can't just use it to create fake proofs?
12:44 < gmaxwell> Right, the idea behind these hash tree commitment proofs is that they're non-interactive: the hash of the proof tells you the random elements to test... but unless you make the function that selects the points you sample very expensive, a dishonest prover can potentially create a fake proof.
12:45 < gmaxwell> My idea is to introduce a weak kind of interaction, interaction with the bitcoin network, to create a verifier which is at least as resistant to getting "oracled" as the bitcoin network is against overpowering attacks.
12:46 < gmaxwell> I had a weaker form of this idea earlier where you make the transactions two-phase: first make a txn that commits your proof, then a second transaction which provides the selection... but I just today realized that you can have the miner do this, and it eliminates the need for two transactions.
12:47 < petertodd> Ok, so another way of describing it is to say that miners are making a non-interactive proof, but the selection process within that proof relies on the incredibly high cost of selecting blocks to avoid the invalid parts of the proof, as opposed to a more general "picking only invalid parts requires 2^n hash ops"
12:47 < petertodd> Or really, by being multistage the "2^n hash ops" is achieved by multiplying the ops by the ops required to find a block.
12:47 < gmaxwell> petertodd: yea exactly. But in particular, it requires a multiple of the bitcoin computing power, whatever that is.
12:47 < petertodd> Kinda like I was talking about for UTXO possession proofs before.
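The selection mechanism gmaxwell describes can be sketched in a few lines of Python. This is only an illustration of the shape of the idea, not any real SCIP construction; the function names, the 32-byte block hash, and the sample counts are all assumptions:

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Root of a simple binary hash tree over the locally testable points."""
    layer = [h(l) for l in leaves]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])  # duplicate an odd trailing node
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def challenge_indices(root, block_hash, n_points, n_samples):
    """Derive which points to reveal from the proof's hash root *and* a
    later block hash, so a cheating prover would have to out-grind the
    network's proof of work rather than just simulate the verifier."""
    idx, ctr = [], 0
    while len(idx) < n_samples:
        digest = h(root + block_hash + ctr.to_bytes(4, "big"))
        i = int.from_bytes(digest, "big") % n_points
        ctr += 1
        if i not in idx:  # sample without replacement, for illustration
            idx.append(i)
    return idx
```

With only the `root` in the challenge derivation this is the grindable non-interactive scheme; mixing in `block_hash` is the "weak interaction with the bitcoin network" being proposed.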
12:48 < gmaxwell> Also, it fits with a general idea that as a block gets buried (more POW) you could use the subsequent blocks to throw away more and more of the proof... So the network starts out as zero-trust validation, but the deep history is just POW-consensus validation.
12:49 < gmaxwell> if transactions were structured so that you could elide scriptpubkeys this could also be used to compress regular bitcoin transactions buried far in the history.
12:49 < petertodd> Heh, so you could describe it in terms of "including all the subsequent blocks, the proof would require n hash ops to have p probability of finding a fraudulent proof"
12:51 < amiller> i wonder why it should matter if you have old proofs
12:51 < amiller> like
12:52 < gmaxwell> also, if you generalize this to cover the validation of whole old blocks (e.g. by making the chain a hash tree itself) and the fall-off in proof size is exponential with more work, it means that the data required to sync the historical chain is some constant.
12:52 < amiller> if you have a proof that the nth block is valid and contains a proof that the (n-1)th block is valid and contains a proof etc etc
12:52 < amiller> why is it necessary to demonstrate possession of old history
12:53 < amiller> the only reason i think is to tolerate forks
12:53 < gmaxwell> amiller: It's not, assuming you have these proofs. (and forks, but forks don't _practically_ apply to _old_ data, for some definition of old, or the system is already doomed)
12:54 < amiller> then the work could probably be used on something more meaningful
12:55 < amiller> hm
12:55 < gmaxwell> I fully welcome someone going out and building UTXO checkpoints that prove faithful validation... that's likely a big engineering challenge however.
12:55 < amiller> i have been doing a lot of work with the pinocchio guy, which is basically tinyram's competitor
12:55 < amiller> we are trying to implement hashes and merkle trees in pinocchio
12:55 < petertodd> gmaxwell: Yeah, with the fraud % stuff I was talking about before you could use the PoW hash to select some subset of transactions/txouts to show proofs for, which as you say keeps the sync data required constant.
12:55 < amiller> pinocchio is a small C compiler but it has no RAM unlike tinyram, so we have to approximate it with merkle trees
12:55 < amiller> he says that it's impractical for the time being to implement the recursive checker
12:56 < amiller> even though it's constant size it is a big constant
12:56 < amiller> we'd basically have to port the whole GMP library and the bilinear pairing operation to this and it's just expensive
12:56 < amiller> but i'm really convinced now that recursive composition will work in a straightforward way
12:56 < petertodd> gmaxwell: The fraud possible would then be a function of literally some % of total outstanding UTXO value - though it really should take age into account, given lost coins would then make the economic % of fraud possible go up.
12:56 < gmaxwell> amiller: there is a lot that's _possible_ but the engineering work and runtime requirements are still just too insane.
12:57 < petertodd> gmaxwell: (assuming the numbers work out so that the % is even meaningful)
12:57 < amiller> it's engineering yeah but it might be worthwhile in this case
12:57 < amiller> even the cost of building the proofs diminishes if it can be parallelized/distributed
12:57 < amiller> which it can
12:58 < gmaxwell> petertodd: So I even wonder if worrying about subsetting single blocks makes sense when you could just subset out whole blocks. So long as you could produce a locally testable proof of a single block (which we can, if we have a committed utxo).
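The merkle-tree RAM approximation amiller mentions boils down to checking each memory access against a committed root. A minimal sketch of verifying one read (plain Python standing in for circuit logic; the function name and leaf encoding are illustrative):

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def verify_read(root, index, value, path):
    """Check that `value` sits at leaf `index` under `root`, given the
    sibling hashes along the authentication path.  Inside a
    Pinocchio-style circuit, something like this would tie every RAM
    access to a committed snapshot of memory."""
    node = h(value)
    for sibling in path:
        # even index: we are the left child; odd: the right child
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root
```

A write would be the same check twice: open the old leaf, then recompute the new root along the same path with the new value.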
12:59 < gmaxwell> petertodd: so the idea is that for the old history you forget whole blocks forever, selected by the hash of new blocks... and just retain some fraction along with the locally testable proof (utxo fragments that let you validate the block).. would be interesting to work out the cheating economics.
12:59 < petertodd> gmaxwell: We don't even need a committed utxo really: in your proof just provide the txouts spent by that block a level deeper.
13:00 < gmaxwell> petertodd: hm. it's true, you just need the SPV fragments for all the inputs.
13:00 < petertodd> gmaxwell: It'd all work especially well for the simpler interactive case where you're just trying to make sure the UTXO set a peer gave you is actually valid - best of all, if we screw up we can change the algorithm without even a soft-fork.
13:01 < gmaxwell> Yea, in the interactive case this is all a lot stronger. But if we want to make historical storage a constant, interactive is out.
13:01 < petertodd> Sure, but the point is we can engineer that *first*, and learn from it prior to doing the non-interactive version.
13:01 < gmaxwell> But if we can use POW hashes to pick the subsets, I think we can make the non-interactive version require some multiple of the network's computing power to cheat.
13:01 < gmaxwell> And yea, absolutely I agree.
13:02 < gmaxwell> The non-interactive system is just a derandomization of the interactive one.
13:02 < petertodd> In fact, with SPV proofs for each txout, you can still have an interactive node sync from another interactive node I think - again, gotta think about the economics.
13:02 < petertodd> Oh, no, that doesn't work: can't stop the peer from removing a UTXO and not telling you.
13:03 < petertodd> Though it may be enough to use these challenges to determine if the delta-UTXO of some block of history is correct, meaning you don't actually need to get that whole block of history from a peer, just how it changed the UTXO set.
13:04 < gmaxwell> that's why I was talking about a committed utxo, since it makes the state transition implicit.
13:04 < petertodd> For sure, again, just thinking about doing a meaningful prototype prior to changing the protocol.
13:08 < gmaxwell> well I do think that unfortunately a random check of the past headers needs a protocol change.
13:08 < gmaxwell> because you can't tell if random headers are connected. :(
13:08 < gmaxwell> well no, I suppose you could ask a peer to commit to a hashtree over the past headers.. without having it in the protocol.
13:08 < gmaxwell> and if you catch them cheating you ban their ip.
13:16 < petertodd> Oh, I'm assuming you have all the headers first.
13:16 < petertodd> This is just to optimize getting the blocks themselves.
13:29 < petertodd> Hmm... the problem is if any 1 element in the UTXO set is either invalid or missing, the attacker can fork you. The numbers just aren't going to work out for checking enough of the proof to be sure there isn't an invalid txout in there, other than getting copies of every tx for every txout in the set. The same applies to being sure that you aren't missing a txout.
13:29 < petertodd> With UTXO commitments it's another story, but without them I think it's hopeless.
13:31 < gmaxwell> The forking-you part isn't so bad.
13:31 < petertodd> How so?
13:32 < gmaxwell> ask the guy who gives you a block to prove any txo you can't prove for yourself.
13:33 < petertodd> But that leads to bandwidth forks - the proof of a txout is the tx, and that's far larger than the txout itself.
13:34 < gmaxwell> it's not just the tx, it's an SPV fragment for the tx.. a lot larger, sadly.
13:35 < petertodd> For a small tx sure, but you could arrange for those txs to be all MAX_BLOCK_SIZE large...
13:36 < gmaxwell> I'm agreeing with you. this is another reason that it blows that our tx format is not tree structured.
13:36 < petertodd> Yup
13:36 < gmaxwell> ideally the proof for an output should be log(blocks) hashes + log(txn in block) hashes + log(outputs in txn) hashes....
13:36 < gmaxwell> plus the output.
13:36 < petertodd> Yup
13:37 < gmaxwell> but instead it's the block hash + log(tx in block) hashes + the whole size of the transaction, which could be enormous.
13:38 < petertodd> Supposing it was though, you could pass around txs and blocks with txout proofs relatively cheaply (a k*~log() increase in bandwidth) and all nodes could start validating blocks fully fairly well.
13:40 < gmaxwell> petertodd: yea, it would probably only triple transaction sizes, assuming max size blocks.
13:40 < petertodd> yup
13:43 < petertodd> Would work nicely with fraud proofing too, because a fraud proof for an invalid txin is just to point out that it's invalid.
17:50 < gmaxwell> amiller: so for pinocchio, you just have your transcript with steps*words memory, and you compute a hashtree over that.. and then the circuit satisfaction runs and just validates that every access is consistent with the transcript memory snapshot?
17:50 < amiller> yeah
17:50 < amiller> it's only a little different than tinyram, which doesn't use a merkle tree to represent ram, but it does do this weird sorting/routing thing which has almost the same effect
17:51 < gmaxwell> well kinda, the sorting is provably correct with non-deterministic advice, so it can be very minimal. Though how the efficiency ultimately plays out I dunno.
17:52 < amiller> it's kind of just an optimization of the write-to-merkle-tree-every-time
17:52 < amiller> like the tinyram begins empty and doesn't write back anything when it's finished
17:52 < amiller> so it's almost like a cache
18:03 < gmaxwell> I had a weird dream about these proof systems for software last night.
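gmaxwell's ideal proof-size formula is easy to put numbers to. The constants below (32-byte hashes, a 40-byte output, the tree dimensions in the test) are assumptions for illustration only:

```python
import math

HASH = 32  # bytes per hash

def spv_proof_bytes(n_blocks, txs_per_block, outputs_per_tx, output_bytes=40):
    """Size of an ideal tree-structured proof for one output:
    log(blocks) + log(txs in block) + log(outputs in tx) hashes,
    plus the output itself."""
    hashes = (math.ceil(math.log2(n_blocks))
              + math.ceil(math.log2(txs_per_block))
              + math.ceil(math.log2(outputs_per_tx)))
    return hashes * HASH + output_bytes
```

For a chain of 2^18 blocks with 2^11 transactions of 2^4 outputs each, that is 33 hashes, about 1.1 kB per proven output, versus a whole (potentially MAX_BLOCK_SIZE) transaction in the flat format being complained about.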
Where someone had some new technique which was particularly powerful, and I went to go try to convince them not to let MIT patent it because they'd be typical licensing idiots and prevent everyone from using it. ... and then I got lost in MIT. very weird.
18:10 < amiller> so i've studied the hell out of this recursive snark composition paper
18:10 < amiller> and i'm writing my own now
18:10 < amiller> they argue that their construction only works for constant-depth circuits
18:10 < amiller> which means it works for turing machines with a fixed polynomial bound
18:10 < amiller> on the number of steps
18:10 < gmaxwell> right.
18:10 < amiller> i claim that you can do it for unbounded-length computation
18:11 < amiller> because you can build a fixpoint verifier
18:11 < gmaxwell> by nesting proofs?
18:11 < amiller> in either case you nest proofs
18:11 < gmaxwell> interesting.
18:11 < gmaxwell> oh I see, right, nesting gets you the polynomial bound.
18:12 < gmaxwell> making the computation unbounded would be nice... having to precompute for different work sizes stinks.
18:12 < amiller> yes
18:13 < gmaxwell> (esp in the model where if the generator cheats it can produce false proofs, because you'd really want to only ever run the root generator once, since gaining confidence in it will be expensive)
18:14 < amiller> i can't figure out why they didn't do it this way in the recursive composition paper
18:15 < amiller> but i can describe my scheme really easily
18:15 < amiller> to start with, the snark consists of a triple G,P,V
18:15 < amiller> G(k,C) takes a circuit C and security parameter k, and outputs a verification key vk
18:16 < amiller> the prover P(C,x,w) takes a circuit C, input x, witness w, and outputs a proof p
18:16 < amiller> the circuit is a function C(x,w) -> {0,1}
18:16 < amiller> x and w are the combined inputs of the circuit, but it's split into a part x that the client provides and a part w that the prover provides; think of x as a blockhash and w as untrusted block data that gets checked during the circuit
18:17 < amiller> so the conciseness of a snark is that vk is always constant size regardless of the size of C, and so is p, and V takes constant time to run
18:17 < amiller> the next part (two of three) is a constant-step turing machine
18:17 < amiller> this is easy because i can represent the tapes of a turing machine as a hash chain
18:18 < amiller> so M(s0,s1,w) -> {0,1} returns 1 if s0 -> s1 is a single valid state transition
18:18 < amiller> s0 and s1 are digests of the turing machine state, including the remainder of the tape to the left and to the right
18:18 < amiller> blank tapes have like a genesis digest, a 0000000 sentinel value
18:19 < amiller> w contains like one element of the tape, either the left or the right, so it's enough untrusted data to check one step
18:19 < amiller> okay so the final part is putting these together
18:19 < amiller> the trick is to build a circuit that contains the single step M and the verifier V, and it also takes a key vk as its input
18:19 < amiller> and passes it through
18:20 < amiller> so you can
compile that whole circuit into v* and pass v* as input, and that's a fixpoint verifier
18:20 < amiller> so more specifically,
18:20 < amiller> i'll define M* as a circuit
18:21 < amiller> M*((s0,sF,vk), (w1,p1,s1)) -> {0,1}
18:22 < amiller> M* returns 1 if M(s0,s1,w1) and either V(vk, (s1,sF,vk), p1) or s1==sF
18:23 < amiller> i forgot to write the form of the verifier V earlier in part 1 about snarks, so: V(vk, x, p) = 1 only if there's some witness w such that p = P(C, x, w)
18:24 < gmaxwell> yea, that's obvious enough, that's what the snarks prove.
18:24 < amiller> okay so that circuit is like a fixpoint operator
18:24 < amiller> v* = G(k, M*) gives you a special key
18:24 < amiller> that you can pass in
18:25 < amiller> so basically the final verify function is like V*(proof,s0,sF) = V(v*, (s0,sF,v*), proof)
18:25 < amiller> you use v* as the verification key, and you also pass it through as input
18:25 < amiller> that's the whole damn thing, no troubles incurred.
18:26 < gmaxwell> But doesn't the verification key grow linearly with the depth instead of being constant?
18:27 < amiller> the verification key is constant in the size of the circuit
18:28 < gmaxwell> oh I see, right. It's not N verification keys, it's a verification key of a circuit that includes a verifier for itself.
18:29 < amiller> it's a verification key of a circuit that includes a verifier for any verification key
18:35 < amiller> i'm beginning to understand the problem of extraction for security though
19:03 < amiller> this seems like a ridiculous technical detail.
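The shape of amiller's M* can be mocked up structurally in plain Python. This is strictly a toy: M below is a trivial counter transition instead of a real turing machine step, and V "verifies" an inner proof by re-running M*, which a real SNARK verifier of course cannot do; the point is only the recursion pattern:

```python
def make_Mstar(M, V):
    """Build the self-referential circuit M* from a single-step checker
    M(s0, s1, w1) and a verifier V(vk, x, p).  Toy sketch: circuits are
    modeled as plain Python callables."""
    def Mstar(x, w):
        s0, sF, vk = x         # public input: endpoints, plus the key M* runs under
        w1, p1, s1 = w         # advice: step witness, inner proof, intermediate state
        if not M(s0, s1, w1):  # s0 -> s1 must be one valid transition
            return False
        # base case: reached the final state; otherwise check the inner proof
        return s1 == sF or V(vk, (s1, sF, vk), p1)
    return Mstar

# Toy instantiation: states are integers, a valid step is s1 == s0 + 1,
# and a "proof" is just the nested advice itself.
M = lambda s0, s1, w1: s1 == s0 + 1
def V(vk, x, p):
    return Mstar(x, p)  # stand-in verifier: re-run M* (NOT a real SNARK)
Mstar = make_Mstar(M, V)

# nested "proof" for the run 0 -> 1 -> 2 -> 3
proof = (None, (None, (None, None, 3), 2), 1)
```

In the real scheme the self-reference comes from the key rather than the code: v* = G(k, M*) is generated once, and V*(proof, s0, sF) = V(v*, (s0, sF, v*), proof) passes v* both as the verification key and as part of the public input.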
20:28 < amiller> yeah this is frustrating, i think it's a crypto-definitions quirk more than anything practical
20:28 < amiller> the problem is that security for these snark things is defined using a non-black-box extractor
20:28 < amiller> the scheme is secure if an extractor exists
20:29 < amiller> if some adversary P' produces an untrusted proof, then the extractor is given non-black-box access to the code of P'
20:29 < amiller> and the extractor is supposed to produce the witness, and run in polynomial time relative to the time of P'
20:30 < amiller> so the problem is this definition composes really poorly
20:30 < amiller> because if P' altogether runs in time t
20:30 < amiller> then E(P') might run in time t^2 *just to give you the next-to-last proof*
20:35 < amiller> then you'd have to run E(E(P')) in time t^3 just to get the 2nd from last, etc...
20:39 < gmaxwell> yuck.
22:59 < amiller> i want to make a new definition for proof of knowledge
22:59 < amiller> bitcoin is really the perfect example for this
23:05 < gmaxwell> hm?
23:05 < amiller> the need for something like an extractor is because of the vacuousness of just saying "there exists", in the sense that a blockhash is valid if there exists some valid blockdata that's a preimage of it
23:05 < amiller> because there are a lot of valid blocks and the hash has collisions somewhere
23:09 < amiller> the recursive snark / proof-carrying-data paper basically defines this "compliance predicate" thing that describes valid blocks, but as a recursive statement
23:09 < amiller> hrm
23:09 < gmaxwell> hm. I guess a useful definition of proof of knowledge requires that the thing you're proving be concrete enough that it's not a totally empty claim.
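The blow-up amiller describes can be written out explicitly. Assuming, consistent with his t^2 and t^3 figures, that one level of extraction multiplies running time by a factor of t:

```latex
% Extraction cost under recursive composition
% (assumption: each extraction level multiplies running time by t)
\begin{align*}
  \mathrm{time}(P') &= t \\
  \mathrm{time}(E(P')) &\approx t^{2}
    && \text{(recovers the next-to-last proof)} \\
  \mathrm{time}(E(E(P'))) &\approx t^{3}
    && \text{(recovers the second-from-last proof)} \\
  \mathrm{time}(E^{(d)}(P')) &\approx t^{\,d+1}
    && \text{(after $d$ levels of nesting)}
\end{align*}
```

So extraction stays polynomial only for constant recursion depth d, which is one way to read the paper's restriction to constant-depth circuits.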
23:11 < amiller> the idea of an extractor is pretty compelling, like it says you have to efficiently provide the witness, where the witness is all the actual data
23:12 < amiller> the technical details are baffling and unnecessarily tricky though, like it basically says "given access to compiled program code that produces a proof, there's an efficient reverse-engineering that produces the witness"
23:15 < amiller> so i wonder if there's a more indirect way to do it that's like
23:17 < amiller> rather than saying there's an extractor that extracts the witness, say that producing the proof using anything other than the witness is hard
23:37 < gmaxwell> it is a bit interesting that the SNARK proof is "there exists a witness such that f(public,w)=x"... but it doesn't directly prove that the prover knew the witness.
23:39 < amiller> "knew the witness" is really difficult to define
23:44 < amiller> it would be a really minor engineering effort to make pinocchio work for bitcoin
23:44 < amiller> like, who cares if it takes 10 minutes to make a whole blockchain proof
23:45 < amiller> per block even
23:45 < amiller> the "real world practical costs" threshold is a whole lot different if it's public data and its provenance concerns a lot of people
23:46 < gmaxwell> You think the prover could run that fast, with a state space of several hundred megabytes?
23:46 < gmaxwell> (and ECDSA signature validation in it?)
23:47 < amiller> yeah maybe
23:47 < amiller> one of the weird things is that
23:47 < amiller> because of the algebraic structure (it's bilinear groups based on elliptic curves anyway) you get some kinds of strange operations for free
23:47 < gmaxwell> well I think that would be tremendously valuable, it greatly changes our long term scaling, since we could have committed utxos and then proofs of them, and nodes could hotstart without substantially degrading the security model.
23:48 < amiller> yeah it changes things about the whole chains-validating-other-chains kind of stuff too, which is more deeply why i'm so interested
23:48 < amiller> so, like, it's possible that lattice-based hashes or lattice-based signatures would be even cheaper than it seems
23:49 < gmaxwell> eliminating storage of user-provided data would also remove a lot of existential risk for us... I think it's only a matter of time before someone tries to use childporn in the historic chain as an excuse to shut down bitcoin or to force it to become centralized.
23:51 < gmaxwell> I know how to keep user-provided data out of the utxo, but can't remove it historically without either proofs of validation or a reduction in the security model. ... but if the computation cost thousands of dollars to perform for the proof that's not a big deal.
23:52 < gmaxwell> (okay, well thousands would be kinda obnoxious, but it's viable)
23:52 < amiller> yeah.
23:54 < gmaxwell> by the numbers I think the majority of bitcoin users don't have a clue about security at all, and would be perfectly happy if all the rules were removed from the software and BTCguild, slush, and asicminer were just trusted to do the right thing. ... so I do worry a lot about a politically hot argument to degrade the security for expedient reasons.
--- Log closed Wed Aug 28 00:00:47 2013