--- Log opened Wed Jan 22 00:00:50 2014
00:50 < petertodd> gmaxwell: nifty chips - vitalik claims they're going to do a PoW (+PoS) competition - I predict it's going to be a horrible failure because they don't even have the skills to properly vet candidate judges...
00:52 < petertodd> gmaxwell: incidentally, I was talking about PoW with an EE unfamiliar with the field, and he independently thought of the area-power re-use thing immediately, which I think indicates how utterly out to lunch 95% of the people here are (scrypt authors included)
00:52 < gmaxwell> petertodd: well and everyone participating has an incentive to play up their advantages. It's also predicated on a goal which is not proven to be objectively worthwhile.
00:53 < gmaxwell> yea, this wasn't obvious to me before. Now it really would be interesting to go analyze scrypt power usage and compute the total costs.
00:53 < petertodd> gmaxwell: meh, the other thing the EE immediately saw was how important the goal was - he understood damn well how easily niche technology gets regulated out of existence
00:53 < petertodd> gmaxwell: it *is* an existential threat and figuring out how best to solve it is very important, even if only to make sure the threat doesn't actually happen
00:54 < petertodd> I really suspect there's some interesting games you can play with power gating memory and scrypt - for instance you could probably make a low-power dram implementation that doesn't refresh ram and accepts errors in exchange for low power (another thing that EE immediately thought of)
00:55 < gmaxwell> actually the lifetime of the required memory is so low it probably doesn't need refresh.
00:56 < petertodd> that's the *problem*! DRAM controllers already take that into account, but on top of that optimization you can probably push voltages even lower than standard, and maybe even use some simple, and custom, prediction stuff to shave it even further
00:56 < gmaxwell> scrypt access patterns are somewhat unpredictable so it would be hard to just size the capacitors so that it never failed, but you could still get failure rates as low as you want.
00:57 < petertodd> yeah, and economically optimal is going to be very high failure rates by conventional standards
00:57 < petertodd> probably orders of magnitude higher - so much so that the design will be 100% custom
00:58 < gmaxwell> yea, existing mining hardware runs fine at failure rates around 1%. e.g. stuff ships out of the factory with ~1% of returned nonces being wrong.
00:58 < petertodd> existing computers have failure rates probably... I dunno, twelve orders of magnitude less than that?
00:58 < gmaxwell> you can't run commodity silicon at those error rates because something important will glitch out and it'll wedge.
00:59 < petertodd> well... that's changing though, because designers are being forced into that kind of error territory - we're also lucky that GPUs can tolerate higher error rates than other computing stuff, kinda
00:59 < gmaxwell> (this was actually one of the reasons gpu mining headlessly worked better: most cards could be pushed a lot further when they weren't displaying anything)
01:00 < petertodd> in any case, said EE thought my ideas about FPGA "cottage industry" PoW algorithms were feasible, because FPGA hardware these days can have a surprising amount of power gating and similar tech
01:01 < petertodd> similarly things like DRAM often have a lot of control over how the internals work if you're willing to attach it to a custom controller, and those controllers are FPGA-implementable with good performance
05:34 < adam3us> yeah I was wondering as a trend if FPGAs can get closer to ASIC in density, and reduce the ASIC/FPGA performance gap, and as moore's law seemingly may top out with current fab around 5nm, the next stage is more cores, more CISC designs, and reconfigurable - eg if you have some GPU units on the die, why not a slab of FPGA; we already have microcode, why not lower (hw) level reconfigurability as an on-die FPGA co-processor
06:03 < wumpus> adam3us: so you're counting on the overhead for (low-level) programmability to go down; any specific reason for that?
06:03 < wumpus> it would be great, agreed though
06:06 < adam3us> wumpus: they're running out of other options, and the intel & amd & arm chips are getting more and more cisc. gpu, mmu, power regulator, level 4 cache, more simd instructions, special crypto instructions, codec instructions. seems like the next step. (I am not a hw person tho). so if there is room, and fpgas are maybe not so widely used vs cpus, then maybe with more r&d focus that asic/fpga gap could be closed somewhat
06:08 < wumpus> there certainly seems to be a trend toward lower-level many-core parallelism programmability in newer architectures (parallella, xmos), but not entirely at the gate level, it's more GPU-like from what I understood
06:10 < wumpus> one of the (sw) problems with FPGAs in general-purpose computers is sharing them between applications, it's a limited resource users may not easily understand. GPU vendors spend a lot of work on context switching / multitasking, but on a FPGA that may be harder.
06:15 < wumpus> of course, if you have a fast programmable FPGA or one that supports partial reprogramming you could maybe dynamically allocate gates, but from what I've seen up to now reprogramming a FPGA isn't quite as granular/fast
18:23 < gmaxwell> Interesting: I emailed Colin Percival and expressed my concern that the scrypt cost assumptions may be inaccurate due to a failure to account for energy consumption, and asked if he'd performed or was aware of anyone else performing an analysis which included energy consumption.
18:24 < gmaxwell> He responded and said "I'm not aware of any analysis which includes energy consumption. I don't know anyone who has looked at this who has the necessary expertise in microfabrication technologies to accurately predict how energy-efficient a *custom* circuit could be."
18:26 < phantomcircuit> gmaxwell, hmm?
18:31 < gmaxwell> phantomcircuit: New theory: Scrypt may be less effective as a KDF than the conclusions in the scrypt paper suggest because the analysis there did not include operating costs, just chip making: for number-crunching chips the power cost outpaces the fabrication cost quite rapidly... and given a specific commodity-hardware time budget a scrypt cracker may actually use less power (than say sha256-pbkdf2).
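[editor's note] The "power cost outpaces the fabrication cost quite rapidly" claim above can be illustrated with a back-of-envelope model. All figures here (chip price, power draw, electricity rate) are assumptions chosen for illustration, not measurements from any real cracker or miner:

```python
# Back-of-envelope: when does cumulative energy spending overtake the
# one-time fabrication cost of a number-crunching chip?
# All figures below are illustrative assumptions, not measured data.

CHIP_COST_USD = 50.0    # assumed one-time fabrication/purchase cost per chip
CHIP_POWER_W = 100.0    # assumed continuous power draw while computing
USD_PER_KWH = 0.10      # assumed electricity price

def energy_cost_usd(months):
    """Cumulative electricity cost of running one chip for `months`."""
    hours = months * 30 * 24  # approximate a month as 30 days
    return CHIP_POWER_W / 1000.0 * hours * USD_PER_KWH

def crossover_months():
    """Months of operation after which energy spending equals the chip cost."""
    return CHIP_COST_USD / energy_cost_usd(1.0)

print(round(crossover_months(), 1))  # ~6.9 months under these assumptions
```

Under these assumed numbers the crossover lands at roughly seven months, consistent with gmaxwell's later remark that energy dominates for tasks "operating longer than a few months"; different (real) chip costs and wattages shift the crossover but not the shape of the argument.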
18:33 < phantomcircuit> gmaxwell, that is certainly correct
19:01 < midnightmagic> wow
19:01 < sipa> such theory
19:03 < gmaxwell> maybe I can extract some data from the gridseed folks to allow for a scrypt asic cost model that includes energy.
19:03 < gmaxwell> (what I can't just extract from their data sheets is how the energy usage scales with the memory hardness parameter, not without knowing how much of their power is used by the dram vs the rest.)
19:09 < Luke-Jr> Anyone have any tips on boarding in Miami Beach? :/
19:09 < sipa> don't tell them you have a bomb
19:10 < Luke-Jr> …
19:10 < Luke-Jr> s/boarding/lodging/
19:12 < jps> sipa: the NSA will never let Luke-Jr board the plane now
19:12 < Luke-Jr> NSA has no authority over that :P
19:13 < sipa> TSA...NSA... just 3 bits difference
19:13 < Luke-Jr> lol
19:15 < jps> I'm sure those guys get a shot at the no-fly list
19:24 < petertodd> gmaxwell: I'm thinking of just hiring someone with ASIC design experience to look at this stuff frankly
19:25 < petertodd> gmaxwell: the EE I was talking to yesterday said he had some contacts
19:25 < gmaxwell> petertodd: actually having built a scrypt asic trumps abstract experience. :P
19:27 < petertodd> adam3us: the FPGA overhead might get closer to ASIC overhead, but only in the sense of power limitations - you'll never get similar space limitations unless ASIC tech changes pretty drastically in ways that are rather unpredictable
19:28 < petertodd> gmaxwell: the ASIC was a single performance point - scrypt is tunable after all
19:29 < gmaxwell> petertodd: yes, but my _suspicion_ now that I've thought about it is that the power usage per user-tolerance-unit will go down as memory usage increases.
19:29 < gmaxwell> esp once memory usage is high enough to not fit in cache on commodity hardware.
19:30 < petertodd> gmaxwell: as is mine, but does that make asics more or less attractive? potentially *less* if commodity dram can be tuned the way we want it to be
19:31 < gmaxwell> I don't follow your argument but I suspect it's on an entirely different subject matter than I'm talking about. I'm specifically concerned with scrypt as a KDF here, and I think this thinking invalidates the argument given in the scrypt paper, and that the result might be that scrypt reduces security against a well-funded attacker cracking your password.
19:31 < petertodd> gmaxwell: e.g. scrypt with 4GiB might stress random access latency so much that everything but the ram doesn't matter at all
19:32 < petertodd> gmaxwell: no, I'm talking about KDFs - they're an easier problem than ASIC-hard PoW functions
19:34 < gmaxwell> petertodd: right, and if the power costs dominate after N months of operation, and the custom cracker has 10-fold lower power usage than an alternative one that used the same amount of user-tolerance budget but used sha256, then it wouldn't be a win.
19:36 < petertodd> gmaxwell: that's the thing though, your random-access-related hardware needs to run at full power and high speed, so the rest of the system may not be a big difference in terms of power
19:36 < petertodd> gmaxwell: of course, down the road FRAM tech could blow all these assumptions out of the water too
19:38 < gmaxwell> petertodd: well one way of looking at it once thinking about energy - given commodity hardware is actually often made using state-of-the-art technology, the task is to make the most use of the hardware the user has - so how can you make commodity hardware use the most energy possible. Grinding against dram is _not_ the way to do that.
19:39 < gmaxwell> on a desktop PC, sitting in a tight inner loop on the SIMD registers in the cpu is.
19:39 < petertodd> gmaxwell: Are you sure about that? Because I'm not.
19:40 < maaku> what gmaxwell just said is definitely, 100% true
19:40 < maaku> waiting on the ram bus idles the CPU
19:40 < petertodd> gmaxwell: that *might* be true if the SIMD registers were power limited, but that's not at all a given
19:40 < petertodd> maaku: ram uses power to access
19:41 < maaku> very, very little power by comparison
19:41 < gmaxwell> it does but the power distribution is just not comparable. you're talking about 20w vs 100 watts.
19:41 < petertodd> gmaxwell: the problem is if your algorithm ever winds up not being power limited, then someone can go build an ASIC for it
19:41 < gmaxwell> petertodd: yea sure, the assumption I started out with is that the commodity hardware is already an efficient implementation of everything it does, which isn't true... indeed.
19:42 < gmaxwell> But to the extent it's true the KDF problem is just a matter of using all that's available... use the most gates, the most power.. etc. make the most use.
19:42 < petertodd> gmaxwell: yeah, and that's an enormous problem. I'm sure you can make a PoW/KDF algorithm that targets *a* cpu family and uses it 100%, but that's not very interesting - you'd just as easily use a KDF with very tunable params to economically stay ahead of attackers with ASIC-dev costs
19:42 < poggy> is it plausible that energy costs would be a limiting factor or is this just a mental exercise?
19:43 < petertodd> poggy: it's plausible, but not certain
19:43 < gmaxwell> And if power costs dominate, you probably don't want to touch the memory at all... because the computer cannot burn 100 watts accessing ram. and even if you imagine that its ALU is 4x less efficient than an optimized one, it probably still can burn more efficiency-weighted power in the ALU.
19:44 < gmaxwell> poggy: it is the case that for computation tasks operating longer than a few months energy is more expensive than fabrication with modern processes.
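[editor's note] gmaxwell's framing later in the log - "how much attacker joules can 1 second of a PC possibly require" - can be sketched numerically. The wattages (20 W of DRAM vs 100 W of SIMD) and efficiency ratios (4x, 10x) come from the figures thrown around in this conversation, but they are assumptions for illustration, not measurements:

```python
# Sketch: energy a custom cracker must spend to replay one password guess,
# where a guess costs the defender 1 second on a commodity PC. The attacker
# redoes the same work on custom silicon with some efficiency advantage.
# All wattages and advantage ratios are illustrative assumptions.

def attacker_joules_per_guess(defender_watts, efficiency_advantage):
    """Joules a custom cracker spends replaying 1 defender-second of work."""
    return defender_watts * 1.0 / efficiency_advantage

# KDF that saturates the CPU's SIMD units: the PC burns an assumed ~100 W,
# and custom silicon doing the same arithmetic is assumed only ~4x better.
simd_joules = attacker_joules_per_guess(100.0, 4.0)

# KDF that mostly grinds DRAM: the PC's memory subsystem burns an assumed
# ~20 W, and a voltage-tuned custom DRAM cracker is assumed ~10x better.
dram_joules = attacker_joules_per_guess(20.0, 10.0)

# Under these assumptions the memory-bound design forces *fewer* attacker
# joules per guess - the concern raised about scrypt-as-KDF above.
print(simd_joules, dram_joules)  # 25.0 2.0
```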
19:44 < poggy> ah ok
19:44 < gmaxwell> How that balances out exactly is another question.
19:44 < gmaxwell> You have to run for a long time before fabrication is negligible.
19:45 < petertodd> gmaxwell: but look at the system as a whole: quite possibly the lower power density of ram per unit area is irrelevant on a system-wide level because you can't necessarily remove head from higher density effectively anyway
19:45 < petertodd> s/head/heat/
19:45 < gmaxwell> that works against the user, not against the attacker.
19:46 < petertodd> no, it works against the attacker for PoW for sure: removing low-grade heat cheaply with fans costs very little. For KDFs, that's less certain.
19:47 < petertodd> remember for PoW one argument for decentralization is that the heat can be useful - of course if someone comes out with an ASIC process that can run at hotter temps...
19:47 < gmaxwell> oh I misunderstood what you were saying, I was not arguing in terms of unit area.
19:47 < gmaxwell> I'm arguing in terms of what's actually in a PC.
19:47 < poggy> are there any hybrid functions?
19:47 < gmaxwell> e.g. how much attacker joules can 1 second of a PC possibly require.
19:48 < poggy> requiring both memory and gpu (or whatever)
19:48 < gmaxwell> And I believe that large-memory memory-hard requires fewer attacker joules simply because the PC only has - say - 20 watts of ram.
19:48 < petertodd> gmaxwell: ah, well that's a better argument come to think of it.
19:49 < petertodd> gmaxwell: although like I say, for the KDF use-case you might as well just make a whole bunch of them, targeted to specific cpus
19:49 < petertodd> gmaxwell: use whatever one the user happens to have
19:50 < petertodd> gmaxwell: and... might as well make it memory-hard too, and get that extra 20W
19:52 < gmaxwell> petertodd: the premise under scrypt is really twofold: memory technology is uniform, decreasing the attacker's advantage, and computers have a lot of gates as memory. Counterarguments are that once you are talking about power, memory for a cracker might not be as uniform as thought, and when talking about power a computer doesn't actually have that much memory.
19:54 < petertodd> gmaxwell: yes, I'm not claiming that's a good premise, I'm claiming that in the case of a KDF *because* you have so much algorithm agility (make it a library!) optimal is to use a per-cpu-arch algorithm like your suggestion and make it depend on memory as well
19:54 < petertodd> gmaxwell: which means the salsa20 core in scrypt probably should be replaced by a series of algorithms that do well on simd
19:55 < petertodd> gmaxwell: dunno if you noticed but I did change my mind there :P
19:55 < gmaxwell> yea, sure, I don't think anything I've argued suggests that using memory is terrible, just that it may not be as automatically good as it seemed.
19:55 < petertodd> anyway, my *main* argument is that neither of us knows much about digital logic technology so...
19:56 < gmaxwell> sort of troubling that it seems no one has explored the energy cost angle on this. :-/
19:56 < gmaxwell> would be ironic if the recommended scrypt parameters lowered attack costs.
19:57 < petertodd> meh, wouldn't be the first time
20:46 < phantomcircuit> huh that's interesting
20:46 < phantomcircuit> gmaxwell, i think cex.io moved all their hardware
20:47 < gmaxwell> phantomcircuit: interesting!
22:38 < _ingsoc> Does anyone remember that guy who talked about rewarding miners for putting energy into the grid?
22:45 < justanotheruser> _ingsoc: seems interesting, but not sure how you can prove you put energy into the grid
22:47 < _ingsoc> That was the problem if I remember correctly, needed some physical layer.
--- Log closed Thu Jan 23 00:00:53 2014