00:16:54gmaxwell:andytoshi: sipa:
00:17:07yoleaux:New HD wallet that tolerates leakage of some child private keys
00:17:26gmaxwell:A BIP-32 like public derivation construction which is somewhat robust to private key reveal.
00:17:33kanzure:ah yes i was reading this the other day, http://diyhpl.us/~bryan/papers2/bitcoin/Hierarchical%20deterministic%20Bitcoin%20wallets%20that%20can%20tolerate%20key%20leakage.pdf
00:18:36gmaxwell:The notion is simple: run N pubkey chains in parallel, choose N random scalars X_n (one for each P_n), and compute the pubkey as sum(P_n*X_n).
00:18:57op_mul:I don't like that the "discovery" is credited to vitalik.
00:18:57gmaxwell:of course you only get robustness which is linear in N, so your pubkeys become large fast, which is lameo.
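[editor's sketch] The linear robustness gmaxwell mentions can be illustrated on the private-key side, where each child key is the same linear combination sum(p_n*X_n) of N master scalars. The sketch below works only with scalars modulo the secp256k1 group order (no curve arithmetic). The SHA-256 based `coefficients` function and all names here are assumptions made for illustration, not the paper's exact scheme. It shows that once N child private keys leak, the N master scalars follow from ordinary linear algebra, which is why tolerance grows only linearly in N.

```python
import hashlib
import secrets

# Order of the secp256k1 group (a prime), so scalars form a field.
N_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def coefficients(chain_code, index, n):
    """Hypothetical public derivation of the scalars X_1..X_n for one child."""
    out = []
    for j in range(n):
        data = chain_code + index.to_bytes(4, "big") + j.to_bytes(4, "big")
        out.append(int.from_bytes(hashlib.sha256(data).digest(), "big") % N_ORDER)
    return out

def child_private_key(masters, chain_code, index):
    """Child key = sum(p_n * X_n) mod group order."""
    xs = coefficients(chain_code, index, len(masters))
    return sum(x * p for x, p in zip(xs, masters)) % N_ORDER

def solve_mod(matrix, rhs, p):
    """Gauss-Jordan elimination mod prime p; returns None if the matrix is singular."""
    n = len(matrix)
    rows = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] % p), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, p)  # modular inverse, Python 3.8+
        rows[col] = [v * inv % p for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[col])]
    return [row[n] for row in rows]

# Demo: with N = 3 chains, three leaked child keys reveal all three masters.
masters = [secrets.randbelow(N_ORDER - 1) + 1 for _ in range(3)]
chain_code = secrets.token_bytes(32)
leaked = [0, 1, 2]
matrix = [coefficients(chain_code, i, 3) for i in leaked]
rhs = [child_private_key(masters, chain_code, i) for i in leaked]
print("masters recovered:", solve_mod(matrix, rhs, N_ORDER) == masters)
```

With fewer than N leaked children the system is underdetermined, so the masters are not pinned down by this attack. That is the robustness being discussed, and it is bought with keys that grow with N.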
00:19:26gmaxwell:op_mul: yes, that's bullshit and actually obnoxious. You'll see that's the first thing in my response. The initial text I typed was outright rude.
00:19:48gmaxwell:Their paper calls it "folklore". :-/
00:20:04op_mul:gmaxwell: well, just think of where we would be without vitalik pointing out our flawed crypto.
00:23:06gmaxwell:andytoshi: your n-show signature is basically the same construction, is it not?
00:25:24gmaxwell:kanzure: it's kinda frustrating, because it's neat but I think not good enough to be useful. :(
00:25:52gmaxwell:or at least I can't come up with a use for it.
00:26:13andytoshi:gmaxwell: cool, iirc this did occur to me, but i don't think i ever brought it up because of the O(N) scaling in the security parameter. what i described to you was a selectively repudiable signature, which is a very different beast
00:26:50gmaxwell:andytoshi: oh I thought you'd done an N-show previously.
00:27:09andytoshi::} i honestly don't remember
00:27:23gmaxwell:Yea. ... the O-n makes it .. not very interesting.. if it was log-n I'd probably be starting on bip3232 as we speak.
00:27:56andytoshi:i think i did such an n-show when we were playing with the output-size anonymity, but it was a one-off comment that we dismissed (because of scaling) and it didn't quite do what we were trying at the time
00:29:19sipa:gmaxwell: i've known about that for a while i think
00:29:53gmaxwell:I don't think I'd ever thought of this.
00:30:31gmaxwell:It's obvious enough once pointed out at least. But still seems not useful. :(
00:32:03kanzure:could there be a structure where the public keys are deterministic in the child derivation sense, but the private keys are unspecified
00:32:39kanzure:er, allowing for perhaps another input into the public key child derivation function
00:32:51gmaxwell:kanzure: ask another way?
00:33:14kanzure:well, the public key properties are nice, and keeping them tightly coupled to a deterministic private key generation scheme may not be necessary
00:33:31andytoshi:kanzure: not with ECDSA, with ecdsa the discrete log of the public key is always sufficient to form a signature
00:33:52andytoshi:kanzure: so you can play tricks with the KDF, but an attacker need not use your KDF; if the public key is deterministic then the private key will be as well
00:34:12kanzure:instead of "enumerate all possible child public keys" perhaps you could just do things like "ask whether or not this is a valid child key of this known public key"
00:34:53andytoshi:without thinking too hard, i'd bet you need pairing-based crypto to do something like that
00:34:55kanzure:(i know vanity grinding sounds like the answer there, but perhaps there's something better)
00:35:32gmaxwell:The only way we're able to generate public keys is because of a homomorphism of the cryptosystem. That kind of testing must generally not work because matches must be cryptographically rare or you could just find out other people's keys were Nth children of your own. :P
00:36:01gmaxwell:Yea, sure, in pairing you can go one more dimension deep, and use an arbitrary string as a pubkey.
00:38:53kanzure:why is it important that the children are derivable from each pubkey anyway? why not have a separate data structure unrelated to key derivation that transmits that information.
00:39:44gmaxwell:kanzure: I don't understand your question. In BIP32 the children are NOT derivable without additional information (the chaining code).
00:40:44gmaxwell:The goal of the public derivation scheme is so you can configure your webserver with a small constant amount of information and it can generate an unbounded number of keys for you, without itself knowing the private keys.
00:42:49kanzure:in an accounting sense, you could store all public keys that you ever intend to use for a project (you can generate many millions of keys upfront if you really want to) which can easily be more than enough for many purposes
00:43:59andytoshi:kanzure: if that's an acceptable cost you can use BIP32 hardened keys, which avoid the need for tons of fresh randomness and don't have any of the problems we're talking about with key leakage
00:44:10kanzure:sure, certainly
00:44:21gmaxwell:Yep. But it can still run out. And then your server stops working, ::surprise:: For those willing to do that, what andytoshi said. :)
00:45:10gmaxwell:A 33megabyte key file is kinda annoying. Esp when people are like "oh I'll just use a single static address instead."
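[editor's sketch] The 33 megabyte figure is consistent with storing one million pregenerated public keys in compressed form, since a compressed secp256k1 public key is 33 bytes. The one-million key count is an assumption inferred from the number, not something stated in the log, and the function name is hypothetical.

```python
COMPRESSED_PUBKEY_BYTES = 33  # 1 prefix byte + 32-byte x coordinate

def keyfile_size_mb(num_keys):
    """Size in decimal megabytes of a flat file of compressed public keys."""
    return num_keys * COMPRESSED_PUBKEY_BYTES / 1_000_000

print(keyfile_size_mb(1_000_000), "MB for one million keys")
```

By contrast, a BIP32 public-derivation setup hands the server a single extended public key, which is the "small constant amount of information" gmaxwell refers to.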
01:12:53Guest90723:Guest90723 is now known as gavinandresen
01:17:15op_mul:wow. consumer software.
01:17:22op_mul:ugh. wrong channel.
01:17:26op_mul:sorry folks.
02:05:57Sub|zzz:Sub|zzz is now known as SubCreative
02:17:04Pan0ram1x:Pan0ram1x is now known as Guest22299
03:29:24copumpkin:copumpkin is now known as drunkplatypus
03:30:17drunkplatypus:drunkplatypus is now known as copumpkin
03:34:19blockbits:are 20mb blocks a good idea?
03:34:35guest213123123:why not?
03:35:12blockbits:let's say i'm a startup that lets people serialize data, and put them in the blockchain as an expensive db/data storage
03:35:38justanotheruser:blockbits: 1) hardfork 2) doesn't solve the scaling problem, it only delays it 3) there are alternatives
03:35:55blockbits:i use some scriptsig tricks to make large transactions that contain data. i then bribe mining pools to mine my tx's for 0/low fees
03:36:11blockbits:now we have mega bloat.
03:36:18blockbits:terrible attack vector
03:36:40justanotheruser:why would you bribe mining pools to take your tx for no fees? The fee is a bribe in itself
03:37:12blockbits:the pools will take my bribe which costs me less than the tx fees
03:37:36blockbits:i mean, we all know that mining is centralized now. blockstream will have to pay/bribe pools to mine side chains they like, and ignore ones they don't
03:37:43guest213123123:justanotheruser: why is hardforking bad?
03:38:11justanotheruser:This discussion probably belongs in #bitcoin
03:38:15blockbits:so really there's an easy attack vector here. governments can just pay mining pools to mine mega-blocks and bloat the network
03:40:25kanzure:that has nothing to do with governments
03:41:29blockbits:point is, do you guys want to see lots of tx's like this? http://webbtc.com/tx/e6ff83a41715a87dad0b181febfaee2845e2a4334c3ce8bcb6ec697a6cfed5ed
03:41:38blockbits:that's 7000 bytes in a single transaction. mined a few weeks ago
03:43:00guest213123123:how is this 'attack vector' more dangerous at 20 MB than at 1 MB?
03:43:38blockbits:i make 1000 of those per block and then a blockchain-as-a-data-storage service for random files, I can easily find enough people who would pay... we'd be seeing 7MB blocks consistently
03:45:13guest213123123:not sure it's any less likely, or dangerous, or expensive at 1 MB
03:45:38blockbits:We can add 2.8GB to the blockchain PER DAY with 20mb blocks
03:46:02blockbits:that's 1TB per year
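[editor's sketch] The figures blockbits quotes can be checked with a few lines of arithmetic. This assumes a block every 600 seconds on average (144 per day), every block completely full, and decimal units (1 MB = 10^6 bytes). The function names are hypothetical.

```python
BLOCK_INTERVAL_SECONDS = 600  # Bitcoin's target spacing of about ten minutes

def data_tx_block_mb(tx_count, tx_bytes):
    """Block space in MB consumed by tx_count transactions of tx_bytes each."""
    return tx_count * tx_bytes / 1_000_000

def chain_growth(block_size_mb):
    """Return (GB per day, TB per year) if every block is block_size_mb."""
    blocks_per_day = 24 * 60 * 60 // BLOCK_INTERVAL_SECONDS  # 144
    gb_per_day = block_size_mb * blocks_per_day / 1000
    tb_per_year = gb_per_day * 365 / 1000
    return gb_per_day, tb_per_year

print(data_tx_block_mb(1000, 7000))  # 1000 of the 7000-byte transactions
print(chain_growth(20))              # worst case with 20 MB blocks
print(chain_growth(1))               # same calculation at 1 MB
```

Under these assumptions 20 MB blocks give roughly 2.9 GB per day and about 1.05 TB per year, in line with the numbers quoted above. The same worst case at 1 MB is twenty times smaller, which is the comparison guest213123123 is asking about.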
03:46:32blockbits:we'll see less full nodes. we simply don't need 20MB blocks right now. all it does is open up the network for spam and data storage
03:47:14blockbits:plus, you can use off-chain micropayment channels to do tx's, and then consolidate into transactions on-chain when you need to
03:47:19guest213123123:what decides when we need it?
03:47:23blockbits:why do people think every tx that ever occurs needs to be on chain
03:54:25nsh:andytoshi / gmaxwell, what's the center of secp256k1 over the plane? i mean for some generic set-compatible definition e.g. you can trisect with the same number of points on each side
03:54:40nsh:does this point have any meaning, utility?
03:57:58nsh:could you not parameterize a space-filling curve starting at this central point, which is informed/trained by generated/obtained curve points to improve likelihood of stumbling on new ones
03:59:07nsh:probably cloud-talk
03:59:41todays_tomorrow:todays_tomorrow has left #bitcoin-wizards
04:09:47op_mul:blockbits: it also increases the data rate to almost something which would saturate most ADSL uplinks.
04:10:43blockbits:op_mul: you don't say.... ;) looks like we're heading to an even more centralized network
04:11:41op_mul:well I wouldn't say that just yet.
04:12:06op_mul:but it does concern me that people piss away even more privacy with SPV.
04:12:35blockbits:op_mul: you can thank mike hearn for that
04:13:45op_mul:it's a double edged sword. it's nice that lite clients exist, but BIP37 was a totally bum choice in terms of privacy.
04:14:26op_mul:would have been nice to get it right the first time, because there's inertia behind whichever comes first.
04:14:35justanotheruser:are there any proposals for more private SPV clients that aren't a security through obscurity or DoS approach like asking full nodes for a ton of address data that you don't need?
04:15:12op_mul:well BIP37 allows a false positive rate. it's just it doesn't do anything useful and all clients have it disabled.
04:16:01op_mul:there's a paper talking about how useless it is, just by looking at the false positives and finding out how plausible they are. an SPV wallet pinging a satoshi dice transaction? nope that's not going to happen, so forth.
04:17:16op_mul:phantomcircuit was talking about a better way of doing it. you have a new consensus rule that means you add a bloom filter of all the pay to pubkey hash transactions into the block. SPV clients can download the filter, match locally, and then download full blocks if they are interested in it.
04:17:29justanotheruser:op_mul: Don't you think we should go in the direction of the payer giving you the SPV proof?
04:18:30op_mul:there's some nice things that come of that. first of all being wallets can rescan at any time without having to squirt out new filters. remote nodes learn little about what you are interested in, and they can no longer lie to you by omission.
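[editor's sketch] The block-level filter idea op_mul attributes to phantomcircuit can be sketched as follows: a Bloom filter is built over the pay-to-pubkey-hash targets in a block, and a light client tests its own key hashes against it locally, fetching the full block only on a match. The log gives no parameters or encoding, so the filter size, number of hash functions, SHA-256 based hashing, and all names here are assumptions for illustration and do not describe BIP37 or any deployed format.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits, num_hashes):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def build_block_filter(pubkey_hashes, size_bits=8192, num_hashes=5):
    """What a block producer would commit to: a filter over the block's P2PKH hashes."""
    block_filter = BloomFilter(size_bits, num_hashes)
    for h in pubkey_hashes:
        block_filter.add(h)
    return block_filter

def block_may_be_relevant(block_filter, wallet_hashes):
    """Client-side check; True means 'download the full block and look'."""
    return any(h in block_filter for h in wallet_hashes)

# Stand-in 20-byte hashes; a real wallet would use HASH160 of its public keys.
block_hashes = [hashlib.sha256(bytes([i])).digest()[:20] for i in range(3)]
wallet_hashes = [block_hashes[1]]
print(block_may_be_relevant(build_block_filter(block_hashes), wallet_hashes))
```

Because matching happens on the client, the remote node learns only which blocks were fetched. A Bloom filter has no false negatives, so a filter the client can verify against the block commitment leaves no room for lying by omission, which matches the benefits op_mul lists.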
04:21:03op_mul:justanotheruser: that's possible too, though I don't know how that works in the real world
04:24:24justanotheruser:well it only works if you have two way communication
04:27:09op_mul:hooray. bring back pay to IP transactions :D
04:40:19justanotheruser:let's also have pay to public key
04:43:14op_mul:real retro feel
06:12:28Sub|afk:Sub|afk is now known as SubCreative
06:20:42gmaxwell:* gmaxwell gives general, unrelated to anything in particular, advice to not feed trolls.
06:22:35gmaxwell:nsh: it's on a finite field. every point is equally the center.
07:36:59lclc_bnc:lclc_bnc is now known as lclc
08:01:26lclc:lclc is now known as lclc_bnc
08:44:24lclc_bnc:lclc_bnc is now known as lclc
09:05:14orwell.freenode.net:topic is: This channel is not about short-term Bitcoin development | http://bitcoin.ninja/ | This channel is logged. | For logs and more information, visit http://bitcoin.ninja
13:03:00gavink:anyone up for helping with a little experiment? just trying to do visualization of the tree of transaction from the bitstamp account, to see in real time as the money flows. I've got https://www.npmjs.com/package/bitcoin-tx-graph-visualizer running, but trying to figure out the best way to feed the raw hex transaction data
13:03:38gavink:or maybe you know of a good existing site that does this, I did see some visualization code, but it required a lot of dependencies, and server necessities
13:04:25op_mul:what do you expect to gain exactly? this is more for #bitcoin, lets chat about it there.
13:06:23gavink:apologies wrong forum cheers
13:29:38nsh:hm, k
17:29:12morcos:gmaxwell: still trying to understand your 'provable hash' techniques from yesterday on bitcoin-dev, why doesn't the second method need pairing crypto? and what do you mean a cryptobreak just gives you sidechannel, what else could it give you?
17:35:36gmaxwell:morcos: Pairing cryptography is required because there is no known (to me, or I believe anyone) construct for a small unique signature using just a plain DH group, e.g. you cannot use ecdsa for this. As far as what else? it could (but can't in this scheme) let you construct arbitrary hash collisions, which would be pretty bad since you're encoding a scriptpubkey there.
17:42:11morcos:are you getting rid of all the publishing channels somehow? not just OP_RETURN?
17:43:35gmaxwell:morcos: this could be used for all outputs.
17:45:07gmaxwell:Scriptsig is strictly less concerning, because a full verifying node technically does not need to _store_ anything for a signature.
17:49:47Pan0ram1x:Pan0ram1x is now known as Guest74209
19:01:12phantomcircuit:the fuck does that even mean
19:01:40op_mul:what's interesting is that they claim they can't recover funds sent to old deposit addresses.
19:02:06op_mul:you'd think they could try racing them if they still had control.
19:02:26phantomcircuit:they absolutely can
19:02:36op_mul:they're not though.
19:05:17gmaxwell:why do you think they can? I had the impression they lost physical control of the hosts.
19:05:44gmaxwell:(not that I actually know anything material)
19:05:47phantomcircuit:gmaxwell, i would expect they have a copy of the private keys somewhere
19:08:26op_mul:gmaxwell: I'm not sure about that. the theft took many hours. if they'd lost control of the servers it doesn't seem likely they would have stuck around waiting.
19:09:58gmaxwell:I can't come up with any other reason why they wouldn't try racing for funds.
19:10:32gmaxwell:In particular they could probably get a couple miners to otherwise block spends of those addresses, so they'd frequently win.
19:10:32phantomcircuit:gmaxwell, failure to understand that they can?
19:10:35phantomcircuit:who knows
19:14:35yoleaux:We are fully rebuilding our systems from the ground up so that customers can use @Bitstamp with full confidence and trust. (@nejc_kodric)
19:16:28phantomcircuit:i hope he just means they're rebuilding the deployment system and not the code...
19:17:03kanzure:nah it'll be ready by lunch what's so hard about it
19:18:54kanzure:maybe they had a second implementation in progress from their last round of funding or something
19:24:28op_mul:that sounds very dangerous. rushing something unfinished into production could cause them even more problems. they commented that they had their dup system running with dummy data, I doubt it's a different one than before. the delay could just be them trying to still find the hole.
19:25:36kanzure:oh right, maybe they didn't have enough logging to find the hole, that would be awesome
19:27:00gmaxwell:I don't understand why bitstamp btc didn't skyrocket in value with a compromise at play though.
19:27:32gmaxwell:oh blonde moment, nevermind, if they were draining the wallet directly they wouldn't need to trade.
19:28:08phantomcircuit:gmaxwell, and if they had access to both they could have been selling swaps and making a profit on the price drop
19:28:23op_mul:even the outgoing transactions to the theft address are weird. they don't even take the entirety of the outputs.
19:28:45op_mul:some pay absolutely nuts fees (1BTC)
19:28:49phantomcircuit:someone trying to be clever by adding larger fees
19:28:54gmaxwell:op_mul: did they just ignore dust?
19:29:20op_mul:gmaxwell: no, they made more. like they spend 500 BTC of outputs, and make a 0.1 BTC change
19:29:32phantomcircuit:so then it was done manually
19:30:47op_mul:in that case, yeah sure it makes sense. they stole 3100 and left a little change
19:32:05op_mul:the change address is a known bitstamp one first used on december 30.
19:32:56gmaxwell:well you probably end up with a fee error just trying to send it all using a normal wallet... so you hurry up and lower the amount by 0.1 btc
19:33:14luny`:luny` is now known as luny
19:33:18op_mul:here they take a totally odd number though, but again leave about 0.01 BTC to change
19:34:31op_mul:gmaxwell: I don't think it was a bitcoin core wallet. core doesn't reuse change addresses, this does.
19:35:06gmaxwell:it'll be hilarious when it's realized the thief put the funds in bc.i and bc.i was able to claw them all back.
19:36:00phantomcircuit:op_mul, iirc someone said bitstamp managed to push ~3k coins into cold storage
19:36:06phantomcircuit:but maybe that's nonsense
19:38:07op_mul:phantomcircuit: I don't think so. 23bff6715c450f2af6f56a42862ac5006eb6037fbee549c5820ddaf2afca7e5d was a transaction going to the bitstamp hot wallet which got swept by the thief again (0.15 BTC fee). I've got about 10 transactions in a row that the thief spent.
19:41:30op_mul:there's a couple of spends to other addresses which I can't identify, that could be it. there's maybe 10-20 BTC.
19:42:39op_mul:phantomcircuit: oh interesting, they did catch one or two
19:43:19op_mul:bb52c6c8041c1110bceb6a2afa5d387c1a180e833a435fea81da8c4c2ef34964 spends from the hot wallet to 1JoktQJhCzuCQkt3GnQ8Xddcq4mUgNyXEa, which is the "cold" address they have.
19:43:35op_mul:that's 1 BTC though.
19:44:51phantomcircuit:there seems to be a risk mitigation problem with exchanges
19:45:02phantomcircuit:anybody running an exchange is inherently willing to take on substantial risk
19:45:10phantomcircuit:seems a bit axiomatic
19:47:22op_mul:phantomcircuit: oh found it, yes bitstamp *did* rescue some!
19:47:35op_mul:phantomcircuit: 2700 BTC.
19:49:47lclc:lclc is now known as lclc_bnc
19:51:33samson2:samson2 is now known as samson_
19:51:45kanzure:they should generate unsigned transactions every morning (or whenever) to safely transition BTC to backup wallets
19:51:48kanzure:and then in an emergency sign those
19:52:24phantomcircuit:kanzure, they should generate and sign those transactions and simply not broadcast them
19:52:29kanzure:the downside is that the backup wallets have to be even better protected (and if your private keys were offline in the first place, then you have a bigger problem on your hands i guess)
19:52:35kanzure:not broadcast them?
19:52:37kanzure:oh right
19:52:41phantomcircuit:"big red button" sends them
19:53:01phantomcircuit:but there's a huge amount of security engineering that can be done
19:53:05kanzure:could also be a dead man's switch
19:53:10phantomcircuit:it's not clear where the line is on what makes sense
19:53:26kanzure:they called it "an excess of caution" but i'd rather go with "an abundance of caution"
19:53:49op_mul:having 18,000 BTC in your hot wallet is sort of nuts
19:54:00kanzure:do they have 18k BTC/hour turnover or something?
19:55:13phantomcircuit:op_mul, that's hard to judge without daily average turn over numbers
19:55:29phantomcircuit:it's entirely plausible that they turn over 10% of their total per day
19:55:46op_mul:phantomcircuit: I can tell you looking at this that they don't
19:55:56phantomcircuit:i suspect they turn over their bank account something like every 5-10 business days
19:56:18phantomcircuit:op_mul, you have a guess at their daily average net turn over?
19:57:34op_mul:I don't think so
19:57:50op_mul:just going by the numbers though, they do a huge number of transactions but none of them are *huge*
19:58:40phantomcircuit:i see...
19:58:57op_mul:eh, actually there's some pretty big ones further down. 200 BTC out, 550 in, 200 out, 200 in
19:59:23op_mul:someone seems to like doing arbitrage with bitfinex.
19:59:42kanzure:bitfinex is also doing a single address? -_-
19:59:54op_mul:no? neither is bitstamp.
20:00:03kanzure:how are you identifying it then?
20:00:27kanzure:merges with what?
20:01:17kanzure:if they are not reusing addresses then i don't see how you would know they are bitfinex addresses
20:01:20kanzure:or bitstamp for that matter
20:01:22op_mul:you filter coinjoin transactions out of the set of all transactions. then you look at transactions which spend two unrelated outputs. you pair them together and call them A.
20:01:34phantomcircuit:kanzure, it's relatively easy to identify exchange transactions because it's a reasonable assumption that two outputs spent in a transaction merge identities
20:01:47phantomcircuit:but only in the limited case of exchanges who are clearly not doing coinjoin
20:02:15phantomcircuit:(which is currently... all of them)
20:02:29op_mul:all you need to do is find an entry point (usually someone talking about a withdrawal they made), and you can find the whole wallet just by branching back.
20:02:50kanzure:oh, because they are not being careful about coin selection and privacy leaks, i see
20:03:02op_mul:careful doesn't really help you
20:03:14op_mul:you're going to merge your outputs sooner or later, and then I've nailed you.
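[editor's sketch] The heuristic op_mul and phantomcircuit describe, treating all inputs spent together in one non-coinjoin transaction as belonging to one owner, is commonly implemented with a union-find structure. The transaction format below (dicts with `input_addresses` and an `is_coinjoin` flag set by some earlier filter) is hypothetical and stands in for whatever parsing and coinjoin detection op_mul actually uses, which the log does not describe.

```python
def cluster_addresses(transactions):
    """Group addresses that were ever spent together as inputs of one transaction."""
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        if tx.get("is_coinjoin"):
            continue  # inputs of a coinjoin belong to different people
        inputs = tx["input_addresses"]
        for addr in inputs:
            find(addr)  # register single-input addresses too
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())

txs = [
    {"input_addresses": ["A", "B"]},
    {"input_addresses": ["B", "C"]},  # links C to A and B through B
    {"input_addresses": ["D"]},
    {"input_addresses": ["E", "F", "A"], "is_coinjoin": True},
]
print(cluster_addresses(txs))
```

Starting from one known address (the "entry point"), the cluster containing it is the estimated wallet. As phantomcircuit notes later in the log, the result is only as good as the coinjoin filter, since any undetected coinjoin merges unrelated owners.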
20:03:40phantomcircuit:kanzure, if you're not doing coinjoin and you're doing demand deposits/withdrawals
20:03:45phantomcircuit:then there is very little privacy
20:03:54phantomcircuit:(read: none)
20:03:57kanzure:there is very little privacy if you merge your outputs, sure
20:04:09op_mul:(dirty secret, everybody merges)
20:04:40phantomcircuit:i actually had a proposal to avoid this but it's potentially nasty (and is actually kind of obvious)
20:05:01phantomcircuit:simply require multiple addresses be provided for each transfer out
20:05:04op_mul:I suspect I've got most coinhashing addresses marked just due to your coinbase outputs.
20:05:11op_mul:cloudhashing, rather.
20:05:12phantomcircuit:then match outputs 1:1 to avoid merging
20:05:14kanzure:phantomcircuit: how would multiple outputs be useful there?
20:05:35phantomcircuit:op_mul, probably but be careful there's coinjoin stuff in there also
20:05:49op_mul:phantomcircuit: I pre-filter for coinjoin.
20:05:55phantomcircuit:iirc if you naively trace it shows we control something silly like 100k coins
20:06:09phantomcircuit:op_mul, er how though?
20:06:31phantomcircuit:kanzure, the recipient can link the outputs, but another observer cannot
20:06:51kanzure:the observer knows that you could sign for all those outputs
20:07:10phantomcircuit:kanzure, the outputs dont get merged.. at least not by you
20:07:16phantomcircuit:probably the recipient merges them
20:07:20op_mul:phantomcircuit: depends how you mean coinjoin. I can detect a coinjoin with lots of inputs and like sized outputs (and some secret sauce), but I can't detect a manual one. it's possible that for you in particular I've got incorrect results.
20:07:47kanzure:that is a fun project
20:07:55phantomcircuit:op_mul, then it's pretty likely you've got incorrect results since the inputs/outputs often include offsets to pay people
20:08:14op_mul:kanzure: it's not. I'm out of disk space.
20:08:29phantomcircuit:also that's one reason why i suggested using power of two outputs for coinjoin transactions
20:08:43phantomcircuit:but that has the obvious downside of potentially serious utxo bloat
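[editor's sketch] One reading of "power of two outputs" is that every coinjoin output carries a power-of-two amount, so outputs of equal size are indistinguishable and an arbitrary payment is split along its binary representation. The function below is a sketch of that reading only; the log does not spell out phantomcircuit's proposal, and the function name is hypothetical.

```python
def power_of_two_outputs(amount_satoshis):
    """Split an amount into distinct power-of-two output values (its set bits)."""
    outputs = []
    bit = 0
    while amount_satoshis >> bit:
        if (amount_satoshis >> bit) & 1:
            outputs.append(1 << bit)
        bit += 1
    return outputs

amount = 150_000_000  # 1.5 BTC in satoshis
parts = power_of_two_outputs(amount)
print(len(parts), "outputs summing to", sum(parts))
```

The number of outputs equals the number of set bits in the amount, so a single payment can turn into many outputs, some of them tiny. That is the UTXO bloat downside mentioned above.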
20:08:43op_mul:none of this scales, and I wasn't paying much attention to it when I wrote it.
20:09:00phantomcircuit:op_mul, how much disk space are you using?
20:09:12op_mul:uh. it's horrible.
20:09:31op_mul:many hundreds of gigabytes.
20:11:26op_mul:it's actually got to the point as well that it's taking me longer to process blocks than they are coming in
20:11:36kanzure:op_mul: here have some more data http://archive.fart.website/archivebot/viewer/job/7i531 https://archive.org/download/archiveteam_archivebot_go_068/archiveteam_archivebot_go_068_archive.torrent
20:12:04kanzure:well, you should be using an asynchronous queue for your extra processing, and then just scale parallelism for that asynchronous queue of extra stuff you're intending to do
20:12:23op_mul:kanzure: I already have bitcointalk, it's updated in real time from the "recent posts" view.
20:12:30kanzure:do you have diffs of old posts?
20:12:53op_mul:no. only new posts and posts which make edits when they're in the list of 100 newest.
20:13:12kanzure:hm, so you at least are not overriding old versions with new edits, then
20:13:21kanzure:except for recently-made posts
20:13:24op_mul:to have diffs would mean I need to crawl the whole site endlessly, and there's a 1 request / second / IP limiter.
20:13:39kanzure:that's troubling
20:14:01op_mul:it's fine once you're up to date.
20:14:10phantomcircuit:kanzure, 1 request/second makes sense for trolltalk
20:14:25fenn:there's only ~500k pages so 5 days to crawl
20:14:39phantomcircuit:iirc it only applies to non-static content
20:14:42fenn:however empirical data suggests it's more like 2 months
20:15:21phantomcircuit:fenn, a naive scraper will end up with a bunch of different formats for the same content
20:15:26phantomcircuit:wap2 content and what not
20:15:45fenn:i have no idea what that is
20:16:30op_mul:fenn: er, it took a lot longer than that
20:17:02op_mul:there's close to a million topics alone
20:17:10op_mul:and some topics have tens of thousands of pages
20:17:46fenn:<@ivan-> we ran a crawl of it that started on 2014-04-03 and finished on 2014-06-21
20:18:05fenn:there may have been multiple nodes doing the crawl
20:19:49kanzure:oh that's the guy that notified me of the libgen guy's death the other day
20:31:42starsoccer:starsoccer is now known as Guest87507
20:32:25Guest87507:Guest87507 is now known as starsoccer
20:43:29rightonbro:rightonbro has left #bitcoin-wizards
21:00:31rightonbro:rightonbro has left #bitcoin-wizards
22:26:39rightonbro:rightonbro is now known as Guest18449