Challenge accepted!
I used the following in q to download and load the data into a disk object I could mmap quickly:
This took about an hour to download, an hour to 7z|cut|xxd, and about 40 minutes to bake. Once complete, I have an on-disk artefact in kdb's native format. I can load it:
q)hibp:get`:hibp; / this mmaps the artefact almost instantly
and I can try to query it:
q)\t:1000 {x~hibp[hibp bin x]} .Q.sha1 "1234567890"
5
Now that's 1000 runs taking 5 ms in total, or 5 µs per lookup on average! It's entirely possible my MacBook Air is substantially faster than the authors' machine, but I think being ten times slower than an "interpreted language" suggests there's a lot of room to improve!
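For readers without q to hand, the same idea can be sketched in Python: bake the sorted SHA-1 digests into a flat file of 20-byte records, mmap it, and binary-search it. This is an illustration only, not the q code used above. The names `bake`, `open_table` and `contains` and the flat file layout are assumptions made for the example (kdb's native format is different), and the search is a plain lower-bound search rather than q's exact `bin` semantics.

```python
import hashlib
import mmap

DIGEST_LEN = 20  # a SHA-1 digest is 20 bytes


def bake(passwords, path):
    """Write the sorted, de-duplicated SHA-1 digests as fixed-width records.

    Hypothetical layout: the digests back to back, with no header.
    """
    digests = sorted({hashlib.sha1(p.encode()).digest() for p in passwords})
    with open(path, "wb") as f:
        f.write(b"".join(digests))


def open_table(path):
    """Memory-map the baked file read-only; pages are faulted in on first touch."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def contains(table, digest):
    """Binary-search the sorted 20-byte records in `table` for `digest`."""
    lo, hi = 0, len(table) // DIGEST_LEN
    while lo < hi:
        mid = (lo + hi) // 2
        record = table[mid * DIGEST_LEN:(mid + 1) * DIGEST_LEN]
        if record < digest:  # bytes compare lexicographically
            lo = mid + 1
        else:
            hi = mid
    # lo is now the first record >= digest (or one past the end of the table)
    return table[lo * DIGEST_LEN:(lo + 1) * DIGEST_LEN] == digest
```

After `bake(["1234567890", "password"], "hibp.bin")`, calling `contains(open_table("hibp.bin"), hashlib.sha1(b"1234567890").digest())` reports whether the digest is present. Each lookup touches only about log2(n) records, so only a few pages of the mapped file are ever read.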