
> Hard to beat low level custom coded solutions.

Challenge accepted!

I used the following in q to download and load the data into a disk object I could mmap quickly:

    \wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-sha1-ordered-by-hash-v5.7z
    \7z -so e pwned-passwords-sha1-ordered-by-hash-v5.7z pwned-passwords-sha1-ordered-by-hash-v5.txt | cut -c1-40 | xxd -r -p > hibp.input
    `:hibp 1: `s#0N 20#read1 `:hibp.input
This took about an hour to download, an hour to 7z|cut|xxd, and about 40 minutes to bake. When it finishes, I have an on-disk artefact in kdb's native format. I can load it:

    q)hibp:get`:hibp; / this mmaps the artefact almost instantly
and I can try to query it:

    q)\t:1000 {x~hibp[hibp bin x]} .Q.sha1 "1234567890"
    5
Now that's 1000 runs taking 5 ms in total, or 5 µs per lookup on average! It's entirely possible my MacBook Air is substantially faster than the authors' machine, but I think a custom-coded solution being ten times slower than an "interpreted language" suggests there's a lot of room to improve!
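
For anyone who doesn't read q, here is a rough Python sketch of what the lookup amounts to: the file is just sorted raw 20-byte SHA-1 digests, so membership is a binary search over fixed-width records in an mmap'd buffer. This is not how kdb implements `bin` internally; the function names (`contains`, `build_demo_file`, `lookup`) are made up for illustration, and the demo file is a tiny stand-in for the real `hibp.input`, not the actual dataset.

```python
import hashlib
import mmap
import os
import tempfile

RECORD = 20  # a raw SHA-1 digest is 20 bytes


def contains(buf, digest):
    """Binary-search sorted, fixed-width 20-byte records for `digest`."""
    n = len(buf) // RECORD
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        rec = buf[mid * RECORD:(mid + 1) * RECORD]
        if rec < digest:  # bytes compare lexicographically, matching the sort order
            lo = mid + 1
        else:
            hi = mid
    return lo < n and buf[lo * RECORD:(lo + 1) * RECORD] == digest


def build_demo_file(passwords):
    """Write sorted raw digests to a temp file (toy stand-in for hibp.input)."""
    digests = sorted(hashlib.sha1(p.encode()).digest() for p in passwords)
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"".join(digests))
    return path


def lookup(path, password):
    """Hash the password, mmap the file read-only, and search it."""
    digest = hashlib.sha1(password.encode()).digest()
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        return contains(buf, digest)
```

Usage would look like `path = build_demo_file(["1234567890", "password"])` followed by `lookup(path, "1234567890")`. Each probe touches only a handful of pages, which is why mmap plus binary search stays cheap even on a file with hundreds of millions of records.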

