Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Yes, but his remarks suggests that hamming distance is not very nice for pictures after his transformation (d(i1, i2) == 5 -> small, d(i1, i2) == 10 -> big). As I am not familiar with image signals, I don't know if this can be somehow mitigated with other techniques. Of course, the only way to be sure would be to actually test it, but I would not bet much on it.

Another issue I forgot to mention is that each "point" (image ) is a k-dimensional vector. Organizing those so that to easily retrieve points which are close to each other quickly becomes intractable. Conventional techniques like kd-trees quickly break down when k is above a few units, because of the curse of dimensionality. This introduces a "non-linearity" in your distance which makes comparing points in a k-dimension space quite different than in 2/3 dimension spaces.



In the case of Hamming distance there is much more information to work with than in a bare metric. You can dump all the bitstrings into a suffix array, break the query into K pieces and search for an exact match for each piece. The resulting list of matches is usually small enough to just compute the Hamming distance against each element. I've used this on a corpus of 2m LaTeX strings withbgood results. See my above post for more details.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: