Mutually Assured Recursion (kylehovey.github.io)
25 points by speleo on Sept 24, 2023 | 4 comments


> The researchers were able to perform text classification using only text compression and a clustering algorithm (kNN).

Wasn't this result disproved, because the positive results were due to a bad kNN implementation? I recall reading something like this but can't find the exact post/article...


Interesting. I'd love to see a link to that if you know of it. Here's the original paper: https://aclanthology.org/2023.findings-acl.426.pdf In my own work I've successfully classified emergent behavior in cellular automata using a similar technique, and the technique has also been used elsewhere with success: https://www.nature.com/articles/s41598-022-12826-w
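For anyone unfamiliar with the technique: the paper classifies text by measuring a normalized compression distance (NCD) between documents with gzip, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) where C is compressed length, and then taking a k-nearest-neighbor vote. Below is a minimal sketch under that description, not the paper's code; the function names, joining the two texts with a space, and the toy TRAIN data are assumptions made for illustration.

```python
import gzip
from collections import Counter


def clen(s: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of s."""
    return len(gzip.compress(s.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance with gzip as the compressor."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)  # assumption: concatenate with a single space
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest (text, label) pairs under NCD."""
    neighbors = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    labels = [label for _, label in neighbors]  # nearest first
    counts = Counter(labels)
    best = max(counts.values())
    # Break ties in favor of the nearest neighbor among the tied labels;
    # the query's true label is never consulted.
    return next(label for label in labels if counts[label] == best)


# Hypothetical toy data, only for illustration.
TRAIN = [
    ("the striker scored a late goal and the crowd cheered as the team won the match", "sports"),
    ("the goalkeeper saved a penalty in the final minute of the football match", "sports"),
    ("simmer the onions in butter then add garlic and stir the sauce gently", "cooking"),
    ("bake the bread dough in a hot oven until the crust is golden brown", "cooking"),
]

print(knn_classify("the team won the match after a late goal", TRAIN, k=3))
```

On strings this short, gzip's fixed header overhead makes the distances noisy, so a toy run like this says little about how well the method works on real datasets.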


This took me an unreasonable amount of time to find, but here it is:

https://kenschutte.com/gzip-knn-paper2/

The moral: the methodology is cool, but implementation details matter, I guess...


Thank you for this, I appreciate it! That's unfortunate to hear. I may have to swap out the example I used in this article, and maybe also include a note that this technique has limitations. I think that using compression/Kolmogorov complexity metrics for classification is a fruitful endeavor and that the philosophy behind efforts like the Hutter Prize is sound, but the kNN + gzip example looks like it has some problems.

For anyone else following along, I think the GitHub Issue discussion on the paper's repo is really interesting: https://github.com/bazingagin/npc_gzip/issues/3
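To make "implementation details matter" concrete: as I understand that discussion, a central complaint is how ties were handled in the k=2 vote, where a tied vote could effectively be resolved in favor of the correct label, so the reported figure behaves more like top-2 accuracy. The sketch below is a hypothetical illustration of that kind of scoring bug on hand-made neighbor labels (no gzip involved); the function names and CASES data are invented for the example and are not taken from the paper's repo.

```python
from collections import Counter


def vote(neighbor_labels: list[str]) -> str:
    """Majority vote over labels ordered nearest first; ties go to the nearest tied label."""
    counts = Counter(neighbor_labels)
    best = max(counts.values())
    return next(label for label in neighbor_labels if counts[label] == best)


def accuracy_honest(cases: list[tuple[list[str], str]]) -> float:
    """Score a single prediction per case, chosen without looking at the true label."""
    hits = sum(vote(labels) == truth for labels, truth in cases)
    return hits / len(cases)


def accuracy_optimistic(cases: list[tuple[list[str], str]]) -> float:
    """Flawed scoring (hypothetical): on a tied vote, count the case as correct
    if the true label is among the tied labels. With k=2 this is top-2 accuracy."""
    hits = 0
    for labels, truth in cases:
        counts = Counter(labels)
        best = max(counts.values())
        tied = {label for label, c in counts.items() if c == best}
        hits += int(truth in tied)
    return hits / len(cases)


# Each case: (labels of the k=2 nearest neighbors, nearest first; true label).
CASES = [
    (["a", "a"], "a"),  # neighbors agree and are right
    (["b", "a"], "a"),  # tie: honest vote picks "b" (wrong)
    (["a", "b"], "b"),  # tie: honest vote picks "a" (wrong)
    (["b", "b"], "a"),  # neighbors agree and are wrong
]

print(accuracy_honest(CASES), accuracy_optimistic(CASES))  # prints 0.25 0.75
```

The same neighbors produce very different headline numbers depending only on how ties are scored, which is why an honest tie-break (or an odd k) matters when comparing against other classifiers.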



