Hacker News

I recently came across a similar flaw in EEG classification experiments. I think most results should be taken with a grain of salt until comprehensively confirmed by independent teams.

https://news.ycombinator.com/item?id=26696546



This contamination of the test data by training data reminds me of "Overly Optimistic Prediction Results on Imbalanced Data: a Case Study of Flaws and Benefits when Applying Over-sampling" [1], where almost half of 24 peer-reviewed studies applying machine learning to a particular publicly available dataset claimed near-perfect accuracy at predicting a patient's risk of pre-term birth, but were accidentally testing on training data.

[1]: https://arxiv.org/abs/2001.06296


Oversampling, then applying a train-test split? Jesus, that's like machine learning 101. But then again, I see a lot of questionable practices in the application of ML in biology.
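To make the leak concrete, here's a minimal, self-contained sketch (toy data, plain Python, no ML libraries assumed) of what goes wrong when you oversample the minority class *before* splitting: exact duplicates of training samples land in the test set, so any classifier that memorizes training points looks artificially accurate.

```python
import random

random.seed(0)

# Toy imbalanced dataset: 90 majority samples (label 0), 10 minority (label 1).
data = [((random.random(), random.random()), 0) for _ in range(90)]
data += [((random.random(), random.random()), 1) for _ in range(10)]

# WRONG ORDER: oversample the minority class before the train-test split.
minority = [d for d in data if d[1] == 1]
oversampled = data + minority * 8  # naive oversampling by duplication
random.shuffle(oversampled)

split = int(0.8 * len(oversampled))
train, test = oversampled[:split], oversampled[split:]

# Count test samples that are exact copies of training samples.
train_points = {x for x, _ in train}
leaked = sum(1 for x, _ in test if x in train_points)
print(f"{leaked}/{len(test)} test samples also appear in the training set")
```

A memorizing model (e.g. 1-nearest-neighbor) gets every leaked sample right for free, which is exactly the inflated-accuracy failure mode. The fix is to split first and oversample only the training fold.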


Great find, thanks!



