
It looks like the "Data Fixes" section could introduce bias. "Select the right data" reminds me of researchers cherry-picking data for papers.


The purpose of that step is exactly the opposite: to use data that represents, as much as possible, the full generality of the space you seek to model. On a more practical level, some datasets can't be used for commercial purposes but are still useful for research (ImageNet is a well-known example, though not the best one).


"Data Fixes" might also correct for bias. Data often contains errors and inconsistencies, especially if it's the output of some other automation.

"Select the right data" just means don't try to do something silly like predict 2021 housing prices in Ann Arbor using historical data for Pittsburgh from 1980-2007.



