I agree that not enough benchmarking (comparison between different codes to check that they get compatible results), verification (checking that the code produces results consistent with the underlying model) and validation (checking that the code reproduces nature to a sufficient degree to be useful) is done.
I have actually written papers that try to do a bit of that, and it was annoyingly hard to publish them. But unlike you and a lot of other posters here on HN, I don't think the reason for insufficient validation is that we scientists are too stupid or too lazy. On the contrary, we would love to do more of it. But there is next to no funding to do it.
> On the contrary, we would love to do more of it. But there is next to no funding to do it.
Perhaps this is the crux of the disagreement then. I don't need extra funding to do what I should be doing in the first place. It can be a big time sink in some cases (e.g., I probably spent 6+ months compiling existing data during my PhD), but often is not. It will probably make your model look worse than you'd like, but if everyone did it then science would progress faster.
Kinetic plasma simulations in space plasma and astrophysical scenarios. I have seen that journal and I think it is a step in the right direction.
Regarding the funding: you kind of DO need that extra funding, because if you don't get funding for it you get out-competed by the other researcher who skips that step, and the next postdoc position or grant goes to the researcher who wrote two more papers instead of testing code, testing that ultimately didn't find any major bugs anyway.
> you kind of DO need that extra funding, because if you don't get funding for it you get out-competed [...]
This seems to be a common view, but I don't think it's always true. In my own work, I've found that being careful has led me to write more papers, not fewer. Particularly since I've found errors in previous work, findings that I have already published or intend to publish. There's a delay involved (i.e., it takes time to find the problems), but I think I'm coming out ahead in total publications in the long run.
My knee-jerk response is to say this is wrong. I have seen too many cases where people report initial results on some elaborate model as if it explained more aspects of the data, but then the results fail to generalize, and the theory, model or implementation turns out to be flawed or fragile. It gets a publication, but the state of the art does not advance.
But I notice that some younger investigators I have worked with tend to be more careful, with things like releasing code/data openly, being serious about code/model reuse, doing more careful verification/validation, being more rigorous about test/training data. It seems to be a spectrum of better research practices rather than one thing, like “better software engineering”.
Such a more principled approach should prove itself over time, because Nature cannot be fooled.
Yes, "being careful" refers to a variety of different techniques, not just better software engineering. What has ended up being publishable in my case wasn't related to better software engineering, rather, being more careful about model derivations and more rigorous about the available data. But it could have been software improvements.
To be clear, I don't think everyone needs to be as careful as I am. A few researchers in a particular subfield spotting errors can have large benefits for everyone. And those few may catch the vast majority of the errors. There are diminishing returns to adding more people focused on being rigorous.
This is a very important point. Your budget of "being careful" might be best spent on better software engineering. But it is also possible that you should rather spend it on other things such as a better mathematical model, better input data sets or something else. And a (sub)field prospers most if different groups and researchers spend their (very limited and expensive) time on different things. Trying to shame everybody into following the magic 27 rules of software engineering is a step back, not forward.
Right. You do not always get outcompeted. But you run the risk. And yes, having good architecture in the parts you need to change often is something that pays off. As does having tests in sections of the code that are brittle, hard to reason about or historically buggy. As does writing documentation on things that you had to spend an annoying amount of time on to understand. And I think that most researchers understand that and invest that time.
The thing is: this does not mean that either of us puts a lot of effort into having nice, modular architecture in parts we know we will never change. Or aims for a lot of test coverage in modules we understand very well and can reason about using analytic calculations. Or writes a lot of documentation for things that are obvious to us (be it through familiarity or whatever reason).
The next researcher, however, might want to change exactly that part we never want to change. Or might find that the analytic check we always used to make sure the code was not going off the deep end is not valid for his work. And he is almost certainly going to need documentation on other parts of the code (and find the documentation we wrote and needed useless, because that topic is glaringly obvious to him).
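To make "analytic check" concrete, here is a minimal sketch of the kind of test I mean. The leapfrog integrator and the harmonic-oscillator case are hypothetical stand-ins, not taken from any particular code:

    import numpy as np

    def leapfrog(x0, v0, accel, dt, n_steps):
        """Kick-drift-kick integration of dx/dt = v, dv/dt = accel(x)."""
        x, v, xs = x0, v0, [x0]
        for _ in range(n_steps):
            v_half = v + 0.5 * dt * accel(x)
            x = x + dt * v_half
            v = v_half + 0.5 * dt * accel(x)
            xs.append(x)
        return np.array(xs)

    def test_harmonic_oscillator():
        # Analytic check: for accel(x) = -x with x(0) = 1, v(0) = 0,
        # the exact solution is x(t) = cos(t).
        dt, n = 1e-3, 10_000
        xs = leapfrog(1.0, 0.0, lambda x: -x, dt, n)
        t = np.arange(n + 1) * dt
        assert np.max(np.abs(xs - np.cos(t))) < 1e-4

A check like this is cheap to keep around, but it only guards the part of the code that the analytic solution actually exercises, which is exactly why it may be useless to the next researcher.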
There is usually no advantage for us in making the code nice for that researcher. And THAT is where scientific software gets its bad name from. Because we have to make the trade-off based on what is useful to us (and worth our time). If we try to make the code nice and friendly for every potential user under the sun, we usually DO get outcompeted.
I think I misunderstood your position earlier. My experience with academic codes seems to differ significantly from yours. In my experience it is not uncommon at all for scientific software to have no documentation at all, and very little if anything in terms of tests. I agree with you that it does not make sense to have detailed documentation and tests for all parts of scientific software. One must prioritize. What I'm arguing against is the (often implicit) attitude that documentation, testing, and other good software engineering practices are not necessary in science at all. I see now that you agree, and it's more a question of how much software engineering is optimal. I appear to prefer more than you do, but you clearly see the value of these practices. Correct me if I'm wrong.
You are not wrong at all. And the amount of test coverage and documentation one needs depends a lot on the field, the numerical methods, the intended users, the type of the code (simulation vs analysis vs plotting) and so on.
The thing where I seem to have a totally different view from most of HN is that I think we make valid tradeoffs, and that undocumented functions and tests that only cover 40% of the code base are a perfectly fine state of affairs, instead of being caused by scientists who are too stupid for basic software engineering.
What a lot of software people also do not get at all is that variables such as x, v and a might be perfectly descriptive variable names.
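As a hypothetical illustration, a particle push where the single-letter names read exactly like the equations of motion they implement:

    def push(x, v, a, dt):
        # To a physicist, x, v and a already say "position, velocity,
        # acceleration"; renaming them would only make the update rule
        # harder to match against the equations of motion.
        v = v + a * dt
        x = x + v * dt
        return x, v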
You are making some good points here. It is expensive and exacting work to, for instance, set up validation experiments, or to quantify errors exactly enough that whatever errors do appear are explained by known effects. Agencies don’t like to pay for this.
One forcing function can be if high-stakes decisions rest on the conclusions from the code. In the geosciences, you start to see some of this care for air quality measurements and for climate science. (One relevant journal here is SIAM J. UQ., https://www.siam.org/Publications/Journals/SIAM-ASA-Journal-...)
The lab where I work has spent a lot of resources on validating thermal and fluid dynamical codes for atmospheric entry of Mars landers — an engineering example rather than science. They obviously are very motivated to avoid an unpleasant surprise from a model failure.