If there is one gripe I have about HN, it is how smug they are about software engineering. The leading reason why scientific software stinks (and a fair fraction does) is not that scientists and software engineers in that field suck, but that there are very strong incentives AGAINST writing better software.
Remember: This is not the 17th javascript framework, but software for problems that we don't understand going in. And often enough we have not understood the problem all that well even after a decade, when we write the third code. These codes are research codes. Ongoing experiments. The main goal is NOT to produce long-term maintainable software, but to produce scientific understanding and build intuition about the systems that are modeled. The code is just another tool among experiments, analytic calculations and back-of-the-envelope discussions on a whiteboard.
Rewriting the code every 5 years is an insane proposition in software engineering, but completely ok in some fields of science.
Would I like better language support to check SI units for me? Sure. Would I like highly performant libraries for vector fields that work with gcc 4.6 on a top 500 machine? Sure. Would I like to be allowed to spend time on fixing yeah-I-guess-it-works code? You bet.
But would I like HN to just shut up about "scientists just need to learn to code"? Oh hell yes! Because -- believe it or not -- we often DO know better. But fixing code is not what the taxpayers, what YOU, pay us for. We are paid to understand nature. And until you are willing to pay higher taxes, to spend more money on science, and to invest more into fixing long-term infrastructure, you really do not get to be so damn condescending.
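For what it's worth, even without language support, a rough SI-units check can be bolted on in a few lines. This is entirely my own toy sketch (a real code would use a proper units library), just to show the idea of catching dimension mistakes at runtime:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    """Toy SI-unit tracker: a value plus exponents for (length, mass, time)."""
    value: float
    dims: tuple  # e.g. (1, 0, -1) for m/s

    def __add__(self, other):
        # Adding metres to seconds is a bug; refuse loudly.
        if self.dims != other.dims:
            raise TypeError(f"unit mismatch: {self.dims} vs {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        # Multiplication adds dimension exponents.
        return Quantity(self.value * other.value,
                        tuple(a + b for a, b in zip(self.dims, other.dims)))

distance = Quantity(3.0, (1, 0, 0))   # 3 m
time = Quantity(2.0, (0, 0, 1))       # 2 s

distance + distance   # fine: same dimensions
# distance + time     # would raise TypeError: unit mismatch
```

Not a substitute for language-level support, but it shows the check itself is cheap; what is expensive is retrofitting it onto a decade of existing code.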
Two thirds of the codes in my field are on github. And for most others you can get a copy if you ask politely by email. That said, I would appreciate it if journals not only had author, title, date and affiliation in the metadata, but also a git URL and commit ID.
Right now I'm nearly done with a PhD in mechanical engineering. I've worked on several computational projects of varying levels of code quality, from good (e.g., NIST's Fire Dynamics Simulator: https://github.com/firemodels/fds) to bad (I won't give unambiguous examples, for obvious reasons...).
I see clear incentives against uncertainty quantification and high quality model validation in my field. Those topics are treated superficially if at all, because people just want their model to look good, and if they do a rigorous job, that increases the chance that their model looks bad. As far as I'm concerned, if you don't do good quality model validation, you're not doing your job as a scientist.
A more specific example: Some codes are kept for a decade or longer if the software is sufficiently complex. One code I worked with was used by multiple graduate students over several PhDs and had very little if anything in terms of documentation. What did exist was far out of date. I spent a fair amount of time trying to understand and document the software, but other people treated me as if I was crazy for wanting to understand the black box I was provided. I don't think I ever actually understood that software fully despite the many hours I spent working on it. And I'm not certain that it provided correct results either. There were almost no tests, and the existing tests were hard to run and were consequently almost never run. The software produced several PhDs, though. The first few PhDs had it fairly easy. I had to pay down their technical debt, and I couldn't pay it all off. It made me look bad as far as I'm concerned and did not save time in the long run. If anything, there's a long-term incentive toward good software development practices.
Ultimately I think the solution would involve having minimum standards for scientific software quality, with grant funding withdrawn otherwise. It will anger many people who have so far gotten away with writing low-quality software, but beyond that I don't see many downsides.
I agree that not enough benchmarking (comparison between different codes to check that they get compatible results), verification (checking that the code produces results consistent with the underlying model) and validation (checking that the code reproduces nature to a sufficient degree to be useful) is done.
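To make the verification category concrete, here is a toy example (my own illustration, not taken from any particular code): checking that a forward-Euler integrator converges toward a known analytic solution as the step size shrinks.

```python
import math

def euler_decay(x0, rate, dt, n_steps):
    """Forward-Euler integration of dx/dt = -rate * x for n_steps steps."""
    x = x0
    for _ in range(n_steps):
        x += dt * (-rate * x)
    return x

# Verification against the analytic solution x(t) = x0 * exp(-rate * t):
# for a first-order method, the error should shrink as dt does.
exact = math.exp(-1.0)  # x0 = 1, rate = 1, t = 1
err_coarse = abs(euler_decay(1.0, 1.0, 0.1, 10) - exact)
err_fine = abs(euler_decay(1.0, 1.0, 0.01, 100) - exact)
```

A handful of checks like this, run automatically, cost almost nothing compared to debugging a decade-old black box later.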
I have actually written papers that try to do a bit of that, and it was annoyingly hard to publish them. But unlike you and a lot of other posters here on HN, I don't think the reason for insufficient validation is that we scientists are too stupid or too lazy. On the contrary, we would love to do more of it. But there is next to no funding to do it.
> On the contrary, we would love to do more of it. But there is next to no funding to do it.
Perhaps this is the crux of the disagreement then. I don't need extra funding to do what I should be doing in the first place. It can be a big time sink in some cases (e.g., I probably spent 6+ months compiling existing data during my PhD), but often is not. It will probably make your model look worse than you'd like, but if everyone did it then science would progress faster.
Kinetic plasma simulations in space plasma and astrophysical scenarios. I have seen that journal and I think it is a step in the correct direction.
Regarding the funding: you kind of DO need that extra funding. Because if you don't get funding for that, you get out-competed by the other researcher who skips that step, and the next postdoc position or grant goes to the researcher who wrote two more papers instead of running tests that ultimately didn't find any major bugs.
> you kind of DO need that extra funding. Because if you don't get funding for that, you get out-competed [...]
This seems to be a common view, but I don't think it's always true. In my own work, I've found that being careful has led me to write more papers, not fewer. Particularly given that I've found errors in previous work, corrections I intend to publish or already have published. There's a delay involved (i.e., it takes time to find the problems), but I think I'm coming out ahead in total publications in the long run.
My knee-jerk response is to say this is wrong. I have seen too many cases where people report initial results on some elaborate model as if they explain more aspects of the data, but then the results fail to generalize, and the theory, model or implementation turns out to be flawed or fragile. It gets a publication, but the state of the art does not advance.
But I notice that some younger investigators I have worked with tend to be more careful, with things like releasing code/data openly, being serious about code/model reuse, doing more careful verification/validation, being more rigorous about test/training data. It seems to be a spectrum of better research practices rather than one thing, like “better software engineering”.
Such a more principled approach should prove itself over time, because Nature cannot be fooled.
Yes, "being careful" refers to a variety of different techniques, not just better software engineering. What has ended up being publishable in my case wasn't related to better software engineering; rather, it came from being more careful about model derivations and more rigorous about the available data. But it could have been software improvements.
To be clear, I don't think everyone needs to be as careful as I am. A few researchers in a particular subfield spotting errors can have large benefits for everyone. And those few may catch the vast majority of the errors. There are diminishing returns to adding more people focused on being rigorous.
This is a very important point. Your budget of "being careful" might be best spent on better software engineering. But it is also possible that you should rather spend it on other things, such as a better mathematical model, better input data sets or something else. And a (sub)field prospers most if different groups and researchers spend their (very limited and expensive) time on different things. Trying to shame everybody into following the magic 27 rules of software engineering is a step back, not forward.
Right. You do not always get outcompeted. But you run the risk. And yes, having good architecture in parts you need to change often is something that pays off. As does having tests in sections of the code that are brittle, hard to reason about or historically buggy. As does writing documentation on things that you had to spend an annoying amount of time on to understand. And I think that most researchers understand that and invest that time.
The thing is: This does not mean that either of us puts a lot of effort into having nice, good, modular architecture in parts we know we will never change. Or writes a lot of tests for modules we understand very well and can reason about using analytic calculations. Or writes a lot of documentation for things that are obvious to us (be it through familiarity or whatever reason).
The next researcher, however, might want to change exactly that part we never wanted to change. Or might find that the analytic check we always used to make sure the code was not going off the deep end is not valid for his work. And he is near certainly going to need documentation on other spots of the code (and will find the documentation we wrote and needed useless, because that topic is glaringly obvious to him).
There is usually no advantage for us in making the code nice for that researcher. And THAT is where scientific software gets its bad name from. Because we have to make the trade-off: what is useful to us and worth our time? If we try to make the code nice and friendly for every potential user under the sun, we usually DO get outcompeted.
I think I misunderstood your position earlier. My experience with academic codes seems to differ significantly from yours. It is not uncommon at all for scientific software to have no documentation at all in my experience, and very little if anything in terms of tests. I agree with you that it does not make sense to have detailed documentation and tests for all parts of scientific software. One must prioritize. What I'm arguing against is the (often implicit) attitude that documentation, testing, and other good software engineering practices are not necessary in science at all. I see now that you agree, and it's more of a question of the amount of software engineering practices which are optimal. I appear to prefer more than you do, but you clearly see the value of these practices. Correct me if I'm wrong.
You are not wrong at all. And the amount of test coverage and documentation one needs depends a lot on the field, the numerical methods, the intended users, the type of the code (simulation vs analysis vs plotting) and so on.
The thing where I seem to have a totally different view from most of HN is that I think that we make valid tradeoffs, and that undocumented functions and tests that only cover 40% of the code base are a perfectly fine state of affairs, instead of being caused by scientists who are too stupid for basic software engineering.
What a lot of software people also do not get at all is that variable names such as x, v and a might be perfectly descriptive.
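A contrived but representative example (my own sketch, not from any real code): in an integrator for Newton's equations, x, v and a map one-to-one onto the symbols in the maths, and longer names would only obscure that correspondence.

```python
def verlet_step(x, v, a, dt, accel):
    """One velocity-Verlet step. x, v and a mirror the position, velocity
    and acceleration symbols in the equations of motion."""
    x_new = x + v * dt + 0.5 * a * dt * dt
    a_new = accel(x_new)
    v_new = v + 0.5 * (a + a_new) * dt
    return x_new, v_new, a_new

# Harmonic oscillator, a(x) = -x: integrate to t = 1 with dt = 0.001.
x, v, a = 1.0, 0.0, -1.0
for _ in range(1000):
    x, v, a = verlet_step(x, v, a, 0.001, lambda x: -x)
# The energy 0.5*v**2 + 0.5*x**2 should stay close to its initial value 0.5.
```

Anyone who knows the physics reads this instantly; `position_metres` everywhere would add noise, not clarity.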
You are making some good points here. It is expensive and exacting work to, for instance, set up validation experiments, or to quantify errors exactly enough so that whatever errors that do appear are explained by known effects. Agencies don’t like to pay for this.
One forcing function can be if high-stakes decisions rest on the conclusions from the code. In the geosciences, you start to see some of this care for air quality measurements and for climate science. (One relevant journal here is SIAM J. UQ., https://www.siam.org/Publications/Journals/SIAM-ASA-Journal-...)
The lab where I work has spent a lot of resources on validating thermal and fluid dynamical codes for atmospheric entry of Mars landers — an engineering example rather than science. They obviously are very motivated to avoid an unpleasant surprise from a model failure.
In industry, we often also face an incredible amount of uncertainty in the form of unknown future functionality, UX, requirements, etc. As long as a business continues to adapt to changing market conditions, it's rare for its software to reach completeness. Some components' rate of change certainly slows down over time but newer components might rapidly grow in complexity or face major refactors. Successfully scaling to meet increased demand may also demand a pretty fundamental architectural shift. While not entirely the same, your evolving understanding of a problem and its effect on software requirements isn't too different from what we see regularly in industry.
It's very difficult to foresee what changes will be required in one year's time, let alone three. To that end, devs try to be extra thoughtful during the planning phase so that the end product is made up of well-behaved components. The extra time taken to come up with a sensible architecture goes a long way when business requirements eventually call for new or enhanced functionality and someone needs to understand the codebase in a reasonable amount of time. To be clear, this doesn't guarantee clean code -- bad architectural decisions happen, planning can occur without clear requirements, no one held the quality bar during implementation, etc. -- but the outcome is always better than a laissez-faire design "strategy".
Perhaps scientific code churn is much more frequent than commercial code churn. In which case you would want to balance that by perhaps spending less time coming up with a good architecture -- and in the case of throwaway code, perhaps none at all. That being said, I would be surprised if new scientific knowledge always results in you having to discard every line and restart completely from scratch. I imagine that spending more time upfront architecting your software would instead allow you to more quickly react to new research results -- paying dividends on your time, in the long run.
This got caught because it led to a big fight. Most other scientific software is not audited at all. Given my experience with those who write them, I tend not to trust most computational results coming from science.
The second link was fascinating and would make a good submission to HN.
> Given my experience with those who write them, I tend not to trust most computational results coming from science.
In computational fluid dynamics, there's a saying: No one believes computations aside from the person who ran them, and everyone believes experiments aside from the person who ran them.
To be fair, this is mostly because turbulence modeling can be very inaccurate, but bugs in the code are also a major concern of mine.
Oh, it is nothing new that HN is bashing code in science. But if anything, that story is a reason for each group to build their own independent code, instead of enforcing so much software engineering overhead that only one code, by the best-funded group, is left, because nobody else dares to (or can afford to) build their own.
If you have "yeah I guess it works" code, you're likely producing "yeah I guess it's right" science, and that's not really the point it seems.
But yes, you're right. Perfecting things is at cross-purposes with the frenetic pace of research. Crappy untested code gets written under duress. Resist it. That needs to change, just as the whole funding model needs to reward verification more than new discovery... But back to the code, if the code is wrong, the science is likely to be as well.
I do hear this sentiment often: "Hey dudes, it's just a prototype! Why waste time making it pretty? Who cares? We're doing SCIENCE here!"
But prototypes (and beginnings in general) are precisely the time to be extra careful. Wrong turns and self-delusion are costlier, not cheaper, when you're the one paving the road for others.
In research, there are many ways to lead yourself astray, due to the inherent chaos of novelty, even without software bugs completely flipping the outcome. The idea that writing shitty code somehow "saves time" and is only worthwhile for the "17th javascript framework" has to go.
Articulating your thoughts into a sane logical structure (aka code), with sane names and motivating examples and conceptual units, saves you time even in the short run. Never mind 5 years. It also helps you avoid publishing unreliable, brittle, "SOTA" nonsense… of which there's sadly so much.
There is a world of difference between code that is pretty and code that is careful, written in a way that will not lead you astray. And if you don't know or don't care to make that distinction, I am not interested in your opinion.
> Articulating your thoughts into a sane logical structure (aka code), with sane names and motivation examples and conceptual units, saves you time even in the short run.
If you think he or she is only talking about pretty code, you don't understand the distinction to begin with.