> "More tests" is not the goal - you need to write high-impact tests, you need to think about how to test the most of your app surface with the least amount of test code.
Are there ways we can measure this?
One idea I’ve had is to collect code coverage separately for each test. If a test isn’t covering any unique code or branches, maybe it is superfluous. Not necessarily, though: it can make sense to separately test all the boundary conditions of a function, even if doing so doesn’t hit any unique branches.
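To make the per-test coverage idea concrete, here is a rough sketch. It assumes you have already collected, by whatever means your coverage tool offers, a mapping from each test name to the set of `(file, line)` locations it executed. The test names, the `parser.py` file, and both helper functions are made up for illustration.

```python
from collections import Counter


def unique_coverage(per_test):
    """Map each test to the covered locations that no other test hits."""
    # Count how many tests touch each (file, line) location.
    hits = Counter(loc for covered in per_test.values() for loc in set(covered))
    return {
        test: {loc for loc in covered if hits[loc] == 1}
        for test, covered in per_test.items()
    }


def candidates_for_review(per_test):
    """Tests with no unique coverage: worth a look, not automatic deletion."""
    unique = unique_coverage(per_test)
    return sorted(test for test, locs in unique.items() if not locs)


# Hypothetical per-test coverage data.
per_test = {
    "test_parse_empty": {("parser.py", 10), ("parser.py", 11)},
    "test_parse_basic": {("parser.py", 10), ("parser.py", 14), ("parser.py", 15)},
    "test_parse_boundary": {("parser.py", 10), ("parser.py", 14)},
    "test_parse_error": {("parser.py", 10), ("parser.py", 20)},
}

print(candidates_for_review(per_test))
```

Here only `test_parse_boundary` gets flagged, which is exactly the caveat: it adds no unique lines, but it may still be checking a boundary value that matters. The output is a list of tests for a human to look at, not a delete list.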
Maybe prefer a smaller test to a bigger one that covers the same code. However, a very DRY test can be more brittle, since it can be non-obvious how to update it to handle a code change. A repetitive test can be laborious to update, but at least it is reasonably obvious how to do so.
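A sketch of the "smaller test, same code" comparison, under the same assumption of a per-test coverage mapping, plus a hypothetical `test_size` mapping (say, lines of test code). All names and numbers here are invented for illustration.

```python
from collections import defaultdict


def smallest_per_footprint(per_test, test_size):
    """Group tests with identical coverage; key each group by its smallest test."""
    groups = defaultdict(list)
    for test, covered in per_test.items():
        groups[frozenset(covered)].append(test)
    return {
        # Ties on size are broken by name so the result is deterministic.
        min(tests, key=lambda t: (test_size[t], t)): sorted(tests)
        for tests in groups.values()
        if len(tests) > 1
    }


# Hypothetical data: two tests exercise exactly the same lines.
per_test = {
    "test_checkout_end_to_end": {("cart.py", 5), ("cart.py", 9)},
    "test_cart_total": {("cart.py", 5), ("cart.py", 9)},
    "test_cart_empty": {("cart.py", 5)},
}
test_size = {
    "test_checkout_end_to_end": 40,
    "test_cart_total": 8,
    "test_cart_empty": 4,
}

print(smallest_per_footprint(per_test, test_size))
```

Size is only a proxy. It says nothing about the brittleness trade-off, so a short but overly clever test could still lose to a longer, more obvious one.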
Could an LLM evaluate test quality, if you give it a prompt containing some expert advice on good and bad testing practices?
Sometimes you actually have to think, or hire someone who can.
Go join the comments section on the Goodhart's Law post to go on about measuring magical metrics.
> Sometimes you actually have to think, or hire someone who can.
I'm perfectly capable of thinking. Thinking about "how can I create a system which reduces some of my cognitive load on testing so I can spend more of my cognitive resources on other things" is a particularly valuable form of thinking.
> Go join the comments section on the Goodhart's Law post to go on about measuring magical metrics.
That problem arises when managers take a metric and turn it into a KPI. That doesn't happen to all metrics. I can think of many metrics I've personally collected that no manager ever once gazed upon.
The real measure of a metric's value is how meaningful a domain expert finds it to be. And if the answer to that is "not very", is that an inherent property of metrics, or a sign that the metric needs to be refined?
Good tests reduce your cognitive load; you can have more confidence that code will work and spend less time worrying that someone will break it.
BTW, I think the above are the best metrics to use for tests. Actually measuring them can be hard, but I think keeping track of when functionality doesn't work and when people break your code is a good start.
And I think all of this should be measured in terms of doing the right thing business-logic-wise, weighing what needs testing by the business cost of it not working.