CPUs have supported these instructions for 9 years now. If you ignore CPUs older than that, most languages and compilers usually do a good job. Here is an example in C that does not depend on any library functions:
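#include <immintrin.h>  // x86 intrinsics; GCC and Clang need -mfma for this

// Computes a * b + c with a single rounding, using the scalar FMA3 instruction.
double fma( double a, double b, double c )
{
    // Put each scalar into the low lane of an SSE register
    __m128d av = _mm_set_sd( a );
    __m128d bv = _mm_set_sd( b );
    __m128d cv = _mm_set_sd( c );
    // Fused multiply-add on the low lane, then extract it as a double
    return _mm_cvtsd_f64( _mm_fmadd_sd( av, bv, cv ) );
}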
Two problems: Julia supports CPUs without FMA, and on Windows, LLVM will use libc to constant-fold the value of fma even on computers that have FMA in hardware.
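For the first problem, one common workaround is to check for FMA at run time and only then call the intrinsic. The following is a rough sketch, not what Julia or LLVM actually does. It assumes GCC or Clang on x86 (the target attribute and the __builtin_cpu_* builtins are compiler extensions), and the names hw_fma and try_hw_fma are made up for illustration:

```c
#include <immintrin.h>

/* FMA code generation is enabled for this one function only (GCC/Clang
   extension), so the rest of the program still runs on CPUs without FMA. */
__attribute__((target("fma")))
static double hw_fma(double a, double b, double c)
{
    return _mm_cvtsd_f64(
        _mm_fmadd_sd(_mm_set_sd(a), _mm_set_sd(b), _mm_set_sd(c)));
}

/* Hypothetical helper: stores a * b + c (single rounding) in *out and
   returns 1 if the CPU reports FMA support. Returns 0 and leaves *out
   untouched otherwise, so the caller can choose its own fallback. */
int try_hw_fma(double a, double b, double c, double *out)
{
    __builtin_cpu_init();                  /* harmless if already initialized */
    if (!__builtin_cpu_supports("fma"))
        return 0;
    *out = hw_fma(a, b, c);
    return 1;
}
```

When try_hw_fma returns 0, the caller would fall back to something like the libm fma, which is slower but gives the same correctly rounded result. This does nothing for the second problem, since constant folding happens at compile time.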
Hardware requirements are up to product management. For instance, many modern video games (which generally want broad compatibility because it translates directly into sales) no longer support 32-bit or pre-AVX1 processors. Technically, Julia could drop support for pre-FMA3 processors if that helps it move forward.
It’s inevitable anyway because the hardware requirements of the OS keep changing; the only question is “when”. I don’t think Windows 10 21H2 supports any CPU that lacks SSE 4.1, and it’s only a matter of time before Windows requires newer instruction sets.
About LLVM: can’t they compile that thing with an option like -mfma, so that constant folding uses hardware FMA3?