Workarounds for C11 _Generic() (greenend.org.uk)
74 points by fanf2 on July 30, 2023 | 53 comments


The Tiny C Compiler (tcc) actually has a "bug": it doesn't implement the "big bug".

That is, the following expressions are cleanly compiled without any errors:

    _Generic(0, float: s/<[^>]*>/ /g, int: 1),
    _Generic(0, float: now the thing about this tcc implementaion is, int: 1),
    _Generic(0, float: that it just skips until the next comma, int: 1),
    _Generic(0, float: keeping track of parentheses nesting., int: 1),
    _Generic(0, float: [[ Suprizingly it doesnt validate which parentheses)), int: 1),
    _Generic(0, float:  {{{so this bullshit is possible))), int: 1,
             default: #else #define this is great isnt it?);
See: https://godbolt.org/z/o7jne7hWM


Will this work?

  #define string_length(x) _Generic((x),                                  \
    const char *            : strlen((const char*)(const void*)(x)),      \
    struct MyStringBuffer * : ((const struct MyStringBuffer*)(const void*)(x))->length)


Heh, this was my first thought as well, and it does indeed compile and work (with GCC 12.2, anyway).


Yeah, this is what I saw in some example code. Also you don't need the (void*), at least not with clang or GCC trunk on compiler explorer: https://godbolt.org/z/a8rP91dKr
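For reference, here is a minimal self-contained sketch of that cast workaround, assuming a hypothetical `struct MyStringBuffer` with `data` and `length` members and a hypothetical `string_length_demo` helper for the usage. Every association casts `x` to its own type, so the branches that are not selected still type-check. A plain `char *` case is added because a string literal argument decays to `char *`, which would not match `const char *`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer type; only the length member matters here. */
struct MyStringBuffer {
    const char *data;
    size_t length;
};

/* Every association casts x to its own type, so the branches that are
   not selected still type-check even though x has another type there. */
#define string_length(x) _Generic((x),                                  \
    const char *            : strlen((const char *)(x)),                \
    char *                  : strlen((const char *)(x)),                \
    struct MyStringBuffer * : ((const struct MyStringBuffer *)(x))->length)

void string_length_demo(void)
{
    struct MyStringBuffer buf = { "hello", 5 };
    const char *s = "abc";

    assert(string_length(s) == 3);     /* const char * branch */
    assert(string_length("ab") == 2);  /* literal decays to char * */
    assert(string_length(&buf) == 5);  /* struct branch */
}
```

Only the selected branch is evaluated, so `x` is evaluated once per call.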


At the bottom of the page, the author makes lemonade:

> Apart from the most obvious answer (that it’s useful for things exactly like <tgmath.h>), the best thing I’ve thought of to do with _Generic is to use it for deliberate error checking.

> The annoyance in all the previous sections was that it was very hard to avoid compile errors, when we wanted our code to actually compile, run and do something useful. But if what we wanted was to provoke compile errors on a hair-trigger basis, then perhaps we could use it for that more reliably?


Somehow C++ (and now C) standardization very often ends up with constructs that require a lot of ugly hacks in the code that uses them.


According to this, the "big bug" is that... _Generic works mostly like a macro and expands code that the compiler sees. That seems a little weak; macros have been doing this forever via a mere extra level of indirection.

So sure, "(x)->length" might not be valid syntax in all configurations the compiler might see. But "LENGTH(x)" is, e.g.:

   #if X_MIGHT_BE_MYSTRINGBUFFER
   #define LENGTH(x) ((x)->length)
   #else
   #define LENGTH(x) 0 /* anything that converts to the output type will do */
   #endif
This is a routine pattern seen everywhere in C. The Linux kernel is filled with it, e.g. field accessors or arch-specific functions that are stubbed out when not needed, etc...

Is this as clean as a full-on generic typesystem? No. But it's C, it shows a weirdness that you have to handle manually, and you do it the same way we've been doing it in C for decades. Not a new problem, doesn't need a new solution. It's C!


I think you've misunderstood the purpose of _Generic.

Your #if statement will select exactly one implementation for every single site of the use of LENGTH() in the codebase, depending on the value of X_MIGHT_BE_MYSTRINGBUFFER at compile-time.

_Generic() allows you to have both implementations available, and different implementations can be selected at the call site depending on the type of the argument passed.
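A quick way to see per-call-site selection is a type-name macro (`type_name` and `type_name_demo` are hypothetical names for this sketch): one definition and one build configuration, yet each use picks its result from the static type of its own argument, which a single #if-selected definition cannot do.

```c
#include <assert.h>
#include <string.h>

/* Selected independently at every use, from the argument's static type. */
#define type_name(x) _Generic((x),  \
    int    : "int",                 \
    double : "double",              \
    char * : "char *",              \
    default: "other")

void type_name_demo(void)
{
    int i = 0;
    double d = 0.0;
    char buf[8] = "";

    assert(strcmp(type_name(i), "int") == 0);
    assert(strcmp(type_name(d), "double") == 0);
    assert(strcmp(type_name(buf), "char *") == 0);  /* array decays */
    assert(strcmp(type_name(1L), "other") == 0);    /* long has no case */
}
```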


Macros and #ifdef do not solve the problems that _Generic is used for. _Generic is for static polymorphism, not for target or build configuration.


Macros are perfectly capable of static polymorphism. That is one of their primary use cases. _Generic is simply an alternative mechanism for static polymorphism, with some extra capabilities and some limitations in comparison to your typical macro.


To prove your assertion, would you mind defining a length() function via macros that works with both a char * and a pointer to a struct {int len; char *data}, as the fine article explained?


There is a mixup here between different flavours of polymorphism.

_Generic does ad-hoc polymorphism via type-based dispatch in order to implement function overloading.

Macros are more like a limited form of parametric polymorphism, in that they are polymorphic up to what is allowed by the operations they apply to their arguments. In C that typically means relying on implicit or explicit coercions, or C’s mildly overloaded operators.
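A small sketch of the two flavours, with hypothetical names `MAX`, `my_abs`, and `polymorphism_demo`: `MAX` is polymorphic only because `>` and `?:` already accept every arithmetic type, whereas `my_abs` uses _Generic to pick a different standard function (`abs`, `labs`, `llabs`) per argument type, which a plain macro cannot do.

```c
#include <assert.h>
#include <stdlib.h>

/* Parametric-ish: one body for every type. It works only because > and ?:
   are defined for all arithmetic types (arguments may be evaluated twice). */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Ad hoc: _Generic selects a different function for each argument type,
   then the selected function is called with x. */
#define my_abs(x) _Generic((x),  \
    int       : abs,             \
    long      : labs,            \
    long long : llabs)(x)

void polymorphism_demo(void)
{
    assert(MAX(2, 7) == 7);
    assert(MAX(2.5, 1.0) == 2.5);
    assert(my_abs(-3) == 3);    /* abs   */
    assert(my_abs(-4L) == 4L);  /* labs  */
}
```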


> Macros and #ifdef do not solve the problems that _Generic is used for.

No, but they solve the specific "big bug" described in the article very well. Use _Generic for what it's good for. Use macros to fill in the gaps (in this case build/preprocessor configuration differences that change the syntax used in the RHS of the generic expansion).

Again, this is C. You want fancy type systems, you know where to find them. Don't ask for them here, we don't want them.


> You want fancy type systems, you know where to find them. Don't ask for them here, we don't want them.

But the problem here is caused by the type system being overly aggressive. If it wasn't trying to force the wrong type onto unused code, everything would work fine.


Indeed. So? Again if you want to code in Haskell or Rust, those are fine choices. What you want, you can't have absent a complete rework of C's type system. And you don't actually want that.


By all means work on getting `_Generic` removed again if that is what you want.

As long as the feature is there, it should endeavor to be the best possible version of itself. Making it deliberately useless and unsuitable in order to avoid the obvious use of _doing different things for different types_ just leaves you with a suboptimal feature. And if ignoring the parts of the syntax that are statically known not to be part of the program would make it a better feature, that seems like low-hanging fruit.

I'm sure there are philosophical reasons for every choice made in the design of this feature, but at the end of the day, users just see a feature being needlessly obtuse.


> Making it deliberately useless

How do you get "useless"? I literally showed five lines of code above that solve the putative problem with _Generic by leveraging an idiom we've been using for decades already.

Meh. Use what you like. Clearly it's not C.


> I literally showed five lines of code above that solve the putative problem

The top reply to your five lines literally points out why your solution doesn't solve the right problem.

The macro you wrote puts the same definition everywhere. It can't even handle the original use case of calling a different function for each number type.

If you're suggesting a combination approach of hiding the implementation with a macro until after _Generic gets resolved, I don't think that ordering is possible.


You say "indeed" but you don't seem to understand my point: Nobody is asking for a fancy type system. If anything, they're asking for less type system in this specific situation.

But really it's just a matter of which type it uses for the expression, where neither option is fancier than the other.

The type system does not need a complete rework to support this.


I would have designed the feature like this:

  _Generic(<expr>, type1 : ( expr1 ), type2 : ( expr2 ), ... default : ( expr ))
Here, the parentheses shown in this phrase pattern are required.

The implementation would only parse and semantically analyze the expression of the matching type. For the others, the ( expr ) would be treated as a token sequence to be skipped, which has to contain valid tokens, and balancing parentheses, square brackets and braces.

E.g.

   char *p = "foo";

   _Generic(p, int : ("foo" ++ / ? ([]&xyz) { ; } ),
               char * : (p[0]))

Here, the type of p doesn't match int, and so the interior token sequence of ("foo" ++ / ? ([]&xyz) { ; } ) would just be scanned to check for balancing parentheses, brackets and braces, which allows the parser to locate the next clause in the association list.


That would allow you to put syntactically invalid code into the expressions, but is that a useful feature? It's not very hard to have valid syntax in a situation like this. The main issue here is type enforcement.


> but is that a useful feature?

It would solve the usability issues experienced by Simon Tatham, described in his article. For instance, if one of the clauses has x->foo, it wouldn't matter that the actual x argument in a given instance is char *, that cannot be dereferenced as x->foo.

It would be up to the programmer to test every one of the supported cases of their _Generic construct.

One way to avoid problems would be to include only cases that are used in your codebase. Before adding a new case, add a use of your generic macro with the currently unsupported type. That should fail to build. Then add the case (TDD style).
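The workflow above can be sketched with a generic macro that has no `default` association, so any unsupported argument type is a compile-time error. The shape types and the `area`, `square_area`, `rect_area`, and `area_demo` names are hypothetical; the commented-out `struct circle` lines show the "write the use first" step that is expected to fail to build.

```c
#include <assert.h>

/* Hypothetical shapes. */
struct square { double side; };
struct rect   { double w, h; };

static double square_area(const struct square *s) { return s->side * s->side; }
static double rect_area(const struct rect *r)     { return r->w * r->h; }

/* No default association: any other argument type fails to compile. */
#define area(p) _Generic((p),          \
    struct square * : square_area,     \
    struct rect *   : rect_area)(p)

void area_demo(void)
{
    struct square sq = { 3.0 };
    struct rect rc = { 2.0, 5.0 };

    assert(area(&sq) == 9.0);
    assert(area(&rc) == 10.0);

    /* Step 1 of adding circles: write the use first.
       struct circle c = { 1.0 };
       area(&c);   <- does not build until a struct circle * case exists */
}
```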

The preprocessor's #ifdef allows syntactically invalid code. This is very useful; it lets you have conditional sections of code that only certain compilers understand, because they use those compilers' extensions.

Arguments to preprocessor macros, if not planted into syntax, can be syntactic gibberish, but parentheses have to balance. You cannot invoke a macro as mac({(}) and such.

_Generic is meant to work with macros; it should have some macro-like forgiveness in it to make it as useful as possible.

In Common Lisp, the #+ and #- syntax for conditionally skipping forms is similarly defined as doing a minimal parsing for balancing parentheses and a few other details. This allows #+ to switch between implementation-specific code that includes implementation-specific read syntax that cannot be read by all implementations.

http://www.lispworks.com/documentation/lw51/CLHS/Body/02_dhq...

It works by binding *read-suppress* to true and then calling read to read the form.


> It would solve the usability issues experienced by Simon Tatham, described in his article.

In a very overkill way, though. You can solve the issue in the article by parsing the expression but not doing anything with it. You don't need to allow invalid syntax for that problem.

So to be clear, I'm asking specifically whether it's useful to allow invalid syntax inside _Generic. What does that allow in practical use, more so than allowing valid syntax with invalid types/semantics?


It's easy to implement and obviously correct. I can't think of a situation in which it would not work, or on which it would be a nuisance (prevent the programmer from doing something sensible).

I don't know that about the implementation which parses the dead code. Implementing it could be troublesome in some way. In any compiler that checks constraints while parsing, implementing parsing without constraint checking means going to all those places and checking a flag.

Actually, it occurs to me that none of this is necessary. How the construct can work is by pretending that the argument in the dead cases has the type of the left side of the association list entry:

So if we have

  _Generic(x,
           char * : puts(x),
           dev * : puts(x->devname));
which is invoked with x being "abc", the dead dev * clause should be parsed and type checked under the pretense that there is a dev *x declaration in scope. And not that x is of type char *.

The implementation should be, anyway, generating a hidden local variable (gensym) to take the value of the expression.

We can imagine a code transformation like this (using GCC brace syntax):

  _Generic(x,
           char * : ({ char *__g0025 = x; puts(__g0025); }),
           dev * : ({ dev *__g0025 = x; puts(__g0025->devname); }));
Now here, we just have to suppress type checking of the dev *__g0025 = x; declaration where the only problem is that x is incompatible with __g0025. (That declaration wouldn't even have to be source code: it can be injected at the AST level.)

While this is all cool, my bracket-counting idea is much simpler and more robust against cases like different compilers using different syntactic extensions, or other unforeseen snags.
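A standard-C approximation of "bind the argument at the association's type" that works today, without any compiler change, is to have each association name a function and then call the selected function with the argument: each function's parameter already has the association's type, so no branch ever uses `x` at the wrong type. `dev`, `devname`, `show`, `show_str`, `show_dev`, and `show_demo` are hypothetical names following the example above, and the helpers return strings instead of calling `puts` so the sketch is easy to check.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical device type from the example above. */
typedef struct {
    const char *devname;
} dev;

/* Each helper's parameter has the association's type, playing the role
   of the hidden per-branch variable (__g0025) in the transformation above. */
static const char *show_str(char *s) { return s; }
static const char *show_dev(dev *d)  { return d->devname; }

/* _Generic only selects a function name; the call happens afterwards,
   so x is never used at a type it does not have. */
#define show(x) _Generic((x), char *: show_str, dev *: show_dev)(x)

void show_demo(void)
{
    char name[] = "abc";
    dev d = { "eth0" };

    assert(strcmp(show(name), "abc") == 0);
    assert(strcmp(show(&d), "eth0") == 0);
}
```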


Just admit that it's pattern matching, and introduce a new variable for each match:

  _Generic(expr0, type1 id1?: expr1, ... typeN idN?: exprN, default id0?: exprDefault)

Every identifier is optional, but if you include `idX`, it gets bound to the value of `expr0` with type `typeX` in the following expression, or with the type of `expr0` for the default case.

Then you can safely refer to the value at its now-known type, even if `expr0` was not a variable to begin with.


The only reason I can imagine for this behaviour is that the compiler/standard writers did not want it.

Maybe C just shouldn't include generics. Especially not as part of the macro layer.


As a member of WG14, I can tell you that _Generic does cause a fair number of issues and complications in the language. As a user of C, I think _Generic is bad, because it is mostly useful for confusing users about what code does. So yes, C would be better off without _Generic. Please don't use it.


_Generic allows for more complex assertions in static_assert which is really useful for correctness (For example, your embedded chip vendor offers an API where uint8_t is used to transfer bytes using a protocol, so you have to assert that uint8_t is either char or unsigned char in order to prove that there will be no strict aliasing violation during pointer casting). Also, it's useful in order to implement the lengthof macro in a safe way that will not allow pointers to scalars or nullptr to be passed, but will only allow arrays, pointers to arrays and VMTs. There are even more uses of _Generic (if constexpr, compile time pattern matching, type traits, type safe formatting, macrowrappers to APIs that accept void*) that people have been using to emulate stronger typing and reject violations of the type system at compile time, I don't see how that's bad.
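A minimal sketch of the first use, assuming the goal is only to prove at compile time that `uint8_t` is a character type (so accessing other objects through a `uint8_t *` falls under the character-type aliasing exception). GCC and Clang accept a _Generic whose selected branch is an integer constant as an integer constant expression, so it can feed `static_assert` directly; `IS_CHAR_TYPE` is a hypothetical name.

```c
#include <assert.h>   /* static_assert: C11 macro for _Static_assert */
#include <stdint.h>

/* 1 if the (unqualified) type of expr is a character type, else 0.
   _Generic applies no integer promotions, so (uint8_t)0 keeps its type. */
#define IS_CHAR_TYPE(expr) _Generic((expr), \
    char          : 1,                       \
    signed char   : 1,                       \
    unsigned char : 1,                       \
    default       : 0)

/* Fails the build on a platform where uint8_t is some other type. */
static_assert(IS_CHAR_TYPE((uint8_t)0),
              "uint8_t must be a character type for byte-wise aliasing");
```

Note that `IS_CHAR_TYPE('a')` yields 0, since a character constant has type `int` in C.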


There are always uses for every feature; that's how you end up with C++. The question is whether it is worth the implementation burden, the complexity, making the language harder to learn and read, and the potential for intentional and unintentional abuse. IMO it is clearly not.


The macros in tgmath and more recently stdbit show why those could be necessary.

If you have a set of functions for addition with overflow detection, say add_overflow{i,l,ll}, and you have a pair of ptrdiff_t’s or int32_t’s or whatnot that you know are standard integer types, and you want to use the appropriate add_overflow* function, can you do it?

With _Generic you can. Without it I think you’re stuck providing separate functions for every integer typedef in the standard library and then requiring all library authors to do the same for both integer typedefs and integer-accepting functions that they define.
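A sketch of that dispatch, assuming hypothetical `add_overflowi`/`add_overflowl`/`add_overflowll` helpers named after the comment, with illustrative portable bodies that return true on overflow (the same convention as GCC's builtin). Selecting on the type of `*r` lets a `ptrdiff_t` or `int32_t` result pick whichever standard integer type the typedef aliases on the platform, assuming it is one of `int`, `long`, or `long long`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Check before adding, so signed overflow never actually happens.
   Each returns true on overflow and leaves *r untouched in that case. */
static bool add_overflowi(int x, int y, int *r)
{
    if ((y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y)) return true;
    *r = x + y;
    return false;
}
static bool add_overflowl(long x, long y, long *r)
{
    if ((y > 0 && x > LONG_MAX - y) || (y < 0 && x < LONG_MIN - y)) return true;
    *r = x + y;
    return false;
}
static bool add_overflowll(long long x, long long y, long long *r)
{
    if ((y > 0 && x > LLONG_MAX - y) || (y < 0 && x < LLONG_MIN - y)) return true;
    *r = x + y;
    return false;
}

/* Dispatch on the result's type: typedefs such as ptrdiff_t or int32_t
   resolve to whichever standard integer type they alias here. */
#define add_overflow(x, y, r) _Generic(*(r),  \
    int       : add_overflowi,                \
    long      : add_overflowl,                \
    long long : add_overflowll)((x), (y), (r))

void add_overflow_demo(void)
{
    ptrdiff_t p;
    int32_t q;

    assert(!add_overflow((ptrdiff_t)1, (ptrdiff_t)2, &p) && p == 3);
    assert(add_overflow((ptrdiff_t)PTRDIFF_MAX, (ptrdiff_t)1, &p));
    assert(!add_overflow(40, 2, &q) && q == 42);
}
```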

(_Generic is not part of the macro level, that’s why the semantic-checking issues discussed in the article even arise. The C preprocessor can still be implemented as a separate binary that doesn’t understand C itself, even in C23.)


You make it sound a lot more complicated. Ignoring the library functions or stuff you can include from safe coding standard headers, you just do something like this:

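  #include <limits.h>
  #include <stdint.h>

  typedef int BOOL;
  typedef unsigned char BYTE;
  typedef int32_t INT32;
  typedef uint64_t UINT64;
  #define TRUE 1
  #define FALSE 0

  //Byte-order probes: is the least (or most) significant byte stored first?
  static BOOL PlatformIsLSB(void) { const UINT64 one = 1; return ((const BYTE*)&one)[0] == 1; }
  static BOOL PlatformIsMSB(void) { const UINT64 one = 1; return ((const BYTE*)&one)[sizeof one - 1] == 1; }

  //Adds the unsigned integer at b into the one at a; the sum is returned in a.
  //Returns TRUE if the sum fit, FALSE on overflow (or unsupported input).
  BOOL AddOverflowUnsigned(BYTE* a, const BYTE* b, INT32 sizea, INT32 sizeb)
  {
     BOOL IsMsb = PlatformIsMSB();
     BOOL IsLsb = PlatformIsLSB();

     //log error: unknown byte order or mismatched sizes
     if((!IsMsb && !IsLsb) || sizea != sizeb || sizea <= 0)
       return FALSE;

     //Early out: if both most significant bits are set, the sum must overflow.
     BYTE TopA = IsLsb ? a[sizea - 1] : a[0];
     BYTE TopB = IsLsb ? b[sizeb - 1] : b[0];
     if(((TopA >> (CHAR_BIT - 1)) & 1) && ((TopB >> (CHAR_BIT - 1)) & 1))
       return FALSE;

     //We add byte by byte, least significant first, so we can do it on any size.
     //SUM = A + B + CARRY, stored back into A
     unsigned long long Carry = 0;
     for(INT32 i = 0; i < sizea; i++)
     {
        INT32 k = IsLsb ? i : sizea - 1 - i;
        unsigned long long Sum = (unsigned long long)a[k] + b[k] + Carry;
        a[k] = (BYTE)Sum;
        Carry = Sum >> CHAR_BIT;
     }
     return Carry == 0;
  }

  int main(void)
  {
     UINT64 a = 69;
     UINT64 b = 420;
     //Exit status 0 when a + b fit; a then holds 489.
     return !AddOverflowUnsigned((BYTE*)&a, (const BYTE*)&b, sizeof(a), sizeof(b));
  }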

But they have this in the standard library now, and all the secure coding libs also have it in their headers as a basic function. There's also the easier implementation of just checking whether the addition will overflow, rather than doing the addition in the function.

You also only have a few to hardcode if you don't do it generically, since everything goes back to the primitives uint8, uint16, ...


This is OK as a fallback, even if it's not at all strictly compliant (the standard still allows PDP or Honeywell endian and padding bits wherever).

As the main option it's extremely silly, though, when the entirety of the function on e.g. x86-64 could be (using the GCC signature, but the MS calling convention and syntax according to your apparent preference)

  ; bool add_overflowll(long long x, long long y, long long *result)

  add_overflowll PROC
      add rcx, rdx
      seto al
      mov qword ptr [r8], rcx
      ret
  add_overflowll ENDP
for a total of no loops, no conditional branches, and three instructions with no branches at all when inlined (not shown because I don't remember the MS inline assembly syntax). Why check beforehand when the CPU does the check for you on every arithmetic operation and all you need to do is to ask it for the result? (Unless your CPU is RISC-V because RISC-V.)

And of course the newfangled standard functions are size-dependent (more like mine than yours), so either you still need _Generic or essentially equivalent compiler-specific magic, or you have to introduce an ABI dependency on which integer type your ptrdiff_t or ssize_t or off_t or whatever typedef you got from a random place in a library or OS header actually ends up being.
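For comparison, a sketch of the "compiler-specific magic" route, assuming GCC or Clang: `__builtin_add_overflow(a, b, res)` is itself type-generic, so it needs no _Generic at all, and with optimization it typically compiles down to an add followed by a flag read, much like the assembly above. The `add_overflowll` wrapper and `add_overflowll_demo` usage are just illustrative names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumes GCC or Clang. The builtin computes x + y as if with infinite
   precision, stores the result converted to *r's type, and returns true
   if the stored value differs from the exact sum (i.e. on overflow). */
static bool add_overflowll(long long x, long long y, long long *r)
{
    return __builtin_add_overflow(x, y, r);
}

void add_overflowll_demo(void)
{
    long long r;

    assert(!add_overflowll(69, 420, &r) && r == 489);
    assert(add_overflowll(LLONG_MAX, 1, &r));
}
```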


That relies on architecture things that aren't in C, and that's why I said "ignoring the library functions or stuff you can include from safe coding standard headers". Of course you can use an intrinsic or whatever, and that's obviously better. But I wanted to show how easy it is to write a platform-independent function to do this in raw C.


First of all, you forgot to return the actual sum from the function. Second, original add_overflow-style functions support arbitrary expressions as input arguments, not just pointers to l-values. And third, there is no way all this stuff is going to get optimized down to inlined

    add     rdi, rsi
    setc    al
    mov     [rdx], rdi


The sum is returned in a, and the return value tells you whether overflow happened. And you're assuming that you're on x86_64. You can't do that in C; C is platform independent. That's why I said you can use one of the functions or intrinsics, and that writing it yourself is bad, but still easy.


Generics are fine. It's useful to have some introspection into the data that the compiler has. That's the whole point of macros in the first place.

The behavior with _Generic is basically emergent behavior from implementation of macro processors. Macro replacements occur prior to actual compilation, so no compiler context exists for x, as such all code paths must be valid.

It's much easier to require the programmer to cast x in the replacement value than to start trying to shoehorn the compiler context into macro processors.


> Macro replacements occur prior to actual compilation, so no compiler context exists for x, as such all code paths must be valid.

How is it throwing an error like "invalid type argument of ‘->’" if there's no compiler context?

And why does an error like that get in the way of macro replacements?

(That might be the wrong message but the article says you at least get an equivalent.)


I mean in the sense of expanding the macro. It doesn't look at the type of operand at the macro expansion step, it simply replaces it with the _Generic() function call, which contains all the expressions, and each expression is parsed, and thus must be valid.


> It doesn't look at the type

> each expression is parsed, and thus must be valid.

There are multiple kinds of "valid".

The code has valid syntax but invalid types.

The preprocessor doesn't understand types, so it can't know the code has invalid types, right? So this error is not happening in the preprocessor, right?

Therefore changing this behavior would not require preprocessor changes/shoehorning. This error is not a consequence of how the preprocessor works.


_Generic is not part of the macro level.


This is bad for non-compiler tools.

Without it, we can parse the preprocessed source code.

With the enhancement proposed, we need to do half the compiler's work.

In tools like IDEs, we want quick (sub-millisecond) feedback for most edits.


Not directly related, but does any C/C++ compiler implement a combination of flags that creates a sort of "C with templates" version of C?


Templates are so intertwined with the C++ type system such that bringing "just templates" to C would require also bringing in a bunch of C++ features if you want templates to behave like they do in C++. Just for starters, you would need name mangling and/or function overloading.


Just use C++ with just templates?


That would be my advice too. What was the point of “You don’t pay for what you don’t use” if nobody’s going to use it?


There's no common C++ build system. That means (among other things) that there's no way to turn features you don't use into compile-time errors. There's no project configuration to select an allowed subset of features. All the difficulty goes onto the programmers, and in a multi-person project into the code review process. In practice, you end up using the combined set of C++ sublanguages each contributor chooses to use. You have to know about all of C++'s features to use C++ in a large enough team.


Yes there is, with tooling like Sonar.


If C wanted a form of generics, it could likely do much better than templates. Concepts might've improved things, but I'm stuck on C++17 right now so I can't speak with experience.


Currently it seems that the only way to get generics into the language is generic lambdas wrapped in macros, plus tag-compatible structs also wrapped in macros. It's fine, but skipping the preprocessor would be better.


I think it'd be more C like to add types as first class, tagged unions, and phat pointers.


It's amazing how much power _Generic, typeof, typeof_unqual, auto and empty brace initializers can give to the compile time of C (that we already enjoy in C++ through templates, constexpr, auto and friends). Now, if only we had a way to write generic code in a reliable way...


It sounds like the author misunderstands the purpose of _Generic. The author wants it to behave like pattern matching in ML languages, but that is not its purpose. The purpose of generic selection was to introduce function overloading [1] into C without breaking ABI compatibility.

[1] https://en.wikipedia.org/wiki/Ad_hoc_polymorphism


The "big bug" indeed limits the possibilities of _Generic. But if you really want to abuse it beyond its intended scope, I think you can simply add some casting, and not the gimmicks with nested _Generic.

  #define length(x) _Generic(x, \
    char * : strlen((char*)(x)), \
    String * : ((String*)(x))->len \
  )



