bengarney · 2026-01-27T16:27:36 1769531256

Here is a thought experiment (for devs who buy into package managers). Take the hash of a program and all its dependencies. Behavior is potentially different for every unique hash. With package managers, that hash is different on every system, including hashes in the future that are unknowable by you (i.e. future "compatible" versions of libraries).

That risk/QA load can be worth it, but is not always. For an OS, it helps to be able to upgrade SSL (for instance).

In my use cases, all this is a strong net negative. npm-based projects randomly break when new "compatible" versions of libraries install for new devs. C/C++ projects don't build because of include/lib path issues or lack of installation of some specific version or who knows what.

If I need you to install the SDL 2.3.whatever libraries exactly, or use react 16.8.whatever to be sure the app runs, what's the point of using a complex system that will almost certainly ensure you have the wrong version? Just check it in, either by an explicit version or by committing the library's code and building it yourself.
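The thought experiment above can be made concrete with a minimal sketch. It folds every "name@version" string into a single fingerprint, so any change to any dependency gives a different build identity. The function names and the version strings in the usage note are hypothetical, and FNV-1a is used only because it is short; a real tool would hash file contents with a cryptographic hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 64-bit: a simple non-cryptographic hash, used here only for illustration. */
static uint64_t fnv1a_update(uint64_t h, const char *s)
{
    for (; *s != '\0'; s++) {
        h ^= (uint8_t)*s;
        h *= 1099511628211ULL;   /* FNV prime */
    }
    return h;
}

/* Fingerprint a build: fold every "name@version" string into one hash.
   The separator keeps {"ab", "c"} distinct from {"a", "bc"}. */
uint64_t build_fingerprint(const char *const *deps, size_t count)
{
    assert(deps != NULL || count == 0);
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < count; i++) {
        h = fnv1a_update(h, deps[i]);
        h = fnv1a_update(h, "\n");
    }
    return h;
}
```

Calling build_fingerprint on {"react@16.8.0", "sdl@2.0.14"} and on {"react@16.8.6", "sdl@2.0.14"} gives two different values. Checking dependencies in is one way to keep that fingerprint the same on every machine.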

sebastos · 2026-01-27T17:08:45 1769533725

Check it in and build it yourself using the common build system that you and the third party dependency definitely definitely share, because this is the C/C++ ecosystem?

bengarney · 2025-09-30T15:24:55 1759245895

I increasingly wonder if writing and binding performance critical things in C/C++ would be less overall effort. Performant zero-alloc C# vs C/C++ is backdoor magic vs first class language support. Boxing gloves vs. surgical gloves.

C# _can_ do this! But I face many abstractions: special perf APIs, C#, IL, asm. Outcomes will vary with language version, runtime version, platform, IL2CPP/Burst/Mono/dotnet. But C/C++ has one layer of abstraction (the compiler), and it's locked in once I compile it.

I want to do the thing as exactly and consistently as possible in the simplest way possible!

A build environment that compiles .cpp alongside .cs (no automatic bindings, just compilation) would be so nice for this.

----

Example of what I mean regarding abstractions:

  void addBatch(int *a, int *b, int count)
  {
    for(int i=0; i<count; i++) 
      a[i] += b[i]; 
  }

versus:

    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static void AddBatch(int[] a, int[] b, int count)
    {
        ref int ra = ref MemoryMarshal.GetArrayDataReference(a);
        ref int rb = ref MemoryMarshal.GetArrayDataReference(b);
        for (nint i = 0, n = (nint)count; i < n; i++)
            Unsafe.Add(ref ra, i) += Unsafe.Add(ref rb, i);
    }

(This is obviously a contrived example, my point is to show the kinds of idioms at play.)

int_19h · 2025-09-30T16:00:49 1759248049

But your first code snippet is also valid C# if you just throw in `unsafe` there. And, generally speaking, everything that you can do in C (not C++) can be done in C# with roughly the same verbosity.

bengarney · 2025-09-30T19:14:09 1759259649

It is, but it isn't quite the same as C, either. That is to say, there is some semi-unknowable stack of stuff happening under the covers.

I will predict the future: you will pull up the JIT assembly output to make the case that they output similarly performant assembly on your preferred platform, and that you just have to do X to make sure that the code behaves that way.

But my problem is that we are invoking the JIT in the conversation at all. The mental model for any code like this inevitably involves a big complex set of interacting systems and assumptions. Failure to respect them results in crashes or unexpected performance roadblocks.

int_19h · 2025-10-01T01:48:47 1759283327

I don't see what makes JIT any different from AOT in this case. But C# can be AOT-compiled as well.

Will it be as efficient? Probably not; C++ compilers have been in the optimization game for a very long time and have gotten crazy good at it. Not to mention that the language itself is defined in a way that essentially mandates a highly optimizing compiler to get decent performance out of it (and avoid unnecessary creation of temporaries and lots of calls to very tiny functions), which then puts pressure on implementations.

But my point is that this is not a question of language, but implementation. Again, your C example is literally, token-for-token, valid C# as well. And, in general, you can take any random C program and mechanically convert it to C# with the exact same semantics and mostly the same look (with minor variations like the need to use stackalloc for local arrays). So if it's all 1:1, equivalent perf is certainly achievable, and indeed I'd expect a C# AOT compiler to do exactly the same thing as the C compiler here, especially if both are using the same backend; e.g. LLVM.

Now in practice the implementations are what they are, and so even if you are writing C# code "C-style", it's likely to be marginally slower because optimizer is not as good. But the question then becomes whether it's "good enough", and in many cases the answer is "yes" - by writing low-level C# you already get the 90% perf boost compared to high-level code, and rewriting that in C so that it can be compiled with a more optimizing compiler will net you maybe 10% for a lot more effort needed to then integrate the pieces.

buybackoff · 2025-09-30T16:29:53 1759249793

I use an extension for arrays, something like:

    internal static class ArrayExtensions
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static ref T RefAtUnsafe<T>(this T[] array, nint index)
        {
    #if DEBUG
            return ref array[index];
    #else
            Debug.Assert((uint)index < array.Length, "RefAtUnsafe: (uint)index < array.Length");
            return ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(array), (nuint)index);
    #endif
        }
    }

then your example turns into:

    public static void AddBatch(int[] a, int[] b, int count)
    {
        // Storing a reference is often more expensive than re-taking it in a loop; this requires benchmarking
        for (nint i = 0; i < (uint)count; i++)
            a.RefAtUnsafe(i) += b.RefAtUnsafe(i);
    }

The JITted assembly: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8AB...

I'm convinced C# is so much better for high perf code, because yes it can do everything (including easy-to-use x-arch SIMD), but it lets one not bother about things that do not matter and use safe code. It's so pragmatic.

See also the top comments from a recent thread, I totally agree. https://news.ycombinator.com/item?id=45253012

BTW, do not use [MethodImpl(MethodImplOptions.AggressiveOptimization)], it disables TieredPGO, which is a huge thing for latest .NET versions.

bengarney · 2025-09-30T19:28:00 1759260480

The world falls into two categories for me. "Must be fast" and "I don't care (much)". C/C++ is ideal for the first one, and C# is awesome for the second.

My argument isn't that C# is bad or performance is unachievable. It's that the mental overhead to write something that has consistent, high performance in C/C++ is very low. In other words, for the amount of mental effort, knowledge, and iteration it takes to write something fast + maintainable in C#, would I be better served by just writing it in C/C++?

The linked assembly is almost certainly non-optimal; compare to -O3 of the C version: https://godbolt.org/z/f5qKhrq1G - I automatically get SIMD usage and many other optimizations.

You can certainly make the argument that if X, Y, Z is done, your thing would be fast/faster. But that's exactly my argument. I don't want to do X, Y, Z to get good results if I don't have to (`return ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(array), (nuint)index);` and using/not using `[MethodImpl(MethodImplOptions.AggressiveOptimization)]` are non-trivial mental overhead!).

I want to write `foo.bar` and get good, alloc free, optimized results... and more importantly, results that behave the same everywhere I deploy them, not dependent on language version, JIT specifics, etc.

If I was operating in a domain where I could not ever take the C/C++ path, these features of C# are of course very welcome. And in general more power/expressiveness is very good. But circling back, I wonder if my energy is better spent doing a C version than contorting C# to do what I want.

buybackoff · 2025-09-30T20:54:10 1759265650

It just looks like you are much more fluent in C/C++ than in C#.

bengarney · 2025-06-24T03:47:43 1750736863

I was a developer on that version!

We got the team back together and did a spiritual successor, Marble It Up!. If you are still enjoying the original I’d recommend checking it out (on consoles and Steam).

mcyukon · 2025-06-24T04:28:48 1750739328

Haha, thank you!

I have many fond memories playing Marble Blast on a G5 iMac as a teen, I think it came to our Internet-less house bundled with the iMac along with Glider Pro which I think came via CD.

I managed to get Marble Blast Ultra running on my Steam Deck, but the controls weren't amazing. Marble It Up runs like a top, and scratches that itch I have for the original.

vmladenov · 2025-06-24T04:25:39 1750739139

My friend had Ultra on his Xbox 360 some 15 years ago and I instantly fell in love. Thank you for making another entry

bengarney · on April 30, 2025

Is there an example of this?

A skybox with depth is only marginally better than a skybox for any sort of 3d experience. Using the depth for occlusion would be kind of cool.

bengarney · on April 8, 2025

Really interesting analysis of where the data lives… cutting 3-4 textures would save you more memory even in the 100k actor case, though.

reitzensteinm · on April 8, 2025

Depending on a bunch of factors of how this data is accessed and actors are laid out in memory, it may be more cache friendly which could yield substantial speedups.

Or it could do next to nothing, as the data is multiple cache lines long anyway.

bengarney · on April 8, 2025

I would not expect much, but you'd have to measure to be sure.

If you actually have a million of something you're better off writing a custom manager thing to handle the bulk of the work anyway. For instance, if you're doing a brick building game where users might place a million bricks - maybe you want each brick to be an Actor for certain use cases, but you'd want to centralize all the collision, rendering, update logic. (This is what I did on a project with this exact use case and it worked nicely.)
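A minimal sketch of the "custom manager" idea, assuming a fixed-capacity struct-of-arrays layout. The names (BrickManager, brick_add, brick_flush_dirty) and the dirty-flag scheme are hypothetical, not taken from the project mentioned above. The point is that one bulk pass over contiguous data replaces a per-brick update call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BRICKS 1024   /* small for the sketch; a real game would size this dynamically */

/* One manager owns the data for every brick, stored as parallel arrays
   so bulk passes touch contiguous memory. */
typedef struct {
    float   x[MAX_BRICKS], y[MAX_BRICKS], z[MAX_BRICKS];
    uint8_t dirty[MAX_BRICKS];   /* 1 if the brick needs its derived data rebuilt */
    size_t  count;
} BrickManager;

/* Adds a brick and returns its index, which an Actor-like wrapper could hold as a handle. */
size_t brick_add(BrickManager *m, float x, float y, float z)
{
    assert(m->count < MAX_BRICKS);
    size_t i = m->count++;
    m->x[i] = x;
    m->y[i] = y;
    m->z[i] = z;
    m->dirty[i] = 1;
    return i;
}

/* One centralized pass instead of a per-brick update call:
   clears dirty flags and reports how many bricks were refreshed. */
size_t brick_flush_dirty(BrickManager *m)
{
    size_t refreshed = 0;
    for (size_t i = 0; i < m->count; i++) {
        if (m->dirty[i]) {
            m->dirty[i] = 0;   /* a real version would rebuild render/collision data here */
            refreshed++;
        }
    }
    return refreshed;
}
```

After two brick_add calls on a zero-initialized manager, brick_flush_dirty returns 2 on the first call and 0 on the next, since nothing has changed in between.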

reitzensteinm · on April 9, 2025

I wouldn't expect much either. The potential for speedups would be if there's locality for data on either side of the multiplayer padding, or if the actors have contiguous layout and deleting the data plays better with the CPU's stride prefetching.

Significant performance degradation is also possible if at some point a smart (but not wise) developer positioned the data to eliminate false sharing on either side.
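For readers unfamiliar with false sharing, here is a minimal C11 sketch of the kind of deliberate layout being described. The struct names and the 64-byte line size are assumptions for illustration; they are not details of the engine under discussion.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed line size; common on x86-64, not universal */

/* Two counters written by different threads. Aligning each to its own
   cache line keeps a write to one from invalidating the other's line. */
typedef struct {
    alignas(CACHE_LINE) long produced;   /* written by thread A */
    alignas(CACHE_LINE) long consumed;   /* written by thread B */
} PaddedCounters;

/* The same fields packed tightly: smaller, but both can share one line. */
typedef struct {
    long produced;
    long consumed;
} PackedCounters;

static_assert(offsetof(PaddedCounters, consumed) == CACHE_LINE,
              "each counter starts on its own cache line");
```

Here sizeof(PaddedCounters) is 128 bytes against at most 16 for the packed version. Deleting what looks like wasted space in such a layout saves memory but can bring the contention back.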

Agreed that you shouldn't be using this heavy weight paradigm with large amounts of entities. My intention was just to add a bit of color to the idea that saving memory allocations can have implications beyond just the number of bytes you ultimately malloc.

cma · on April 8, 2025

If the memory savings he got were fully read or fragmented with other stuff on cache lines that are read in every frame (not likely for static world actors), it could be ~10% of CPU memory bandwidth on mobile every frame at 120hz on an lpddr4 phone.

A big problem with them is they are so heavyweight you can only spawn a few per frame before causing hitches and have to have pools or instancing to manage things like bullets.

I think in their Robo Recall talk they found they could only spawn 10-20 projectile style bullets per frame before running into hitches, and switched to pools and recycling them.
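The pooling approach mentioned above can be sketched as a fixed array with an intrusive free list. This is a generic illustration in C, not Unreal's API; Projectile, ProjectilePool, and POOL_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 64   /* hypothetical capacity */

typedef struct {
    float x, y, z;
    int   active;
    int   next_free;   /* index of the next free slot, or -1 */
} Projectile;

typedef struct {
    Projectile items[POOL_SIZE];
    int        free_head;   /* first free slot, or -1 when exhausted */
} ProjectilePool;

/* Pay the setup cost once, up front, and chain every slot into a free list. */
void pool_init(ProjectilePool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        p->items[i].active = 0;
        p->items[i].next_free = (i + 1 < POOL_SIZE) ? i + 1 : -1;
    }
    p->free_head = 0;
}

/* "Spawn" by reusing a slot; returns NULL when the pool is exhausted. */
Projectile *pool_acquire(ProjectilePool *p)
{
    if (p->free_head < 0)
        return NULL;
    Projectile *proj = &p->items[p->free_head];
    p->free_head = proj->next_free;
    proj->active = 1;
    return proj;
}

/* "Destroy" by returning the slot to the free list for later reuse. */
void pool_release(ProjectilePool *p, Projectile *proj)
{
    assert(proj >= p->items && proj < p->items + POOL_SIZE && proj->active);
    proj->active = 0;
    proj->next_free = p->free_head;
    p->free_head = (int)(proj - p->items);
}
```

Acquire and release are both O(1) and never allocate, so spawn cost per frame stays flat. The trade-off is a hard cap on live projectiles, which the caller has to handle when pool_acquire returns NULL.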

teamonkey · on April 8, 2025

Pooling is pretty standard practice though, it would be the go-to solution for any experienced gameplay programmer when dealing with more than a dozen entities (though annoyingly there isn’t a standardised way of doing it in Blueprint).

dijit · on April 8, 2025

To be completely fair though, blueprints themselves are oft-maligned for performance.

They're fantastic for prototyping, but once you have designed some kind of hot-path most people typically start converting blueprints to code as an optimisation.

In such a scenario adding pooling becomes a trivial part of such an effort.

cma · on April 8, 2025

Standard practice, but it bit Epic by surprise. You wouldn't think it would be needed at such small numbers. You wouldn't automatically think it would be needed on 3+ghz machines.

Pxtl · on April 8, 2025

I've never played with UE and so I'm kinda shocked to learn that there isn't pooling already for objects that have this kind of creation cost.

bengarney · on April 6, 2025

As someone on the hiring side, I strongly agree with GP.

I can think of a specific example where someone with experience and strong qualifications pushed for a higher salary - which I agreed to - then struggled with the role and ended up not sticking around. In another instance, someone had lower qualifications and experience, but also negotiated hardest out of their hiring cohort - same outcome, plus they weren't a great fit personality wise.

Meanwhile, I can think of several other people who cross-trained from their initial non-technical careers at the local community college, came in with low experience, didn't negotiate aggressively (although did stand up for themselves)... They've done great work (and grown substantially and been good to work with) over the long term, and seem to enjoy working for me enough that a few who left for other jobs were interested in being hired again later on.

Negotiating employment terms is the first task you complete at a new job. It is a good predictor. If it leaves a bad taste in the mouth for either side, it's not a good sign of things to come...

bengarney · on Feb 3, 2025

I have a product that does exactly this. E-mail me at ben AT theengine DOT co, I'd love to show it to you and see if it would help.

bengarney · on Feb 3, 2025

This stuff is such a PIA to parse. I assume it's just different teams doing different features over the years, and being alternately repulsed/seduced by each format. Probably features are implemented as libraries so there isn't a master oversight - they aren't trying to make iMessage's internal formats follow a consistent plan, just let all the libs coexist...

meibo · on Feb 3, 2025

Maybe they should be repulsed, considering all of the journalists that are getting persecuted and/or murdered because they are getting pwned through iMessage serialization bugs :)

pixel_tracing · on Feb 3, 2025

As someone who used to work on that team, it’s so interesting hearing thoughts from the external public on the team.

ajtaylor · on Feb 4, 2025

I would love to hear your thoughts as an insider.

bengarney · on Dec 21, 2024

Given what's there today, especially the sizzle reel, I'm pretty dubious.

If the author drops an amazing generative text-to-sim system on top of this... THAT is impressive - but effectively orthogonal to what's there - so I'm withholding excitement for now.

Take the time to read over the repo. It is not revolutionary. It is an integration of a bunch of third party packages (which are largely C/C++ libraries with Python wrappers, not "pure python"!). The stuff unique to Genesis is adequate implementations of well-known techniques, or integration code.

The backflip is awesome but plausibly explained by the third party RL library, and they include an example program which... runs a third party library to do just this.

The performance numbers are so far beyond real world numbers as to be incoherent. If you redefine what all the words mean, then the claims are not comparable to existing claims using the same words. 43 million FPS means, if my math is right, you are spending 70 clocks per frame on a 3ghz processor. On a 4080 you would have ~500k clocks in the same period, but that implies 100% utilization with zero overhead from Amdahl's law. (Also, Hi, Erwin, maybe you think these claims are 100% realistic for meaningful workloads in which case I'll gladly eat crow since I have a huge amount of respect for Bullet!)
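The arithmetic above can be checked with a small sketch. The function names are hypothetical, and the GPU figures in the usage note (about 9,728 CUDA cores at roughly 2.5 GHz for an RTX 4080) are assumed round numbers, not values from the comment.

```c
#include <assert.h>

/* Clock cycles available per simulated frame on a single core. */
double clocks_per_frame(double clock_hz, double frames_per_second)
{
    assert(frames_per_second > 0.0);
    return clock_hz / frames_per_second;
}

/* Upper bound for a GPU: every core busy on every cycle, with no overhead. */
double gpu_core_clocks_per_frame(double cores, double clock_hz, double frames_per_second)
{
    return cores * clocks_per_frame(clock_hz, frames_per_second);
}
```

clocks_per_frame(3e9, 43e6) comes to just under 70 cycles. gpu_core_clocks_per_frame(9728, 2.5e9, 43e6) comes to roughly 565,000 core-cycles, the same ballpark as the ~500k estimate, and only reachable at perfect utilization.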

I can only judge what's released now, not a theoretical future release, and what's here now is something a really good developer could bang out in a couple of months. The USP is the really good spin around the idea that it's uniquely suited for AI to produce beyond-SOTA results.

I have slightly longer form thoughts: https://x.com/bengarney/status/1869803238389887016

bengarney · on July 5, 2024

I wrote a tool to do automated QA on internet video (HLS/DASH, tech used for Netflix, YouTube, Twitch, etc.).

It evaluates streams against a database of 100 or so "quirks" that identify either general issues or issues that will only manifest on certain player libraries. For instance, specific "in spec" encodings which are actually non-standard in practice get flagged.

Built on TypeScript/node/Docker over the course of maybe 18 months. Used it fairly often when I was working in the space, not at all these days. Originally the plan was to license it as an enterprise vid tool.

(I've been considering open-sourcing it - would YOU use it if so?)

klabetron · on July 6, 2024

I’d be interested, if for no other reason than to see if some of the hiccups I see in streaming video recordings are more common than just me/just random.

vivekv · on July 6, 2024

I am definitely curious about a tool like this. I work with a lot of video streams and this collective knowledge of quirks might be useful as a QA tool