Hacker News

It seems like a bit of an insult to the colleagues who managed to design a 50+ billion transistor GPU that their product gets messed up by something like this. Talk about the last mile.


You are right of course, but that 50+ billion transistor design is mostly copy-pasted from a small number of blocks.

For example, the RTX 4090 has 16,384 cores, so you have to divide 50+ billion by that number. Then inside every core you have lots of similar repetition, etc.

The biggest challenge is to create a die that size with little to no defects, which is of course the foundry's achievement.
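
To put a rough number on that division, here is a back-of-the-envelope sketch. The function name is made up for illustration, and it assumes every transistor belongs to a shader core, which is not the case (caches, memory controllers, and other unit types take a share), so treat the result as a naive average only:

```python
# Rough figures taken from the comments above; both are approximations.
TOTAL_TRANSISTORS = 50_000_000_000  # "50+ billion" as quoted, a lower bound
SHADER_CORES = 16_384               # RTX 4090 core count as quoted


def transistors_per_core(total: int, cores: int) -> float:
    """Naive average that ignores caches, controllers, and other shared blocks."""
    return total / cores


per_core = transistors_per_core(TOTAL_TRANSISTORS, SHADER_CORES)
print(f"about {per_core:,.0f} transistors per core (naive average)")
```

That works out to roughly three million transistors per core under those assumptions, which is still a substantial design unit even before the repetition inside each core is accounted for.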


I get the point but that's also a little like saying Windows is quite simple software because it is simply full of 'if' and 'for' and 'while'.

A modern great CPU or GPU chip is not that simple to design. It is not just an ALU copy pasted thousands of times. Otherwise there would be more competition.


> I get the point but that's also a little like saying Windows is quite simple software because it is simply full of 'if' and 'for' and 'while'.

Not really, the cores are really similar units.

> A modern great CPU or GPU chip is not that simple to design.

Of course not. For starters, there are not just the shader cores (a 4090 also has 512 TMUs, 176 ROPs, 128 RT cores, and 512 tensor cores). You need to design all of these before you can replicate them, and then you need to design the common components.

But the point was that the 50 billion transistors were not designed individually; there is a large amount of block reuse. Your analogy would hold if Windows inlined everything.


I just used that original phrase as another way of saying 'extremely sophisticated hardware'. Didn't mean for people to get hung up on the number.


"Extremely sophisticated" has many axes. An "extremely sophisticated" car might mean the engine is sophisticated, or the dashboard electronics are. The two are rarely correlated.

NVidia clearly has spectacular software and digital hardware designers.

Mechanical, analog, and power engineering are entirely different disciplines.


I know. The story is about a crappy pack-in cable, and how it's snatching defeat from the jaws of victory regarding product sentiment.


To be fair, judging from the latest tear-down video from GN, their mechanical and heat engineering team is also on top of their game.


Not really, because nVidia already had previous designs as a starting point. They seem to be getting mostly bigger (more replication), and the most important thing to consider when growing like that is power distribution.


This is not a good way to look at complex systems. Both designing a GPU chip and making it are too complex to use reductive reasoning, especially when comparing one thing to the other. It’s foolish.

It doesn’t add anything to OP’s point that it took so much effort to engineer a GPU only to be messed up by a stupid connector.


How about building a space shuttle carrying 7 humans to space but it fails because of a <$1 o-ring? Then knowing that engineers had sounded specific warnings prior that were ignored by management.

We've seen these failure stories time and again to various repercussions, but they seem to all come down to some form of greed whether it be reputational or monetary.


Note that the O-rings were 12 feet in diameter - they encircled the entire solid rocket booster. They definitely didn't cost less than a dollar.


Sorry, I confused the prices of the failed parts on Challenger and Apollo 13.

So, yeah, how about building a rocket designed to send three humans to the moon, only to have it fail because of some <$1 part?


> The cause of the disaster was the failure of the two redundant O-ring seals in a joint in the shuttle's right solid rocket booster (SRB).

https://en.m.wikipedia.org/wiki/Space_Shuttle_Challenger_dis...


This is what failure is like: there’s a lot of surface area, and sometimes it’s things you never thought of worrying about.

I would bet the cause comes down to minor manufacturing defects, plus testing that was “too good” at plugging things in and did not account for sloppy end-user assembly.

Combined with good old inadequate safety margins.


I agree with that, but, glibly, I'd argue with this:

> things you never thought of worrying about.

In my experience, it's nearly always the connectors that you suspect first.


Also, power supply design (including power cables) is a really foundational piece for any electronic device, and the one that typically causes complete failure/letting the blue smoke out if not done carefully. And of course, safety/fire issues if screwed up particularly badly.

It is weird that in their push to an all-time-high TDP they didn’t spend more time thinking about it. That said, cutting edge often means bleeding edge, something something.


Self-insult for the company overall. But yeah must feel bad for those not responsible. Those responsible probably don’t care, which is exactly the problem.



