Hacker News | ejb503's comments

Controversial opinion but the problem with Gen AI is that we conflate intelligence with competence and expertise. This is a common ailment of the exceptionally intelligent.

There is an over-indexing on mental capacity, and an underappreciation of grinding and actual, erm, hard work and understanding…

I see this a lot with learning a foreign language. Intelligent people assume it will be simple for them, because they are intelligent. Spoiler: less intelligent folk tend to have less ego, ergo they progress much faster (personal experience, not gospel).

I don’t really care about AGI. I care about how good you are at your job. And the fact of the matter is, Gen AI isn’t that good at all.

Generative AI is misunderstood as a sentient tool that is highly intelligent and thus capable of expert output. Truth is, Generative AI is capable of performing enormous amounts of work to an intermediate level.

Given the speed of the output, we forgive Generative AI for errors and inaccuracies that we wouldn’t begin to tolerate in any walk of life, i.e. from absolutely anybody who actually knew what they were doing. "Sorry sir, it wasn't the correct arm I amputated, but it was still an arm!"

Is AGI really imminent? Probably, but only to a specific and flawed definition. There is an important distinction. Intelligence is not a synonym for, or equal to, competence. We are conflating intelligence with expertise and competence. This is a common problem in hyper-intelligent people.

These algorithms may well reach superhuman levels of intelligence. Does it matter? As regards actual expertise and output, they are mediocre at best.

To a beginner in any field, intermediate is impressive. Unparalleled speed alongside something that resembles (but is not equal to) competence has the effect that those who are starting in any field are in awe of this tool. They are the confident graduate, unknowingly ruining the professional competence of your business.

Full blog post about why I think GenAI is cracking nuts with a sledgehammer


All these APIs are doing is converting audio to text, processing it through a language model, and then converting it back to audio. It might seem sophisticated on the surface but underneath it's just text generation in a robot's voice. Misses all the important details of audio interactions.

I used ->

* Llama 3 (on Groq)
* WebSpeechRecognition API
* Deepgram (TTS)

Each individual system is comprehensive and reasonably mature, but glue them all together on our proverbial pig in lipstick and there is no real understanding of the nuances of audio interactions.
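To make the point concrete, here is a minimal sketch of that glue, assuming the speech-to-text stage hands the model nothing but a plain transcript. The `AudioClip` fields, the stage signatures, and the stub stages are all hypothetical stand-ins, not the actual Groq, Web Speech, or Deepgram APIs (which are asynchronous and look nothing like this).

```typescript
// Hypothetical audio container: the timing fields stand in for the
// "nuances" of a spoken turn that a plain transcript cannot carry.
interface AudioClip {
  samples: number[];        // stand-in for PCM data
  pauseBeforeMs: number;    // hesitation before the speaker started
  overlapsSpeaker: boolean; // true if the user talked over the assistant
}

// Hypothetical stage signatures (real clients would be async).
type SpeechToText = (clip: AudioClip) => string;
type LanguageModel = (prompt: string) => string;
type TextToSpeech = (text: string) => AudioClip;

interface VoiceStages {
  stt: SpeechToText;
  llm: LanguageModel;
  tts: TextToSpeech;
}

// One conversational turn: audio -> text -> text -> audio.
function runVoiceTurn(input: AudioClip, stages: VoiceStages): AudioClip {
  const transcript = stages.stt(input); // only the words survive this step
  const reply = stages.llm(transcript); // the model never sees the audio
  return stages.tts(reply);
}

// Stub stages so the sketch runs without any external service.
const stubStages: VoiceStages = {
  stt: () => "yes",
  llm: (prompt) => `You said: ${prompt}`,
  tts: (text) => ({
    samples: text.split("").map((ch) => ch.charCodeAt(0)),
    pauseBeforeMs: 0,
    overlapsSpeaker: false,
  }),
};

const confidentYes: AudioClip = { samples: [1, 2, 3], pauseBeforeMs: 0, overlapsSpeaker: false };
const hesitantYes: AudioClip = { samples: [1, 2, 3], pauseBeforeMs: 2500, overlapsSpeaker: true };

const replyToConfident = runVoiceTurn(confidentYes, stubStages);
const replyToHesitant = runVoiceTurn(hesitantYes, stubStages);
```

In this sketch a quick "yes" and a reluctant "yes" spoken over the assistant produce identical replies, because the only thing that crosses the first boundary is the string.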


Happy to answer any questions!


Not quite so easy as the blog makes out... didn't see any mention of TURN and STUN servers, and multi-peer adds layers of complexity...

To stably build a negotiation system you'll probably need an infrastructure of websockets and some kind of NoSQL DB to handle identity and other quirks around negotiation...

Example... how do you handle refresh from a new tab or after the connection has dropped... some kind of device signature is probably needed too!!

(We've just spent a year building this for ecommerce @ https://yown.it)

BIG thumbs up for the interest in WebRTC though, enormous potential...


WebRTC is complicated, it's been around for a while and support in browsers has not been great in the past, which might be why Zoom first used WebSockets for video. They use WebRTC now though, and WebRTC is fine now, it is the standard, but potential is not the right word.

Have a look at WebTransport to see a future alternative with potential.

For those who are interested, the technical term is signalling (not negotiation), and there are many providers that will help with that (ably.com, pubnub.com, pusher.com), you don't need to build your own infrastructure. WebSockets is also just one option.
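For anyone wondering what a signalling layer actually has to do, here is a rough in-memory sketch: it only forwards offer, answer, and ICE candidate messages between named peers. The `SignallingRelay` class and the message shape are invented for illustration; a real deployment would sit behind WebSockets or one of the hosted providers above.

```typescript
type SignalKind = "offer" | "answer" | "candidate";

// Hypothetical message envelope; payload would be SDP or a serialized
// ICE candidate in a real application.
interface SignalMessage {
  kind: SignalKind;
  from: string;
  to: string;
  payload: string;
}

type Deliver = (msg: SignalMessage) => void;

class SignallingRelay {
  private peers = new Map<string, Deliver>();

  // Register a delivery callback for a peer (e.g. its socket's send).
  join(peerId: string, deliver: Deliver): void {
    this.peers.set(peerId, deliver);
  }

  leave(peerId: string): void {
    this.peers.delete(peerId);
  }

  // Forward a message to its target. Returns false if the target is not
  // connected, so the caller can decide whether to retry or queue.
  send(msg: SignalMessage): boolean {
    const deliver = this.peers.get(msg.to);
    if (deliver === undefined) {
      return false;
    }
    deliver(msg);
    return true;
  }
}

// Usage: alice offers, bob answers, then bob drops off.
const relay = new SignallingRelay();
const aliceInbox: SignalMessage[] = [];
const bobInbox: SignalMessage[] = [];
relay.join("alice", (m) => aliceInbox.push(m));
relay.join("bob", (m) => bobInbox.push(m));

const offerSent = relay.send({ kind: "offer", from: "alice", to: "bob", payload: "sdp-offer" });
const answerSent = relay.send({ kind: "answer", from: "bob", to: "alice", payload: "sdp-answer" });
relay.leave("bob");
const lateCandidateSent = relay.send({ kind: "candidate", from: "alice", to: "bob", payload: "ice" });
```

The relay never inspects the payload, which is the whole point: WebRTC leaves the transport for these messages up to the application.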

Using an SFU/MCU is almost a requirement for multi-person calls, becoming more important for bigger groups.

I had a look at yown.it, I don't know what it does, your description of it is a bit vague. Those problems you mention are not hard to solve: "device signature"? You just set a cookie. Connection dropped? Cookie got you covered. New tab? Cookie got you covered. Refresh? Cookie got you covered.
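A back-of-the-envelope sketch of why group size matters, assuming one media stream per participant and no simulcast or stream pausing. The function and type names are made up for illustration.

```typescript
type Topology = "mesh" | "sfu" | "mcu";

interface StreamLoad {
  uploads: number;   // streams each participant sends
  downloads: number; // streams each participant receives
}

// Per-participant stream counts for a call with `participants` people.
function perParticipantLoad(topology: Topology, participants: number): StreamLoad {
  if (Math.floor(participants) !== participants || participants < 2) {
    throw new RangeError("participants must be an integer of at least 2");
  }
  const others = participants - 1;
  if (topology === "mesh") {
    // Every peer sends to, and receives from, every other peer directly.
    return { uploads: others, downloads: others };
  }
  if (topology === "sfu") {
    // One upload to the server, which forwards everyone else's stream.
    return { uploads: 1, downloads: others };
  }
  // MCU: one upload, one mixed stream back.
  return { uploads: 1, downloads: 1 };
}

const meshOfSix = perParticipantLoad("mesh", 6);
const sfuOfSix = perParticipantLoad("sfu", 6);
const mcuOfSix = perParticipantLoad("mcu", 6);
```

With six people, a mesh has every browser encoding and uploading five streams, while an SFU keeps that at one upload each; the download side only shrinks with an MCU, at the cost of server-side mixing.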

Other interesting technologies are:

Twilio's network traversal service: https://www.twilio.com/stun-turn

Agora's higher level products (e.g. video call, voice call) https://www.agora.io/en


Essentially we enable in-browser comms (including but not limited to WebRTC for video and audio streams on top of storefronts).

Given we allow anonymous connections, we need to associate each WebRTC connection with user-defined data (read: user profile). It's not quite as simple as "a cookie" because one user can have multiple devices, updated user information has to sync across the other connections, and for a smooth experience you have to keep connection statuses in sync.
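Roughly, the bookkeeping looks like the sketch below: one user id maps to several device connections, and a profile change fans out to all of them. This is an illustrative in-memory version with invented names (`UserSessions`, `attach`, `updateProfile`), not our production code.

```typescript
interface Profile {
  displayName: string;
}

type Notify = (profile: Profile) => void;

class UserSessions {
  private profiles = new Map<string, Profile>();
  private devices = new Map<string, Map<string, Notify>>();

  // Attach a device connection to a user; a device that joins late
  // immediately receives the current profile, if there is one.
  attach(userId: string, deviceId: string, notify: Notify): void {
    let userDevices = this.devices.get(userId);
    if (userDevices === undefined) {
      userDevices = new Map<string, Notify>();
      this.devices.set(userId, userDevices);
    }
    userDevices.set(deviceId, notify);
    const current = this.profiles.get(userId);
    if (current !== undefined) {
      notify(current);
    }
  }

  detach(userId: string, deviceId: string): void {
    const userDevices = this.devices.get(userId);
    if (userDevices !== undefined) {
      userDevices.delete(deviceId);
    }
  }

  // Store the new profile and push it to every attached device.
  // Returns how many devices were notified.
  updateProfile(userId: string, profile: Profile): number {
    this.profiles.set(userId, profile);
    const userDevices = this.devices.get(userId);
    if (userDevices === undefined) {
      return 0;
    }
    userDevices.forEach((notify) => notify(profile));
    return userDevices.size;
  }
}

// Usage: phone and laptop are online, tablet joins after the update.
const sessions = new UserSessions();
const phoneSeen: string[] = [];
const laptopSeen: string[] = [];
const tabletSeen: string[] = [];
sessions.attach("user-1", "phone", (p) => phoneSeen.push(p.displayName));
sessions.attach("user-1", "laptop", (p) => laptopSeen.push(p.displayName));
const notified = sessions.updateProfile("user-1", { displayName: "Sam" });
sessions.attach("user-1", "tablet", (p) => tabletSeen.push(p.displayName));
```

A cookie can tell you which browser this is; it cannot tell the laptop that the phone just changed the display name.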

We did look at syncing all this with RTC data channels, problem... you can't get message history and you also can't depend on the channel until after a successful negotiation, which again for us is only part of the larger infrastructure...

This forces the use of a parallel comms system such as websockets, allowing for event based synchronisation as well as the organisation of the WebRTC metadata both pre and post connection...
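As a sketch of what that parallel channel buys you, here is a tiny sequence-numbered event log that lets a client which refreshed or dropped off ask for everything it missed. The `EventLog` class and the event names are hypothetical; in practice this would live server-side behind the websocket.

```typescript
interface LoggedEvent {
  seq: number;  // monotonically increasing, starts at 1
  type: string;
  data: string;
}

class EventLog {
  private events: LoggedEvent[] = [];

  // Record an event and assign it the next sequence number.
  append(type: string, data: string): LoggedEvent {
    const event: LoggedEvent = { seq: this.events.length + 1, type, data };
    this.events.push(event);
    return event;
  }

  // Everything after the last sequence number the client saw.
  // Pass 0 to get the full history.
  since(lastSeenSeq: number): LoggedEvent[] {
    return this.events.filter((e) => e.seq > lastSeenSeq);
  }
}

// Usage: a client sees the first event, drops off, then catches up.
const log = new EventLog();
log.append("status", "caller-online");
const lastSeenByClient = 1;
log.append("profile", "displayName=Sam");
log.append("status", "call-ringing");
const missed = log.since(lastSeenByClient);
const fullHistory = log.since(0);
```

A data channel cannot do this on its own terms: it does not exist before negotiation succeeds and it keeps no history for whoever connects next.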

Most people don't want "naked javascript" with two faces on it, and WebRTC is a fantastic tool for video and audio streaming, however it is limited in its wider use (which is perfectly fine; it does enough!)...

I think the problem is that people associate "video chat" with simply the media streaming, whereas the reality is that integrating it into a feature-rich front end framework is significantly more complicated, and not simply a case of "adding a cookie".

The difference between the solutions you posted and websockets is, as far as I can tell, "your own websockets" or "pay someone else to run your websockets".


What do you mean by anonymous connections? Without them being logged in and you actively tracking them, it is anonymous. You'd be reinventing the wheel to de-anonymize the user if you want to track users across devices, which is certainly not anonymous: existing companies use advertising IDs or cookies, depending on the problem. There is no way you can identify users across devices (or solve this problem) better than Google and Facebook, since you run in 1 application, they run in almost all of them.

"We did look at syncing all this with RTC data channels": that's when you use a reliable service with additional functionality like history and presence, not WebRTC data channels; that might be why you struggled. It sounds like you should be using WebSockets for this type of data.

It sounds like you're trying to build chat for ecommerce websites, but isn't that Intercom or tidio.com (a free-tier alternative)? Agora is lower level, but also solves these problems and more: messaging, audio, video calls. I don't think any of these offer cross-device identification without having users log in on all their devices.


"I don't think any of these offer cross device identification without having users log in on all their devices."

Exactly, we have...

I wrote a little blog about it: https://yown.it/live-video-call-webrtc

If that doesn't explain it well enough I'll just assume you are being intentionally aggressive!

None of the above solutions enable users to easily manage personalized commerce experiences without paying a developer!!


Since you advertised yourself as a solution to some problem, I first wanted to find out what the problem was, and then see how you are solving it. I don't know either at the moment (I've read the Product Hunt page too). I did visit the yown.it website, and still didn't understand. Now I have read that blog post, and I still don't understand. That blog post served to explain that you didn't read enough about WebRTC before trying it. You didn't know that WebRTC doesn't specify signalling, but this is quite literally a basic concept in WebRTC; have a read of the introduction section: https://www.w3.org/TR/webrtc/#introduction

> Exactly, we have...

I guess it also works when using TOR...


Another great alternative: https://jitsi.org/projects/


https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API/... Also, the term for a signalling server is "signalling"; the term for negotiation is "negotiation".

