Hacker News | ikari_pl's comments

I think the only problem may be how it's phrased. I don't mind technology checking if I'm alive and awake while operating a two-tonne ballistic bullet in public.

I do mind, however, if the data is not discarded immediately once it has served its real-time safety purpose.


To me, as a non-native speaker, it feels like a series of interruptions and focus changes with no natural flow. Hard to follow.

This is basically the MICR font: Magnetic Ink (!) Character Recognition. Amazing idea.

https://en.wikipedia.org/wiki/Magnetic_ink_character_recogni...


As a perfectionist, I twitched ;-)

How do you define a new idea?

To me, it's rearranging the information you had in a way that hasn't been applied or published before.

That's literally what LLMs are built for.


Are the prompts used both by the desktop app, like typical chatbot interfaces, and Claude Code?

Because it's a waste of my money to check whether my Object Pascal compiler doesn't develop eating disorders, on every turn.


> Claude keeps its responses focused and concise so as to avoid potentially overwhelming the user with overly-long responses. Even if an answer has disclaimers or caveats, Claude discloses them briefly and keeps the majority of its response focused on its main answer.

I am strongly opinionated against this. I use Claude in some low-level projects where these answers are saving me from making really silly things, as well as serving as learning material along the way.

This should not be Anthropic's hardcoded choice to make. It should be an option, with the system prompt built modularly.
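A minimal sketch of what a modular system prompt could look like, purely as an illustration. The module names and their wording below are hypothetical and say nothing about how Anthropic actually assembles its prompts:

```python
# Hypothetical prompt modules; names and wording are illustrative only.
PROMPT_MODULES = {
    "core": "You are a helpful assistant.",
    "concise": "Keep responses focused and concise; mention caveats briefly.",
    "detailed_caveats": "Explain caveats and pitfalls in full when they matter.",
}

def build_system_prompt(enabled, modules=PROMPT_MODULES):
    """Join the enabled modules, in the given order, into one prompt."""
    unknown = [name for name in enabled if name not in modules]
    if unknown:
        raise KeyError(f"unknown prompt modules: {unknown}")
    return "\n\n".join(modules[name] for name in enabled)

# A low-level developer might opt out of "concise" and into full caveats.
low_level_prompt = build_system_prompt(["core", "detailed_caveats"])
print(low_level_prompt)
```

The point of the design is only that the "concise" behavior becomes one switchable piece instead of a fixed part of every conversation.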


Agreed. Sprawling system prompts like that are built for the lowest common denominator, nerfing the model for anyone, or any occasion, going further.


You do realize that similar biases are also present in the training data?


I do, it's inevitable, but IME the prompts force certain behaviors at a similar strength (instruction following). It's one thing for the model to be biased in a particular direction by its latent space; it's another for it to be biased by an unmodifiable prompt that can only be contradicted, for the benefit of the lowest common denominator at the expense of the more involved operator.


Sure, but now we have to remodel whatever bias we want for our use case with every new release because the system prompt changes, whereas the underlying data does not.


Underlying data changes all the time, as do training methodologies / preferences.

You do realize that these LLMs are trained with a metric ton of synthetic examples? You describe the kind of examples / behavior you want, let it generate thousands of examples of this behavior (positive and negative), and you feed that to the training process.

So this type of data is cheap to change, and often not even stored (one LLM is generating examples while the other is training in real-time).

Here's a decent collection of papers on the topic: https://github.com/pengr/LLM-Synthetic-Data
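The loop described above can be sketched roughly as follows. This is a toy illustration: `generate_example` is a hypothetical stand-in for a call to a generator model, and the record format is an assumption, not any lab's actual pipeline.

```python
import random

def generate_example(behavior, positive, rng):
    """Hypothetical stand-in for a generator LLM call.

    A real pipeline would prompt a model with the behavior description;
    here we just fabricate a placeholder record so the sketch runs.
    """
    kind = "follows" if positive else "violates"
    return {
        "prompt": f"user request #{rng.randint(0, 9999)}",
        "response": f"a response that {kind}: {behavior}",
        "label": 1 if positive else 0,
    }

def build_synthetic_dataset(behavior, n_per_class, seed=0):
    """Create n positive and n negative examples of one described behavior."""
    rng = random.Random(seed)
    data = [generate_example(behavior, True, rng) for _ in range(n_per_class)]
    data += [generate_example(behavior, False, rng) for _ in range(n_per_class)]
    rng.shuffle(data)  # mix positives and negatives before training
    return data

dataset = build_synthetic_dataset("keep answers concise", n_per_class=1000)
print(len(dataset), sum(ex["label"] for ex in dataset))
```

Changing the behavior description and rerunning is all it takes to produce a different dataset, which is why this kind of data is cheap to swap between releases.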


Well, I'd say it's a reasonable expectation for the model to behave similarly across releases. Am I wrong to assume that?

I imagine the system prompt can correct some training artifacts and drive abnormal behavior to the mean in the dimensions that Anthropic deems fit. So it's either that they are responding to their brittle training process, or that they chose this direction deliberately for a different reason.


Agree!

For low-level work, I recommend running tests as early as you can, verifying whatever information you got as you learn, and building a fundamental understanding.


Use the API then.


RIP bank account!


I usually need to remind it 5 times to do the opposite, because it makes decisions that I don't like or that are harmful to the project, so if it lands in Claude Code too, I have hard times ahead.

I try to explicitly request Claude to ask me follow-up questions, especially multiple-choice ones (it explains possible paths nicely), but if I don't, or when it decides to ignore the instructions (which happens a lot), the results are either bad... or plain dangerous.


It is a big problem that many people I know face every day. Sometimes we just wonder whether we are the dumb ones, since the demo shows everything just working.


"Approaching for landing"

"500 Our Servers Are Experiencing High Load"

"500 Our Servers Are Experiencing High Load"

"500 Our Servers Are Experiencing High Load"


We do.

I work with 3-5 parallel sessions most of the time. Some of the projects are related, some are not, some sessions are just managing and tuning my system configuration, whatever it means at a given time.

It doesn't feel weird to me.


3-5 parallel sessions for 8 hours a workday, fine. 5x8x20 = 800. How do we get to thousands?!
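Spelling that estimate out, assuming 20 workdays a month (the numbers come from the comment above; the names are just for illustration):

```python
HOURS_PER_DAY = 8
WORKDAYS_PER_MONTH = 20  # assumption: roughly four 5-day weeks

def monthly_session_hours(parallel_sessions):
    """Session-hours per month if every session runs the full workday."""
    return parallel_sessions * HOURS_PER_DAY * WORKDAYS_PER_MONTH

low, high = monthly_session_hours(3), monthly_session_hours(5)
print(low, high)  # 480 800
```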


In my OP I mention this is aggregated across both work + personal, so the comparison of just 8 hour workdays 5 days a week isn't accurate.

Running some `/stats` on my work computer shows for the last 30 days:

* Sessions: 341

* Active days: 21/30

* Longest session: 3d 20h 33m (Some large scale refactoring of types)

So I'm running a little over 10 sessions a day, each session varies from something like 1-2 hours to sometimes multiple days if it's a larger project. Running `/clear` actually doesn't start a new session fwiw, it will maintain the session but clear context, which explains why I can have a 3 day long session but I'm not actually using a single context window.

On the personal side I have activity in 30/30 of the last days (:yay); I've been learning game dev recently and use Claude a lot for helping digest documentation and learn about certain concepts as I try to build them in Unity. One of my more interesting use-cases is I have three skills I use during play tests:

* QA-Feedback: Takes random thoughts / feedback from me and writes to feedback markdown files

* Spec-Feedback: Loops every minute to grab a feedback item and spec out the intention / open questions

* Impl-Feedback: Loops every minute to grab a spec, clarify open questions with the user (me) first, then create an implementation plan

So I might have a friend play my game and I'll generate 20-30 items of feedback as I watch them play the game, things like minor bugs or mechanics in general. Over the course of the day my Claude will spec and plan out the feedback for me. I have remote sessions always on so I can use my phone to check in on the implementor job and answer open ended questions as they come up.
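A rough sketch of the three-stage flow above, modeled as plain Python queues. The function names, the data shapes, and the `answer` callback are all hypothetical; the actual skills are Claude Code prompts whose contents are not shown in this thread.

```python
from collections import deque

feedback_queue = deque()  # raw notes captured during a play test
spec_queue = deque()      # specs waiting for an implementation plan
plans = []                # finished implementation plans

def qa_feedback(note):
    """Stage 1: record a raw piece of feedback."""
    feedback_queue.append(note)

def spec_feedback():
    """Stage 2 (one loop tick): turn one feedback item into a spec."""
    if not feedback_queue:
        return False
    note = feedback_queue.popleft()
    spec_queue.append({"intent": note, "open_questions": [f"Scope of: {note}?"]})
    return True

def impl_feedback(answer):
    """Stage 3 (one loop tick): resolve open questions, then plan."""
    if not spec_queue:
        return False
    spec = spec_queue.popleft()
    answers = {q: answer(q) for q in spec["open_questions"]}
    plans.append({"intent": spec["intent"], "answers": answers})
    return True

qa_feedback("jump feels floaty")
qa_feedback("menu button overlaps HUD")
# Each call stands in for one once-a-minute loop iteration.
while spec_feedback():
    pass
while impl_feedback(lambda question: "keep it minimal"):
    pass
print(len(plans))
```

Keeping the stages decoupled through queues is what lets feedback pile up during a play test while the slower spec and planning steps catch up later.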

By the following day I'll usually have a bunch of plans ready for Claude to work on. I'll send agents off to do the simple ones throughout the day (bugs) and work with Claude on the bigger items.

Sorry for the long-winded explanation, but I'm trying to convey the level of usage I have with Claude Code. I do admit "thousands" is hyperbolic, as I'm probably only nearing 2k session hours in the most extreme months, but I would say that on average I use Claude every day in some capacity, oftentimes both during work and after work (for my hobbies).


Great, thank you for the detailed response! The biggest difference in our use is your "loops every minute", which I've not been willing to try yet (even with me at the helm, Claude might try to make a fairly straightforward bugfix in a cracked-out way and I have to steer it in the right direction).


Np!

I also love using `/loop` at work in combination with a PR maintenance skill. It helps me push up changes initially and have a session automatically monitor and fix up a branch to get it passing before I review it myself and later send it off for human review.

