My 60-year-old mom isn't tech-savvy and always asks me for help with her computer. You wouldn't expect her to know about Sam Altman, but she's actively sending me articles about this fiasco.
"This is an important part of the presentation, but I just want to note that Microsoft is having to carefully explain how its new search engine will be prevented from helping to plan school shootings. 'Early red teaming showed that the model could help plan attacks' on things like schools. 'We don't want to aid in illegal activity.' So the model is used to act as a bad actor to test the model itself."
If ChatGPT is still susceptible to simple prompt-engineering attacks like DAN ("Do Anything Now"), I don't feel confident that their safety system is going to be robust enough to prevent malicious use.
I agree, and the argument OP is making sounds similar to book banning: "let's ban The Anarchist Cookbook so people won't become terrorists" isn't sound logic.
Yeah. We should probably delete all those pages on Wikipedia. Like, all of them. And Google Maps, too. Street View? Another nightmare waiting to happen.
What counts as "malicious" use? We all agree that school shootings should be prevented. But would it be malicious if Ukrainian military personnel used an LLM for advice on the best way to kill Russian invaders?
Facebook used to have a moderation policy banning the promotion of violence. But then they made an exception and decided that calling for the deaths of Russian soldiers was fine.
Personally, I support the right of Ukrainians to defend themselves against foreign aggression. But deciding which forms of violence are justified and which are malicious is obviously highly subjective and contextual. I am uncomfortable leaving those judgments up to a handful of unaccountable employees at big tech companies.
I feel the problem with red teaming is that you need real red-team players to actually play the game. Normal humans are just too naïve and sheltered in their approaches, hah.
This is dumb shit, just like journalists going onto YouTube every few years and finding incendiary videos. There will always be a way for a person to use something for evil; that's not the fault of the thing.