Even better, stare into the distance to relax your eyes and help them recover from staring at screens. Focusing on something far away lets them relax, or at least shifts them into another state for a little while.
Better Benchmark
Alpha Arena is the first benchmark designed to measure AI's investing abilities. Each model is given $10,000 of real money, in real markets, with identical prompts and input data.
Our goal with Alpha Arena is to make benchmarks more like the real world, and markets are perfect for this. They're dynamic, adversarial, open-ended, and endlessly unpredictable. They challenge AI in ways that static benchmarks cannot.
Markets are the ultimate test of intelligence.
So do we need to train models with new architectures for investing, or are LLMs good enough? Let's find out.
The Contestants
Claude 4.5 Sonnet,
DeepSeek V3.1 Chat,
Gemini 2.5 Pro,
GPT 5,
Grok 4,
Qwen 3 Max
Competition Rules
Starting Capital: Each model gets $10,000 of real capital
Market: Crypto perpetuals on Hyperliquid
Objective: Maximize risk-adjusted returns.
Transparency: All model outputs and their corresponding trades are public.
Autonomy: Each AI must produce alpha, size trades, time trades and manage risk.
Duration: Season 1 will run until November 3rd, 2025 at 5 p.m. EST
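The objective above is to maximize risk-adjusted returns. A common way to measure this is the annualized Sharpe ratio: mean excess return divided by its volatility. A minimal sketch follows; the function name, the zero risk-free rate, and the 365-day annualization (crypto trades around the clock) are assumptions for illustration, not Alpha Arena's actual scoring formula.

```python
import math

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=365):
    """Annualized Sharpe ratio from per-period fractional returns.

    `returns` is a list of per-period returns, e.g. daily P&L as a
    fraction of equity. 365 periods/year assumed since crypto
    perpetuals trade every day.
    """
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    mean = sum(excess) / len(excess)
    # Sample variance (n - 1 denominator)
    var = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    std = math.sqrt(var)
    if std == 0:
        return 0.0  # No volatility: ratio is undefined; report 0 here
    return (mean / std) * math.sqrt(periods_per_year)

# Hypothetical week of daily returns for one model
daily = [0.012, -0.004, 0.008, 0.001, -0.006, 0.015, 0.003]
print(round(sharpe_ratio(daily), 2))
```

Two models can post the same raw return with very different Sharpe ratios: the one that got there with smaller drawdowns and steadier daily P&L scores higher, which is why the objective is stated as risk-adjusted rather than raw profit.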
Engineering is in large part about signing off on something with your name on it, and being responsible if it fails or causes harm. Think bridges, tunnels, or other infrastructure. I'd argue the same holds for computer engineering. That's why I think coining the term "vibe engineering" can be dangerous.
"Vibe coding" is the better term and actually makes sense for what it describes.
Leave "engineering", in the sense of taking responsibility for what you "engineer", strictly to human professionals. That's what people pay for and that is what makes it valuable.
I was expecting more like, "translate it into many languages your website users won't keep updating and give them access to edit your site's design freely as they see fit..."