I'm sure it's not a trivial task, but I think the experience would be much better by treating Siri like a real person.
Look at how much Android Voice Actions use the screen as a vital part of the experience, and how Siri commands chain like a conversation. I'd say the latter is more of a revolution than the former.
The very first thing you see in the Apple video is someone hitting the button on his headset to activate Siri. I guess that means you need to hit a screen/hardware button when not using the headset.
In the Android video, most of the screens that echo back the text time out and activate without interaction.
Many of the interactions result in information on the screen (e.g. the detailed weather forecast, or navigating to a webpage), so there's no need to build an interface that works without a screen.
The difference (if there even is one, it's hard to tell from these videos alone) is less drastic than you seem to be claiming.
Apple's approach could well go the way of AppleScript: they dress something that is fundamentally not natural language up as if it were, and while it initially seems neat in carefully controlled situations, it only leads to problems in reality. Time will tell.
Edit: just rewatched the ad; the non-screen responses are often useless. E.g. Q: "Is it going to be chilly in Napa Valley this weekend?" A: "Doesn't seem like it." Q: "How many cups are in 12 ounces?" A: "Let me think. Here you go" (displays the answer as text on the screen).
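For what it's worth, the answer the ad puts on screen is trivial arithmetic; a minimal sketch, assuming US customary units (1 cup = 8 fluid ounces):

```python
def ounces_to_cups(fl_oz: float) -> float:
    """Convert US fluid ounces to US cups (8 fl oz per cup, US customary)."""
    return fl_oz / 8.0

# The question from the ad: "How many cups are in 12 ounces?"
print(ounces_to_cups(12))  # 1.5
```

The point stands either way: a response this short could easily have been spoken instead of only displayed.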
It's hidden by the editing, but the woman who dictates her text clearly hits the "I'm going to talk to my phone now" button, then says "send" instead of just hitting the send button that sits an inch higher (obviously this would actually make sense if you were using a headset to hit that button). Because of the editing, I thought the Apple way was smoother than the Android one, which requires you to hit a button and then speak. After a careful review, though, it's clear that the differences between the two systems, as far as needing to consult or interact with the phone's screen goes, are mostly in the editing and/or the viewer's imagination.
If it actually works like that, then yes, it will be pretty amazing. But right now, it's just an ad. I think the Android demo was at least a bit more honest in showing the slight awkwardness of the interface. Of course, Apple wouldn't let facts get in the way of marketing. Remember the one where soldiers on the front line have time to demo FaceTime?
Edit: Remember, they haven't even got predictive text right yet.
The thing that bothers me about the whole thing is just how awkward it still is in real life. Honestly, how many of you say "REPLY (monotone voice). Blah blah blah" in any conversation, let alone in public?
It'd be much cooler if they allowed you to call Siri by a name.
"You have a text from Bob. It reads: are we still on for today?"
"Hey Siri, tell him we're still on. Thanks."
Alright, it's way too late.