Where it's not optimized away, getting "an actual zero" requires a memory operation of some kind. Register ops are faster because the operand is right there on the chip, no fetch needed.
Depends on the instruction set architecture. The 68000 had some ways of burying a small literal operand in the instruction itself. If I recall correctly, MOVEQ (which always operates on the full 32-bit register) would let you move zero to a register without touching memory (other than the instruction fetch), and it wasn't a long instruction: a single 16-bit word with the 8-bit literal embedded in it.
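For anyone who wants to see how the literal gets buried in the opcode, here is a rough sketch in C. The function names encode_moveq and moveq_result are made up for illustration. The bit layout (0111, three register bits, a 0, then the 8-bit literal) is the MOVEQ encoding as I remember it from the 68000 manuals, so check it against the reference manual before relying on it.

```c
#include <stdint.h>

/* Hypothetical helper: encode MOVEQ #imm,Dn as one 16-bit opcode word.
   Assumed layout: 0111 rrr 0 dddddddd
   (rrr = data register number, dddddddd = 8-bit literal). */
uint16_t encode_moveq(unsigned dreg, int8_t imm)
{
    return (uint16_t)(0x7000u | ((dreg & 7u) << 9) | (uint8_t)imm);
}

/* Hypothetical helper: the value the CPU would leave in Dn,
   i.e. the 8-bit literal sign-extended to 32 bits. */
int32_t moveq_result(uint16_t opcode)
{
    uint8_t d = (uint8_t)(opcode & 0xFFu);
    return (d >= 0x80u) ? (int32_t)d - 256 : (int32_t)d;
}
```

With this layout, MOVEQ #0,D0 comes out as the single word 0x7000, so the zero rides along with the instruction fetch and no separate operand fetch is needed.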
However, moving a zero to a register does take time, time that could otherwise be spent operating on the zero already sitting in the zero register.
The second best is what Motorola did.
As you point out, there is still the instruction fetch, and that fetch could have been for the operation you actually wanted rather than one spent just developing the zero.
On par with that is having enough registers to just hold a zero, and whether that makes sense depends on the need and the developer's strategy.
I am a big fan of the Motorola CPUs, starting with the 6809. Just to be clear.
But moving it from the zero register to another register would also take time. If what you want is a zero in a register other than the zero register (say, one that is going to serve as the index of a loop, which the zero register cannot do), then MOVEQ should not take any longer than a MOVE from the zero register to another register.
Say we are zeroing memory. No advantage there. Coupla cycles right at the start, then a ton of writes.
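A rough sketch of that case in C, just to show the shape of it. The function name zero_words is made up, and a real compiler may well turn the whole loop into a memset or a block-clear sequence, so treat this as an illustration rather than a recommendation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill `count` 32-bit words with zero.
   The zero is set up once, before the loop; after that every
   iteration is a store, however the zero was obtained. */
void zero_words(uint32_t *dst, size_t count)
{
    const uint32_t zero = 0;   /* one-time setup: a MOVEQ, or a zero register */
    for (size_t i = 0; i < count; i++) {
        dst[i] = zero;         /* the loop is dominated by these writes */
    }
}
```

Whether the zero came from a dedicated register or a one-off load, the setup cost is paid once and then disappears behind the writes.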
Say we are forming a bitmask. Could be an advantage there in that having a zero handy in a register means no fetching one. When a lot of dynamically created masks are needed, this can be a nice gain.
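Here is a minimal sketch of the mask case in C. The name make_mask and the `zero` constant standing in for a zero register are my own illustration, not anything from a particular chip. The assumptions are that lo is less than 32 and lo + width is at most 32.

```c
#include <stdint.h>

/* Hypothetical helper: build a mask with `width` bits set, starting at bit `lo`.
   Assumes lo < 32 and lo + width <= 32. */
uint32_t make_mask(unsigned lo, unsigned width)
{
    const uint32_t zero = 0;    /* stands in for the zero register */
    uint32_t ones = ~zero;      /* all ones, derived from zero in one operation */
    /* Shifting a 32-bit value by 32 is undefined in C, so handle it apart. */
    uint32_t low = (width >= 32) ? ones : ~(ones << width);
    return low << lo;
}
```

For example, make_mask(4, 8) gives 0x00000FF0. Every mask starts from the zero, so when masks are built over and over with changing lo and width, not having to fetch or load that zero each time is where the gain comes from.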
I'm sure we can come up with more. It's not always important, and like you mention with the moto designs, may not matter too much due to many other optimizations possible given a good instruction set.
Some people would rather have the register free for general use! I'm one of those, but if there is a zero register, I use it to get the benefit of it when I can. On the devices I've seen, there are generally a lot of registers so the marginal impact of having a zero register isn't significant. There are plenty to work with.
Maybe I should be clear here too. I personally don't care whether there is one. If it's there, I do things in ways that leverage it, and was just pointing out why devices that have one, ahem... have one! Those that don't may or may not have options that make sense. The way moto did it is very good, and there are other pretty great optimizations possible with their ISA, abusing the stack to write memory, etc...
If not, then I do other things. It's assembly language! Work the chip, right?