Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Source code of the file with the Zune bug (starts on line 249) (pastie.org)
58 points by vaksel on Jan 2, 2009 | hide | past | favorite | 33 comments


Took me a minute to figure out what was wrong. If IsLeapYear() returns true, and "days" equals 366, you get stuck in an infinite loop (you don't modify "days" at all).

This was the first leap year since Zune was released, and yesterday was the 366th day of the year, sooooo....

Also spotted in this code: several "goto"s.


Normally, gotos wouldn't _necessarily_ be bad, but at least one of these is really really bad.

Consider the function OALIoCtlHalInitRTC, lines 122 and 139-143. The last else block has no obvious terminator, but it looks like the else is empty because of the comment and the indentation. So the line after the label cleanUp is the body of the else. This means that if the if on line 131 is true, the cleanUp code never gets executed. Now this may be intentional (but I doubt it since the label is named cleanUp), but that should be documented in the code.

I consider this use of a goto to be legitimate, since it avoids duplication of the cleanup code (and the last thing you want is to have multiple similar blocks of code that need to be maintained/modified in tandem). This idiom often appears in the linux kernel IIRC. But this code is quite terrible since it doesn't use the idiom correctly, produces an unexpected execution path, is a problem to maintain, and is a problem to read. It seems that it ends up being a no-op since the line after the cleanUp label is a debugging line or something, but that doesn't make this a good practice.

But really, this isn't really a problem with the use of goto, it's a problem with the optional braces on single statement blocks in C. The goto and the label chosen just make it harder to spot. I've made it a habit to use braces around all blocks, even empty ones, to make the intent clearer and avoid this problem.


OALIoCtlHalInitRTC -- yeah, there's a great function name.


OAL is obviously a module name, though I can't figure out what it is. "Io Ctl" seems to be something with "IO control", although I can't figure out what IO has to do with it.

The rest reads "HAL (hardware abstraction layer) initialize RTC (real-time clock)".. and that's what it does.

This is from five minutes looking at the source. Very little college embedded programming. What more do you want?



From that function's comments:

// This function is called by WinCE OS to initialize the time after boot.

So how about calling it:

init_time

or maybe

init_time_after_boot


The reason it's not calleld init_time_after_boot - well, first of all Microsoft coding standards call for CamelCase. But secondly, there are several "times", and there are several ways to "initialize" something, and there are several meanings to "boot."

You're either going to have a naming conflict, or you're going to have a bunch of idiots who wanted to initialize a different clock, who start posting to forums saying "i don't understand???! i called init_time waht's wrong???"

Sure, they could call it

SendTheIOCTLCommandToInitializeTheRealTimeClockForTheOEMAbstractionLayer

but sometimes the names of functions are going to be ugly.


OAL is "OEM abstraction layer" and IOCTL probably means that the RTC is set by sending an IOCTL command.


...since it avoids duplication of the cleanup code (and the last thing you want is to have multiple similar blocks of code that need to be maintained/modified in tandem).

Isn't that the definition of "when to extract code into its own function"?


That may be hard to do if it's cleanup code, as the scope changes. Plus, it divorces, visually, the cleanup code from the main body. And there may be additional function call overhead (which may be avoidable by having the compiler in-line it). One reason to use goto is the style that all functions should have one entry and one exit. Even if the cleanup code was put into its own function, you'd still have to ensure that the function was called in all the right places before returning. Without gotos, the logic can become harder to follow through a series of if-else blocks.


In the case of clean up code where the duplicated code is usually specific to the current scope putting it in a function instead of a goto doesn't buy you anything really.


A lesson in code smell and good style: I took one look at the code and (before noticing the bug) thought, “why is there an inner if instead of just using a single conditional with a logical AND?”

That is to say, I could have fixed the bug without knowing what it was. (I'm a hs student and don't do much bug-shooting in code that wasn't written by me or my close friends, so I don't know if this is common.)


Unfortunately, it's much easier to introduce new bugs when refactoring other people's code.


C lets a lot of exciting things happen inside an if statement. I once spent a couple hours cleaning up a big, ugly and unfamiliar codebase with transformations like that before I realized I was ignoring some side-effects and was therefore introducing bugs willy-nilly. (Isn't version control great?)


Last I checked, there haven't been any major overhauls to the Gregorian calendar since C was invented. So why aren't we all using the same, fully-debugged date library yet?

Edit: Actually, I'm cutting it close :-). The last time we changed the calendar was in 1972, when we settled on integer leap seconds -- the same year that C was invented.


Hear, hear. It's strange how often we have to get bitten by date and time bugs. I think people ignore the wisdom of Peter van der Linden in the excellent book "Expert C Programming":

"Anyone who thinks programming dates and times is easy probably hasn't done much of it."


> Line #263: if (days > 366)

This should work if that was a >= sign. And people at work don't believe me when I say how insanely careful I have to be when I write any code, especially code that deals with people's payroll and vacation time. One missing equals symbol and every Zune in the world froze. One missing equals symbol and every employee could lose 1/12 of their accumulated vacation time.

The ability to pay attention to "mundane details" is just as important as the ability to envision the whole software application from the project manager point-of-view, especially when working in smaller companies without additional oversight.


Making that ">=" would make that entire "if" check redundant, since we already know it's at least 365 (because of the "while" condition)

I think the "while" condition should actually be

    while ((days > 365 && !IsLeapYear(year)) || (days > 366 && IsLeapYear(year)))
(or perhaps some simplification that I'm too lazy to figure out right now)

And then the inner "if" condition can be eliminated. Thus the final code could be:

    while ((days > 365 && !IsLeapYear(year)) || (days > 366 && IsLeapYear(year)))
    {
        if (IsLeapYear(year))
            days -= 366;
        else
            days -= 365;
        year += 1;
    }
This seems logical and easier to understand than the original too.

Edit: if you want to get ternary-operator fancy:

    while (days > (IsLeapYear(year) ? 366 : 365))
    {
        days -= IsLeapYear(year) ? 366 : 365;
        year += 1;
    }


Let's be a little more DRYer:

    days_in_year = IsLeapYear(year) ? 366 : 365;
    while (days > days_in_year) {
        days -= days_in_year;
        year += 1;
    }


How about a little more CORRECTer?

#define diy(y) (IsLeapYear(y) ? 366 : 355)

while (days > diy(year)) { days -= diy(year); year += 1; }


For the confused: the year changes each time, so you need to recheck IsLeapYear each time.

But I wouldn't do it with a define but rather:

while (days > (days_in_year = IsLeapYear(year) ? 366 : 365)) { days -= days_in_year; year += 1; }


Incorrect, since you don't update days_in_year on each loop iteration. If you really want...

    while (days > (days_in_year = IsLeapYear(year) ? 366 : 365)) {
        days -= days_in_year;
        year += 1;
    }


I prefer:

  for (;;) {
    int days_in_year = IsLeapYear(year) ? 366 : 365;
    if (days <= days_in_year) break;
    days -= days_in_year;
    year += 1;
  }


> One missing equals symbol and every Zune in the world froze.

Wrong. Completely ignoring all software engineering "best practices" was the problem. They could have reused a working library. They could have written unit tests tests. QA could have tested it. Other developers could have reviewed it, or written test cases without knowledge of how the code worked.

Basically, they ignored every rule for writing good code, and then ended up with a serious bug. Well, yeah. The whole "unit testing" "fad" exists for a reason -- it greatly improves the quality of software.

Many people take software development seriously enough to not let one brain fart cause a piece of software to brick thousands of devices.


Really, this looks like a rather silly way to put this function together. It doesn't even run in constant time! It might be a little complicated, due to the leap year rules, but it would probably not be too difficult to do this calculation without any loops.


No. That's incorrect. That would cause the year to go up on Dec. 31st (day 366), turning it into day 0. In fact, what you want is to change the while to an if statement. If it really does need to run as a while statement (to add more than one year at once?), then the alternative would be a break statement after line 271.


Someone at QA it's going to be fired. Corner cases like 1st and last days of a year(leap and not), 29th of February and the like are the first candidates for proper blackbox testing.

I wouldn't put all the blame on the programmer (who probably is now scared to death).


Source? (journalism, not code)

Do you have the rights to this code?

Some context would be nice...


It's allegedly from a Freescale clock driver (see the comments at the top). I'm trying to find the original on Freescale's site, unsuccessfully thus far.

If it is indeed a Freescale-supplied driver issue, it makes me wonder how many other devices failed yesterday...


I find myself wanting to see the version history for that function. Is that how it was originally written? Was the bug introduced by a previous bug-fix? By a refactoring? Something else?


And another leapyear bug on line 170? The year 2000 was a leap year, but here they seem to have the 0 and the 1 mixed up:

Leap = (Year%400) ? 0 : 1;


How did these guys (pastie.org) get hold of he Zune source code? I don't think Microsoft gives away any of their source to the public (except for some programs covered by MS's shared source license).


Freescale gives away this source for free. It's driver source for some of their hardware.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: