It took me some effort to understand the issue here, so here is an alternative explanation in case it helps someone.
First, the part that's independent of programming language. You may want to read about absolute time and civil time (e.g. from https://abseil.io/docs/cpp/guides/time) but if you don't, in short: “civil time” refers to something like “2019 May 26 at 2:45 pm in New York City” (or “in the America/New_York time zone”), which means (roughly) whatever time the locals in New York City (or a larger shared geo-political zone) would agree is 2:45 pm on that date. To convert this to an absolute time, or in other words to make sense of “2019-05-26 14:45 in America/New_York”, we need data about the real world as of that date: most obviously we need to know whether Daylight-Saving Time was in effect on that date, but also what conventions were in use at the time. (This also means it's hard to know for certain what such a notation in the future means in terms of absolute time, as possibly DST could be abolished or the dates when it comes into effect could change.)
It so happens that until standard time was adopted in November 1883, New York City kept its own local mean time, about 4 minutes ahead of what then became Eastern Time, so about 4 hours and 56 minutes behind GMT.
So, in any “correct” library, we should see the following respected:
• “2019 May 26 at 2:45 pm in New York” should mean “2019 May 26 at 18:45 UTC” (timezone is EDT, i.e. UTC minus 4 hours).
• “2019 Jan 26 at 2:45 pm in New York” should mean “2019 Jan 26 at 19:45 UTC” (timezone is EST, i.e. UTC minus 5 hours).
• “1883 Jan 26 at 2:45 pm in New York” should mean “1883 Jan 26 at 19:41 UTC” (timezone is... GMT minus 4 hours and 56 minutes).
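Once the right offset is known, the conversions in the list above are plain offset arithmetic. Here is a minimal sketch using only the standard library, with the offsets typed in by hand rather than looked up (the helper name `to_utc` is made up for illustration). The local-mean-time case uses a January 1883 date, since the tz database has New York on local mean time only until 18 November 1883, and the offset is rounded to whole minutes:

```python
from datetime import datetime, timedelta, timezone

def to_utc(civil, offset):
    """Attach a known fixed UTC offset to a naive civil time, then convert to UTC."""
    return civil.replace(tzinfo=timezone(offset)).astimezone(timezone.utc)

# (civil time in New York, UTC offset assumed to be in force at that moment)
examples = [
    (datetime(2019, 5, 26, 14, 45), timedelta(hours=-4)),               # EDT
    (datetime(2019, 1, 26, 14, 45), timedelta(hours=-5)),               # EST
    (datetime(1883, 1, 26, 14, 45), timedelta(hours=-4, minutes=-56)),  # local mean time
]

results = [to_utc(civil, offset) for civil, offset in examples]
for (civil, _), utc in zip(examples, results):
    print(civil.isoformat(sep=" ", timespec="minutes"), "->",
          utc.isoformat(sep=" ", timespec="minutes"))
```

The hard part, and the whole job of a time zone library, is choosing the offset for a given date; the arithmetic afterwards is trivial.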
----
Now the part that's Python-specific: the pytz library in Python provides two ways of constructing such a well-formed civil time. One is to call `.localize` on a timezone, and the other is to call `.astimezone` to convert from one civil time to its equivalent (the same absolute time) in another timezone, thus obtaining a new civil time. Both work properly for the three examples above.
Unfortunately, there's a third thing a programmer can do, which the documentation warns against (http://pytz.sourceforge.net/#localized-times-and-date-arithm...): passing one of pytz's timezone objects as the “tzinfo” parameter to the standard library `datetime` constructor. For all three examples above this yields 19:41 UTC, which is certainly consistent in its own way, but only the last one is correct. Oops.
The issue here is in the interaction between the “tzinfo” model of the standard-library `datetime` and pytz's timezone objects: the result is that when the two are used together in the above incorrect way, one ends up with a timezone that is a fixed offset from UTC, which is silly. A timezone like `America/New_York` is not a fixed offset from UTC: not only does it change twice a year, it also has changed in arbitrary ways in the past, and may change in arbitrary ways in the future.
(Note that “fixing” the offset of 4 hours 56 minutes to 5 hours would not solve any problems, as it would still be wrong for many months of each year; arguably, having an obviously incorrect result may even be better than a sometimes-correct one.)
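For anyone who wants to reproduce the contrast described in this comment, here is a short sketch. It assumes `pytz` is installed; the variable names are mine, and the default offset pytz falls back to depends on the tz data bundled with the installed version (for `America/New_York` it is local mean time, UTC minus 4:56):

```python
from datetime import datetime, timedelta

import pytz

ny = pytz.timezone("America/New_York")
naive = datetime(2019, 5, 26, 14, 45)

# Supported: let pytz pick the offset that applies on that date (EDT here).
localized = ny.localize(naive)
localized_utc = localized.astimezone(pytz.utc)

# Supported: convert an already-aware datetime into the zone.
from_utc = datetime(2019, 5, 26, 18, 45, tzinfo=pytz.utc).astimezone(ny)

# Warned against: datetime simply attaches the zone object as given,
# so the zone's default (earliest) offset is used instead of EDT.
misused = datetime(2019, 5, 26, 14, 45, tzinfo=ny)
misused_utc = misused.astimezone(pytz.utc)

print("localize   ->", localized_utc)
print("astimezone ->", from_utc)
print("tzinfo=    ->", misused_utc)
```

The first two agree on 18:45 UTC for the May example, while the `tzinfo=` version lands 56 minutes later.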
Indeed! That library, as incorporated into the jdk, seems to me like the gold standard of datetime programming in standard libraries. Good type-level distinctions, good defaults, good extensibility... chef kissing fingers
Ok, I don't understand though why the bug hasn't been fixed and is there any other widely used time localization library that makes the same mistake - not just in python but other languages?
I've updated the post to clarify that one should not use `pytz` at all and should instead use `dateutil`. The latter has much more reasonable behavior and is more actively maintained. I also included a link to commentary about pytz by dateutil's current maintainer.
Why hasn't the bug been fixed? I'm not sure! But looking at the code it seems like this problem has existed for the past 9 years. My guess is that the developers and maintainers never saw it as an actual problem, despite its ubiquity.
I believe the problem comes from a conceptual mismatch of what a timezone object should be. The datetime people envisioned a dumb object that just contains some constants, not the dynamic objects produced by pytz. Using .localize as emphasized in the pytz documentation solves the problem completely.
It would be less of a problem if pytz defaulted to the last offset in the database rather than the first.
I think your concept of "dynamic" and "static" is exactly backwards from mine. datetime specifies an API (tzinfo) for dynamic objects, so that a single object provides the time zone information for many datetimes. pytz's localize function attaches a different static object to each datetime, depending on which one is appropriate.
pytz defaulting to the last offset instead of the first would cause a lot more silent breakages, because it would probably be right about half the time, and wrong about half the time, and even when it's wrong it won't be obvious. Defaulting to the first value in the list is as close as pytz can come to failing loudly without actually raising an exception, because it's obviously wrong nearly 100% of the time.
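The "dynamic" versus "static" distinction in this sub-thread can be sketched with the standard library alone. Everything below is a toy: `toy_offset`, `DynamicZone` and `PytzLikeZone` are hypothetical names, and the offset rule is a crude stand-in for real tz data (local mean time before 18 November 1883, then "summer time" for April through October). It is not how pytz is implemented, only an imitation of the behaviour discussed here:

```python
from datetime import datetime, timedelta, timezone, tzinfo

LMT = timedelta(hours=-4, minutes=-56)
EST = timedelta(hours=-5)
EDT = timedelta(hours=-4)

def toy_offset(naive):
    """Toy rule, NOT real tz data: pick an offset from the wall-clock date."""
    if naive < datetime(1883, 11, 18):
        return LMT
    return EDT if 4 <= naive.month <= 10 else EST

class DynamicZone(tzinfo):
    """One shared object; the offset is computed from each datetime it is asked about."""
    def utcoffset(self, dt):
        return toy_offset(dt.replace(tzinfo=None))
    def dst(self, dt):
        return timedelta(hours=1) if self.utcoffset(dt) == EDT else timedelta(0)
    def tzname(self, dt):
        return "Toy/New_York"

class PytzLikeZone(tzinfo):
    """Used directly it only knows its first offset; localize() attaches the right fixed one."""
    def utcoffset(self, dt):
        return LMT  # ignores dt entirely
    def dst(self, dt):
        return timedelta(0)
    def tzname(self, dt):
        return "LMT"
    def localize(self, naive):
        return naive.replace(tzinfo=timezone(toy_offset(naive)))

may = datetime(2019, 5, 26, 14, 45)
dynamic_utc = may.replace(tzinfo=DynamicZone()).astimezone(timezone.utc)
misused_utc = may.replace(tzinfo=PytzLikeZone()).astimezone(timezone.utc)
localized_utc = PytzLikeZone().localize(may).astimezone(timezone.utc)
print(dynamic_utc.time(), misused_utc.time(), localized_utc.time())
```

With the dynamic object, `tzinfo=` is safe because the offset is looked up per datetime. With the pytz-like object, only `localize` gives the right answer, and the fallback is wrong in a way that is easy to spot.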
> I believe the problem comes from a conceptual mismatch of what a timezone object should be.
There seem to be two commonly understood meanings of timezone which are confused in UIs and APIs: a geographical area which follows a particular winter and summertime regime over the year; and a particular offset from UTC like GMT or BST. I've seen a lot of bad design rooted in this confusion.
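The two meanings map onto two different kinds of object in Python's standard library. This is a small sketch assuming Python 3.9+ (for `zoneinfo`) and a tz database available on the system; the variable names are illustrative:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")          # a region with seasonal rules
bst = timezone(timedelta(hours=1), "BST")   # a single fixed offset

winter = datetime(2019, 1, 26, 12, 0)
summer = datetime(2019, 7, 26, 12, 0)

# The region's offset depends on the date; the fixed offset never changes.
region_offsets = [d.replace(tzinfo=london).utcoffset() for d in (winter, summer)]
fixed_offsets = [d.replace(tzinfo=bst).utcoffset() for d in (winter, summer)]

print("Europe/London:", region_offsets)
print("BST:          ", fixed_offsets)
```

Labelling a winter timestamp "BST" is exactly the kind of mistake that comes from treating the two as interchangeable.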
Just like people don't use the http module but requests, it's been years since the community moved away from manual manipulation of datetime/pytz for time zones.
Nowadays people use higher-level libs such as pendulum.
In the end, pendulum doesn't require you to install a transpiler, 100 plugins and create a configuration file like the post you link to. But it does save you from bugs, and you don't need to be an expert in time to use it.
Just to be clear, I was not meaning to imply that pendulum isn't better, but rather that not everyone uses it. Nor does everyone use requests, although it's very clearly better than what's on the standard library.
Discoverability is a very big issue in general, and with these libraries in particular. I for example have been working with Python for almost a decade and follow the community quite closely, but can't recall hearing of pendulum before. Last time I investigated this, the cool libraries to use were arrow and delorean. How are people expected to keep up to date with every cool new thing?
For much of the work I do, there's a _big_ jump in complexity from using python (2 or 3) and its stdlib vs. requiring a library. A script using only the stdlib is easy to distribute and get working on developer, ci, and production machines. Once an external library is required, I need machinery or scripts to manage or ensure presence of those dependencies.
It's probably no good to you, but one of the reasons I like NixOS is the simplicity of creating single file scripts including dependencies. For example, something using Pendulum and ffmpeg together would start like this:
I love Nix and NixOS personally... but it's been a tough sell at work, unfortunately.
Even with simple shell scripts... it's so easy to invoke programs with GNU extensions and later find they fail on a co-worker's macOS machine. And I often have a shell.nix sitting right there that defines the complete dependency closure; very frustrating to not be able to use it.
Besides, for time zones, you can't do otherwise: it's not in the stdlib. So compared to pytz, it's easy.
Unrelated, but I highly recommend pex for scripts with dependencies. It will turn a script into a one-file bundle of all required modules, and all you need on the server is the same Python version.
In other situations, flaws and enhancements to a library would be handled with new API/versions on that library itself. Not (an|a series of) entirely separate librar(y|ies). Big difference in discoverability.
There should be a way to actively steer users away from these old libs that are no longer used.
There's a significant group of new developers coming to python as their first language and first coding use. The kind of crap (in the article) is exactly why programming used to be a total nightmare. The only tool I've found that helps with this (albeit in Java) is IntelliJ. What do people use for python?
Datetime is a good module if you don't deal with time zones, which is most of the time. And it doesn't come with out-of-the-box support for time zones, so either you code it yourself, or you use a third-party lib.
Hence we don't discourage people from using datetime, it's in the stdlib, and it's useful.
However, if you need time zones, either you code something yourself, in that case you are supposed to know what you are doing, or you look up the best libs for the job.
For the last part, no language has a perfect answer. It's an organic process. I've never met any tool that solves it, not even IntelliJ.
I had no idea pendulum existed, is there some way I should have looked it up? None of the code in tz-aware packages I've ever read used it, for example.
Good summary. I think one of the unspoken concerns is with the "look up the best libs for the job" step. How do we ensure that the up-to-date information is easily found in that lookup? Stack Overflow is a huge search sink for stuff like this, and has never taken the deprecation/evolution problem very seriously.
Maybe not in this exact case of dateutil vs. pendulum, but it's really easy to find outdated information on the web and struggle to confirm whether it's still the best answer.
Django tries to keep as few dependencies as possible.
Personally when I use django, I also use pendulum at the same time. Django timezone management is limited to storing, retrieving and formatting time zone aware dates, but for:
- calculations
- moving from one time zone to another
- display of relative time
- parsing
- interval manipulations
Pendulum is making the job a breeze.
Let's put things back in context: most people don't need those features at all. And most people won't be affected by the bug in the article.
Use pendulum if you need it, which is quite rarely.
When I say people usually use libs like it, I mean people who have specific tasks that require it. Few people actually do.
It's like asyncio, or "is python fast enough" and other things like that.
> Ok, I don't understand though why the bug hasn't been fixed and is there any other widely used time localization library that makes the same mistake
Because the issue is in the way datetime interacts with timezone objects (it assumes timezone objects are dumb and there's no co-dependency, which is incorrect), and fixing that would require changing the protocol, which would break existing libraries.
This article seems a bit misguided in its dismissal of the significance of historical fact. The last paragraph seems to me to get unnecessarily snotty about somebody making a very precise best-guess reconstruction of a historical value/location. "Out-of-thin-air" values aren't something I had heard of but apparently that category has to do with circular reasoning—so it doesn't apply literally in this case. It's just a self-satisfied way of smearing someone else's good faith work.
That's a pretty wild interpretation! I have nothing but respect for the tz db and its contributors; I indicated as much quite explicitly by stating my appreciation for the commentary. Personally I will soon be sending historical corrections to the precise days and times of DST changes in the 1940s for a few places in US and Canada, from my own research.
But yes, it's absolutely a tongue-in-cheek reference to the unrelated notion of "out-of-thin-air" values and reads in memory model semantics. (I've just added a link to an explanation by some PL researchers.)
Not really speaking to your point, but: It seems to me that the one minute offset contribution was well intentioned and impressive BUT at the same time quite misguided. I can't see that this addition could possibly ever do anyone any tangible good, but it's obvious how it could cause incredibly destructive, easily missed, hard to track down failure modes for unsuspecting victims, potentially forever.
I don't think the London historical offset should be blamed for such errors, nor should New York's nor any other zone's historical, local mean time offsets. If the tzdb is being used in a buggy way, that's on the user - in this case, pytz. Virtually all systems use tzdb in some form but don't blindly take the earliest historical offsets in a common usage pattern.
Fun fact: for many years the ECMAScript spec stated explicitly that JavaScript implementations (i.e., a browser's implementation of JS) should use the wrong time zone information! Wrong in the sense of using an offset for _today_ rather than the offset that was applicable at the time of _the Date object_. Maybe they've changed that in recent years, I can't remember, but here's more info: https://codeofmatt.com/javascript-date-type-is-horribly-brok...
Everyone here saying to use pendulum... you should definitely read this, because it's about a very similar bug in pendulum with datetime transitions across time change thresholds.
But wait, the datetime argument doesn't specify an exact instant in time, because it's "zone-less" itself!
So the above code can also be buggy or unclear if, instead of 2019, we use a datetime close to the switch time.
We should provide a time zone for the datetime parameter as well. Preferably GMT; otherwise we would need to localize that one too, falling into an infinite loop.
The linked blog post by Paul Ganssle (https://blog.ganssle.io/articles/2018/03/pytz-fastest-footgu...), the maintainer of the `dateutil` library (not to be confused with the standard-library `datetime`), is also informative.