What the Royal Astronomical Society in 1884 Tells Us About Python Today (typesandtimes.net)
142 points by skilpat on May 26, 2019 | 48 comments


It took me some effort to understand the issue here, so an alternative explanation in case it helps someone.

First, the part that's independent of programming language. You may want to read about absolute time and civil time (e.g. from https://abseil.io/docs/cpp/guides/time) but if you don't, in short: “civil time” refers to something like “2019 May 26 at 2:45 pm in New York City” (or “in the America/New_York time zone”), which means (roughly) whatever time the locals in New York City (or a larger shared geo-political zone) would agree is 2:45 pm on that date. To convert this to an absolute time, or in other words to make sense of “2019-05-26 14:45 in America/New_York”, we need data about the real world as of that date: most obviously we need to know whether Daylight-Saving Time was in effect on that date, but also what conventions were in use at the time. (This also means it's hard to know for certain what such a notation in the future means in terms of absolute time, as possibly DST could be abolished or the dates when it comes into effect could change.)

It so happens that in 1884 the conventions of New York City were such that it was about 4 minutes ahead of the then-recently standardized Eastern Time, so about 4 hours and 56 minutes behind GMT.

So, in any “correct” library, we should see the following respected:

• “2019 May 26 at 2:45 pm in New York” should mean “2019 May 26 at 18:45 UTC” (timezone is EDT i.e. UTC minus 4 hours).

• “2019 Jan 26 at 2:45 pm in New York” should mean “2019 Jan 26 at 19:45 UTC” (timezone is EST, i.e. UTC minus 5 hours).

• “1884 Jan 26 at 2:45 pm in New York” should mean “1884 Jan 26 at 19:41 UTC” (timezone is... GMT minus 4 hours and 56 minutes).

----

Now the part that's Python-specific: the pytz library in Python provides two ways of constructing such a well-formed civil time. One is to call `.localize` on a timezone, and the other is to call `.astimezone` to convert from one civil time to its equivalent (the same absolute time) in another timezone, thus obtaining a new civil time. Both are illustrated below, showing it working properly:

    >>> pytz.timezone('America/New_York').localize(datetime.datetime(2019, 5, 26, 14, 45, 0)).astimezone(pytz.utc)
    datetime.datetime(2019, 5, 26, 18, 45, tzinfo=<UTC>)
    
    >>> pytz.timezone('America/New_York').localize(datetime.datetime(2019, 1, 26, 14, 45, 0)).astimezone(pytz.utc)
    datetime.datetime(2019, 1, 26, 19, 45, tzinfo=<UTC>)
    
    >>> pytz.timezone('America/New_York').localize(datetime.datetime(1884, 1, 26, 14, 45, 0)).astimezone(pytz.utc)
    datetime.datetime(1884, 1, 26, 19, 41, tzinfo=<UTC>)
Unfortunately, there's a third thing a programmer can do, which the documentation warns against (http://pytz.sourceforge.net/#localized-times-and-date-arithm...), and that is to pass one of pytz's timezone objects as the “tzinfo” argument to the standard-library `datetime` constructor:

    >>> datetime.datetime(2019, 5, 26, 14, 45, 0, tzinfo=pytz.timezone('America/New_York')).astimezone(pytz.utc) # Don't do this!
    datetime.datetime(2019, 5, 26, 19, 41, tzinfo=<UTC>)
    >>> datetime.datetime(2019, 1, 26, 14, 45, 0, tzinfo=pytz.timezone('America/New_York')).astimezone(pytz.utc) # Don't do this!
    datetime.datetime(2019, 1, 26, 19, 41, tzinfo=<UTC>)
    >>> datetime.datetime(1884, 1, 26, 14, 45, 0, tzinfo=pytz.timezone('America/New_York')).astimezone(pytz.utc) # Don't do this!
    datetime.datetime(1884, 1, 26, 19, 41, tzinfo=<UTC>)
which is certainly consistent in its own way, but only the last one is correct. Oops.

The issue here is in the interaction between the “tzinfo” model of the standard-library `datetime` and pytz's timezone objects: the result is that when the two are used together in the above incorrect way, one ends up with a timezone that is a fixed offset from UTC, which is silly. A timezone like `America/New_York` is not a fixed offset from UTC: not only does it change twice a year, it also has changed in arbitrary ways in the past, and may change in arbitrary ways in the future.
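
A quick way to see the fixed offset it ends up with, assuming pytz is installed:

```python
import datetime
import pytz

tz = pytz.timezone('America/New_York')

# A freshly constructed pytz zone carries the zone's *first* database
# entry: New York's pre-1883 Local Mean Time, about UTC-4:56.
print(repr(tz))

# Passed as tzinfo=, that fixed offset sticks to the datetime no matter
# what date the datetime holds:
dt = datetime.datetime(2019, 5, 26, 14, 45, tzinfo=tz)
print(dt.utcoffset())  # UTC-4:56, regardless of the date

# .localize() instead selects the offset in force on that date:
print(tz.localize(datetime.datetime(2019, 5, 26, 14, 45)).utcoffset())  # UTC-4 (EDT)
```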

(Note that “fixing” the offset of 4 hours 56 minutes to 5 hours would not solve any problems as it would still be wrong many months of each year — arguably, having an obviously incorrect result may even be better than a sometimes-correct one.)

The linked blog post by Paul Ganssle (https://blog.ganssle.io/articles/2018/03/pytz-fastest-footgu...), the maintainer of the `dateutil` library (not to be confused with the standard-library `datetime`), is also informative.


I believe this was all clarified long ago in Java (the Joda-Time library, later redesigned into the core JDK as java.time).


Indeed! That library, as incorporated into the JDK, seems to me like the gold standard of datetime programming in standard libraries. Good type-level distinctions, good defaults, good extensibility... chef kissing fingers


Joda is clearer semantically, even today. The core just can't seem to get certain features right for developers.


OK, but I don't understand why the bug hasn't been fixed. And is there any other widely used time-localization library that makes the same mistake, not just in Python but in other languages?


I've updated the post to clarify that one should not use `pytz` at all and should instead use `dateutil`. The latter has much more reasonable behavior and is more actively maintained. I also included a link to commentary about pytz by dateutil's current maintainer.

Why hasn't the bug been fixed? I'm not sure! But looking at the code it seems like this problem has existed for the past 9 years. My guess is that the developers and maintainers never saw it as an actual problem, despite its ubiquity.


I believe the problem comes from a conceptual mismatch of what a timezone object should be. The datetime people envisioned a dumb object that just contains some constants, not the dynamic objects produced by pytz. Using .localize as emphasized in the pytz documentation solves the problem completely.

It would be less of a problem if pytz defaulted to the last offset in the database rather than the first.


I think your concept of "dynamic" and "static" is exactly backwards from mine. datetime specifies an API (tzinfo) for dynamic objects, so that a single object provides the time zone information for many datetimes. pytz's localize function attaches a different static object to each datetime, depending on which one is appropriate.
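
That dynamic contract can be sketched with a toy `tzinfo` subclass (a made-up DST rule, for illustration only): the stdlib passes the datetime into the offset methods, so a single zone object may answer differently for each one.

```python
import datetime

class ToyEastern(datetime.tzinfo):
    """Toy dynamic zone with a crude, hypothetical April-October DST rule,
    only to show that the offset depends on the datetime passed in."""

    def _dst_active(self, dt):
        return dt is not None and 4 <= dt.month <= 10

    def utcoffset(self, dt):
        return datetime.timedelta(hours=-4 if self._dst_active(dt) else -5)

    def dst(self, dt):
        return datetime.timedelta(hours=1 if self._dst_active(dt) else 0)

    def tzname(self, dt):
        return 'EDT' if self._dst_active(dt) else 'EST'

tz = ToyEastern()
# One zone object, two different offsets depending on the datetime:
print(datetime.datetime(2019, 5, 26, 14, 45, tzinfo=tz).utcoffset())  # UTC-4
print(datetime.datetime(2019, 1, 26, 14, 45, tzinfo=tz).utcoffset())  # UTC-5
```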

pytz defaulting to the last offset instead of the first would cause a lot more silent breakages, because it would probably be right about half the time, and wrong about half the time, and even when it's wrong it won't be obvious. Defaulting to the first value in the list is as close as pytz can come to failing loudly without actually raising an exception, because it's obviously wrong nearly 100% of the time.


Don't guess, fail explicitly.


> I believe the problem comes from a conceptual mismatch of what a timezone object should be.

There seem to be two commonly understood meanings of timezone which are confused in UIs and APIs: a geographical area which follows a particular winter and summertime regime over the year; and a particular offset from UTC like GMT or BST. I've seen a lot of bad design rooted in this confusion.


Just like people don't use the http module but requests, it's been years since the community moved away from manual manipulation of datetime/pytz for time zones.

Nowadays people use higher level libs such as pendulum:

    >>> print(pendulum.datetime(2019, 5, 21, 12, 30, tz='America/New_York'))
    2019-05-21T12:30:00-04:00
It avoids many gotchas, gives you more features and has a nicer API.

Like skilpat said, dateutil is a better fit than pytz, and hence pendulum uses it, as well as pytzdata, to stay up to date.


{{citation needed}}

Your assertion reminded me of this funny dialog about JS: https://hackernoon.com/how-it-feels-to-learn-javascript-in-2...


"pip install pendulum" is not complex.

Using pendulum is not complicated.

You can skim the doc in 5 minutes, your intern can do it too.

This is one of those tools that removes complexity when you use them.

Also, pendulum and the stdlib datetime module are compatible, making migration painless:

    >>> pendulum.now() - datetime.now(pendulum.now().tz)
    <Period [2019-05-26T16:57:35.872732+02:00 -> 2019-05-26T16:57:35.872141+02:00]>
In the end, pendulum doesn't require you to install a transpiler, 100 plugins and a configuration file like the post you link to. But it does save you from bugs, and you don't need to be an expert in time to use it.

I see only wins.


Just to be clear, I was not meaning to imply that pendulum isn't better, but rather that not everyone uses it. Nor does everyone use requests, although it's very clearly better than what's in the standard library.

Discoverability is a very big issue in general, and with these libraries in particular. I for example have been working with Python for almost a decade and follow the community quite closely, but can't recall hearing of pendulum before. Last time I investigated this, the cool libraries to use were arrow and delorean. How are people expected to keep up to date with every cool new thing?


> "pip install pendulum" is not complex.

For much of the work I do, there's a _big_ jump in complexity from using python (2 or 3) and its stdlib vs. requiring a library. A script using only the stdlib is easy to distribute and get working on developer, ci, and production machines. Once an external library is required, I need machinery or scripts to manage or ensure presence of those dependencies.


It's probably no good to you, but one of the reasons I like NixOS is the simplicity of creating single file scripts including dependencies. For example, something using Pendulum and ffmpeg together would start like this:

    #!/usr/bin/env nix-shell
    #!nix-shell -i python -p python pythonPackages.pendulum ffmpeg
Then you just put the code after that.


I love Nix and NixOS personally... but it's been a tough sell at work, unfortunately.

Even with simple shell scripts... it's so easy to invoke programs with GNU extensions and later find they fail on a co-worker's macOS machine. And I often have a shell.nix sitting right there that defines the complete dependency closure; very frustrating to not be able to use it.


OK that's pretty cool


It was compared to the linked article.

Besides, for time zones, you can't do otherwise: it's not in the stdlib. So compared to pytz, it's easy.

Unrelated, but I highly recommend pex for scripts with dependencies. It turns a script into a one-file bundle of all required modules, and all you need on the server is the same Python version.

It should be in the stdlib really.


Sure, IF you know that you should be using datetime ^D^D^D pytz ^D^D^D dateutils ^D^D^D pendulum.


That's true for absolutely everything in programming.


In other situations, flaws and enhancements to a library would be handled with new API/versions on that library itself. Not (an|a series of) entirely separate librar(y|ies). Big difference in discoverability.


And to solve hunger the best solution is to be able to eat. Yes.


I now have a deeper appreciation for front-end (front-line?) developers.


There should be a way to actively steer users away from these old libs that shouldn't be used. There's a significant group of new developers coming to Python as their first language and first coding experience. The kind of crap (in the article) is exactly why programming used to be a total nightmare. The only tool I've found that helps with this (albeit in Java) is IntelliJ. What do people use for Python?


datetime is a good module if you don't deal with time zones, which is most of the time. It doesn't come with out-of-the-box support for time zones, so either you code that yourself, or you use a third-party lib.

Hence we don't discourage people from using datetime, it's in the stdlib, and it's useful.

However, if you need time zones, either you code something yourself, in that case you are supposed to know what you are doing, or you look up the best libs for the job.

For the last part, no language has a perfect answer. It's an organic process. I've never met any tool solving it, not even IntelliJ.


I had no idea pendulum existed, is there some way I should have looked it up? None of the code in tz-aware packages I've ever read used it, for example.


Looking things up is not a solved problem, no.

I usually check the "awesome python list", ask on Reddit and Twitter, and then do some Google-fu. I select 3 packages and do some tests.

I got nothing better than that.


Another good source is to find talks at recent PyCon or PyData conferences that survey the topic, then pick the packages they recommend.


Good summary. I think one of the unspoken concerns is with the "look up the best libs for the job" step. How do we ensure that the up-to-date information is easily found in that lookup? Stack Overflow is a huge search sink for stuff like this, and has never taken the deprecation/evolution problem very seriously.

Maybe not in this exact case of dateutils vs. pendulum, but it's really easy to find outdated information on the web and struggle to confirm whether it's still the best answer.


> Nowadays people use higher level libs such as pendulum

I don't think Django pulls in "pendulum" (someone please correct me if I'm wrong).

If Django isn't using it, I have to question how relevant "pendulum" actually is.


Django tries to keep as few dependencies as possible.

Personally when I use django, I also use pendulum at the same time. Django timezone management is limited to storing, retrieving and formatting time zone aware dates, but for:

- calculations

- moving from one time zone to another

- display of relative time

- parsing

- interval manipulations

pendulum makes the job a breeze.

Let's put things back in context: most people don't need those features at all. And most people won't be affected by the bug in the article.

Use pendulum if you need it, which is quite rarely.

When I say people usually use libs like it, I mean people that have specific tasks that require it. Few people actually do.

It's like asyncio, or "is python fast enough" and other things like that.


Are arrow and pendulum the same?


Sébastien Eustace created pendulum specifically to address the deficiencies of arrow: http://blog.eustace.io/please-stop-using-arrow.html


This seems very much to underscore last week's comments by Amber Brown re: the problems of the batteries which are included.


> Ok, I don't understand though why the bug hasn't been fixed and is there any other widely used time localization library that makes the same mistake

Because the issue is in the way datetime interacts with timezone objects (it assumes timezone objects are dumb and there's no co-dependency, which is incorrect), and fixing that would require changing the protocol, which would break existing libraries.



This article seems a bit misguided in its dismissal of the significance of historical fact. The last paragraph seems to me to get unnecessarily snotty about somebody making a very precise best-guess reconstruction of a historical value/location. "Out-of-thin-air" values aren't something I had heard of but apparently that category has to do with circular reasoning—so it doesn't apply literally in this case. It's just a self-satisfied way of smearing someone else's good faith work.


That's a pretty wild interpretation! I have nothing but respect for the tz db and its contributors; I indicated as much quite explicitly by stating my appreciation for the commentary. Personally I will soon be sending historical corrections to the precise days and times of DST changes in the 1940s for a few places in the US and Canada, from my own research.

But yes, it's absolutely a tongue-in-cheek reference to the unrelated notion of "out-of-thin-air" values and reads in memory model semantics. (I've just added a link to an explanation by some PL researchers.)


Nothing but respect? The word "hobbyist" suggests otherwise.


Not really speaking to your point, but: It seems to me that the one minute offset contribution was well intentioned and impressive BUT at the same time quite misguided. I can't see that this addition could possibly ever do anyone any tangible good, but it's obvious how it could cause incredibly destructive, easily missed, hard to track down failure modes for unsuspecting victims, potentially forever.


I don't think the London historical offset should be blamed for such errors, nor should New York's nor any other zone's historical, local mean time offsets. If the tzdb is being used in a buggy way, that's on the user - in this case, pytz. Virtually all systems use tzdb in some form but don't blindly take the earliest historical offsets in a common usage pattern.

Fun fact: for many years the ECMAScript spec stated explicitly that JavaScript implementations (i.e., a browser's implementation of JS) should use the wrong time zone information! Wrong in the sense of using the offset for _today_ rather than the offset that was applicable at the time of _the Date object_. Maybe they've changed that in recent years, I can't remember, but here's more info: https://codeofmatt.com/javascript-date-type-is-horribly-brok...


They should have used UTC for the transitions. This is an entire article built around a faulty presupposition and naive objects.

I wrote about this here a few years ago: https://gordol.github.io/date_time_manipulation.html

Everyone here saying to use pendulum... you should definitely read this, because it's about a very similar bug in pendulum with datetime transitions across time change thresholds.


Anyone know how/if this issue affects Django, which uses pytz?


Irresistible title.


> pytz.timezone('America/New_York').localize( datetime(2019, 5, 21, 12, 30))

But wait, the datetime argument doesn't specify an exact instant, because it's "zone-less" itself!

So the above code can also be buggy/unclear, if instead of 2019 we'd use a datetime close to the switch time.

We should provide a time zone to the datetime param as well. Preferably GMT; otherwise we would need to localize that one as well, falling into an infinite loop.


pytz's `.localize()` takes a naive datetime as input.

The pytz docs are pretty much on-point, too:

> "The preferred way of dealing with times is to always work in UTC, converting to localtime only when generating output to be read by humans."

This, also, is the problem with the article, and it's a really common pain point across the spectrum of programmers, both new and seasoned.
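
A minimal sketch of that advice, here using the stdlib `zoneinfo` (Python 3.9+) rather than pytz:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# Store and compute in UTC...
event = datetime(2019, 5, 26, 18, 45, tzinfo=timezone.utc)

# ...and convert to local time only at the display boundary:
local = event.astimezone(ZoneInfo('America/New_York'))
print(local.strftime('%Y-%m-%d %H:%M %Z'))  # 2019-05-26 14:45 EDT
```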


Yes, but my point still holds, the code above is buggy/unclear. And documentation supports this:

    >>> loc_dt = eastern.localize(datetime(2002, 10, 27, 1, 30, 00))
    >>> loc_dt.strftime(fmt)
    '2002-10-27 01:30:00 EST-0500'

> As you can see, the system has chosen one for you and there is a 50% chance of it being out by one hour.

And the solution, you're right, is to not use the code like above. But the article doesn't mention that at all.
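
Right; with pytz the explicit fix is the `is_dst` argument to `.localize()` (assuming pytz is installed). 2002-10-27 01:30 US/Eastern is a wall time that occurred twice, once in EDT and once in EST:

```python
import datetime
import pytz

eastern = pytz.timezone('US/Eastern')
ambiguous = datetime.datetime(2002, 10, 27, 1, 30)  # wall time that occurs twice

# Be explicit about which of the two instants you mean:
print(eastern.localize(ambiguous, is_dst=True).strftime('%Z%z'))   # EDT-0400
print(eastern.localize(ambiguous, is_dst=False).strftime('%Z%z'))  # EST-0500

# Or tell pytz to refuse to guess instead of silently picking one:
try:
    eastern.localize(ambiguous, is_dst=None)
except pytz.exceptions.AmbiguousTimeError:
    print('ambiguous wall time, caller must decide')
```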



