Any clue how hard it would be to create an open-source alternative to the DarkSky API? My company needs one, too. Maybe we could join forces and organize an open-source alternative. I know it won't be as good, but I'm curious what other people in our position think...
Hi, I make the toolkit that Dark Sky (and others) use for their visuals. Been in the weather/aviation mobile space a long time.
I'd floated a Weather-Service-In-A-Box to my clients a few months ago, but got no takers. The idea was something you could set up on a small number of instances fairly quickly and leave running to harvest NOAA/ECMWF/etc data. Fairly simple front end to answer queries. Data tiles for the visuals. That sort of thing.
If people (with money) are interested now, hit me up.
I process weather simulation data from around the world (22 data sources). My servers process around 400-500 GB of data a day and I store about 120 GB which gives me about 5 days of historic simulation data and up to 16 days forecast data.
Storage and processing are relatively easy since it's just a matter of throwing resources at it. However, it'll be damn expensive to store months or years of historic simulation data.
The difficult part is writing and maintaining the processing scripts. The different weather services store data in different formats, have different ways to download the data and sometimes change data structures or fall over.
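To make that concrete, here's a minimal sketch of the adapter pattern such processing scripts tend to converge on: one adapter per weather service, each knowing its own download and parse quirks, all normalizing into a common record. Every name here (`ForecastPoint`, `SourceAdapter`, `FakeGfsAdapter`) is hypothetical; a real NOAA adapter would fetch and decode GRIB2 files rather than the stand-in CSV used below.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Protocol

# Common record every source adapter normalizes into (fields are illustrative).
@dataclass
class ForecastPoint:
    source: str
    lat: float
    lon: float
    valid_time: datetime
    temperature_c: float

class SourceAdapter(Protocol):
    """One adapter per weather service; each knows its own URL scheme and format."""
    name: str
    def fetch(self) -> bytes: ...
    def parse(self, raw: bytes) -> Iterable[ForecastPoint]: ...

class FakeGfsAdapter:
    """Stand-in for a real NOAA GFS adapter (a real one would download GRIB2)."""
    name = "gfs"

    def fetch(self) -> bytes:
        # A real adapter would HTTP-fetch a model-run file here.
        return b"52.5,13.4,2020-04-01T00:00:00,11.5"

    def parse(self, raw: bytes) -> Iterable[ForecastPoint]:
        for line in raw.decode().splitlines():
            lat, lon, ts, temp = line.split(",")
            yield ForecastPoint(
                self.name, float(lat), float(lon),
                datetime.fromisoformat(ts).replace(tzinfo=timezone.utc),
                float(temp))

def ingest(adapters: Iterable[SourceAdapter]) -> list[ForecastPoint]:
    points = []
    for adapter in adapters:
        try:
            points.extend(adapter.parse(adapter.fetch()))
        except Exception as exc:
            # Upstream sources "fall over" regularly; skip one, keep the rest alive.
            print(f"{adapter.name}: ingest failed: {exc}")
    return points
```

The isolation per adapter is the point: when one service changes its data structure, only that adapter's code needs touching.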
If you wouldn't mind: how do you go about creating a simulation around the empirical data? This seems to be very core to how Dark Sky is able to provide such accurate realtime data for geographies in between weather stations.
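Dark Sky hasn't published its interpolation method, but a common naive baseline for estimating a value between stations is inverse-distance weighting (IDW): nearby stations count more than distant ones. A sketch (using crude planar distance rather than great-circle distance, which a real implementation would want):

```python
import math

def idw(stations, lat, lon, power=2.0):
    """Inverse-distance-weighted estimate at (lat, lon).

    stations: iterable of (lat, lon, value) tuples.
    A naive baseline for illustration, not Dark Sky's actual method.
    """
    num = den = 0.0
    for s_lat, s_lon, value in stations:
        dist = math.hypot(s_lat - lat, s_lon - lon)  # crude planar distance
        if dist < 1e-9:
            return value  # query point sits exactly on a station
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den

# Estimate temperature midway between two stations reading 10 and 20 degrees.
print(idw([(0.0, 0.0, 10.0), (0.0, 2.0, 20.0)], 0.0, 1.0))  # → 15.0
```

Real systems go well beyond this (blending model output with station observations, kriging, etc.), but it shows the basic shape of filling in geographies between stations.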
I think it will be pretty hard to do well in open source though. Server costs and keeping things working when NOAA (etc.) sources change is probably going to be expensive.
There's no reason a running service couldn't also open-source its back-end code. It does provide an avenue for people to tinker, potentially improve, and self-host if they have the resources available or the service itself goes under.
There are also a lot of open projects that distribute the work of handling tweaks to parsers and source lists when the upstreams drift.
Even globally I can't imagine the minutely data from reporting stations everywhere being "big data", maybe in the 10s of gigabytes a day (very rough guess, I haven't looked into this specifically)? I'd still be willing to bet API / web requests dwarf the processing and bandwidth requirements of the raw data for a public service like this.
That probably does put it pretty reasonably in the realm of self-hosting if you put a threshold on how much historical data you want to keep and geofence the region you care about.
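The geofencing part is cheap to do at ingest time: drop anything outside a bounding box before it hits storage. A trivial sketch (the `(lat, lon, value)` report shape is just an assumption for illustration):

```python
from typing import Iterable, List, Tuple

Report = Tuple[float, float, float]  # (lat, lon, value) -- illustrative shape

def geofence(reports: Iterable[Report],
             lat_min: float, lat_max: float,
             lon_min: float, lon_max: float) -> List[Report]:
    """Keep only reports inside the bounding box; discard the rest pre-storage."""
    return [r for r in reports
            if lat_min <= r[0] <= lat_max and lon_min <= r[1] <= lon_max]

reports = [(51.5, -0.1, 12.0),   # London
           (40.7, -74.0, 18.0)]  # New York
# A rough UK bounding box keeps only the London report.
print(geofence(reports, 49.0, 61.0, -11.0, 2.0))
```

Combined with a retention window on historical data, this is the main lever for keeping a self-hosted instance's storage bill sane.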
If it’s open source there’s no specific requirement for the project to provide a hosted instance.
Develop the (ideally extensible) logic to parse a given government-funded weather source into a common format, and have “API consumers” run their own instance of the services (or set up an instance for them for $ and use the profits to help with project costs).
Not disagreeing, just extending. Dark Sky merges multiple weather data sources, even within a single country, and I use it for a private app that deals with weather all around the world. An open source project could just be code, not a service, but that code should have a config where you specify which sources to fetch and how often, rather than being tied to "a given govt-funded weather source".
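Something like the following, sketched as a plain Python config (all keys and adapter names here are made up for illustration; no real project defines this schema):

```python
# Hypothetical config: each entry names a source adapter, whether it's enabled,
# and how often to fetch it. None of these keys come from a real project.
CONFIG = {
    "sources": [
        {"adapter": "noaa_gfs",   "interval_minutes": 360, "enabled": True},
        {"adapter": "ecmwf_open", "interval_minutes": 720, "enabled": True},
        {"adapter": "dwd_icon",   "interval_minutes": 180, "enabled": False},
    ]
}

def due_sources(config, minutes_since_epoch: int) -> list:
    """Return the enabled adapters whose fetch interval has elapsed."""
    return [s["adapter"] for s in config["sources"]
            if s["enabled"] and minutes_since_epoch % s["interval_minutes"] == 0]

print(due_sources(CONFIG, 720))  # → ['noaa_gfs', 'ecmwf_open']
```

A scheduler loop would call something like `due_sources` each tick and hand the results to the matching adapters, so adding a new country's source is a config entry plus one adapter module.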
Of course, if it's designed to be extensible then non-public sources could definitely be added. Whether they'd be supplied as included modules/plugins, or need to be developed (and then optionally contributed upstream) by users of a given non-public service, is a question of practicality and sometimes cost: if it's a paid service, an OSS project may not be able to justify the cost of a licence/account to develop and test against.