You wouldn't download the lot, methinks. Not unless you have a big ole cluster to handle it.
The index is only 12GB and contains enough metadata that you can whittle it down to a subset, pull the comments and filter based on those, and ultimately produce a list of photo IDs to grab from the collection. That's a couple of days' work for a grad student; it's not even Big Data.
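For what it's worth, here's a rough sketch of the first pass, streaming the index and keeping only the IDs you care about. I'm assuming the index is a tab-separated file with a header row; the column names (`photo_id`, `tags`, `license`) and the filter itself are made up for illustration, so swap in whatever the real index uses.

```python
import csv
from typing import Callable, Dict, Iterable, Iterator


def select_photo_ids(
    lines: Iterable[str],
    keep: Callable[[Dict[str, str]], bool],
    id_column: str = "photo_id",  # hypothetical column name
) -> Iterator[str]:
    """Stream a tab-separated index and yield IDs of rows that pass `keep`.

    Rows are read one at a time, so memory use stays flat however
    large the index file is.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        if keep(row):
            yield row[id_column]


def is_cc_cat(row: Dict[str, str]) -> bool:
    # Hypothetical filter: photos tagged "cat" under a Creative Commons license.
    return "cat" in row["tags"].split(",") and row["license"].startswith("cc-")


# Tiny in-memory stand-in for the real index file.
sample = [
    "photo_id\ttags\tlicense\n",
    "1001\tcat,indoor\tcc-by\n",
    "1002\tdog,beach\tall-rights-reserved\n",
    "1003\tcat,garden\tcc-by-sa\n",
]

ids = list(select_photo_ids(sample, is_cc_cat))
```

On the real thing you'd pass an open file handle instead of `sample`. The comment-based filtering would then be a second pass over just the surviving IDs.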