A few questions come to mind after reading the docs (minus the currently 404ing ones):
(1) is this a zonal or regional product? ie does it replicate data across zones in a region?
(2) roughly what latencies should I expect to see? eg for the following: a 1KB, 1MB, and 1GB read and write. Any info here would be helpful.
(3) does it have close-to-open consistency? or something weaker / stronger?
(4) any plans to add gcp pub/sub integration somehow? would be great to be able to subscribe to changes like with other gcp storage products.
(5) any plans to move from NFSv3 to NFSv4?
(6) backups available or planned as a feature? eg to GCS
(7) can you share anything about how it is implemented?
Further minor versions add other things that could be useful for cloud usage, e.g. in NFSv4.1 (https://tools.ietf.org/html/rfc5661 ) parallel data access, sessions, improved delegations, and in NFSv4.2 (https://tools.ietf.org/html/rfc7862 ) server-side clone/copy, sparse file support.
I'm pleasantly surprised by mere NFSv3. Google doesn't use other people's standards by choice. They probably have an internal file protocol that the Greys gave them to hand over to humanity at a future date.
I work for Google and am the product manager for Cloud Filestore. I would have responded sooner, but I was busy with announcement-related events :)
If you (or anyone reading this) want to discuss any aspects of Filestore, my email is my Hacker News login with @google.com appended.
(1) Filestore is a zonal product and provides high availability (HA) within the zone. We are considering adding regional HA, but it's not entirely clear what the use case is, as there is a cost & performance tradeoff. I'd be happy to chat with you in more depth to understand what you would like to see here.
(2) It's hard to be precise about what latencies _you_ will see, as the set of benchmarks and workloads run against NFS is so varied. Anything I say here will be true for some workload, but someone is undoubtedly bound to find a workload where it's not true :). So, TL;DR: YMMV; best to test your workload when the beta launches soon (sign up to be notified when it launches in a few weeks at https://goo.gl/forms/Hx6XkobcwNo5DoA33)
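One way to get rough numbers for your own workload once a share is mounted is simply to time reads and writes at a few sizes. This is only a sketch (the target path is a placeholder for a file on the mounted share), with fsync included in the write timing so buffered writes don't flatter the result:

```python
import os
import time


def time_write(path, size, block=1 << 20):
    """Write `size` bytes to `path` and return elapsed seconds (incl. fsync)."""
    data = os.urandom(min(size, block))
    start = time.perf_counter()
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            remaining -= f.write(data[: min(remaining, block)])
        f.flush()
        os.fsync(f.fileno())  # count the sync, not just the buffered write
    return time.perf_counter() - start


def time_read(path):
    """Read `path` back fully; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        total = len(f.read())
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Point this at a file on the mounted share, e.g. /mnt/filestore/bench.dat
    target = "bench.dat"
    for size in (1 << 10, 1 << 20):  # 1 KB, 1 MB
        w = time_write(target, size)
        n, r = time_read(target)
        print(f"{size:>8} bytes: write {w * 1e3:.2f} ms, read {r * 1e3:.2f} ms")
    os.remove(target)
```

A single pass like this only gives a ballpark; repeating each size many times and looking at the distribution is closer to what fio would report.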
(3) We support close-to-open consistency, but it's really up to the client. See this Linux NFS FAQ for details: http://nfs.sourceforge.net/#faq_a8. TL;DR: If you're running a Linux kernel ≥ 2.4.20 and haven't mounted with the 'nocto' option, then yes, you'll see CTO.
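For reference, a mount that keeps the default close-to-open behavior might look like this hypothetical /etc/fstab entry (the server address and export path are placeholders for your instance's values):

```
# 'cto' is the default; mounting with 'nocto' instead would relax
# close-to-open consistency in exchange for fewer attribute revalidations.
10.0.0.2:/share  /mnt/filestore  nfs  vers=3,hard,cto  0  0
```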
(4) We don't have any plans for pub/sub integration on the roadmap, but I'd love to talk to you about the use case (see info about my email addr above).
(5) Yes, we have NFSv4 support on the roadmap. We launched with NFSv3 because it's still widely used, and in many cases customers won't see any appreciable performance delta from NFSv4. That said, we agree that it is very important: NFSv4 can often help with some metadata-heavy workloads, and has a more extensive authentication and authorization model which some workloads require. Ultimately we made a time-to-market tradeoff.
(6) For backups, we support any of the standard commercial backup software that's certified against GCP and can backup NFSv3 shares. We don't have a native backup solution planned, but we do have snapshots on the roadmap, which in some cases are sufficient.
(7) As to implementation, sorry no, I cannot.
And to answer a few more questions from the nested comments so this is all in one place:
* Snapshots are on the near term roadmap, and are very high priority for us to get supported.
* SMB, extended attribute, and quota support are all on the roadmap, and like NFSv4 are high priority.
Unfortunately I can't be more precise about when to expect these features.
They've undercut Amazon EFS by 33% for the budget option ($0.20/GB vs. $0.30/GB). Hope this kicks off a little pricing war; a shared filesystem is a hugely useful solution for migrating old apps, and it would be nice if it were cheap. Both options are still grossly overpriced (IMHO).
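At the per-GB-month rates quoted above ($0.20 vs. $0.30, taken from the comment rather than any official price list), the gap compounds with size; a rough back-of-the-envelope:

```python
# Monthly cost at the per-GB-month rates quoted in the thread.
FILESTORE_PER_GB = 0.20  # budget tier, per the comment above
EFS_PER_GB = 0.30


def monthly_cost(gb, rate_per_gb):
    """Storage-only monthly cost; ignores request/throughput charges."""
    return gb * rate_per_gb


for gb in (100, 1024, 10240):
    fs = monthly_cost(gb, FILESTORE_PER_GB)
    efs = monthly_cost(gb, EFS_PER_GB)
    print(f"{gb:>6} GB: ${fs:,.2f}/mo vs ${efs:,.2f}/mo "
          f"({1 - fs / efs:.0%} cheaper)")
```

That 33% figure is just 1 - 0.20/0.30; at 10 TB it's roughly a $1,000/month difference.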
Can you please elaborate a little?
EFS scales with size, i.e. did you have at least a few hundred GB on there or was it mostly empty? Roughly how many IOPS did you get?
IIRC, I populated the thing with 10TB or thereabouts.
For small block sizes, I got hundreds of IOPS.
Latency was pretty terrible in almost all cases.
I wrote a great deal on this topic for internal consumption. Those benchmarks weren't really meant for the general public. My biggest conclusion is that EFS isn't useful for any workload where performance is a concern. Unfortunately, it's priced so high, that I'd never consider using it otherwise.
We use ~100GB with a lot of in/out for EFS, as a kind of 'nearline S3,' mounted to a hundred EC2 instances or so. It replaced an NFS server doing the same thing, and it ended up being slightly cheaper because you don't pay for the compute, only the storage (EFS vs. EBS + EC2).
Also I haven't had a single issue with performance. It's not particularly busy though (~800 IOPS?).
Underlying S3 bits do bite you when you deal with lots of small files. Our team even considered maintaining NFS or Gluster servers over horrible EFS performance.
Another file system that works well with lots of small files and that doesn't require maintaining any extra servers is ObjectiveFS[0]. It is a FUSE based log structured file system (with snapshots) using GCS or S3 for log storage with memory cache and (optional) disk cache for performance.
As Amazon starts winning fat government contracts for unaccountable sums of money, it will probably care less and less about offering "value" to its smallest private sector customers.
Amazon had better improve its customer service offering, or ramp up its partner network if it wants to win big "enterprisy" clients like government then. The HN crowd may look at the Oracles and IBMs as super-expensive dinosaurs, but they do offer an incredibly high level of help and support.
AWS has excellent customer service offerings after a certain price point. I would check the "Business" and "Enterprise" plans for examples of how to get the good support.
You get an account manager rep and upon occasion, interactions with AWS engineers, depending on what you're doing and how exciting it is.
One might say a lot about AWS - but I don't fault them on the customer service front for a serious business spending serious money.
Amazon already has great customer service for big clients, and they already have lots of government clients too. (Including a completely private region for the CIA and other intelligence services.)
Ever try getting a SQL support request to an engineer actually at Microsoft and not one outsourced and get a response within weeks? Yeah, not easy. Or if you get a response it's the equivalent of "have you tried turning it off and then on again?".
Yes, we had a SQL support case replied to by email, and then followed-up via phone/screenshare to have it resolved that day.
The only constant with tech support is that it's highly variable and depends on many factors. Enterprise MSDN/SQL support is not bad though and you should've gotten a quick reply if you're using the proper channels with an appropriate support plan.
The shop I came from had been a Microsoft and VMware shop for years, and we spent millions a year on MSDN licensing. The support was kinda shitty. But I'm not there anymore. I do remember laughing in DBA meetings about how crappy it was, and wondering why we paid so much for offshore support techs when they should have been local on-premises engineers.
Amazon does too. Once you're well into the few-million-per-month range they are pretty responsive (I used to work for a smaller shop with ~$50k/month AWS spend); the difference is night and day.
Very happy that GCP is opening in LA, however pushing the VFX angle is interesting considering the bulk of that work has moved overseas due to tax credits. Sent the link to a friend at Dreamworks that still has a sizable amount of work in Glendale.
Will also be interesting to find out how they are solving/pricing the immense bandwidth/storage requirements needed to make such work practical.
This is great for persistent storage on Kubernetes clusters. Storage has been complicated so far and while there are external solutions for pooling and replicating locally attached disks, this is much nicer and simpler for most uses.
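As an illustration of that Kubernetes use, a pre-provisioned NFS-backed PersistentVolume plus claim might look like the following sketch. The server IP, export path, and size are hypothetical, and depending on your cluster version you may also need `storageClassName: ""` on the claim to bind it to this specific PV:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: filestore-pv
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany   # the point of NFS: many pods, many nodes
  nfs:
    server: 10.0.0.2  # placeholder: the Filestore instance IP
    path: /share      # placeholder: the export name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: filestore-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Ti
```

ReadWriteMany is what locally attached disks (and GCE PDs) can't give you; that's the simplification being praised here.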
As convenient as this seems, there are good reasons that many shops prohibit the use of NFS in production.
There’s nothing to prevent any application from doing full error checks on every disk i/o call, dealing with timeouts, etc. Except that nobody wrote that stuff into software designed for local disk.
At $enterprise-dayjob we've ended up using EFS in AWS after migrating an application that was using NFS. One of the main obstacles to adopting this was perceived lack of security as EFS didn't support encryption of NFS connections between EC2 instances and EFS mount targets. AWS EFS released support for encrypting the NFS connections earlier this year, using Stunnel.
What's the corresponding story re security for cloud filestore?
The application had grown and ran in production over a number of years using NFS for storage before it was moved into AWS, so this probably naturally steered devs away from trying to use the shared file storage as a high performance cache (eg they might try it, and find it to be slow, and figure out a different way of doing whatever they needed to do, probably using the db).
Our usage pattern for NFS did not generally involve needing to read or write many tiny files with any degree of performance. E.g. we might need to do reads/writes of dozens of files, each of a few MB, in a process whose overall running time was largely governed by our DB performance and patches of poorly written algorithmic code running in a slow scripting language. Amdahl's law: for us, making the NFS access go infinitely fast, or 5x slower, hardly changes the overall total process execution time.
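That Amdahl's-law point is easy to make concrete: if NFS I/O is only a small fraction of total runtime, even an infinitely fast filesystem barely moves the total. A small sketch (the 5% figure is an illustrative assumption, not a measurement from the thread):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work is accelerated by s."""
    return 1.0 / ((1.0 - p) + p / s)


# Say NFS I/O is 5% of a job otherwise dominated by DB and CPU time:
p = 0.05
print(amdahl_speedup(p, 5))             # 5x faster NFS -> ~1.04x overall
print(amdahl_speedup(p, float("inf")))  # infinitely fast NFS -> ~1.05x
```

The same formula run in reverse shows why the EFS tiny-file case below hurts: when I/O is 95% of the runtime, filesystem latency dominates everything.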
I can certainly appreciate that EFS is abysmally slow at dealing with large numbers of reads/writes of tiny files. In another part of the system, one of my colleagues set up a read-only cache of a large number of tiny files in EFS; due to the per-file write latency it took around a week to load in < 100GB of data.
Spin up a huge render job, use the perf of the storage, and then nuke the storage as you have a final product? That's probably what they're getting at.
This seems like a really unfortunate name choice to me. When I first read the title I misread it as Cloud Firestore, which just has one letter different to Cloud Filestore.
Extreme "Just my two cents" mode: user problems may be similar, but needs and solutions remain unique.
Any one group trying to build a solution to serve all needs is likely in line for complication, which is why great partners and options continue to be a wise choice.
(Gonna duck out of this thread, etc. now since this isn't my release and I assume people who work in storage can go a lot deeper.)
No, they are alternatives that have existed as solutions before Filestore was created. They still have unique features, better performance and other advantages if you need them.
Well, the poster probably meant whatever GFS incarnation is around these days (Colossus is v2 or v3, depending on how you look at things). In the end, almost everything at Google is backed by "GFS": Bigtable, Blobstore, GCS, Spanner, Megastore, etc. There's little else that is not backed by GFS, but talks to D directly. At least that has been mentioned in public, of course. Still, none of it is user facing/serving, though.
This is a product that gives you a file system, like attaching a disk, but over NFS so it's available for multiple servers to mount over the network.
If you don't actually need a disk-based file system and just want to read/write individual files as objects, then object storage like Cloud Storage is your best option.
Wow, the premium Filestore is as good as a single SSD (replicated). That's really cool to use for MySQL/PostgreSQL databases on GKE; way cheaper than using Cloud SQL if you already have a GKE cluster.
NFS is much faster to attach than PD to a GCE node, which can take minutes at times and lead to crash loops waiting for the storage to be ready. This is especially problematic if the disk needs to be moved from one node to another for some reason.
Depends. There's nothing wrong with NFS itself, although v4 is better obviously. There's a performance trade-off with going over the network and sharing the storage with multiple servers, along with the necessary metadata for each file, but if you don't have much contention then it's just like a drive with higher latency.
If that latency is low and your workload can handle disk concurrency well then it works fine. It helps if you use (or configure) a database with more sequential access and buffering for large updates rather than lots of random small writes, as well as spreading the data over several disks.
AWS EFS has latency problems which make it problematic but this product seems to have better performance profile which could work well.
Databases all use some kind of write-ahead logging so you'll be safe as long as that file is safe, and they're even capable of recovering the file all the way up to any corrupt records that may have been appended at the end.
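A minimal sketch of that idea (not how any particular database actually lays out its log): each record carries its length and a checksum, and recovery replays records until it hits the first truncated or corrupt one at the tail:

```python
import os
import struct
import zlib


def wal_append(path, payload: bytes):
    """Append one record: little-endian length + CRC32 + payload."""
    record = struct.pack("<II", len(payload), zlib.crc32(payload)) + payload
    with open(path, "ab") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())  # durable before the write is acknowledged


def wal_recover(path):
    """Return all intact records, discarding a torn/corrupt tail."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # clean end of log, or truncated header
            length, crc = struct.unpack("<II", header)
            payload = f.read(length)
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # torn write at the tail: stop here, drop the rest
            records.append(payload)
    return records
```

Appending garbage bytes to the file and then calling `wal_recover` returns only the records written before the corruption, which is exactly the "recover up to any corrupt records at the end" behavior described above.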
You shouldn't have multiple servers writing to the same volume for database drives, but other than that it's no different than any other disk that might lose its connection. Most VMs' "local" disks are still attached over the network anyway, emulating a PCIe bus interface instead of NFS.
No, but exotic software people use to design integrated circuits, like a WiFi modem, or a Bitcoin miner. Autodesk, the creator of AutoCAD, sells Eagle, an EDA software. It does not go lower than you can go with a soldering iron, except for allowing solder joints that are better created with hot air / infrared heating, and fancy multi-layer circuit boards with components on the inside.
Between Cloud Storage, Cloud Datastore, Cloud Firestore and now Cloud Filestore the naming of these services is frustratingly confusing. Oh and of course there are also BigQuery and BigTable, which do not really give any clearer picture of the actual service. I've used GCP extensively for more than a year now and I still get confused about the different storage services way too often.
"Google Cloud Storage" could be a product but also the encompassing category of all the things you mentioned?
Also on the Firebase pricing page (https://firebase.google.com/pricing) they have a "Realtime Database", is that related to datastore or cloud storage?
They also have another item there just labeled "Storage". Is that one of the above?
And at the bottom you get "Google Cloud Platform" on the "Blaze Plan". Is that the products you mentioned that all start with "Cloud"?
Thanks for the feedback. As the person that named both products, I can say we spent a ton of time debating this but we felt that the fact one is an enterprise file share and the other a document database service focused on mobile and web would mean very little conflict for customers. We will keep an eye on any customer confusion it might cause.
It explains why this is linguistically bad. Basically, a billion people on this planet don't distinguish between the l and r sounds, so for 1/6 of the planet, these names are identical.
Native Chinese speaker here. That post is comically wrong. For one thing, you don't pronounce either "file" or "fire" like "filer" or "firer" -- isn't English amazing ;) -- so there's no L/R sound in the first place for us to supposedly confuse.
I mean all of the following in the best possible way:
Perhaps also worth looking at the screenshot in the blogpost.
You have in there:
---
Datastore
Storage
Filestore
---
So, datastore is not storage, nor is filestore. What is it storage storing if not data or files? Why are files not data? I have no idea what should go where.
> We know folks need to create, read and write large files with low latency. We also know that film studios and production shops are always looking to render movies and create CGI images faster and more efficiently. So alongside our LA region launch,
So I couldn't create, read, or write large files with low latency before? I thought this was already a feature of your other products.
> we’re pleased to enable these creative projects by bringing file storage capabilities to GCP for the first time with Cloud Filestore.
For the first time? I couldn't store files before?
I'm not trying to be an arse, but I really don't get from this what the key difference is from everything else you offer.
> What is it storage storing if not data or files?
Objects. Cloud Storage is the S3 competitor.
> Why are files not data?
“Data” as in rows in a database. Like Dynamo.
Everything on a computer is data. The thing you’ve got to understand is that the terms we use, “objects”, “files”, “data” — these don’t refer to types of data, but rather to access paradigms for data. The semantics of their storage, indexing, mutability, etc.
An object is a blob of data named by a key, that you can retrieve entirely, or overwrite entirely, and where usually you automatically get a version history of old versions that have been overwritten that you can retrieve, with a cutoff for automatic GC.
“Data” is a structured tuple that a database knows how to index into, and sort by the columns of. You insert rows, update columns of rows by a key, or delete rows by a key.
“Files” are seekable streams where you can index anywhere into a file by position and then read(2) or write(2) data at that position, and where other clients can see those updates as soon as you sync(2), without needing to close(2) the file first.
All could be used to implement the other (S3 is implemented in terms of Dynamo rows holding chunks of object data, for example.) But each access semantics has use-cases for which it is an impedance match or mismatch.
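The "file" paradigm's distinguishing feature, seekable in-place updates, fits in a few lines; an object store's whole-blob get/put interface has no equivalent of this partial overwrite:

```python
import os

path = "demo.bin"
with open(path, "wb") as f:
    f.write(b"hello world")

with open(path, "r+b") as f:
    f.seek(6)             # index anywhere into the file by position
    f.write(b"there")     # overwrite 5 bytes in place; the rest is untouched
    f.flush()
    os.fsync(f.fileno())  # roughly the sync(2) visibility point

with open(path, "rb") as f:
    print(f.read())       # b'hello there'
os.remove(path)
```

Doing the same against an object store would mean fetching the whole object, patching it client-side, and putting the whole thing back, which is the impedance mismatch the comment is describing.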
Thanks for the explanation, the file/object/data difference makes sense.
> An object is a blob of data named by a key, that you can retrieve entirely, or overwrite entirely, and where usually you automatically get a version history of old versions that have been overwritten that you can retrieve, with a cutoff for automatic GC.
And yet they refer to the objects inside as "Files" and support seeking
I know this is just bikeshedding about names and terms but it feels confused.
I think some of the confusion in the list is because of the mix of generic and product naming.
Data can be stored in datastore. But also in "spanner" or "bigtable", which are not parts of "datastore", or in "SQL" which is a language. Object can be stored in the object store called "storage" which is also within an entire category itself called "storage". So there's "Storage" which is a group of all these kinds of stores, and "Storage" which is a very specific type of store.
I think it's worth adding some additional phonetic context around this remark (I made a similar remark and my coworkers thought I was making a racist joke).
The reason native Japanese speakers struggle with "R" and "L" sounds is because they just have one phoneme to work with, which sounds (to a native English speaker) like a combination of "R", "L", and "D". If you aren't exposed to phonemes at a young age, it is difficult to expand your set later in life.
An analogous difficulty might exist for English speakers if a Chinese company came up with two product names which used the exact same sequence of syllables, but had "tonal" differences in pronunciation.
Similarly for Japanese, "file" is pronounced/written as "fairu" and "fire" as "faiyaa". If the service docs are translated, then they would look (and sound) like different names.
Not only is it confusingly similar to other storage products, but the abbreviations collide among all of them. GCF means how many different things now? (I always considered it to mean Google Cloud Functions - kind of an important product.) Don't underestimate the importance of these things. I can't even talk to my colleagues about your products without us all misunderstanding each other.
> I can say we spent a ton of time debating this but we felt that the fact one is an enterprise file share and the other a document database service focused on mobile and web would mean very little conflict for customers.
How about naming one as Cloud FileStore and other as Cloud DbStore?
Oh oh, it's already causing a lot of confusion. Just check the 70+ comments in this thread alone hating on the naming and professing their confusion and it's been out less than 12 hours... and that is from a self selecting very tech savvy audience. I think you've got a problem.
To be fair, I think “Filestore” in itself is a pretty descriptive name, and the real problem is the name “Firestore”. I can imagine the internal discussion went a bit like that as well, and here we have the result.
It's a clear blunt suggestion to get out of their bubble.
No one outside Google would hear that explanation and say "Yeah totally makes sense one of them is an enterprise file share and the other a document database service focused on mobile and web, crystal clear and very little confusion.".
What more do you want out of my comment for it to not be low effort? Write a 3 page essay about it carefully making a case based on peer reviewed scientific evidence?
Between Cloud Storage, Cloud Firestore, and Cloud Filestore, I can't make heads or tails of the situation without diving into the docs.
I was confused when they introduced Cloud Firestore to compete with their Realtime Database (https://firebase.google.com/docs/database/rtdb-vs-firestore). Now it seems like they're doing it again with Storage vs Filestore, not to mention the horrendous choice of names.
Cloud Filestore specifically gives you a multi-host NFS interface (for traditional filesystem applications) like Amazon EFS; for applications you can't easily modify to use other APIs, it appears as a normal filesystem.
Cloud Storage is an API-level object store (e.g. S3) that requires specific application support.
Most people can tell the difference between /naʊ/ and /nɑt/. I mean, just look at them... one ends with a consonant, one doesn't, and the vowels are reasonably different.
The difference here is between /faɪl/ and /ˈfaɪəɹ/, which is much more subtle. It comes down to the difference between /l/ and /əɹ/. The [ə] is an uncommon vowel in languages, unstressed, and mostly subsumed by nearby sounds. And worse, more than a billion people on the planet grew up speaking a language which doesn't distinguish the [l] and [ɹ] sounds (they're both approximants with only slight differences in articulation). So when you say "file" or "fire" these people can't distinguish which one you're saying, and when they say it they use something like the tap [ɾ] or retroflex [ɻ] instead, both of which sound ambiguous to native English speakers. Or some non-native speakers will use [l] exclusively, for both /l/ and /ɹ/.
FWIW, while Japanese doesn't distinguish L and R, "fire" is transliterated as ファイア faia while "file" is ファイル fairu. So the difference is reasonably clear.
The parent gave a phonetic explanation that has nothing to do with writing but rather hearing and speaking about it. For a billion people they aren't one letter away from each other, they're ~ the same. That's the point.
I am a bit embarrassed, but I had to read your comment twice to see that you mentioned two different products in your first sentence. So for me at least, GP comment is accurate.
Except "w" and "t" aren't similar in pronunciation. While "r" varies a lot in pronunciation across languages and regional dialects, there are many languages where the two ("r" and "l") require the same tongue position and a few languages where the two are (overly simplified) equivalent.
Given Google's history, is there any good reason to believe this service would still be supported by Google in a few years and not be replaced by yet another new-and-of-course-much-better-than-the-old-one iteration?
It's currently in beta, which means it will hopefully eventually go GA. As with any product from any company ever, just because it's in beta doesn't mean it necessarily ever goes GA.
In the case of Google Cloud Platform products, many of them [1] are subject to the deprecation policy [2]. Basically it states that they'll give you one year advance notice of any intent to deprecate those products. This is functionally the exact same policy as that offered by AWS [3].
Thanks