A few questions come to mind after reading the docs (minus the currently 404ing ones):
(1) is this a zonal or regional product? ie does it replicate data across zones in a region?
(2) roughly what latencies should I expect to see? eg for the following: a 1KB, 1MB, and 1GB read and write. Any info here would be helpful.
(3) does it have close-to-open consistency? or something weaker / stronger?
(4) any plans to add gcp pub/sub integration somehow? would be great to be able to subscribe to changes like with other gcp storage products.
(5) any plans to move from NFSv3 to NFSv4?
(6) backups available or planned as a feature? eg to GCS
(7) can you share anything about how it is implemented?
Further minor versions add other things that could be useful for cloud usage, e.g. in NFSv4.1 (https://tools.ietf.org/html/rfc5661 ) parallel data access, sessions, improved delegations, and in NFSv4.2 (https://tools.ietf.org/html/rfc7862 ) server-side clone/copy, sparse file support.
I'm pleasantly surprised by mere NFSv3. Google doesn't use other people's standards by choice. They probably have an internal file protocol that the Greys gave them to hand over to humanity at a future date.
I work for Google and am the product manager for Cloud Filestore. I would have responded sooner, but I was busy with announcement-related events :)
If you (or anyone reading this) want to discuss any aspects of Filestore, my email is my Hacker News login with @google.com appended.
(1) Filestore is a zonal product and provides high availability (HA) within the zone. We are considering adding regional HA, but it's not entirely clear what the use case is, as there is a cost & performance tradeoff. I'd be happy to chat with you in more depth to understand what you would like to see here.
(2) It's hard to be precise about what latencies _you_ will see, as the set of benchmarks and workloads run against NFS is so varied. Anything I say here will be true for some workload, but someone is undoubtedly bound to find a workload where it's not true :). So, TL;DR: YMMV; best to test your workload when the beta launches soon (sign up to be notified when it launches in a few weeks at https://goo.gl/forms/Hx6XkobcwNo5DoA33)
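One way to get rough numbers for your own workload once a share is mounted is simply to time reads and writes at a few sizes. This is only a sketch (the target path is a placeholder for a file on the mounted share), with fsync included in the write timing so buffered writes don't flatter the result:

```python
import os
import time


def time_write(path, size, block=1 << 20):
    """Write `size` bytes to `path` and return elapsed seconds (incl. fsync)."""
    data = os.urandom(min(size, block))
    start = time.perf_counter()
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            remaining -= f.write(data[: min(remaining, block)])
        f.flush()
        os.fsync(f.fileno())  # count the sync, not just the buffered write
    return time.perf_counter() - start


def time_read(path):
    """Read `path` back fully; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        total = len(f.read())
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Point this at a file on the mounted share, e.g. /mnt/filestore/bench.dat
    target = "bench.dat"
    for size in (1 << 10, 1 << 20):  # 1 KB, 1 MB
        w = time_write(target, size)
        n, r = time_read(target)
        print(f"{size:>8} bytes: write {w * 1e3:.2f} ms, read {r * 1e3:.2f} ms")
    os.remove(target)
```

A single pass like this only gives a ballpark; repeating each size many times and looking at the distribution is closer to what fio would report.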
(3) We support close-to-open consistency, but it's really up to the client. See this Linux NFS FAQ for details: http://nfs.sourceforge.net/#faq_a8. TL;DR: If you're running a Linux kernel ≥ 2.4.20 and haven't mounted with the 'nocto' option, then yes, you'll see CTO.
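For reference, a mount that keeps the default close-to-open behavior might look like this hypothetical /etc/fstab entry (the server address and export path are placeholders for your instance's values):

```
# 'cto' is the default; mounting with 'nocto' instead would relax
# close-to-open consistency in exchange for fewer attribute revalidations.
10.0.0.2:/share  /mnt/filestore  nfs  vers=3,hard,cto  0  0
```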
(4) We don't have any plans for pub/sub integration on the roadmap, but I'd love to talk to you about the use case (see info about my email addr above).
(5) Yes, we have NFSv4 support on the roadmap. We launched with NFSv3 because it's still widely used, and in many cases customers won't see any appreciable performance delta from NFSv4. That said, we agree that it is very important: NFSv4 can often help with some metadata-heavy workloads, and has a more extensive authentication and authorization model which some workloads require. Ultimately we made a time-to-market tradeoff.
(6) For backups, we support any of the standard commercial backup software that's certified against GCP and can backup NFSv3 shares. We don't have a native backup solution planned, but we do have snapshots on the roadmap, which in some cases are sufficient.
(7) As to implementation, sorry no, I cannot.
And to answer a few more questions from the nested comments so this is all in one place:
* Snapshots are on the near term roadmap, and are very high priority for us to get supported.
* SMB, extended attribute, and quota support are all on the roadmap, and like NFSv4 are high priority.
Unfortunately I can't be more precise about when to expect these features.
They've undercut Amazon EFS by 33% for the budget option ($0.20/GB vs. $0.30/GB). Hope this kicks off a little pricing war; a shared filesystem is a hugely useful solution for migrating old apps, and it would be nice if it were cheap. Both options are still grossly overpriced (IMHO).
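At the per-GB-month rates quoted above ($0.20 vs. $0.30, taken from the comment rather than any official price list), the gap compounds with size; a rough back-of-the-envelope:

```python
# Monthly cost at the per-GB-month rates quoted in the thread.
FILESTORE_PER_GB = 0.20  # budget tier, per the comment above
EFS_PER_GB = 0.30


def monthly_cost(gb, rate_per_gb):
    """Storage-only monthly cost; ignores request/throughput charges."""
    return gb * rate_per_gb


for gb in (100, 1024, 10240):
    fs = monthly_cost(gb, FILESTORE_PER_GB)
    efs = monthly_cost(gb, EFS_PER_GB)
    print(f"{gb:>6} GB: ${fs:,.2f}/mo vs ${efs:,.2f}/mo "
          f"({1 - fs / efs:.0%} cheaper)")
```

That 33% figure is just 1 - 0.20/0.30; at 10 TB it's roughly a $1,000/month difference.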
Can you please elaborate a little?
EFS scales with size, i.e. did you have at least a few hundred GB on there or was it mostly empty? Roughly how many IOPS did you get?
IIRC, I populated the thing with 10TB or thereabouts.
For small block sizes, I got hundreds of IOPS.
Latency was pretty terrible in almost all cases.
I wrote a great deal on this topic for internal consumption. Those benchmarks weren't really meant for the general public. My biggest conclusion is that EFS isn't useful for any workload where performance is a concern. Unfortunately, it's priced so high, that I'd never consider using it otherwise.
We use ~100GB with a lot of in/out for EFS, as a kind of 'nearline S3,' mounted to a hundred EC2 instances or so. It replaced an NFS server doing the same thing, and it ended up being slightly cheaper because you don't pay for the compute, only the storage (EFS vs. EBS + EC2).
Also I haven't had a single issue with performance. It's not particularly busy though (~800 IOPS?).
Underlying S3 bits do bite you when you deal with lots of small files. Our team even considered maintaining NFS or Gluster servers over horrible EFS performance.
Another file system that works well with lots of small files and that doesn't require maintaining any extra servers is ObjectiveFS[0]. It is a FUSE based log structured file system (with snapshots) using GCS or S3 for log storage with memory cache and (optional) disk cache for performance.
As Amazon starts winning fat government contracts for unaccountable sums of money, it will probably care less and less about offering "value" to its smallest private sector customers.
Amazon had better improve its customer service offering, or ramp up its partner network if it wants to win big "enterprisy" clients like government then. The HN crowd may look at the Oracles and IBMs as super-expensive dinosaurs, but they do offer an incredibly high level of help and support.
AWS has excellent customer service offerings after a certain price point. I would check the "Business" and "Enterprise" plans for examples of how to get the good support.
You get an account manager rep and upon occasion, interactions with AWS engineers, depending on what you're doing and how exciting it is.
One might say a lot about AWS - but I don't fault them on the customer service front for a serious business spending serious money.
Amazon already has great customer service for big clients, and they already have lots of government clients too. (Including a completely private region for the CIA and other intelligence services.)
Ever try getting a SQL support request to an engineer actually at Microsoft and not one outsourced and get a response within weeks? Yeah, not easy. Or if you get a response it's the equivalent of "have you tried turning it off and then on again?".
Yes, we had a SQL support case replied to by email, and then followed-up via phone/screenshare to have it resolved that day.
The only constant with tech support is that it's highly variable and depends on many factors. Enterprise MSDN/SQL support is not bad though and you should've gotten a quick reply if you're using the proper channels with an appropriate support plan.
The shop I came from had been a Microsoft and VMware shop for years, and we spent millions a year on MSDN licensing. The support was kinda shitty. But I'm not there anymore. I do remember laughing in DBA meetings about how crappy it was, and wondering why we paid so much for offshore support techs when they should have been local on-premises engineers.
Amazon does too. Once you're well into the few-million-per-month range they are pretty responsive (I used to work for a smaller shop with ~$50k/month AWS spend); the difference is night and day.
Very happy that GCP is opening in LA, however pushing the VFX angle is interesting considering the bulk of that work has moved overseas due to tax credits. Sent the link to a friend at Dreamworks that still has a sizable amount of work in Glendale.
Will also be interesting to find out how they are solving/pricing the immense bandwidth/storage requirements needed to make such work practical.
This is great for persistent storage on Kubernetes clusters. Storage has been complicated so far and while there are external solutions for pooling and replicating locally attached disks, this is much nicer and simpler for most uses.
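As an illustration of that Kubernetes use, a pre-provisioned NFS-backed PersistentVolume plus claim might look like the following sketch. The server IP, export path, and size are hypothetical, and depending on your cluster version you may also need `storageClassName: ""` on the claim to bind it to this specific PV:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: filestore-pv
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany   # the point of NFS: many pods, many nodes
  nfs:
    server: 10.0.0.2  # placeholder: the Filestore instance IP
    path: /share      # placeholder: the export name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: filestore-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Ti
```

ReadWriteMany is what locally attached disks (and GCE PDs) can't give you; that's the simplification being praised here.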
As convenient as this seems, there are good reasons that many shops prohibit the use of NFS in production.
There’s nothing to prevent any application from doing full error checks on every disk i/o call, dealing with timeouts, etc. Except that nobody wrote that stuff into software designed for local disk.
At $enterprise-dayjob we've ended up using EFS in AWS after migrating an application that was using NFS. One of the main obstacles to adopting this was perceived lack of security as EFS didn't support encryption of NFS connections between EC2 instances and EFS mount targets. AWS EFS released support for encrypting the NFS connections earlier this year, using Stunnel.
What's the corresponding story re security for cloud filestore?
The application had grown and ran in production over a number of years using NFS for storage before it was moved into AWS, so this probably naturally steered devs away from trying to use the shared file storage as a high performance cache (eg they might try it, and find it to be slow, and figure out a different way of doing whatever they needed to do, probably using the db).
Our usage pattern for NFS did not generally involve needing to read or write many tiny files with any degree of performance. E.g. we might need to do reads/writes of dozens of files, each of a few MB, in a process whose overall running time was largely governed by our DB performance and patches of poorly written algorithmic code running in a slow scripting language. Amdahl's law: for us, making the NFS access go infinitely fast, or 5x slower, hardly changes the overall total process execution time.
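That Amdahl's-law point is easy to make concrete: if NFS I/O is only a small fraction of total runtime, even an infinitely fast filesystem barely moves the total. A small sketch (the 5% figure is an illustrative assumption, not a measurement from the thread):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work is accelerated by s."""
    return 1.0 / ((1.0 - p) + p / s)


# Say NFS I/O is 5% of a job otherwise dominated by DB and CPU time:
p = 0.05
print(amdahl_speedup(p, 5))             # 5x faster NFS -> ~1.04x overall
print(amdahl_speedup(p, float("inf")))  # infinitely fast NFS -> ~1.05x
```

The same formula run in reverse shows why the EFS tiny-file case below hurts: when I/O is 95% of the runtime, filesystem latency dominates everything.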
I can certainly appreciate that EFS is abysmally slow at dealing with large numbers of reads/writes of tiny files. In another part of the system, one of my colleagues set up a read-only cache of a large number of tiny files in EFS; due to the per-file write latency it took around a week to load in < 100GB of data.
Spin up a huge render job, use the perf of the storage, and then nuke the storage as you have a final product? That's probably what they're getting at.
This seems like a really unfortunate name choice to me. When I first read the title I misread it as Cloud Firestore, which just has one letter different to Cloud Filestore.
Extreme "Just my two cents" mode: user problems may be similar, but needs and solutions remain unique.
Any one group trying to build a solution to serve all needs is likely in line for complication, which is why great partners and options continue to be a wise choice.
(Gonna duck out of this thread, etc. now since this isn't my release and I assume people who work in storage can go a lot deeper.)
No, they are alternatives that have existed as solutions before Filestore was created. They still have unique features, better performance and other advantages if you need them.
Well, the poster probably meant whatever GFS incarnation is around these days (Colossus is v2 or v3, depending on how you look at things). In the end, almost everything at Google is backed by "GFS": Bigtable, Blobstore, GCS, Spanner, Megastore, etc. There's little else that is not backed by GFS, but talks to D directly. At least that has been mentioned in public, of course. Still, none of it is user facing/serving, though.
This is a product that gives you a file system, like attaching a disk, but over NFS so it's available for multiple servers to mount over the network.
If you don't actually need a disk-based file system and just want to read/write individual files as objects, then object storage like Cloud Storage is your best option.
Wow, the premium Filestore is as good as a single SSD (replicated). That's really cool to use for MySQL/PostgreSQL databases on GKE; way cheaper than using Cloud SQL if you already have a GKE cluster.
NFS is much faster to attach than PD to a GCE node, which can take minutes at times and lead to crash loops waiting for the storage to be ready. This is especially problematic if the disk needs to be moved from one node to another for some reason.
Depends. There's nothing wrong with NFS itself, although v4 is better obviously. There's a performance trade-off with going over the network and sharing the storage with multiple servers, along with the necessary metadata for each file, but if you don't have much contention then it's just like a drive with higher latency.
If that latency is low and your workload can handle disk concurrency well then it works fine. It helps if you use (or configure) a database with more sequential access and buffering for large updates rather than lots of random small writes, as well as spreading the data over several disks.
AWS EFS has latency problems which make it problematic but this product seems to have better performance profile which could work well.
Databases all use some kind of write-ahead logging so you'll be safe as long as that file is safe, and they're even capable of recovering the file all the way up to any corrupt records that may have been appended at the end.
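A minimal sketch of that idea (not how any particular database actually lays out its log): each record carries its length and a checksum, and recovery replays records until it hits the first truncated or corrupt one at the tail:

```python
import os
import struct
import zlib


def wal_append(path, payload: bytes):
    """Append one record: little-endian length + CRC32 + payload."""
    record = struct.pack("<II", len(payload), zlib.crc32(payload)) + payload
    with open(path, "ab") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())  # durable before the write is acknowledged


def wal_recover(path):
    """Return all intact records, discarding a torn/corrupt tail."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # clean end of log, or truncated header
            length, crc = struct.unpack("<II", header)
            payload = f.read(length)
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # torn write at the tail: stop here, drop the rest
            records.append(payload)
    return records
```

Appending garbage bytes to the file and then calling `wal_recover` returns only the records written before the corruption, which is exactly the "recover up to any corrupt records at the end" behavior described above.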
You shouldn't have multiple servers writing to the same volume for database drives, but other than that it's no different than any other disk that might lose its connection. Most VMs' "local" disks are still attached over the network anyway, emulating a PCIe bus interface instead of NFS.
No, but exotic software people use to design integrated circuits, like a WiFi modem, or a Bitcoin miner. Autodesk, the creator of AutoCAD, sells Eagle, an EDA software. It does not go lower than you can go with a soldering iron, except for allowing solder joints that are better created with hot air / infrared heating, and fancy multi-layer circuit boards with components on the inside.
Between Cloud Storage, Cloud Datastore, Cloud Firestore and now Cloud Filestore the naming of these services is frustratingly confusing. Oh and of course there are also BigQuery and BigTable, which do not really give any clearer picture of the actual service. I've used GCP extensively for more than a year now and I still get confused about the different storage services way too often.
"Google Cloud Storage" could be a product but also the encompassing category of all the things you mentioned?
Also on the Firebase pricing page (https://firebase.google.com/pricing) they have a "Realtime Database", is that related to datastore or cloud storage?
They also have another item there just labeled "Storage". Is that one of the above?
And at the bottom you get "Google Cloud Platform" on the "Blaze Plan". Is that the products you mentioned that all start with "Cloud"?
Thanks for the feedback. As the person that named both products, I can say we spent a ton of time debating this but we felt that the fact one is an enterprise file share and the other a document database service focused on mobile and web would mean very little conflict for customers. We will keep an eye on any customer confusion it might cause.
It explains why this is linguistically bad. Basically, a billion people on this planet don't distinguish between the l and r sounds, so for 1/6 of the planet, these names are identical.
Native Chinese speaker here. That post is comically wrong. For one thing, you don't pronounce either "file" or "fire" like "filer" or "firer" -- isn't English amazing ;) -- so there's no L/R sound in the first place for us to supposedly confuse.
I mean all of the following in the best possible way:
Perhaps also worth looking at the screenshot in the blogpost.
You have in there:
---
Datastore
Storage
Filestore
---
So, datastore is not storage, nor is filestore. What is it storage storing if not data or files? Why are files not data? I have no idea what should go where.
> We know folks need to create, read and write large files with low latency. We also know that film studios and production shops are always looking to render movies and create CGI images faster and more efficiently. So alongside our LA region launch,
So I couldn't create, read, or write large files with low latency before? I thought this was already a feature of your other products.
> we’re pleased to enable these creative projects by bringing file storage capabilities to GCP for the first time with Cloud Filestore.
For the first time? I couldn't store files before?
I'm not trying to be an arse, but I really don't get from this what the key difference is from everything else you offer.
> What is it storage storing if not data or files?
Objects. Cloud Storage is the S3 competitor.
> Why are files not data?
“Data” as in rows in a database. Like Dynamo.
Everything on a computer is data. The thing you’ve got to understand is that the terms we use, “objects”, “files”, “data” — these don’t refer to types of data, but rather to access paradigms for data. The semantics of their storage, indexing, mutability, etc.
An object is a blob of data named by a key, that you can retrieve entirely, or overwrite entirely, and where usually you automatically get a version history of old versions that have been overwritten that you can retrieve, with a cutoff for automatic GC.
“Data” is a structured tuple that a database knows how to index into, and sort by the columns of. You insert rows, update columns of rows by a key, or delete rows by a key.
“Files” are seekable streams where you can index anywhere into a file by position and then read(2) or write(2) data at that position, and where other clients can see those updates as soon as you sync(2), without needing to close(2) the file first.
All could be used to implement the other (S3 is implemented in terms of Dynamo rows holding chunks of object data, for example.) But each access semantics has use-cases for which it is an impedance match or mismatch.
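The "file" paradigm's distinguishing feature, seekable in-place updates, fits in a few lines; an object store's whole-blob get/put interface has no equivalent of this partial overwrite:

```python
import os

path = "demo.bin"
with open(path, "wb") as f:
    f.write(b"hello world")

with open(path, "r+b") as f:
    f.seek(6)             # index anywhere into the file by position
    f.write(b"there")     # overwrite 5 bytes in place; the rest is untouched
    f.flush()
    os.fsync(f.fileno())  # roughly the sync(2) visibility point

with open(path, "rb") as f:
    print(f.read())       # b'hello there'
os.remove(path)
```

Doing the same against an object store would mean fetching the whole object, patching it client-side, and putting the whole thing back, which is the impedance mismatch the comment is describing.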
Thanks for the explanation, the file/object/data difference makes sense.
> An object is a blob of data named by a key, that you can retrieve entirely, or overwrite entirely, and where usually you automatically get a version history of old versions that have been overwritten that you can retrieve, with a cutoff for automatic GC.
And yet they refer to the objects inside as "Files" and support seeking
I know this is just bikeshedding about names and terms but it feels confused.
I think some of the confusion in the list is because of the mix of generic and product naming.
Data can be stored in datastore. But also in "spanner" or "bigtable", which are not parts of "datastore", or in "SQL" which is a language. Object can be stored in the object store called "storage" which is also within an entire category itself called "storage". So there's "Storage" which is a group of all these kinds of stores, and "Storage" which is a very specific type of store.
I think it's worth adding some additional phonetic context around this remark (I made a similar remark and my coworkers thought I was making a racist joke).
The reason native Japanese speakers struggle with "R" and "L" sounds is because they just have one phoneme to work with, which sounds (to a native English speaker) like a combination of "R", "L", and "D". If you aren't exposed to phonemes at a young age, it is difficult to expand your set later in life.
An analogous difficulty might exist for English speakers if a Chinese company came up with two product names which used the exact same sequence of syllables, but had "tonal" differences in pronunciation.
Similarly for Japanese, "file" is pronounced/written as "fairu" and "fire" as "faiyaa". If the service docs are translated, then they would look (and sound) like different names.
Not only is it confusingly similar to other storage products, but the abbreviations collide among all of them. GCF means how many different things now? (I always considered it to mean Google Cloud Functions - kind of an important product.) Don't underestimate the importance of these things. I can't even talk to my colleagues about your products without us all misunderstanding each other.
> I can say we spent a ton of time debating this but we felt that the fact one is an enterprise file share and the other a document database service focused on mobile and web would mean very little conflict for customers.
How about naming one as Cloud FileStore and other as Cloud DbStore?
Oh oh, it's already causing a lot of confusion. Just check the 70+ comments in this thread alone hating on the naming and professing their confusion and it's been out less than 12 hours... and that is from a self selecting very tech savvy audience. I think you've got a problem.
To be fair, I think “Filestore” in itself is a pretty descriptive name, and the real problem is the name “Firestore”. I can imagine the internal discussion went a bit like that as well, and here we have the result.
It's a clear blunt suggestion to get out of their bubble.
No one outside Google would hear that explanation and say "Yeah totally makes sense one of them is an enterprise file share and the other a document database service focused on mobile and web, crystal clear and very little confusion.".
What more do you want out of my comment for it to not be low effort? Write a 3 page essay about it carefully making a case based on peer reviewed scientific evidence?
Between Cloud Storage, Cloud Firestore, and Cloud Filestore, I can't make heads or tails of the situation without diving into the docs.
I was confused when they introduced Cloud Firestore to compete with their Realtime Database (https://firebase.google.com/docs/database/rtdb-vs-firestore). Now it seems like they're doing it again with Storage vs Filestore, not to mention the horrendous choice of names.
Cloud Filestore specifically gives you a multi-host NFS interface (for traditional filesystem applications) like Amazon EFS; for applications you can't easily modify to use other APIs, it appears as a normal filesystem.
Cloud Storage is an API-level object store (e.g. S3) that requires specific application support.
Most people can tell the difference between /naʊ/ and /nɑt/. I mean, just look at them... one ends with a consonant, one doesn't, and the vowels are reasonably different.
The difference here is between /faɪl/ and /ˈfaɪəɹ/, which is much more subtle. It comes down to the difference between /l/ and /əɹ/. The [ə] is an uncommon vowel in languages, unstressed, and mostly subsumed by nearby sounds. And worse, more than a billion people on the planet grew up speaking a language which doesn't distinguish the [l] and [ɹ] sounds (they're both approximants with only slight differences in articulation). So when you say "file" or "fire" these people can't distinguish which one you're saying, and when they say it they use something like the tap [ɾ] or retroflex [ɻ] instead, both of which sound ambiguous to native English speakers. Or some non-native speakers will use [l] exclusively, for both /l/ and /ɹ/.
FWIW, while Japanese doesn't distinguish L and R, "fire" is transliterated as ファイア faia while "file" is ファイル fairu. So the difference is reasonably clear.
The parent gave a phonetic explanation that has nothing to do with writing but rather hearing and speaking about it. For a billion people they aren't one letter away from each other, they're ~ the same. That's the point.
I am a bit embarrassed, but I had to read your comment twice to see that you mentioned two different products in your first sentence. So for me at least, GP comment is accurate.
Except "w" and "t" aren't similar in pronunciation. While "r" varies a lot in pronunciation across languages and regional dialects, there are many languages where the two ("r" and "l") require the same tongue position and a few languages where the two are (overly simplified) equivalent.
Given Google's history, is there any good reason to believe this service would still be supported by Google in a few years and not be replaced by yet another new-and-of-course-much-better-than-the-old-one iteration?
It's currently in beta, which means it will hopefully eventually go GA. As with any product from any company ever, just because it's in beta doesn't mean it necessarily ever goes GA.
In the case of Google Cloud Platform products, many of them [1] are subject to the deprecation policy [2]. Basically it states that they'll give you one year advance notice of any intent to deprecate those products. This is functionally the exact same policy as that offered by AWS [3].
Thanks