Re: Google "Nearline" service

On Mar 12, 2015, at 4:05 PM, John Readey <jreadey@xxxxxxxxxxxx> wrote:

Right, the pricing model for AWS Glacier definitely gets you when you do retrievals. Nearline looks a million times better in this regard.

Also, I'm thinking that access to the data would be mediated by a service layer. The service could make informed decisions (based on recent access) about which objects to keep in online vs. nearline storage. It may be as simple as keeping everything in nearline, moving an object to online on access, and keeping it there for the next n days.
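The "promote on access, keep for n days" idea can be sketched roughly as follows. This is only an illustration of the policy described above, not an existing service: the class and method names (TieringPolicy, on_access, demote_expired) are hypothetical, and the actual copying of objects between storage classes is left out.

```python
from datetime import datetime, timedelta


class TieringPolicy:
    """Hypothetical policy: every object lives in nearline by default; an
    accessed object is promoted to online and demoted again once it has
    gone keep_days without another access."""

    def __init__(self, keep_days):
        self.keep = timedelta(days=keep_days)
        # object key -> time of most recent access (online objects only)
        self.last_access = {}

    def tier(self, key):
        """Return the tier the object is currently assigned to."""
        return "online" if key in self.last_access else "nearline"

    def on_access(self, key, now):
        """Record an access; return True if the object was promoted from nearline."""
        promoted = key not in self.last_access
        self.last_access[key] = now
        return promoted

    def demote_expired(self, now):
        """Move objects idle for longer than keep_days back to nearline and return their keys."""
        expired = [k for k, t in self.last_access.items() if now - t > self.keep]
        for k in expired:
            del self.last_access[k]
        return expired
```

A service layer would call on_access on every read and run demote_expired periodically (say, once a day), performing the real storage moves for whatever keys come back.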

Notice the retrieval throughput scales with the amount of data you have in storage. 4 MB/s per TB of storage. So at PB-scale this is looking pretty good.
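To put numbers on that, here is a back-of-the-envelope calculation. It assumes the 4 MB/s per TB figure scales linearly with no cap and uses decimal units (1 TB = 1,000,000 MB); neither assumption is confirmed in this thread, and the function names are made up for the example.

```python
MB_PER_S_PER_TB = 4  # retrieval throughput per TB stored, as quoted above


def retrieval_throughput_mb_s(stored_tb):
    """Aggregate retrieval throughput in MB/s for stored_tb terabytes in storage."""
    return MB_PER_S_PER_TB * stored_tb


def hours_to_read(read_tb, stored_tb):
    """Hours needed to read read_tb terabytes when stored_tb terabytes are stored.

    Assumes 1 TB = 1,000,000 MB and perfectly linear scaling.
    """
    seconds = read_tb * 1_000_000 / retrieval_throughput_mb_s(stored_tb)
    return seconds / 3600


print(retrieval_throughput_mb_s(1000))  # 1 PB stored: 4000 MB/s, i.e. 4 GB/s
print(hours_to_read(1000, 1000))        # reading the full 1 PB back: about 69.4 hours
```

Under these assumptions, reading back an entire collection takes about the same time (roughly 69 hours) whatever its size, since the throughput grows in step with the amount stored.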

-john

From: owner-discuss@xxxxxxxxxxxxxxxxxxxxxxx <owner-discuss@xxxxxxxxxxxxxxxxxxxxxxx> on behalf of Matthew Turk <matthewturk@xxxxxxxxx>
Sent: Thursday, March 12, 2015 1:38 PM
To: discuss@xxxxxxxxxxxxxxxxxxxxxxx
Subject: Re: Google "Nearline" service

Hi Johns Towns and Readey,

JR: This is a pretty promising service! The discussion on hacker news was pretty interesting as well.

JT: The cost for retrieval is pretty low, and they obliquely compare it quite favorably to Glacier. Supposedly very, very fast retrieval speeds too.

-Matt

On Thu, Mar 12, 2015 at 11:44 AM, John Towns - NCSA Cog <jtowns@xxxxxxxxxxxxxxxxx> wrote:

John-- I haven't looked at this and am not online currently, but the typical problem with these services is that they tend to be "store-once-read-never" optimized. Is there a fee for data retrieval? I like the notion of cloud storage for scientific data, but the costs are almost always prohibitive for large collections of data.

-John

On 3/11/2015 8:10 PM, John Readey wrote:

Hey,

Google just announced its nearline storage service today: https://cloud.google.com/storage/docs/nearline-storage. It gives you storage at 1/3 the cost ($0.01/GB/month) with some extra latency (3 sec) when the data is requested.

I think that this would be perfect for storing many scientific datasets in the cloud where much of the data is infrequently accessed.

What are this group's thoughts on the suitability of cloud storage for scientific data?

John