Re: Beyond another cloud: data service discovery for NDSLabs
On 11/5/14, 11:43 AM, Matthew Turk wrote:
[...] I'd rather we *not* provide an answer to this from the
perspective of the infrastructure within which applications can run,
but instead determine a matchmaking system for data.
Wow. That's making a ton of sense - and resonates completely with that
email from the RDA datatype registry group (did they really just send
that out earlier this week?) It looks to me like the "About" and "Scope"
pages at http://typeregistry.org were updated recently too, along these
lines?
What do you think about combining service discovery with the Datatype
Registry for matchmaking applications to data? I'd rather we supply
the ability for applications to fail than try to cover every possible
aspect of their success. As a concrete example, imagine that
applications get spawned, and they register themselves as working with
a given datatype; data gets inserted into the system and either during
a tilling step or as part of the ingestion, it's identified as fitting
into a given datatype from the DTR. When the data is selected to be
acted upon, the available services would be returned. In addition to
this, we could provide standard services as well: generic Python,
R[Studio], shell, etc., data manipulation methods.
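A minimal sketch of the matchmaking described above, assuming an in-memory registry and plain strings for service names and DTR datatype identifiers. `ServiceRegistry`, `register`, `services_for`, and identifiers like `dtr:example/enzo-output` are all hypothetical, not part of any existing DTR API. Applications register against a datatype; when data of that type is selected, the type-specific services come back first, followed by the generic ones.

```python
from collections import defaultdict


class ServiceRegistry:
    """Hypothetical in-memory registry matching services to datatype IDs.

    Assumption: datatypes are identified by opaque strings (e.g. DTR
    identifiers) and services by plain names.
    """

    def __init__(self, generic_services=()):
        self._by_type = defaultdict(list)           # datatype -> [service, ...]
        self._generic = list(generic_services)      # offered for any datatype

    def register(self, service, datatype):
        # An application announces that it can operate on a given datatype;
        # duplicate registrations are ignored.
        if service not in self._by_type[datatype]:
            self._by_type[datatype].append(service)

    def services_for(self, datatype):
        # Type-specific services first, then generic fallbacks not already listed.
        specific = self._by_type.get(datatype, [])
        return specific + [s for s in self._generic if s not in specific]
```

For example, with `generic_services=["python", "rstudio", "shell"]` and a hypothetical `yt-viewer` registered for one datatype, selecting data of that type offers `yt-viewer` plus the three generic services, while an unrecognized type still gets the generic ones.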
So I'm imagining something similar to the way mime types and associated
applications are registered with web browsers right now. For each
content type, I as a user have a default application to open it in (if
I've seen that type before), but also other options available that I can
select from or change to. Perhaps the DTR could be at root an extension
of the content-type system? Except we're imagining the data handling
applications registered not with the local web browser but through some
sort of online discovery service. But I might want to add some more
local ones of my own (like the Python, R, etc. examples). Still not
entirely clear to me how this ought to work, but it seems like there
should be a way to get there.
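One way to picture the browser-like behavior, as a sketch only: a resolver that merges handlers from a hypothetical online discovery service with the user's own local handlers, and remembers a per-datatype default once the user has chosen one. `HandlerResolver` and its fields are illustrative assumptions, not an existing interface.

```python
from dataclasses import dataclass, field


@dataclass
class HandlerResolver:
    """Hypothetical resolver combining online and local handler registrations."""

    online: dict[str, list[str]]                   # datatype -> handlers from a discovery service
    local: dict[str, list[str]] = field(default_factory=dict)  # user's own handlers
    defaults: dict[str, str] = field(default_factory=dict)     # remembered choices

    def options(self, datatype):
        # Local handlers listed first, then online ones, without duplicates.
        seen = []
        for handler in self.local.get(datatype, []) + self.online.get(datatype, []):
            if handler not in seen:
                seen.append(handler)
        return seen

    def default(self, datatype):
        # Like a browser: use the remembered choice if it is still available;
        # None means "no default yet, ask the user to pick from options()".
        choice = self.defaults.get(datatype)
        return choice if choice in self.options(datatype) else None

    def set_default(self, datatype, handler):
        if handler not in self.options(datatype):
            raise ValueError(f"{handler!r} is not registered for {datatype!r}")
        self.defaults[datatype] = handler
```

The first time a type is seen, `default()` returns `None` and the user picks from `options()`; after `set_default()`, that choice is used, but the other options remain available to switch to.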
I really like this idea, and I think it blends very well with what RDA
is trying to come up with. So to drill down from this idea - what are
the technology components we need?
* A DTR is one piece (perhaps organized a bit differently from the RDA
example as it stands).
* Some kind of discovery service to link applications with data types
they support.
* Another service to link datasets or particular portions of datasets
(individual files, ?) to data types (or some way to represent that in
the dataset metadata?) I'm thinking something based on the Open
Annotation model perhaps?
* An interoperability layer that can link up a dataset with a default
application, either online or local, or present a list of options,
through the above services.
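A rough sketch of how the last two pieces could fit together, assuming annotations that loosely follow the Open Annotation idea of a target (a dataset or an individual file URI) and a body (here just a datatype identifier). `TypeAnnotation`, `datatypes_of`, and `resolve` are hypothetical names, and the plain dicts stand in for the discovery service and user preferences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeAnnotation:
    """Hypothetical annotation stating that `target` is of type `datatype`."""

    target: str    # URI of a dataset or an individual file within it
    datatype: str  # identifier of a datatype registered in a DTR


def datatypes_of(target, annotations):
    # All datatypes that annotations assign to this target, in order.
    return [a.datatype for a in annotations if a.target == target]


def resolve(target, annotations, services_by_type, defaults):
    """Interoperability step: pick a default application or list the options.

    Returns ("default", service) for the first datatype of the target that has
    a still-registered default; otherwise ("choose", options), where options
    merges the services for all of the target's datatypes without duplicates.
    """
    options = []
    for dt in datatypes_of(target, annotations):
        preferred = defaults.get(dt)
        if preferred in services_by_type.get(dt, []):
            return ("default", preferred)
        for svc in services_by_type.get(dt, []):
            if svc not in options:
                options.append(svc)
    return ("choose", options)
```

Keeping the datatype assignment in separate annotations, rather than inside each dataset's metadata, would let different tools or users classify the same file without modifying it; either representation would work with the same resolver.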
This doesn't sound too overwhelming... Are there other pieces needed?
Arthur