Interesting, and certainly partly
analogous (mime types ~ data types). I don't see how you are
using DFDL there, though. Can you describe that a bit more, or
point to a reference on how it's all put together? There always
seems to be a lot to learn in these discussions!
Arthur

On 11/6/14, 1:52 PM, McHenry, Kenton Guadron wrote:
Hi Arthur,
Would something like this be along some of those lines?:
The format portion is based off of two tools, Polyglot:
and DFDL:

On 11/5/14, 11:43 AM, Matthew Turk wrote:
[...] I'd rather we *not*
provide an answer to this from the perspective of the
infrastructure within which applications can run, but
instead determine a matchmaking system for data.
Wow. That's making a ton of sense - and resonates
completely with that email from the RDA datatype registry
group (did they really just send that out earlier this
week?) It looks to me like the "About" and "Scope" pages at
http://typeregistry.org
were updated recently too, along these lines?
What do you think about
combining service discovery with the Datatype Registry for
matchmaking applications to data? I'd rather we supply the
ability for applications to fail than try to cover every
possible aspect of their success. As a concrete example,
imagine that applications get spawned, and they register
themselves as working with a given datatype; data gets
inserted into the system and either during a tilling step
or as part of the ingestion, it's identified as fitting
into a given datatype from the DTR. When the data is
selected to be acted upon, the available services would be
returned. In addition to this, we could provide standard
services as well -- generic Python, R[Studio], shell, etc.
data manipulation methods.
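
To make the flow above concrete, here is a minimal sketch, assuming a simple in-memory registry. The class `ServiceRegistry`, the datatype identifier `dtr:example/amr-grid`, and the service names are all hypothetical and not taken from the DTR or any existing discovery service; a real system would resolve datatype identifiers against the registry itself.

```python
from collections import defaultdict


class ServiceRegistry:
    """Hypothetical in-memory matchmaker between datatypes and services."""

    def __init__(self, generic_services=()):
        self._by_type = defaultdict(list)      # datatype -> registered services
        self._generic = list(generic_services)  # standard services, always offered
        self._data_types = {}                   # data id -> datatype

    def register_service(self, service, datatype):
        """A spawned application declares which datatype it works with."""
        if service not in self._by_type[datatype]:
            self._by_type[datatype].append(service)

    def tag_data(self, data_id, datatype):
        """Record the datatype identified for a piece of data at ingestion."""
        self._data_types[data_id] = datatype

    def services_for(self, data_id):
        """Return datatype-specific services first, then the generic ones."""
        datatype = self._data_types.get(data_id)
        specific = self._by_type.get(datatype, []) if datatype is not None else []
        return list(specific) + [s for s in self._generic if s not in specific]


registry = ServiceRegistry(generic_services=["python", "rstudio", "shell"])
registry.register_service("volume-renderer", "dtr:example/amr-grid")
registry.tag_data("dataset-001", "dtr:example/amr-grid")
print(registry.services_for("dataset-001"))
```

Data with no recognized datatype still gets the generic services back, which fits the idea of letting applications fail rather than trying to guarantee their success.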
So I'm imagining something similar to the way mime types
and associated applications are registered with web browsers
right now. For each content type, I as a user have a default
application to open it in (if I've seen that type before),
but also other options available that I can select from or
change to. Perhaps the DTR could be at root an extension of
the content-type system? Except we're imagining the data
handling applications registered not with the local web
browser but through some sort of online discovery service.
But I might want to add some more local ones of my own (like
the Python, R, etc. examples). Still not entirely clear to me
how this ought to work but it seems like there should be a
way to get there.
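
One way this browser-style behaviour might look, as a rough sketch only: handlers per datatype come from some online discovery service, the user can add local ones, and a default can be chosen per type. `HandlerPreferences` and every handler name here are hypothetical; listing local handlers first, and falling back to the first available option when no default has been chosen, are assumptions of this sketch.

```python
class HandlerPreferences:
    """Hypothetical per-user handler preferences, keyed by datatype."""

    def __init__(self, discovered):
        # discovered: datatype -> handlers reported by an online discovery service
        self._discovered = {k: list(v) for k, v in discovered.items()}
        self._local = {}     # datatype -> handlers the user added locally
        self._default = {}   # datatype -> handler the user picked as default

    def add_local(self, datatype, handler):
        """Add a locally installed handler for a datatype."""
        self._local.setdefault(datatype, []).append(handler)

    def options(self, datatype):
        """All known handlers for a datatype, local ones first, no duplicates."""
        seen = []
        for h in self._local.get(datatype, []) + self._discovered.get(datatype, []):
            if h not in seen:
                seen.append(h)
        return seen

    def set_default(self, datatype, handler):
        """Choose the default handler; it must be one of the known options."""
        if handler not in self.options(datatype):
            raise ValueError(f"{handler!r} is not registered for {datatype!r}")
        self._default[datatype] = handler

    def default(self, datatype):
        """The chosen default, else the first available option, else None."""
        if datatype in self._default:
            return self._default[datatype]
        opts = self.options(datatype)
        return opts[0] if opts else None


prefs = HandlerPreferences({"text/csv": ["online-table-viewer"]})
prefs.add_local("text/csv", "python")
print(prefs.options("text/csv"))
prefs.set_default("text/csv", "online-table-viewer")
print(prefs.default("text/csv"))
```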
I really like this idea, and I think it blends very well
with what RDA is trying to come up with. So to drill down
from this idea - what are the technology components we need?
* A DTR is one piece (perhaps organized a bit
differently from the RDA example as it stands).
* Some kind of discovery service to link applications
with data types they support.
* Another service to link datasets or particular
portions of datasets (individual files, ?) to data types (or
some way to represent that in the dataset metadata?). I'm
thinking something based on the Open Annotation model
perhaps?
* An interoperability layer that can link up a dataset
with a default application, either online or local, or
present a list of options, through the above services.
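
For the third piece, a link between part of a dataset and a data type could be expressed as an annotation whose target is the file and whose body is the data type. The sketch below is only an illustration of that shape: the function `datatype_annotation` and both example URIs are hypothetical, and the use of `oa:hasBody` / `oa:hasTarget` for this purpose is an assumption about how the Open Annotation model might be applied, not an established convention.

```python
import json


def datatype_annotation(target_uri, datatype_uri):
    """Build an Open Annotation style record linking a dataset part to a datatype."""
    return {
        "@context": {"oa": "http://www.w3.org/ns/oa#"},
        "@type": "oa:Annotation",
        "oa:hasTarget": {"@id": target_uri},   # the file or dataset portion
        "oa:hasBody": {"@id": datatype_uri},   # the data type it is said to have
    }


# Both URIs are placeholders for illustration only.
annotation = datatype_annotation(
    "http://example.org/datasets/42/files/grid.h5",
    "http://example.org/dtr/types/amr-grid",
)
print(json.dumps(annotation, indent=2))
```

Keeping the link outside the dataset itself means a data type can be asserted later, or by someone other than the data owner, without touching the dataset metadata.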
This doesn't sound too overwhelming... Are there other
pieces needed?
Arthur