Fwd: [rda-dtr-wg] Data Type Registries (DTR) Output


Hi all, here is some information about DTRs, which is relevant to the discussion Arthur and I were participating in.

---------- Forwarded message ---------
From: TimeaBiro <tbiro@xxxxxxxxxxxxxxxxxxxx>
Date: Tue Nov 04 2014 at 9:54:05 AM
Subject: [rda-dtr-wg] Data Type Registries (DTR) Output
To: Data Type Registries WG <rda-dtr-wg@xxxxxxxxxxxxxx>


Responsible RDA Working Group Co-Chairs:
Larry Lannom - Corporation for National Research Initiatives, Virginia USA
Daan Broeder - Max Planck Institute for Psycholinguistics, Netherlands

What is the Problem?

Researchers often receive a file from colleagues, follow a link, or otherwise encounter data created elsewhere that they would like to use in their own work. However, they may not know how to work with it, interpret it or visualise its content, because they are unfamiliar with the specifics of the structure and/or meaning of the data, which can range from individual observations up to complex data sets. Researchers frequently have to stop at this point, because finding explanations and tools, and installing those tools where they exist, takes too much effort.

What is the goal?

The goal of the DTR WG was to allow data producers to record the implicit details of their data in the form of Data Types and to associate those Types, each uniquely identified, with individual dataset instances. Data consumers can then resolve the Type identifiers to Type information in order to learn the implicit assumptions in the data, find services that can be used with this kind of data, and obtain any other information that helps them understand and process the data, without additional support from the data producers. DTRs are meant to provide machine-readable information as well as human-readable information.
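To make the idea of resolving a Type identifier more concrete, here is a minimal Python sketch of what a Type record and its resolution might look like. It is not the API of the typeregistry.org prototype or of any other DTR implementation: the `DataType` fields, the `DataTypeRegistry` class and the example identifier are hypothetical, chosen only to mirror the description above (a unique identifier, a human-readable explanation of implicit assumptions, an optional container-level MIME type, and pointers to services).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DataType:
    """Hypothetical Data Type record, loosely modelled on the description above."""
    identifier: str                       # unique, resolvable Type identifier
    name: str
    description: str                      # human-readable implicit assumptions
    base_mime_type: Optional[str] = None  # container-level format, if any
    services: Tuple[str, ...] = ()        # pointers to tools that handle this type


class DataTypeRegistry:
    """In-memory stand-in for a DTR (not the API of any real registry)."""

    def __init__(self) -> None:
        self._types: Dict[str, DataType] = {}

    def register(self, data_type: DataType) -> None:
        # Each identifier may be registered only once, so it stays unambiguous.
        if data_type.identifier in self._types:
            raise ValueError(f"type already registered: {data_type.identifier}")
        self._types[data_type.identifier] = data_type

    def resolve(self, identifier: str) -> DataType:
        # Resolve a Type identifier to its full Type information.
        try:
            return self._types[identifier]
        except KeyError:
            raise LookupError(f"unknown type identifier: {identifier}") from None


# Usage with a made-up example type (identifier and service URL are hypothetical).
registry = DataTypeRegistry()
registry.register(DataType(
    identifier="hdl:21.T99999/example-eeg",
    name="EEG recording",
    description="Samples in microvolts; channel order given in the type definition",
    base_mime_type="application/octet-stream",
    services=("https://example.org/eeg-viewer",),
))
print(registry.resolve("hdl:21.T99999/example-eeg").services)
```

Keeping the optional MIME type as a separate field reflects the point that Data Types extend container-level types rather than replace them.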

What is the solution?

DTRs offer developers and researchers the ability to add their type definitions to an open registry and, where useful, to add references to tools that can operate on them. For example, a user who received an unknown file could query a DTR and receive back a pointer to a visualisation service able to display the data in a useful form. A fully automated system could use a DTR in much the same way that the MIME type system enables a browser to start a video player automatically once a video file has been identified. We envision humans taking advantage of Data Types in DTRs through type definitions that clarify the nuanced and contextual aspects of structured datasets.

Data Types in DTRs can be used to extend existing types such as MIME types, which provide only container-level parsing information. In addition, they can describe the experimental context, relationships between different portions of the data, and so on. Registration policies for Data Types are deliberately kept quite open.

Two examples illustrate the benefits of the DTR solution:
1. Researchers working with data (e.g. in a cross-disciplinary, cross-border context) encounter an unknown data type and can immediately process and/or visualize its content by using the DTR service.
2. A machine wants to extract the checksum of a data object from its PID record to check whether the content is still the same. Without knowing the details of the PID service provider, the machine could, for example, ask for CKSM, since this is an information type that all PID service providers have agreed upon and registered in the DTR.
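The checksum example can be sketched in a few lines of Python. Purely for illustration, this assumes that a PID record is a simple mapping from type names to values, that the agreed-upon type is literally named `CKSM`, and that its value has the form `<algorithm>:<hex digest>`; real PID records and the registered checksum type may look different. The function name `content_unchanged` is hypothetical.

```python
import hashlib
from typing import Dict


def content_unchanged(pid_record: Dict[str, str], content: bytes,
                      checksum_type: str = "CKSM") -> bool:
    """Check content against the checksum stored in a (hypothetical) PID record.

    Assumes the record maps type names to values and that the checksum value
    has the form "<algorithm>:<hex digest>", e.g. "sha256:<64 hex characters>".
    """
    value = pid_record.get(checksum_type)
    if value is None:
        raise KeyError(f"PID record has no {checksum_type} entry")
    algorithm, sep, expected = value.partition(":")
    if not sep or not expected:
        raise ValueError(f"malformed checksum value: {value!r}")
    # hashlib.new raises ValueError for algorithms it does not support.
    actual = hashlib.new(algorithm, content).hexdigest()
    return actual == expected.lower()


data = b"example data object"
record = {
    "URL": "https://example.org/objects/42",
    "CKSM": "sha256:" + hashlib.sha256(data).hexdigest(),
}
print(content_unchanged(record, data))          # True
print(content_unchanged(record, data + b"!"))   # False
```

The point of the shared type is visible in the signature: the client only needs to know the name `CKSM`, not anything specific to the PID service provider that issued the record.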

What is the impact?

The potential impact on scientific practice is substantial. Unknown data types, as described above, can be exploited without any prior knowledge, so that an enormous gain in time and/or interoperability can be achieved. Just as MIME types allow browsers to select visualization plug-ins automatically when confronted with a certain file type, scientific software can use the definitions and pointers stored in a DTR to continue processing without the user having to acquire knowledge beforehand. DTRs pave the way for automatic processing in our increasingly complex data domain without putting additional load on researchers.

This diagram indicates how the Data Type Registry (DTR) works. A user or machine receives an unknown type (1), which can be, for example, a file or a term. The DTR is contacted and returns information about an available service (2) that allows the user or machine to continue processing the content (3, 4), such as visualizing an image, without requiring prior knowledge from the user. This will make cross-disciplinary and cross-border work much more efficient and bring data-driven science within reach even of those who are not data experts.
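The workflow in the diagram can be approximated by a small dispatcher, in the spirit of the MIME-type based dispatch that browsers perform. In this sketch the DTR is a hard-coded dictionary and the "services" are local functions; the identifier `hdl:21.T99999/example-image`, the `image-viewer` service and `process_unknown` are all hypothetical. A real client would query a remote registry and call remote services instead.

```python
from typing import Callable, Dict, List

# Hypothetical DTR content: Type identifier -> names of registered services.
DTR: Dict[str, List[str]] = {
    "hdl:21.T99999/example-image": ["image-viewer"],
}

# Hypothetical local stand-ins for the services the DTR points to.
SERVICES: Dict[str, Callable[[bytes], str]] = {
    "image-viewer": lambda payload: f"displaying image ({len(payload)} bytes)",
}


def process_unknown(type_id: str, payload: bytes) -> str:
    """Follow the diagram: unknown type (1) -> ask DTR (2) -> process (3, 4)."""
    # (2) Look up which services are registered for this type.
    for service_name in DTR.get(type_id, []):
        handler = SERVICES.get(service_name)
        if handler is not None:
            # (3, 4) Hand the content to the first usable service.
            return handler(payload)
    raise LookupError(f"no usable service registered for {type_id}")


print(process_unknown("hdl:21.T99999/example-image", b"pixels"))
# displaying image (6 bytes)
```

Picking the first usable service keeps the sketch short; a real client might instead let the user choose among several registered services.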

When can we use this?

The first groups are building software that implements this DTR concept and will make the software available. The RDA PID Information Type (PIT) Working Group is already using the first DTR prototype in its API. The latest version of the DTR prototype is available here: http://typeregistry.org/.

We expect the software to become available for download around the end of 2014.

This simple model is the starting point for designing DTRs; the specifications are intended to be extended according to priorities and usage.

--
Full post: https://rd-alliance.org/group/data-type-registries-wg/outcomes/data-type-registries-dtr-output.html