Workshop

CAMP-4-DATA: Cyber-infrastructure & Metadata Protocols


Full-day Workshop: 6 September 2013 @ DC-2013 in Lisbon, Portugal

Sponsored by: Dublin Core-Science and Metadata Community (DC-SAM) (« http://wiki.dublincore.org/index.php/DCMI_Science_And_Metadata ») / Research Data Alliance (RDA) (« http://rd-alliance.org/ ») Metadata Interest Group.

Metadata is vital to the discovery and management of scientific data. The Dublin Core-Science and Metadata Community (DC-SAM), Research Data Alliance (RDA), and related communities advocate for access to, and shared knowledge about, metadata standards that support data life-cycle management. CAMP-4-DATA participants will explore infrastructure design, applications, and policies that can advance the support of open, collective and sustainable access to metadata standards used for managing scientific data.


Introduction

  • 9:00-9:30 - Welcome, workshop goals, logistics; participant introductions/Jane Greenberg/all « Presentation »

  • 9:30-9:45 - The Metadata Zoo/Rebecca Koskela « Presentation »

  • 9:45-10:00 - DCC Scheme Directory/Alex Ball « Presentation »

Infrastructure Models & Frameworks

  • 10:00-10:15 - A 3-Layer Model for Metadata/Keith Jeffery, Anne Asserson, Nikos Houssos and Brigitte Joerg « Presentation »

    Abstract: We present a 3-layer model for metadata of which the key component is CERIF in the middle, contextual, layer. CERIF forms the lowest, most detailed level of metadata information that is common across research objects such as datasets. Its richness of representation makes it a superset of many other metadata formats, allowing their congruent generation from CERIF. CERIF is used in 42 countries and is an EU Recommendation to Member States.
  • 10:15-10:30 - Cross-Domain Metadata Interoperability: Lessons Learnt in INSPIRE / Andrea Perego, Michael Lutz, Max Craglia and Silvia Dalla Costa « Presentation »

    Abstract: Since 2007, EU Member States have been involved in creating an infrastructure for spatial information in Europe (INSPIRE), based on a legal and technical interoperability framework. This paper presents some of the lessons learnt during the implementation of this infrastructure (which started in 2009) and during work on data and service interoperability coordinated with European and international initiatives. We describe a number of critical interoperability issues affecting both scientific and government data and metadata, and propose how these problems could be effectively addressed by a closer collaboration of the government and scientific communities, by taking advantage of their complementary competencies, and by influencing the development and adoption of standards.

Usage & Tracking

  • 10:30-10:45 - Usage data for metadata properties to support open data registries and semantic wikis/Muriel Foulonneau, Sébastien Martin, Jacques Ducloy, Thierry Daunois and Slim Turki

    Abstract: Metadata and ontology repositories are critical to ensure the discovery of existing vocabularies and the reuse of vocabularies and/or individual properties. However, these infrastructures should take into consideration the decision-making process and criteria for the selection of a vocabulary or of an individual concept or property. Usage data in particular are important and can provide reassurance about the maintenance of the vocabulary by a third party. These data are to some extent available through dedicated tools, such as semantic search engines. We illustrate the need for integrating usage data in the vocabulary infrastructures in order to support the reusability of vocabularies and therefore interoperability and data usability in science.
  • 10:45-11:00 - Provenance Central: More Mileage from Provenance Metadata/Bertram Ludaescher and Paolo Missier

    Abstract: We argue that to get the most value out of provenance it is critical to provide provenance integration and analysis capabilities. For the former, we are developing D-PROV, an extension of the W3C standard PROV that enriches the generic PROV model with important observables from scientific workflow systems and other provenance-enabled systems such as R. For the latter, we are developing PBase, a system prototype and associated language technologies to query and analyze provenance. PBase will be part of the DataONE data preservation infrastructure for Earth Science Observation (www.dataone.org). Our envisioned Provenance Central will be able to load and analyze provenance in order to connect data through its provenance with other datasets, workflows, and ontologies, but also with papers, scientific hypotheses, protocols, and users (i.e., authors and scientists). Discovering these connections requires analytical techniques that have not yet been applied to provenance. For example, since such provenance metadata will include, amongst other properties, data attribution information, we propose a novel type of analysis, which involves mining provenance through the entire repository, to elicit implicit social connections amongst the owners of the data. In summary, Provenance Central will be a new way of making data and social connections explicit, thus increasing data (re)usability in unprecedented ways.

PID (Persistent Identifiers)

  • 11:30-11:45 - Persistent Identifiers for Terms in a Crowd-Sourced Vocabulary/John Kunze, Greg Janee, Christopher Patton « Presentation »

    Abstract: Unique, persistent identifiers for vocabulary term concepts are critical for metadata (DC, SKOS, etc.). This comes as no surprise to followers of Linked Data, for whom this first principle of the semantic web is a sine qua non for automatic reasoning with web content. It is even more important to metadata users who need a precise way to reference a particular concept when the term may have more than one definition. Such is the case for the SeaIce Metadictionary, a crowd-sourced online dictionary of metadata terms in which multiple competing definitions are expected to be common and to co-exist indefinitely. Anyone can register and log in to create new terms, edit their own terms, and comment and vote on others' terms. Typical use will be that someone, without logging in, searches for terms and inserts those they find into metadata that they're creating to describe their own research. If unsatisfied with the terms that they found (or didn't find), they can log in and take action, which means anything from up- and down-voting terms and commenting on others' terms to adding and editing their own terms. Typical users will be research scientists trying to describe their datasets.
  • 11:45-12:00 - Separation of Concerns: PID Information Types and Domain Metadata/Tobias Weigel and Timothy Dilauro « Presentation »

    Abstract: We must define a pragmatic separation of concerns between metadata activities and the typed information associated with Persistent Identifiers. This distinction is important for ongoing debates within respective communities as well as in the RDA working groups. From a data archive's viewpoint, a useful metaphor is that of the "black box" or "envelope": Data management is increasingly done by machinery rather than human users. So the machinery must know what to do with the boxes that come in through various channels, but it cannot open them for various reasons. We propose that metadata is a concern that is—from this particular view of automated data management—located inside the black box. A metadata description may actually be a black box object that must be managed just like all the others. Still, some information must be written on the outside of the box to be interpreted by the machinery. This information may be a subset of metadata, but it may also contain additional information not interesting as domain metadata.

Applications

  • 12:00-12:15 - Ontology-Enabled Metadata Schema Generator: The Design Approach/Jian Qin, Xiaozhong Liu and Miao Chen « Presentation »

    Abstract: Metadata standards are important for normalizing descriptions of publications and research data and for information discovery and use. Large, complex metadata standards, however, can complicate the creation, sharing, and maintenance of metadata and incur high costs for metadata operations, especially in the domain of scientific data. One strategy to solve the problems of large, complex metadata standards is to break them into independent modules to allow for reuse of elements and maximal possibility of automation. To implement this strategy, we need a metadata infrastructure that contains elements, vocabularies, and other metadata artifacts and that is easy to use. This short paper describes the design approach to an ontology-enabled metadata schema generator as part of the metadata infrastructure.
  • 12:15-12:30 - Metadictionary: Advocating for a Community-driven Metadata Vocabulary Application/Jane Greenberg, Angela Murillo, John Kunze, Sarah Callahan, Robert Guralnick, Greg Janee, Nassib Nassar, Christopher Patton, and Karthik Ram « Presentation »

    Abstract: Metadata disorder and unnecessary costs are increasing due to the expanding population of scientific data schemes and standards. Metadata challenges are reviewed, and SeaIce, a community-driven metadata vocabulary application, is introduced as a potential solution. SeaIce functions and development challenges are presented. CAMP-4-DATA participants are called upon to experiment with the SeaIce application and actively participate in a discussion targeting noted metadata challenges.
  • 12:30-12:40 - CLEPSYDRA Data Aggregation and Enrichment Framework/Cezary Mazurek, Marcin Mielnicki, Aleksandra Nowak, Krzysztof Sielski, Maciej Stroinski, Marcin Werla and Jan Węglarz

  • 12:40-12:50 - RUresearch - Open Source Metadata Application Profile and Research Object Handling for Research Data/Grace Agnew and Mary Beth Weber

    Abstract: The Rutgers University Libraries have developed an open source workflow management system that includes a cataloging utility and a compound object handling system that enables the creation of metadata and intelligent object handling to fully support documenting and sharing research data. The cataloging system, which can be used independently and can work with any repository architecture, supports both MODS and Dublin Core metadata schemas. The MODS application profile includes an event-based subschema as a MODS extension schema that can capture any useful event in the lifecycle of the data, from data capture, to data analysis, to data editing, to data reuse. The application profile also includes elements for type of research, research methodology, type of data and type of subject, mapped to MODS and Dublin Core genre and subject elements. The data compound object supports documentation (lab notebooks, images, etc.) and instrumentation (data capture, data analysis, etc.). In addition to relating resources to each other using RDF, the resource handling also includes support for hierarchical file uploads, exactly as they are stored on the researcher’s computer or server. The metadata and object handling will be presented through examples from the RUcore (Rutgers Community repository) research data portal, RUresearch, http://rucore.libraries.rutgers.edu/research/
  • 12:50-13:00 - Open discussion, setting the afternoon agenda; brief remarks about the RDA 3rd Plenary/Sandra Collins

Breakout Groups

  • 14:30-14:40 - Overview of discussion topics « Draft Presentation »

  • 14:40-15:10 - Breakout groups, Session 1: Infrastructure and design, policy, human and social aspects.

  • 15:10-15:40 - Breakout groups, Session 2 (topic rotation from session 1).

  • 15:40-16:00 - Report back from breakout groups.

Breakout Groups/Workshop Wrap-up

  • 16:30-16:45 - Delegates propose/vote on 'special' topics.

  • 16:45-17:15 - Self-selected groups discuss a topic each.

  • 17:15-17:50 - Report back from each group; discussion of possible action points.

  • 17:50-18:00 - Closing remarks.


Participation

  • Participation is open to (1) workshop presenters and (2) general participants/viewers interested in attending CAMP-4-DATA. Workshop registration is required.

  • A call for participation for Short Papers, Tool/Technology Abstracts, and Position Statements is available at « http://dcevents.dublincore.org/IntConf/index/pages/view/camp-4-data-cfp » with a submission deadline of 28 June 2013.


Workshop Leadership—DC-SAM and RDA Representatives:

Jane Greenberg (Workshop PI & « Contact Person »)
Professor and Director, Metadata Research Center, School of Information and Library Science, University of North Carolina at Chapel Hill, USA
Alex Ball
Research Officer, Digital Curation Center, UK
Keith Jeffery
Keith G Jeffery Consultants, UK
Rebecca Koskela
Executive Director, DataONE DataNet, USA
Jian Qin
Associate Professor, School of Information Studies, Syracuse University, USA

Workshop Advisory Committee:

Sandra Collins
Digital Repository of Ireland
Elena Feinstein
Dryad Repository, USA
Brigitte Jörg
CERIF National Co-ordinator, CASRAI UK Analyst/Co-ordinator, Jisc Innovation Support Center, UKOLN, University of Bath, Bath, UK
Johannes Keizer
Food and Agriculture Organization, Italy
John Kunze
California Digital Library, USA
José Merlo
Professor, Universidad de Salamanca, Spain
Eloy Rodrigues
University of Minho Documentation Services, Portugal
Robin Rice
EDINA and Data Library, University of Edinburgh
Najla Rettberg
Göttingen State and University Library, Germany


