Program Abstracts


» Full Papers (Peer Reviewed)
» Project Reports (Peer Reviewed)
» Posters (Peer Reviewed)
» Best Practice Posters
» Best Practice Demonstrations

Full Papers (Peer Reviewed)

Hannah Tarver, Oksana Zavalina, Mark Phillips, Daniel Alemneh & Shadi Shakeri OCS: 235
TITLE: How Descriptive Metadata Changes in the UNT Libraries' Collections: A Case Study
ABSTRACT: This paper reports results of an exploratory quantitative analysis of metadata versioning in a large-scale digital library hosted by the University of North Texas. The study begins to bridge a gap in the information science research literature by addressing metadata change over time. The authors analyzed the entire population of 691,495 unique item-level metadata records in the digital library, with metadata records supplied by multiple institutions and by a number of metadata creators with varying levels of skill. We found that a high proportion of metadata records undergo changes, and that a substantial number of these changes result in increased completeness (the degree to which metadata records include at least one instance of each element required in the Dublin Core-based UNTL metadata scheme). Another observation of this study is that the access status of a high proportion of metadata records changes from hidden to public; at the same time the reverse also occurs, when metadata records previously visible to the public become hidden for further editing and sometimes remain hidden. This study also reveals that while most changes (presumably made to improve the quality of metadata records) increase record length, surprisingly, some changes decrease it. Further investigation is needed into the reasons for these unexpected findings, as well as into more granular dimensions of metadata change at the level of individual records, metadata elements, and data values. This paper suggests some research questions for future studies of metadata change in digital libraries that capture metadata versioning information.
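The completeness measure described in this abstract can be sketched in a few lines. The record structure and the required-element list below are illustrative stand-ins, not the actual UNTL scheme.

```python
# Sketch of the completeness measure described above: the share of
# required elements for which a record holds at least one value.
# The required-element list here is illustrative, not the real UNTL set.
REQUIRED = ["title", "description", "language", "resourceType", "collection"]

def completeness(record: dict) -> float:
    """Fraction of required elements with at least one non-empty instance."""
    present = sum(1 for el in REQUIRED if record.get(el))
    return present / len(REQUIRED)

# A hypothetical record before and after an edit that adds two elements.
before = {"title": ["Photo of campus"], "language": ["eng"]}
after = dict(before, description=["A 1948 photograph."], resourceType=["image_photo"])

print(completeness(before))  # 0.4
print(completeness(after))   # 0.8
```

Tracking this number across record versions is one way to quantify whether edits increase completeness, as the study reports.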
Sébastien Peyrard, John A. Kunze, Jean-Philippe Tramoni OCS: 241
TITLE: The ARK Identifier Scheme: Lessons Learnt at the BnF and Questions Yet Unanswered
ABSTRACT: The Bibliothèque nationale de France (BnF) looks back at lessons learnt over a decade of implementing persistent identifiers (ARKs). Those lessons give insight into what it means to maintain persistent identifiers over the medium term. A systematic gap analysis between what is and what should be, especially in a Semantic Web context, leads to open questions about best practices and standards compliance.
Sivakumar Kulasekaran, Jessica Trelogan, Maria Esteva & Michael Johnson OCS: 242
TITLE: Metadata Integration for an Archaeology Collection Architecture
ABSTRACT: Current trends in data collections are moving toward infrastructure services that are centralized, flexible, and involve diverse technologies across which multiple researchers conduct simultaneous, parallel workflows. During the lifecycle of a project, from the collection of raw data through study to publication, researchers remain active curators and decide how to present their data so others can access and reuse it. In this context, metadata is key to ensuring that data and results remain organized and secure, but building and maintaining metadata can be cumbersome, especially in the case of large and complex datasets. This paper presents our work to develop a complex collection architecture with metadata at its core, for a large and varied archaeological collection. We use metadata, mapped to Dublin Core, to tie the pieces of this architecture together and to manage data objects as they move through the research lifecycle over time and across technologies and changing methods. This metadata, extracted automatically where possible, also fulfills a fundamental preservation role in case any part of the architecture should fail.
Tsunagu Honma, Kei Tanaka, Mitsuharu Nagamori & Shigeo Sugimoto OCS: 249
TITLE: Extracting Description Set Profiles from RDF Datasets using Metadata Instances and SPARQL Queries
ABSTRACT: A variety of communities create and publish metadata as Linked Open Data (LOD). Users of those datasets find and use them for their own purposes and may combine datasets to add value. Each LOD dataset uses various vocabularies, structures and constraints for describing resources. In order to improve the usability of LOD datasets, it is very important for metadata designers to enhance the interoperability of their own metadata with that of other datasets. In order to create new interoperable metadata, metadata schema designers have to understand the application profiles of existing LOD datasets. Dublin Core Description Set Profiles (DSPs) are a component of DCMI Application Profiles. A DSP describes the structures and constraints of metadata in an application (e.g. resource classes, property cardinality, value schemes). Metadata schema registries, which collect and provide metadata schemas, have a large potential for helping metadata schema designers find, compare and adopt existing schemas. However, most LOD datasets are not published with their DSPs. As a result, metadata schema designers have to look at each dataset and guess its DSP. This paper proposes a method to extract the structural constraints of metadata records automatically from metadata instances using existing metadata schemas. The goal of this study is to reduce the cost of metadata schema extraction and to increase the number of metadata schemas registered in metadata schema registries. We have experimentally extracted constraints from LOD datasets using SPARQL. In order to evaluate our approach, we applied it to 10 datasets in the DataHub. By comparing the structural constraints extracted using our approach with those extracted manually, we found that our approach was able to extract more constraints.
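The kind of instance-driven constraint extraction this abstract describes can be illustrated with a small, self-contained sketch. The paper's approach issues SPARQL queries against live datasets; here a plain-Python scan over in-memory triples stands in for that, and all names and data are invented for illustration.

```python
# Illustrative sketch (not the authors' code): derive rough DSP-style
# structural constraints from metadata instances by scanning triples.
# In the paper this is done with SPARQL, e.g. roughly:
#   SELECT ?class ?p (COUNT(?o) AS ?uses)
#   WHERE { ?s a ?class ; ?p ?o } GROUP BY ?class ?p
from collections import defaultdict

RDF_TYPE = "rdf:type"

def extract_constraints(triples):
    """Map each class to the properties its instances use, with counts."""
    types = defaultdict(set)  # subject -> classes
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    usage = defaultdict(lambda: defaultdict(int))  # class -> property -> count
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for cls in types.get(s, ()):
            usage[cls][p] += 1
    return {c: dict(ps) for c, ps in usage.items()}

# Invented toy dataset with two book descriptions.
triples = [
    ("ex:b1", RDF_TYPE, "bibo:Book"),
    ("ex:b1", "dcterms:title", "Metadata Basics"),
    ("ex:b1", "dcterms:creator", "ex:a1"),
    ("ex:b2", RDF_TYPE, "bibo:Book"),
    ("ex:b2", "dcterms:title", "Linked Data"),
]
print(extract_constraints(triples))
# {'bibo:Book': {'dcterms:title': 2, 'dcterms:creator': 1}}
```

From such counts one can guess cardinalities (e.g. every book has a title, creator is optional), which is the essence of reconstructing a DSP from instances.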
Chunqiu Li & Shigeo Sugimoto OCS: 255
TITLE: Provenance Description of Metadata using PROV with PREMIS for Long-term Use of Metadata
ABSTRACT: Provenance description is necessary for long-term preservation of digital resources. PREMIS and OAIS, which are well-known standards designed for digital preservation, define descriptive elements for digital preservation. Metadata, which is data about a primary digital resource, has to be preserved along with the primary resource. However, due to changing technology and information contexts, metadata is at risk of damage or even loss. Thus, metadata preservation is as important as digital object preservation. Metadata provenance is a rather new research topic but is critical for keeping metadata about preserved resources consistent over time. This paper discusses provenance description in two aspects: provenance of digital objects and provenance of metadata, including metadata schemas. These are called digital provenance and metadata provenance, respectively. The goal of this paper is to clarify the concepts of digital provenance and metadata provenance based on some well-known standards (PREMIS, OAIS, PROV, and so forth), and to propose a novel model of provenance description for digital preservation based on the ontologies of PREMIS and PROV. The paper first explains digital provenance and metadata provenance. Next, we outline some major models and standards for provenance description. Then, this paper proposes to integrate PROV-O with the PREMIS OWL Ontology in order to merge the provenance description model in PROV-O and the digital preservation model in the PREMIS OWL Ontology. This paper also presents the merged model using some maintenance scenarios for digital objects and metadata. Lastly, we discuss metadata schema provenance, metadata object provenance and some other open issues.
Thomas Bosch & Kai Eckert OCS: 257
TITLE: Requirements on RDF Constraint Formulation and Validation
ABSTRACT: For many RDF applications, the formulation of constraints and the automatic validation of data according to these constraints is a much sought-after feature. In 2013, the W3C invited experts from industry, government and academia to the RDF Validation Workshop, where initial use cases were presented and discussed. In collaboration with the W3C, a working group on RDF Application Profiles (RDF-AP) is currently being established in the Dublin Core Metadata Initiative that follows up on this workshop and addresses, among other topics, RDF constraint formulation and validation. In this paper, we present a database of requirements obtained from various sources, including the use cases presented at the workshop as well as in the RDF-AP WG. The database, which is openly available and extensible, is used to evaluate and compare several existing approaches for constraint formulation and validation. We present a classification and analysis of the requirements, show that none of the approaches satisfies all of them, and aim to lay the ground for future work as well as to foster discussion of how to close existing gaps.
Xiaozhong Liu, Miao Chen & Jian Qin OCS: 262
TITLE: Interlinking Cross Language Metadata Using Heterogeneous Graphs and Wikipedia
ABSTRACT: Cross-language metadata are essential in helping users overcome language barriers in information discovery and recommendation. The construction of cross-language vocabularies, however, is usually costly and intellectually laborious. This paper addresses these problems by proposing a Cross-Language Metadata Network (CLMN) approach, which uses Wikipedia as the intermediary for cross-language metadata linking. We conducted an experiment with key metadata in two digital libraries and in two different languages without using machine translation. The experimental results are encouraging and suggest that the CLMN approach has the potential not only to interlink metadata in different languages with a reasonable rate of precision and quality but also to construct cross-language metadata vocabularies. Limitations and further research are also discussed.
Richard J. Urban OCS: 263
TITLE: The 1:1 Principle in the Age of Linked Data
ABSTRACT: This paper explores the origins of the Dublin Core 1:1 Principle within DCMI documentation. It finds that the need for the 1:1 Principle emerged from prior work within the cultural heritage community responsible for describing reproductions and surrogate resources within traditional cataloging environments. As the solutions to these problems encountered new ways to model semantic data, tensions arose within the DCMI community. This paper aims to fill the gaps in our understanding of the 1:1 Principle by outlining the conceptual foundations that led to its inclusion in DCMI documentation, how the Principle has been (mis)understood in practice, how violations of the Principle have been operationalized, and how the fundamental issues raised by the Principle continue to challenge us today. This discussion situates the 1:1 Principle within larger discussions about cataloging practice and semantic knowledge representation.
Konstantin Baierer, Evelyn Dröge, Vivien Petras & Violeta Trkulja OCS: 265
TITLE: Linked Data Mapping Cultures: An Evaluation of Metadata Usage and Distribution in a Linked Data Environment
ABSTRACT: In this paper we present an analysis of metadata mappings from different providers to a Linked Data format and model in the domain of digitized manuscripts. The DM2E model is based on Linked Open Data principles and was developed for the purpose of integrating metadata records to Europeana. The paper describes the differences between individual data providers and their respective metadata mapping cultures. Explanations on how the providers map the metadata from different institutions, different domains and different metadata formats are provided and supported by visualizations. The analysis of the mappings serves to evaluate the DM2E model and provides strategic insight for improving both mapping processes and the model itself.
Mark A. Matienzo & Amy Rudersdorf OCS: 267
TITLE: The Digital Public Library of America Ingestion Ecosystem: Lessons Learned After One Year of Large-Scale Collaborative Metadata Aggregation
ABSTRACT: The Digital Public Library of America (DPLA) aggregates metadata for cultural heritage materials from 20 direct partners, or Hubs, across the United States. While the initial build-out of our infrastructure used a lightweight ingestion system that was ultimately pushed into production, a year’s experience has allowed DPLA and its partners to identify limitations in that system, the quality and scalability of metadata remediation and enhancement possible, and areas for collaboration and leadership across the partnership. Although improved infrastructure is needed to support aggregation at this scale and complexity, ultimately DPLA needs to balance responsibilities across the partnership and establish a strong community that shares ownership of the aggregation process.
Thomas Bosch, Kai Eckert OCS: 270
TITLE: Towards Description Set Profiles for RDF using SPARQL as Intermediate Language
ABSTRACT: Description Set Profiles (DSPs) are used to formulate constraints on valid data within a Dublin Core Application Profile. For RDF, SPARQL is generally seen as the method of choice for validating data against certain constraints, although it is not ideal for formulating them. In contrast, DSPs are comparatively easy to understand, but lack an implementation to validate RDF data. In this paper, we use SPIN as the basic validation framework and present a general approach by which domain-specific constraint languages like DSP can be executed on RDF data using SPARQL as an intermediate language.
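The intermediate-language idea (compiling a declarative DSP constraint into a SPARQL query that a SPIN-style engine could execute) might look roughly like this sketch. The function name and constraint shape are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the intermediate-language idea: compile a
# DSP-style minimum-cardinality constraint into a SPARQL query that
# selects violating resources. Names and structure are illustrative.
def dsp_min_occurs_to_sparql(cls: str, prop: str, min_occurs: int) -> str:
    """Return a SPARQL query finding instances of `cls` that violate a
    minimum-occurrence constraint on `prop`."""
    return (
        "SELECT ?resource WHERE {\n"
        f"  ?resource a {cls} .\n"
        f"  OPTIONAL {{ ?resource {prop} ?value }}\n"
        "}\n"
        "GROUP BY ?resource\n"
        f"HAVING (COUNT(?value) < {min_occurs})"
    )

# E.g. "every bibo:Book must have at least one dcterms:title":
print(dsp_min_occurs_to_sparql("bibo:Book", "dcterms:title", 1))
```

An empty result set from the generated query would mean the data satisfies the constraint; each returned resource is a violation to report.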
Andias Wira Alam OCS: 271
TITLE: Dublin Core Metadata for Research Data–Lessons Learned in a Real-World Scenario with datorium
ABSTRACT: As a continuation of our work on the datorium project, we provide a service for autonomous documentation and upload of research data. In this paper, we discuss and share our experience developing such a service using Dublin Core metadata in a real-world scenario. Though small and simple, DC metadata is an appropriate standard to use as basic metadata, e.g. in the DSpace repository system. The elements required for describing research data are mostly complex, in particular the acquired information about the data such as survey methods, survey periods, or number of variables. DC metadata cannot cover all elements needed in a research data repository. With some extended elements and front-end manipulations, however, we show some approaches that make the application useful and cover complex descriptions without sacrificing the "simplicity" of DC metadata.

Project Reports (Peer Reviewed)

Andrew Weidner, Annie Wu & Santi Thompson OCS: 218
TITLE: Automated Enhancement of Controlled Vocabularies: Upgrading Legacy Metadata in CONTENTdm
ABSTRACT: To ensure robust, reliable, retrievable and sharable metadata, the University of Houston (UH) Libraries initiated a Metadata Upgrade Project in 2013 to systematically audit and refine the quality of the metadata in the University of Houston Digital Library (UHDL). Still in progress, the Metadata Upgrade project has already produced significant improvements in the UHDL's legacy metadata. The final phase of the Metadata Upgrade Project includes aligning controlled vocabulary terms with appropriate authorities and adding and revising descriptive content in the digital library. This is a time intensive process that requires careful evaluation and entry of name and subject authority terms. To improve efficiency and accuracy during the data entry process, the metadata librarian at UH Libraries developed name and subject authority applications that automatically transform legacy controlled vocabulary terms into authorized forms. This project report will provide an overview of the University of Houston's Metadata Upgrade Project, a discussion of how the UHDL’s upgraded metadata improves discoverability of our collections, and an in-depth look at the custom tools that automate the authority alignment process in the CONTENTdm Project Client.
Stefanie Rühle, Francesca Schulze & Michael Büchner OCS: 231
TITLE: Applying a Linked Data Compliant Model: The Usage of the Europeana Data Model by the Deutsche Digitale Bibliothek
ABSTRACT: In 2013/14 the Deutsche Digitale Bibliothek (DDB) switched its data model from the CIDOC Conceptual Reference Model to the Europeana Data Model (EDM). This decision was taken against the background of two major mandates the DDB has to fulfill: on the one hand, the DDB is a portal and a platform providing access to digital objects from German cultural heritage and research institutions; on the other hand, the DDB aims to become the German aggregator for Europeana. Using EDM as the internal DDB data model was judged the most reasonable way to meet these challenges. The DDB uses the model for all portal functions that require semantic links between metadata (search facets, hierarchies, links between authority files and digital objects). Applying EDM to the DDB portal raised some difficulties, since not all necessary classes and properties were fully implemented in Europeana-EDM at that time. Therefore, a DDB-EDM application profile was developed. The DDB publishes metadata under the CC0 Public Domain Dedication license in EDM-RDF/XML via an OAI-PMH interface to serve Europeana, and also via an Application Programming Interface (API) that lets external users develop new applications on the basis of metadata harmonized by the DDB.
Sharon Farnel & Ali Shiri OCS: 236
TITLE: Metadata for Research Data: Current Practices and Trends
ABSTRACT: Currently, there are a number of research data service providers that allow deposit of research data or gather metadata for research data housed elsewhere. Examples include DataCite (http://www.datacite.org/), Dataverse Network (http://thedata.org/), Dryad (http://datadryad.org/), and FigShare (http://figshare.com/). These services make use of a broad range of metadata practices and elements. The objective of this study is to examine the metadata standards and formats used by a select number of research data services to address the following specific research questions: 1) What is the number and nature of metadata elements available? 2) Do any of the services provide research-data-specific metadata elements in addition to common metadata elements? 3) Do the research data management services adhere to widely recognized metadata, interoperability and preservation standards? 4) Which research data repositories benefit from and promote controlled vocabularies for subject description and access? 5) Is there support for unique identifiers (e.g., DOIs)? 6) What kind of metadata assistance (documentation, etc.) is provided? 7) Which metadata elements are common and which differ across these services? The results of this study will contribute to a better understanding of the development and application of metadata in research data services as well as to the development of an interoperable research data environment.
Jing Wan, Yubin Zhou, Gang Chen & Junkai Yi OCS: 247
TITLE: Designing a Multi-level Metadata Standard based on Dublin Core for Museum Data
ABSTRACT: Metadata is a critical aspect of describing, managing and sharing museum data. It is challenging to develop a general standard that meets the requirements of different museums due to the large range of data types. Both the capability for concise description and simplicity of use need to be considered. In this paper, we report on a completed project that aimed to design metadata for museums in China. An extensible metadata standard based on Dublin Core is presented, which includes a core metadata set, extension rules and specific metadata sets. For the core metadata, we introduce the terms, definitions, registration rules and detailed examples of description. The principles for choosing terms and refinements are discussed. A specific metadata set for porcelain is discussed as an extension example.
Deborah Maron, Cliff Missen & Jane Greenberg OCS: 259
TITLE: "Lo-Fi to Hi-Fi": A new metadata approach in the Third World with the eGranary Digital Library
ABSTRACT: Digital information can bridge age-old gaps in access to information in traditionally underserved areas of the world. However, for those unfamiliar with abundant e-resources, early exposure to the digital world can be like "drinking from a fire hose." For these audiences, abundant metadata and findability, along with easy-to-use interfaces, are key to early success and adoption. To hasten the creation of metadata and user interfaces, the authors are experimenting with "crowd cataloging." This report documents their work and Maron's Lo-Fi to Hi-Fi metadata pyramid model, which guides a developing metadata initiative being pursued with the eGranary Digital Library, the technology used by WiderNet in a global effort to ameliorate information poverty. The Lo-Fi to Hi-Fi model, with principles adapted from technical design processes, aligns with research showing that community-based librarians are better poised to identify culturally congruent resources, though many require significant training in metadata concepts and skills. The model has students crowdsource "lo-fi" terms, which domain experts and information professionals can curate and cull into "hi-fi" terms to enhance findability of resources within the eGranary, while simultaneously honing their own computer, information and metadata literacies. Though the focus here is on Africa, the findings and practices, if successful, can be generalized to eGranaries around the globe.
Jeff Keith Mixter, Patrick O'Brien & Kenning Arlitsch OCS: 269
TITLE: Describing Theses and Dissertations Using Schema.org
ABSTRACT: This report discusses the development of an extension vocabulary for describing theses and dissertations, using Schema.org as a foundation. Instance data from the Montana State University ScholarWorks institutional repository was used to help drive and test the creation of the extension vocabulary. Once the vocabulary was developed, we used it to convert the entire ScholarWorks data sample into RDF. We then serialized a set of three RDF descriptions as RDFa and posted them online to gather statistics from Google Webmaster Tools. The study successfully demonstrated how a data model consisting primarily of Schema.org terms, supplemented with a list of granular, domain-specific terms, can be used to describe theses and dissertations in detail.
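The report serialized its descriptions as RDFa; purely to illustrate the modeling idea, the same mix of Schema.org terms and extension terms can be rendered as JSON-LD. The extension namespace and term names below are hypothetical, not the vocabulary the authors published.

```python
# Illustrative only: a thesis description combining Schema.org terms
# with hypothetical extension terms for degree-specific detail.
import json

thesis = {
    "@context": {
        "@vocab": "http://schema.org/",
        "ext": "http://example.org/etd-extension#",  # hypothetical namespace
    },
    "@type": "CreativeWork",
    "name": "Snowpack modeling in alpine catchments",
    "author": {"@type": "Person", "name": "J. Doe"},
    "datePublished": "2013",
    # Extension terms cover what plain Schema.org (at the time) lacked:
    "ext:degreeGrantor": "Montana State University",
    "ext:thesisType": "Masters Thesis",
}
print(json.dumps(thesis, indent=2))
```

The general pattern, using Schema.org for broadly understood properties and a small extension namespace for domain-specific ones, is what the report demonstrates for electronic theses and dissertations.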

Posters (Peer Reviewed)

Jamie Viva Wittenberg OCS: 228
TITLE: Retaining Metadata in Remixed Cultural Heritage Objects
ABSTRACT: Increasingly, cultural heritage institutions have been working to incorporate features into their collections and websites that empower users to take ownership of cultural narratives. Annotation tools, crowd-sourced tagging, and greater social media presences are characteristic of libraries, archives, and museums. Some institutions have begun to offer digital content to patrons that they are encouraged to remix. Endorsement of remixing as a way of engaging with cultural heritage material requires a metadata infrastructure that can support description of remixed content in a way that is comprehensive, interoperable, and scalable. The movement towards enabling remixes of cultural heritage materials threatens existing metadata models because it requires systemic change in the granularity of descriptive metadata and in metadata creation workflows. This poster analyzes the viability of employing the existing MODS and Dublin Core standards to create descriptive metadata for remixed content.
Ana Cox OCS: 229
TITLE: Embedded Metadata–A Tool for Digital Excavation
ABSTRACT: In June of 2012, I began the weighty task of searching the far reaches of Phoenix Art Museum's digital storage spaces to import images into a recently acquired collection management system, The Museum System (TMS). I excavated long-forgotten folders on various servers and desktops, hunting for visual documentation of the art collection and past installations. I began using embedded metadata as a tool to identify images of art objects and to indicate which folders and files I had searched. Then I reorganized these assets in a digital asset management system (DAMS) with a new file and folder structure. A custom XML panel, developed by the Visual Resources Association Embedded Metadata Working Group, provided a pre-established controlled vocabulary that adheres to Dublin Core and VRA Core guidelines. This tool, combined with features offered by Adobe Bridge such as batch metadata editing and file renaming, greatly improved my workflow and the quality of my data. After my initial five-month survey, I was able to import about 10,000 files into TMS, a 280% increase over the files imported into the previous collection management system. This poster will discuss my method for using embedded metadata to track information about digital assets, as well as challenges and opportunities for further development. This method could be implemented by other cultural organizations as a low-cost approach to tracking basic metadata, content creators and copyright restrictions.
Cleverton Ferreira Borba & Pedro Luiz P. Correa OCS: 232
TITLE: Dublin Core for Species Distribution Modeling
ABSTRACT: This poster presents the use of Dublin Core in tools that perform species distribution modeling. As a case study, it proposes using Dublin Core to connect the models generated by species distribution modeling tools, contributing to the field of biodiversity informatics.
Jody Perkins & Quinn Dombrowski OCS: 245
TITLE: Building Bridges to the Future of a Distributed Network: From DiRT Categories to TaDiRAH, a Methods Taxonomy for Digital Humanities
ABSTRACT: Efforts to establish centralized hubs of information relevant to digital humanities (DH) have proven unsustainable over the long term. Comprehensive hubs are currently being redesigned with a smaller scope and focused curation. However, this smaller scope comes with the risk of decontextualization: a digital humanities project is best understood through the intersection of its subject matter, methodologies and applications, not all of which are captured by any single site. This poster will trace the development and application of 'TaDiRAH', a shared taxonomy of digital humanities research activities and objects, created for the purpose of bridging the divide between related digital humanities hubs.
Adrian T. Ogletree OCS: 250
TITLE: Research Data Reproducibility Through Shared Metadata Workflows: A Survey of DataNet Federation Consortium Collaborators
ABSTRACT: This poster presents research results from a recent survey studying metadata workflows. The survey was distributed to DataNet Federation Consortium researchers and collaborators asking participants about their organizational practices involving metadata creation. Data management best practices recommend that data documentation happens at the very beginning of the research project, before data collection. However, these results indicate that more scientific metadata is created during or after the data collection process than before, and that few researchers take advantage of automated metadata generation workflows. Data curators, librarians, and archivists (or their automated systems) can assist researchers by intervening earlier in the data life cycle in order to produce higher-quality metadata and ensure long-term preservation.
Ying Feng & Long Xiao OCS: 252
TITLE: A Cooperative Project by Libraries and Museums of China: Metadata Standards for the Digital Preservation of Cultural Heritage
ABSTRACT: This poster introduces a project that aims to build metadata standards for the digital preservation of cultural heritage and is planned to begin this year. Research and demonstration will be carried out collaboratively by seven libraries and museums. The objectives focus on the demands of business management, digitization, management of digital content, long-term preservation of digital content, and the establishment of a knowledge database for cultural heritage.
Naomi Eichenlaub, Marina Morgan & Ingrid Masak-Mida OCS: 256
TITLE: Undressing Fashion Metadata: Ryerson University Fashion Research Collection
ABSTRACT: The purpose of this poster is to provide insight into the processes involved in making a unique fashion research and teaching collection discoverable in an online environment at Ryerson University. The poster will highlight effective metadata standards and elements, cross-domain metadata uses, and metadata mapping and implementation. The key goals of this digital collection are to promote research, teaching and learning at Ryerson University, and to connect with a broader community by building scholarly online exhibitions. Once the digital collection is finalized, it will be used as a pedagogical tool and will inspire fashion students and scholars to undertake research into Canada's fashion history, since the collection includes many examples from Canadian designers and retailers in addition to international designers.

Best Practice Posters

Jason Thomale & William Hicks OCS: 272
TITLE: A Library Catalog REST API Framework
ABSTRACT: Many library catalogs and systems remain isolated in 2014. Although we have made significant strides over the past decade to open our metadata, many individual libraries rely heavily on the ILS vendors to implement open protocols, standards, and APIs. At the University of North Texas Libraries, we have been developing a REST API framework for exposing our catalog and ILS metadata, taking our first steps toward breaking away from this limited, paternalistic model. Catalog resources that we've modeled so far include bibliographic records (modified from MARC), item-level records, branch location records, item type records, and item status records. We are also working on resources that support a shelf-list browser application, which mix user-supplied data with item and bibliographic metadata and demonstrate a real-world use for the API. But our framework is not merely an API for our particular ILS. Rather, we are developing a toolset that allows us to extract and re-model our ILS data—to use data derived from our ILS without necessarily adhering to ILS data models—and expose the data as RESTful, linked resources. Although our initial efforts have focused on modeling resources that closely align with ILS entities, future development will include extended models for work- and identity-related resources and possibly extending our APIs to expose linked data (using, e.g., JSON-LD). Best practices in this area, exposing ILS metadata as RESTful resources, are hard to come by. Given the mixture of metadata practitioners and systems- and web-oriented individuals that the DCMI conferences attract, we hope that presenting a poster about the project in the Best Practices track might allow us to connect with new dialog partners. Ultimately, we believe an exchange of information about our project so far would be valuable to us and to others in the DCMI community.
Susan Matveyeva & Lizzy Anne Walker OCS: 275
TITLE: Building the Bridge: Collaboration between Technical Services and Special Collections
ABSTRACT: The poster describes the process and results of the work of a group of librarians from Technical Services and Special Collections on the development of metadata standards and practices for digitization. At Wichita State University Ablah Library, members of Technical Services and Special Collections were assigned a mass digitization project of Special Collections holdings. The departments collaborated to increase the visibility and accessibility of Special Collections and to digitally preserve their brittle rare materials. Both departments scanned collections, added metadata to the scanned images, and uploaded them to CONTENTdm. The departments faced challenges in regard to the mass digitization, such as a lack of common standards, inconsistent metadata, and limited CONTENTdm expertise. Additionally, there had not been a dedicated metadata cataloger on staff in Special Collections. Staff from both departments created a metadata group responsible for decision making in regard to the metadata fields used for manuscripts and printed books. Investigation of standards and best practices, creation of data dictionaries, and mapping templates were only a few of the topics this subcommittee focused on. The group developed two metadata templates (minimal and core) for published and unpublished materials. The templates focused on access to collections, future migration, and preservation. Both departments agreed on common standards for rare books and manuscripts cataloging and used the same best practices for sharable metadata. This has been a positive learning experience for both departments. Bringing together the expertise of catalogers and the uniqueness of Special Collections has helped both departments become less isolated. The implementation of metadata and cataloging standards creates a layer of interoperability and increases the potential of users finding what they need. Additionally, the departments have a new working relationship that will hopefully continue in the future.
Jason W. Dean & Deborah E. Kulczak OCS: 279
TITLE: Best Practices for Complex Diacritics Handling in CONTENTdm
ABSTRACT: This poster is based upon a recently completed project at the University of Arkansas Libraries that dealt with metadata and items in a plethora of languages, from English and French to Quapaw, many of which required the use of unusual diacritical marks. Such diacritics and special characters are ubiquitous not only in cultural resources associated with the humanities, but also in scientific and technical materials, and their correct rendering is often necessary for meaning. The poster will describe best practices for generating, converting, and ingesting diacritics into CONTENTdm Digital Collection Management Software (used by more than 2,000 institutions worldwide), whether for metadata in a tab-delimited file, an accompanying text or translation document, or a controlled vocabulary list. Best practices for encoding and diacritics are confusing at best as described in the CONTENTdm support documentation, and this poster aims to fill this knowledge gap. Specific software to be discussed includes Excel, OpenOffice Calc, and Notepad++.
Carolyn Hansen & Sean Crowe OCS: 281
TITLE: Making Vendor-Generated Metadata Work for Archival Collections Using VRA and Python
ABSTRACT: The purpose of this poster is to illustrate a successful workflow for improving vendor-generated metadata for a large digital collection of archival materials by converting the metadata from the Dublin Core standard to the VRA standard using the scripting language Python.
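The abstract does not detail the conversion itself, but the general idea can be sketched with Python's standard library. The element crosswalk and the sample record below are illustrative assumptions, not the project's actual mapping:

```python
# Hypothetical sketch of a Dublin Core -> VRA conversion step.
# The CROSSWALK mapping is an assumption for illustration only.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
VRA = "http://www.vraweb.org/vracore4.htm"

# Illustrative crosswalk from DC element names to VRA element names.
CROSSWALK = {"title": "title", "creator": "agent", "date": "date", "format": "material"}

def dc_to_vra(dc_xml: str) -> ET.Element:
    """Convert a flat Dublin Core record into a flat VRA 'work' record."""
    record = ET.fromstring(dc_xml)
    work = ET.Element("{%s}work" % VRA)
    for child in record:
        local = child.tag.split("}")[-1]   # strip the namespace prefix
        vra_name = CROSSWALK.get(local)
        if vra_name is None:               # skip elements with no mapping
            continue
        ET.SubElement(work, "{%s}%s" % (VRA, vra_name)).text = child.text
    return work

sample = (
    '<record xmlns:dc="%s">'
    '<dc:title>Untitled Photograph</dc:title>'
    '<dc:creator>Unknown</dc:creator>'
    '</record>' % DC
)
vra = dc_to_vra(sample)
print([e.tag.split("}")[-1] for e in vra])  # -> ['title', 'agent']
```

A production workflow would target full VRA Core 4 records (work/image sets with attribute refinements) rather than this flat element-for-element mapping.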
Julie Fukuyama & Akiko Hashizume OCS: 282
TITLE: The NDL Great East Japan Earthquake Archive: Features of Metadata Schema
ABSTRACT: The National Diet Library (NDL), Japan, in conjunction with numerous other organizations, has developed the Great East Japan Earthquake Archive Project for the collection, preservation, and provision of information related to the earthquake that struck Japan on March 11, 2011. A portal site for this project was developed by the NDL and opened to the public in March 2013. The portal site enables integrated searches of many resources on the earthquake and subsequent disasters, including images, video, websites, reports, and books, produced by institutions such as mass media companies, universities, and academic societies. The poster presents the Great East Japan Earthquake Archive Metadata Schema (NDLKN)[1] developed for this portal. This schema is based on the National Diet Library Dublin Core Metadata Description (DC-NDL), our own metadata schema based on the DCMES and DCMI Metadata Terms. There were two major issues to solve in the development of NDLKN. The first was coordination of metadata in various systems over multiple domains. The second was to satisfy requirements for archiving disaster records, which need to have geographic and temporal information. As a solution to the latter issue, for example, some terms were adopted from the Basic Geo (WGS84 lat/long) Vocabulary ([geo:lat] for latitude, [geo:long] for longitude, etc.) and the Ontology for vCard ([v:region] for the prefecture, [v:locality] for the city, etc.) for geospatial information, and [dcterms:created] for the date the image or video was recorded. Furthermore, we described the name and URI of disasters in [dcterms:coverage].
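As a rough illustration (not the NDL's actual implementation), the geospatial and temporal terms mentioned above can be serialized as RDF/XML with Python's standard library; the resource URI and coordinate values below are invented examples:

```python
# Illustrative sketch: serializing geo:lat, geo:long, v:region, v:locality,
# and dcterms:created as RDF/XML. The photo URI and values are made up.
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "v": "http://www.w3.org/2006/vcard/ns#",
    "dcterms": "http://purl.org/dc/terms/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

def describe_resource(uri, lat, lon, region, locality, created):
    """Build an rdf:Description carrying the NDLKN-style terms."""
    root = ET.Element("{%s}RDF" % NS["rdf"])
    desc = ET.SubElement(root, "{%s}Description" % NS["rdf"],
                         {"{%s}about" % NS["rdf"]: uri})
    for ns, name, value in [
        ("geo", "lat", lat), ("geo", "long", lon),
        ("v", "region", region), ("v", "locality", locality),
        ("dcterms", "created", created),
    ]:
        ET.SubElement(desc, "{%s}%s" % (NS[ns], name)).text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical record for a photo taken in Sendai the day after the quake.
doc = describe_resource("http://example.org/photo/1",
                        "38.2682", "140.8694", "Miyagi", "Sendai", "2011-03-12")
```

In practice the NDL publishes such metadata through its own systems; this only shows how the cited vocabularies fit together in one description.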
Ashleigh N. Faith, Eugene Tseytlin & Tanja Bekhuis OCS: 285
TITLE: Development of the EDDA Study Design Terminology to Enhance Retrieval of Clinical and Bibliographic Records in Dispersed Repositories
ABSTRACT: Medical terminology varies across disciplines and reflects linguistic differences in communities of clinicians, researchers, and indexers. Inconsistency of terms for the same concepts impedes interoperable metadata and retrieval of information artifacts, such as records of clinical reports and scientific articles that reside in various repositories. To facilitate information retrieval and, more recently, data sharing, the medical community maintains an assortment of terminologies, thesauri, and ontologies. Valuable resources include the US National Library of Medicine Medical Subject Headings (MeSH), the Elsevier Life Science thesaurus (Emtree), and the National Cancer Institute Thesaurus (NCIt). It is increasingly important to identify medical investigations by their design features, as these have implications for evidence regarding research questions. Recently, Bekhuis et al. (2013) found that coverage of study designs was poor in MeSH and Emtree. Based on this work, the EDDA Group at the University of Pittsburgh is developing a terminology of designs. In addition to randomized controlled trials, it covers observational and uncontrolled designs. Among the resources analyzed thus far, inconsistent entry points, semantic labels, synonyms, and definitions are common. The EDDA Study Design Terminology is freely available in the NCBO BioPortal (http://purl.bioontology.org/ontology/EDDA). The current version has 169 classes. Some of the preferred terms have several variants; definitions (sometimes competing) labeled by source (MeSH, Emtree, NCIt) and year; and IDs, such as concept identifiers, useful to other researchers. The beta version was developed using the Protégé ontology editor v.4.3 (http://protege.stanford.edu) and distributed as an OWL file. DCMI protocols are in place for recording term metadata and OWL annotations.
Further development entails adding definitions from other sources, mapping relationships among terms, and integrating terms from existing vocabularies, particularly the Information Artifact Ontology. A primary goal is to improve identification and retrieval of electronic records describing studies in dispersed data warehouses or electronic repositories.
Emily Porter OCS: 288
TITLE: Normalizing Decentralized Metadata Practices Using Business Process Improvement Methodology: a Data-Informed Approach to Identifying Institutional Core Metadata
ABSTRACT: The Emory University Libraries and Emory Center for Digital Scholarship have developed numerous digital collections over the past decade. Accompanying metadata originates via multiple business units, authoring tools, and schemas, and is delivered to varied destination platforms. Seeking a more uniform metadata strategy, the Libraries' Metadata Working Group initiated a project in 2014 to define a set of core, discovery-focused, schema-agnostic metadata elements supporting local content types. Quantitative and qualitative techniques commonly used in the field of Business Process Improvement were utilized to mitigate complex organizational factors. A key research deliverable emerged from benchmarking: a structured comparison of over 30 element sets, recording for each standard its descriptive element names, their required-ness, and general semantic concepts. Additional structured data collection methodologies included a diagnostic task activity, in which participants with varying metadata expertise created (simple) Dublin Core records for selected digital content. A survey of stakeholders provided greater context for local practices. Multiple public-facing discovery system interfaces were inventoried to log search, browse, filter, and sort options, and available web analytics were reviewed for user activity patterns correlating to these options. Thematic analysis was performed on all benchmarking, system profile, and web analytics data to map the results to a common set of conceptual themes, facilitating quantification and analysis. A weighted scoring model enabled the ranking of the elements' themes: the highest-scoring concepts were then explicated as an initial set of core elements, mapped to relevant standards and schemas.
Sean Petiya OCS: 290
TITLE: Converting Personal Comic Book Collections to Linked Data
ABSTRACT: The comic book domain has received a great deal of attention in recent years as superhero movies dominate popular culture, and both graphic novels and manga continue to find their way to library shelves and special collections. This poster describes progress on the Comic Book Ontology (CBO), a metadata vocabulary in development for the description of comic books and comic book collections. It presents a diagram of the model and outlines the methodology and rationale for producing a core application profile composed of a subset of elements from the vocabulary, which represent the minimal commitment necessary to make a unique statement about a resource. Additionally, it illustrates how that core application profile is used to generate XML/RDF records from user data contained in spreadsheets, a popular method of cataloging personal comic book collections.
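The spreadsheet-to-record step described above can be sketched as follows; the column names and the placeholder namespace URI are assumptions for illustration, not the actual CBO terms:

```python
# Minimal sketch: turning rows of a personal collection spreadsheet (CSV)
# into one XML record per row. Columns and namespace are hypothetical.
import csv
import io
import xml.etree.ElementTree as ET

CBO = "http://example.org/cbo#"  # placeholder, not the real CBO namespace

def rows_to_records(csv_text):
    """Emit one ComicBook element per spreadsheet row."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec = ET.Element("{%s}ComicBook" % CBO)
        for column, value in row.items():
            if value:  # only assert what the cataloger actually recorded
                ET.SubElement(rec, "{%s}%s" % (CBO, column)).text = value
        records.append(rec)
    return records

sheet = "title,issue,publisher\nWatchmen,1,DC Comics\n"
recs = rows_to_records(sheet)
```

A real pipeline would validate each row against the core application profile (the minimal commitment the poster describes) before serializing to RDF/XML.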
Robert H. Estep OCS: 292
TITLE: How To Build A Local Thesaurus
ABSTRACT: A step-by-step approach to building a thesaurus of subject terms, both LC and local, for a specific digitization project. The thesaurus was the responsibility of the Cataloging group, which provided enhanced metadata for a large and ongoing collection of images in the form of individual subject terms and detailed descriptions.
Constanze Curdt & Dirk Hoffmeister OCS: 294
TITLE: The TR32DB Metadata Schema: A multi-level Metadata Schema for an Interdisciplinary Project Database
ABSTRACT: This poster presents the self-developed, multi-level TR32DB Metadata Schema. It was designed and implemented to describe all of the heterogeneous data created by participants of an interdisciplinary research project with accurate, interoperable metadata. The schema takes into account interoperability with current metadata standards and schemas. It is applied in the CRC/TR32 project database (TR32DB, www.tr32db.de), a research data management system, to improve the documentation, searchability, and re-use of the data. The TR32DB is established for a multidisciplinary, long-term research project, the Collaborative Research Centre/Transregio 32 ‘Patterns in Soil-Vegetation-Atmosphere Systems: Monitoring, Modelling, and Data Assimilation’ (CRC/TR32, www.tr32.de), funded by the German Research Foundation.
Michael Dulock OCS: 295
TITLE: Reusing Legacy Metadata for Digital Projects: the Colorado Coal Project Collection
ABSTRACT: Libraries and other cultural institutions are increasingly focused on efforts to unearth hidden and unique collections. Yet the metadata describing these collections, when it exists, may not be in an immediately usable format. In some cases the metadata records may be as exceptional as the materials themselves. In this poster I discuss my research into how libraries can repurpose metadata in archaic formats, using the Colorado Coal Project Collection slides as a case study. The Colorado Coal Project Collection documents the history of coal mining in the western United States, primarily focusing on the early 20th century. The collection comprises ninety video and audio files of interviews with coal miners, community members, and historians; transcripts for most of the interviews; and over four thousand slides depicting life in and around the mines. The collection touches on themes ranging from mine camp life to immigration to labor conditions and strikes.

The slides are accompanied by over four thousand McBee edge-notched cards, a manual computing format that saw occasional use for medical, legal, and library records in the mid-20th century. These cards contain written notes as well as punches around the edges that indicate various features of the slides, such as buildings, locations, dates, subject matter, and technical details. Transferring this rich metadata from thousands of cards into a format with which the digital initiatives team could work, and eventually import into a digital library collection, was a challenge. The poster will examine the process of transferring the robust metadata recorded on these arcane cards to a 21st-century digital library collection, utilizing a combination of student labor, Metadata Services staff, MS Excel, and careful quality control.
Virginia A. Dressler OCS: 296
TITLE: Applying Concepts of Linked Data to Local Digital Collections to Enhance Access and Searchability
ABSTRACT: Kent State University Library is currently preparing to move its online exhibits and digital collections to a different content management system. The plan entails migrating existing digital collections to another platform and, in doing so, providing a more inclusive search mechanism to enhance access. To prepare for this migration, we are mapping the existing digital collections into a new metadata schema for the proposed solution, moving from a locally created and hosted framework into a more sustainable platform with a consolidated, searchable base for all digital objects and corresponding metadata. This work includes transferring the current tailored, in-house method of operation and transposing the MySQL data to an RDF structure for the new solution. Principles of Linked Data will also be applied to the accompanying metadata files to further increase connections within the digital collections. The biggest change resulting from this shift from a homegrown solution to an extensible, open-access platform is the capability of searching across multiple collections. Cross-collection searching is not possible in the current interface, and there are related materials among several existing digital collections that would benefit from this change. The poster will address the shift in approach to this new framework and highlight the benefits of the switch.
Michael Lauruhn & Elshaimaa Ali OCS: 298
TITLE: Wikipedia-based extraction of lightweight ontologies for concept level annotation
ABSTRACT: This poster describes a project under development in which we propose a framework for automating the development of lightweight ontologies for semantic annotations. When building ontologies for annotations in any domain, we follow the ontology learning process of Stelios (2006), but since we are building a lightweight ontology, we consider only a subset of its tasks: the acquisition of domain terminologies, the generation of concept hierarchies, the learning of relations and properties, and ontology evaluation.

When developing the framework modules, we rely for most of our knowledge base on the structure of Wikipedia: the category and link structure of Wikipedia pages, in addition to specific sections of the content. To ensure machine understandability and interoperability, ontologies have to be explicit to make an annotation publicly accessible, formal to make an annotation publicly agreeable, and unambiguous to make an annotation publicly identifiable. An important aspect of building the domain ontology is to define an annotation schema that allows the developed ontologies to be reused and to become part of linked data. We designed our schema based on annotation elements already defined in the Dublin Core standards, and we also used the DBpedia schema for defining annotation elements for named entities. We developed additional annotation elements to define domain concepts and context, as well as relations between concepts. These annotation elements are based on the link structure of Wikipedia, the concepts' definitions in their Wikipedia pages, and the category structure of the concepts. The framework modules include: domain concept extraction, semantic relatedness measures, concept clustering, and Wikipedia-based relation extraction.
Timothy Cole, Michael Norman, Patricia Lampron, William Weathers, Ayla Stein, M. Janina Sarol & Myung-Ja Han OCS: 299
TITLE: MARC to schema.org: Providing Better Access to UIUC Library Holdings Data
ABSTRACT: The University of Illinois at Urbana-Champaign (UIUC) Library has shared 5.5 million bibliographic catalog records. As released, these include detailed information about physical holdings at Illinois, allowing consumers to know exactly which volumes or parts of the described creative work are available at UIUC. UIUC catalog records are (or soon will be) available as MARCXML, as MODS enriched with links to name and subject authorities, and as RDF (using schema.org semantics). This poster reports on the development of workflows for this project, on the multiple views of the catalog being made available, and on the lessons learned to date.
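As a hedged sketch of the MARC-to-schema.org idea (the UIUC workflow itself is not detailed in the abstract), a few common MARC fields can be mapped to a schema.org Book description in JSON-LD; the field selection and flat record shape are illustrative assumptions:

```python
# Illustrative MARC -> schema.org mapping: 245 (title), 100 (main entry
# personal name), and 020 (ISBN) to schema.org Book properties in JSON-LD.
import json

def marc_to_schema_org(fields):
    """fields: simplified dict of MARC tag -> value, e.g. {'245': 'Moby-Dick'}."""
    doc = {"@context": "http://schema.org", "@type": "Book"}
    mapping = {"245": "name", "100": "author", "020": "isbn"}
    for tag, prop in mapping.items():
        if tag in fields:
            doc[prop] = fields[tag]
    return json.dumps(doc)

# Hypothetical simplified record (real MARC has indicators and subfields).
record = {"245": "Moby-Dick", "100": "Melville, Herman"}
jsonld = marc_to_schema_org(record)
```

A real conversion parses subfields and indicators and links authority URIs; this only shows the shape of the target schema.org description.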
Ann Ellis OCS: 280
TITLE: Designing an Archaeology Database: Mapping Field Notes to Archival Metadata
ABSTRACT: The Stephen F. Austin State University Center for Digital Scholarship and Center for Regional Heritage Research engaged in a collaborative project to design and implement a database collection in a digital archive that would accommodate images, data and text related to archaeological artifacts located in East Texas. There were challenges in creating metadata profiles that could effectively manage, retrieve and display the disparate data in multiple discovery platforms.

The poster illustrates the steps that were taken to map field notes into useful archival metadata. Using original notes and field record information, a preliminary data dictionary was created. After collaborative edits and revisions were made, a comprehensive data dictionary was designed to represent the materials in the collection. From this, a profile was configured in the digital archive platform to allow for upload of the metadata and images, and for discovery and display of the archaeological artifacts and related works.
Lisa Federer OCS: 304
TITLE: Utilizing Drupal for the Implementation of a Dublin Core-Based Data Catalog
ABSTRACT: As funders and publishers increasingly require data sharing, researchers will need simple, intuitive methods for describing their data. Open-source systems like Drupal and extensible metadata schema like Dublin Core will likely play a large role in data description, thus making data more discoverable and facilitating data re-use. The objective of this project is to create a data catalog suitable for use within the context of biomedical and health sciences research within the National Institutes of Health (NIH) Library. The NIH Library serves the community of NIH intramural researchers, which includes over 1,200 principal investigators and 4,000 postdoctoral fellows conducting basic, translational, and clinical research on its primary campus in Bethesda, MD, and several satellite campuses. The ideal catalog would allow researchers to easily describe their data using Dublin Core Metadata Terms and subject-appropriate controlled vocabularies, as well as provide search and browse capabilities for end users to enable data discovery and facilitate re-use.

A pilot system is currently undergoing testing with researchers within the NIH intramural community. Drupal, a free and open-source content management system, was utilized as a framework for a data catalog using the Dublin Core Metadata Terms. Using the Structure function within Drupal, the research data informationist at the NIH Library constructed a pilot system that utilized Dublin Core Metadata schema and relevant biomedical taxonomies. Results will be available by the time of the DCMI 2014 conference. A data catalog that utilizes an extensible metadata schema like Dublin Core and an open-source framework like Drupal provides users a powerful yet uncomplicated method for describing their data. This pilot system can be adapted to the needs of a variety of basic, translational, and clinical research applications.
Joelen Pastva & Valerie Harris OCS: 308
TITLE: PunkCore: Developing an Application Profile for the Culture of Punk
ABSTRACT: PunkCore is a Dublin Core Application Profile (DCAP) for the description of the culture of Punk, including its music, its places, its fashions, its artistic expression through film and art, and its artifacts such as fliers, patches, buttons, and other ephemera. The structure of PunkCore is designed to be simple enough for non-experts yet specific enough to meet the needs of information professionals and to capture the unique qualities of materials classified as Punk. In the interest of interoperability and adoptability, PunkCore is drawn from existing metadata schema, and the development of PunkCore is intended to be open and collaborative to appeal to the entire Punk community. Our poster illustrates the initial development of the PunkCore standard and outlines future plans to bring PunkCore to the community.

The PunkCore DCAP is in its first phase of development, which follows Singapore Framework stages 1 and 2, including the creation of a functional requirements document and domain model. In order to capture the specificity of Punk culture, a preliminary genre vocabulary has also been developed. The functional requirements document, domain model, and genre vocabulary will be published on a wiki for community discussion and feedback. The remaining phases of development, including the creation of a description set profile and usage guidelines, will be initiated following our review of community interest and comments. The ultimate goal of this DCAP is to reach the Punk community and achieve broad adoption. The outcome of our work would aid in the effective acquisition and dissemination of Punk materials, or their metadata, in a variety of settings. Our project will also be useful to other niche communities documenting their cultural contributions because it provides a model that incorporates community outreach with traditional metadata development to lend more credibility and visibility to the end result.
Serhiy Polyakov & Oksana L Zavalina OCS: 309
TITLE: Approaches to Teaching Metadata Course at the University of North Texas
ABSTRACT: This best practices poster discusses approaches to teaching the Metadata and Networked Information Organization and Retrieval course in the Department of Library and Information Sciences at the University of North Texas. The poster describes how this course was developed and has evolved, the teaching methods, topics covered, student activities, and technology used. We share our experiences using real-life projects that facilitate the development of students' practical skills. The approaches to teaching the Metadata course at UNT combine theoretical preparation, team work, and extensive practical experience, all of which are important assets on the job market.

Best Practice Demonstrations

Poster Author Poster Title & Abstract
(Boaz) Sunyoung Jin OCS: 293
TITLE: A Model and Roles of a Common Terminology to Improve Metadata Interoperability
ABSTRACT: Interoperability issues pose a barrier to sharing and exchanging information among digital libraries and repositories. This is due to the use of diverse metadata standards and their different degrees of generality or specificity, which causes loss of information at all metadata model levels (e.g., schema, schema language, record, and repository). As a possible solution, roles and an abstract model for a Common Terminology (CT) based on the DCMI Abstract Model are proposed. The CT aims to allow communities to use their own standards while providing uniformity in searching and improving interoperability at all metadata levels. As a best practice development, the CT has been developed as a bridge across the different generality and specificity levels of commonly used standards (MARC, MODS, DC, and QDC). The developed CT is a set of 12 Common Terms (properties) and 58 qualifiers (sub-properties), implemented in XML, RDF, and SKOS. The performance of the CT in achieving and improving metadata interoperability is presented through empirical evaluations. The experiments were conducted using Harvard (MARC), MIT (QDC), and UIUC (MARCXML) metadata records obtained through the cooperation of the three universities in the USA. The CT significantly reduces the gaps between different degrees of generality or specificity, showing high lexical and semantic match rates in mappings, and it minimizes loss of information at multiple levels. The planned prototype will provide a portal for the Harvard, MIT, and UIUC libraries with Linked Open Data based on the CT and a CT union catalog. The LOD will connect several million online accessible records of the three universities on the secure Web. The CT offers an innovative solution to improve interoperability for university libraries, and more broadly for libraries and organizations cooperating to share information while reducing loss of information at multiple metadata levels.
Matthew Miller & M. Cristina Pattuelli OCS: 297
TITLE: Ecco!: A Linked Open Data Service for Collaborative Named Entity Resolution
ABSTRACT: This demo proposal presents Ecco!, a Linked Open Data application designed to disambiguate and reconcile named entities with URIs from authoritative sources. Technically, Ecco! creates a wrapper around the LOD APIs of suitable datasets, such as VIAF and Freebase, to retrieve data useful for supporting entity matching. The system automatically ranks and groups the results into clusters according to confidence level, from exact matches to one-to-many or no matches. The quality of the data output can be further refined through human disambiguation, consisting of validating a match or identifying the correct URI when multiple matches are possible. Ecco! is designed to enable users to quickly and easily contribute to this curation process. The system provides an intuitive user interface that supports a collaborative workflow in which a community can work together in a distributed and incremental way. The combination of automated matching plus human curation has the potential to produce a quality of data superior to what is currently achievable through traditional methods.



