
Full Papers & Project Reports

Author Title & Abstract
Corine Deliot, Neil Wilson, Luca Costabello, & Pierre-Yves Vandenbussche OCS: 420
TITLE: The British National Bibliography: Who uses our Linked Data?
ABSTRACT: The British Library began publishing a Linked Open Data (LOD) version of the British National Bibliography (BNB) in 2011 as part of its open metadata strategy. The BNB SPARQL endpoint has continued to evolve since then, with new content, links, and regular monthly updates. While organisational benefits have been gained through staff familiarisation with Linked Data principles, data modelling, and format translation, it has been challenging to identify exactly how the Linked Data has been used and by whom. While system logs capture basic information and anecdotal usage may be reported via social media, conference events, or help desk feedback, the lack of independent tools similar to web analytics has made it difficult to understand how the service is used in order to justify and target investment.

This paper describes a project between the British Library and Fujitsu Ireland that examined the insights that could be gained from the development and application of Linked Data analytics. The results indicate that Linked Data analytics offers publishers benefits in several areas, including organisational, service management, technical, and user support. Most importantly, at a time of funding restrictions, Linked Data analytics offers publishers the ability to accurately assess the impact of their data in order to more effectively target their scarce resources. In doing so they can begin to manage LOD services as efficiently as their web equivalents and continue the realisation of Linked Data's potential for the community.
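The abstract does not detail the analytics techniques the project developed. As a hedged illustration only, the sketch below shows the kind of SPARQL endpoint log analysis such work might begin with; the log format, field positions, and example lines are all assumptions, not the project's actual pipeline:

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical endpoint access-log lines (the real log format is an assumption).
LOG_LINES = [
    '203.0.113.7 [12/Oct/2016] "GET /sparql?query=SELECT%20%3Fs%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D HTTP/1.1"',
    '198.51.100.2 [12/Oct/2016] "GET /sparql?query=ASK%20%7B%3Fs%20%3Fp%20%3Fo%7D HTTP/1.1"',
    '203.0.113.7 [13/Oct/2016] "GET /sparql?query=SELECT%20%2A%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D HTTP/1.1"',
]

QUERY_FORMS = ("SELECT", "CONSTRUCT", "ASK", "DESCRIBE")

def query_form(log_line):
    """Extract the SPARQL query string from a log line and classify its form."""
    target = log_line.split('"')[1].split()[1]        # the request target
    qs = parse_qs(urlparse(target).query)             # percent-decoding included
    query = qs.get("query", [""])[0].lstrip()
    for form in QUERY_FORMS:
        if query.upper().startswith(form):
            return form
    return "OTHER"

counts = Counter(query_form(line) for line in LOG_LINES)
print(counts)  # Counter({'SELECT': 2, 'ASK': 1})
```

Counting query forms per client and per day is the sort of basic signal that, aggregated over time, starts to approximate the web analytics the abstract says LOD services lack.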
Hannah Tarver, Oksana Zavalina, & Mark Phillips OCS: 428
TITLE: An Exploratory Study of the Description Field in the Digital Public Library of America
ABSTRACT: This paper presents results of an exploratory quantitative analysis regarding the application of a free-text Description metadata element and data values associated with this element. It uses a dataset containing over 11.6 million item-level metadata records from the Digital Public Library of America (DPLA), originating from a number of institutions that serve as DPLA's content or service hubs. This benchmark study provides empirical quantitative data about the Description fields and their data values at the hub level (e.g., minimum, maximum, and average number of description fields per record; number of records without free-text description fields; length of data values; etc.) and provides general analysis and discussion in relation to the findings.
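The per-hub statistics named above (minimum, maximum, and average number of Description fields per record; records without the field; value lengths) can be illustrated with a toy sketch. The record structure and numbers below are invented stand-ins, not DPLA data:

```python
from statistics import mean

# Toy stand-ins for item-level records; the real study used 11.6M DPLA records.
records = [
    {"id": "a", "description": ["A letter.", "Handwritten."]},
    {"id": "b", "description": ["Photograph of a courthouse."]},
    {"id": "c"},  # record with no Description field at all
]

counts = [len(r.get("description", [])) for r in records]
stats = {
    "min_fields": min(counts),
    "max_fields": max(counts),
    "avg_fields": mean(counts),
    "records_without_description": sum(1 for c in counts if c == 0),
    "avg_value_length": mean(
        len(v) for r in records for v in r.get("description", [])
    ),
}
print(stats)
```

Run per hub rather than over the whole aggregation, the same counts yield the hub-level benchmark figures the paper reports.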
Shigeo Sugimoto, Chunqiu Li, Mitsuharu Nagamori, & Jane Greenberg OCS: 430
TITLE: Permanence and Temporal Interoperability of Metadata in the Linked Open Data Environment
ABSTRACT: This paper discusses longevity issues of metadata in the Linked Open Data (LOD) environment, where metadata is transferred and shared on the open Web as a digital object. The fundamental issue of metadata permanence is to keep metadata interpretable by machines and humans over time. The goal of this discussion is to clarify risks in permanence of metadata and issues in the long-term management of metadata and metadata schemas in the LOD environment.

This paper discusses metadata longevity from a few different viewpoints in order to clarify the requirements of metadata permanence in the LOD environment, which differ from those in conventional document-like object or database-centric environments. Longevity of metadata is, in other words, the temporal interoperability of metadata. This paper uses the Metadata Application Profile methodology supported by the Dublin Core Metadata Initiative (DCMI) and DCMI's layered model of metadata interoperability to understand the nature of metadata in the LOD environment. Then, it discusses metadata longevity based on a set of facets of metadata entities such as metadata schemas. Finally, it briefly discusses issues in using provenance descriptions of metadata schemas and metadata schema registries from the viewpoint of long-term maintenance of metadata schemas.
Magnus Pfeffer OCS: 460
TITLE: Automatic Creation of Mappings between Classification Systems for Bibliographic Data
ABSTRACT: Classification systems are an important means to provide topic-based access to large collections. They are utilized by a number of approaches for faceted browsing, graphical search support and lately also for collection visualisation and analysis. Most of these approaches have been developed with a specific classification system in mind and often exploit some of the inherent characteristics of the system. Collections that are indexed using local or special classification systems cannot benefit from the vast majority of innovative applications developed for the more commonly used classification systems. One way to alleviate this problem is the use of mappings between classification systems. Traditionally, these mappings have been created in a manual and time-consuming process involving subject specialists.

In this paper, we discuss an approach to automatically creating mappings between classification systems. The approach consists of three steps: First, bibliographic data from diverse sources that contain items classified by the required classification systems is aggregated in a single database. Next, a clustering algorithm is used to group individual issues and editions of the same work. The basic idea is that for classification purposes there is no significant difference across editions, so indexing information can be consolidated within the clusters. Finally, the clusters containing information from both required systems are added up to create a co-occurrence table. This information can be used to describe correlations between individual classes of the two classification systems and forms the basis of a full mapping between the two systems. First results from applying this approach to data from German union catalogues, and from comparing the derived mappings to manually created ones, are quite promising and show the potential of this idea.
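As a hedged sketch of the final step, counting co-occurrences over work clusters might look as follows. The cluster structure and the RVK/DDC class codes are invented for illustration and are not from the paper:

```python
from collections import Counter
from itertools import product

# Hypothetical work clusters: each pools the classes assigned to any edition
# of one work, per classification system (the codes are invented examples).
clusters = [
    {"RVK": {"ST 230"}, "DDC": {"005.13"}},
    {"RVK": {"ST 230", "ST 250"}, "DDC": {"005.13"}},
    {"RVK": {"AP 15000"}, "DDC": {"302.23"}},
    {"RVK": set(), "DDC": {"005.13"}},  # lacks one system: contributes nothing
]

cooccurrence = Counter()
for cluster in clusters:
    # Only clusters carrying classes from both systems enter the table.
    if cluster["RVK"] and cluster["DDC"]:
        cooccurrence.update(product(cluster["RVK"], cluster["DDC"]))

# Candidate mapping: for each RVK class, the DDC class it co-occurs with most.
best = {}
for (rvk, ddc), n in cooccurrence.items():
    if n > best.get(rvk, ("", 0))[1]:
        best[rvk] = (ddc, n)
print(best)
```

On real union-catalogue data the counts would of course be thresholded and weighted before being read as a mapping; this only shows the shape of the co-occurrence table the abstract describes.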
Senan Kiryakos, Shigeo Sugimoto, Mitsuharu Nagamori, & Tetsuya Mihara OCS: 415
TITLE: Aggregating Metadata from Heterogeneous Pop Culture Resources on the Web
ABSTRACT: Japanese pop culture resources, such as manga, anime, and video games, have recently experienced an increase in both their consumption and the appreciation of their cultural significance. Traditionally seen as solely recreational resources, their level of bibliographic description by cultural heritage institutions has not kept up with the needs of users. In seeking to remedy this, we propose the aggregation of institutional data and rich hobbyist data sourced from the web. Focusing on manga, a form of Japanese comic, this paper discusses classification and aggregation, with the goal of improving bibliographic description through the use of fan-created data. Bibliographic metadata for manga was collected from the Japanese Agency for Cultural Affairs media arts database, along with several English-language manga fan websites. The data was organized into classes to enable property matching across data providers, and then tested with existing ontologies and aggregation models, namely EDM, to determine their suitability for working with these unique resources. The results indicate that, particularly when seeking to incorporate granular minutiae from hobbyist resources, existing ontologies and aggregation models may be inadequate for some pop culture materials. The paper discusses these inadequacies and recommendations for addressing them in future work.
Susanne Al-Eryani & Stefanie Rühle OCS: 423
TITLE: Towards the Development of a Metadata Model for a Digital Cultural Heritage Collection with Focus on Provenance Information
ABSTRACT: This paper illustrates an approach to developing a metadata schema for the contextualization of heterogeneous objects from different collections, with a focus on provenance information. The model presented here shows the interlinking between the description of these objects and their digital representation, as well as the interlinking between the description of these objects and digitized evidence, i.e., information about them. Thus, the model presents three levels of information: the description of different object types, the description of relations between objects as well as different facts concerning them, and the description of provenance information of objects stored in different collections and cultural heritage institutions.

Besides the methodological way of its creation, the model's peculiarity is reflected in the modular design of micro-ontologies and the implementation of a combination of different metadata standards, with the aim of making the model reusable and the data accessible as Linked Open Data.

Presentations

Author Title & Abstract
Steven Folsom & Jason Kovari OCS: 433
TITLE: Ontology Assessment and Extension: A Case Study on LD4L and BIBFRAME
ABSTRACT: Representatives of the Andrew W. Mellon funded Linked Data for Libraries - Labs and Linked Data for Production teams will discuss their assessment strategy and alignment progress between the BIBFRAME and LD4L ontologies, including semantic patterns and ontology reuse. Further, the talk will discuss the ontology extension work underway within the LD4P program, focusing on the efforts directed by Cornell and Harvard Universities.
Rebecca Green OCS: 435
TITLE: Data-Driven Development of the Dewey Decimal Classification
ABSTRACT: Changes involved in maintaining the Dewey Decimal Classification (DDC), a general classification system, have derived in the past from many distinct sources. These include (but are not limited to) questions/ideas/complaints from end users, classifiers, translators, or members of the Decimal Classification Editorial Policy Committee (EPC); mappings of other knowledge organization systems to the DDC; and personal awareness of events, emerging issues, and trends. On the one hand, these phenomena may bring to light ambiguity or redundancy in the current system. On the other hand, they may bring to the attention of the editorial team new topics needing provision within the system.

Without disregarding these sources, the DDC editorial team is also considering data-driven methods of (1) identifying existing areas of the DDC warranting further development or (2) identifying topics with sufficient literary warrant to justify explicit inclusion in the DDC. The use of two sources of data is under investigation. The topics and schedule areas identified through these means require investigation to ascertain if they are viable candidates for further development. Preliminary work with these data sources reveals that the strategies hold promise.
Joseph A. Busch, Branka Kosovac, Katie Konrad, & Martin Svensson OCS: 436
TITLE: Save the Children Resource Libraries: Aligning Internal Technical Resource Libraries with a Public Distribution Website
ABSTRACT: Save the Children (STC) is an international NGO that promotes children's rights, provides relief and helps support children across the globe. With international headquarters in London, STC has 30 national members and supports local partners in over 100 countries worldwide. STC International maintains technical infrastructures that are available to members and local partners including SharePoint, Drupal and other information management applications. An effort to specify and implement a common resource library for curating and sharing internal technical resources has been underway since November 2015. This has included an inventory of existing (but heterogeneous) resource libraries on STC's work in the thematic area of Health and Nutrition, and agreement on a common metadata specification and some controlled vocabularies to be used going forward. This internal resource library has been aligned with STC's Resource Centre (resourcecentre.savethechildren.se), a public web-accessible library that hosts comprehensive, reliable and up-to-date information on STC's work in the thematic areas of Child Protection, Child Rights Governance and Child Poverty. The goal is to make it easy for content curators to identify items in the internal technical resource library, and to publish them to the public Resource Centre with a minimum transformation of metadata required. This presentation discusses how the project reached consensus on accommodating and balancing internal research and external communications requirements by developing a lightweight application profile.
Saho Yasumatsu, Akiko Hashizume, & Julie Fukuyama OCS: 437
TITLE: Japanese Metadata Standards "National Diet Library Dublin Core Metadata Description (DC-NDL)": Describing Japanese Metadata and Connecting Pieces of Data
ABSTRACT: The National Diet Library (NDL) is the sole national library in Japan. This poster mainly presents the National Diet Library Dublin Core Metadata Description (DC-NDL), a descriptive metadata standard utilized primarily for converting catalog records of publications held by the NDL into metadata based on the Dublin Core Metadata Element Set (DCMES) and the DCMI Metadata Terms. The key functions of the DC-NDL are as follows: (1) representing the yomi (pronunciation), one of the characteristics of the Japanese language; (2) connectivity with Linked Data, especially via URIs; and (3) compatibility with digitized materials. Furthermore, we describe an example of implementing DC-NDL for use with NDL Search. We conclude by pointing out issues for future research on the DC-NDL.
Mariana Curado Malta, Elena Gonzalez-Blanco & Paloma Centenera OCS: 440
TITLE: POSTDATA -- Towards publishing European Poetry as Linked Open Data
ABSTRACT: POSTDATA is a five-year European Research Council (ERC) Starting Grant project that started in May 2016 and is hosted by the Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain. The context of the project is the corpora of European Poetry (EP), with a special focus on poetic materials from different languages and literary traditions. POSTDATA aims to offer a standardized model in the philological field and a metadata application profile (MAP) for EP in order to build a common classification of all these poetic materials. The information of the Spanish, Italian and French repertoires will be published in the Linked Open Data (LOD) ecosystem. Later we expect to extend the model to include additional corpora.

The final goal of the POSTDATA project is: (1) to publish all the data locked in the WIS as LOD, so that any interested agent will be able to build applications over the data in order to serve final users; and (2) to build a Web platform where (a) researchers, students, and other final users interested in EP will be able to access poems (and their analyses) from all databases, and (b) researchers, students, and other final users will be able to upload poems and the digitized images of manuscripts and fill in the information concerning the analysis of the poems, collaboratively contributing to a LOD dataset of poetry.
Maria Esteva & Ramona Walls OCS: 442
TITLE: Identifier Services: Tracking Objects and Metadata Across Time and Distributed Storage Systems
ABSTRACT: This presentation describes research around Identifier Services (IDS). IDS is designed to bind dispersed data objects and verify aspects of their identity and integrity, independent of where the data are located and whether they are duplicate, partial, private, published, active, or static. IDS will allow individuals and repositories to manage, track, and preserve different types of identifiers and their associated data and metadata. The IDS data model, which focuses on research processes and the relationship between their data inputs and outputs, will significantly improve provenance metadata of distributed collections at any point in their lifecycle.
Richard Wallis OCS: 456
TITLE: Cognitive and Contextual Computing—Laying A Global Data Foundation
ABSTRACT: A search of the current computing and technology zeitgeist will not have to look far before stumbling upon references to Cognitive Computing, Contextual Computing, Conversational Search, the Internet of Things, and other such buzzwords and phrases. The marketeers are having a great time coming up with futuristic visions supporting the view of computing becoming all-pervasive and 'intelligent', from IBM's Watson beating human quiz show contestants to the arms race between the leading voice-controlled virtual assistants: Siri, Cortana, Google Now, and Amazon Alexa.

All exciting and interesting, but what relevance has this for DCMI, metadata standards, and the resources we describe using them? In a word, "context". No matter how intelligent and human-like a computer is, its capabilities are only as good as the information it has to work with. If that information is constrained by domain, by industry-specialised vocabularies, or by a lack of references to external sources, it is unlikely the results will be generally useful. In the DCMI community we have expertise in sharing information within our organisations and on the web, Dublin Core being one of the first widely adopted generic vocabularies. That is a path Schema.org is following and, in its breadth of adoption, now exceeding.

From his wide experience working with Google, OCLC, European and national libraries, the banking industry, and others, Richard will explore new initiatives and the processes being undertaken to prepare and widely share data in a generally consumable way on the web. Schema.org has been a significant success, used by over 12 million domains and on over a quarter of sampled pages. It is enabling a quiet revolution of preparing and sharing data to be harvested into search engine Knowledge Graphs, which power Rich Snippets, Knowledge Panels, Answer Boxes, and other search engine enhancements. Whilst delivering on one revolution, it is helping to lay the foundations of another: building a global web of interconnected entities for intelligent agents to navigate. These Knowledge Graphs, fed by the information we are starting to share generically, are providing the context that will enable Cognitive, Contextual, and associated technologies to scale globally, ushering in yet another new technology era.
Hannah Tarver & Mark Phillips OCS: 458
TITLE: Identifier Usage and Maintenance in the UNT Libraries' Digital Collections
ABSTRACT: At the University of North Texas (UNT) Libraries we work with a large number of identifiers in relation to our Digital Collections (The Portal to Texas History, the UNT Digital Library, and the Gateway to Oklahoma History). Since our Digital Collections comprise items from other library and campus departments, as well as a large number of cultural heritage institutions across Texas and Oklahoma, many of the materials have assigned identifiers that are important to the group that owns the physical materials. We document any relevant identifiers in each item's metadata record, whether they belong to an international or established standard (e.g., ISSNs or call numbers) or have a specific context (e.g., agency-assigned report numbers). Most discrete collections have partner-assigned identifiers that range from established accession numbers to sequentially-assigned numbers; these identifiers allow for a connection between a digital item in the public interface, copies of the associated digital files, and the physical object. To ensure that identifiers are unique within the Digital Collections, we routinely add codes that identify the partner institution at the front of each identifier, separated with an underscore (e.g., GEPFP_62-1). This makes it relatively easy to distinguish the original identifier from the code that we have added, but also prevents the inclusion of several hundred items identified as "0005" if a user wants to use an identifier to search for a particular object.
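The partner-prefix convention described above (e.g. "GEPFP_62-1") can be sketched as a pair of helper functions; these are illustrative only, not UNT's actual implementation:

```python
def add_partner_prefix(partner_code: str, local_id: str) -> str:
    """Namespace a partner-assigned identifier with the institution code."""
    return f"{partner_code}_{local_id}"

def split_identifier(qualified_id: str) -> tuple:
    """Recover (partner_code, original_id). Splitting on the first underscore
    only keeps the partner-assigned identifier intact."""
    partner, _, local = qualified_id.partition("_")
    return partner, local

print(add_partner_prefix("GEPFP", "62-1"))  # GEPFP_62-1
print(split_identifier("GEPFP_62-1"))       # ('GEPFP', '62-1')
```

Splitting on the first underscore is what makes the original identifier easy to recover while keeping "0005"-style local numbers unique across several hundred partners.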

Internally, our digital infrastructure uses ARK (Archival Resource Key) identifiers to track and connect archival copies of files stored in our Coda repository with web-derivative copies in our Aubrey access system. We also currently use PURLs (Permanent Uniform Resource Locators) to identify and manage controlled vocabulary terms. For name authority, we create local authority records that act similarly to item records in terms of identifiers: each record has a system-unique identifier that generates a stable URL, but contains a field to include alternate established identifiers (e.g., ISNIs, VIAF record numbers, ORCIDs, etc.) that also refer to the entity, when applicable.

This presentation would discuss some of the complexities inherent in managing both locally-created and externally-assigned identifiers, why we use different types of identifiers throughout our infrastructure, and the implementation of various identifiers in our Digital Collections.
Thomas Baker, Caterina Caracciolo, Anton Doroszenko, Lori Finch, Osma Suominen, & Sujata Suri OCS: 459
TITLE: The Global Agricultural Concept Scheme and Agrisemantics
ABSTRACT: This presentation discusses the development of the Global Agricultural Concept Scheme (GACS), in which key concepts from three thesauri about agriculture and nutrition—AGROVOC, CAB Thesaurus, and NAL Thesaurus—have been merged. The respective partner organizations—Food and Agriculture Organization of the UN (FAO), CAB International (CABI), and the USDA National Agricultural Library (NAL)—undertook this initiative in 2013 with the goal of facilitating search across databases, improving the semantic reach of their databases by supporting queries that freely draw on terms from any mapped thesaurus, and achieving economies of scale from joint maintenance. The GACS beta release of May 2016 has 15,000 concepts and over 350,000 terms in 28 languages.

GACS is seen as a first step for Agrisemantics, an emerging community network of semantic assets relevant to agriculture and food security. Within Agrisemantics, the general-purpose, search-oriented concepts of GACS are intended to serve as a hub for concepts defined, with more precision, in a multitude of ontologies modeled for specific domains. Ontologies, in turn, are intended to provide global identity to concepts used in a vast diversity of quantitative datasets, such as sensor readings and crop yields, defined for a multitude of software applications and serialization formats.

Agrisemantics aims at improving the discoverability and semantic interoperability of agricultural information and data for the benefit of researchers, policy-makers, and farmers with the ultimate goal of enabling innovative responses to the challenges of food security under conditions of climate change. Achieving these goals will require innovation in processes for the cooperative maintenance of linked semantic assets in the modern Web environment.

Lightning Presentations (+ Posters)

Author Title & Abstract
Yi-Yun Cheng & Hsueh-Hua Chen OCS: 439
TITLE: A Study on the Best Practice for Constructing a Cross-lingual Ontology
ABSTRACT: Ontologies, as the fundamental building blocks of the Semantic Web, are the highest-level classification scheme in the family of Knowledge Organization Systems (KOS). With the emergence of big data, ontologies are one of the keys to unraveling the problems of information explosion, and many language communities have a pressing need to construct ontologies. Cross-lingual ontology research has thus become a pivotal concern in this global age. Researchers worldwide try to be interoperable with ontologies written not only in English but also in other languages. Yet constructing a cross-lingual ontology can be difficult, and a detailed mapping method is often hard to find. The purpose of this study is to establish a feasible practice for building cross-lingual ontologies. The study focuses on the construction of an English-Chinese ontology from an existing source ontology and a KOS source. It also addresses the synonymy and polysemy problems of the target language (Traditional Chinese).

By adopting a three-phase research design, this study begins with a pretest of our mapping practice on a small ontology, W3C's Semantic Sensor Network Ontology (SSN ontology). This phase ensures that our SPARQL code for parsing all the classes in the SSN ontology is feasible. In phase two, we map our source ontology, the Semantic Web for Earth and Environmental Terminology (SWEET) ontologies, to the KOS term lists from the National Academy of Educational Research (NAER) in Taiwan. In phase three, we model the mapped English/Chinese ontology in the Protégé software to explore the prospects of this method.

The results of phase one show that our SPARQL code automatically retrieved all 117 classes in the SSN ontology into plain-text format in a single click, suggesting that our practice is workable. In phases two and three, a cross-lingual ontology between English and Traditional Chinese was constructed through the implementation in Protégé. The mapping results between the 3,770 SWEET ontology classes (in English) and the NAER term lists (in Traditional Chinese) reveal an accuracy of 80.66% on exact-match terms, while the Chinese synonyms and related terms expressed by SKOS labels all proved searchable in our primary evaluation. These promising results demonstrate the feasibility of the practice proposed by this study, and further suggest that such an approach is suitable for adoption by future researchers to model their own cross-lingual ontologies.
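The abstract does not reproduce the authors' SPARQL. As a hedged approximation, retrieving class identifiers and flattening them to plain-text terms could look like this; the query text, the namespace handling, and the example bindings are assumptions:

```python
# A generic query for named classes; the authors' actual SPARQL is not given.
CLASS_QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?cls WHERE { ?cls a owl:Class . FILTER(isIRI(?cls)) }
"""

def local_name(uri: str) -> str:
    """Strip the namespace from a class URI to obtain a plain-text term."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]

# Invented example bindings standing in for SPARQL results over the SSN ontology.
bindings = [
    "http://purl.oclc.org/NET/ssnx/ssn#Sensor",
    "http://purl.oclc.org/NET/ssnx/ssn#Observation",
]
terms = [local_name(uri) for uri in bindings]
print("\n".join(terms))  # one class name per line, ready for term-list mapping
```

The point of the plain-text export is simply to give the mapping step (against the NAER term lists) a flat list of candidate labels to match.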
Etienne Taffoureau & Christelle Loiselet OCS: 441
TITLE: Metadata for 3D Geological Models: Definition and Implementation
ABSTRACT: BRGM (the French geological survey) is France's reference public institution for Earth Science applications; it manages and delivers geoscience data used to support decision-making in spatial planning, mineral prospecting, groundwater prospecting and protection, pollution control, natural risk prevention, and the characterization of local areas. Some of these data are produced from 3D geological modelling, which is now a classical tool to better constrain the geometries of complex geological systems and to provide a continuous description of the subsurface from sparse and indirect data. In order to store and deliver geological model production at BRGM, we developed a programming interface that distinguishes the storage of a model from its representation: models are stored in the native format of the tool used to generate them (software project files). This choice guarantees that there is neither loss of data nor loss of precision. Model discretizations (e.g., meshes) are then generated on demand, depending on representation purposes (1D, 2D, or 3D gridding). The organization works on geomodel management and representation for delivering and disseminating 3D geological information.

Therefore, it needs to reference and archive geomodels and/or their representations in order to access and deliver the information related to them. We propose to define a metadata profile compliant with the INSPIRE Directive to describe 3D geological models and their representations. The profile is implemented using the ISO 19115/19139 standard (used for geographic data) (1) to allow web applications to edit and manage data with the GeoSource/GeoNetwork application; and (2) to ensure interoperability in the delivery. 3D geomodel metadata are indexed by a search engine and displayed in a geoscientific portal such as Infoterre (http://infoterre.brgm.fr/viewer). Our approach allows calling the programming interface, which queries the 3D geological model and retrieves all the topological information from the model to be represented and then stored or visualized using OGC standards.
Karen Andree & Reem Weda OCS: 444
TITLE: The Dutch Art & Architecture Thesaurus® put into practice: the example of Anet, Antwerp
ABSTRACT: Anet is a network of scientific libraries located in Antwerp, Belgium. Among the connected institutions are research, higher education, and museum libraries. They share common software (Brocade, developed at Antwerp University since 1998) and cataloging practices. In 2014 it was decided to adopt a new subject heading system for cataloging the library collections with an art or heritage scope. The Art & Architecture Thesaurus® (maintained by the Getty Research Institute) was eventually selected, under the express condition that it could be used in a flexible way by the libraries. This includes, if needed, the usage of non-AAT-compatible subject terms.

AAT was chosen because of software compatibility, the extensiveness of its content, and its multilingualism. The thesaurus is being fully translated into Spanish, Dutch, German, and Chinese, and partly into other languages such as Italian and French. The local subject heading systems (terminologies) were converted to the new authority environment (Anet-AAT). Automatic mapping via tools was considered; however, manual mapping was performed because of the different application as a subject heading system and the opportunity to acquaint the librarians with AAT.

Future challenges for the Anet-AAT vocabulary consist of staying up to date with changes that occur in the ‘Mother AAT’ (the Getty Vocabularies) and adding to its content to create more library-specific subjects, as AAT is presently quite focused on the description of (museum) objects. But since Anet has been using AAT, it has been noticed that the content is quite well suited for indexing the special libraries. Nevertheless, usage by the network did bring to light issues in the structure of AAT, particularly some concerning the Dutch translation. The necessity to address these issues has resulted in regular contacts between Anet and the RKD-Netherlands Institute for Art History, which manages the Dutch translation of the AAT.

The AAT has a long history of development. Work on the original Getty AAT started in the late seventies in the United States, and the RKD has managed the Dutch AAT, or 'AAT-Ned', since the mid-nineties. Work on the expansion and improvement of the thesaurus is an ongoing process. Being a 'living terminology' has an impact on its usage as a standard by others, and the publication of the Getty Vocabularies as Linked Open Data has only made this more apparent. Together with user communities such as Anet, the RKD tries to adapt the content of the AAT for the better, particularly concerning the Dutch translation, but it also tackles other issues concerning scope notes (definitions) or hierarchical relations that are not compatible with the views of the Dutch-speaking heritage community. Because of the scope and size of the content of the AAT, the RKD cannot discover all the issues by itself and needs the input of users from the heritage community to improve the system. In this manner, Anet contributes to the improvement of the Dutch translation as well as the ‘Mother AAT’. The adoption of AAT by Anet has proven to be a promising showcase for the potential of this ‘museum thesaurus’ as a subject heading system for libraries as well.
Richard Smiraglia & Hyoungjoo Park OCS: 447
TITLE: Using Korean Open Government Data for Data Curation and Data Integration
ABSTRACT: This presentation addresses cultural heritage data-sharing practices through the use of Republic of Korea open government data for data curation and data integration. Data curation enables data sharing throughout the data management life cycle to create new value for new user needs. Our research employed a visualization phase, in which we used domain-analytical techniques to better understand the contents of the population of 375 library-related open government cultural heritage datasets available at the Korean Open Government Website (http://data.go.kr/). Researchers translated all records from Korean to English. Data were unstructured and in heterogeneous formats, such as file formats, data formats, and/or web addresses. For data curation and integration, we employed the meta-level ontology known as the CIDOC CRM, which we applied qualitatively to small sets of carefully selected records. To map the instantiation of records, which is required for data integration, we used FRBRoo (Functional Requirements for Bibliographic Records—object oriented), an extension of the CIDOC CRM, to map the instantiation of data records in a typical data-sharing scenario. Then, equivalent mapping processes were comparatively tested with visualizations to demonstrate the effective harmonization between the CIDOC CRM and FRBRoo, which enables the integration of metadata and data curation from unstructured and heterogeneous formats. This presentation may contribute to the cross- or meta-institutional integration of curation across institutional boundaries in cultural heritage.
Mairelys Lemus-Rojas OCS: 451
TITLE: Remixing Archival Metadata Project (RAMP) 2.0: Recent Developments and Analysis of Wikipedia Referrals
ABSTRACT: This presentation will cover an analysis of referrals from all Wikipedia pages created using the Remixing Archival Metadata Project (RAMP) tool. It will also feature a demo of the tool and will highlight some of the recent developments, which include a major overhaul of the interface, more secure Wikipedia login, easy upload capabilities, and an effective and convenient installation process. With these recent developments, we are providing the library community with a tool that is easy to use and install and that offers a convenient way to share data with other communities on a global scale.
Li Yuan & Wei Fan OCS: 452
TITLE: A Survey of Metadata Use for Publishing Open Government Data in China
ABSTRACT: Open government data (OGD) is one important type of open data, and it is growing fast around the world. Many governments and organizations have already put their data online for the public. At the same time, Linked Data, developed under the W3C, provides the publishing mechanism and technical recommendations needed to explore the linkage of open data, promoting its openness and availability. Currently, 1,443 government-related datasets can be retrieved from datahub.io (as of 2016-07-01).

From documents to datasets, metadata still plays key roles in describing, locating and managing OGD. Although most OGD carries some basic categories, tags and properties, more comprehensive metadata vocabularies need further study. Using metadata to achieve high-quality, findable, machine-readable and understandable OGD is a fundamental task for government chief data officers.

In the past five years there have been some remarkable developments in OGD in China. The National Bureau of Statistics of China has built a national data portal for publishing monthly, quarterly and annual data, as well as regional and census data, comprising nearly 8 million data items. The cities of Beijing, Shanghai, Wuhan and Guiyang provide public data services, and Zhejiang province has integrated its public data categories. Under the Action Outline for Promoting the Development of Big Data (2015), a Chinese national open government data portal is to be established in 2018.

This presentation will report on the state of metadata use for OGD in China. We investigate eight typical cases of Chinese OGD spanning three levels (nation, province and city). The report comprises three parts:
  1. Analyzing the actual usage of metadata elements for datasets and data entries in the selected OGD portals and pointing out usage issues;
  2. Discussing the adaptation of existing metadata standards (DCAT, Schema.org, GILS, etc.) for Chinese OGD and proposing a metadata vocabulary for Chinese OGD;
  3. Comparatively analyzing data sharing and application of OGD in the US and China in terms of metadata interoperability.
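For readers unfamiliar with DCAT, one of the standards discussed in the second part, a portal-style dataset description can be sketched as JSON-LD with only the standard library. This is a hypothetical illustration: the dataset title, publisher and URL below are invented and are not taken from the survey.

```python
import json

# Hypothetical DCAT description of an open government dataset as JSON-LD.
# All concrete values (title, publisher, URL) are invented for illustration.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Monthly Air Quality Readings",
    "dct:publisher": "Example Municipal Government",
    "dcat:keyword": ["environment", "air quality"],
    "dcat:distribution": {
        # A distribution describes one downloadable form of the dataset.
        "@type": "dcat:Distribution",
        "dcat:mediaType": "text/csv",
        "dcat:downloadURL": "https://data.example.org/air-quality.csv",
    },
}

print(json.dumps(dataset, ensure_ascii=False, indent=2))
```

A portal record of this shape lets harvesters find the dataset by keyword and locate a machine-readable distribution without scraping the landing page.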
Shu-Jiun Chen OCS: 453
TITLE: A Pilot Study on Linked Open Data in Cultural Heritage: A Use Case of the Taiwan Digital Archives Union Catalogue
ABSTRACT: The Taiwan Digital Archives Union Catalogue (http://catalog.digitalarchives.tw/), with more than 5 million digitized objects described with Dublin Core-based metadata, comes from the Taiwan E-Learning and Digital Archives Program (TELDAP), which was built on a national scale over the past 15 years. The Academia Sinica Center for Digital Cultures (ASCDC) (http://ascdc.sinica.edu.tw/en/) is now in charge of its sustainable operation. The presentation aims to report how we adopt a Linked Open Data (LOD) approach to publish these structured data, in order to connect the metadata and digitized objects with related resources around the world.

The Taiwan Digital Archives, similar to Europeana, has collected digitized collections from more than 100 libraries, archives, museums, academic institutions and government agencies, such as the National Central Library, Academia Historica and the National Palace Museum. The collections include books, newspapers, artworks, photos, specimens and sounds. Most of the metadata descriptions and contents are in Chinese and oriented toward Asian culture. For the LOD initiative, 850 thousand records with Creative Commons licensing have been selected as an experimental pilot, starting in January 2016.

The presentation will report on 72 collections across 16 categories, such as biodiversity, photos, architecture, anthropology, rare books, Buddhist texts and paintings, discussing the LOD design methods, issues, preliminary outcomes and lessons learned, covering the data model, cleaning for data quality, reconciliation, publishing and applications. Different LOD applications will also be demonstrated, including online exhibitions and reuse in digital humanities research.
Etienne Taffoureau OCS: 455
TITLE: Metadata on Biodiversity: Definition and Implementation
ABSTRACT: SINP (Information system on nature and landscape) and ECOSCOPE (Observation for research on biodiversity data hub) are two distinct scientific infrastructures on biodiversity relying on different data sources and producers. Their main objective is to document and share information on biodiversity in France. INPN (https://inpn.mnhn.fr) is the reference information system for data related to nature. It manages and disseminates the reference data of the "geodiversity and biodiversity" part of the SINP, and delivers the metadata and data to GBIF (Global Biodiversity Information Facility). For the SINP and ECOSCOPE projects, working groups composed of scientific organisations have defined two mutually compatible metadata profiles, also compliant with the INSPIRE Directive, to describe data in this domain. The profiles are implemented using existing metadata standards: ISO 19115/19139 (for geographic metadata) for SINP, and EML (Ecological Metadata Language) together with ISO 19115/19139 for ECOSCOPE. A mapping between the two profiles has also been produced, and several thesauri for keywords and a classification system for taxonomic identification are used to ensure interoperability between systems. The profiles are implemented in web applications for editing and managing data (GeoSource/GeoNetwork for SINP and an ad hoc application for ECOSCOPE). These applications allow metadata to be harvested using the OGC CSW (Catalogue Service for the Web) standard.

The next steps will be to increase metadata visibility through the automation of web services.

Posters

Author Title & Abstract
Miika Alonen & Suvi Remes OCS: 418
TITLE: Interoperability Workbench—Collaborative Tool for Publishing Core Vocabularies and Application Profiles [Peer Reviewed]
ABSTRACT: The Interoperability Workbench is a collaborative data modeling tool for creating and publishing core vocabularies and application profiles. The workbench is based on an interoperability framework that integrates workflows for managing controlled vocabularies, metadata models and reference data. Together, the framework and the implemented tool provide a single environment for creating and publishing core vocabularies and domain-specific application profiles.
Marina Morgan & Naomi Eichenlaub OCS: 426
TITLE: Content Management Systems: Open Source or Not Open Source? [Peer Reviewed]
ABSTRACT: The objective of this poster is to provide an overview of a number of existing open source and proprietary content management systems for digital collections. We hope that this work will assist libraries and other institutions in their process of researching and decision-making when considering implementing a content management system for their digital collections.
Anna Rovella, Nicola Ielpo & Assunta Caruso OCS: 427
TITLE: Using DC Metadata in preservation content: the case of the Italian "Protocollo Informatico" [Peer Reviewed]
ABSTRACT: Digital preservation of administrative records is an essential requirement within public organizations. The purpose of this poster is to present a metadata element model to support a coherent Submission Information Package (SIP) from a Records Management System to an Open Archival Information System (OAIS).
Nicholas M. Weber & Andrea K. Thomer OCS: 431
TITLE: Modeling Cultural Evolution with Metadata Collections [Peer Reviewed]
ABSTRACT: Descriptive metadata is typically used to record information about digital artifacts. Collections of descriptive metadata records can also be used to study institutional attributes of the cultures that produce, use, and cooperate in provisioning digital artifacts. In this poster we describe an approach to modeling cultural evolution, phylomemetic analysis, using collections of metadata records that describe digital artifacts.
Clément Arsenault & Elaine Ménard OCS: 434
TITLE: Dolmen: A Linked Open Data Model to Enhance Museum Object Descriptions [Peer Reviewed]
ABSTRACT: This paper presents the DOLMEN project (Linked Open Data: Museums and Digital Environment), which proposes to develop a linked open data model that will allow Canadian museums to disseminate the rich and sophisticated content emanating from their various databases and, in turn, make their cultural and heritage collections more accessible to future generations. The rationale, specific objectives, proposed methodology and expected benefits are briefly presented and explained.
Paul Walk OCS: 414
TITLE: How to Develop a Metadata Profile with Agility
ABSTRACT: This poster gives an outline of the processes, borrowed from agile software development, used to develop the RIOXX metadata application profile.
Wei Fan & Feng Yang OCS: 443
TITLE: A Component Service for Developing Metadata Application Profiles
ABSTRACT: Metadata application profiles (APs) are built on existing metadata elements to accommodate specialized user requirements for particular resource types and user scenarios. The elements of an AP are usually derived from more than one metadata schema, and some new elements are created for local application. Traditional metadata registry services provide the functions needed to identify and find metadata elements, but they do not effectively support the reuse of metadata schemas. A bridge is therefore needed to supply methods and technology for the AP-building process.

How to use these vocabularies to build a specialized metadata AP is a practical challenge. In this work, we treat metadata elements as components and use Linked Data technology to generate an AP schema automatically, following interactive user selection and assembly. We have developed an AP prototype system that guides users to select, reuse and inherit metadata elements from existing metadata schemas on the front end, and automatically generates an RDF schema as the AP deliverable on the back end.

At the DC-2016 conference, we will report on this work in progress and demonstrate the metadata design process with the prototype system.
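As a rough illustration of component-based AP generation of the kind the abstract describes (a hypothetical sketch, not the authors' prototype), one could record which source-schema elements a user selected and emit the resulting profile as N-Triples. The namespaces are real; the selection and the profile URI are invented.

```python
# Hypothetical sketch: assemble an application profile from elements reused
# across existing schemas, then serialize it as simple N-Triples (stdlib only).
DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
AP = "http://example.org/profile#"  # invented profile namespace

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

# Elements the user selected for reuse, each paired with a local usage note.
selected = [
    (DC + "title", "Title of the resource"),
    (DC + "created", "Creation date"),
    (FOAF + "maker", "Agent who created the resource"),
]

triples = []
for i, (source_uri, note) in enumerate(selected):
    local = f"{AP}element{i}"
    triples.append((local, RDF_TYPE, RDF_PROPERTY))
    triples.append((local, SUBPROP, source_uri))  # records provenance of reuse
    triples.append((local, COMMENT, note))

# Serialize: URIs in angle brackets, notes as quoted literals.
ntriples = "\n".join(
    f"<{s}> <{p}> " + (f"<{o}>" if o.startswith("http") else f'"{o}"') + " ."
    for s, p, o in triples
)
print(ntriples)
```

Linking each local element to its source via rdfs:subPropertyOf keeps the reuse machine-readable, so downstream tools can trace every AP element back to the schema it came from.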
David W. Talley, Abigail Evans, Joseph Chapmann & Michael D. Crandall OCS: 457
TITLE: Loosely Coupled Metadata Repositories for Discoverability of Linked Data Learning Resources
ABSTRACT: This poster reports on the technical architecture underpinning a relatively complex metadata structure that supports the IMLS-funded Linked Data for Professional Education (LD4PE) project. The project aims to develop a competency-based prototype referatory of learning resources for teaching and learning Linked Data practices in design, implementation and management. The implementation involves a number of free-standing repositories of descriptive metadata relevant to learning resources, some with their own data-creation and -management interfaces and tools, which combine to support a searchable and browsable platform for learning-resource discovery, built from easily implemented or adapted components within a WordPress environment.
Deborah A. Garwood OCS: 449
TITLE: Exploring the Schema.org "Movie" Standard Metadata for Documentary and Independent Films
ABSTRACT: This poster reports on an exploration of Schema.org's "Movie" type for creating metadata for this medium, with special emphasis on documentary and independent films. The "Movie" type is documented at http://schema.org/Movie. The standard was tested with four movies, selected on the basis of programming at Film Forum, New York, NY, during the week of May 23, 2016. We found that semantic interoperability and weak requirements for authority control are the overall challenges. Of the 86 properties in total, 68 are properties of CreativeWork and ten are properties of Thing. The remaining eight are specific to "Movie": actor, countryOfOrigin, director, duration, musicBy, productionCompany, subtitleLanguage and trailer. The recommended authority standards cover date, duration and language. We suggest that Schema.org's "Movie" type accommodates the commercial aspects of films well, while its metadata for documentary and independent films could be improved. The exploration leads to some recommendations that may improve Schema.org's "Movie" profile.
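To make the Movie-specific properties concrete, a minimal Schema.org "Movie" record for a documentary can be sketched as JSON-LD. The film, names and values below are invented for illustration; the property names are from the Schema.org vocabulary.

```python
import json

# Hypothetical JSON-LD sketch of a Schema.org "Movie" record for a documentary.
# The film and all names are invented; property names come from schema.org/Movie.
movie = {
    "@context": "https://schema.org",
    "@type": "Movie",
    "name": "Example Documentary",  # inherited from CreativeWork
    "director": {"@type": "Person", "name": "A. Filmmaker"},
    "countryOfOrigin": {"@type": "Country", "name": "US"},
    "duration": "PT1H37M",          # ISO 8601 duration, per the recommendation
    "productionCompany": {"@type": "Organization", "name": "Example Films"},
    "subtitleLanguage": "en",
}

print(json.dumps(movie, indent=2))
```

Note that nothing here requires an authority file for the director or production company, which illustrates the poster's point about weak authority-control requirements.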

DCMI's work is supported, promoted and improved by Member organizations around the world:

The National Library of Finland The National Library of Korea The National Library Board Singapore
Shanghai Library Simmons College GSLIS (US) Information School of the University of Washington
SUB Goettingen Research Center for Knowledge Communities, Tsukuba University Infocom Corporation (Japan)
UNESP (Brazil) University of Edinburgh ZBW (Germany)

DCMI's annual meeting and conference addresses models, technologies and applications of metadata
