Strecker, Dorothea (2022). Quality of metadata describing research data and the influence of repository characteristics. Young Information Scientist 7, 13-27. https://doi.org/10.25365/yis-2022-7-2
Abstract
Objective — This article captures the status quo of metadata for research data, and identifies factors at the repository level that influence metadata quality.
Methods — Based on a joint analysis of DataCite metadata records and re3data repository descriptions, this paper evaluates the quality of metadata records describing research data. It analyzes differences in metadata quality between repositories of different types and between repositories with and without formal certification to determine whether these factors correlate with high metadata quality.
Results — Of individual metadata elements, mandatory elements are used most frequently, followed by recommended and optional elements. More than half of all metadata elements are used in fewer than 5 % of metadata records. With the exception of related identifiers, persistent identifiers are rarely used. The average description has 487.3 characters. On average, 18.7 elements are used in metadata records, which corresponds to 24.7 % of the elements available. The homogeneity of metadata records varies considerably between repositories: on average, 50.9 % of metadata records use the same common set of metadata elements. The analysis revealed statistically significant differences across repositories of varying type and certification status in the use of individual metadata elements, the comprehensiveness of descriptions, and the completeness of metadata records.
Conclusion — This paper presents a first systematic analysis of metadata quality for research data and the influence of repository characteristics on metadata quality. It discusses difficulties of using a generic metadata schema for describing diverse research data. The results show that some repositories appear to have established successful metadata practices and workflows, but some metadata elements remain underused. There is evidence of repository type and certification status affecting metadata quality, but more research is needed to identify specific factors.
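
The record-level measures named in the results (completeness, description length, and homogeneity) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the element list is a small hypothetical subset rather than the full DataCite schema, the sample records are invented, and homogeneity is assumed here to mean the share of a repository's records that use the single most frequent combination of elements.

```python
from collections import Counter

# Hypothetical subset of schema elements, for illustration only;
# the full schema offers many more elements than listed here.
AVAILABLE_ELEMENTS = (
    "identifier", "creator", "title", "publisher",
    "publicationYear", "subject", "description", "rights",
)


def used_elements(record):
    """Return the set of available elements that carry a non-empty value."""
    return frozenset(k for k in AVAILABLE_ELEMENTS if record.get(k))


def completeness(record):
    """Share of available elements that the record actually uses."""
    return len(used_elements(record)) / len(AVAILABLE_ELEMENTS)


def mean_description_length(records):
    """Average description length in characters, over records that have one."""
    lengths = [len(r["description"]) for r in records if r.get("description")]
    return sum(lengths) / len(lengths) if lengths else 0.0


def homogeneity(records):
    """Assumed definition: share of records that use the most frequent
    combination of elements within one repository."""
    if not records:
        return 0.0
    counts = Counter(used_elements(r) for r in records)
    return counts.most_common(1)[0][1] / len(records)


# Invented sample records for a single hypothetical repository.
records = [
    {"identifier": "a", "creator": "A", "title": "T1",
     "publisher": "P", "publicationYear": 2021},
    {"identifier": "b", "creator": "B", "title": "T2",
     "publisher": "P", "publicationYear": 2022},
    {"identifier": "c", "creator": "C", "title": "T3",
     "publisher": "P", "publicationYear": 2022, "description": "abcd"},
]

print(completeness(records[0]))
print(mean_description_length(records))
print(homogeneity(records))
```

Averaging `completeness` over all records gives a figure comparable in kind to the 24.7 % reported above, and computing `homogeneity` per repository before averaging mirrors the way the 50.9 % figure is phrased, under the stated assumption about its definition.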