New EOSC-Life publication: An iterative and interdisciplinary categorisation process towards FAIRer digital resources for sensitive life-sciences data
A succession of global challenges (e.g., climate change, epidemics, loss of biodiversity, resource scarcity, economic dislocation) has underlined the need to pool data and digital resources from the life sciences. However, a high proportion of the data generated by life sciences research is sensitive, and in many cases this serves as an argument against data sharing and reuse. Cross-domain categorisation and discovery of digital resources related to sensitive data present major interoperability challenges. The creation of vocabularies has grown exponentially since the publication of the FAIR principles (Findability, Accessibility, Interoperability, Reusability) in 2016, as demonstrated by general catalogues (e.g. Schema.org) and disciplinary ontology portals (e.g. Bioportal), and it is facilitated by free and well-recognised tools (e.g. Protégé). Even so, the subject of sensitive data remains poorly covered, and its cross-disciplinary nature and dependence on national legislation make it more difficult to understand. Sensitive data can include personal data, environmental data, proprietary data, Dual Use Research of Concern (DURC) data and classified information.
To support the FAIRification of sensitive data resources, a toolbox demonstrator has been developed to improve the discovery of digital objects related to sensitive data (e.g., regulations, guidelines, best practice, tools). Search in the toolbox relies on a categorisation system developed and harmonised across a cluster of life sciences research infrastructures (ECRIN, BBMRI-ERIC, EATRIS, EMBRC, ERINHA, Euro-BioImaging) in the framework of the EOSC-Life project. The objective of the toolbox is to facilitate the work of researchers who want to share or re-use sensitive data.
The toolbox demonstrator does not contain de novo information; instead, it helps scientists to navigate through previously collected, high-quality content available throughout the EOSC-Life collective infrastructure landscape. Its categorisation system was developed through an iterative process that included careful evaluation by senior experts at each stage. Three versions of the categorisation system were built and tested in successive pilot studies, leading to a final system with 7 main categories: sensitive data type, resource type, research field, data type, stage in data sharing life cycle, geographical scope, and specific topics. A total of 109 sensitive data resources labelled with these tags were used as the initial content for the toolbox demonstrator, a software tool that allows searching for digital objects linked to sensitive data, with filtering based on the categorisation system. The beta version of the toolbox demonstrator and its 109 resources are available here: https://tsdo.ecrin-rms.org/.
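To make the idea of filtering by category concrete, the following is a minimal Python sketch of faceted search over tagged resources. It is not the toolbox's actual implementation. The seven category keys mirror the categories listed above, but their spelling, the `Resource` class, the `filter_resources` function, the example entries and all tag values are hypothetical. The matching rule (any selected value within a category, all selected categories together) is also an assumption, chosen because it is a common convention for faceted search.

```python
from dataclasses import dataclass, field

# The seven main categories named in the text. The snake_case keys are
# illustrative and not taken from the toolbox itself.
CATEGORIES = (
    "sensitive_data_type",
    "resource_type",
    "research_field",
    "data_type",
    "life_cycle_stage",
    "geographical_scope",
    "specific_topics",
)


@dataclass
class Resource:
    """A digital object labelled with tags, grouped by category (hypothetical)."""
    title: str
    tags: dict[str, set[str]] = field(default_factory=dict)


def filter_resources(resources, **selected):
    """Return the resources matching every selected category.

    Each keyword argument maps a category to a collection of accepted tag
    values (a set or list of strings). Assumed semantics: a resource matches
    a category if it carries at least one of the accepted values, and it
    must match all selected categories.
    """
    unknown = set(selected) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown categories: {sorted(unknown)}")
    return [
        r for r in resources
        if all(r.tags.get(cat, set()) & set(values)
               for cat, values in selected.items())
    ]


# Hypothetical example entries; these are not resources from the toolbox.
EXAMPLES = [
    Resource("Example guideline on personal data", {
        "sensitive_data_type": {"personal data"},
        "resource_type": {"guideline"},
        "geographical_scope": {"EU"},
    }),
    Resource("Example tool for DURC data", {
        "sensitive_data_type": {"DURC data"},
        "resource_type": {"tool"},
    }),
    Resource("Example national regulation", {
        "sensitive_data_type": {"personal data"},
        "resource_type": {"regulation"},
        "geographical_scope": {"national"},
    }),
]

if __name__ == "__main__":
    hits = filter_resources(
        EXAMPLES,
        sensitive_data_type={"personal data"},
        geographical_scope={"EU"},
    )
    for r in hits:
        print(r.title)
```

In this sketch, a resource that has no tag in a selected category is excluded from the results, and calling `filter_resources` with no filters returns every resource.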
The next steps in the process of the toolbox development will focus on a broad evaluation of the usability and user-friendliness of the portal beyond the EOSC-Life partners, extension of the content with more resources, broader adoption by different life science communities, and a long-term vision for maintenance and sustainability.
Read more about the toolbox development and the iterative process of agreeing on its categorisation system in the Scientific Reports publication: https://www.nature.com/articles/s41598-022-25278-z