Leveraging Standards to Turn Data to Capabilities in Agriculture
CGIAR is a global research partnership of 15 Centers primarily located in developing countries, working in the agricultural research for development sector. Research at these Centers is focused on poverty reduction, enhancing food and nutrition security, and improving natural resource management to address key development challenges. It is conducted in close collaboration with local partner entities, including national and regional research institutes, civil society organizations, academia, development organizations, and the private sector. Thus, the CGIAR system is charged with tackling challenges at a variety of scales, from the local to the global. However, research outputs are often not easily discoverable, and research data often resides on individual laptops, where it is neither well annotated nor stored in a way that makes it accessible and usable by the wider scientific community.
Innovating in this space and enhancing research impact increasingly depends upon enabling the discovery of, unrestricted access to, and effective reuse of the publications and data generated as primary research outputs by Center scientists. Accelerating innovation and impact to effectively address global agricultural challenges also requires that data be easily aggregated and integrated, which in turn necessitates interoperability. In this context, “open” is inadequate, and the concept of FAIR (Findable, Accessible, Interoperable, Reusable) has proven more useful. CGIAR Centers have made strong progress implementing publication and data repositories that meet minimum interoperability standards; however, work is still needed to enable consistent and seamless information discovery, integration, and interoperability across outputs. For datasets, this generally means annotation using standards such as controlled vocabularies and ontologies.
The Centers are therefore working to create an enabling environment to enhance access to research outputs, propelled by funder requirements and a system-wide Open Access and Data Management Policy implemented in 2013 (CGIAR, 2013). Guidance and the impetus for operationalization are being provided via the CGIAR Big Data Platform for Agriculture and its Global Agricultural Research Data Innovation and Acceleration Network (GARDIAN). GARDIAN is intended to provide seamless, semantically linked access to CGIAR publications and data, to demonstrate the full value of CGIAR research, enable new analyses and discovery, and enhance impact.
There are several areas in which standards and harmonized approaches are being leveraged to achieve FAIRness at CGIAR, some of which are outlined below:
Data sourcing and handling. Research at CGIAR Centers focuses on different commodities, agro-ecologies, disciplinary domains, geographies and scales, resulting in varied data streams, some born digital and often characterized by large size, rapid generation, and frequent updates. Data ranges from agronomic trial data collected by field technicians in a variety of ways and formats, through input and output market information and socioeconomic data on technology adoption and enabling drivers, to weather data, high-throughput sequencing and phenotypic information, and satellite images. These datasets cannot all be treated in the same manner: curation and quality control needs, for instance, differ significantly, necessitating somewhat customized approaches depending on the data type. Yet, to address key challenges, they must be discoverable, downloadable, reusable, and able to be aggregated where relevant. As a first step towards these goals, Centers have agreed on a common Dublin Core-based set of required metadata elements (the CG Core Metadata Schema v.1.0) and mapped their repository schemas to it.
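The idea of mapping a repository's own schema to a common set of required elements can be sketched roughly as below. This is a minimal illustration, not the CG Core schema itself: the repository field names on the left of the crosswalk and the sample record are hypothetical, and the targets are generic Dublin Core elements chosen for the example rather than the actual CG Core required list.

```python
# Hypothetical crosswalk: repository-native field names -> Dublin Core elements.
# The left-hand names are invented for illustration; the right-hand names are
# standard Dublin Core elements, NOT the actual CG Core element list.
CROSSWALK = {
    "dataset_name": "dc:title",
    "authors": "dc:creator",
    "published_on": "dc:date",
    "doi": "dc:identifier",
    "keywords": "dc:subject",
}

# Assumed set of required elements, for illustration only.
REQUIRED = {"dc:title", "dc:creator", "dc:date", "dc:identifier"}


def to_common_schema(record):
    """Rename known fields to the common schema; drop unmapped fields."""
    return {CROSSWALK[k]: v for k, v in record.items() if k in CROSSWALK}


def missing_required(mapped):
    """Return the required elements that are absent or empty, sorted."""
    return sorted(e for e in REQUIRED if not mapped.get(e))


# Illustrative record; none of these values refer to a real dataset.
record = {
    "dataset_name": "Maize agronomic trial (illustrative)",
    "authors": ["A. Researcher"],
    "doi": "10.0000/example",
    "plot_count": 48,  # no counterpart in the common schema
}

mapped = to_common_schema(record)
print(mapped)
print(missing_required(mapped))
```

A check of this kind would let a harvester flag records that cannot yet be described by the shared required elements, while leaving each repository free to keep its native schema.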
Enhancing interoperability. Interoperability is critical to providing meaning and context to CGIAR’s varied data streams and enabling integration between linked content types (e.g., related data and publications) and across related data types (e.g., an agronomic data set and related socioeconomic data). CGIAR’s approach to interoperability and data harmonization focuses on the use of standard vocabularies (AGROVOC/GACS) and a strong reliance on ontologies developed across CGIAR (efforts such as the Crop Ontology, the Agronomy Ontology – AgrO, and the in-development socioeconomic ontology – SociO) and by other entities (ENVO, UO, PO, etc.).
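As a rough illustration of what annotating a dataset with a controlled vocabulary involves, the sketch below matches free-text column headers to term identifiers so that differently labeled columns resolve to the same concept. Everything in it is assumed for the example: the "EX:" identifiers are placeholders, not real Crop Ontology or AgrO terms, and the synonym table and column names are hypothetical.

```python
# Hypothetical vocabulary: normalized label -> (term ID, preferred label).
# The "EX:" identifiers are placeholders, not real ontology term IDs.
VOCAB = {
    "grain yield": ("EX:0001", "grain yield"),
    "plant height": ("EX:0002", "plant height"),
    "planting date": ("EX:0003", "planting date"),
}

# Hypothetical synonyms that field teams might use for the same concept.
SYNONYMS = {
    "yield": "grain yield",
    "plt ht": "plant height",
    "sowing date": "planting date",
}


def normalize(label):
    """Lowercase, treat underscores as spaces, collapse whitespace, resolve synonyms."""
    text = " ".join(label.lower().replace("_", " ").split())
    return SYNONYMS.get(text, text)


def annotate(columns):
    """Return ({column: term ID}, [columns with no matching term])."""
    annotated, unmatched = {}, []
    for col in columns:
        term = VOCAB.get(normalize(col))
        if term is None:
            unmatched.append(col)
        else:
            annotated[col] = term[0]
    return annotated, unmatched


annotated, unmatched = annotate(
    ["Grain_Yield", "PLT  HT", "Sowing date", "farmer_name"]
)
print(annotated)
print(unmatched)
```

Once two datasets carry the same term identifier on a column, they can be aggregated on that variable regardless of the header each team originally chose; unmatched columns are surfaced for a curator rather than guessed at.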
Discovery framework. Recognizing the need to democratize agricultural research information and make it accessible to partners – particularly those in developing countries – CGIAR’s aspirations focus on enabling data discovery, integration, and analysis via an online, semantically enabled infrastructure. This tool, built under the auspices of the Big Data Platform, harvests metadata from CGIAR Center repositories and allows that metadata to be used relatively seamlessly with existing and new analytical and mapping tools. While there is no blueprint for building such an ecosystem in the agriculture domain, there are successful models to learn and draw from. Of particular interest are the functionalities demonstrated by the biomedical community via the National Center for Biotechnology Information (NCBI) suite of databases and tools, with attendant innovations for translational medicine and human health. CGIAR efforts to enable functionalities similar to NCBI’s are underpinned by strong and enduring stakeholder engagement and capacity building.
Harmonizing data privacy and security approaches as appropriate. Concern regarding data privacy and security is becoming increasingly significant with recent breaches of individual privacy and the GDPR. Any CGIAR repositories and harvesters of data need to provide assurance of data anonymity with respect to personally identifiable information, yet this presents a conundrum when spatial information is so integral to the ability to provide locally actionable options to farming communities. Related to these issues is the concern around ethics, particularly with respect to surveys. The Big Data Platform is therefore focusing on facilitating the creation of and continued support for Institutional Review Boards (IRBs) or their equivalent at Centers, including via guidelines on ethical data collection and handling. Lastly, whether agricultural data is closed or open, it needs to be securely held in the face of such threats as hacking and unanticipated loss.
It is important to recognize that without incentives and a culture that encourages and rewards best practices in managing research outputs, technical attempts to promote the use of standards and enable FAIR resources will meet with limited success, at best. Factors influencing these goals include clarity on incentives (ranging from funding agency incentives to data contributors understanding the benefits of sharing data) and easy processes, workflows, and tools to make data FAIR, with continued support for stakeholders. Researchers also need to be accountable for making their outputs FAIR (e.g., through contractual obligation, annual performance evaluation and recognition, funder policies, etc.). Only through a multi-faceted approach that recognizes and addresses systemic and individual constraints in both the cultural and technical domains will CGIAR succeed in leveraging its research outputs to fuel innovation and impact, and transform agricultural research for development.
Medha Devare is Senior Research Fellow with the International Food Policy Research Institute (IFPRI) and leads its Big Data Platform efforts to organize data across the CGIAR System’s 15 Centers. She has led CGIAR food security projects in South Asia, and its Open Access/Open Data Initiative. Medha also has expertise in data management and semantic web tools; while at Cornell University she was instrumental in the development of VIVO, a semantic web application for representing scholarship.