This post has been cross-posted on the DataCite blog.
A critical piece of open data infrastructure that has received insufficient attention is the evaluation of data usage. We still lack a clear understanding and a body of evidence on how data are being accessed, utilized, and incorporated into research activities.
While interest in this topic is increasing, there has so far not been a dedicated event for discussions on the evaluation of data usage, and on the development of the data metrics required to support such evaluation across both research and government. To address this need, we hosted the Make Data Count Summit on 12-13 September 2023, as a forum to bring diverse stakeholders together to tackle nuanced issues about the importance of open data metrics. The event brought together over 120 attendees in Washington DC, including representatives from research institutions, funders and government, researchers, publishers, and infrastructure providers, with the goal of homing in on actionable items for agencies and institutions to advance data metrics and the evaluation of data usage.
Strong Foundations for Data Metrics
The first day of the event included presentations from ongoing efforts toward data metrics. Daniella Lowenberg (University of California, Office of the President) provided an overview of the work of Make Data Count since 2014. The lessons learnt from the initiative’s work on developing standards and engaging the community have paved the way for a renewed focus on supporting open infrastructure and on building evidence on data usage practices to continue to refine data metrics for diverse uses. Her takeaway: it’s time for a new focus in the open data world, and that focus is undivided attention to the development of open data metrics.
Matt Buys from DataCite and Carly Strasser from the Chan Zuckerberg Initiative provided an update on the ongoing collaborative project to build an Open Global Data Citation Corpus. This momentous project seeks to aggregate citations to data from a variety of sources, including citations from DataCite metadata as well as those from other sources, such as data mentions extracted from full-text articles through machine learning and identifiers from EMBL-EBI. The goal is that once completed, the corpus will provide data usage information to the community, openly, and at a scale not possible before.
Julia Lane from New York University presented her vision for data as a public asset and her work on the ‘Democratizing Data’ project, which has developed algorithms to identify mentions of data in full-text articles. This project is mining the content of articles in Scopus to surface mentions of data and visualize them through dashboards for stakeholders to explore.
In the panel discussion ‘Policy and Administrative Priorities for Data Metrics in the US’, panelists from different agencies discussed recent developments in the United States seeking to open data, such as the Evidence Act and last year’s OSTP memo. These policies have provided important impetus not only for opening up administrative and research data but also for agencies to consider what their priorities should be for understanding and evaluating the use of data that has been opened up.
Data Metrics Must Be Embedded Across the Ecosystem and Supported by Evidence
On the second day, Nancy Potok, CEO at NAPx Consulting and former Chief Statistician of the United States, provided an overview of the foundations and lessons learnt from the five years since the US Evidence Act, which promoted the release of data to make it more accessible to the public, as well as the use of data to inform policy development.
The subsequent sessions explored the needs for data metrics across different areas of the research process, including funding agencies, institutions, and scholarly communications. Institutional processes for tenure and promotion were highlighted as an area of particular importance in order to drive awareness among researchers and adoption of data evaluation and data metrics.
We also heard the latest evidence on data usage practices and data citation from a panel of bibliometricians, who highlighted the discoverability of datasets and metadata completeness as areas for improvement. The panelists called for further research to inform meaningful data metrics so that we avoid the pitfalls of defaulting to oversimplified and opaque metrics.
Prioritizing Data Metrics Now
During the Summit, we invited attendees to share their experiences and suggestions in two breakout group discussions. While these discussions highlighted that data usage evaluation is a complex subject that will require many nuanced conversations across sectors, the message was also clear that we must iterate in incremental steps, and not let perfect be the enemy of good as we drive the conversations forward.
A few of the topics highlighted during the breakout conversations include:
- Data metrics are nuanced; we will need to provide clear information and resources for a wide range of stakeholders so that they have what they need to join the conversation and also to lead it within their communities.
- There is a need to raise awareness and engagement on data metrics across all institutional levels – from individual researchers and administrators to institutional leaders and program managers.
- Data metrics must be anchored in transparent information and built upon consistent practices, while also being mindful of domain-specific needs.
We thank all the attendees for their engagement during the different sessions. It is clear that there is a shared interest in driving the evaluation of data usage and an understanding that we should collectively work towards meaningful evidence-based metrics. The Make Data Count initiative will be taking these conversations forward, and we invite everyone interested to collaborate with us as we work together to advance meaningful data metrics.
You can access the slides for the talks presented at the Summit on Zenodo.