Make Data Count: Building a System to Support Recognition of Data as a First Class Research Output

The Alfred P. Sloan Foundation has made a 2-year, $747K award to the California Digital Library, DataCite and DataONE to support the collection of usage and citation metrics for data objects. Building on pilot work, this award will result in the launch of a new service that will collate and expose data-level metrics.

The impact of research has traditionally been measured by citations to journal publications: journal articles are the currency of scholarly research. However, scholarly research is made up of a much larger and richer set of outputs beyond traditional publications, including research data. To track and report the reach of research data, we need methods for collecting metrics on complex research data. With such methods in place, data can receive the same credit and recognition that journal articles receive.

“Recognition of data as valuable output from the research process is increasing and this project will greatly enhance awareness around the value of data and enable researchers to gain credit for the creation and publication of data” – Ed Pentz, Crossref.

This project will work with the community to create a clear set of guidelines on how to define data usage. In addition, the project will develop a central hub for the collection of data-level metrics. These metrics, which will include data views, downloads, citations, saves and social media mentions, will be exposed through customized user interfaces deployed at partner organizations. Because the project works in an open source environment, with extensive user experience testing and community engagement, its products will be available to data repositories, libraries and other organizations to deploy within their own environments, serving their communities of data authors.

Are you working in the data metrics space? Let’s collaborate.

Find out more and follow us on Twitter at @makedatacount.

About the Partners

The California Digital Library (CDL) was founded by the University of California in 1997 to take advantage of emerging technologies that were transforming the way digital information was published and accessed. The University of California Curation Center (UC3), one of four main programs within the CDL, helps researchers and the UC libraries manage, preserve, and provide access to their important digital assets, and develops tools and services that serve the community throughout the research and data life cycles.

DataCite is a leading global non-profit organization that provides persistent identifiers (DOIs) for research data. Its goal is to help the research community locate, identify, and cite research data with confidence. Through collaboration, DataCite supports researchers by helping them find, identify, and cite research data; data centres by providing persistent identifiers, workflows and standards; and journal publishers by enabling research articles to be linked to the underlying data and objects.

DataONE (Data Observation Network for Earth) is an NSF DataNet project which is developing a distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data.

How can you engage with MDC?

Make Data Count (MDC) team members are at the center of many initiatives that focus on aspects of metrics, including data-level metrics (DLM). We leverage existing channels to build a new data usage standard and to promote its integration and adoption among data centers and data consumers.

If you want to get in contact and start collaborating with us, please:

  • Join our mailing list
  • Follow us on Twitter
  • Contact us directly!



Understanding the problem

Journal articles are the currency of scholarly research. As a result, we as a community use sophisticated methods to gauge the impact of research and measure the attention it receives by analyzing article citations, article page views and downloads, and social media metrics. While imprecise, these metrics offer us a way to identify relationships and better understand relative impact. One of the many challenges with these efforts is that scholarly research is made up of a much larger and richer set of outputs beyond traditional publications, foremost among them research data. To track and report the reach of research data, we must build and maintain new methods for collecting metrics on complex research data. Our project will build the metrics infrastructure required to elevate data to a first-class research output.

In 2014, members of this project team were involved in an NSF EAGER research grant entitled Making Data Count: Developing a Data Metrics Pilot. That effort surveyed scholars and publishers to determine which metrics and approaches would offer the most value to the research community. We spent one year researching the community's priorities, exploring how ideas common to article-level metrics (ALM) could be translated into conventions for data-level metrics (DLM), and building a prototype DLM service. We found that the community values data citation, data usage, and data download statistics more than social media metrics. Based on this research, the project partners went a step further and identified the gaps in existing data metrics efforts:

  • there are no community-driven standards for data usage stats;
  • no open source tools to collect usage stats according to standards;
  • and no central place to store, index and access data usage stats, together with other DLM, in particular data citations.

This project proposes to fill these gaps by engaging in the following activities:

  1. We will work with COUNTER to develop and publish code of practice recommendations for how data usage should be measured and reported
  2. We will deploy a central online DLM hub based on the Lagotto software for acquiring, managing, and presenting these metrics
  3. We will integrate new data sources and clients of aggregated metrics to serve as exemplars for the integration of data repositories and discovery platforms into a robust DLM ecosystem
  4. We will encourage the growth and uptake of DLMs through an engaged stakeholder community that will advocate, grow, and help sustain DLM services

Stakeholder engagement will run through each of these activities, so that the community itself advocates for, grows and sustains the services. As a result, the community will finally have the infrastructure needed to identify relationships and better understand the relative impact of research data.