Scholarly: An open, freely accessible dataset of the academic citation network

Authors: Rybacki, Harry, UNCG; Carp, Joshua, University of Michigan; Spies, Jeffery, Center for Open Science

Track: Posters

The primary purpose of this application is to provide the general public with a free, open, and comprehensive dataset containing meta-data for academic citations and their corresponding references. The dataset will give the public a vital resource from which they can access, analyze, and distribute public academic citation meta-data without restriction. Many resources for gathering citation information already exist, such as CiteSeerX, Google Scholar, and PubMed, but each has its own limitations. Common issues include incomplete or inaccurate data, exclusion of linked citations, and the inability to easily obtain data in bulk for analysis.

The goal is to overcome each of these limitations while demonstrating the necessity and advantages of the open source model. We plan to address inaccurate and incomplete data by using crowd-sourcing, negotiating with publishers, and obtaining article meta-data from a wide range of other accessible sources. Additionally, by analyzing and combining the meta-data obtained, we will ensure that each citation record includes a complete set of linked citations. Finally, the completed dataset, as well as any analysis conducted on it, will be free and easy for the general public to access. As an initial step, we expect to show that the relationships between articles within disciplines exhibit small-world network properties.
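
As a rough illustration of the kind of check this initial step implies, the sketch below computes two quantities commonly used to characterize small-world networks: the average clustering coefficient and the average shortest-path length. It is a minimal sketch under stated assumptions, not the project's analysis code: the edge list is a hypothetical toy example, the function names are invented for illustration, and citation links are treated as undirected for simplicity.

```python
from collections import deque


def build_undirected(edges):
    """Build an adjacency map from (citing, cited) pairs, ignoring direction."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        # Count each connected pair of neighbors once.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy citation links: (citing article, cited article).
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
graph = build_undirected(toy_edges)
clustering = average_clustering(graph)
path_length = average_path_length(graph)
print(f"clustering={clustering:.3f}, path_length={path_length:.3f}")
```

A network is usually described as small-world when its clustering is much higher than that of a random graph with the same number of nodes and edges, while its average path length stays comparably short. A real analysis would therefore compare these two values against such a random baseline rather than read them in isolation.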