
Text and data mining scientific articles with allofplos

Elizabeth Seiver

M Pacer

Sebastian Bassi


Mining scientific articles is hard when many of them are locked behind paywalls. The Public Library of Science (PLOS) is a non-profit Open Access science publisher whose journals include PLOS ONE, the single largest journal; all PLOS articles are freely available to read and re-use. allofplos is a Python package for maintaining a constantly growing collection of PLOS's 230,000+ articles. It also efficiently parses these article files into Python data structures. This article covers how allofplos keeps your articles up to date and how to use it to access common article metadata and fuel your meta-research, with actual use cases from inside PLOS.
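
To give a feel for what parsing an article file into Python data structures can look like, here is a minimal sketch that uses only the standard library. It is not the allofplos API. It assumes the article is stored as JATS-style XML, and the sample record (including its DOI, title, and authors) and the helper name parse_article_metadata are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS-style record; not a real PLOS article.
SAMPLE_XML = """<article>
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>PLOS ONE</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1371/journal.pone.0000000</article-id>
      <title-group>
        <article-title>An example article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""


def parse_article_metadata(xml_text):
    """Return common metadata from a JATS-style article as a plain dict.

    Hypothetical helper for illustration; allofplos exposes its own interface.
    """
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    # Collect "Given Surname" for every contributor marked as an author.
    authors = [
        f"{c.findtext('name/given-names')} {c.findtext('name/surname')}"
        for c in meta.findall("contrib-group/contrib[@contrib-type='author']")
    ]
    return {
        "doi": meta.findtext("article-id[@pub-id-type='doi']"),
        "title": meta.findtext("title-group/article-title"),
        "journal": root.findtext(
            "front/journal-meta/journal-title-group/journal-title"
        ),
        "authors": authors,
    }


if __name__ == "__main__":
    print(parse_article_metadata(SAMPLE_XML))
```

Returning a plain dict keeps the sketch easy to feed into tools such as csv or pandas; a package built for this job would more likely wrap the same lookups in an article object.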


Text and data mining, metascience, open access, science publishing, scientific articles, XML


