Massive Online Collaborative Research and Modeling using Synapse and Python
Authors: Omberg, Larsson, Sage Bionetworks
Synapse is an open-source software-as-a-service (SaaS) platform built by Sage Bionetworks to enable collaborative and reproducible science. Because it is built on RESTful APIs, Synapse links easily to analytical environments such as Python. In this talk I will present the Python bindings to this platform and, more specifically, how it fostered a collaborative environment for over 140 individual researchers spread across 25 institutions in The Cancer Genome Atlas (TCGA) consortium. Synapse tracks the provenance of data from individual genome sequencing centers, through processing and quality control, all the way to results generated from models of cancer genomics. Synapse is designed as an information commons, allowing any user not only to access data but also to contribute results and models. This allows the TCGA collaboration to accelerate discovery by using partial, contributed results as starting points for downstream analyses. One sub-project that has emerged from the collaboration is an online machine learning competition to predict the expected survival time of cancer patients given their molecular phenotype. All submitted models are immediately open sourced, allowing derivative models to be built. These collaborative competitions provide an alternative approach to performing computational science, one that tools like Python and Synapse can greatly accelerate.
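To make the provenance idea concrete, here is a minimal sketch in plain Python of how a chain from raw sequencing data to model results can be recorded and walked back. It does not use the Synapse Python client: the `Entity` class, the `lineage` function, and the entity names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a stored item (data, code, or result)."""
    name: str
    used: list = field(default_factory=list)  # entities this one was derived from


def lineage(entity):
    """Return the names of all upstream entities, nearest first, without duplicates."""
    seen = []
    queue = list(entity.used)
    while queue:
        node = queue.pop(0)  # breadth-first: closest ancestors come first
        if node.name not in seen:
            seen.append(node.name)
            queue.extend(node.used)
    return seen


# Illustrative chain: raw data -> quality-controlled data -> model results.
raw = Entity("sequencing_center_raw_data")
qc = Entity("quality_controlled_data", used=[raw])
model = Entity("survival_model_results", used=[qc])

print(lineage(model))
```

Because each result only records its direct inputs, a collaborator who starts from someone else's partial result inherits the full upstream history without having to record it again.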