Why Jupyter Notebooks Won't Replace Academic Papers

Posted on Sun 29 July 2018 in science

Recently there has been some buzz around Jupyter Notebooks in science, especially in light of the LIGO team sharing their detection and analysis of gravitational waves in Jupyter Notebooks. Others have claimed that Jupyter Notebooks will render traditional academic journal articles obsolete. The notebook, or literate programming, format improves on the reproducibility and dissemination of scientific work; however, several key factors limit the notebook as a medium for communicating science. In this article, I'll touch on these issues and explain why I believe that the current model of sharing computational science is here to stay.

Note: throughout this article I might refer to Jupyter Notebooks as notebooks, or to the method as literate programming. I'm talking about the same thing, though the latter two terms are more generic.

Why Jupyter Notebooks Are So Promising

There are completely valid reasons to want to improve on the current publishing situation, and Jupyter Notebooks have solved, or are capable of solving, some of these problems.

  1. Notebooks are open. Under the hood, notebooks are plain text formatted as JSON. They can technically be run or rendered by anyone who can reproduce an appropriate environment, so there's no vendor lock-in (a sketch of reading a notebook's JSON follows this list). JOSS, the Journal of Open Source Software, is a completely open publisher for scientific software that could support dissemination by Jupyter Notebook.
  2. Arfon Smith, Editor-in-Chief of JOSS, shared on Talk Python to Me #157 the common sentiment that journal articles can sometimes be more like advertisements for the research. As a consumer of knowledge and software tools, I can use literate programming not only to run and reproduce the scientific result, but also to use the software as an application. Reproducibility is a well-known issue, and for results generated by computer programs there's no excuse not to share the exact recipe to reproduce a result.
  3. As a producer of knowledge and tools, I can provide a resource so people can use (and cite!) my work. I can use feedback and create a community around tools and techniques to make them better.
  4. Notebooks can offer features that papers cannot, such as interactivity with visualizations or datasets, sortable tables, or extractable data.

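To make the openness point concrete, here is a minimal sketch that inspects a notebook using nothing but the Python standard library; the file name analysis.ipynb is hypothetical.

```python
import json

# A notebook is just a JSON document; the standard library can read it.
with open("analysis.ipynb") as f:
    nb = json.load(f)

# Top-level keys are defined by the open notebook format specification.
print(nb["nbformat"], nb["nbformat_minor"])

# Each cell records its type and source; no proprietary tooling required.
for cell in nb["cells"]:
    print(cell["cell_type"], "".join(cell["source"])[:40])
```
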
Why Notebooks Won't Take Over Yet

I want to avoid any discussion of compatibility, trendiness, environment, or the technical choices of Python/Markdown, institutional publication models, and so on. These have been covered elsewhere. Some of these are valid arguments and others are part of the inertia that resists change. I think there are more fundamental problems with the literate-programming-as-science model than these issues. If anything, the scientific community has proven its willingness to hang its hat on what some might consider legacy software: new FORTRAN code is still being written today.

Technical Issues

The main issue is that notebooks offer no way to hide complexity behind abstraction. Scientific methods can be complex, and to "stand on the shoulders of giants" implies that we build knowledge and tools on top of each other into more and more advanced approaches. The notebook format is limiting here. Scientific software should be packaged, not flattened into a notebook script. See most of the JOSS papers for examples of good software: many of these projects contain notebooks to show how to run or use the software, but few of them put the software itself in the notebook, even though that's the key part we're supposed to be sharing.
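
As a rough sketch of the alternative (the package mymodel and the function fit_line are hypothetical names, not anything from the projects above), the reusable logic lives in an importable, testable module:

```python
# mymodel/fitting.py -- hypothetical package module holding the real logic,
# where it can be tested, versioned, and imported from anywhere.

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x
```

The notebook is then reduced to a thin, readable driver over the packaged function:

```python
# Notebook cell: import the packaged logic instead of pasting it inline.
from mymodel.fitting import fit_line

slope, intercept = fit_line([0, 1, 2, 3], [0.1, 0.9, 2.1, 2.9])
print(slope, intercept)
```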

There are many lists of best practices for writing scientific software; the Scientific Cookiecutter project and the Data Science Cookiecutter project both give good ones. Some highlights are keeping I/O separate, writing for readability, taking functional programming approaches, and managing data separately from code. All of these conflict with the notebook model, which promises the convenience of putting everything in one place. It's hard to write flexible, sustainable software in a notebook, which encourages lumping everything together into what boils down to a script with inline outputs.
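
To illustrate the first and last of those points, here is a minimal sketch, with made-up function and file names, of keeping I/O at the boundary so the computation itself stays a pure, testable function:

```python
import csv

def load_measurements(path):
    """I/O lives at the boundary: read (temperature, pressure) rows from CSV."""
    with open(path, newline="") as f:
        return [(float(r["temperature"]), float(r["pressure"]))
                for r in csv.DictReader(f)]

def mean_pressure_above(measurements, threshold):
    """Pure computation: no files, no globals, easy to unit test and reuse."""
    pressures = [p for t, p in measurements if t > threshold]
    return sum(pressures) / len(pressures) if pressures else float("nan")

# A notebook (or a script, or a test) just composes the two pieces:
# data = load_measurements("runs.csv")   # hypothetical data file
# print(mean_pressure_above(data, 300.0))
```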

In addition, notebooks are challenging to maintain and iterate on. If I want to test a different idea, I end up copying cells, changing things, and running them over and over. To share the notebook, I have to restart the kernel several times and make cleanup passes to ensure that everything is consistent. A scientist who comes along later might want to integrate their changes with mine, but they are suddenly stuck because everything has changed in my test-and-cleanup loop. This is a problem if we expect collaborators and derivative works to improve on the approach and publish the results, especially because usage and citations are key metrics for many academics.
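
One way I could imagine catching a notebook that wasn't run cleanly top to bottom is to check the recorded execution counts. A rough sketch, assuming the nbformat package is installed and using a hypothetical notebook name:

```python
import nbformat

# nbformat reads the notebook into a structured object.
nb = nbformat.read("analysis.ipynb", as_version=4)  # hypothetical file name

# Cells executed strictly top to bottom should be numbered 1, 2, 3, ...
counts = [c.execution_count for c in nb.cells
          if c.cell_type == "code" and c.execution_count is not None]
if counts != list(range(1, len(counts) + 1)):
    print("Warning: cells were run out of order or re-run; restart and run all.")
```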

A lot of software uses data, if not to gain insight then at least as a point of comparison to validate a model or approach. How should this data be controlled? Where should the schema be described and validated? How can users supply their own data or extend the format's schema?
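
I don't think there are settled answers, but even a lightweight, explicit schema check would make the expectations visible to users. A minimal sketch with pandas, using made-up column names and file name:

```python
import pandas as pd

# The expected schema, declared in one place so users can read or extend it.
SCHEMA = {"temperature": "float64", "pressure": "float64", "phase": "object"}

def load_validated(path):
    """Load a dataset and fail loudly if it doesn't match the declared schema."""
    df = pd.read_csv(path)
    missing = set(SCHEMA) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for col, dtype in SCHEMA.items():
        if str(df[col].dtype) != dtype:
            raise TypeError(f"column {col!r} has dtype {df[col].dtype}, "
                            f"expected {dtype}")
    return df

# df = load_validated("measurements.csv")  # hypothetical user-supplied data
```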

Cultural Issues

Adam Rule discussed his work on how Jupyter Notebooks are actually used on Talk Python to Me #171. They analyzed a dataset of over 1 million notebooks and drew several interesting conclusions, which are discussed in their paper and on the podcast.

They found that people often used notebooks for exploration and iteration, leaving behind messy notebooks without an overarching narrative. There are also no real guidelines or shared ideas of what the best practices are. What should a publication-quality notebook look like? Once we have exposed all of the underlying steps in the notebook (distractions aside; see above), how do we strike a balance between letting data and methods speak for themselves and clarifying what part of a work is new knowledge? Academics with PhDs arguably have at least 10 years of training in scientific writing. As pointed out by Rule *et al.*, there is hardly any formal training in using a Jupyter Notebook to the same standard as the training scientists receive for keeping a lab notebook.

There's also cultural inertia around what counts as a scientific contribution. Rule mentioned in the podcast and the paper that PIs and group members alike see notebooks as half-baked, preferring slide decks and results over code. Much of the scientific community seems to agree that the currency for sharing ideas in talks and at conferences is a polished slide deck. Even among the community at JOSS, I have seen reviewers be much more critical of software design and utility, even going so far as to request new features before recommending work for acceptance. Detailed vetting can lead to better reproducibility than the current status quo, but clearly a balance should be struck. Time will tell how efforts like JOSS or growing conferences like SciPy (see my favorite talks from 2017 and 2018) will shape the scientific community's view on software contributions, citations, and meaningful work.

In my experience, some people just don't like working in the open. Academia is competitive for jobs and funding, and not everyone wants to share. People seem to be getting better at publishing their software or data at the same time as their paper, but I have gotten pushback when suggesting that software and data be open from the start, as we are doing with pycalphad and ESPEI.

Conclusion

Notebooks are a promising paradigm for sharing computational science; however, there are roadblocks both large and small to overcome before they can be considered a replacement for traditional academic papers.

I want to mention that Jupyter Notebooks are a key part of my own daily workflow as a PhD student working in computational science and as a maintainer of Python packages. Sketching ideas in a notebook that I can later factor into a package is a huge productivity boost for me. I fully endorse the use of literate programming tools for doing science, but I don't think they are the right medium for sharing knowledge.

I am in complete support of the Jupyter team and highly recommend JupyterLab, which has been ready for daily use since February 2018.