Supporting the executability of R Markdown files
Abstract
R Markdown files are examples of literate programming documents that combine R code
with results and explanations. Such dynamic documents are designed to execute easily and
reproduce study results. However, little is known about how often R Markdown files
actually execute, and files that fail to run frustrate users who intend to reuse them. This
thesis aims to understand the executability of R Markdown files and to improve the
support available for making those files executable.
To this end, a large-scale study was conducted on the executability of
R Markdown files collected from GitHub repositories. The results show that a
majority of these files (64.95%) are not executable, even after our best
efforts. To better understand the challenges, the exceptions encountered while executing
the files are grouped into categories, and a classifier is developed to predict
which R Markdown files are likely to be executable. Search engines could use such a
classifier in their ranking to help developers find literate programming documents as
learning resources. To support the executability of R Markdown files, a command-line tool
is developed. The tool finds issues in an R Markdown file that prevent it
from executing. Given an R Markdown file as input, the tool generates a list
of findings that helps developers identify the areas that need attention to ensure the
executability of the file. The tool not only uses static analysis of source code but also uses
a carefully crafted knowledge base of package dependencies to generate version constraints
for the packages involved and a Satisfiability Modulo Theories (SMT) solver, Z3, to identify
compatible versions of those packages. The findings from this research can help developers
reuse R Markdown files more easily, thus improving their productivity. [...]
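
As an illustration only, the two analysis steps described above can be sketched in Python. This is not the tool developed in the thesis: the function names, the regular expressions, and the package names pkgA and pkgB are assumptions made for this example, and a plain exhaustive search stands in for the Z3 solver so that the sketch needs nothing beyond the standard library.

```python
import re
from itertools import product

# Naive pattern for an R chunk: a fence line such as {r}, {r setup} or
# {r, echo=FALSE}, the chunk body, and a closing fence line.
CHUNK_RE = re.compile(
    r"^`{3}\{r\b[^}]*\}[ \t]*\n(.*?)^`{3}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)
# library(pkg) or require(pkg), with or without quotes around the name.
LOAD_RE = re.compile(
    r"\b(?:library|require)\(\s*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?\s*[,)]"
)
# Namespace-qualified calls such as pkg::fun or pkg:::fun.
NS_RE = re.compile(r"\b([A-Za-z][A-Za-z0-9.]*):::?[A-Za-z.]")


def r_chunks(rmd_text):
    """Return the bodies of all R code chunks found in an R Markdown text."""
    return [m.group(1) for m in CHUNK_RE.finditer(rmd_text)]


def referenced_packages(rmd_text):
    """Collect package names that the R chunks load or call by namespace."""
    found = set()
    for chunk in r_chunks(rmd_text):
        # Drop comments. Simplification: a '#' inside a string literal
        # is also treated as the start of a comment.
        code = "\n".join(line.split("#", 1)[0] for line in chunk.splitlines())
        found.update(LOAD_RE.findall(code))
        found.update(NS_RE.findall(code))
    return sorted(found)


def compatible_versions(candidates, constraints):
    """Exhaustive-search stand-in for the SMT step (this is not Z3).

    candidates:  dict mapping a package name to a list of version tuples.
    constraints: list of predicates over a {package: version} assignment.
    Returns the first assignment satisfying every predicate, or None.
    """
    names = sorted(candidates)
    for combo in product(*(candidates[name] for name in names)):
        pick = dict(zip(names, combo))
        if all(check(pick) for check in constraints):
            return pick
    return None


if __name__ == "__main__":
    fence = "`" * 3
    sample = "\n".join([
        fence + "{r setup}",
        "library(ggplot2)",
        'x <- stringr::str_trim(" a ")',
        fence,
    ])
    print(referenced_packages(sample))

    # Hypothetical packages and version rules, invented for this example.
    candidates = {"pkgA": [(2, 0, 0), (1, 0, 0)], "pkgB": [(1, 5, 0), (1, 0, 0)]}
    constraints = [
        # pkgA 2.x needs pkgB >= 1.5.0
        lambda p: p["pkgA"] < (2, 0, 0) or p["pkgB"] >= (1, 5, 0),
        # something else in the document needs pkgB < 1.5.0
        lambda p: p["pkgB"] < (1, 5, 0),
    ]
    print(compatible_versions(candidates, constraints))
```

An SMT solver is the better fit once the number of packages and candidate versions grows, because the exhaustive search above visits every combination in the worst case.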