Google Summer of Code and Pander - Part II

19 Oct 2015

This post is the second part of the review of my participation in Google Summer of Code with the R project, working on the pander package. It covers GSoC 2015; the post about GSoC 2014 is available here.

My plan for GSoC 2015

During GSoC 2015 I worked on many things, including bug fixes, documentation, and testing. The main focus areas I stated in my proposal for the summer were:

My full application with the detailed proposal is available here.

Adding new S3 methods

As in GSoC 2014, I worked on extending the generic pander S3 method with support for new classes. My effort resulted in support for 20 new classes:

In general, I am very satisfied with the work on extending pander's support for new classes. Over my whole time working on pander I added more than 30 new S3 methods, bringing the total to 73 and making pander one of the most feature-rich packages among those with a similar aim, such as xtable and ascii.

Testing and code coverage

Between GSoC 2014 and GSoC 2015, Jim Hester created the covr package for measuring test coverage. Around the same time I was also investigating ways to measure R code coverage as part of my research, so I was looking at existing work, and covr presented a robust and general solution to the problem. It was quickly picked up by coveralls.io and codecov.io, so I integrated it into pander's Travis configuration as soon as it was available. The existing test suite in pander gave 54% coverage, which was a good start but didn't feel like enough: I wanted to do the refactoring, and having more robust tests in place before that seemed like a great benefit. So first I decided to extend the test suite. Most of the uncovered code was in S3 methods, which made me think about the best way to test such methods. After some evaluation, I decided to test the length of the output and some characteristics that are certain not to change even if the underlying implementation of pandoc.table changes. Main PRs:

Tests for S3 methods - PR#174 and PR#169
Tests for evals - PR#183
Tests for different uncovered parts - PR#187
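The idea of testing output shape rather than exact text can be sketched roughly as follows. This is only an illustration, not a test from pander's actual suite: it assumes the testthat and pander packages are installed, uses pander_return to capture the rendered lines, and the model and the checked substring are my own picks for the example.

```r
library(testthat)
library(pander)

# Render an lm object with its pander S3 method and capture the result.
# pander_return gives back a character vector, one element per output line.
fit <- lm(mpg ~ wt, data = mtcars)
res <- pander_return(fit)

test_that('pander.lm output keeps its stable characteristics', {
  # Something was rendered at all
  expect_true(is.character(res))
  expect_gt(length(res), 0)
  # The predictor name has to show up somewhere in the table,
  # no matter how pandoc.table lays out the cells internally
  expect_true(any(grepl('wt', res, fixed = TRUE)))
})
```

A real test would typically pin the exact number of lines as well; I left that out here because the exact count depends on the pander version and options.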

While working on that I also realized that, due to limitations in R, covr doesn't treat one-line if statements without braces correctly. From this emerged the idea of a refactoring that combines linting with adding braces, which was implemented in PR #196, with the addition of PR #215 once I figured out how to create a linter to enforce single quotes.

All those efforts brought coverage to 78%, which we considered good, since the large uncovered area in evals is hard to test. That might change once a robust package like vtest is developed for R.
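To illustrate the kind of change the brace refactoring made, here is a small sketch. The two functions are hypothetical and not taken from pander; they behave identically, and the only difference is that the second form puts the early return on its own line inside braces, which is the form that coverage could be tracked for separately.

```r
# One-line if without braces: condition and return share a single line
first_or_na_unbraced <- function(x) {
  if (length(x) == 0) return(NA)
  x[[1]]
}

# Same logic with braces: the early return sits on its own line
first_or_na_braced <- function(x) {
  if (length(x) == 0) {
    return(NA)
  }
  x[[1]]
}

# Both variants return the same values for empty and non-empty input
identical(first_or_na_unbraced(list()), first_or_na_braced(list()))
identical(first_or_na_unbraced(c(3, 4)), first_or_na_braced(c(3, 4)))
```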

Refactoring pandoc.table and evals

Over time pandoc.table and evals grew to be rather complex functions, which made it much harder to introduce any new functionality, so I welcomed the idea of refactoring them.

My first target was pandoc.table, which I refactored in PR#186. As Gergely pointed out, a lot of redundant complexity in pandoc.table comes from the fact that it supports input of different dimensions, which in turn required a lot of checks in many places. I implemented an intermediate representation in the form of a matrix in 91564e5, which in total allowed me to remove more than 50 lines from pandoc.table.return and made the function more readable. It also allowed me to fix 2 old bugs: #164 and #186.
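The benefit of an intermediate matrix representation can be sketched like this. The helper below is hypothetical and much simpler than what pandoc.table.return actually does; it only shows the idea that once every input is normalized to a character matrix, the rest of the code no longer needs per-dimension checks.

```r
# Hypothetical helper: normalize vectors, matrices and data frames
# to one common shape, a character matrix.
to_matrix_repr <- function(x) {
  if (is.null(dim(x))) {
    # A plain vector becomes a single-row matrix; names become column names
    matrix(as.character(x), nrow = 1, dimnames = list(NULL, names(x)))
  } else {
    # Matrices and data frames are converted and their cells made character
    m <- as.matrix(x)
    storage.mode(m) <- 'character'
    m
  }
}

# Downstream code can rely on nrow(), ncol() and [i, j] for every input
dim(to_matrix_repr(c(a = 1, b = 2)))
dim(to_matrix_repr(head(mtcars)))
```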

After that, I refactored evals in PR#192 and finished the implementation of logging (most of which was implemented by Gergely). Most of the refactoring in evals was rather straightforward, especially because a large portion of the code is centered around style unification, which can't be changed much.

Documentation and vignettes

My other main objective was to improve the documentation. pander has been a stable and mature package for quite some time, but it didn't have vignettes for the most common use cases, which seemed like an important improvement. Moreover, Gergely had the idea of posting the vignettes to rapporter.github.io/pander, so that we could refer to them in StackOverflow questions, for example. I have created 4 vignettes:

The best part was learning to use knitr for writing vignettes, which in my opinion is a great simplification over how it was done before. There is a nice starting guide, which I highly recommend.

After finishing the vignettes and updating the README and other parts, I had some time left before the end, so I decided to implement Gergely's idea of automatically updating gh-pages with Travis. This was quite an interesting experience, probably one of the best this summer, because a) I had no idea that was possible and b) I had no idea how to do it. I implemented it in PR#220 and even wrote a separate blog post on the topic, so I won't go into much detail here.

Some statistics and aftermath

Short summary of my work during this awesome experience:

148 commits
39 accepted pull requests
20 new S3 pander methods
+8764, -3342 changes through all commits (a lot of this is due to refactoring)
7 bugs fixed

I feel that the second year of my participation was quite different from the first one. I had more knowledge of the pander internals, which I think resulted in somewhat more productive work this summer. I still learned many new things, including using knitr, lintr, logging with futile.logger, automatically updating gh-pages, and many more smaller things. Looking back, I feel it gave me a lot of hands-on programming experience, which proves to be useful every day. Also, I want to thank Gergely Daroczi for being an awesome mentor and spending all this time reviewing my changes and helping me grow as a programmer. GSoC with Pander was definitely a remarkable experience.