Google Summer of Code and Pander - Part II
This post is the second part of the review of my participation in Google Summer of Code with the R project, working on the pander package. This review covers GSoC 2015; the post about GSoC 2014 is available here.
My plan for GSoC 2015
During GSoC 2015 I worked on many things, including bug fixes, documentation and testing. The main focus areas that I stated in my proposal for the summer were:
- extend generic S3 method with new classes
- extend test-suite and introduce unified coding style
- refactor pandoc.table and evals
- improve documentation and create topic specific vignettes
My full application with the detailed proposal is available here.
Adding new S3 methods
As during GSoC 2014, I worked on extending the generic S3 method with newly supported classes. My effort resulted in adding support for 20 new classes:
- pander.tabular
- pander.summary.table
- pander.randomForest
- pander.gtable
- pander.irts
- pander.nls/summary.nls
- pander.manova/summary.manova
- pander.arima
- pander.ols/rms/Glm/cph/lrm/summary.rms
- pander.polr/summary.polr
- pander.survreg/summary.survreg
In general, I am very satisfied with the work on extending pander's support for new classes: overall, while working on pander, I added 30+ new S3 methods, bringing the total number to 73 and making pander one of the most functionality-rich packages among those with a similar aim, like xtable, ascii, etc.
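For readers less familiar with S3, here is a minimal sketch of what adding a method for a new class looks like. The generic to_markdown and the class fit_summary are hypothetical names invented for this illustration; they are not part of pander, and the real methods are considerably more involved.

```r
# Hypothetical generic, for illustration only (not pander's API).
to_markdown <- function(x, ...) {
  UseMethod('to_markdown')
}

# Fallback used for any class without a dedicated method.
to_markdown.default <- function(x, ...) {
  paste(format(x), collapse = ', ')
}

# Method for a made-up 'fit_summary' class: one table row per coefficient.
to_markdown.fit_summary <- function(x, ...) {
  header <- c('| term | estimate |', '|---|---|')
  rows <- sprintf('| %s | %.2f |', names(x$coef), x$coef)
  c(header, rows)
}

fit <- structure(list(coef = c(a = 1.5, b = -0.25)), class = 'fit_summary')
out <- to_markdown(fit)
cat(out, sep = '\n')
```

Supporting another class then only means writing one more function named after the generic and the class; the dispatch itself is handled by UseMethod.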
Testing and code coverage
Between GSoC 2014 and GSoC 2015, Jim Hester created the covr package for measuring test coverage. Around the same time I was also investigating possibilities to measure R code coverage as part of my research, so I was searching for existing work, and covr presented a robust and general solution to the problem. It was quickly picked up by coveralls.io and codecov.io, so I integrated it into pander's Travis configuration as soon as it was available. The existing test suite in pander gave 54% coverage, which was a good start but didn't feel like enough: I wanted to do the refactoring, and having more robust testing in place before that seemed like a great benefit. So first I decided to extend the test suite. Most of the uncovered code was in S3 methods, which made me think about the best way to test such methods. After some evaluation, I decided to test the length of the output and some characteristics certain not to change even if the underlying implementation of pandoc.table changes. Main PRs:
- Tests for S3 methods - PR#174 and PR#169
- Tests for evals - PR#183
- Tests for different uncovered parts - PR#187
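The idea of testing output length and stable characteristics, rather than exact strings, can be sketched as follows. render_table is a hypothetical stand-in written for this example; the actual tests in the PRs above target pander's own methods and are not reproduced here.

```r
# Hypothetical stand-in for a table renderer (not pander's pandoc.table).
render_table <- function(df) {
  header <- paste(names(df), collapse = ' | ')
  rule <- paste(rep('---', ncol(df)), collapse = ' | ')
  body <- apply(df, 1, function(row) paste(trimws(row), collapse = ' | '))
  c(header, rule, unname(body))
}

df <- data.frame(x = 1:3, y = c('a', 'b', 'c'))
out <- render_table(df)

# Structural checks that should survive cosmetic changes to the renderer:
# one header line, one rule line and one line per data row ...
stopifnot(length(out) == nrow(df) + 2)
# ... and every column name appears somewhere in the header line.
stopifnot(all(sapply(names(df), function(n) grepl(n, out[1], fixed = TRUE))))
```

Checks like these keep passing when padding or alignment changes, which is what makes them usable as a safety net during refactoring.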
While working on that I also realized that, due to limitations in R, covr won't treat one-line if statements without braces correctly. From this emerged the idea of a refactoring that combines linting with adding braces, which was implemented in PR #196, with the addition of PR #215 once I figured out how to create a linter to enforce single quotes.
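As a rough illustration of the style change, the two functions below behave identically and differ only in whether the if body is wrapped in braces. The function names are made up for this example, and the exact rules enforced in the PRs are not reproduced here.

```r
# Before: one-line if without braces.
sign_label_old <- function(x) {
  if (x < 0) return('negative')
  'non-negative'
}

# After: braces around the body, single-quoted strings throughout.
sign_label <- function(x) {
  if (x < 0) {
    return('negative')
  }
  'non-negative'
}

print(sign_label(-1))
print(sign_label(2))
```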
All those efforts brought coverage to 78%, which we considered good, since the large uncovered area in evals is hard to test; however, that might change once a robust package like vtest is developed for R.
Refactoring pandoc.table and evals
Over time, pandoc.table and evals grew to be rather complex functions, which made it much harder to introduce any new functionality, so I liked the idea of refactoring them.
My first target was pandoc.table, which I refactored in PR#186. As Gergely pointed out, a lot of redundant complexity in pandoc.table comes from the fact that it supports input of different dimensions, which in turn required a lot of checks in many places. I implemented an intermediate representation in the form of a matrix in 91564e5, which in total allowed removing more than 50 lines from pandoc.table.return and made the function more readable. That also allowed me to fix 2 old bugs: #164 and #186.
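To show why a single intermediate representation removes so many checks, here is a minimal sketch under my own assumptions. as_table_matrix is a hypothetical helper written for this post, not the code from the commit, and it only handles vectors, matrices and data frames.

```r
# Hypothetical helper: coerce supported inputs to a matrix up front.
as_table_matrix <- function(t) {
  if (is.null(dim(t))) {
    # Vectors become a one-row matrix; names (if any) become column names.
    m <- matrix(t, nrow = 1, dimnames = list(NULL, names(t)))
  } else if (length(dim(t)) == 2) {
    m <- as.matrix(t)
  } else {
    stop('only inputs with at most two dimensions are handled in this sketch')
  }
  m
}

inputs <- list(c(a = 1, b = 2), matrix(1:4, 2), data.frame(x = 1:2, y = 3:4))
dims <- lapply(inputs, function(t) dim(as_table_matrix(t)))
print(dims)
```

After this step the rest of the code can rely on nrow, ncol and colnames without asking what kind of object it was originally given.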
After that, I refactored evals in PR#192 and finished the implementation of logging (most of which was implemented by Gergely). Most of the refactoring in evals was rather straightforward, especially because a large portion of the code is centered around style unification, which can't be changed much.
Documentation and vignettes
My other main objective was to improve the documentation. pander has been a stable and mature package for quite some time, but it didn't have vignettes for the most common use cases, so adding them seemed an important improvement. Moreover, Gergely had the idea to post the vignettes to rapporter.github.io/pander, to be able to refer to them in StackOverflow questions, for example. I have created 4 vignettes:
- Using pander with knitr
- Rendering markdown with pander
- Rendering tables with pandoc.table
- Capturing evaluation information with evals
The best part was learning to use knitr for writing vignettes, which in my opinion is a great simplification compared to how it was done before. There is a nice starting guide, which I highly recommend.
After finishing the vignettes and updating the README and other parts, I had some time left before the end, so I decided to implement Gergely's idea of automatically updating gh-pages with Travis. This was quite an interesting experience, probably one of the best this summer, because a) I had no idea that was possible and b) I had no idea how to do that. I implemented that in PR#220 and even wrote a separate blog post on the topic, so I won't go into much detail here.
Some statistics and aftermath
Short summary of my work during this awesome experience:
- 148 commits
- 39 accepted pull requests
- 20 new S3 pander methods
- +8764, -3342 changes through all commits (a lot of this is due to refactoring)
- 7 bugs fixed
I feel that the second year of my participation was quite different from the first one. I had more knowledge of the pander internals, which I think resulted in somewhat more productive work this summer. I still learned many new things, including using knitr, lintr, logging with futile.logger, automatically updating gh-pages and many more smaller things. Looking back, I feel it gave me a lot of hands-on programming experience, which proves to be useful every day. Also, I want to thank Gergely Daroczi for being an awesome mentor and spending all this time reviewing my changes and helping me to grow as a programmer. GSoC with Pander was definitely a remarkable experience.