When I started experimenting with graphing changes in topic-proportions over time, I didn’t pay much attention to the design of the graph. I could see that it was far too busy, but I assumed that this would be relatively easy to adjust using ggplot2’s many parameters.
It wasn’t. It didn’t take me too long to figure out that I needed to change the data from discrete to continuous in order to see anything like a sparkline, but it was also apparent from the other data sets I was working with that taking the mean at intervals was the only way to make a reasonably clean graph. I ended up using the aggregate function to create the n-year averages, though I read some intriguing descriptions of the power of the data.table package in R. (I refuse to ask for help on Stack Overflow, even though it would have saved many hours’ worth of work. Character flaw.)
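For anyone curious what the n-year averaging looks like, here is a minimal sketch using base R's aggregate. It is not the code I actually ran: the data frame, its column names (year, topic, proportion), the random values, and the 5-year window are all made up for illustration.

```r
# Hypothetical data: one row per (year, topic) with a topic proportion.
set.seed(1)
years <- 1900:1929
df <- data.frame(
  year = rep(years, times = 2),
  topic = rep(c("topic_1", "topic_2"), each = length(years)),
  proportion = runif(2 * length(years))
)

n <- 5  # width of the averaging window in years (an assumption)

# Label each row with the first year of the n-year bin it falls into.
df$period <- (df$year %/% n) * n

# Mean proportion for each topic within each bin.
avg <- aggregate(proportion ~ period + topic, data = df, FUN = mean)
```

The result has one row per topic per n-year bin, which is far fewer points to draw. Something like ggplot(avg, aes(period, proportion)) + geom_line() + facet_wrap(~ topic) would then be one way to plot it.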
I now need to learn how to use the reshape package, with its wonderfully named ‘melt’ and ‘cast’ functions, to rewrite the code I’m using to change rows to columns. A simple for-loop iteration over a data frame in R can take hours, I’ve learned, and I expect that this other solution would finish the job in seconds.
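As a rough sketch of what the melt and cast approach might look like (I have not yet rewritten my own code this way), the example below assumes the reshape package is installed and uses a tiny invented table. The names wide, long, and by_topic, and the values, are hypothetical.

```r
library(reshape)  # assumes the reshape package is installed

# Hypothetical wide table: one row per year, one column per topic.
wide <- data.frame(
  year = c(1900, 1901, 1902),
  topic_1 = c(0.10, 0.20, 0.30),
  topic_2 = c(0.40, 0.50, 0.60)
)

# melt: stack the topic columns into long form (year, variable, value).
long <- melt(wide, id.vars = "year")

# cast: spread the long data back out, this time with one row per topic
# and one column per year, so rows and columns have swapped places.
by_topic <- cast(long, variable ~ year)
```

Both calls work on whole columns at once rather than visiting the data frame row by row, which is why I expect them to be much faster than my loop.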
The full list of topics can be found in my previous post.