Friday, October 15, 2010

Under-appreciated REPT() - Quick Lightweight Excel Data Analysis

One of the most frequent tasks I face is to start forming a point of view on 50-1000 rows of information.  Common data sets are a large list of projects, a list of sales figures, an interesting bit of log file data, my own list of credit card transactions, or pricing patterns in a large corporation.  

I almost always use a variation of REPT() to get started.  I haven't seen anyone else use this method in the corporate world, oddly.  The common solution I see many people use is to sort the data, scan it visually, attempt to coalesce the data, then form an Excel chart around it.  Excel charts are difficult to place inline with other data, though, and Excel charts only show about 200 different bars before they become unwieldy.  

REPT("|",100) will draw a bar 100 units tall very simply.  For me, this solves 80% of my charting needs when I'm looking at new data.

Below is a bar chart of the 700 major IT projects the Federal Government has pursued in 2010, from data dot gov.
It becomes straightforward to see the relative size of 700 projects.  I simply added the portions in green to the raw data set in about 5 minutes, then sorted.  The full descriptions are available in the xls version of this data below, and they contain some interesting descriptions of government initiatives.






I am an enthusiastic reader of Edward Tufte's work.  He formed the idea of a sparkline -- intense, simple, word-sized graphics.  
A Frenchman who was writing macros around the sparkline first turned me on to the REPT() function in Excel as a way to visualize data in a bar chart fashion.  I've lost that link, but now Juice Analytics has done a very complete piece on the idea.

Excel 2007, 2010 now have a sparklines like concept, called "Data Bars" in Excel 2007.  For me, I've found that plain old REPT() is still the quickest, most flexible way to move my analysis forward.

No comments:

Post a Comment