CFSpreadsheet Bug. Or is it? You decide…
During the CF9 Beta process I was among several people who submitted bugs about the headerRow attribute of the new CFSpreadsheet tag, and how it might not be working as well as it could:
When specifying the headerrow attribute in [CFSpreadsheet], the column names are retrieved from the header row, but they are also included in the first row of the query. The expected behavior is that they should not be included in the query at all, only as the column names. [#72740]
I was contacted by an Adobe engineer about my bug report, and after some back and forth essentially I was told that this isn't a bug. Before we get any further, let me make sure I've explained this perfectly clearly. Let's start with a spreadsheet with some data in it, and column headings:

Now, let's read the contents of that spreadsheet into a query, setting the column names based on that first row of data:
<cfspreadsheet
action="read"
headerRow="1"
query="result"
src="#expandpath('./test.xls')#"
/>
<cfdump var="#result#" />
And here's what gets dumped:

It might be a little subtle, so if you didn't immediately notice, let me point it out: The column headers remain in the data. I would expect that the use of the headerRow attribute would "eat" the headers from the data. I can't think of a logical reason they should remain in…
I verified that this behavior is the same for both Excel 97 (xls) and OOXML (xlsx) file formats.
Do you agree that the header rows should be "eaten"? Disagree? I'd like to know.
Despite the public bug tracker, Adobe is strangely quiet on this issue. I've had some correspondence with one of the engineers, who told me it wasn't a bug (my own bug was closed and marked as a duplicate, but #72740 remains open, marked as not a duplicate, verified (usually meaning reproduced), and somehow, not reproduced…).
In my correspondence with Adobe, I was told that to get the desired behavior I should do the following:
<cfspreadsheet
action="read"
headerRow="1"
query="result"
src="#expandpath('./test.xls')#"
rows="2-65536"
/>
The only difference between this and my original code is the addition of the rows attribute; so it's not that much extra work. What it's doing is specifying to skip the header row and return all rows of data up to the theoretical maximum (the number is the Excel 2003 row limit, (2^16) and Adobe confirmed that this is the limit for ColdFusion as well).
This specifies that I want to skip the header row, does in fact get me the results I want, and in general isn't terribly difficult. On the other hand, I've basically been asked to commit the number 65536 to memory, or to put it on a post-it on the side of my monitor, so that I can refer back to it later. So for that reason, I don't consider it an acceptable work-around.
To me, an acceptable work-around might be implementing one of these syntaxes: rows="2+" or rows="2-" or rows="2..". For what it's worth, the documentation for the rows attribute currently reads:
The range of rows to read. Specify a single number, a hypen-separated row range, a comma-separated list, or any combination of these; for example: 1,3-6,9.
Still, a work-around would be a black eye in this case, in my humble opinion. What reason is there to leave the header values in the data set? Can anyone out there give a practical example for wanting them to stay? Until I see one, I will continue to believe that they should be removed; and while I would begrudgingly accept a work-around, I'd really much rather see the issue addressed because I want to see ColdFusion become the best platform & language it can be.
If you agree with me, I would encourage you to leave a comment on the open bug indicating so, and vote for it. If enough people agree with me, perhaps Adobe will answer the call.
Posted in ColdFusion | 8 Responses
So, while I'm not sure it's a "bug" it's not the behavior I'd expect and it doesn't make sense to me to include it--especially since I think it would be a corner case that people really want the data added to their query object.
I like that option. As much as I agree that the header row shouldn't be included, I would hate for a fix to break existing code that already worked around that (or took advantage of it).
I'm glad it's being fixed. :)