Squish It!
Something that comes up often in code reviews at my office is that you should combine and minify JavaScript and CSS, both to reduce the number of HTTP requests needed to render the page and to minimize the amount of data returned for those requests. Usually this comes with a reminder that the YSlow and PageSpeed extensions for Firefox will flag this, along with other things that will improve your page load times.
Knowing is only half of the battle, though. The other half is doing; and sometimes people think that the effort involved in creating and maintaining these combined and minified resources is a hassle, and not without reason. It definitely can be.
There are tools available that make the process fairly easy, but none that I have found that make it transparent and dynamic, so I set out to do just that.
While writing this post, I saw Ray's post about his exploration into the topic, and that he mentioned the Combine project. I'm a big fan of not reinventing the wheel, so I had been building on that project in the weeks before I started writing this post. I don't know how long Ray was working on his code before posting, but I was trying different iterations of mine in between my other work off and on for a week or two, before totally scrapping my original approach and going with something simpler but easier to control. I'm not implying that Ray stole my idea (or did he?!)... I just found it amusing that we both have been working on the same ideas at around the same time.
This idea is also a little different from the Squeezer ColdFusion Builder extension created by Guust Nieuwenhuis (the first entry in Ray's CFB Extension Contest). The code I will present below runs on every request and will dynamically combine and compress files for you. Squeezer requires up-front work of combining and compressing JS/CSS, and then you're required to maintain those optimized files as the originals change as well. While my solution does require a little extra code to run on each request, it is less maintenance work for developers, and also has several aggressive caching mechanisms. (I can't take any credit for those features -- they come by virtue of projects I'm making use of.)
The combined and compressed files are written to disk so that they only need to be re-combined and re-compressed when one of the originals changes; and ETags are supported, so that if the cached file hasn't changed since the last time the user requested it, it won't be sent again. The only shortcoming I see with this method is that it uses the YUI Compressor for JS and CSS instead of Google's Closure Compiler. The difference in output size is small but not quite negligible; for now, though, this is a significant improvement over including lots of disparate files, so I'm pleased. I may try to get Closure Compiler added to the Combine project in the future.
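To make the ETag idea concrete, here is a minimal sketch of what such a check can look like in CFML. This is not Combine's actual code: the cache file path is hypothetical, and deriving the ETag from the file's path, size, and modification date is just one reasonable assumption.

```cfml
<!--- hypothetical location of a combined file that was already written to disk --->
<cfset variables.cachedFile = expandPath("/combine/cache/combined.js") />
<cfset variables.info = getFileInfo(variables.cachedFile) />

<!--- assumption: derive the ETag from path, size and last-modified date --->
<cfset variables.etag = '"' & hash(variables.cachedFile & variables.info.size & variables.info.lastmodified) & '"' />

<!--- if the browser already holds this exact version, answer 304 and send no body --->
<cfset variables.reqHeaders = getHttpRequestData().headers />
<cfif structKeyExists(variables.reqHeaders, "If-None-Match")
      and compare(variables.reqHeaders["If-None-Match"], variables.etag) eq 0>
    <cfheader statuscode="304" statustext="Not Modified" />
    <cfabort />
</cfif>

<!--- otherwise send the file along with its ETag for next time --->
<cfheader name="ETag" value="#variables.etag#" />
<cfcontent type="text/javascript" file="#variables.cachedFile#" />
```

When one of the original files changes, the combined file is rewritten, its modification date changes, and so does the ETag, which forces a fresh download.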
My original goal was simple: Use Combine in a way that is completely invisible to the developer to modify the page buffer after it's been assembled by ColdFusion, but before it is returned to the user. (Ironically, that was also what the first comment to Ray's post was about.) I wrote some code that did this, and it worked well, but in the end I decided that there were just too many caveats to make it useful to anyone else in that form -- so I would have to rewrite it. First, I'll show some pseudocode for how I accomplished this, and then I'll talk about the potential pitfalls with this method.
I used Application.cfc's onRequest event handler to capture and modify the page buffer:
onRequest(string thePage){
    buffer = saveContent(include(thePage));
    if (application.squish){
        buffer = squishJsCss(buffer);
    }
    output(buffer);
}
Remember, this is just pseudocode...
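For anyone who wants something closer to real code, the pseudocode above could be sketched in an Application.cfc roughly like this. It is a reconstruction under assumptions, not the code from my experiment: the application name is made up, and squishJsCss() is only a placeholder that returns the buffer unchanged.

```cfml
<!--- Application.cfc: a minimal sketch of the buffer-capturing approach --->
<cfcomponent output="false">
    <cfset this.name = "squishSketch" />

    <cffunction name="onApplicationStart" returnType="boolean" output="false">
        <!--- the on/off switch that onRequest checks --->
        <cfset application.squish = true />
        <cfreturn true />
    </cffunction>

    <cffunction name="onRequest" returnType="void" output="true">
        <cfargument name="targetPage" type="string" required="true" />
        <cfset var buffer = "" />
        <!--- run the requested template, capturing its output instead of sending it --->
        <cfsavecontent variable="buffer"><cfinclude template="#arguments.targetPage#" /></cfsavecontent>
        <cfif application.squish>
            <cfset buffer = squishJsCss(buffer) />
        </cfif>
        <!--- send the (possibly rewritten) page to the browser --->
        <cfoutput>#buffer#</cfoutput>
    </cffunction>

    <cffunction name="squishJsCss" access="private" returnType="string" output="false">
        <cfargument name="buffer" type="string" required="true" />
        <!--- placeholder: the tag matching and rewriting would happen here --->
        <cfreturn arguments.buffer />
    </cffunction>
</cfcomponent>
```

Because the whole page is rendered inside cfsavecontent, any template that calls cfflush will throw an exception under this setup.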
The squishJsCss method went something like this:
squishJsCss(string buffer){
    //find all instances of javascript and css
    matches.css = regexMatch('<link href="...">', buffer);
    matches.js = regexMatch('<script ...></script>', buffer);
    //strip out existing tags, since we'll be replacing them
    for each match in matches.css {
        replace match with '';
    }
    for each match in matches.js {
        replace match with '';
    }
    //just before the </head> tag, insert our combined CSS
    buffer = regexReplace('</head' with '<link href="/combine/?files=..." /></head');
    //just before the </body> tag, insert our combined JS
    buffer = regexReplace('</body' with '<script src="/combine/?files=..."></script></body');
    //hand the modified buffer back to onRequest
    return buffer;
}
I had accounted for different media-type possibilities for CSS (resulting in one combined request per media type), and excluded externally-hosted files (à la Google-hosted jQuery) completely. I tell you, it was a thing of beauty. It worked exactly as I had intended, and was reasonably quick in my tests. And, just as I had planned, it was totally transparent. I didn't have to make any changes to my HTML at all; it just worked.
When I started to show it to some of my peers, though, they asked some very smart questions. What if I don't want my files combined/compressed for some reason? What about files included inside IE's conditional comments? What happens if my page has a <cfflush/>? Since I'm capturing and manipulating the entire page buffer as a string, how does it perform on larger pages? (My test was a very lightweight page.) What if I need a certain script to be loaded in the head tag instead of in the footer area for functional reasons?
To answer the first and last questions together, I went with an opt-in approach. I modified my regular expressions to only match <script squish='true'></script> and <link squish='true'/>. That worked well enough, and allowed you to put scripts wherever you wanted by not opting in to compression. Then I started thinking about the other questions. Dealing with scripts and CSS included inside conditional comments would not be easy. That would turn my relatively manageable regular expressions into some hairy beasts, for sure. And the <cfflush/> question? That was a deal breaker. You can't do a flush inside a <cfsavecontent> -- it throws an exception.
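As an illustration of the opt-in matching, here is a minimal sketch for the script half only. The function name squishOptInScripts and the regular expressions are mine, written for this example rather than copied from the code I scrapped, and the /combine/ URL follows the pseudocode above. It does nothing about conditional comments.

```cfml
<cfscript>
// Hypothetical helper: finds script tags marked squish="true", strips them,
// and adds one combined script tag just before </body>.
function squishOptInScripts(buffer) {
    var pattern = "<script[^>]*\ssquish=['""]true['""][^>]*>\s*</script>";
    var tags = reMatchNoCase(pattern, arguments.buffer);
    var files = arrayNew(1);
    var found = "";
    var i = 0;
    for (i = 1; i lte arrayLen(tags); i = i + 1) {
        // pull the src attribute value out of the matched tag
        found = reFindNoCase("\ssrc=['""]([^'""]+)['""]", tags[i], 1, true);
        if (found.pos[1] gt 0) {
            arrayAppend(files, mid(tags[i], found.pos[2], found.len[2]));
        }
        // remove the original tag from the page
        arguments.buffer = replace(arguments.buffer, tags[i], "", "all");
    }
    // nothing opted in: return the page untouched
    if (arrayLen(files) eq 0) {
        return arguments.buffer;
    }
    return replaceNoCase(arguments.buffer, "</body",
        '<script type="text/javascript" src="/combine/?files=' & arrayToList(files) & '"></script></body',
        "one");
}
</cfscript>
```

Script tags without the squish attribute are never matched, so they stay exactly where the developer put them.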
So I scrapped the whole thing. As cool as it would be, there are just too many problems that could come up, and trying to explain them to someone would make the project sound like more hassle than it was worth. The invisible approach is possible, but I wouldn't recommend it for a team, only for something you're working on yourself.
My backup plan, though, was to do something with custom tags. They are much more explicit and easily grokked. Output will be in the same place in the code that I, the developer, put the custom tags; and the custom tags look a lot like my existing <script> and <link> tags. This is what I ended up with:
<cf_squish>
    <cf_squishFile href="assets/main1.css" media="screen" />
    <cf_squishFile href="assets/main2.css" media="screen" />
    <cf_squishFile href="assets/main3.css" media="screen" />
    <cf_squishFile src="assets/print.css" media="print" />
</cf_squish>
<cf_squish>
    <cf_squishFile src="assets/app.js" />
    <cf_squishFile src="assets/global.js" />
</cf_squish>
The above code results in the following output:
<link rel="stylesheet" type="text/css" href="/combine/?files=/myapp/assets/main1.css,/myapp/assets/main2.css,/myapp/assets/main3.css" media="screen" />
<link rel="stylesheet" type="text/css" href="/myapp/assets/print.css" media="print" />
<script type="text/javascript" src="/combine/?files=/myapp/assets/app.js,/myapp/assets/global.js"></script>
Although it doesn't have the coolness factor that the invisible solution had, it's much more explicit, it gives the developer greater control of positioning and order, and it doesn't require potentially running multiple regular expressions on a huge string buffer several times. On top of that, the average developer can look at your code and understand what's going on -- unlike with the invisible approach, where your code contains one thing and the output is completely different for no apparent reason. So a little part of my heart was broken, but hey, this is still pretty cool stuff we're doing here.
There are still some caveats, but these are much more manageable.
- Your CSS can't use relative paths in any URLs, like for images or font embedding; they all need to be absolute.
- Relative paths (to the files, not in the files) will be converted to absolute paths using the conversion function I previously wrote about.
- You also need to be aware of any rules being overridden in your CSS. I believe that the order is properly maintained in all cases, but I haven't thoroughly vetted this yet. If you do notice something out of order, please let me know.
And with that, here are the two custom tags:
squish.cfm:
<cfsetting enablecfoutputonly="true" />
<!--- this tag only does processing in the end-state --->
<cfif thisTag.executionMode eq "end">
    <!--- default attribute values --->
    <cfparam name="attributes.media" default="" />
    <cfparam name="attributes.reload" default="false" />
    <cfif attributes.reload>
        <cfset variables.reinit = "&reinit=true" />
    <cfelse>
        <cfset variables.reinit = "" />
    </cfif>
    <!--- all data from child tags is stored in THISTAG.ASSETS (undefined when there are no child tags) --->
    <cfif not structKeyExists(thisTag, "assets") or arrayIsEmpty(thisTag.assets)>
        <!--- balance the cfsetting at the top before bailing out, or output stays suppressed --->
        <cfsetting enablecfoutputonly="false" />
        <cfexit method="exittag" />
    </cfif>
    <cfset variables.resources = StructNew() />
    <cfset variables.resources.css = StructNew() />
    <cfset variables.resources.js = arrayNew(1) />
    <!--- process (and remove) the first asset until none are left --->
    <cfloop condition="not arrayIsEmpty(thisTag.assets)">
        <!--- ignore external files: output them as-is, using the tag that matches the file type --->
        <cfif lcase(left(thistag.assets[1].src, 4)) eq "http">
            <cfif lcase(right(thistag.assets[1].src, 3)) eq ".js">
                <cfoutput><script type="text/javascript" src="#thistag.assets[1].src#"></script>#chr(13) & chr(10)#</cfoutput>
            <cfelse>
                <cfoutput><link rel="stylesheet" type="text/css" href="#thistag.assets[1].src#" <cfif thistag.assets[1].media neq "">media="#thistag.assets[1].media#" </cfif>/>#chr(13) & chr(10)#</cfoutput>
            </cfif>
            <cfset arrayDeleteAt(thistag.assets, 1) />
            <cfscript>
                continue;//skip the rest of the loop, just jump back up to the top
            </cfscript>
        </cfif>
        <!--- split up input files into separate arrays for CSS and JS --->
        <cfswitch expression="#right(thistag.assets[1].src, 3)#">
            <cfcase value="css">
                <cfif len(trim(thistag.assets[1].media)) eq 0>
                    <cfset thistag.assets[1].media = "default" />
                </cfif>
                <cfset thistag.assets[1].media = listsort(thistag.assets[1].media, "textnocase") />
                <!--- one array of files per media type --->
                <cfif not structKeyExists(variables.resources.css, thistag.assets[1].media)>
                    <cfset variables.resources.css[thistag.assets[1].media] = arrayNew(1) />
                </cfif>
                <cfset arrayAppend(variables.resources.css[thistag.assets[1].media], thistag.assets[1].src) />
            </cfcase>
            <cfcase value=".js">
                <cfset arrayAppend(variables.resources.js, thistag.assets[1].src) />
            </cfcase>
        </cfswitch>
        <cfset arrayDeleteAt(thistag.assets, 1) />
    </cfloop>
    <cfif not structIsEmpty(variables.resources.css)>
        <cfloop list="#structKeyList(variables.resources.css,'|')#" index="variables.key" delimiters="|">
            <cfoutput><link rel="stylesheet" type="text/css" href="/squish/?files=#arrayToList(variables.resources.css[variables.key])##variables.reinit#" <cfif variables.key neq "default">media="#variables.key#" </cfif>/>#chr(13) & chr(10)#</cfoutput>
        </cfloop>
    </cfif>
    <cfif not arrayIsEmpty(variables.resources.js)>
        <cfoutput><script type="text/javascript" src="/squish/?files=#arrayToList(variables.resources.js)##variables.reinit#"></script>#chr(13) & chr(10)#</cfoutput>
    </cfif>
</cfif>
<cfsetting enablecfoutputonly="false" />
squishFile.cfm:
<cfsilent>
    <!--- don't run a 2nd time if an endtag (or self-closing tag) is used --->
    <cfif thisTag.executionMode eq "end">
        <cfexit method="exittag" />
    </cfif>
    <!--- associate this child tag with its parent tag; rename the default attributes collection --->
    <cfassociate basetag="cf_squish" datacollection="assets" />
    <!--- set default value for input parameter(s) --->
    <cfparam name="attributes.href" default="" />
    <cfparam name="attributes.src" default="#attributes.href#" />
    <cfparam name="attributes.media" default="" />
    <!--- convert any relative paths to absolute, if they have been passed in --->
    <cfset attributes.src = convertRelPathToAbs(attributes.src) />
    <cffunction name="convertRelPathToAbs" access="private" output="false" returnType="string">
        <cfargument name="pathIn" type="string" required="true" hint="the path as it is passed to the custom tag. may be relative or absolute" />
        <cfscript>
            var local = structNew();
            //if the path is already absolute, just pass-through
            if (left(arguments.pathIn, 1) eq '/'){
                return arguments.pathIn;
            }
            //if the path is external, just pass-through
            if (lcase(left(arguments.pathIn, 4)) eq "http"){
                return arguments.pathIn;
            }
            /**************************************************/
            /**** else... the path is relative, convert it ****/
            /**************************************************/
            //get the absolute path of the current request to base relative paths from
            local.result = getDirectoryFromPath(CGI.SCRIPT_NAME);
            while(true){
                //listFirst returns an empty string (instead of throwing) once nothing is left
                local.listTop = listFirst(arguments.pathIn, '/');
                if (local.listTop eq '.'){
                    //current directory, just ignore
                }else if (local.listTop eq '..'){
                    //up 1 level in the tree, if possible
                    if (listLen(local.result, '/') gt 0){
                        local.result = listDeleteAt(local.result, listLen(local.result, '/'), '/');
                    }
                }else{
                    break;
                }
                arguments.pathIn = listDeleteAt(arguments.pathIn, 1, '/');
            }
            if (left(arguments.pathIn, 1) neq "/"){
                arguments.pathIn = "/" & arguments.pathIn;
            }
            if (len(local.result) gt 0 and right(local.result, 1) eq '/'){
                local.result = left(local.result, len(local.result) - 1);
            }
            return local.result & arguments.pathIn;
        </cfscript>
    </cffunction>
</cfsilent>
Posted in ColdFusion | HTML/CSS | JavaScript | 4 Responses
The web server is more efficient than ColdFusion when it comes to delivering static resources. My approach was to configure the web server to cache all JS/CSS files in a specific shared static resource directory for 30 days. I wrote a script that recurses the static sub-directories and creates a server-scoped struct keyed by hashed filename, where each value holds only the file's modification date. A custom UDF checks that this struct exists, retrieves the file's timestamp, and adds it to the filename so that the file can be cached. (This also makes it possible to refresh the cached file whenever it is updated, since the timestamp changes.) IIRF/mod_rewrite rules are then used to remove the timestamp from the static resource URL.
This is what the source code looks like:
src="#cachedURL('/_scripts/jquery.min.js')#"
This is what the output looks like:
src="/_scripts/jquery.min_20100219152229.js"
IIRF/Mod-Rewrite takes this URL:
/_scripts/jquery.min_20100219152229.js
and rewrites it to this URL:
/_scripts/jquery.min.js
Since the static files are cached for a longer period of time, the browser doesn't even bother re-requesting the file or checking ETags unless the file is not in the browser cache or its timestamp has changed, forcing a fresh download.
The most important thing is how quickly the page is "perceived" to have loaded. I'm using LABjs to defer loading of scripts, and it has given a huge boost in performance. For testing load time and comparing it against competitive websites, I've been using WebPageTest:
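The commenter's UDF isn't shown, so the following is only a guess at what a cachedURL()-style function could look like. The server.staticStamps struct name, the lazy population (instead of a separate recursing script), and the stripStamp() helper are all assumptions made for this sketch; stripStamp() just mimics in CFML what the IIRF/mod_rewrite rule does on the web server.

```cfml
<cfscript>
// Hypothetical reconstruction: adds the file's modification timestamp to its name
function cachedURL(path) {
    var key = hash(arguments.path);
    var info = "";
    var ext = listLast(arguments.path, ".");
    var base = left(arguments.path, len(arguments.path) - len(ext) - 1);
    // server-scoped struct keyed by hashed filename (name is an assumption)
    if (not structKeyExists(server, "staticStamps")) {
        server.staticStamps = structNew();
    }
    // look the timestamp up once, then reuse it on later requests
    if (not structKeyExists(server.staticStamps, key)) {
        info = getFileInfo(expandPath(arguments.path));
        server.staticStamps[key] = dateFormat(info.lastmodified, "yyyymmdd") & timeFormat(info.lastmodified, "HHmmss");
    }
    return base & "_" & server.staticStamps[key] & "." & ext;
}

// Illustration only: the reverse mapping that the rewrite rule performs
function stripStamp(stampedPath) {
    return reReplace(arguments.stampedPath, "_[0-9]{14}\.(js|css)$", ".\1");
}
</cfscript>
```

Note that this sketch never refreshes a stored timestamp on its own; something (such as the recursing script described above) would have to rebuild or clear the struct when a file changes.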
http://www.webpagetest.org/video/
As a result of this type of optimization, most page reloads are really fast, with only the HTML document requested. There are no 304 requests holding up the DOM from displaying (which would result in a slower perceived load time).
For a working sample of this (and LABjs), check out this work-in-progress:
http://www.radiorep.com/