March 11, 2010

Pages


Search Site


Subscribe

...to receive future posts via email.

Topics



Archives

Entries Tagged as 'Best Practices'

Inner Join vs. Outer Join, and Why My Brain Sucks

February 02 2010 by Adam

This afternoon I have been having a bit of a group discussion on the perils of INNER JOINs in SQL. It started with a tweet that Inner Join's are dangerous and you should use them with caution. A bunch of people asked me what exactly I meant, and of course it's difficult to get that across in chunks of 140 characters or less, which brings us to the present and this blog post.

Consider that you have a relational database setup to store survey responses. Each respondent creates 1 ResponseSet, which gets a ResponseSetId, and then from there you can build up responses to the various questions. There are N questions. Each question can have M responses, all of them free-form text. There are probably a hundred ways to lay out the database to support this, but in the case of the database that inspired the tweet, I've got a Response table, with a foreign key to the ResponseSet table, which has a foreign key to the Survey table. A response belongs to a response set, which belongs to a survey.

In your mind, write some SQL to join these tables and get back the questions and answers for a given ResponseSet for a report (say, ResponseSetId 473240432, if that helps you think). Here is the first thing that pops into my head:

SELECT     QuestionId     , ResponseText FROM     tblResponseSet rs     INNER JOIN tblResponse r ON rs.ResponseSetId = r.ResponseSetId     INNER JOIN tblQuestion q ON q.QuestionId = r.QuestionId WHERE     rs.ResponseSetId = @responseSetId GROUP BY     q.QuestionId ORDER BY     q.QuestionSort

That's great, right? Unfortunately, no. At least, not in my case. Even assuming that all questions are mandatory (1+ responses required), this does not account for other buggy software I've written (we all do it!) that saves, copies, or modifies the data that may result in blank responses or no responses for a given question. Rather to the point, what I would rather see in the case of this poorly-maintained theoretical data (as modified by my buggy code) is the rows I expect, but with NULL values for missing responses.

In the event of the survey above, it would be more helpful to the developer trying to debug the application to know that NULLs are coming back as survey responses, than it would be to know that the question isn't being included in the result set. The join for the question data is fine — at least in as much as it isn't the cause of this theoretical row being dropped — it's the join on responses that's causing the problem. In effect, by eliminating the row altogether, we're creating a false-negative, or the appearance of a bug that doesn't exist. (The question table should be joined via outer join as well.)

Using this method of querying the data, each row in the result set will give the question and 1 of its responses. The question will repeat with each different response, making it easy to use the group attribute of ColdFusion's cfoutput tag to display each question with the various responses to it for the given Response Set. However, with the above query, if there are no responses for ResponseSetId 473240432 and QuestionId 42, then QuestionId 42 is dropped from the result set. That's a Bad Thing (TM)!

In essence, plan for the worst but hope for the best. Whenever I write a join condition, instead of asking myself "How is this data relationship supposed to work?" I should be asking myself "What do I want to happen if some data turns up missing?" By writing an inner join, I'm answering the second question with "drop the row". The answer I wanted to provide for that question was to "show nulls", which is written in SQL with an OUTER JOIN.

That is the inherant danger of INNER JOINs of which I was speaking.

Posted in Best Practices | Databases | 4 comments

Now Soliciting Your Subversion Horror Stories And Worst Practices

May 11 2009 by Adam

I'm giving a presentation at the end of the month on Subversion for my office. It's going to be recorded and I'll be sure to post the video here for anyone interested in watching.

Here's where I need your help: What are some of the craziest, dumbest, most ridiculous things you've ever seen done in Subversion (or some other version control system, as long as the scenario would still apply)?

Anything is fair game. Awful commit comments? Terrible branching or merging practices? I want to hear about it all, and I welcome your comments!

Posted in Subversion | Best Practices | 1 comments

Chrome/Safari Gotcha: Nameless Form Fields

April 29 2009 by Adam

In ColdFusion, you can loop over the keys in a structure without knowing what the key names are, by using either this syntax:

Or this syntax:

for (var key in myStruct){ foo = myStruct[key]; }

This is useful when you need to loop over a form with dynamically created fields — like editing N person records at a time.

I recently found out that in Webkit — and thus Safari and Google Chrome — form fields without a name attribute are included in a form post, just without a name. Odd, right? I know. I've used nameless form fields in the past, like a select box that's only used for UI functionality, tied to JavaScript. In IE and Firefox, the field is ignored and not posted with the form. In Chrome and Safari, though? Included! Let's look at an example form.

This page will submit to itself, and if the form was submitted, dump the form scope. In Firefox, we see what you might expect:

But in Chrome or Safari, you'll notice a small difference:

The extra table row shows that there is another key in the structure; but as we can see it doesn't have a key name. I'm not sure why the value isn't displaying, because it's there. I'm guessing it's got something to do with the null key name. (I'm filing this whole thing under odd but true.)

So if you attempt to loop over each key and output it, depending on your method, you'll run into an error. I'm personally a fan of the structName[keyName] notation, so when I first tried this, no error was thrown, and my output was:

#key#=#form[key]#

=foobar
SUBMITBTN=Submit
FIELDNAMES=SUBMITBTN

I know that not having a key name can cause problems though. Let's try outputting each value using evaluate instead of structName[keyName] notation.

#key#=#evaluate("form." & key)#

Now, I'm always saying there's almost never a good excuse for using evaluate because there's almost always a more efficient way around it. For that reason, I can't think of what you might be doing that would put you in this situation… but it's still best that you're aware of the potential issue.

My advice would be to either try to put the nameless form fields outside of a form (which would be considered malformed XHTML), or to just give it a name and ignore it in your processing. The real lesson here is not to ever rely on browser quirks, because they may not exist in your favor forever.

Posted in Best Practices | ColdFusion | 3 comments

Finding the current DB type in a Mango plugin

December 09 2008 by Adam

I was recently burned because I wrote some SQL that was specific to MS SQL Server in one of my Mango plugins. It didn't occur to me at the time that my code might not work against other databases, because 99% of the time I have control of both my code and the DBMS that it runs against. That's not the case when writing a plugin and releasing it into the wild — anything can happen. Now that I've figured out how to properly account for multiple database types inside a Mango plugin, I thought I would share that knowledge with you so you don't make the same mistake I did.

Mango's plugin API provides a couple of hooks into its core, which prove to be very useful here. Whether you're building your plugin by following the instructions on mangoblog.org, or by taking an existing plugin and modifying it to do what you need, you will have an Init function that takes 2 arguments: blogManager and preferencesManager. The BlogManager argument is your starting point for interacting with Mango and provides access to a series of other "managers" and "interfaces" that allow you to interact with Mango via the API instead of modifying the database directly — ensuring the most possible backwards compatability when Mango changes.

Hint: You can dump an instance of blogManager to see available functions, then dump the return value of those functions to see what they return.

In your init function, you need to save a reference to the BlogManager so that you can use it later, when an event is dispatched:

Now that we have a reference to BlogManager, we need to use it when an event is dispatched to find out which database type the current user is running. To do that, we need to use the QueryInterface object, returned from blogManager.getQueryInterface(). This object has a method called getDBType() that will return a string describing the current DBMS. Mango supports MySQL and MS SQL Server 2000 and 2005.

DBMSReturns
MS SQL Server 2000 mssql
MS SQL Server 2005 (Express) mssql_2005
MySQL mysql

Now that we know what values we can expect from this function, we can safely write custom SQL per potential database type. I've started writing a fail-safe default case as well; so that even if it's not the best way to go about things, at least the plugin will work. Here's some example code showing how to use the reference we saved to BlogManager to find out the database type, and then to use that to write separate SQL statements depending on the DB type, and a default.

Note that I'm not querying the database directly, I'm using the QueryAdapter's makeQuery() function to execute the SQL. This is considered the best practice. When I use a CFOutput or CFLoop to process the results of the query, I limit the rows using from="1" and to="#variables.intPopularPostCount#" so that even in the default case the correct number of rows is used.

Any questions?

Posted in Best Practices | Mango | 1 comments

Best Practice: Separate App config from Framework Config

July 29 2008 by Adam

Best Practices is something that I don't think anyone, anywhere, ever gets 100% right. There's no "right" way to do everything. But we're striving to do things better all of the time, right?

For example, in the ColdFusion world we have Unit Testing frameworks like MxUnit, CFUnit, CFCUnit, and Selenium. I know Selenium isn't just for CF, but it does integrate nicely! Test-driven Development is all the rage these days, and while that's a good thing it's not for everyone and certainly something very difficult to "get right."

Consider that in a large project — one with 1,000 test cases — if you refactor some code that affects 10% of your tests, you have to re-write 100 test cases. This isn't to say that TDD is a bad thing… far from it! Just that there is no silver bullet, and that you always have to be thinking ahead to try and do things right the first time, and not repeat yourself.

This brings me to a change I recently made in Grub.

Grub uses the Model-Glue framework to, as I like to say, "code less and do more." In Model-Glue's XML config, we have an <include> tag that allows us to separate our config into logical sections to keep maintainability high. But did you know you can do the same thing with your ColdSpring XML config?

Well, not with out-of-the-box 1.0… You have to download the Bleeding Edge Release, here. The feature was first mentioned in November of 2006, when Jared announced that he had written it. It was then blogged by a bunch of other ColdSpring users, but I don't think any of them mentioned that it wasn't yet included in the official release except Mark Drew, and even then I don't think I picked up on that fact until well after my second or third reading of his post. I ended up reading this bug report that clued me in, after about an hour of trying to figure out why the necessary method wasn't where it should be.

After you download the BER and update your local copy of ColdSpring, you can add lines like the following to your beans XML:

I used this code to separate out my Model Glue config — which will be constant in every environment, but different between them (debug, reload, allow event generation in dev; no debug, no reloading, and no generation in production or staging) — from my bean definitions, which change frequently as I work on the applicaiton. Now, with my ANT build script, I can build and deploy my project without ever thinking about config, and I know it will always be right because it will never change without me doing so manually.

I'm somewhat surprised that over a year has gone by and it's still not included in the official release. This means that I'm currently using two unreleased frameworks in Grub (the other being the Model-Glue 3 RC), and that I am going to have to include them both in the download to make it as user-friendly as possible.

Posted in Best Practices | ColdFusion | Frameworks | 5 comments