Adam Tuttle

Entries for month: September 2011

On Cascaded Hibernate Deletes

Inspired by a Stack Overflow question and one of its answers, I decided to do a little bit of research into cascaded deletes in Hibernate.

What is a cascaded delete? Put simply, when you delete one entity -- a blog post, for example -- the children of that entity are also deleted (the comments on the blog post). Cascades can continue down the chain. If you had threaded comments, for example, the deletion of a top-level comment (itself cascaded from the blog post) could in turn trigger the deletion of another comment that was in reply to the first one, when configured to do so.
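As a toy illustration of how a cascade walks down the chain, here is a minimal sketch in plain Python. It is not Hibernate and makes no claim about Hibernate's internals; the post and comment IDs and the `CHILDREN` table are invented for the example. It simply deletes an entity's children before the entity itself:

```python
def cascade_delete(entity_id, children, deleted=None):
    """Delete an entity's children (recursively) first, then the entity.

    Returns the list of IDs in the order they were deleted.
    """
    if deleted is None:
        deleted = []
    for child_id in children.get(entity_id, []):
        cascade_delete(child_id, children, deleted)
    deleted.append(entity_id)
    return deleted


# Hypothetical data: one blog post, two top-level comments, one reply.
CHILDREN = {
    "post-1": ["comment-1", "comment-2"],
    "comment-1": ["reply-1"],
}

print(cascade_delete("post-1", CHILDREN))
# ['reply-1', 'comment-1', 'comment-2', 'post-1']
```

The reply goes first, then its parent comment, and the blog post goes last, so nothing is ever left pointing at a parent that no longer exists.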

You can read about all of the various options for cascading in the ColdFusion documentation -- that's kind of out of scope for what is already going to be kind of a long article.

So the Stack Overflow answer in question suggests that all cascade deletes in Hibernate are two-step processes: first Hibernate will set the foreign key to NULL, and then it will delete the row(s). My hypothesis is that while this may be true in some cases, there is a way to configure your entities to avoid it.

To test this, I created a quick test database, sandbox application, and two related entities:

Application.cfc:

component {
    this.name = "ormFun";

    this.datasource = "ormFun";
    this.ormEnabled = true;
    this.ormsettings = {
        cfclocation = "models",
        flushAtRequestEnd = false,
        logSql = true
    };

    function onRequestStart(){
        if (structKeyExists(url, "reload") and url.reload){
            ormReload();
        }
    }

}

models/foo.cfc:

component persistent="true" table="foo"
{
    property name="id" fieldtype="id" generator="native";

    property name="bars"
        fieldtype="one-to-many"
        cfc="bar"
        fkcolumn="foo_id"
        cascade="all-delete-orphan";
}

models/bar.cfc:

component persistent="true" table="bar"
{
    property name="id" fieldtype="id" generator="native";

    property name="foo"
        fieldtype="many-to-one"
        cfc="foo"
        column="foo_id";
}

Then I stuck some sample data in the database:

filldata.cfm:

<cfscript>

    import models.*;

    foo1 = new foo();
    transaction {
        entitySave(foo1);
    }

    foo2 = new foo();
    transaction {
        entitySave(foo2);
    }

    bar1 = new bar();
    bar1.setFoo(foo1);
    transaction {
        entitySave(bar1);
    }

    bar2 = new bar();
    bar2.setFoo(foo1);
    transaction {
        entitySave(bar2);
    }

    bar3 = new bar();
    bar3.setFoo(foo2);
    transaction {
        entitySave(bar3);
    }

</cfscript>

And created a page to show the data in a simple way that illustrates its relationships:

index.cfm:

<cfset foos = entityLoad("foo") />
<cfoutput>
    <h1>Foos</h1>
    <ol>
        <cfloop array="#foos#" index="f">
            <li>
                foo #f.getid()#
                <cfif f.hasBars()>
                    <ul>
                        <cfloop array="#f.getBars()#" index="b">
                            <li>
                                bar #b.getId()#
                            </li>
                        </cfloop>
                    </ul>
                </cfif>
            </li>
        </cfloop>
    </ol>
</cfoutput>

Its output looks like:

Foos

  1. foo 1
    • bar 1
    • bar 2
  2. foo 2
    • bar 3

And lastly, a file to delete foo1, which we are expecting to delete bar1 and bar2, but not foo2 or bar3:

delete.cfm:

<cfscript>
    foo = entityLoadByPk("foo", 1);

    transaction {
        entityDelete(foo);
    }
</cfscript>

In Application.cfc, you saw that I enabled Hibernate's SQL logging, and I've got my CF instance set up to log Hibernate SQL to a dedicated log file (when enabled like this), using Rupesh's awesome directions. This will be the key to understanding what happens.

For my first test, I'll leave the entities exactly as I have them defined above, and run delete.cfm. Here's the SQL that is logged:

09/23 11:17:16 [jrpp-29] HIBERNATE DEBUG - 
    update
        bar 
    set
        foo_id=null 
    where
        foo_id=?

09/23 11:17:16 [jrpp-29] HIBERNATE DEBUG - binding '1' to parameter: 1

09/23 11:17:16 [jrpp-29] HIBERNATE DEBUG - 
    delete 
    from
        bar 
    where
        id=?

09/23 11:17:16 [jrpp-29] HIBERNATE DEBUG - binding '1' to parameter: 1

09/23 11:17:16 [jrpp-29] HIBERNATE DEBUG - 
    delete 
    from
        bar 
    where
        id=?

09/23 11:17:16 [jrpp-29] HIBERNATE DEBUG - binding '2' to parameter: 1

09/23 11:17:16 [jrpp-29] HIBERNATE DEBUG - 
    delete 
    from
        foo 
    where
        id=?

09/23 11:17:16 [jrpp-29] HIBERNATE DEBUG - binding '1' to parameter: 1

Three things happen here:

  1. The BAR table is updated, setting foo_id to NULL where foo_id matches the FOO entity I'm deleting (id 1).
  2. The BAR records that were updated to have NULL for foo_id are each individually deleted. Note that this is not a single delete where foo_id is null; it is looping over the IDs of the BAR records it previously updated, running one delete query for each. (Kind of inefficient, huh?) This is not to suggest that deleting where foo_id is null would be a good solution. It isn't: there could be other orphans for some reason, so it could unintentionally delete too many records. So while it's inefficient, it is working properly.
  3. The FOO record I originally requested to delete is deleted.

Also note that because I expected this behavior based on the Stack Overflow answer, I made my foo_id column nullable in the database. If I hadn't, then a Hibernate exception would have been thrown:

coldfusion.orm.hibernate.HibernateSessionException: Column 'foo_id' cannot be null.

This seems to gel with the Stack Overflow answer in question. But, is there a way to make this more efficient? And while we're at it, this seems to break referential integrity: the child (bar) records are orphaned until later being deleted.

I learned from Bob and Barney (both ColdFusion ORM gurus, at least in comparison to myself!) about the importance of inverse="true", so my first instinct is to add that and see what happens. Here's the updated bar.cfc (no changes necessary for foo.cfc):

component persistent="true" table="bar"
{
    property name="id" fieldtype="id" generator="native";

    property name="foo"
        fieldtype="many-to-one"
        cfc="foo"
        column="foo_id"
        inverse="true";
}

If you happen to be following along at home, don't forget to run ormReload() when making changes to entity definitions like this.

Then I cleared my database, re-populated the data, and re-ran my delete. Unfortunately, this did not do what I wanted. It results in the exact same SQL being run:

09/23 11:59:12 [jrpp-37] HIBERNATE DEBUG - 
    update
        bar 
    set
        foo_id=null 
    where
        foo_id=?

09/23 11:59:12 [jrpp-37] HIBERNATE DEBUG - binding '1' to parameter: 1

09/23 11:59:12 [jrpp-37] HIBERNATE DEBUG - 
    delete 
    from
        bar 
    where
        id=?

09/23 11:59:12 [jrpp-37] HIBERNATE DEBUG - binding '1' to parameter: 1

09/23 11:59:12 [jrpp-37] HIBERNATE DEBUG - 
    delete 
    from
        bar 
    where
        id=?

09/23 11:59:12 [jrpp-37] HIBERNATE DEBUG - binding '2' to parameter: 1

09/23 11:59:12 [jrpp-37] HIBERNATE DEBUG - 
    delete 
    from
        foo 
    where
        id=?

09/23 11:59:12 [jrpp-37] HIBERNATE DEBUG - binding '1' to parameter: 1

So I'll remove inverse="true" and now I'll try notnull="true" (don't forget to ormReload()):

component persistent="true" table="bar"
{
    property name="id" fieldtype="id" generator="native";

    property name="foo"
        fieldtype="many-to-one"
        cfc="foo"
        column="foo_id"
        notnull="true";
}

Maybe this will tell Hibernate to do what I want? But sadly, when I re-populate the data and re-run my delete, the SQL is the same. It's a stab in the dark, but next I'll try both inverse="true" and notnull="true". Result? The same.

Thinking the answer might be somewhere else, I also tried cascade="delete" and cascade="all" on the bars property of my foo entity. Still, Hibernate is NULLing the relationship before deleting the records.

I don't like my findings, but so far I'm not coming up with anything that does what I want. If you know what I'm missing, please leave a comment and let me know! Until then I'm at a loss... As far as I can tell, there's no way to make Hibernate simply run something like the following when you delete the parent entity:

delete from bar where foo_id = 1;
delete from foo where id = 1;

... short of using HQL or the SQL interface that Hibernate exposes.
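To make the difference between the two approaches concrete, here is a small sketch that replays both SQL sequences outside of ColdFusion and Hibernate entirely. It uses Python's built-in sqlite3 module with an in-memory database; the function names are made up for this example, and the one deliberate change from my test schema is that foo_id is declared NOT NULL, to show where the logged sequence breaks down.

```python
import sqlite3


def make_db():
    """Build an in-memory copy of the foo/bar sample data with a NOT NULL foreign key."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY)")
    db.execute(
        "CREATE TABLE bar ("
        " id INTEGER PRIMARY KEY,"
        " foo_id INTEGER NOT NULL REFERENCES foo(id))"
    )
    db.executemany("INSERT INTO foo (id) VALUES (?)", [(1,), (2,)])
    db.executemany(
        "INSERT INTO bar (id, foo_id) VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)]
    )
    db.commit()
    return db


def delete_like_the_log(db, foo_id):
    """Replay the logged sequence: null the FK, delete each child, delete the parent."""
    with db:  # commits on success, rolls back if any statement raises
        child_ids = [
            row[0]
            for row in db.execute("SELECT id FROM bar WHERE foo_id = ?", (foo_id,))
        ]
        db.execute("UPDATE bar SET foo_id = NULL WHERE foo_id = ?", (foo_id,))
        for child_id in child_ids:
            db.execute("DELETE FROM bar WHERE id = ?", (child_id,))
        db.execute("DELETE FROM foo WHERE id = ?", (foo_id,))


def delete_directly(db, foo_id):
    """Run the two statements I was hoping for: children by FK, then the parent."""
    with db:
        db.execute("DELETE FROM bar WHERE foo_id = ?", (foo_id,))
        db.execute("DELETE FROM foo WHERE id = ?", (foo_id,))


def remaining(db):
    """Return the surviving (foo ids, bar ids)."""
    return (
        [row[0] for row in db.execute("SELECT id FROM foo ORDER BY id")],
        [row[0] for row in db.execute("SELECT id FROM bar ORDER BY id")],
    )


db = make_db()
try:
    delete_like_the_log(db, 1)
except sqlite3.IntegrityError as err:
    print("logged sequence failed:", err)

delete_directly(db, 1)
print(remaining(db))  # ([2], [3])
```

With the NOT NULL constraint in place, the update-to-NULL step is rejected and rolled back, while the two direct deletes remove foo 1 along with bar 1 and bar 2 and leave foo 2 and bar 3 alone. The children are never orphaned along the way.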

Published 2011-09-23 @ 11:18 in ColdFusion Hibernate

What I HATE about OSX Lion's Mission Control

I consider myself a bit of a power user of my Macs. On an average day I have a dozen or so programs open at once and I like to organize them very particularly. In addition, I have a pretty decent sized external monitor that I use at work to enable me to see more and work on things without task-switching so much. For example, I might have Aqua Data Studio open on one monitor, showing me the DB schema, and CFBuilder open on the other, writing code. Sure, I could still do the same work using a lot of Cmd+Tab to switch back and forth between the two, but that switching comes at a time cost, and a mental context switching cost -- and then what is this giant external monitor good for?

Spaces was awesome. Exposé was awesome. And I get what they're trying to do with Mission Control: combine the two. Less is more. And it could have been awesome, if it weren't for a few big mistakes.

As of Snow Leopard, when you hit your Spaces hotkey (F8), you would see all spaces given the same amount of screen space. Now, with Mission Control, the spaces you weren't already looking at are tiny, making it almost impossible to see what's open on them. Um, hello? I'm here because I want to see what's on other spaces. How about you give them more than 100px?

Additionally, when you would combine Spaces and Exposé (press F8 and then F9), the windows on each space would do the normal Exposé behavior within the screen real estate allocated to that space. What was wrong with this? This was powerful. This was beautiful. This was perfect.

The only conceivable problem they were trying to solve here, in my opinion, was that all spaces were pushed onto the primary screen, and the secondary screen would just be black. Sure, that's some wasted screen space, but it worked well.

So now, with Mission Control, you can drag windows or even a full application-stack (drag the icon, not a window, and it will bring all of that application's windows) to another desktop... BUT only on the same screen. So if I want to move an application from the left screen of space 1 to the right screen of space 7, I have to first move it to the right screen of space 1 and then open Mission Control and move it to space 7. (Or move spaces then screens; order doesn't matter. The problem is that it can't be done in one swoop.) This sucks. Not only does it add work and take away existing functionality, I was so used to the way it used to work that even now, months after Lion's release, I still fumble around a little bit when I want to do this. It still aggravates me to no end.

If I could disable Mission Control and go back to Spaces + Exposé, I would in a heartbeat.

Published 2011-09-20 @ 12:15 in Apple

Sending mail from your web server in parallel with google mail

I've been suffering for quite some time now with mail issues on the Philly CFUG website. Blog post announcements were being sent but marked as spam, so nobody was aware of meetings, so attendance took a nose dive. Big problem!

I've previously written that when you're using Google Mail for your domain, it can cause mail sent by your web server to look spammy. One approach to mitigating this problem is to send all outgoing mail through Google's SMTP servers. But of course there are restrictions for free accounts: you're limited to 500 outgoing messages per day, and there is a limit on the number of recipients per message. This could be troublesome if you're sending out a high volume of password recovery mail and things like that.

Sadly, even with Google SMTP configured for the CFUG blog, for whatever reason, mail wasn't going out. But there is another option...

When an email is received by the destination mail server, it (the server) checks the sender's IP address against the domain's SPF record. The SPF record that Google instructs you to add includes their mail servers and marks everything else with ~all, which in SPF terms is a "softfail": not explicitly authorized, but not rejected outright either:

v=spf1 include:_spf.google.com ~all

So what's so bad about this? For one, everyone and their grandmother's cat is using Gmail these days, and Gmail takes that less-than-passing ruling as "spammy enough for me!" and just marks the message as spam. How do I know that? Because Gmail, like most mail servers, adds some debug information to the message headers. Here's the full text of a test message I sent myself using CFMail from my web server, before fixing my SPF record:

Delivered-To: [manager]@phillycfug.org
Received: by 10.220.187.133 with SMTP id cw5cs113587vcb;
        Fri, 16 Sep 2011 20:00:42 -0700 (PDT)
Received: by 10.204.138.72 with SMTP id z8mr51164bkt.367.1316228441542;
        Fri, 16 Sep 2011 20:00:41 -0700 (PDT)
Return-Path: <noreply@phillycfug.org>
Received: from sp5067a ([65.111.169.59])
        by mx.google.com with ESMTP id k2si5767165bke.34.2011.09.16.20.00.40;
        Fri, 16 Sep 2011 20:00:41 -0700 (PDT)
Received-SPF: neutral (google.com: 65.111.169.59 is neither permitted nor denied by best guess record for domain of noreply@phillycfug.org) client-ip=65.111.169.59;
Authentication-Results: mx.google.com; spf=neutral (google.com: 65.111.169.59 is neither permitted nor denied by best guess record for domain of noreply@phillycfug.org) smtp.mail=noreply@phillycfug.org
Received: from sp5067a ([127.0.0.1]) by sp5067a with Microsoft SMTPSVC(6.0.3790.4675);
     Fri, 16 Sep 2011 22:59:20 -0400
Date: Fri, 16 Sep 2011 22:59:20 -0400 (EDT)
From: noreply@phillycfug.org
To: [manager]@phillycfug.org
Message-ID: <183870777.45.1316228360261.JavaMail.SYSTEM@127.0.0.1>
Subject: test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Mailer: ColdFusion 9 Application Server
Return-Path: noreply@phillycfug.org
X-OriginalArrivalTime: 17 Sep 2011 02:59:20.0277 (UTC) FILETIME=[CBD1F850:01CC74E5]

fingers crossed!

Take note of the line starting "Received-SPF:" -- this is added by Gmail to document how spammy the message appears. The ultimate designation is "neutral", and then it proceeds to explain why. What we want it to say is "pass". The client-ip listed, 65.111.169.59, is the IP of the server sending the message (my web server, in this case).

So how can we fix this? Fix the SPF record. I've looked into this a few times in the past, and it always seemed like black magic to me. Sure, there is lots of documentation available on the internet, but every time I looked at it, it looked worse than the worst man pages. I was never able to grok it.

Then last week I found OpenSPF, and their simple and clean explanation of SPF syntax.

It turns out that the fix is simple. Just change the above SPF record to the following:

v=spf1 a mx include:_spf.google.com ~all

I've added the string "a mx" to the middle of the record. This indicates that, in addition to the _spf.google.com include that allows all of Google's (presumably numerous) mail servers, the record also allows every IP listed in an A record for the domain and every IP behind its MX records. You may not need the MX part, but to be honest, now that it's working I don't want to change it!
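A rough way to see why the extra mechanisms matter is to walk a record from left to right, the way an SPF checker does, and stop at the first mechanism that matches. The Python sketch below is a simplified model, not a real SPF implementation: it does no DNS lookups, and the `RESOLVED` table is made up for illustration (the MX address and the Google range are documentation-only placeholder addresses). The only real value is the web server's IP from the headers above, which I'm assuming is what the domain's A record points to.

```python
import ipaddress

# Qualifier prefixes from the SPF syntax; a bare mechanism means "+".
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record, sender_ip, resolved):
    """Return the SPF result for sender_ip under a simplified model.

    `resolved` maps each mechanism (for example "a" or
    "include:_spf.google.com") to the addresses or networks it expands to.
    A real checker gets these from DNS; here they are supplied by hand.
    """
    ip = ipaddress.ip_address(sender_ip)
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"
    for term in terms[1:]:
        qualifier = "+"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term == "all":
            return QUALIFIERS[qualifier]  # "all" matches every sender
        networks = resolved.get(term, [])
        if any(ip in ipaddress.ip_network(net) for net in networks):
            return QUALIFIERS[qualifier]
    return "neutral"  # nothing matched and the record had no "all"


# Hypothetical lookup results (assumptions, see the note above).
RESOLVED = {
    "a": ["65.111.169.59"],
    "mx": ["192.0.2.10"],
    "include:_spf.google.com": ["198.51.100.0/24"],
}

WEB_SERVER = "65.111.169.59"
print(evaluate_spf("v=spf1 include:_spf.google.com ~all", WEB_SERVER, RESOLVED))
# softfail
print(evaluate_spf("v=spf1 a mx include:_spf.google.com ~all", WEB_SERVER, RESOLVED))
# pass
```

Under this model the original record falls all the way through to ~all for the web server's IP, while the updated record matches on the "a" mechanism first and passes.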

Sending another test message, the Received-SPF header line now reads:

Received-SPF: pass (google.com: domain of [manager]@phillycfug.org designates 65.111.169.59 as permitted sender) client-ip=65.111.169.59;

If my calculations are correct, group attendance is about to come back up. :)

Published 2011-09-19 @ 08:07 in Google Learning

Taking the next step

Something I spend a fair amount of time thinking about is what's on the horizon. What sorts of things should my team and I be learning and implementing to make ourselves better developers and a better team, as a whole?

Part of that involves knowing what we already do, and in some cases, do pretty well.

  • We have a pretty well-defined code-review process and we try to integrate as much feedback from our reviews as possible.
  • We are getting better at creating automated tests to make long-term application maintenance easier.
  • Some of our applications have Ant scripts that make producing Production builds (combined, minified JS + CSS, etc.) easier.
  • We're using Distributed Version Control (git) like a champ, now!
  • We pair program, when possible. (It can be difficult to find significant chunks of matching free time on a small team.)
  • We're starting to expose some of our reusable code as REST APIs so we don't have to waste time copying and re-integrating it into every application.

So knowing what we do well, what are the logical next steps? What should we learn and do to continue growing and taking those next steps? Here are some ideas and my thoughts on each. These are just some things I came up with off the top of my head. What else do you think could be useful?

Chef

Our colleague Lew G has written at some length about his experience learning and using Chef. It sounds wonderful! It sounds ideal! It sounds like it could be kind of... hard.

For one, our group has only one server that we manage currently, and it runs on Windows. The rumor mill says that the Chef support for Windows is not nearly as good as it is for Unix/Linux. Of course, this is third- or fourth-hand information, and subjective at that, so it bears some investigation.

You may be able to find people who would argue against automated server configuration like this, but I doubt any of them will be from our team. Being small means that we have to think about our agility and plan ahead for the worst-case scenarios, or continually find ourselves dealing with the fallout from the previous week's issues. Having one or two people spending two or three days recovering from a total server loss would be catastrophic to our team productivity. Learning Chef and writing/testing scripts ("recipes", right?) to rebuild a server quickly and at a moment's notice would definitely go a long way in our personal DR plan.

Jenkins

Continuous integration is the holy grail of automated testing, and with Jenkins now so available and approachable, it's never been easier. How nice would it be to make a commit to source control, and then a few minutes later get an IM letting you know whether or not the tests are passing? Pretty darn nice! ... And just the tip of the iceberg, of course.

Document-based data stores / NoSQL

I recently did a one-day workshop for Hadoop and learned a lot about it and its sibling NoSQL/document databases. The biggest takeaway for me and our team was something the trainer said. I'm paraphrasing from memory, but he said something like, "If you're not working with 1TB of data or more, NoSQL is not for you." I don't think the Learning Lab has 1TB of data from all of our applications combined (maybe not even breaking into double-digit gigabytes), so it's not really an obvious win for us.

That doesn't mean there's no place for NoSQL in the Learning Lab, though. Suppose we need to write an application that will need to support 10,000 concurrent users for a short period of time. Given a Chef script, it could be pretty simple to spin up a few dozen Cassandra/CouchDB/etc. nodes to scale as wide as needed, and adjust on the fly according to performance. I imagine scaling a system like this would be much more manageable than trying to get a single machine powerful enough to run MS SQL at the same scale on a temporary basis (not to mention the licensing costs, since MS SQL is licensed based on the resources it has available!). So while we probably won't dive into this topic right away, we should keep it on our periphery.

LESS CSS

CSS is a very powerful and, when used well, terse way to style HTML documents. But it can be a pain in the rear to use. Over time stylesheets tend to grow to an unmanageable size, and it's incredibly difficult to write them DRYly. Enter LESS CSS. To quote a friend of mine, LESS "makes CSS think like programmers do." You can create and use variables and functions, include files, and much more... and it all compiles (almost instantly, from what I hear) to the native CSS we know and ... erm... love (?) today.

This is another post originally written for my team's blog.

Published 2011-09-16 @ 02:51 in Best Practices Learning