avatarharuki zaemon

Sometimes crappy tests are better than none at all

By

Some of the arguments against the tsting barrow I push are that you end up with essentially “legacy” tests. Tests that you have to continually maintain. And often tests that are brittle.

I was having a discussion with a colleage the other day and he asked me if I thought it was really necessary to test everything given developers often write crappy and brittle tests anyway. Whats worse is that inexperienced developers often go through unecessary hoops to build the tests and/or production code and possibly end up writing software that is just as bad (or maybe worse?) than if they hadn’t tried to do test first (or at all).

I couldn’t believe it but you know I’d never really thought about it too much. I had just believed my own BS and assumed that, yes, you need tests for as much as you can possibly test.

And you know what?

I still believe it!

Crappy tests almost always (I can’t think of an exception but there invariably is one) imply crappy code. That means that the brittle tests are likely testing brittle code. Now, brittle tests are bad because when I make my seemingly innocuous change, I invariably end up breaking some of these brittle tests when really it shouldn’t have. But then again, if I assume brittle tests means brittle code, the chances are I really have broken the underlying code! And how would I have know that if the tests hadn’t been there in the first place?

So I still stick by the mantra. Test everything. Even if (especially if) it’s crappy and brittle ‘cause you’ll be doing yourself and everyone else on the team a favour by ensuring they know when they’ve broken your code.

I leave developer re-education/career re-assignment as an excercise for the reader. Though I hear a 2x4 comes in real handy. I have the scars to prove it ;-)

Why I prefer constructor-based depency injection

By

On the current project I’m on, and no doubt on many of yours, we have restrictions on the number of parameters you may declare for a method. This, hopefully, forces developers to re-evaluate what they are passing around. For example, use a DateRange instead of dateFrom and dateTo.

Unfortunately there is a simple way to subvert that process by simply declaring everything as “setters” which typically have only one parameter. Then all the lovely detail becomes much harder to see as it’s hidden in the morass of the aptly name mutators (yet another reason I dislike getters and setters).

Declaring service dependencies in constructors allows me to see them all in one go. It immediately becomes apparent if a class depends on “too many”. Something that is much harder to see when you use setters. It also allows us to construct objects in a valid state with all the obvious benefits.

Another advantage to using a constructor is that only the creator need know about the dependencies. In our case it’s our ServiceRegistry. Once an object has been constructed (either by the registry configuration for singleton services defined by interfaces or by calling ServiceRegistry.newInstance(Class) allowing the construction of any class that depends on a service) client code is unaware of the dependency.

I’ve recently been converting large swathes of code on this project from static lookups and setter-based dependency injection (DI) to constructor based and in the process finding stuff that was not previously apparent. These include classes that depend on too-many services, classes that depend on services they just shouldn’t and classes that are quite fragile because callers didn’t realise they had to call the setters in a particular order! None of these was in anyway obvious previously.

The down-side of course is that constructor parameters make inheritence a pain. If you have a deep and/or wide class heirarchy, you will need to declare the dependencies in the constructors of all the sub-classes. Not only is this tedious but it necessarily exposes the dependency to classes that probably don’t directly make use of the service.

My counter to this is simple: don’t have deep and/or wide class heirarchies. I rarely find the need for them and they’re usually a code smell. The fact that we can extend classes to inherit behaviour doesn’t mean we should. No doubt a topic for another rant :-)

As always there are exceptions. Sometimes you just can’t get around the need for setter-based DI, usually when you are constrained by someone elses API. Most noteably in our case is the fact that (for reasons I don’t want to go into urgh!) we’re forced to use versions of third-party libraries (Struts) that want to directly construct our classes with no factory mechanism. This means we have Actions that must have a default constructor.

You may have noticed a distinct lack of the ‘T’ word. I deliberately stayed away from describing how all this makes classes more testable and is a natural consequence of doing TDD anyway because if you’re doing TDD you already know this and if you don’t like TDD (really?) I didn’t want to put you off the idea by implying this was all about testing, which it’s not.

Collections gotchas #472

By

I found an interesting (to me anyway) implementation detail in Suns JDK LinkedList and HashSet code.

Compare these two. The first is deep in the bowels of LinkedList.indexOf(Object) which is used by the implementation of contains(Object):

if (o.equals(e.element))
    return index;

The next is from HashMap.containsKey(Object) which is used by HashSet to implement contains(Object):

return x == y || x.equals(y);

Notice anything interesting? No?

Well I know it’s not much but HashSet which one would think probably only honours the equals(Object)code+hashCode()contract, also checks for object identity. On the other hand,LinkedList, which I would have thought probably only checks for object identity, actually only honours the equals(Object)code+hashCode() contract.

Why is this a problem? Well I found a class (in a 3rd-party library) that never returned true for equals(Object). I only found it because I was writing a test and wondered why when I added instances of the class to a LinkedList, contains(Object) always returned false even though when adding them to a HashSet, contains(object) always returned true.

I had expected List implementations to at least check object identity. Unfortunately it behaves exactly as the documented, says so I’ll just have to deal with the real problem and shoot the developer who wrote the broken code instead. But saying so upfront would have spoiled a good rant :-)

And the moral of the story is? Tests are good; Assumptions are bad; Test your assumptions; and; Sometimes the documentation is worth reading but usually only after the fact when all the subtleties hidden therein become apparent.

Extreme Smart Arses

By

Every two weeks we have our iteration close during which we hold a mini retrospective. We break our observations into four

  • What Worked Well?;
  • What Went Wrong?;
  • What Puzzles Us?; and;
  • What to Do Differently Next Time?

Mike Melia made an observation that, when one works too late, they are prone to making stupid mistakes. Therefore, it’s often better to go home, get a good nights sleep and tackle the problem the next day. Sage advice. Golf-claps all ‘round.

Of course being the bunch of smart-arse, lazy, meeting loathing developers that we are, we decided that it was such a good point we could probably generalise it a bit and save ourselves a lot of time for the next and subsequent iteration close meetings:

  • What Went Wrong? People made mistakes;
  • What Puzzles Us? Why do people make mistakes?;
  • What to Do Differently Next Time? Don’t make mistakes;

And of course the one we all hope for:

  • What Worked Well? People made fewer mistakes.

I think everyone will agree though that David (get a blog soon you git!) Pattinson does do a pretty good iteration close/kick-off especially as he is continually interrupted by people throwing stress balls around and knocking cans of coke off tables. Sheesh. Who would do such a thing I ask you? :-P

JRules - The story so far...

By

I wrote previously on my initial investigation into JRules. Having been using and playing with it in a production code environment for a bit now I thought I’d follow up with some observations thus far. Some of these no doubt relate to other rule engines such as Drools but as I’ve never used them in the real world, so to speak, I’ll have to wait to hear back on that.

One of the biggest hurdles we had to face was the so called Business Object Model (BOM). This is a mapping from your domain model to user-friendly “natural language” expressions. Our first cut on this was to load up the model in the rule editor and start mapping away. This soon became tiresome, error prone and very, very brittle. Even slight changes in the heirarchy meant wiping out rules and starting all over again.

Next we tried generating the BOM directly from our domain model but that too smelled of maintenence nightmares. So instead, we simplified the whole process by effectively flattening the heirarchy. We represent as many facts as we can at the highest possible level by “exploding” the domain model (for rule execution only). We also assert services such as navigation state, current time, calculation engines, etc. in the same way.

At first this seems odd, especially to a hardened OO developer. But after some time it soon becomes apparent that it makes a lot of sense. To start with, it’s much easier to do the mapping. We no longer need to worry about the relationships between classes and can focus on mapping the properties of individual classes. This makes our mapping much less susceptible to change. It also becomes much easier for the end users because they tend to think of pieces of information (facts) with less regard to relationships than we do. As in The customers address as opposed to the address of the customer of the order.

One huge bonus is that we no longer have to spend large amounts of time making the “nice” natural language mappings actually understandable. By that I mean tweaking the “translation” so that it reads in plain english. Because there is far less navigation going on, we decided we can probably not even worry about the language mapping until later, making it much faster for developers to work.

As an aside, I find it incredibly annoying that JRules doesn’t seem to allow you to reference constants (public static final fields). Instead we are forced to create virtual methods for each of them with a translation back to the constant. DUH! Please someone tell me it aint so.

Don’t allow the rules to hit the database if at all possible. Not something we even entertained but another project in the same building is doing this and what do you know, it performs like crap. Relatively speaking that is. We’re using Hibernate so the intention is to have all our reference data modelled as objects, cached and asserted just like all the other facts.

The cool thing is that all this is kinda like good old IoC/Dependency Injection/Whatever it’s called this week. Nomenclature aside, you tell the rules everything they need to know. They don’t call out to services by calling a ServiceLocator. They don’t call static methods, etc. Instead they simply declare they need a particular service or are interested in a particular fact, and the rule engine does the plugging. Neat, and for all the same reasons that IoC is good for “normal” java code.

IlrRuleset and IlrContext are relatively expensive to create so we are using a very simple pooling mechanism to manage them. Don’t forget to reset the context when it’s returned to the pool or it’ll hold references to objects that you probably no longer care about.

Another thing we found was that it’s best that the application not worry too much about what rules may/may not be applicable. Similary, don’t try and put too much “behaviour” into the rules. It’s tempting to want to tell the rules engine (either explicitly or via some facts) that certain rules don’t apply. The problem here is that we start to embed knowledge of our application into the rules and vice-versa.

Let the rules do their job. Namely, assert and retract facts based on certain conditions being met (or not as the case may be). The application is then responsible for performing some kind of action based on the state of the facts after firing the rules. This is probably somewhat controversial as it would be totally possible to implement an entire system in rules. (Actually, not only would it be possible but it would be pretty cool too). However in our case, and no doubt in yours, the rules are not application specific. In our application they represent company-wide knowledge about the way they do business. In fact the longer term plan is to publish these rules as an enterprise accessible service.

And finally, the non-deterministic execution of methods on your domain model during rule evaluation. By this I mean you can’t guarantee that a given method will be invoked once or many times (or at all?) and especially not in what order. This isn’t usually a cause for concern as you will probably be calling simple getters but in the few instances that isn’t the case beware. Even something as innocuous as obtaining the current time can be an issue. More often than not, you want to treat the rules as a batch and to be evaluated and fired against a fixed point in time. So for example, we assert a [Clock](https://harukizaemon.com/posts/2003/12/07/when-is-a-clock-not-a-clock/) implementation that returns the same value whenever (ie. no matter how many times) getCurrentTime is called.

Noise Reduction

By

An amusing topic coming from one such as myself that can quite clearly never shut up!

I’ve been bitten a few times this week by a low signal/noise (or a high noise/signal which ever you prefer) ratio. The first was earlier in the week when I was supposed to give a talk (more noise) and demo at the monthly MXPEG meeting.

Like most geeks, I’m continually tweaking my system. And as most geeks know, tweaking is really a euphemism for breaking something. I’m constantly updating the patches on my system and installing the latest version of this and that. Each time I perform the awesome Gentoo emerge -uD world it spews forth volumes of logging. So much logging that it’s not only hard to see if there are any messages worth reading but they fly past so quickly I probably couldn’t read them even if I tried. So after a very short time one starts to ignore pretty much all of them. Hey if it doesn’t complain at the end then it’s probably ok. Right?

So there I am at the meeting trying to get my machine to connect to the network and failing dismally. I gave up and vowed to track it down when I got home. Eventually I discovered that there were some config files that needed updating and the messages were there for all the world to see saying just that. But because I had just ignored them all I never noticed.

Similarly, we had a bunch of checks on our code base that were flagged as warnings not errors. At some point, that list of warnings got so big we all just started to ignore them. Then one-day I accidentally checked in a copy of the build file that stopped breaking the build on errors (over zelaous find and replace) . Nobody noticed. Why? Because everyone had just grown accustomed to ignoring the output because it contained so much noise. The answer was of course to either remove the warnings as we clearly didn’t care about them, or turn them into errors so we train people to look at all messages.

James and I worked on a project where there was so much logging going on, nobody noticed all the stack traces and NPEs etc. being spewed into the server logs. In fact there were so many errors that they turned off the “page me if there is an error” handling. This of course just made the problem worse.

JavaDoc comments are another pet hate of mine. We did a refactor the other day that involved a rename of a class and the subsequent getters and setters on the domain object and process beans, etc. We ended up spending so much time updating the comments from “Sets the value of the product details” to “Sets the value of the vehicle details” that I wanted to murder someone. JavaDoc comments convey little or no more than the method signature should convey. I emphasise SHOULD for good reason. Sure, there are some special cases such as “will return the default if no explicit value has been set” for example but if you feel you need to explain what getCustomer does, then throw a GetAGripException or find a new profession.

All this stuff is noise. The problem really is that you have been allowed or have chosen to ignore it. You can’t afford to. Don’t just ignore it or it will come back to bite you later on. Of course, just saying “well we ignore it so maybe we can just stop doing it” isn’t the answer either. Maybe it’s there for good reason. Find out why it’s there and stop needing to ignore it before it starts to take it’s toll on yours and the rest of the teams poductivity.

P.S. Can anyone tell I’m back on a project again? LOL

Meatballs and perls

By

I thought this article was quite amusing, especially the comparison between dining at the JavaHut versus the Perl Cafe.

So allow me to give another perspective:

…Dusting yourself off, you walk across the street to the Perl cafe. The person at the door asks what kind of seat you want, what you want to eat, and how you want to pay. They sit you at your seat, bring over your food, collect the money, and leave you to eat in peace. You look down and realise that what they brought you wasn’t what you ordered. In fact looking around, it seems everyone is eating the same dish. It’s then that you realise no matter what you order at the Perl Cafe, all you ever get is spaghetti.

I know, I know but I just couldn’t resist ;-)

Sometimes the problem does just go away

By

I’ve often joked that procrastination is the mother of invention. If I put something off for long enough, it’ll either cease to be a problem or a more elegant solution will come to me. Certainly this doesn’t always hold true nor is it always that simple but it does work out that way often enough for me to at least continue believing my own rhetoric :-)

The idea is to continue pushing the implementation of a solution further and further away. By that I mean, instead of thinking about the nitty gritty detail, start off by thinking in terms of the simplest interface possible and code to that. Imagine that it actually worked even if you’ve hard `d some behaviour.

It’s nothing new. It’s what good software developers do all the time (one hopes). It used to be called top down design. These days it’s what TDD exponents typically do.

The important thing for me is that it IS important to sit around and think through the problem as fully as possible. Think about all the possibilities. What could we do. What would happen if we did option a versus option b? What might the customer want to do?

Yes that’s right, I’m suggesting actually solving a possibly imaginary problem. Shreaks from the crowd. The BIG difference is I’m not actually going to implement any of this. All I’m trying to do is validate that my assumptions, my abstraction, my simplification of the problem, my procrastination on implementation, my (some would say blind) adherence to YAGNI isn’t going to come back and bite me if at all possible. And more to the point, trick those 95% of unused brain cells into doing more than just suck nutrients from my blood stream.

Again I stress I’m not going to do anything about it. I’m going to defer implementing anything until as late as possible. Hopefully by then my brain has been feverishly working away in the background mulling over all the things we came up with in the brain storming session (as distinct from design session) and magically come up with an elegant solution.

Sometimes it may be necessary to break the problem into smaller chunks. Flip the question around. Rephrase it. Try and push it into a different mould. See if all your assumptions still hold. Playing simple semantic games can help make a problem seem more familiar. Try calling it a Displayable instead of a Page. Maybe (and this is where the domain modelling dudes possibly get upset) Finance really is a product just like MotorVehicleInsurance even though the customer doesn’t see it that way because it’s not core business.

And you know what? More often than not, the problem just dissapears. We’ve abstracted the problem such that it no longer seems like one of many disparate, unrelated problems. It starts to conform to some other problems we’ve seen or implemented. Maybe we can tweak an existing implementation to support the extra behaviour.

To me, that’s what problem solving is all about. It’s about thinking through the problem sufficiently enough that it no longer seems like an isolated case. Some of the worst code I’ve ever seen was obviously written as a mass of solutions to completely (perceived) isolated problems.

Let me rephrase - what is a test anyway?

By

A procedure for critical evaluation; a means of determining the presence, quality, or truth of something.

My original post was (supposed to be) asking the question, when is it justifiable NOT to have a test (either new or modified) AND in particular what consititutes a test in the first place. It seems to me that many developers see refactoring as an excercise that somehow doesn’t need tests. Also, the term test has become synonymous with JUnit and this seems to me somewhat short-sighted.

Unfortunately though, I think I failed to communicate this clearly. At no time did I suggest that tools would take over the world and do everything for me. In fact quite the opposite. It’s especially the non-obvious forms of anything for which I want someone to write a test precicely BECAUSE I can’t get a tool to check for it automatically.

Andy Marks suggested that maybe rename is one refactoring that doesn’t require a test. I’m sure there are many others - introduce constant, inline local variable, etc. to name but a few - and though I probably agree that these are special cases on a simple gut feel level, I’m not convinced there has been a rigorous argument put forward.

Because, fundamentally, I don’t believe you when you say it’s an innocuous change :-) Hell I don’t trust myself. And I think you’ll find that the more experienced a developer becomes, the more sceptical he/she is about saying something like “oh yeah that’s trivial.”

It’s like speeding in a car. You wouldn’t speed if you thought you were going to kill someone right? You drive fast because you think you can handle it. And the speed you think you can safely handle is probably lower than my 18 year old cousin even though you can probably safely drive faster because you have more exprience.

And therein lies the problem. Letting people arbitrarily decide when it’s appropriate is dangerous especially on a team of inexperienced developers who by definition don’t know what they don’t know. But maybe just as much on a team of experienced developers because they believe they know everything ;-P

In light of that I think it’s fair to say I want you (and myself) to have to “justify” the code change with a breaking test of some kind (preferably automated) if at all possible.

Take the example of double-checked-locking. If I see it, I want to remove it because it’s a broken pattern. But I don’t want to have to write a test for it cause that’s pretty involved (like writing thread-safety checks!) So instead we have a machine identify them and on the basis that we all agree it’s a bad thing AND the test for it (ie checkstyle) barfs, we are all happy to change the code.

So, can we consider this to be a test? If so, then I’m satisfied that I have a broken test and on that basis I am justified in making a modification to the code. After the change, the test no longer breaks and as long as no other tests broke as a consequence, we’re satisfied.

On the other hand, if we don’t consider it a test, where is your breaking test to change the code? And are we satisfied that we don’t need one. Again, I don’t believe you when you say “it just needs to be changed and it’s only a comment so it won’t break a thing.”

From my perspective, the code base is a picasso - if you think you need to change something, you’d better have a damn good reason to do so. And just saying you don’t like impressionist paintings isn’t one of them :-)

Being forced to justify a change forces you to think rigourously about the underlying problem. Irrespective of delivering functionality, at it’s core, that’s why we believe TDD produces good quality code. And a bonus is that when you pair with a less senior person, you’ll be forced to explain to them what you’re doing. In fact as a junior developer, I won’t let the senior partner in the pair get away with just telling me it’ll be ok. I force them to stop and explain what it is they’re doing AND make them write a test (first)!

And lastly, adding a test helps to ensure the issue will never return to the checked-in code base. Without the failing test, someone may decide to just change it back when they see a seemingly unecessary change or maybe even because they don’t understand the change. At least with a test, it will force them to (re)consider.

More Business Rules Goodness

By

I’ve always loved the idea of rule-based applications but never really had the opportunity to build one. And I have to say I’m having a lot of fun using a rules engine on this project. Since we dumped the BAL in favour of the IRL in JRules, productivity has sky-rockected. FWIW, TDD and rules-engines are a perfect match (a fact I’ll blog more about in the coming days). I’m in geek heaven! About the only thing I wish I had now was an IntelliJ plugin (like the AspectJ one).

So anyway, my intention is that step-by-step, I’ll try and document my progress starting with a little of what I discovered today by way of a rather contrived example:

rule MessageIsGeneratedOnSignificantStockMarketIndex {when {?index: Index();evaluate(index.value > 3000);} then {assert Message("Today the stock market rose above the psychological 3000 barrier");}}

This example says that whenever the stockmarket index rises above 3000, we assert that it was a significant event. (Actually JRules has some other nifty stuff to do with associating timestamps with events but I’ll blog about that another time.) The important thing to notice is the assert keyword. This asserts a new fact into the “knowledge base”. This new fact will remain “forever” or at least until another rule retracts it.

Simple assertions such as this are great when you know that the asserted fact will always be true independent of the triggering condition. In the example, it will always be true that at some point in time, the stock market index rose to a significant level, even if the index drops again.

But what if we have a fact that only holds while the condition holds? In such a case, we’d need a “compensating” rule to retract the fact when the condition changed. This could get quite ugly. Thankfully, JRules provides a neat solution:

rule SellOrderRaisedWhenStockValueReachesMinimum {when {?stock: Stock();evaluate(?stock.value >= 30);} then {assert **logical** SellOrder(?stock);}}

This rule says that we will place a sell order for any stock that rises to 30 dollars. The key difference here is the use of the logical keyword. This tells JRules that the assertion onlyholds while the triggering condition holds. That is, while the stock value is at least 30 dollars, the sell order remains. However, if the stock value drops below 30 dollars, JRules will automatically retract the fact for us. What’s even better is that if the assertion of the SellOrder causes other rules to fire and therefore assert more facts, all those that were declared as logical will all be retracted as well. How cool is that?!

In our application, probably >99% of all rules will use the logical form of assert. This allows quite complex interactions between essentially independent rules.

If you find yourself having to structure your rules with priorities and worrying too much about the interactions between rules, it’s likely your individual rules are doing too much. Ensure your rules should be as atomic as possible. Seperate “inference” rules from “do stuff” rules". And don’t be tempted to simply change the state (ie property values) of existing facts. Instead, always assert new facts (as we have done in the examples above).

I thought I’d also show you the same rules using JESS.

(defrule message-is-generated-on-significant-stock-market-index(index (value ?value))(test (> ?value 3000))=>(assert (message (text "Today the stock market rose above the psychological 3000 barrier"))))(defrule sell-order-raised-when-stock-value-reaches-minimum?stock <- (**logical** (stock (value ?value)))(test (>= ?value 30))=>(assert (sell-order (stock ?stock))))

You’ll note that in JESS, the logical keyword is associated with the condition (or LHS) rather than the action (or RHS).

In many ways JESS provides a richer environment than JRules but I admit the syntax is less obvious to novice users.

Tests and Refactoring

By

TDD means no writing production code unless there is a breaking test right? Most practitioners agree on this. However the topic of TDD in the context of refactoring came up in a conversation last night. One of my colleagues challenged that because refactoring wasn’t changing the behaviour just the implementation, it can largely be done without a breaking test.

I’m not convinced that is the case. If I think about why we refactor there really is only one possibility - something is wrong with the code. Otherwise, why would we bother. It seems to me this “wrongness” then breaks down into three main causes:

  • Missing or superfluous functionality;
  • Deprecated implementation; and;
  • Smelly code.

The first case is easy to justify writing tests for - clearly, any change in functionality necessitates a test.

The second is more subtle but again, clearly there was some reason for changing the implementation (such as persisting to a database versus in memory) in which case we would need to add some kind of test(s) to verify this.

The third case is a little harder. On what basis do I determine that the code smells and therefore needs a cleanup? In the specific case from last night, Dave had renamed an abstract class to AbstractXXX and extracted an interface. But he had written no extra tests. Admittedly, I usually write a test to ensure that my class implements the necessary interface(s) so he could have done that. However the reason he made the change in the first place was because a checkstyle check was barfing. Why was it barfing? Because it had detected a likely code smell. To my mind, this is a breaking test. It’s a test for code quality as opposed to functionality.

Unfortunately, not all code smells can be picked up by machine (though I’m trying very hard lol). But I thought it was interesting to challenge my own notions of what constitutes a test and, getting back to a previous entry (as is my habit), what kind of test is appropriate in various circumstances.

I also want to see if I can come up with many cases where it is justifiable to modify code without some kind of test to prove that the change is in fact necessary. If I can’t prove that a change is required (and what better way than with a breaking test) why would I make it?

Build Speed

By

I’ve inherited the poorest performing machine on the team. Only fitting I suppose as I’m probably the poorest performing member (it’s a pretty good team!) But the main problem I have is that it takes so long (between 10 and 17 minutes depending on how long it’s been since I rebooted windows) to do a pre-checkin build (to make sure I haven’t screwed anything up) that by the time it finishes, I really need to to another CVS update. This invariably brings back a number of changes that good old paranoid me thinks necessitates another local build. This vicious cycle continues until I give up and just check in my code.

Of course I then need to wait until the integration build on the cruise machine completes before I can be happy I’ve not broken anything. Now the chances are that there is already a build in progress and, on average, about half-way through. The integration build itself takes about 20 minutes all up (and that’s on a fast machine). So, it’s now about an hour by the time I’ve got anything checked in and I’m happy I can go home.

This morning, Andy Trigg removed the generation of the test reports from the local machine build as this was taking around 2-3 minutes on my machine and besides, it’s really only needed if the tests break right?

But that didn’t really help all that much and I finally cracked it and decided to take a look at what was causing the build to take so long. How long can it take to compile and run 1200 unit tests and around 250 functional tests? I mean sheesh! A lot of time and efforts has gone into ensuring that the unit tests are as close to “real” unit tests as possible and not just functional tests in disguise so they should be blindingly fast to run.

Closer examination of the build file revealed that the tests were being forked. This was because there is some bug in the build process that causes OutOfMemoryErrors. This was clearly going to slow things down but not much I can do about it as far as quick wins goes.

The next thing I noticed was that version 1.6 of ant has a new option on the JUnit task (reloading) that allows you to confugure if a new class loader will be used for running each test class. I quickly set this to false!

And lastly, I discovered that if I ran all the unit tests in IntelliJ they took around 30 seconds but run from the Ant script they take around 3 minutes!

Then I remembered playing around with test suites some time ago. I vagualy recalled that putting tests into a suite ran orders of magnitude faster than if you ran them one at a time. A few minutes using good old of find, sed, grep, etc. and I created a class named Suite that looked something like this:

public class Suite extends TestCase {
    public static Test suite() {
        TestSuite suite = new TestSuite();
        suite.addTestSuite(BlahBlahTest.class);
        suite.addTestSuite(FooBarTest.class);
        ...
        return suite;
    }
}

Then I changed the build script to run my suite instead of each individual file, making sure that I forked it just in case.

Well blow me down. The pre-checkin-build now runs in 3 1/2 minutes! That’s a full clean and rebuild plus unit and acceptances tests. Whats more amazing is the actual unit tests now take an amazing 9.4 seconds to run. A clean unit test build now only takes 1 minute and an incremental build a mere 30 seconds. Woohooo! I can’t wait to see how the cruise box performs now as it was already at least twice as fast as my machine.

The only real issue was then the fact that we would have to maintain the suite to ensure we didn’t leave out any tests. Stuff that I thought so I quickly whipped up an ant task to generate it for me and away we went. I also suggested we could remove the need to compile the suite by generating the byte-code directly to which James just frowned dissaprovingly and accused me of being “such a geek!” Maybe he’s right? hehehe.

Next week I’ll see if I can get similar improvements in the acceptance and regression tests.

Oh yeah the test report generation now takes a whole 1.5 seconds to run so it might make it’s way back into the local build again but I doubt it. Every second counts! :-)

More Broken Tests

By

Following on from my entry yesterday on broken tests I came across another interesting problem I see from time to time in peoples test code.

Here’s a very simple example to demonstrate. As far as I’m concerned it’s a big fat broken test:

public class FileBuilderTest extends TestCase {public void testHeaderIsWritten() throws Exception {StringWriter writer = new StringWriter();FileBuilder builder = new FileBuilder(writer);BufferedReader reader = new BufferedReader(new StringReader(writer.toString()));String line = reader.readLine();assertNotNull("too few lines", line);assertEquals(FileBuilder.HEADER, line);...}}&nbsp;public class FileBuilder {public static final String HEADER = "# Created by " + FileBuilder.class.getName();private final Writer _writer;public FileBuilder(Writer writer) throws IOException {if (writer == null) {throw new IllegalArgumentException("writer can't be null");}_writer = writer;writer.write(HEADER);}}&nbsp;

Seems pretty sensible right? We check to make sure that the header gets written to the stream. Putting aside the are tests for testing or design debate for a moment, one thing remains clear - a test should break if I change the class in anyway that breaks the intended behaviour.

Right?

If so, then this test is truly crippled. Don’t believe me? Well just change the value of header to some absolute garbage and re-run the test.

Did it break?

No. Why? Because the test never checks that what gets written is what is supposed to be written. Instead, it is simply checking that whatever header the FileBuilder has defined is what gets written. So anyone can inadvertently (or deliberately as we have done) change that header and the test never breaks.

I’d rather see this:

public class FileBuilderTest extends TestCase {public void testHeaderIsWritten() throws Exception {StringWriter writer = new StringWriter();FileBuilder builder = new FileBuilder(writer);BufferedReader reader = new BufferedReader(new StringReader(writer.toString()));String line = reader.readLine();assertNotNull("too few lines", line);assertEquals("# Created by " + FileBuilder.class.getName(), line);...}}&nbsp;public class FileBuilder {private static final String HEADER = "# Created by " + FileBuilder.class.getName();private final Writer _writer;public FileBuilder(Writer writer) throws IOException {if (writer == null) {throw new IllegalArgumentException("writer can't be null");}_writer = writer;writer.write(HEADER);}}&nbsp;

I recall one of the things I found disturbing when reading (one of) Mr. Becks TDD book being his suggestion that our first example is in fact good practice because we are removing duplication. Personally, I don’t care about duplication in tests nearly (if at all) as much as I do in production code.

An interesting variation I heard about involved a developer essentially coding the same algorithm in both the production class and the test so they could verify the results.

At any rate, hopefully you can see the similarity in both cases - the tests weren’t explicit about what the expected results should be instead using a “calculated” value. I’d much rather see the test explicitly state exactly what it expected to be written. That way if someone accidentally does say a find+replace within our FileBuilder code, we will catch it.

Broken Tests

By

As I mentioned, I’ve been doing some code cleaning up as I add new functionality to the system I’m currently working on. I have to agree with Andy Trigg and say the delete refactor is one of my favourites. It’s gratifying to replace great swathes of code with something much simpler and yet more functional. The best is when I refactor some code that allows me to delete a test. This is usually because responsibility for some behaviour has been smushed over a number of classes which leads to duplication among several test classes.

If it weren’t for all those failing tests, I’m sure I would have screwed up pretty early on. As I was fixing tests here and there, I noticed something interesting. Some of the tests were breaking due to assertion failures. Just what you would hope for. However, rather more were failing due to exceptions. Usually a NullPointerException. What’s worse is the NPEs were in the test case code, not code the test was calling.

Tests should assert expected behaviour. If you expect a return value not to be null, be explicit about it. If you expect an iterator to have more values, assert it. Don’t wait for the underlying code to blow up in your face unless you’re actually testing for that behaviour of course.

I also like to keep tests as small, simple and obvious as possible. If a test case checks for a limited set of conditions, its much easier to work out where to fix the code when it breaks. Heavily factored tests may appear small, but if you were to inline all the test code, you’d find that it’s definitely not simple nor obvious.

What I’ve found most interesting is that the worst tests are generally the ones written after the fact. The “oh yeah I’m gonna need to test this” kind. They stand out like the proverbial k9’s gonads. They’re usually long (the tests not the testes), difficult to follow, test more than just a few simple concepts and tend to be very brittle. That last point seems counter intuitive at first but is definitely backed up by experience.

So what’s the point? Tests aren’t just for keeping jCoverage happy! In fact crappy tests are a liability. They will end up being a maintence nightmare if left unchecked (pardon the pun).

Code Normalisation

By

I’ve been doing some incidental code cleaning as I implement functionaility for the system I’m currently working on. It’s something I really enjoy. Code quality is somewhat of an obsession. I often wonder if it’s actually worth it. But I dislike leaving “ugly” code lying around so I fix it anyway. I’m sure some of the developers wished I’d just shut up. Some of them have even taken to not letting me actually look at the code when answering a problem for fear that I’ll get distracted :-)

Last year, James Ross and I wrote some 20+ custom checkstyle checks (most have made it into the core distribution). They were brought about by a project we were both working on at the time. We would manually identify code smells and then set about developing some heuristics for automagically finding all of them.

I’ve been applying most of those same checks to this project and, I’m happy to say, although they turned up some “violations”, for the most part the code base is pretty clean. I still come up with more checks I’d like to implement when I decide to make the time and last night I made the time and wrote a very simple check for “missing abstractions” or more properly, opportunities to normalise.

There are clearly many ways we determine that there are missing abstractions but there is one thing I’ve noticed on many projects: code containing common prefixes and/or suffixes is often a good candidate for extracting a class. Some very simple yet recurring examples might be date ranges (dateFrom and dateTo) and addresses (addressStreet, addressCity, `addressPost``), etc. I’m sure you can think of other less obvious examples.

The check simply identifies clusters of instance variables or parameters that share a common prefix and/or suffix. When I ran it against the code base this morning I was amazed to find hundreds of examples! Probably 1/3rd were in constants: MINIMUM_AGE, MAXIMUM_AGE, etc. with the remainder largely instance variables and a small number of parameters.

After trawling through the results, I’d say I had probably found a dozen or more absolute, no-bones-about-it cases for refactoring which I thought was a pretty good result. What concerned me was there seemed to be a lot of noise. But then again, was it really noise? Would it not be better to have minimum and maximum values actually represented as a Range? If so, how far do you perform this kind of normalisation? As James pointed out, humans really do have a left and a right arm. Do we really need to make an Arms class with getLeft() and getRight()? Probably not. It could be argued though that constants sharing the same prefix/suffix such as _ACTION_NAME or DISPATCH_ are really just introducing an implicit namespace that might be better moved to a separate, explicit, enumerated type?

I’d be interested to know what you think and if you’d be interested in running the check against your own code to generate some feedback. To do so, download the jar file and add it to your build class path. Then, to run the check, simply add the following code fragment to your checkstyle configuration as a child of TreeWalker:

<module name="au.com.redhillconsulting.jamaica.tools.checkstyle.MissingAbstractionCheck"/>

You’ll also find some more checks (all in the same package) that have yet to make it into the checkstyle distribution:

  • ClassDataAbstractionCouplingCheck
  • ClassFanOutComplexityCheck
  • DuplicateLiteralCheck
  • FinalFieldCheck
  • GuardingIfCheck
  • NPathComplexityCheck
  • NullPointerCheck
  • ReturnFromCatchCheck
  • ReturnFromFinallyCheck

Naming Tests

By

Your business analysts have no doubt spent days coming up with test cases for an interest calculation. Writing it all out in nice, simple, plain english so you can understand what you need to implement. They’ve probably produced half-a-dozen scenarios for you to test against. Don’t go and spoil it all by naming your method testInterestCalculation. I want to see testInterestCalculationWithFixedRateFor24Months or something similar.

I know there will be those who say that it’s redundant information to encode into the test name. I mean just look at the code right? Phooey! Ever written a method named remove that actually added something instead? If the answer is yes, go directly to jail. Do not pass go. Do not collection a hundred dollars! If, like most sane people, the answer is no, that’s because we name methods to indicate intention. We then look at the code to see how it’s been implemented.

Ever spent what seemed like hours trying to come up with a good name for your class? Why bother? Again, because it’s so important to communicate not only to others but also to yourself the role of the class, it’s purpose for being. Without an appropriate name, it’s often hard to know what the classes responsibilities should be.

If I go onto a project and need to get an idea of what the system does, the first place I look is the test suite. It should tell me almost everything I need to know about the implementation.

So do me a favour and spend a bit of time naming your test cases something meaningful. Please! I beg you. It’ll stop me wasting your time asking lots of stupid questions ;-)

Software Ghettos

By

I was sticking my nose into a code review James Ross was performing the other day. Among other things, naturally, he was checking for code duplication. Now my informal analysis of open source projects and commercial projects I have been brought onto suggests that, a figure of anywhere between 10% and 20% code duplication is pretty standard. As horrifying as that may seem (to me at least), it was rather amusing to think that the final report might include words to the effect that, “your software falls within industry norms with respect to duplication of source code.”

This lead me to question what is acceptable. Why is it that one developers clean code is anothers quagmire? James suggested that over and above the obvious differences in experience and “smarts”, it may come down to how much “smell” we can tolerate. He likened it to cleaning the house - Some people (ok me!) can leave dishes in the kitchen for days while others (like James) apparently can’t go to bed if there’s even one dirty dish lying around.

James also recalled an article he had read on Broken window syndrome and we wondered if the same thing applied to software via accumulated technical debt? That maybe as the density of code smell increases, there is some point beyond which there is no escaping. A point beyond which your team subconciously stops caring anymore because “well the rest of the code is crap so I’ll just add this hack in here.”

Maybe that’s why Windows™ is so very broken? ;-)

I recall my mum being infuriated that as kids, we would simply walk past the rubbish or toys on the floor as if they weren’t there. She couldn’t understand why it never ocurred to us to pick up the Lego and put it away. Even now I find myself noticing a piece of paper on the floor and forcing myself not to just leave it.

So while your team might start to understand the value of design patterns, TDD, whatever, it’s really important to get them to be more sensitive to identifying and eliminating rubbish as they walk through the code and not just leave it there for mum to pick up.

JRules - A brief overview

By

As requested, my brief overview of JRules. I’m sure I will have failed to answer many questions so feel free to ask.If I know the answer, I’ll add it to the entry, othwerwise I’ll either cop out and say I don’t know or go and find out.As always, if I’ve made a mistake or three, feel free to point them out.

First off, I found JRules very easy to use. It comes with very good documentation. As I mentioned in aprevious entry, foremost in my mind was to prove testability. I can safely say that rules can be fullyunit tested. So there are no excuses! ;-)

Like many rule engines, JRules usess a Rete network. The cool thing about the Rete algorithm is that rules with similar conditions (or subsets of conditions) are identified and evaluated together meaning that theconditions need only be tested the minimum number of times. This differs from java `d rules where typically everycondition in every rule must be evaluated regardless of whether we might already have tested for it in previous rule.

The centre (no, this is not a spelling mistake) of the programmatic world in JRules is the rule engine (IlrContext) also known as a context. Thecontext holds a set (IlrRuleset) of rules (IlrRule). The context also holds facts. Facts are simplejava objects and, as the name implies, are the things we know about the state of the processing “world”. As you add(assert) facts about the state of your “world”, the rule engine updates the agenda. At any given point,the agenda holds the rules that have conditions matching the current state and are therefore ready for firing.

Once a rule has fired, it is removed from the agenda and will not be made available for firing unless it, or anotherrule, modify the current state such that the rule conditions would once again be satisfied. This is pretty neat as itmeans you can have rules assert facts that cause other rules to fire.

Rules can be assigned a priority to assist in determining the firing order. The specific ordering rules are clearlydocumented so I wont repeat them here.

Rules can be grouped into “packets”. Packets can then be referenced (and executed) by name. This can be very usefulif you have certain classes of rules that you know you will/wont need during processing and therefore wish toprogrammaticaly enable/disable at runtime. JRules 4.5 also provides a RuleFlow mechanism which is supposedlythe “preferred” method of achieving the same thing. I must admit that in our case, ruleflows didn’t seem appropriate so we have so far stuck with packets.

There is one more facility for filtering rules from the agenda. These firing filters are applied to rules whosconditions have already been evaluated and are in the agenda. This differs from packets and ruleflows which selectivelyadd/remove rules prior to evaluation.

Rulesets support in, out and inout variables which are then available to all rules withinthe ruleset. This is one mechanism you could use to parameterise your rules.

JRules out-of-the-box provides three languages in which rules can be written:

  • IRL - ILog Rule Language;
  • TRL - Technical Rule Language; and;
  • BAL - Business Action (Natural) Language.

I started off using IRL as it was the language given in the documentation examples. It’s a fairly low-level language.I think from a geeks perspective it’s the one to start with as it gives you a good understanding of what actually goeson. IRL can be written using the text editor of your choice. A simple example of IRL might look like:

when {?order: Order(product == Product.CHAMPAGNE);Customer(state == State.VIC; status == CustomerStatus.GOLD) from ?order;} then {?order.applyDiscountPercentage(15.00);System.out.println("Discount applied!");}

This essentially says, whenever an order for champagne is raised by gold customers in victoria, apply a 15% discountand print a message to that effect.

Each condition in the when clause is evaluated in the order declared. The conditions are AND’ed together.If the condition(s) are met, any statements in the then clause are executed when the rule is fired. Forcompleteness, there is also an else clause.

As you can probably see, the IRL is a Java-like language supporting, as far as I can tell, almost all usual javasyntax (including imports) plus the additional syntax required for describing rule conditions.

TRL is pretty much the same as IRL but is more verbose. For example:

when there exists an Order(product == Product.CHAMPAGNE)...

Strictly speaking, the BAL is actually an example language that demonstrates how you can buildyour own language on top of the underlying languages. Having said that, the BAL is almost feature complete and manybusinesses will choose it over the TRL and IRL. The intention is that business people can write in BAL. In actualality, I see this being about as likely to happen asbusiness people using Crystal Reports to write custom reports. If you can make it happen, great but I very much doubtit. However, BAL is much easier to read especially for business people and even for developers. So when, as a developer,you need to validate that the rule you’ve just written is correct, BAL is a god send. Even mildly complex rules arequite hard to follow in raw IRL and, for that matter, TRL.

BAL also requires a mapping layer between your Java objects and natural language. This isso that instead of writing order.getTotal() we can instead use the total amount of the order. On anagile project that is ramping up, I can see this as being an small but certainly not insurmountable issue due to theever changing domain model.

JRules comes with a GUI Editor that allows you to maintain rules in any of the supported languages. While it ispossible to create rules by hand in IRL and stored as plain text files, TRL and BAL require the GUI. The rules arestored in a repository (directory structures) that can then be exported to an IRL file for use by an application. Again,Not so much of a problem really but being forced to point and click when I really just want to type is very frustrating.If I could ask for one thing, it would be the ability to edit BAL rules in my editor of choice and then import them intothe repository.

BAL and TRL are both converted to IRL. IRL can be converted back into TRL. Unfortunately theconversion from BAL to IRL is a one-way street. Taking a look at the generated IRL was quite an eye-opener. The ruleeditor converts BAL into IRL that is almost impossible to understand by a human.

From a purely technical perspective, it matters little what language the rules are written in as they end up as IRLwhich is then parsed and executed by your application. Loading and running a test requires only one jar file(jrules-engine.jar) and can be as simple as:

public void testEngineIntegration() {// Define a simple rulefinal String rule = "rule TestRule\n" +"{\n" +"    when\n" +"    {\n" +"        not String();\n" +"    }\n" +"    then\n" +"    {\n" +"        assert Integer(57);\n" +"    }\n" +"};";// Parse the ruleIlrRuleset ruleset = new IlrRuleset();assertTrue("Rule parsing failed", ruleset.parseReader(new StringReader(rule)));// Create a context (rule engine)IlrContext context = new IlrContext(ruleset);// Execute the rule and ensure it was actually runassertEquals(1, context.fireAllRules());// Check the final state matches the expected stateEnumeration enum = context.enumerateObjects();assertNotNull(enum);assertTrue(enum.hasMoreElements());assertEquals(57, ((Integer) enum.nextElement()).intValue());assertFalse(enum.hasMoreElements());}

You can also load rules directly from the repository if you wish. So far we haven’t attempted this and for the timebeing at least it’s not a high priority as we will be loading the ruleset from the exported script.

The repository, apart from being a pain to use, has some other mildly annoying issues when working in an agile team.These are mainly to do with locking rules for editing. The rule editor allows you to create rule packages (not to beconfused with packets) and it is only at these levels that locking is possible. Again, not too much of an issue.Besides, you will quickly find your rules unmanageable if you don’t group them at all. Packages are also useful if/whenwriting ruleflows. Remember however that these packages are purely logical and affect only the names of the generatedrules. Something you will most likely rarely have to worry about anyway.

Generally, I would recommend keeping your rules as simple as possible. If your find your rules blowing out to four pages, stop and think again. Rules can and should be decomposed into smaller rules. James Ross has been reading a great book on the subject, Principles of the Business Rule Approach, which he says is a must read for anyone embarking on business rules analysis.

re: Spiking and TDD

By

Jon - Six Million Dollar Man - Eaves raises some interesting points on spiking versus TDD. A very good summary on a tried and true approach for learning new stuff. While I agree totally with the article (I do this myself quite often so it must be good hehehe), I’ve really started to try and use JUnit as a mechanism for doing spikes as well.

Just recently, as part of a new 3-month gig (many thanks to David Pattinson and James Ross and much to the obvious displeasure of one who shall remain nameless) I started doing some investigation into ILog JRules, something I had never used before. My primary responsibility was to mitigate any technical risk associated with using JRules. Naturally, testability was one of the main concerns for the team.

Now usually in this kind of scenario, one would usually whip up a quick-and-dirty main method, and start writing some code to both learn about JRules and to prove what it can/can’t do. Instead, I started off by writing a test. Of course I had no idea what the test would do but I wanted to force myself to do it anyway.

At first it was a very simple test to work out what jar files I would need to just get the engine up and running. Then as I had more and more dicsussions with various team members about what our overall requirements would be and we thrashed out the different ways we could achieve the same thing with JRules, I simply added more and more sophisticated tests to test out our assumptions.

Doing my spiking as fully-blown tests, forced me to think much more critically about what I was doing rather than just spew stuff to stdout and look at the results. I was forced to clearly state my assumptions and the tests proved conclusively if these assumptions were right or wrong.

The great thing now is that we have a full compliment of integration tests that both prove our assumptions about, and demonstrate our chosen use of, JRules. This will be extremely valuable when the rest of the team start to write their own rules as part of implementing user stories. It also means we can happily upgrade to newer versions of JRules if/when they are made available with much less worry about breaking our code. This is of course why you all write integration tests isn’t it ;-)

A few days ago, James and I did much the same thing when doing some swing development. I can’t say I’ve perfected it. I still do some debugging and println but in the context of writing a test. But, I’m forcing myself to be as “pure” about it as I can to see how far it’s practical to push the technique.

As a complete aside, I found JRules pretty damn easy to use and test. The documentation is very good. If anyone is interested, I’ll post a brief overview.

It wasn't me, I was framed!

By

People who know me well, know well (and it’s not even christmas) my disdain for frameworks in general. I must have missed it but at some point the ideal of component based development was hijaked by the framework. Was this because it didn’t actually work or because someone decided to play a cruel joke on us to see how long we, as an industry, could keep going around in circles re-inventing the proverbial wheel?

Almost every large-scale project I’ve ever seen starts of with lofty goals of re-using the framework dejour and soon realises that it just wont cut it. Then comes the process of deciding if it is better to stick with it and forego the application you really want/need, roll-your own for maximum flexibilty or bastardise it so as to render your application largely incompatible with any future releases of said framework.

Sourceforge is littered with every-man-and-his-dogs attempt to build a better mousetrap. And they’re just the open sourced ones. I dread to think how many companies have their own flavours!

I’ve had much success building applications from scratch with no framework, just some user stories and test cases. Then letting the infrastructure requirements emerge and hunting around for libraries that will satisfy my needs. This seems to work much better than the times I’ve tried to decide up front which frameworks and tools I’d like to use, invariably leading to more work than it was worth. The key point here is identifying real requirements and addressing them. “We will need struts” is not a requirement!

Off the top of my head, I can think of maybe half-a-dozen libraries that I’ve been using regularly that don’t lock me into someone elses idea of how my application should be structured. Some for persistence, some for rendering, others for business rules, etc.

Sometimes there are so-called “strategic” reasons for using a particular tool. Unfortunately, the average whiteboard architect has usually been drinking way too much Gartner Koolaide. To be re-usable, software components should solve a particular problem without trying to dictate my overall design. And more importantly, they should solve the problem well.

Yikes! Is that bile I can smell? Hmmmm