Comparing Collections

By haruki zaemon

After a long week, Achilles finds he has too much time on his hands. His friend the Tortoise takes pity and indulges him with a bit of IM’ing.

Achilles: I’ve done nothing but read blog entries this weekend.

Tortoise: You must be bored! Anything interesting?

Achilles: I just read an entry that reminded me of some stuff I refactored during the week.

Tortoise: Do you ever get any real work done?

Achilles: Now that Java has a LinkedHashSet, can you think of any reason to use a simple List except for “performance” reasons?

Tortoise: Won’t it just look like a List?

Achilles: Sort of but, importantly, it’s also a Set. Since when do you actually mean to add the same item to a List more than once? I’m being pedantic here.

Tortoise: But it happens, life is full of duplicates.

Achilles: I’m sure it does but I can’t think of many examples where that’s actually what you want. It just seems that too often people use Lists when they should actually be using a Set. Clearly, Lists are useful but ArrayList has to be the most abused Collection class around.

Tortoise: People generally think in terms of Lists - it’s a simple concept.

Achilles: Yes, but people also think that AND and OR mean exactly the opposite. What we think and what we mean aren’t always the same and programming is about expressing what you mean.

Tortoise: Do people really think about the correct Collection type to use?

Achilles: No, they probably don’t but they should.

Tortoise: I try to but I can’t guarantee that I won’t be lazy and default to ArrayList.

Achilles: Exactly! And then the code ends up iterating over stuff and assuming a particular order on things that have no order. I see it all the time and this damn CollectionUtils.isEqual(Collection, Collection) just makes it worse. It’s ludicrous. It basically allows you to compare a List with a Set and see if the contents are the same. Which is just wrong! A List and a Set are not the same thing. They are semantically very different and thinking that it’s just a matter of comparing the contents is, IMHO, flawed.

Tortoise: Which takes us back to your original question - if you want to allow duplicates then you can’t use a Set, so when would you want to allow dups?

Achilles: Very rarely, I suspect. In fact, how often do you ever want to allow duplicates and how often does order really matter? Part of the problem, I think, is a misunderstanding of what equals(Object) actually means. It implies substitutability and therefore must be symmetric. But many people don’t realise that their equals(Object) method isn’t, so we end up with a.equals(b) but !b.equals(a).

Tortoise: I’ve not seen that happen.

Achilles: It usually happens with inheritance and using instanceof instead of class comparison.

Tortoise: You must have looked at a lot of shitty code!

Achilles: You mean you can’t tell? Why do you think I bitch so much :-)

Tortoise: Ok, what if I have a situation where it is possible to have more than one object of the same type and content? That Collection could not be stored in a Set, correct?

Achilles: Correct. So you just want a Collection, not a List. I repeat NOT A List.

Tortoise: Then what implementation class do I use?

Achilles: The implementation can be a List but the variable should be a Collection, as in Collection things = new ArrayList(); because a List implies ordering and so far you haven’t mentioned anything about order being important.

Tortoise: Ok, so then I decide that ordering is important.

Achilles: Sure, make it a List, but the key thing is that you don’t just assume that order is important because then people will try and write tests assuming something about the order and then they’ll build screens assuming something about the order, etc. etc.

Tortoise: I’ve just remembered… I added a method to compare Collections (for that domain object) to see if there had been any changes - there is no check to see if they are the same implementation of Collection so I could be iterating over a List and a Set.

Achilles: Why can’t you just call Collection.equals(Object)? That’s what it’s for.

Tortoise: On the Collection?

Achilles: Yes. I see people writing “convenience” methods for comparing Collections all the time when they already have an equals(Object) method that does a perfectly good job.

Tortoise: I assumed that it wouldn’t do a deep comparison.

Achilles: It iterates over the contents, calling equals(Object) and/or checking object identity (whatever is appropriate for the Collection). I use assertEquals(Object, Object) on Collections all the time.

Tortoise: Hmmm, that didn’t get picked up in the tech review.

Achilles: Probably because everyone on the project uses CollectionUtils.isEqual(Collection, Collection)!
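
For anyone who wants to try the points from the conversation, here is a minimal sketch using only the standard java.util classes. The class names CollectionEqualitySketch, Point and ColourPoint are made up for illustration and are not from the project being discussed; the Point/ColourPoint pair is just one assumed way an instanceof-based equals(Object) can stop being symmetric once inheritance is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CollectionEqualitySketch {

    // Hypothetical base class: equals(Object) accepts anything that is "instanceof Point",
    // which includes subclasses.
    static class Point {
        final int x;
        Point(int x) { this.x = x; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            return x == ((Point) o).x;
        }
        @Override public int hashCode() { return x; }
    }

    // Hypothetical subclass: its equals(Object) only accepts other ColourPoints.
    static class ColourPoint extends Point {
        final String colour;
        ColourPoint(int x, String colour) { super(x); this.colour = colour; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof ColourPoint)) return false;
            ColourPoint other = (ColourPoint) o;
            return x == other.x && colour.equals(other.colour);
        }
        // hashCode is inherited from Point.
    }

    // A List and a Set holding the same elements are never equal, in either direction.
    public static boolean listEqualsSetWithSameContents() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
        Set<String> set = new LinkedHashSet<>(list);
        return list.equals(set) || set.equals(list);
    }

    // List equality depends on element order.
    public static boolean listsEqualRegardlessOfOrder() {
        return Arrays.asList("a", "b").equals(Arrays.asList("b", "a"));
    }

    // Set equality does not depend on order, even for an insertion-ordered LinkedHashSet.
    public static boolean setsEqualRegardlessOfOrder() {
        Set<String> ab = new LinkedHashSet<>(Arrays.asList("a", "b"));
        Set<String> ba = new LinkedHashSet<>(Arrays.asList("b", "a"));
        return ab.equals(ba);
    }

    // Returns { p.equals(cp), cp.equals(p) } for a Point and a ColourPoint with the same x.
    public static boolean[] instanceofEqualsBothDirections() {
        Point p = new Point(1);
        Point cp = new ColourPoint(1, "red");
        return new boolean[] { p.equals(cp), cp.equals(p) };
    }

    public static void main(String[] args) {
        System.out.println("List equals Set, same contents: " + listEqualsSetWithSameContents());
        System.out.println("Lists equal regardless of order: " + listsEqualRegardlessOfOrder());
        System.out.println("Sets equal regardless of order: " + setsEqualRegardlessOfOrder());
        boolean[] both = instanceofEqualsBothDirections();
        System.out.println("p.equals(cp): " + both[0] + ", cp.equals(p): " + both[1]);
    }
}
```

The first three methods only exercise the equals(Object) that List and Set already provide, which is the comparison Achilles recommends over a hand-rolled helper. The last one shows the asymmetry he describes: the Point accepts the ColourPoint, but not the other way round.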