Earlier today I learned that ClearBooks had ‘unexpected downtime.’ Whenever you see that phrase you should substitute: ‘total cock up.’ To its credit, ClearBooks tried to communicate the issue on GetSatisfaction but in doing so exposed fundamental weaknesses of which buyers should be aware. From the GS trail:
We are currently experiencing a problem with our database server.
Downtime is currently unforseeable and could be for several hours.
Unfortunately, this has caught us out with only one system admin on site. Two system admins were typically on the road for a meeting this morning and are now retracing their steps fast to assist and get this issue resolved.
We had two local powercuts last night in our datacentre which is located in Aldershot, Hampshire. On both occassions the backup generators kicked in to keep serving up the site. This may or may not be related to the problems we are having now.
Worst case scenario as it currently stands is that we will have to set up a new db server and restore backups from two days ago.
I have previously called out ClearBooks for overdoing the marketing hype. Now it seems hype meets reality. This is a disaster scenario. Andrew Taylor immediately responded:
As a sys admin I’m not used to worrying inappropriately so we’ll wait to see how this plays out, but, this part of your update concerns me:
“Worst case scenario as it currently stands is that we will have to set up a new db server and restore backups from two days ago.”
Surely you have more recent backups than 2 days ago; I’m suprised you don’t have a replicated system which you can fail-over to but I’m shocked that your most up-to-date backup is 48 hours old.
I use Clearbooks because I don’t have a book-keeper and need something which helps me get my admin and invoicing done quickly; I don’t have the time to re-do all that work again.
I can live with the site being unavailable, I can’t live with losing my work.
He is right to express such concerns. The original problem was reported via GS at around 11.00 am (according to my timeline). As of 40 minutes ago, the situation was:
The RAID rebuild is about 75% complete – we can’t be sure if this will work, so we are also still working on plan B to restore from our off site backups. Our preferred option is to restore from the RAID rebuild as it should result in more data being recovered. This looks like it will finish around midnight, although we can’t be certain. If this route fails, then we will use the backups from Sunday (taken between 00:00 a.m. and 06:00 a.m)..
I wouldn’t recommend anyone stays up to wait for this tonight – but hopefully it should be ready working for you in the morning.
Rephrased: we think we know the issue but in fact we don’t know whether our current fix proposal will work. Customers do not yet know whether data has been lost or the extent to which the database is damaged.
So let’s do some analysis here. SaaS should ‘just work.’ Failure rates are generally an order of magnitude lower than on-premise systems because SaaS providers have to build in fault tolerance and resilience from the get go. BUT – if the SaaS provider doesn’t fully understand the implications of what it is building then any large scale failure wipes everyone out. That WILL happen from time to time. The question becomes: what is the provider doing to ensure minimal disruption, and what about potential data loss?
I recall FreshBooks had a catastrophic failure. It communicated the problems to users in clear terms, explaining that at worst, there would be a 32 minute data loss in the window when things went pear shaped. Customers lived with that and praised FreshBooks for both communicating and understanding the issues. Right now, no-one knows how much data has been lost from this ClearBooks issue. Hence the concerns of those who are commenting on a long ClearBooks GS thread. Right now, ClearBooks is looking a lot like a bunch of fake SaaS amateurs.
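To put the difference in perspective, here is a rough sketch in Python of the worst-case data loss window: the gap between the last good copy of the data and the moment of failure. The failure timestamp and the function name are made up for illustration; only the 32 minute and two day figures come from the accounts above, and nothing here reflects how either vendor actually runs its backups.

```python
from datetime import datetime, timedelta


def data_loss_window(last_good_copy: datetime, failure: datetime) -> timedelta:
    """Return how much work is at risk: everything entered after the
    last good copy of the data and before the failure."""
    if last_good_copy > failure:
        raise ValueError("last good copy cannot be later than the failure")
    return failure - last_good_copy


# Hypothetical failure time, chosen only for illustration.
failure = datetime(2010, 1, 5, 11, 0)

# The two cases discussed in the post: a copy 32 minutes old
# versus a backup two days old.
scenarios = {
    "copy 32 minutes old": failure - timedelta(minutes=32),
    "backup two days old": failure - timedelta(days=2),
}

for label, last_good in scenarios.items():
    window = data_loss_window(last_good, failure)
    hours = window.total_seconds() / 3600
    print(f"{label}: up to {hours:.1f} hours of work at risk")
```

The arithmetic is trivial, which is the point: a provider that knows when its last good copy was taken can tell customers the worst case straight away.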
There are technical underpinnings to the ClearBooks story that need understanding. However, I think the more important point comes in the question of standards. In recent times I have been involved with discussions around the proposed Cloud Industry Forum (CIF) code of practice. I have advocated strongly for industry business standards while at the same time calling CIF to account for muffing its efforts, largely on the grounds that it is a vendor-led initiative. They are trying to fix that and for that I give them a partial pass. As the ClearBooks story unfolds, it is interesting to note two tweets:
Roan Lavery – FreeAgent talking about the ClearBooks problem:
@DuaneJackson Problem is that reflects badly on the whole industry.
…and Duane Jackson, Kashflow responding:
@roanlavery agreed. Makes accreditation schemes more appealing. I was a skeptic until today. (cc @garyturner )
It is especially gratifying to see Duane see the value in establishing standards. Customers need to draw comfort and industry standards can provide that. BUT – they require end user input. CIF understands this and is the only body attempting to get something positive done. Regardless of their past failures, they are listening and trying to act. Bottom line: if you’re not part of the conversation and are not performing qualified due diligence then don’t bellyache when things go wrong.
For more on this topic consider attending the ICAEW Cloud Computing for Accountants meeting on 24th September. I shall be speaking on these topics as part of my selection criteria talk. This incident has now become part of my presentation because as my good friend Frank Scavo says: ‘Just cuz you’re SaaS doesn’t mean you get a pass.’




