Friday 31 May 2013

Book of the month: Inviting Disaster (Lessons from the Edge of Technology) by James R. Chiles


"Inviting Disaster" has a number of praise quotes on the first page; the quote that made this book seem relevant to a simulation and human factors audience was:
"The lessons are clear and disconcerting: a sequence of minor human errors when combined with elementary technical problems can lead to unimaginable catastrophe."
From a human factors perspective the book gets off to a bad start but redeems itself in later chapters. The 'Special Introduction' to the paperback edition deals with the 9/11 terrorist attacks on the Twin Towers in New York. It diverges from the rest of the book by not focusing on human error and complexity. Chiles could have discussed the attackers' movements and training in the US and considered the lack of communication between federal agencies (human errors) and the difficulty of tracking thousands of potential suspects (complexity). Instead this special introduction looks at the towers' design, construction and evacuation. With regard to these factors there wasn't really a sequence of minor errors, but rather a failure to comprehend that airplanes might be used as terrorist weapons.

After the disappointment of the special introduction, the book begins to redeem itself with the first chapter. Chiles describes the sinking of the Ocean Ranger oil rig off the Canadian coast in 1982. The specifics are unfamiliar but the types of error lining up to cause the sinking are familiar ones. They include:


  1. Equipment design


    1. The eight legs which support the oil platform are so big that the designers decided to use the space inside them for storage and other functions, such as the ballast control room (see below). Unfortunately the legs are also closer to the water than the platform, which means they are more likely to be hit by a wave. Also, because the legs are being used for storage, the space inside them could flood during a storm. There is a particular risk of flooding of the four corner legs because they have five-foot holes in them for bringing wire ropes and chains on board. The conditions required for water to enter these holes (severe tilting of the platform in bad weather) are considered an 'impossibility'. Additionally, there is no alarm to warn the crew that water is entering the rig legs.
    2. The ballast control room allows the operator to control the pumps which move the seawater ballast between the ballast tanks and the ocean. There are 16 tanks but only three pumps and this means that valves have to be set to certain positions in order to make sure the correct pump is connected to the correct tank. Under crisis conditions it might prove difficult to work out which pump was working which tank.
    3. The ballast control panel is not protected against seawater. There is a cutoff switch to stop electrical short-circuits from opening and closing the valves automatically; however, this switch also disables all the electrical displays. Once the cutoff switch is active there is no way of knowing the valve positions, pump state or depth of ballast within the tanks. (A rough sketch of this pump-and-valve logic, and of what the cutoff does to the operator's picture of it, follows this list.)
  2. Training

    1. In order to qualify for the job of ballast control operator, a worker had to spend "several hours each day hanging about the control room and watching over the ballast operator's shoulder" (p.27). There had been a training programme for new operators but this had ended some time previously. This lack of training meant that the ballast control operators did not know how the pumps worked, or that the pumps would not work if the tank to be pumped was much lower than the pumping chamber (as occurs when the rig is listing).
    2. There was no training for evacuation during a storm. Evacuation drills occasionally consisted of counting the people on the rig; at other times the crew would lower the lifeboats in calm conditions.
  3. Human error
    1. The glass portholes of the ballast control room can be protected by steel storm covers. These were not put in place before the storm.
    2. The most knowledgeable ballast control operator on the rig had found out about an emergency workaround which allowed the pump valves to be controlled without electrical power. This undocumented and untested procedure would open specific valves; unfortunately, the ballast control operator thought that it would close them instead.
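
The human factors problem with this panel is easier to see in code. Below is a minimal, hypothetical sketch of the kind of pump-and-valve routing described above; the tank and pump names are invented for illustration and this is in no way the actual Ocean Ranger control logic. The point is simply that three pumps shared across sixteen tanks via manually set valves is already hard to reason about, and that throwing the cutoff switch removes every source of feedback at once.

```python
# Hypothetical sketch only: illustrative names, not the real control system.
TANKS = [f"tank_{i}" for i in range(1, 17)]   # 16 ballast tanks
PUMPS = ["pump_A", "pump_B", "pump_C"]        # only 3 pumps shared between them

class BallastPanel:
    def __init__(self):
        # Which tank (if any) each pump is currently lined up to via the valves.
        self.valve_lineup = {pump: None for pump in PUMPS}
        self.displays_live = True             # the cutoff switch blanks these

    def set_valves(self, pump, tank):
        """The operator manually lines a pump up to a tank by setting valves."""
        self.valve_lineup[pump] = tank

    def activate_cutoff(self):
        """Stops short-circuits from driving the valves, but kills every display."""
        self.displays_live = False

    def read_panel(self):
        """What the operator can actually see."""
        if not self.displays_live:
            return "NO INFORMATION: valve positions, pump states and tank levels all unknown"
        return dict(self.valve_lineup)

panel = BallastPanel()
panel.set_valves("pump_A", "tank_3")
panel.activate_cutoff()        # seawater shorts the panel and the cutoff is thrown
print(panel.read_panel())      # the operator is now working the rig blind
```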

During the storm on the 14th February 1982, a powerful wave smashes one of the ballast control room portholes and seawater splashes onto the control panel. Because the equipment is short-circuiting, the cutoff switch is activated. Although not ideal, the rig is in a safe, 'holding' position: the valves to the tanks are all closed, the anchors holding the rig are secure and the rig is sitting straight in the water. In this condition, the rig would not have sunk.

Unfortunately the crew decide at some stage to reconnect the power to the control panel. Electrical short-circuits, and possibly trial-and-error attempts by the crew, allow water to enter the bow section via open valves. The emergency workaround is used at some stage, which leads to additional valves opening. Eventually the rig is tilting to such an extent that the 'impossible' event occurs: water enters the five-foot holes in the corner leg on the port bow. As more water enters the rig, the listing worsens until even small waves can reach the hole.

At 01:05 AM the rig's support vessel is radioed to approach the rig, which is now being abandoned. Unfortunately the 80 mph winds cause the lifeboats to smash against the side of the rig and crack open. Additionally, once the lifeboats reach the water, the ropes securing them to the rig can only be released when there is no tension on the rope. This means that the lifeboats cannot get clear of the rig and continue to smash into it. When the support vessel reaches the rig at 02:00 AM, only one lifeboat is still afloat. The eight men inside must have thought they were saved. Unfortunately, as they try to climb onto the supply ship's deck, they tumble from the pitching lifeboat and are swept away. The support vessel has no gear to drag the men on board and the men are too immobilised by cold to climb into the support vessel's life rafts. A total of 84 people die.


The subsequent chapters of Chiles' book detail events which are varied in time, place and industry, but similar in causation. They include the shutting down of the wrong engine on a Boeing 737, the Three Mile Island nuclear accident, the R.101 airship and the Challenger disaster, among others. The in-depth analysis of each incident is one of the book's great assets but also its major weakness. At times the description of an incident is so long that the reason for mentioning it in the first place is forgotten. There is also a lack of logical progression from chapter to chapter, so that at times the story seems to drift from one disaster to the next without an underlying structure.

However, Chiles does offer some suggestions for reducing human error. In a reference to unknown unknowns he states:
"When meeting a new system, people need time to know its workings under good conditions and bad. The most dangerous time is when the operators don't know what they don't know" (p.39)
Chiles also describes the benefits of a "fresh pair of eyes" in relation to the Three Mile Island accident (p.61) and the importance of situational awareness and communication.

Most importantly Chiles mentions the characteristics of a high-reliability organisation (HRO) (p.62):
  • A priority on safety from top to bottom
  • Deep redundancy so the inevitable errors or malfunctions are caught in time
  • A structure that allows key decisions at all levels
  • A premium on learning lessons from trials and errors 
  • Workers who keep their skills sharp with practice and emergency drills
It is in relation to this last bullet point that simulation can help to build an HRO. Simulation allows repeated, coached practice of routine and emergency drills.

In summary, read this book for its interesting stories of disasters and easy prose, but if you are looking for an in-depth discussion of human factors or human error then skip this one.

Tuesday 28 May 2013

Human error, boats and tigers.

In the Man Booker Prize-winning "Life of Pi", Yann Martel tells the story of a boy who survives a shipwreck to find himself sharing a boat with the only other survivor, a tiger. This blog is not about that book.

However it is about a boat, a tiger and... human factors.
When the Costa Concordia ran aground off the coast of Italy on the 13th January 2012, 32 people died. The waters were well-charted and calm. The weather was clear. One had to wonder how this cruise ship, with over 4000 passengers and crew, managed to hit rocks and become partially submerged. The company which owned the cruise ship issued this statement on the 15th January:
"It seems that the commander made errors of judgement that had serious consequences: the route followed by the ship turned out to be too close to the coast, and it seems that his decision in handling the emergency didn't follow Costa Cruises' procedures which are in line, and in some cases, go beyond, international standards."
The CEO of Costa Cruises on the 16th January told the world media that the captain made an "unapproved, unauthorised" deviation in course. Lloyd's List published an illustration, widely distributed, which showed how far off the "correct" course the ship had been.



The facts seem very clear: the captain of the Costa Concordia deviated from company protocol and somehow managed to ram his ship into rocks. However, a later illustration, also from Lloyd's List, starts to raise more questions. This illustration shows that on a previous voyage the Costa Concordia actually deviated further from its route than on the day it sank.




According to this BBC review of the incident, the captain committed a number of errors. However, there are other factors to consider:
  • Why did the helmsman not question the captain's decision to come so close to shore? What are the power gradients on board large cruise ships? Can any member of the team voice a concern and expect it to be listened to?
  • What are the cruise owner's protocols for dealing with violations? According to Richard Meade, the editor of Lloyd's List, the owner would have known about the route taken in August.


Leaving the boat behind, we turn to the tiger and the tragic case of Sarah McClay. Ms McClay was a zookeeper who died on the 24th May this year after being mauled by a tiger. A statement on the zoo's Facebook page on the 25th May explained:
"...from the investigations that have taken place it is clear that this tragedy was caused by a sad error of judgement and breach of protocols, in essence keeper error. This is not blame, it is not anything but defining the facts as they appear. This does not mean Sarah killed herself on purpose it means simply she died from her own tragic mistake."
The BBC on the 25th May reported the zoo owner, David Gill, as saying:
"We have very strict protocols and procedures for working with big cats, but it seems she failed to follow correct procedures. For inexplicable reasons she opened a door and walked into the enclosure. We will never know why she entered without telling anyone. There was no reason for her to go in there."
And on the same webpage Mr Gill is quoted as saying: "It would not do any good to close the park as there is no safety issue."

However, the most recent news report now suggests that Sarah McClay did not walk into the enclosure. The tiger dragged her into it from a staff area, to which the tigers are not supposed to be able to gain access.

The parallels with the Costa Concordia incident are clear. From an initial blaming of the individual, the widening investigation starts to reveal both human and technical errors. These errors form a chain of events which results in the loss of life. In the end, whether it's boats, tigers, or healthcare, we need to move beyond the scapegoating of individuals. In a complex system, we need to focus instead on the series of mishaps which culminate in a catastrophic error.

Updates:

On the 11th February 2015 Costa Concordia captain Francesco Schettino was sentenced to 16 years in jail for multiple manslaughter. According to the maritime trade union Nautilus, "There has been an absence of meaningful action to improve safety in response to the Costa Concordia accident, and this trial has simply served as a distraction from the important underlying issues." The ship's operator, Costa Cruises, paid a 1-million-euro fine in April 2013 to settle potential criminal charges, and in the February 2015 court case the company and Schettino were jointly ordered to pay £22,000 to each passenger.

On the 18th September 2014 an inquest into Sarah McClay's death concluded that the tiger was able to reach her through a door which should have been locked, and that the door may have been faulty.

Friday 10 May 2013

Decisions, decisions, decisions...

Have I got time for a cup of tea before work?
Do I need a haircut?
Left sock then right sock or right sock then left sock?

The majority of the decisions we make are trivial, having only a small effect on our lives or the lives of others. However, at times we make decisions which do affect others.

Will I drive home after a few beers?
Am I alright to have the last cigarette of the day in bed?
Should I alert somebody to that suspect package?

Although these decisions may have greater and graver consequences, the answers are still straightforward (No. No. Yes.) The decisions that we see being made in the simulation courses are of a different type. They are often time-sensitive, the consequences are poorly understood and the number of possible decisions can be colossal.

In her book "Sitting in the Hot Seat", Rhona Flin describes the traditional decision making (TDM) process (p.141):

  1. Identify the problem
  2. Generate a set of options for solving the problem
  3. Evaluate these options concurrently using one of a number of strategies
  4. Choose and implement the preferred option
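
To make the contrast with what follows concrete, here is a minimal sketch of those four steps as a generate-and-compare procedure. It assumes we can enumerate the options and score them with some evaluation function, and that there is time to do so; all of the names are illustrative rather than taken from Flin's book.

```python
# A toy rendering of the TDM steps; every name here is illustrative.
def traditional_decision_making(problem, generate_options, score):
    options = generate_options(problem)                    # 2. generate a set of options
    evaluated = {o: score(problem, o) for o in options}    # 3. evaluate them concurrently
    return max(evaluated, key=evaluated.get)               # 4. choose the preferred option

# Toy usage: step 1 (identifying the problem) has already happened.
choice = traditional_decision_making(
    "get to the airport",
    generate_options=lambda p: ["taxi", "train", "drive"],
    score=lambda p, o: {"taxi": 0.6, "train": 0.8, "drive": 0.5}[o],
)
print(choice)   # "train" - fine when there is time for an exhaustive comparison
```
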
Unfortunately, TDM is not applicable in the pressured, time-critical, life-threatening environment of a healthcare crisis. It is here that naturalistic decision making (NDM) occurs. Although there are a number of NDM models, they are similar to the extent that there is no step-wise progression from problem to solution. Instead, problem identification, option appraisal and problem solving occur almost concurrently and cyclically.

The principal NDM model is recognition-primed decision making (RPD). In RPD the decision-maker is thought to try to recognise the situation, choose the correct response and then implement this response. Flin (p.145) describes the three variants of RPD, although Wikipedia does a better job of making them easy to remember:
  1. If... then... : The standard response as detailed above
  2. If??? then... : The situation is unclear/not recognised. The decision-maker knows a number of correct responses but does not know which one to use. The decision-maker must gather further information.
  3. If... then???: The situation is clear but a lack of knowledge means the decision-maker does not know which response to choose. The decision-maker must mentally simulate the consequences of a given response and consider whether these are acceptable. If they are not, then the decision-maker must consider further responses until he/she finds one with the desired outcome.
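
As a rough way of holding the three variants in mind, here is a minimal sketch of an RPD-style loop. It is a simplification of the model, not an implementation of it, and every name (the cues, the 'known_responses' table, the clinical example) is invented for illustration. The key features are that recognition comes first, that an unclear situation triggers more information gathering (variant 2), and that candidate responses are mentally simulated one at a time until one is merely good enough (variant 3), rather than all options being compared.

```python
# Illustrative sketch of an RPD-style loop; not Flin's or Klein's actual model.
def recognise(cues, known_responses):
    """Return a known situation matching the cues, or None if unrecognised."""
    return cues if cues in known_responses else None

def rpd_decide(cues, known_responses, gather_more_info, simulate, acceptable):
    situation = recognise(cues, known_responses)
    while situation is None:                      # variant 2: situation unclear,
        cues = gather_more_info(cues)             # so gather further information
        situation = recognise(cues, known_responses)

    # In the standard "if... then..." case the very first response passes straight away.
    for response in known_responses[situation]:   # variant 3: consider responses one at a time
        if acceptable(simulate(situation, response)):
            return response                       # satisfice: the first workable option wins
    return None                                   # nothing acceptable found

# Toy usage: one recognised situation with two candidate responses.
choice = rpd_decide(
    cues="patient hypotensive",
    known_responses={"patient hypotensive": ["give fluids", "start vasopressor"]},
    gather_more_info=lambda c: c,                 # not needed here; the situation is recognised
    simulate=lambda situation, response: response,
    acceptable=lambda outcome: outcome == "give fluids",
)
print(choice)   # "give fluids"
```
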
RPD relates back to the post about experts and their unknown knowns. Experts, because of their time spent doing the job, are more likely to recognise the situation and to know the correct response to it.

Flin (p.147) tells us that the key features of the RPD model include a "focus on situation assessment" and an "aim to satisfice, not optimize". "Satisfice" may not be a word you see every day, but it conveys the need to look for a solution that is "good enough" rather than "brilliant/amazing/superb". And Flin's placement of situation assessment as a key feature is spot on. Without appropriate situation assessment, all subsequent decisions will be prone to failure.

The benefit of simulation-based education is that everybody can review the decisions they made by using video-assisted debrief. Alternatives can be discussed and, because the simulator can be "reset", those alternatives can be played out to see their effect on the outcome. The goal is to make all of us better decision-makers when time is short and the stakes are high.


Naturalistic decision-making

The eight characteristics of an NDM environment (from p.52 of "Stress and Human Performance"):

  1. Ill-structured problems
  2. Uncertain and dynamic environments
  3. Shifting, ill-defined, or competing goals
  4. Action/feedback loops
  5. Time stress
  6. High stakes
  7. Multiple players
  8. Organizational goals and norms

Time for some NDM.