Sunday, June 24, 2007

Communication of Software Risks

One of the interesting challenges in managing a project is estimating technical risk.

Here is an example of a classic technology risk scenario:

The development team is building an online transaction system that sends orders from an e-commerce system to a warehouse. The e-commerce system is internal; the warehouse is external. The warehouse runs on an AS400 system and no one on the team has ever worked with the AS400 platform before. There is a defined web services layer between the two systems, which should hopefully hide the workings of the AS400 system behind it.

This is a pretty typical high-level description of requirements, whether it comes from a manager, a request for proposal, or an email.

Most developers look at this and start analyzing all the things that could go wrong... there are dozens of risks in this one paragraph alone.

A project manager might realize that there are some risks involved and, if the PM has a technical background, might start asking risk-based questions that show some understanding of the potential problems. The worst-case scenario is a non-technical PM who thinks that this project is the equivalent of ordering pizza from Domino's - you simply call and expect the order to be delivered within the guaranteed delivery window.

Let's assume that the entire team is on the ball, everyone is working well together, and the team sits down to review the project plan. A risk matrix gets created and the team comes up with the following risks:

Risk: Web Service might not work
Probability: Medium
Impact: High

Risk: AS400 might have performance problems
Probability: Medium
Impact: Medium

Risk: Transactions might not work across the two distributed systems.
Probability: High
Impact: Low/Medium

...and so on.

This is the critical point where a massive communication gap opens between what a typical PM thinks a risk's impact is and what a developer thinks it is.

A PM may think: "Well, we have some risks, so we should allocate some contingency in the budget. This is a high-risk project, so we'll add 30% contingency to our estimate and a month onto the time line."

A developer may think: "If this server cannot move data fast enough, we're dead in the water. It won't matter how much time or money you throw at it, because short of hauling the entire AS400 out of the warehouse, we're toast."

Big difference in risk!

I've seen projects where the team members were mature and did all the traditional risk analysis that is recommended for software projects. But the projects still failed because there were fundamental differences in assumptions about what "High Risk" actually meant for cost, time line and project success.

Here are some common reasons why team members' assumptions about risk differ:

1. Team members don't all share the same assumptions about the impact of a risk. For example, a task is estimated at 3 days but the impact of the risk is "High". So what does that mean for the estimate? Add 10%? Add 50%? What about 10X? In my experience, project managers in particular do not grasp the sheer magnitude of software risks - they'll see a risk marked "High", add 30% contingency and think they're good to go.

A 10X risk impact is not unheard of - I've seen it myself on many projects. We had one project where one web service call was supposed to take 3 days and took 30 days because of technical complications.

2. Some risks result in outright project failure because some requirements are not negotiable. For example, financial data cannot be more or less correct - it's either right or it's wrong. Getting it 80% right isn't really valuable; if you cannot hit the 100% mark, your software project fails. So if there is a risk that the data won't be right, the issue isn't that your project goes over budget or takes too long to deliver - it's a total failure because it cannot meet the basic requirements. The same goes for external software dependencies. For example, we had a case where we were integrating with a web services framework and one of the web services supplied by the vendor had a bug. It wasn't a matter of "let's just find a workaround" - the project was effectively dead in the water until that bug was fixed.

Project managers tend to be solution oriented and optimistic, so while they like the idea of contingency, they tend to shy away from the concept that a risk might actually kill their project with no way to get around it.

3. The "I don't know" estimate either gets quantified too early in the process or is ignored entirely. As a project manager, what is your approach when your team says "I don't know"? Many PMs are forced by management to provide estimates too early, so they simply fudge the "I don't know" into a contingency or pad the estimate. They think that by taking the developer's estimate and doubling it they've given themselves plenty of breathing room to worry about the problem later.

The cone of uncertainty on typical projects is a lot wider than most PMs realize - Steve McConnell's work on estimation pegs the spread of early estimates at 1600% (a 16X range)!
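To put rough numbers on the examples above, here is a small Python sketch. The 3-day task, the 30% contingency and the 10X overrun come from this post; the 0.25X to 4X spread used for the start of a project is an assumption based on the commonly quoted form of McConnell's cone of uncertainty, and the function names are purely illustrative.

```python
# Illustrative arithmetic only: compares a padded estimate with a 10X overrun
# and shows the range implied by an assumed 0.25X-4X cone of uncertainty.


def padded_estimate(days: float, contingency: float) -> float:
    """Return the estimate after adding a percentage contingency (0.30 = 30%)."""
    return days * (1 + contingency)


def cone_range(days: float, low: float = 0.25, high: float = 4.0) -> tuple[float, float]:
    """Return the (low, high) outcome range at project start (assumed 0.25X to 4X)."""
    return days * low, days * high


if __name__ == "__main__":
    estimate = 3.0
    padded = padded_estimate(estimate, 0.30)  # about 3.9 days
    actual = estimate * 10                    # the 10X case: 30 days
    print(f"padded: {padded:.1f} days, actual: {actual:.0f} days")
    print(f"shortfall: {actual - padded:.1f} days")

    low, high = cone_range(estimate)
    print(f"cone at project start: {low:.2f} to {high:.1f} days ({high / low:.0f}X spread)")
```

The 30% pad adds less than a day, while the 10X case leaves the padded plan roughly 26 days short. Even the upper bound of the assumed cone (12 days here) is about three times the padded figure.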



If the team doesn't have a common understanding of how to quantify the impact of risks, then everyone can be following a traditional risk process while each person's understanding of what any given risk means is different.

Here are some recommendations for Project Managers for creating an improved risk communication culture:

1. Weigh risks against the level of uncertainty you would expect at the current phase of the project. For example, in the planning phase the expected level of uncertainty is high. If someone on the team says, "Well, we don't really know how that's going to work, but we have a plan," that's probably OK. If you hear the same thing in the middle of QA, you should be very concerned and adjust accordingly.

2. Have the team translate risks from labels ("High", "Medium", "Low") into numbers where possible. In addition, don't assume that a label means the same thing everywhere in the project. For example, a "High Risk" HTML issue might be something very different from a "High Risk" web services integration or performance issue.

3. Identify the requirements that are absolute and unmovable and pay particular attention to risks that are around them. For example, in some cases performance is an absolute requirement (e.g. it must serve 30 pages per second) and sometimes it isn't (it must be "fast"). If your requirements are absolute then any risk that hits those requirements will not just cause a delay, an increase in cost, etc. - it will kill your project.

4. Pay close attention to areas of the project where flexibility or options are limited. Compare the flexibility of a creative, custom-built front end with that of a purchased software system. If someone doesn't like the front end, you can rebuild it, change it, and so on. That might cost more or affect the time line, but you have lots of room for change. If you buy Microsoft Content Management Server and it has a bug, your options are pretty limited.

5. "I don't know" must be a viable answer in any discussion. In addition, the response to "I don't know" should be to go and find out - be wary of arbitrarily padding estimates just to fit the unknown into your plan.

6. Pay close attention to body language, talent and historical experience. Relationships matter in software teams. On a team that has worked together before, the PM starts to recognize the subtleties of each person's communication style and how each person reacts to different questions or risks. Is John an optimist or a pessimist? When someone says "I don't know," are they being cagey or honest? Teams who have worked together start to pick up these non-verbal cues intuitively.

7. Review risks on a regular basis so that you can refine your strategies to mitigate them.
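The advice above about turning labels into numbers and singling out absolute requirements can be sketched as a tiny risk register in Python. This is a hypothetical illustration, not a prescribed method: the `Risk` class, its fields, and every probability, multiplier and estimate below are invented for the example, and the AS400 throughput risk is simply assumed to touch an absolute requirement.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One row of a hypothetical risk register that uses numbers instead of labels."""
    description: str
    category: str             # "High" may mean very different things per category
    probability: float        # team-agreed chance the risk occurs, 0.0 to 1.0
    impact_multiplier: float  # the task takes this many times its estimate if it hits
    base_days: float          # the task's original estimate in days
    hits_absolute_requirement: bool = False

    def expected_extra_days(self) -> float:
        """Probability-weighted overrun in days beyond the base estimate."""
        return self.probability * self.base_days * (self.impact_multiplier - 1)


def review(risks: list[Risk]) -> list[str]:
    """Summarize each risk; risks on absolute requirements are not padded."""
    lines = []
    for risk in risks:
        if risk.hits_absolute_requirement:
            # Contingency cannot rescue a requirement that must be met exactly,
            # so flag it for investigation and mitigation instead of padding.
            lines.append(f"{risk.description}: can kill the project, mitigate now")
        else:
            lines.append(
                f"{risk.description} [{risk.category}]: "
                f"+{risk.expected_extra_days():.1f} expected days"
            )
    return lines


if __name__ == "__main__":
    register = [
        # Both of the first two might be labelled "High"; all values are hypothetical.
        Risk("Rework order form layout", "html", 0.5, 1.5, 2.0),
        Risk("Order submission web service call", "web services", 0.5, 10.0, 3.0),
        Risk("AS400 cannot sustain required order throughput", "performance",
             0.3, 10.0, 5.0, hits_absolute_requirement=True),
    ]
    for line in review(register):
        print(line)
```

With these made-up numbers, two risks that could share the same "High" label differ by more than an order of magnitude (0.5 versus 13.5 expected extra days), and the risk on an absolute requirement is deliberately kept out of the padding arithmetic altogether.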

Hopefully this helps you create a more proactive, risk-based approach, and reminds you that just because people on the team are calling something "high risk" doesn't mean they all mean the same thing.
