Sunday, June 6, 2010

How to conduct effective Root Cause Analysis...

Root Cause Analysis (RCA) is an effective tool for causal analysis. The process helps the team understand the problems being reported and take corrective and preventive actions, so that similar problems are fixed permanently and do not recur.

The process steps below come with a few guidelines to help complete an RCA quickly and productively. These guidelines can be extended as needed for any specific scenario. The steps for conducting RCA are:

A. Modularize Issue - Identify and categorize the issue, so that it can be assigned to an individual or a team
B. Identify Root Cause of Issue - Identify what exactly caused the issue to occur
C. Identify Corrective Actions - Identify what we need to do to provide an immediate fix
D. Identify Preventive Actions - Identify what we need to do to prevent similar problems from recurring
E. Review with the team and Action items tracking - Identify and track action items for the team

Let's look at each of these steps in detail, and at the activities we should perform in each:

A. Modularize Issue

   #1. Identify the modules and sub-modules in the product so that issues can be categorized
   #2. Identify the SME for each module in the project team
   #3. Map the issue to its respective module and sub-module
   #4. Based on the module, assign the issue to the respective SME or team
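
The routing in steps #1 to #4 can be sketched as a simple lookup table. This is only a minimal Python sketch under stated assumptions, not a prescribed tool: the module names, sub-module names, SME names and the `triage-team` fallback are all hypothetical placeholders.

```python
# Hypothetical module -> sub-module -> SME map; every name here is a placeholder.
SME_BY_MODULE = {
    "Billing": {"Invoicing": "alice", "Payments": "bob"},
    "Reporting": {"Dashboards": "carol"},
}

# Assumed fallback owner when an issue cannot be mapped to a known area.
TRIAGE_OWNER = "triage-team"


def assign_issue(module: str, sub_module: str) -> str:
    """Return the SME (or team) who should own an issue in the given area."""
    return SME_BY_MODULE.get(module, {}).get(sub_module, TRIAGE_OWNER)


if __name__ == "__main__":
    print(assign_issue("Billing", "Payments"))  # known area
    print(assign_issue("Search", "Indexing"))   # unknown area goes to triage
```

Keeping the map in one place means a new module or a change of SME is a one-line update.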

B. Identify Root Cause

   #1. Review issue details
      a. In Bug database
      b. Service request id from Consulting/Customer Support
      c. Customer Issue details e.g. QC system of customer
      d. Any relevant email and other available information

   #2. Define the problem
     a. Understand the problem that is being reported based on available information
     b. Understand the area of problem and steps to reproduce the problem
     c. Identify any dependencies that may help in understanding the problem
     d. If there is any gap in understanding, talk to the respective Support, Consulting or Dev resources

   #3. Reproduce issue in test environment
     a. Identify the QC environment for the release in which the issue was found, e.g. if the issue was found on version 5.1, check it in the version 5.1 test environment
     b. Reproduce the issue in the QC test environment
     c. Try to reproduce the issue with the exact steps provided in the issue

     d. If the issue cannot be reproduced, seek more information rather than giving up!
       i. Send an email to the respective Consulting or CS representative
       ii. Talk to the respective Dev lead/Developer or peer testers
       iii. Other relevant sources that may help…

     e. Smart Thought (If QC environment is not available)
       i. Check any possible env with Dev team
       ii. Check any available env with CS or Consulting
       iii. If the above options are not available, prepare a quick environment with the bare minimum components

(Note: It is highly advisable to keep the latest release up and running to expedite issue reproduction)
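
The environment choice in step #3 (QC first, then Dev, then CS or Consulting, and a quick minimal environment as the last resort) can be sketched as below. This is a rough illustration only: the `ENVIRONMENTS` inventory and its version numbers are hypothetical, and a real team would read them from wherever environments are tracked.

```python
from typing import Optional

# Hypothetical inventory of running environments: (owner, product version).
ENVIRONMENTS = [
    ("QC", "5.0"),
    ("QC", "5.1"),
    ("Dev", "5.2"),
    ("Consulting", "5.2"),
]

# Order of preference taken from the steps above.
PREFERENCE = ["QC", "Dev", "CS", "Consulting"]


def pick_environment(issue_version: str) -> Optional[str]:
    """Return the preferred owner of an environment matching the issue's version.

    Returns None when nothing matches, which signals that a quick environment
    with the bare minimum components has to be prepared.
    """
    owners = {owner for owner, version in ENVIRONMENTS if version == issue_version}
    for owner in PREFERENCE:
        if owner in owners:
            return owner
    return None


if __name__ == "__main__":
    for version in ("5.1", "5.2", "4.0"):
        print(version, pick_environment(version))
```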

   #4. Analyze issue on following parameters
      a. Behavior of Design
        i. Is this a known issue
        ii. Was a workaround conveyed to the customer in the release notes
        iii. Do we have a design document covering this scenario
        iv. Have we prepared an impact assessment document covering this scenario (for patches)
        v. Problem in logic
        vi. Problem in algorithm
        vii. Problem in coverage
        viii. Problem in test strategy
        ix. Problem in performance

      b. Behavior of Coding
        i. Code segment not correct
        ii. Code segment not as per requirement
        iii. Code segment has logical error
        iv. Unit test strategy does not cover this scenario
        v. Unit testing was not conducted for this scenario
        vi. Unit testing does not challenge the business scenario
        vii. Unit testing lacks adequate test data and test coverage
        viii. Code review was not done properly

     c. Behavior of Testing
       i. Scenario not part of test strategy
       ii. Scenario not part of test scenario
       iii. Scenario not part of test steps
       iv. Was testing done for this issue
       v. Was coverage enough for the issue
       vi. Was adequate regression testing done for the issue
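
Recording which of the three areas (Design, Coding, Testing) each finding falls under makes it easy to see where problems cluster. The following is a minimal sketch of such a tally; the issue ids and the sample findings are hypothetical, and `count_by_area` is an illustrative helper, not part of any tool.

```python
from collections import Counter
from enum import Enum


class CauseArea(Enum):
    DESIGN = "Design"
    CODING = "Coding"
    TESTING = "Testing"


# Hypothetical RCA findings: (issue id, area, finding taken from the checklist).
findings = [
    ("BUG-101", CauseArea.CODING, "Code segment has a logical error"),
    ("BUG-102", CauseArea.TESTING, "Scenario not part of test strategy"),
    ("BUG-103", CauseArea.CODING, "Unit testing not conducted for this scenario"),
]


def count_by_area(items):
    """Count findings per cause area, most frequent area first."""
    return Counter(area for _, area, _ in items).most_common()


if __name__ == "__main__":
    for area, count in count_by_area(findings):
        print(area.value, count)
```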

C. Identify Corrective Actions

     #1. Is there any workaround available
     #2. Can workaround be suggested to customer
     #3. Can a small and quick fix be provided to customer
     #4. Does this require quick suggestion on IG and IQ steps that may help prevent the issue in near future
     #5. How many customers are affected by this issue
     #6. What is the Risk Priority Number for this issue (Severity x Occurrence x Detectability)
     #7. What will it take to fix the problem in the code branch
     #8. What will it take to fix the problem in the main line code
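
The Risk Priority Number in #6 is the product of the three ratings. A small sketch of the arithmetic follows; the 1 to 10 range for each rating is an assumption (a common FMEA convention), since the checklist does not state a scale, and the sample ratings are hypothetical.

```python
def risk_priority_number(severity: int, occurrence: int, detectability: int) -> int:
    """Multiply the three ratings to get the Risk Priority Number (RPN).

    The 1-10 range for each rating is an assumption (a common FMEA
    convention), not something the checklist specifies.
    """
    for name, value in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detectability", detectability)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return severity * occurrence * detectability


if __name__ == "__main__":
    # Hypothetical ratings: severe (8), occasional (3), hard to detect (6).
    print(risk_priority_number(8, 3, 6))  # 144
```

A higher number suggests the issue deserves attention sooner, which helps when several corrective actions compete for the same people.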

D. Identify Preventive Actions

     #1. Requirements
        i. Was the requirement clearly stated
        ii. Was the requirement reviewed completely by the team
        iii. Was requirement review budgeted for this area
        iv. How will we ensure that this problem does not occur in future

     #2. Planning
        i. Was this task planned in test strategy
        ii. Was this task planned in mpp
        iii. Was estimation correct (Have we missed some scenarios)
        iv. Was enough time provided for the task in mpp
        v. Was the plan followed with team and status updated correctly
        vi. How can planning be improved for similar tasks

     #3. Design
        i. How will similar logic/parameter/algorithm be designed correctly in future
        ii. How will similar logic/parameter/algorithm be implemented fully in future
        iii. How can we prepare a better test strategy during design
        iv. How can these gaps be identified during reviews
        v. How can we plan and conduct good design practices

     #4. Construction
        i. How will this scenario be implemented correctly while coding
        ii. How will we follow coding guidelines and checklist
        iii. How will we make this scenario part of the unit test strategy
        iv. How will we ensure the unit testing strategy covers similar scenarios
        v. How will we ensure the scenario is not missed in unit testing
        vi. How will we ensure all boundary scenarios are covered in unit testing
        vii. How will code review be done properly so that the issue is highlighted beforehand

     #5. Test Script
        i. How will we make similar scenarios part of the test strategy
        ii. Was this scenario present in Test Script
          1. If Yes
             a. Was this step executed properly
             b. Why did execution of the step not find the issue
             c. Identify any missing test data
             d. Identify any missing negative boundary condition
          2. If No
             a. Identify why this step is not present in the script
             b. Identify why this was not caught in script review
             c. Add this missing scenario to the script

        iii. How will we ensure similar scenarios are not missed in review
        iv. Are we considering past issues and test assets while designing test scripts
        v. Are we considering learning from past (CAPA sheet) in next projects
        vi. Is this scenario covered in Regression scripts
        vii. Can the existing regression scripts discover the issue as-is
        viii. What changes are required in Regression script
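
The Yes/No branch under item ii can be written down as a small function that returns the follow-up items for each case. This is just an illustrative sketch; `script_follow_ups` is a hypothetical name, and the strings paraphrase the checklist above.

```python
from typing import List


def script_follow_ups(scenario_in_script: bool) -> List[str]:
    """Return the follow-up items for a scenario, per the Yes/No branches above."""
    if scenario_in_script:
        return [
            "Was this step executed properly?",
            "Why did execution of the step not find the issue?",
            "Identify any missing test data",
            "Identify any missing negative boundary condition",
        ]
    return [
        "Identify why this step is not present in the script",
        "Identify why this was not caught in script review",
        "Add this missing scenario to the script",
    ]


if __name__ == "__main__":
    for item in script_follow_ups(scenario_in_script=False):
        print("-", item)
```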

     #6. Test Execution
        i. Why was this scenario not planned in mpp
        ii. Why was this scenario not executed
        iii. Why do we not have complete proofs available in VSS
        iv. Why do we not have execution logs in VSS
        v. Does this scenario need to be covered in the Functional test script for the next project
        vi. Does this scenario need to be covered in the Smoke test script for the next project
        vii. Does this scenario need to be covered in the Regression test script for the next project

E. Review with the team and Action Items tracking

     #1. Once the RCA is completed, it should be reviewed with the team
     #2. Any additional points and information should be updated in the RCA
     #3. Concrete action items should be identified and planned for tracking
     #4. Each action item should have an owner who is responsible for completing it
     #5. Action items should be reviewed by the team on a frequent basis (at least once a week/fortnight)
     #6. Once an action item is completed, it should be marked as closed with details of the artifacts that were updated
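
The tracking rules above (an owner per item, regular review, closure with a list of updated artifacts) can be captured in a small record. The sketch below is one possible shape, not a recommended tool: the `ActionItem` class, its field names and the 14-day default (taken from the "fortnight" upper bound) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class ActionItem:
    description: str
    owner: str                      # person responsible for completing the item
    last_reviewed: date
    closed: bool = False
    updated_artifacts: List[str] = field(default_factory=list)

    def close(self, artifacts: List[str]) -> None:
        """Mark the item closed and record which artifacts were updated."""
        if not artifacts:
            raise ValueError("list the artifacts that were updated before closing")
        self.updated_artifacts = list(artifacts)
        self.closed = True

    def review_overdue(self, today: date, max_gap_days: int = 14) -> bool:
        """True if an open item has gone more than max_gap_days without review."""
        gap = today - self.last_reviewed
        return not self.closed and gap > timedelta(days=max_gap_days)


if __name__ == "__main__":
    # Hypothetical action item and owner.
    item = ActionItem("Add boundary test to regression script", "tester-1", date(2010, 6, 1))
    print(item.review_overdue(date(2010, 6, 20)))
    item.close(["regression script"])
    print(item.closed)
```

Requiring a non-empty artifact list on closure mirrors point #6: an item is not really closed until it is clear what was changed.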

2 comments:

Freddy Gustavsson said...

Thanks for the useful checklist, Prem! Though I have never been involved in doing a RCA myself, I do think it can benefit an organization a lot by identifying what causes the problems. Once the causes have been identified and highlighted, the organization can create a strategy for avoiding them in the future.

Prem Phulara - If You Think You Can...YOU CAN! said...

Thanks Freddy for your comments. You are absolutely correct that we know and learn what the actual problems are, and then we can focus on the areas, where there are most of the problems occurring repeatedly. There are many analysis techniques that you can use to identify and prioritize e.g. seven tools of Six sigma. I'm planning to write a write up on this soon.