Understanding Failure Mode and Effects Analysis (FMEA): A Crucial Step in Engineering Excellence
In the engineering field—whether the focus is on mechanical engineering, firmware engineering, electrical engineering, RF engineering, or one of the other engineering subdisciplines—ensuring reliability and quality in products is paramount. Enter Failure Mode and Effects Analysis (FMEA), a systematic approach for identifying and addressing potential failures within a product or process.
By anticipating issues before they arise, FMEA not only enhances product quality but also saves time and resources in the long run.
What is Failure Mode and Effects Analysis (FMEA)?
In an end-to-end systems design or product development process, FMEA is used to proactively evaluate the effects of possible failures. From identifying potential failure modes to determining the effects of each failure and prioritizing actions for risk mitigation, FMEA encompasses numerous critical engineering initiatives. Key approaches, protocols, and outcomes for FMEA include:
- FMEA Necessitates a Team-Based Approach: Collaboration among cross-functional teams ensures comprehensive input and insight, promoting diverse perspectives. Many industries also require records that justify why the team is competent and capable of performing the analysis.
- FMEA Requires Structured Documentation: A well-defined format helps maintain clarity and consistency throughout the analysis.
- FMEA Yields a Risk Priority Number (RPN): This numerical score helps prioritize failures based on their severity, occurrence, and detection ratings.
- FMEA Benefits from Continual Improvement: FMEA is a continual process. As corrective measures are taken, the study should be re-evaluated to confirm that the remaining risk is within the tolerance acceptable for the product. As the product matures, more data becomes available, and the study should be updated periodically to reflect it. In many cases, this will lead to more product improvements via corrective actions.
By assessing failure severity, analyzing causes of failure, and prioritizing risks based on their likelihood and expected outcome, the teams that conduct FMEA can approach risk mitigation in a systematic, well-documented fashion that is measurable and, above all, actionable.
Why is Failure Mode and Effects Analysis important in the engineering process?
- Enhanced Product Reliability: By identifying potential failure points in the design stage, FMEA allows engineers to rectify issues before products reach production, significantly enhancing reliability.
- Cost Reduction: Identifying risks early in the design phase minimizes the costs associated with product recalls, warranty claims, and reputation damage. A proactive approach saves money compared to reactive strategies.
- Improved Safety: FMEA directly addresses safety by considering the implications of failures, particularly in industries such as automotive and aerospace, where failures can have catastrophic consequences.
- Regulatory Compliance: Many industries require adherence to safety and reliability standards. FMEA can help organizations meet these requirements by systematically evaluating potential risks.
- Continuous Improvement: FMEA not only fosters a culture of quality and safety but also promotes continuous improvement initiatives by encouraging teams to regularly assess and refine processes.
- Customer Satisfaction: Ultimately, products designed with FMEA in mind are more reliable and of higher quality. Focusing on building better products creates a positive customer impact, leading to improved customer satisfaction and loyalty.
The Failure Mode and Effects Analysis (FMEA) Process: 11 Key Steps
The 11 steps of FMEA are as follows. These may change based on the organization conducting the analysis; however, they represent the comprehensive focus of Failure Mode and Effects Analysis and the outcomes you can expect by investing in the process.
Step #1: Establish the ground rules
How will the team evaluate throughout the Failure Mode and Effects Analysis (FMEA) process?
The scales for severity, occurrence and detection must be consistent across the study. How the rating scales affect the analysis needs to be considered and the definitions need to fit your application.
Failure Modes and Effects Analysis (FMEA) ground rules should be defined by the standard, work instruction, or quality procedure used in analyzing specific types of processes or products.
There are many variations of FMEA. Be sure to adhere to the standard and type of FMEA that fits your application.
Step #2: Define the object to be analyzed
Determine what system, process or design needs to be analyzed and specify scope.
A well-defined scope ensures the FMEA addresses the right risks, avoids unnecessary complexity, and provides actionable insights. Early in the design process the scope is typically broad and focuses on higher-level concepts or risks.
As the design progresses FMEA may be applied to specific parts, subsystems, or subprocesses.
Step #3: Identify potential failure modes
What could go wrong?
It is important to have an experienced, cross-functional, and diverse team when conducting FMEA. A varied team provides more assurance that critical failure modes will be identified. The study will only be as good as the potential failure modes identified.
At its core, FMEA is a team-based approach. Having team representatives from a cross-section of your organization (including manufacturing, engineering, quality, customer service, supply, etc.) ensures a comprehensive understanding of the potential failure modes in products, systems, and processes.
Step #4: Determine the effects of each failure
How would each failure affect the system or user?
When determining effects, be sure to consider all scenarios and focus on the end impact. Do not let likelihood influence whether or not the failure mode is listed. Be as specific and quantifiable as possible, and consider secondary and cascading effects.
As more data on effects becomes available, the study should be updated periodically.
Be sure to address the effects on all stakeholders including:
- Customers: Usability, safety, satisfaction.
- Business: Costs, production and manufacturing delays, brand reputation.
- Regulatory Compliance: Legal and standards-related consequences.
Pictured above is an example of Finite Element Analysis (FEA) of a custom robot chassis. FEA can provide insight into how and where the structure would fail. In this case, if failure were to occur, we would not want the structural failure to put physical stress on the battery, because a damaged battery might lead to a fire.
Step #5: Assess severity
How severe is each effect?
Severity focuses on the consequences of a failure if it occurs, independent of its likelihood or detectability. Assessments for severity should be based on the most severe outcome of the failure. Consider all levels of impact and its criticality, including:
- Local Impact: Immediate effect on the failing component or subsystem.
- Systemic Impact: Effect on the entire system or product functionality.
- End-User Impact: How the failure affects user safety, satisfaction, or experience.
- Regulatory Impact: Whether the failure leads to legal or compliance issues.
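The worst-case rule above can be sketched in a few lines of Python. This is only an illustration: the 1 to 10 scale, the `EffectRatings` class, and the example ratings are assumptions, not values from any particular standard.

```python
from dataclasses import dataclass


@dataclass
class EffectRatings:
    """Hypothetical severity ratings per impact level (1 = negligible, 10 = hazardous)."""
    local: int
    systemic: int
    end_user: int
    regulatory: int

    def severity(self) -> int:
        # Severity is taken from the most severe outcome,
        # independent of likelihood or detectability.
        return max(self.local, self.systemic, self.end_user, self.regulatory)


# Illustrative ratings for a chassis failure that stresses the battery.
battery_stress = EffectRatings(local=4, systemic=7, end_user=10, regulatory=8)
print("Severity:", battery_stress.severity())
```

Keeping the per-level ratings alongside the final number makes it easier to explain later why a failure mode received the severity it did.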
Step #6: Analyze causes of failure
What might cause each failure?
Ensure the FMEA focuses on realistic failure modes. Consider known weaknesses in materials and in similar systems or products. In addition, reference any available data on how the product or process (or a similar product or process) has failed in the past.
Step #7: Estimate the likelihood of occurrence for each failure mode & cause
What is the likelihood of each failure occurring?
Historical data from similar products and/or processes is the preferred basis for evaluating likelihood. Where historical data is not available, use component reliability data such as MTBF (mean time between failures), MTTF (mean time to failure), and Weibull analysis. Consider other common incident metrics and the criticality of those incidents, as appropriate to your project.
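As a rough sketch of how reliability data might feed an occurrence rating, the Python below converts an MTBF (assuming a constant failure rate) or Weibull parameters into a probability of failure over a mission time, then maps that probability to a 1 to 10 rating. The threshold table, function names, and example numbers are hypothetical; a real study would take its occurrence scale from the governing standard or quality procedure.

```python
import bisect
import math


def failure_probability_exponential(mission_hours: float, mtbf_hours: float) -> float:
    """P(failure by mission_hours), assuming a constant failure rate of 1/MTBF."""
    return 1.0 - math.exp(-mission_hours / mtbf_hours)


def failure_probability_weibull(mission_hours: float, shape: float, scale_hours: float) -> float:
    """P(failure by mission_hours) from the two-parameter Weibull CDF."""
    return 1.0 - math.exp(-((mission_hours / scale_hours) ** shape))


# Hypothetical probability thresholds separating occurrence ratings 1..10.
OCCURRENCE_THRESHOLDS = [1e-6, 1e-5, 1e-4, 1e-3, 5e-3, 1e-2, 5e-2, 0.1, 0.3]


def occurrence_rating(probability: float) -> int:
    """Rating = 1 + number of thresholds strictly below the probability."""
    return 1 + bisect.bisect_left(OCCURRENCE_THRESHOLDS, probability)


# Illustrative inputs: a 1,000 hour mission for two hypothetical components.
p_exp = failure_probability_exponential(1_000, mtbf_hours=50_000)
p_wbl = failure_probability_weibull(1_000, shape=2.5, scale_hours=20_000)
print(f"exponential: p={p_exp:.5f}, rating={occurrence_rating(p_exp)}")
print(f"weibull:     p={p_wbl:.5f}, rating={occurrence_rating(p_wbl)}")
```

A Weibull shape of 1 reduces to the constant-failure-rate case, so the two functions can be cross-checked against each other.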
Step #8: Determine the controls around each failure mode and root cause
What steps can be taken to mitigate risks?
Determining effective controls involves identifying and implementing measures that reduce the likelihood of a failure or its impact. Effective controls address failure causes or failure modes directly, and the criticality of these issues is why FMEA is needed to maintain product and process reliability.
Step #9: Estimate your detection level for each failure mode, cause and effect
What is the likelihood of detection?
Determining detectability involves evaluating the likelihood that a failure mode or its cause can be identified before it leads to a problem. It is important to keep in mind who or what needs to detect the failure mode (e.g. a person or a control system).
As part of the exercise it is advised to list all detection methods including inspections, testing protocols and monitoring systems. Also consider the capabilities of the detection method including sampling time, resolution, time between inspections, speed, coverage, reliability, etc. Ask the team: “Will the detection method be quick and reliable enough?”
Step #10: Calculate the risk priority number for each failure mode
How serious is the failure mode?
The larger the Risk Priority Number (RPN) the higher the priority to mitigate the potential failure mode. (RPN = Severity x Occurrence x Detection)
RPN is invaluable when making business-based decisions regarding prioritization of change management and risk management efforts.
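The RPN formula above is simple enough to sketch directly. The example below assumes 1 to 10 scales for all three ratings and the common convention that a higher detection rating means the failure is harder to detect; the `FailureMode` class and the sample failure modes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One hypothetical worksheet row; all ratings are assumed to be 1..10."""
    name: str
    severity: int
    occurrence: int
    detection: int  # assumed convention: higher = harder to detect

    def __post_init__(self):
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be 1..10, got {value}")

    @property
    def rpn(self) -> int:
        # RPN = Severity x Occurrence x Detection
        return self.severity * self.occurrence * self.detection


# Illustrative failure modes, not data from a real study.
modes = [
    FailureMode("Battery enclosure crushed", severity=10, occurrence=2, detection=4),
    FailureMode("Connector fatigue crack", severity=6, occurrence=5, detection=7),
    FailureMode("Firmware watchdog reset", severity=4, occurrence=6, detection=2),
]

# Highest RPN first, so the top of the list is the first candidate for mitigation.
ranked = sorted(modes, key=lambda mode: mode.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.rpn:4d}  {mode.name}")
```

Validating the ratings at construction time is one way to enforce the consistent scales established in Step #1.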
Step #11: Determine corrective actions
What steps can be taken to mitigate risks?
Now that the team has determined what needs to be done to adequately reduce risk, the real work begins. Corrective actions need to be formally documented, should focus on failure modes with a high RPN, and should address the root cause of each failure mode.
Effective Tools For Supporting FMEA
Root Cause Analysis
Root Cause Analysis includes:
- 5 Whys technique
- Fishbone diagram (also known as an Ishikawa diagram, which includes the 8Ms)
- Fault Tree Analysis
These methods are often utilized to analyze the cause of the failure mode.
Design Trees → Function Trees
With Design FMEA, a product is broken down into functions and each function will have associated failure modes. These failure modes can then be used in the FMEA process.
Prototyping and Testing
Prototyping and testing are often used to determine what the failure modes will be in operational scenarios.
Model-Based Simulation
Model-based simulation with fault injection, for example, allows teams to develop products in a digital world where real-world scenarios are simulated and the design can be improved before actually being built.
Whether you’re combining software and hardware to build a custom solution or simply testing a physical product, model-based simulation is invaluable.
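A toy version of fault injection can be written in a few lines. The sketch below simulates a steadily warming temperature sensor, injects a stuck-at fault at a chosen step, and reports when a simple "reading has not changed" monitor would flag it. The plant model, the monitor, and every parameter are hypothetical simplifications meant only to show the idea.

```python
def first_detection_step(steps, fault_at=None, stuck_window=5):
    """Return the step at which the stuck-sensor monitor trips, or None.

    fault_at: step at which the sensor freezes at its last healthy reading.
    stuck_window: consecutive unchanged readings needed to flag a fault.
    """
    true_temp = 20.0
    last_reading = None
    unchanged = 0
    for step in range(steps):
        true_temp += 0.5  # hypothetical plant: temperature rises steadily
        faulted = fault_at is not None and step >= fault_at
        # Injected fault: the sensor repeats its previous reading.
        reading = last_reading if (faulted and last_reading is not None) else true_temp
        unchanged = unchanged + 1 if reading == last_reading else 0
        if unchanged >= stuck_window:
            return step
        last_reading = reading
    return None


detected_at = first_detection_step(steps=50, fault_at=10)
print("Fault injected at step 10, detected at step", detected_at)
print("Healthy run detection:", first_detection_step(steps=50))
```

Comparing the injection step with the detection step gives a detection latency, which can inform the detection rating discussed in Step #9.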
Digital Twins
A Digital Twin is a duplicate of a physical system or object that is used for simulation, analysis and control. Digital twin technology can be effectively integrated with FMEA to enhance its capabilities by providing a virtual environment to simulate potential failure modes and their effects, allowing for more accurate risk assessments and proactive mitigation strategies.
Essentially, a digital twin acts as a powerful tool to enrich the data used in FMEA analysis.
Reliability and Failure Mode and Effects Analysis (FMEA)
Reliability and FMEA work together as complementary disciplines to ensure product or process robustness.
- FMEA identifies potential failure modes, their causes, and effects
- Reliability engineering focuses on quantifying and improving the likelihood that a product or system will perform its intended function without failure over time
Operator and User Feedback
Feedback from operators and users is essential data that can be collected and used to refine the initial FMEA and continue to improve the system or object.
Process Flow Diagrams and Process Control Plans
A process flow diagram (PFD) acts as a visual representation of a process, outlining each step involved, and a process control plan details the specific monitoring and inspection methods required for failure prevention.
The process flow diagram provides the foundation for creating a detailed control plan by identifying the key steps where quality checks need to be implemented.
Industry-Specific Guidelines for Failure Likelihoods
Reference industry-specific guidelines for failure likelihoods, system safety, and risk prevention protocols:
- Example: AIAG & VDA FMEA (automotive standard)
- Example: ARP4761 & ARP4754 (aerospace standard)
There are many variations of FMEA. Be sure to adhere to the FMEA process that best fits your application.
Delphi Method
The Delphi Method is a research tool that uses a series of questionnaires to gather insights from a panel of experts. Leveraging subject matter experts (SMEs) is key to identifying failure modes, estimating likelihood, severity and detectability.
The Delphi method is an effective way of collecting data from SMEs without letting the loudest person in the room influence the others.
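One way to process Delphi questionnaire rounds is to summarize the anonymous ratings with a median and check whether the spread is tight enough to call consensus. The sketch below uses the interquartile range for that check; the consensus threshold, function name, and ratings are assumptions for illustration only.

```python
import statistics


def summarize_round(ratings, max_iqr=2.0):
    """Summarize one round of anonymous expert ratings.

    Returns the median, the interquartile range (IQR), and whether the IQR
    is within the (hypothetical) consensus threshold max_iqr.
    """
    q1, median, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    iqr = q3 - q1
    return {"median": median, "iqr": iqr, "consensus": iqr <= max_iqr}


# Illustrative severity ratings from a panel of experts.
round_1 = summarize_round([3, 5, 7, 8, 9, 10])  # wide spread, no consensus yet
round_2 = summarize_round([6, 7, 7, 8, 9])      # after sharing round 1 results
print("Round 1:", round_1)
print("Round 2:", round_2)
```

Using the median rather than the mean keeps a single outlying opinion from dominating the group estimate.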
Failure Mode and Effects Analysis (FMEA) is essential for risk management in engineering
Failure Mode and Effects Analysis is not just a quality assurance tool; it is an essential part of the engineering process that helps organizations proactively manage risks and implement prevention strategies. By embracing FMEA, teams can significantly improve product reliability, enhance safety, and streamline costs, all while promoting a culture of continuous improvement.
In a competitive landscape, the benefits of FMEA are undeniable, making it a critical element in the arsenal of every engineer.