
Understanding Failure Mode and Effects Analysis (FMEA): A Crucial Step in Engineering Excellence


In the engineering field—whether the focus is on mechanical engineering, firmware engineering, electrical engineering, RF engineering, or one of the other engineering subdisciplines—ensuring reliability and quality in products is paramount. Enter Failure Mode and Effects Analysis (FMEA), a systematic approach for identifying and addressing potential failures within a product or process. 

By anticipating issues before they arise, FMEA not only enhances product quality but also saves time and resources in the long run.

What is Failure Mode and Effects Analysis (FMEA)?

In an end-to-end systems design or product development process, FMEA is used to proactively evaluate the effects of possible failures. From identifying potential failure modes to determining the effects of each failure and prioritizing actions for risk mitigation, FMEA encompasses numerous critical engineering initiatives. Key approaches, protocols, and outcomes for FMEA include:

  • FMEA Necessitates a Team-Based Approach: Collaboration among cross-functional teams ensures comprehensive input and insight, promoting diverse perspectives. In many industries, the organization must also keep records justifying why the team is competent and capable of performing the analysis.
  • FMEA Requires Structured Documentation: A well-defined format helps maintain clarity and consistency throughout the analysis.
  • FMEA Yields a Risk Priority Number (RPN): This numerical score helps prioritize failures based on their severity, occurrence, and detection ratings.
  • FMEA Benefits from Continual Improvement: FMEA is a continual process. As corrective measures are taken, the study should be re-evaluated to ensure the remaining risk is within the tolerance acceptable for the product. As the product matures, more data will become available, and the study should be periodically updated with it. In many cases, this will lead to further product improvements via corrective actions.

By assessing failure severity, analyzing causes of failure, and prioritizing risks based on their likelihood and expected outcome, the teams that conduct FMEA can approach risk mitigation in a systematic, well-documented way that is measurable and, above all, actionable.

Why is Failure Mode and Effects Analysis important in the engineering process?

  1. Enhanced Product Reliability: By identifying potential failure points in the design stage, FMEA allows engineers to rectify issues before products reach production, significantly enhancing reliability.
  2. Cost Reduction: Identifying risks early in the design phase minimizes the costs associated with product recalls, warranty claims, and reputation damage. A proactive approach saves money compared to reactive strategies.
  3. Improved Safety: FMEA directly addresses safety by considering the implications of failures, particularly in industries such as automotive and aerospace, where failures can have catastrophic consequences.
  4. Regulatory Compliance: Many industries require adherence to safety and reliability standards. FMEA can help organizations meet these requirements by systematically evaluating potential risks.
  5. Continuous Improvement: FMEA not only fosters a culture of quality and safety but also promotes continuous improvement initiatives by encouraging teams to regularly assess and refine processes.
  6. Customer Satisfaction: Ultimately, products designed with FMEA in mind are more reliable and of higher quality. Focusing on building better products creates a positive customer impact, leading to improved customer satisfaction and loyalty.

The Failure Mode and Effects Analysis (FMEA) Process: 11 Key Steps

The 11 steps of FMEA are as follows. The exact steps may vary depending on the organization conducting the analysis; however, they represent the comprehensive focus of Failure Mode and Effects Analysis and the outcomes you can expect by investing in the process.

Step #1: Establish the ground rules

How will the team rate failures throughout the Failure Mode and Effects Analysis (FMEA) process?

The scales for severity, occurrence, and detection must be consistent across the study. Consider how the choice of rating scales affects the analysis, and make sure each scale's definitions fit your application.
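One way to keep the scales consistent is to define them once, as data, and validate every rating against that single definition. The sketch below assumes the common 1 to 10 ratings; the `RatingScale` class and the abbreviated criteria text are hypothetical placeholders that your standard or quality procedure would replace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatingScale:
    """A named rating scale with short criteria for selected ratings (hypothetical)."""
    name: str
    minimum: int
    maximum: int
    criteria: dict  # rating -> description; abbreviated, illustrative text only

    def validate(self, rating: int) -> int:
        """Return the rating unchanged, or raise if it falls outside the scale."""
        if not isinstance(rating, int) or not self.minimum <= rating <= self.maximum:
            raise ValueError(
                f"{self.name} rating {rating!r} is outside {self.minimum}..{self.maximum}"
            )
        return rating


# Hypothetical ground rules: one shared definition used by every row of the study.
SEVERITY = RatingScale("Severity", 1, 10, {1: "No noticeable effect", 10: "Safety hazard without warning"})
OCCURRENCE = RatingScale("Occurrence", 1, 10, {1: "Failure is very unlikely", 10: "Failure is almost certain"})
DETECTION = RatingScale("Detection", 1, 10, {1: "Almost certain to be detected", 10: "Cannot be detected"})

print(SEVERITY.validate(8))  # prints 8; SEVERITY.validate(11) would raise ValueError
```

Keeping the scales in one place means a change to the ground rules is applied to the whole study rather than to individual rows.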

FMEA ground rules should be defined by the standard, work instruction, or quality procedure used to analyze the specific type of process or product.

There are many variations of FMEA, such as design FMEA (DFMEA) and process FMEA (PFMEA). Be sure to adhere to the standard and type of FMEA that fits your application.

Failure Mode and Effects Analysis (FMEA) scales can be modified for different industries and use cases.

Step #2: Define the object to be analyzed

Determine what system, process, or design needs to be analyzed, and specify the scope.

A well-defined scope ensures the FMEA addresses the right risks, avoids unnecessary complexity, and provides actionable insights. Early in the design process, the scope is typically broad and focuses on higher-level concepts or risks.

As the design progresses, FMEA may be applied to specific parts, subsystems, or subprocesses.

Step #3: Identify potential failure modes

What could go wrong?

It is important to have an experienced, cross-functional, and diverse team when conducting FMEA. A varied team provides more assurance that critical failure modes will be identified, and the study will only be as good as the potential failure modes it identifies.

At its core, FMEA is a team-based approach. Having team representatives from a cross-section of your organization (including manufacturing, engineering, quality, customer service, supply chain, etc.) ensures a comprehensive understanding of the potential failure modes in products, systems, and processes.

Step #4: Determine the effects of each failure

How would each failure affect the system or user?

When determining effects, be sure to consider all scenarios and focus on the end impact. Do not let likelihood influence whether a failure mode is listed. Be as specific and quantifiable as possible, and consider secondary and cascading effects.

As more data on effects becomes available, the study should be updated periodically.

Be sure to address the effects on all stakeholders including:

  • Customers: Usability, safety, satisfaction.
  • Business: Costs, production and manufacturing delays, brand reputation.
  • Regulatory Compliance: Legal and standards-related consequences.

Finite Element Analysis (FEA) complements Failure Mode and Effects Analysis (FMEA) and can help determine the effects of various product failures.

Pictured above is an example of Finite Element Analysis (FEA) of a custom robot chassis. FEA can provide insight into what will happen to the structure if it fails. In this case, if a structural failure were to occur, we would not want it to put physical stress on the battery, because a damaged battery might lead to a fire.

Step #5: Assess severity

How severe is each effect?

Severity focuses on the consequences of a failure if it occurs, independent of its likelihood or detectability. Severity should be assessed based on the most severe outcome of the failure. Consider all levels of impact and their criticality, including:

  • Local Impact: Immediate effect on the failing component or subsystem.
  • Systemic Impact: Effect on the entire system or product functionality.
  • End-User Impact: How the failure affects user safety, satisfaction, or experience.
  • Regulatory Impact: Whether the failure leads to legal or compliance issues.
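Because severity is based on the most severe outcome, a failure mode with several effects takes the highest of its per-effect ratings. A minimal sketch, assuming each effect has already been rated on the study's 1 to 10 severity scale; the effect descriptions and ratings below are hypothetical.

```python
def failure_mode_severity(effect_ratings: dict[str, int]) -> int:
    """Severity of a failure mode = rating of its most severe effect."""
    if not effect_ratings:
        raise ValueError("list at least one effect before rating severity")
    return max(effect_ratings.values())


# Hypothetical effects of one failure mode, one per impact level.
effects = {
    "local: connector pin corrodes": 3,
    "systemic: intermittent loss of motor control": 7,
    "end user: unexpected arm motion near operator": 9,
    "regulatory: nonconformance with a machinery safety requirement": 8,
}

print(failure_mode_severity(effects))  # prints 9
```

Listing every effect, even though only the worst one sets the rating, keeps the reasoning behind the number visible to reviewers.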

Step #6: Analyze causes of failure

What might cause each failure?

Ensure the FMEA focuses on realistic failure modes. Consider known weaknesses in materials and in similar systems or products. In addition, where available, reference data on how the product or process (or a similar product or process) has failed in the past.

Step #7: Estimate the likelihood of occurrence for each failure mode & cause

What is the likelihood of each failure occurring?

Historical data from similar products or processes is the preferred way to evaluate likelihood. Where historical data isn’t available, use component reliability data such as MTBF (mean time between failures), MTTF (mean time to failure), and Weibull analysis. Consider other common incident metrics and the criticality of those incidents, as appropriate to your project.
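When only reliability data is available, one common approach is to convert it into a probability of failure over the period of interest and then map that probability onto the occurrence scale. The sketch below assumes a constant failure rate (exponential model) when only an MTTF is known, or a two-parameter Weibull model when a shape `beta` and scale `eta` have been fitted. The probability thresholds in `OCCURRENCE_BANDS` are hypothetical and should come from your own ground rules.

```python
import math


def prob_failure_exponential(t_hours: float, mttf_hours: float) -> float:
    """P(failure by t) assuming a constant failure rate: F(t) = 1 - exp(-t / MTTF)."""
    return 1.0 - math.exp(-t_hours / mttf_hours)


def prob_failure_weibull(t_hours: float, beta: float, eta_hours: float) -> float:
    """P(failure by t) for a two-parameter Weibull: F(t) = 1 - exp(-(t / eta) ** beta)."""
    return 1.0 - math.exp(-((t_hours / eta_hours) ** beta))


# Hypothetical ground rules: (upper probability bound, occurrence rating), ascending.
OCCURRENCE_BANDS = [
    (1e-6, 1), (1e-5, 2), (1e-4, 3), (5e-4, 4), (2e-3, 5),
    (1e-2, 6), (5e-2, 7), (1e-1, 8), (3e-1, 9),
]


def occurrence_rating(p_fail: float) -> int:
    """Map a failure probability to a 1..10 rating using the bands above."""
    for upper, rating in OCCURRENCE_BANDS:
        if p_fail <= upper:
            return rating
    return 10


# Hypothetical part with MTTF = 200,000 h over a 5,000 h service life.
p = prob_failure_exponential(5_000, 200_000)
print(round(p, 4), occurrence_rating(p))  # prints 0.0247 7
```

With `beta = 1` the Weibull model reduces to the exponential one; a fitted `beta` above 1 captures wear-out behavior that a single MTTF figure hides.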

Step #8: Determine the controls around each failure mode and root cause

What steps can be taken to mitigate risks?

Determining effective controls involves identifying and implementing measures that reduce the likelihood of a failure or its impact. Effective controls address either the causes of failure or the failure modes themselves. Because these issues are critical, FMEA is necessary to maintain product and process reliability.

A design FMEA (DFMEA) template for the design phase of product development, built to assess the impact of potential failures (with their severity visualized as a heat map) on product safety and functionality. [credit: Lulu Richter, Smartsheet]
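A DFMEA worksheet like the one pictured is essentially a table of rows. The minimal sketch below models one row, assuming the common split between prevention controls (which aim to reduce occurrence) and detection controls (which aim to catch the failure or its cause). The `FmeaRow` field names and the example rows are hypothetical, and the helper simply flags failure modes that have no controls recorded yet.

```python
from dataclasses import dataclass, field


@dataclass
class FmeaRow:
    """One worksheet row; field names are illustrative, not from a specific template."""
    item: str
    failure_mode: str
    effect: str
    cause: str
    severity: int
    occurrence: int
    detection: int
    prevention_controls: list[str] = field(default_factory=list)
    detection_controls: list[str] = field(default_factory=list)


def rows_without_controls(rows: list[FmeaRow]) -> list[str]:
    """Return failure modes that have neither a prevention nor a detection control."""
    return [r.failure_mode for r in rows if not (r.prevention_controls or r.detection_controls)]


# Hypothetical rows for a robot chassis.
rows = [
    FmeaRow("Battery mount", "Bracket cracks", "Battery shifts in chassis",
            "Fatigue under vibration", 9, 3, 6,
            prevention_controls=["FEA of bracket under vibration load"]),
    FmeaRow("Drive cable", "Connector loosens", "Loss of drive power",
            "Missing strain relief", 6, 4, 7),
]

print(rows_without_controls(rows))  # prints ['Connector loosens']
```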

Step #9: Estimate your detection level for each failure mode, cause and effect

What is the likelihood of detection?

Determining detectability involves evaluating the likelihood that a failure mode or its cause can be identified before it leads to a problem. Keep in mind who or what needs to detect the failure mode (e.g., a person or a control system).

As part of this exercise, list all detection methods, including inspections, testing protocols, and monitoring systems. Also consider the capabilities of each detection method, such as sampling time, resolution, time between inspections, speed, coverage, and reliability. Ask the team: “Will the detection method be quick and reliable enough?”
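To answer that question semi-quantitatively, one option (an assumption, not a method prescribed here) is to compare the time between inspections with the window during which a developing failure is detectable before it causes harm, similar to the P-F interval used in reliability-centered maintenance. The sketch assumes the failure begins at a uniformly random time relative to a fixed inspection schedule, and that each inspection inside the window independently catches it with probability `p_hit`. All numbers are hypothetical.

```python
import math


def detection_probability(window_h: float, interval_h: float, p_hit: float) -> float:
    """Chance that periodic inspections catch a failure before its effect occurs.

    window_h:   time the failure is detectable before it causes harm (assumed known)
    interval_h: time between inspections
    p_hit:      chance that a single inspection inside the window detects it
    """
    ratio = window_h / interval_h
    k = math.floor(ratio)   # inspections that always fall inside the window
    frac = ratio - k        # chance that one extra inspection also falls inside
    miss_k = (1.0 - p_hit) ** k
    miss_k1 = (1.0 - p_hit) ** (k + 1)
    return (1.0 - frac) * (1.0 - miss_k) + frac * (1.0 - miss_k1)


# Hypothetical: wear is visible 100 h before failure; each inspection is 90% reliable.
for interval in (24, 168):  # daily vs. weekly inspection
    print(interval, round(detection_probability(100, interval, 0.9), 3))
```

A lower detection probability would map to a higher (worse) detection rating under the study's ground rules. In this example, moving from weekly to daily inspection changes the answer to the team's question far more than improving the inspection itself.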

Step #10: Calculate the risk priority number for each failure mode

How serious is the failure mode?

The Risk Priority Number (RPN) is the product of the three ratings: RPN = Severity × Occurrence × Detection. The larger the RPN, the higher the priority to mitigate the potential failure mode. With the common 1 to 10 scales, RPN ranges from 1 to 1,000.
RPN is valuable when making business decisions about how to prioritize change management and risk management efforts.
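The calculation itself is simple; most of the value lies in the ranking. A minimal sketch, assuming 1 to 10 scales and three hypothetical failure modes:

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number = Severity x Occurrence x Detection (each rated 1..10)."""
    for value in (severity, occurrence, detection):
        if not 1 <= value <= 10:
            raise ValueError(f"rating {value} is outside 1..10")
    return severity * occurrence * detection


# Hypothetical failure modes: (name, S, O, D)
failure_modes = [
    ("Bracket cracks", 9, 3, 6),
    ("Connector loosens", 6, 4, 7),
    ("Paint chips", 2, 6, 2),
]

# Highest RPN first.
ranked = sorted(failure_modes, key=lambda fm: rpn(*fm[1:]), reverse=True)
for name, s, o, d in ranked:
    print(f"{name}: RPN = {rpn(s, o, d)}")
# Connector loosens: RPN = 168
# Bracket cracks: RPN = 162
# Paint chips: RPN = 24
```

Note that the severity-9 bracket ranks below the severity-6 connector. Because very different combinations can produce similar RPNs, many teams also review high-severity items regardless of their RPN.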

Step #11: Determine corrective actions

What steps can be taken to mitigate risks?

Now that the team has determined what needs to be done to adequately reduce risk, the real work begins. Corrective actions need to be formally documented and should focus on failure modes with a high RPN. Corrective actions should also address the root cause of each failure mode.
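After a corrective action is implemented, the affected ratings are re-scored and the failure mode is checked against the product's risk tolerance. A sketch, assuming a hypothetical tolerance of RPN 100 plus a hypothetical rule that any severity of 9 or 10 stays open for review regardless of RPN; your standard or quality procedure defines the real criteria.

```python
RPN_LIMIT = 100       # hypothetical risk tolerance for this product
SEVERITY_REVIEW = 9   # hypothetical: severity >= 9 always needs review


def needs_action(s: int, o: int, d: int) -> bool:
    """True if a failure mode is outside the assumed risk tolerance."""
    return s * o * d > RPN_LIMIT or s >= SEVERITY_REVIEW


# Hypothetical "Connector loosens" failure mode, before and after corrective action.
before = {"s": 6, "o": 4, "d": 7}  # RPN 168
after = {"s": 6, "o": 2, "d": 3}   # strain relief added (root cause), pull test in assembly

print(needs_action(**before), needs_action(**after))  # prints True False
```

Here the action targets the root cause (missing strain relief), which lowers occurrence, and adds a pull test, which improves detection. Severity is unchanged because the effect of a loose connector is the same if it still happens.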

Effective Tools For Supporting FMEA

Root Cause Analysis

Root Cause Analysis includes:

These methods are often utilized to analyze the cause of the failure mode.

Design Trees → Function Trees

With Design FMEA, a product is broken down into functions and each function will have associated failure modes. These failure modes can then be used in the FMEA process.
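A function tree maps naturally onto a nested data structure. The sketch below walks a hypothetical tree for a robot drive subsystem and emits one (function path, failure mode) pair for each failure mode, ready to become worksheet rows. The `Function` class, the functions, and the failure modes are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """A node in a function tree; failure modes are ways this function can fail."""
    name: str
    failure_modes: list[str] = field(default_factory=list)
    children: list["Function"] = field(default_factory=list)


def enumerate_failure_modes(node: Function, path: tuple[str, ...] = ()) -> list[tuple[str, str]]:
    """Depth-first walk returning (function path, failure mode) pairs."""
    here = path + (node.name,)
    pairs = [(" > ".join(here), fm) for fm in node.failure_modes]
    for child in node.children:
        pairs.extend(enumerate_failure_modes(child, here))
    return pairs


# Hypothetical tree for a mobile robot's drive subsystem.
drive = Function("Move robot", children=[
    Function("Deliver torque to wheels", ["No torque", "Excess torque"]),
    Function("Measure wheel speed", ["Reading stuck at zero", "Noisy reading"]),
])

for function_path, failure_mode in enumerate_failure_modes(drive):
    print(f"{function_path}: {failure_mode}")
```

Keeping the full function path with each failure mode makes it easy to trace a worksheet row back to the function it threatens.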

Prototyping and Testing 

Prototyping and testing are often used to determine what the failure modes will be in operational scenarios. 

Model-Based Simulation

Model-based simulation (with fault injection, for example) allows teams to develop products in a digital world where real-world scenarios are simulated and the design can be improved before it is actually built.

Whether you’re combining software and hardware to build a custom solution or simply testing a physical product, model-based simulation is invaluable.
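A toy illustration of fault injection, assuming a hypothetical first-order thermal model with an on/off heater controller. The same simulation runs once with a healthy temperature sensor and once with a sensor stuck at a fixed reading, and the peak process temperature is compared. The model, the `simulate` function, and all parameters are made up for illustration.

```python
from typing import Optional


def simulate(steps: int, stuck_sensor_at: Optional[float] = None, fault_step: int = 50) -> float:
    """Return the peak temperature reached by a toy on/off heater loop.

    stuck_sensor_at: if set, the sensor reports this value from fault_step onward
    (the injected fault); otherwise the sensor reads the true temperature.
    """
    temp, ambient, setpoint = 20.0, 20.0, 60.0
    heater_on = False
    peak = temp
    for step in range(steps):
        reading = temp
        if stuck_sensor_at is not None and step >= fault_step:
            reading = stuck_sensor_at  # injected fault: sensor output frozen
        # On/off control with a +/-1 degree hysteresis band around the setpoint.
        if reading < setpoint - 1.0:
            heater_on = True
        elif reading > setpoint + 1.0:
            heater_on = False
        # First-order thermal model: heater input minus losses to ambient.
        temp += (2.0 if heater_on else 0.0) - 0.02 * (temp - ambient)
        peak = max(peak, temp)
    return peak


healthy = simulate(300)
faulty = simulate(300, stuck_sensor_at=25.0)
print(f"peak healthy: {healthy:.1f}, peak with stuck sensor: {faulty:.1f}")
```

The healthy loop holds the temperature near the setpoint, while the stuck-low sensor keeps the heater on and the temperature runs away. That runaway is the kind of effect a team would record and rate in the FMEA without having to build and break hardware.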

Digital Twins

A Digital Twin is a duplicate of a physical system or object that is used for simulation, analysis and control. Digital twin technology can be effectively integrated with FMEA to enhance its capabilities by providing a virtual environment to simulate potential failure modes and their effects, allowing for more accurate risk assessments and proactive mitigation strategies. 

Essentially, a digital twin acts as a powerful tool to enrich the data used in FMEA analysis.

An aerospace-related representation of a Digital Twin. [Source: TechTarget]
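One way a digital twin can feed FMEA is by running a model alongside the real system and flagging when measurements drift from the model's prediction. Flagged divergences can then serve as evidence for occurrence estimates or as a detection method. A minimal sketch, assuming a hypothetical linear motor-current model and an arbitrary 15% relative-residual threshold; a real twin would use a validated model of the actual system.

```python
def predicted_current(load_kg: float) -> float:
    """Hypothetical twin model: motor current (A) grows linearly with payload."""
    return 1.5 + 0.2 * load_kg


def flag_divergence(samples: list[tuple[float, float]], threshold: float = 0.15) -> list[int]:
    """Return indices where measured current deviates from the twin by more than threshold."""
    flagged = []
    for i, (load_kg, measured_a) in enumerate(samples):
        expected = predicted_current(load_kg)
        if abs(measured_a - expected) / expected > threshold:
            flagged.append(i)
    return flagged


# Hypothetical telemetry: (payload in kg, measured current in A).
telemetry = [(10.0, 3.6), (12.0, 3.9), (10.0, 4.4), (5.0, 2.5)]
print(flag_divergence(telemetry))  # prints [2]
```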

Reliability and Failure Mode and Effects Analysis (FMEA)

Reliability and FMEA work together as complementary disciplines to ensure product or process robustness.

  • FMEA identifies potential failure modes, their causes, and effects
  • Reliability engineering focuses on quantifying and improving the likelihood that a product or system will perform its intended function without failure over time
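As a small example of where the two meet: if a failure mode of any one component causes the system to fail (a series system) and component failures are independent, system reliability is the product of the component reliabilities. The sketch assumes constant failure rates, so each component contributes exp(-λt); the component rates are hypothetical.

```python
import math


def series_reliability(failure_rates_per_hour: list[float], t_hours: float) -> float:
    """R_sys(t) for independent components in series with constant failure rates."""
    return math.prod(math.exp(-lam * t_hours) for lam in failure_rates_per_hour)


# Hypothetical components: motor, controller, encoder (failures per hour).
rates = [2e-6, 5e-6, 1e-5]
r = series_reliability(rates, 8_760)  # one year of continuous operation
print(f"{r:.3f}")  # prints 0.862
```

The FMEA identifies which component failure modes actually bring the system down; the reliability model then quantifies how often that is likely to happen.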

Operator and User Feedback

Feedback from operators and users is essential data that can be collected and used to refine the initial FMEA and continue to improve the system or object.

Process Flow Diagrams and Process Control Plans

A process flow diagram (PFD) acts as a visual representation of a process, outlining each step involved, and a process control plan details the specific monitoring and inspection methods required for failure prevention.

The process flow diagram provides the foundation for creating a detailed control plan by identifying the key steps where quality checks need to be implemented.

Industry-Specific Guidelines for Failure Likelihoods

Reference industry-specific guidelines for failure likelihoods, system safety, and risk prevention protocols:

There are many variations of FMEA. As stated previously, be sure to adhere to the FMEA process that best fits your application.

Delphi Method

The Delphi Method is a research tool that uses a series of questionnaires to gather insights from a panel of experts. Leveraging subject matter experts (SMEs) is key to identifying failure modes, estimating likelihood, severity and detectability.

The Delphi method is an effective way of collecting data from SMEs without letting the loudest person in the room influence the others.
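Delphi rounds are usually summarized statistically and fed back to the panel before the next round. A minimal sketch, assuming each expert submits an anonymous 1 to 10 rating and that the panel has converged when the interquartile range (IQR) is 2 or less; that stopping rule is hypothetical, and each study sets its own.

```python
import statistics


def summarize_round(ratings: list[int], max_iqr: float = 2.0) -> dict:
    """Median and interquartile range of one anonymous Delphi round."""
    q1, _, q3 = statistics.quantiles(ratings, n=4)
    iqr = q3 - q1
    return {
        "median": statistics.median(ratings),
        "iqr": iqr,
        "consensus": iqr <= max_iqr,
    }


# Hypothetical occurrence ratings for one failure mode from seven SMEs.
round_1 = [3, 7, 4, 8, 2, 6, 5]
round_2 = [4, 5, 5, 6, 4, 5, 5]  # after experts see the round-1 summary

print(summarize_round(round_1))
print(summarize_round(round_2))
```

Because only the summary is shared between rounds, experts revise their ratings against the group's distribution rather than against whoever argued most forcefully.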

Failure Mode and Effects Analysis (FMEA) is essential for risk management in engineering

Failure Mode and Effects Analysis is not just a quality assurance tool; it is an essential part of the engineering process that helps organizations proactively manage risks and implement prevention strategies. By embracing FMEA, teams can significantly improve product reliability, enhance safety, and reduce costs, all while promoting a culture of continuous improvement.

In a competitive landscape, the benefits of FMEA are undeniable, making it a critical element in the arsenal of every engineer.


Ben Jordan

Mechanical Engineering Manager

Ben has over 15 years of experience overseeing mechanical engineering projects from concept to completion. His experience includes the design, development, and project management for products such as custom winches, cranes, robots, and automated machines. He’s also skilled in electrical and controls design, including schematic design, panel layout, and PLC programming.

Ben earned a Master of Science in Mechanical Engineering from the University of Washington and is a licensed Professional Engineer in both Washington and Oregon.

He enjoys time with his wife and three children, home improvement projects, and learning ROS (Robot Operating System).