Introduction

Welcome to the tutorial on Reliability, Availability, and Maintainability (RAM)! In this tutorial, you will learn about these fundamental concepts and their applications.

Learning Objectives

By the end of this module, learners will be able to:

  • Define key reliability metrics, including reliability, availability, and failure rate.
  • Describe the significance of MTTR, MTTF, and MTBF in reliability engineering.
  • Calculate probability of failure using given reliability data.
  • Interpret \(B_n\) or \(L_n\) life values in the context of product reliability.
  • Distinguish between the different reliability measures.
  • Compute system reliability for series and parallel configurations.

What is RAM?

The terms Reliability, Availability, and Maintainability are related but distinct concepts.

Reliability: The ability of an item to perform its intended function without failure over a specified period.

Availability: The proportion of time an item is in a functioning condition.

Maintainability: The ease and speed with which an item can be restored to operational status after a failure.

These concepts are interrelated in the sense that a highly reliable item is likely to be available more often, and an item that is easy to maintain can be restored to service quickly, thus improving availability.

Reliability

Reliability is the probability that an item will not fail under defined conditions for a specified period. Specifically, reliability is defined as:

\[ \text{Reliability} = \left(1 - \frac{\text{Failed Time}}{\text{Total Time}}\right) \times 100 \]

where Failed Time is the time that the item was not functioning, and Total Time is the total time that the item was in operation.

For example, suppose a motor ran for 5 years total and was in a failed state for 10 of those days. The reliability of the motor is:

\[ \text{Reliability} = \left(1 - \frac{10}{5 \times 365}\right) \times 100 = 99.45\% \]

So, the motor has a reliability of 99.45% over the 5-year period.

Unreliability is the complement of reliability, that is, the probability that an item will fail under defined conditions for a specified period. Expressed as a percentage, unreliability is defined as:

\[ \text{Unreliability} = 100 - \text{Reliability} = \frac{\text{Failed Time}}{\text{Total Time}} \times 100 \]

Taking the previous example, the unreliability of the motor is:

\[ \text{Unreliability} = 100 - 99.45 = \frac{10}{5 \times 365} \times 100 = 0.55\% \]

The motor has an unreliability of 0.55% over the 5-year period.
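
The reliability and unreliability calculation can be sketched in R. This is a minimal illustration using the motor figures from the example; the variable names are chosen here for illustration only.

```r
# Operating record for the motor (figures from the example)
total_days  <- 5 * 365   # total time in operation, in days
failed_days <- 10        # time spent in a failed state, in days

# Reliability and unreliability as percentages of total time
reliability   <- (1 - failed_days / total_days) * 100
unreliability <- 100 - reliability

round(reliability, 2)    # 99.45
round(unreliability, 2)  # 0.55
```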

Availability

Availability is the probability that an item will be available for service. Unavailable time includes failed time and scheduled maintenance. Specifically, availability is defined as:

\[ \text{Availability} = \left(1 - \frac{\text{Unavailable Time}}{\text{Total Time}}\right) \times 100 \]

where Unavailable Time is the time that the item was not available for service, and Total Time is the total time that the item was in operation.

Unavailable time is different from standby time: during standby, the item is available for service but is not being used.

Suppose a motor ran for 5 years total, was failed for 10 days, and had scheduled maintenance for 15 days. The availability of the motor is:

\[ \text{Availability} = \left(1 - \frac{10 + 15}{5 \times 365}\right) \times 100 = 98.63\% \]

So the motor has an availability of 98.63% over the 5-year period.
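
As a rough sketch, the availability calculation for this motor could be written in R as follows. The variable names are illustrative assumptions, and unavailable time is taken to be failed time plus scheduled maintenance, as defined in this section.

```r
# Operating record for the motor (figures from the example)
total_days       <- 5 * 365  # total time, in days
failed_days      <- 10       # unplanned downtime, in days
maintenance_days <- 15       # scheduled maintenance, in days

# Unavailable time combines failures and scheduled maintenance
unavailable_days <- failed_days + maintenance_days

availability <- (1 - unavailable_days / total_days) * 100
round(availability, 2)  # 98.63
```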

Mean Time to Repair

The Mean Time to Repair (MTTR) is a measure of maintainability and is defined as the average time required to repair a failed item. The MTTR is the ratio of total repair time to the number of repairs for a given item.

\[ \text{MTTR} = \frac{\sum_{i=1}^{n} \text{RepairTime}_i}{\text{RepairCount}} \]

where \(\text{RepairTime}_i\) is the time to repair for the ith repair, and \(\text{RepairCount} = n\) is the total number of repairs.

Suppose 100 motors each run for 5 years. During that time, 5 failures occur with repair times in days of 5, 10, 15, 8, and 12. The MTTR is:

\[ \text{MTTR} = \frac{5 + 10 + 15 + 8 + 12}{5} = 10 \text{ days} \]
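
In R, the same MTTR can be obtained directly from a vector of repair times. This is a small sketch; `repair_days` is a name introduced here for illustration.

```r
# Repair times, in days, for the five failures in the example
repair_days <- c(5, 10, 15, 8, 12)

# MTTR = total repair time / number of repairs
MTTR <- sum(repair_days) / length(repair_days)
MTTR  # 10 days

# The arithmetic mean gives the same result
mean(repair_days)  # 10
```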

Mean Time to Failure

The Mean Time to Failure (MTTF) is a measure of reliability for non-repairable items, meaning that once the item fails, it is not repaired, but replaced. The MTTF is the ratio of total operating time to the number of failures for a given item.

\[ \text{MTTF} = \frac{\text{Total Time}}{\text{FailureCount}} \]

where Total Time is the total time that the items were in operation, summed across all items, and FailureCount is the total number of failures.

For example, suppose 100 motors each run for 5 years, giving 500 motor-years of operation. During that time, 5 failures occur. The MTTF is:

\[ \text{MTTF} = \frac{5 \times 100}{5} = 100 \text{ years} \]

An MTTF of 100 years means that, on average, a single motor is expected to run for 100 years before failing. This may seem high, but remember that this is an average across all motors, and some motors may fail much sooner while others may last much longer.

Mean Time Between Failures

Mean Time Between Failures (MTBF) is a measure of reliability for repairable systems, meaning that once the item fails, it is repaired and put back into service. In this way, the item can fail multiple times. The MTBF is the ratio of total operating time to the number of failures for a given item.

The calculation is the same as for the MTTF, just with a slightly different interpretation.

Take an example where a motor fails 5 times during a total time of 10,000 hours. The MTBF is:

\[ \text{MTBF} = \frac{10000}{5} = 2000 \text{ hours} \]

An MTBF of 2000 hours means that, on average, the motor is expected to run for 2000 hours between failures.

Exercise: MTTF and MTBF

Calculate the MTBF for the following scenario.

# Each generator in a fleet of 50 runs for 8 years.
# During that time, 20 failures occur.
# Calculate the MTBF in years.
fleet_size <- 50
years      <- 8
failures   <- 20
# MTBF = Total Time / Failure Count
# Total time = fleet_size * years
MTBF <- (fleet_size * years) / failures
MTBF  # 20 years

Failure Rate

The failure rate is the ratio of the number of failures to the total operational time for a given item and a specified period. Since the MTBF is the ratio of total time to the number of failures, the failure rate is the inverse of the MTBF. Specifically, the failure rate is defined as:

\[ \lambda = \frac{1}{\text{MTBF}} \]

where \(\lambda\) is the failure rate.

The failure rate can also be calculated directly as the number of failures divided by the total operational time.

\[ \lambda = \frac{\text{FailureCount}}{\text{Total Time}} \]

where Total Time is the total time that the item was in operation, and FailureCount is the total number of failures.

A key assumption behind this relationship is that the failure rate is constant, meaning that the item has the same probability of failing at any time during its operation. In practice, this is not always true, as items may have higher failure rates at the beginning or end of their life. However, for many items, the constant failure rate assumption is a reasonable approximation.

To illustrate, suppose 100 motors each run for 10 years. During that time, 20 failures occur. The failure rate is:

\[ \lambda = \frac{20}{10 \times 100} = 0.02 \text{ failures per year} \]

A failure rate of 0.02 failures per year means that, on average, each motor experiences 0.02 failures per year of operation. Equivalently, the MTBF is 1/0.02 = 50 years.
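
The relationship between failure count, total operating time, failure rate, and MTBF can be checked with a short R sketch. It assumes each of the 100 motors runs for the full 10 years; the variable names are illustrative.

```r
# Fleet record (figures from the example)
n_motors   <- 100
years_each <- 10
failures   <- 20

# Total operating time accumulated across the fleet, in motor-years
total_time <- n_motors * years_each

# Failure rate and its inverse, the MTBF
lambda <- failures / total_time
MTBF   <- 1 / lambda

lambda  # 0.02 failures per motor-year
MTBF    # 50 years
```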

Probability of Failure

Failures rarely occur at predictable times; they are usually modeled as random events that follow a probability distribution. A common assumption is that failures occur according to a Poisson process, meaning that the number of failures in a given time period follows a Poisson distribution. When failures follow a Poisson process, the time between failures follows an exponential distribution.

For an exponential distribution, the probability of failure by time t can be calculated using the cumulative distribution function (CDF). Specifically, the cumulative probability of failure by time t, F(t), is given by:

\[ F(t) = 1 - e^{-\lambda t} \]

where \(\lambda\) is the failure rate.

The probability of survival to time t, R(t), is the complement of F(t), that is,

\[ R(t) = 1 - F(t) = e^{-\lambda t} \]

For example, suppose a motor has a failure rate of 0.1 failures per year. The probability of survival to 10 years is:

\[ R(10) = e^{-0.1 \times 10} = 0.3679 = 36.79\% \]

So, there is a 36.79% chance that the motor will survive for 10 years without failing, or a 63.21% chance that it will fail before 10 years.
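
The motor example can be reproduced in R, both from the formula and with base R's exponential distribution function `pexp()`. This is a sketch for cross-checking the hand calculation; the variable names are illustrative.

```r
lambda <- 0.1  # failures per year
t      <- 10   # years

# From the formulas R(t) = exp(-lambda * t) and F(t) = 1 - R(t)
R_manual <- exp(-lambda * t)
F_manual <- 1 - R_manual

# Same quantities from the exponential CDF in base R
F_builtin <- pexp(t, rate = lambda)                      # P(failure by t)
R_builtin <- pexp(t, rate = lambda, lower.tail = FALSE)  # P(survival to t)

round(R_manual, 4)  # 0.3679
round(F_manual, 4)  # 0.6321
all.equal(F_manual, F_builtin)  # TRUE
```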

Exploring the Reliability Function

The shape of the reliability curve \(R(t) = e^{-\lambda t}\) depends entirely on the failure rate \(\lambda\).

As \(\lambda\) increases, the curve drops more steeply. Note that at \(t = \text{MTBF} = 1/\lambda\), the reliability is always \(e^{-1} \approx 36.8\%\), regardless of the failure rate. This is a general property of the exponential distribution.
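
The effect of the failure rate can also be tabulated. The sketch below evaluates \(R(t)\) on a small grid of failure rates and times, then confirms the value at \(t = 1/\lambda\). The grid values are arbitrary choices made here for illustration.

```r
lambdas <- c(0.05, 0.1, 0.2, 0.5)  # illustrative failure rates, per year
times   <- c(1, 5, 10, 20)         # years

# Rows are failure rates, columns are times
R_table <- outer(lambdas, times, function(l, t) exp(-l * t))
dimnames(R_table) <- list(paste0("lambda=", lambdas), paste0("t=", times))
round(R_table, 3)

# Reliability evaluated at t = MTBF = 1 / lambda for each failure rate
R_at_mtbf <- exp(-lambdas * (1 / lambdas))
round(R_at_mtbf, 4)  # 0.3679 for every lambda
```

Reading across a row shows reliability decaying with time; reading down a column shows that a higher failure rate gives lower reliability at the same time.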

Calculate the probability of survival in this exercise.

# A pump has a failure rate of 0.05 failures per year.
# Calculate the probability of survival at t = 10 years.
lambda <- 0.05
t      <- 10
# R(t) = exp(-lambda * t)
R <- exp(-lambda * t)
R  # ~0.607 (60.7% survival probability)

\(B_n\) or \(L_n\) Life

The \(B_n\) life is defined as the time at which n% of a population is expected to have failed (equivalently, (100 - n)% is expected to survive). For example, the B10 life is the time at which 10% of a population is expected to have failed, B90 is when 90% is expected to have failed, and so on. Some industries use the term \(L_n\) life instead of \(B_n\) life, but they are the same concept.

To calculate the \(B_n\) life, we can solve for time t in the cumulative distribution function (CDF) of the exponential distribution. Specifically, we set F(t) equal to n/100 and solve for t:

\[ t = \frac{-\ln(1 - F(t))}{\lambda} \]

For example, suppose a factory motor has a failure rate of 0.2 failures per year. The B10 life is:

\[ B_{10} = \frac{-\ln(1 - 0.1)}{0.2} = 0.527 \text{ years} \]

So at the B10 life of 0.527 years, 10% of the motors are expected to have failed.
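
Because the \(B_n\) life is a quantile of the failure-time distribution, it can also be obtained from base R's exponential quantile function `qexp()`. The sketch below compares it with the formula for the factory motor example; the variable names are illustrative.

```r
lambda <- 0.2  # failures per year
n      <- 10   # percent failed

# From the formula t = -ln(1 - F) / lambda with F = n / 100
B10_manual <- -log(1 - n / 100) / lambda

# Same value as the 10% quantile of the exponential distribution
B10_builtin <- qexp(n / 100, rate = lambda)

round(B10_manual, 3)                # 0.527 years
all.equal(B10_manual, B10_builtin)  # TRUE
```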

Exercise: Failure Rate, Probability of Failure, and Bn Life

Now calculate the B10 life in R.

# A component has a failure rate of 0.05 failures per year.
# Calculate the B10 life (time at which 10% of components have failed).
lambda <- 0.05
# Solve F(t) = 0.10 for t:  t = -log(1 - 0.10) / lambda
B10 <- -log(1 - 0.10) / lambda
B10  # ~2.1 years

From Exponential to Weibull

The exponential model assumes a constant failure rate throughout an item’s life. This is appropriate for electronic components that fail at random, but many mechanical components have failure rates that change over time, increasing as they wear out, or decreasing as early defects are screened out.

The Weibull distribution generalizes the exponential by adding a shape parameter \(\beta\) alongside a scale parameter \(\eta\) (the characteristic life):

\[ R(t) = e^{-(t/\eta)^\beta} \]

When \(\beta = 1\), the Weibull reduces exactly to the exponential with \(\lambda = 1/\eta\). Everything covered so far in this tutorial is therefore the special case \(\beta = 1\).

| \(\beta\) | Failure rate | Typical cause                                |
|-----------|--------------|----------------------------------------------|
| \(< 1\)   | Decreasing   | Infant mortality (early defects eliminated)  |
| \(= 1\)   | Constant     | Random failures; exponential model applies   |
| \(> 1\)   | Increasing   | Wear-out (fatigue, corrosion, aging)         |
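
The three regimes can be illustrated numerically. The sketch below uses the standard Weibull hazard (instantaneous failure rate) \(h(t) = (\beta/\eta)(t/\eta)^{\beta-1}\), which is not stated elsewhere in this tutorial, and checks that base R's `pweibull()` with shape 1 matches the exponential model. The function name `weibull_hazard` and the parameter values are assumptions made for illustration.

```r
# Weibull hazard: h(t) = (beta / eta) * (t / eta)^(beta - 1)
weibull_hazard <- function(t, beta, eta) {
  (beta / eta) * (t / eta)^(beta - 1)
}

eta   <- 10              # illustrative scale parameter
betas <- c(0.5, 1, 2)    # decreasing, constant, increasing failure rate
times <- c(1, 5, 10, 20)

# Rows are times, columns are shape parameters
h_table <- sapply(betas, function(b) weibull_hazard(times, b, eta))
dimnames(h_table) <- list(paste0("t=", times), paste0("beta=", betas))
round(h_table, 4)

# With beta = 1 the Weibull survival function equals exp(-t / eta)
R_weibull <- pweibull(times, shape = 1, scale = eta, lower.tail = FALSE)
R_exp     <- exp(-times / eta)
all.equal(R_weibull, R_exp)  # TRUE
```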

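The WeibullR package fits Weibull models to failure data. Here a set of five motor failure times is fit with a 2-parameter Weibull:

library(WeibullR)  # provides MLEw2p()

# Observed failure times
failures <- c(500, 820, 1100, 1350, 1590)
# Maximum likelihood fit of a 2-parameter Weibull
fit <- MLEw2p(failures, show = TRUE)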

A \(\beta\) close to 1 is consistent with the exponential (random failures); higher values indicate wear-out. The Life Data Analysis module covers Weibull analysis in full depth.

System Reliability

So far we have focused on a single component. Real systems combine many components, and the system reliability depends on how those components are arranged.

Series Systems

In a series configuration, every component must function for the system to function. Assuming the components fail independently, the system reliability is the product of the individual component reliabilities:

\[ R_{sys} = R_1 \times R_2 \times \cdots \times R_n \]

Three pumps in series, each with 90% reliability:

R_components <- c(0.90, 0.90, 0.90)
R_series <- prod(R_components)
R_series  # 72.9%
## [1] 0.729

The series system is less reliable than any individual component. Adding more components in series always reduces system reliability.
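
To see how quickly series reliability erodes, the sketch below computes the reliability of a chain of one to five identical components. It assumes independent components with 90% reliability each, matching the pump example; the variable names are illustrative.

```r
R_single     <- 0.90  # reliability of each component
n_components <- 1:5   # number of components in series

# For identical independent components, R_sys = R_single^n
R_series_n <- R_single^n_components
round(R_series_n, 3)  # 0.900 0.810 0.729 0.656 0.590
```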

Parallel Systems

In a parallel configuration, the system functions as long as at least one component functions; this is redundancy. Assuming the components fail independently, the system fails only when all components fail:

\[ R_{sys} = 1 - (1-R_1)(1-R_2) \cdots (1-R_n) \]

The same three pumps in parallel:

R_parallel <- 1 - prod(1 - R_components)
R_parallel  # 99.9%
## [1] 0.999

Redundancy dramatically improves system reliability: with three 90% pumps in parallel, the probability of system failure falls from 10% for a single pump to 0.1%.
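
The effect of adding redundant components can be sketched in R. The example assumes independent, identical components with 90% reliability each, as in the pump example; the variable names are illustrative.

```r
R_single     <- 0.90  # reliability of each component
n_components <- 1:5   # number of components in parallel

# The system fails only if all n components fail
R_parallel_n <- 1 - (1 - R_single)^n_components
R_parallel_n  # 0.9, 0.99, 0.999, 0.9999, 0.99999

# Probability of system failure shrinks by a factor of 10 per added component
1 - R_parallel_n
```

Each additional component helps, but the gain in absolute reliability gets smaller every time.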

Exercise: System Reliability

Calculate series and parallel system reliability in R.

# Four components each have reliability R = 0.85.
# 1. Calculate the reliability of a series system.
# 2. Calculate the reliability of a parallel system.
# Series:   R_sys = prod(R_components)
# Parallel: R_sys = 1 - prod(1 - R_components)
R_comp <- rep(0.85, 4)
R_series   <- prod(R_comp)
R_parallel <- 1 - prod(1 - R_comp)
R_series    # ~0.522 (52.2%)
R_parallel  # ~0.9995 (99.95%)

Summary

Congratulations on completing the Reliability, Availability, and Maintainability (RAM) tutorial!

In this tutorial, we introduced the concepts of Reliability, Availability, and Maintainability. You learned to calculate key reliability metrics, explored the exponential reliability model interactively, computed series and parallel system reliability, and saw how the Weibull distribution generalizes the exponential model.

Key takeaways:

  • \(R(t) = e^{-\lambda t}\) — exponential reliability under a constant failure rate.
  • \(MTBF = 1/\lambda\) — at \(t = MTBF\), reliability is always 36.8%.
  • Series systems: \(R_{sys} = \prod R_i\) — reliability decreases with more components.
  • Parallel systems: \(R_{sys} = 1 - \prod(1-R_i)\) — redundancy increases reliability.
  • When \(\beta \neq 1\), use the Weibull distribution (see the Life Data Analysis module).

References

  • Abernethy, R.B. (2004) The New Weibull Handbook. Fifth Edition.

  • Aden-Buie G, Schloerke B, Allaire J, Rossell Hayes A (2023). learnr: Interactive Tutorials for R. https://rstudio.github.io/learnr/, https://github.com/rstudio/learnr.

  • Silkworth D, Symynck J (2022). WeibullR: Weibull Analysis for Reliability Engineering. R package version 1.2.1, https://CRAN.R-project.org/package=WeibullR.

Reliability, Availability, and Maintainability (RAM)