When dealing with large production runs, such as a batch of a million electronic components, assessing the quality or defect proportion can be challenging. In this blog post, we'll explore sampling methods, confidence intervals, and the probability framework for tackling this problem. We'll also delve into building operating characteristic (OC) curves to evaluate the performance of our sampling plan. [1]
Sampling Methods
To obtain a representative sample from a large production run, we can use the following sampling methods:
- Simple Random Sampling (SRS): Each component has an equal probability of being selected, so the sample proportion is an unbiased estimate of the lot's defect proportion. The method is also easy to implement.
- Stratified Sampling: If the components have distinct subgroups or strata based on certain characteristics, we can divide the population into strata and randomly sample from each stratum proportionally to its size in the population.
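Both methods can be sketched in a few lines of NumPy. This is a minimal illustration, not a production sampling tool: it assumes a hypothetical lot of 1,000,000 components indexed 0 to 999,999 and three hypothetical strata (for example, production lines) of 500,000, 300,000 and 200,000 components stored one after another. The function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility

def simple_random_sample(population_size, sample_size, rng):
    """Draw sample_size distinct component indices, each equally likely."""
    return rng.choice(population_size, size=sample_size, replace=False)

def stratified_sample(stratum_sizes, sample_size, rng):
    """Proportional allocation: each stratum receives a share of the sample
    proportional to its size, then an SRS is drawn within each stratum."""
    stratum_sizes = np.asarray(stratum_sizes)
    # Rounded allocation; the total can differ from sample_size by rounding
    allocation = np.round(sample_size * stratum_sizes / stratum_sizes.sum()).astype(int)
    samples = []
    offset = 0  # stratum k occupies indices offset .. offset + stratum_n - 1
    for stratum_n, k in zip(stratum_sizes, allocation):
        samples.append(offset + rng.choice(stratum_n, size=k, replace=False))
        offset += stratum_n
    return np.concatenate(samples)

srs = simple_random_sample(1_000_000, 100, rng)
strat = stratified_sample([500_000, 300_000, 200_000], 100, rng)
print(len(srs), len(strat))  # 100 100
```

With proportional allocation, the strata above receive 50, 30 and 20 of the 100 sampled components.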
Confidence Intervals
A confidence interval gives a range of plausible values for the true population parameter (here, the proportion of defective components). A 95% confidence level, the most common choice, means that if we repeated the sampling many times, about 95% of the intervals constructed this way would contain the true proportion. The usual normal-approximation (Wald) confidence interval for the proportion of defective components is:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Where:
- $\hat{p}$ is the sample proportion of defective components (aka relative frequency)
- $z_{\alpha/2}$ is the critical value from the standard normal distribution (e.g., 1.96 for a 95% confidence level)
- $n$ is the sample size
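As a quick worked example, here is the Wald interval p̂ ± z·sqrt(p̂(1 − p̂)/n) in plain Python, assuming a hypothetical inspection that finds 7 defectives in a sample of 100. Clipping the bounds to [0, 1] is a design choice added here, since a proportion cannot fall outside that range.

```python
from math import sqrt

def wald_interval(defects, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = defects / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1], since a proportion cannot leave that range
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

lower, upper = wald_interval(7, 100)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")  # 95% CI: (0.0200, 0.1200)
```

Here p̂ = 0.07 and the margin of error is about 0.05, so the interval is roughly 2% to 12% defective. Note that a sample with zero defectives yields a zero-width interval, one of the known weaknesses of the Wald interval when defects are rare.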
Probability and Statistical Framework
The Central Limit Theorem (CLT) states that the sampling distribution of the sample proportion approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution. A common rule of thumb is that the approximation is adequate when both $np$ and $n(1-p)$ are at least 10 (some texts use 5). This allows us to make inferences about the population based on the sample data.
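The standard error of the sample proportion, denoted as $\sigma_{\hat{p}}$, measures the variability of the sample proportion and is calculated as:
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
where $p$ is the true population proportion. Since $p$ is unknown in practice, it is replaced by $\hat{p}$, which gives the standard error used in the confidence interval.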
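A short sketch shows how the standard error responds to the sample size, assuming a true defect proportion of 0.05 (the same value used in the simulation later in this post):

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

p = 0.05  # assumed true defect proportion
for n in (100, 400, 1600):
    print(n, round(standard_error(p, n), 4))
```

This prints 0.0218, 0.0109 and 0.0054: each fourfold increase in n halves the standard error, because it scales with 1/sqrt(n).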
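As the sample size grows, the standard error shrinks (in proportion to $1/\sqrt{n}$), so the sample proportion becomes a more precise estimate of the population proportion. To determine the minimum sample size required to achieve a desired margin of error ($E$) at a given confidence level (CL), we can use the following formula, rounding up to the next integer:
$$n = \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{E^2}$$
Here $\hat{p}$ is a prior estimate of the defect proportion; using $\hat{p} = 0.5$ gives the most conservative (largest) sample size.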
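The sample-size formula can be sketched with Python's standard library, which provides the normal quantile through statistics.NormalDist. The prior estimate p_estimate and the target margins below are illustrative assumptions:

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(margin_of_error, confidence_level=0.95, p_estimate=0.5):
    """Minimum n so that the Wald interval half-width is at most margin_of_error,
    assuming the sample proportion comes out close to p_estimate."""
    # Two-sided critical value, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n = z**2 * p_estimate * (1 - p_estimate) / margin_of_error**2
    return ceil(n)  # always round up

print(required_sample_size(0.01, 0.95, p_estimate=0.05))  # 1825
print(required_sample_size(0.03, 0.95))                    # 1068
```

A prior guess of about 5% defective lets us estimate the defect rate to within ±1 percentage point with 1825 components. Without a prior guess, the conservative choice p_estimate = 0.5 needs 1068 components for a ±3 point margin.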
Python Simulation
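Let's simulate drawing a sample of 100 components from a production run of 1 million, counting the defective and good components, and repeating the process many times (a Monte Carlo simulation). We'll then compare the simulated results with the analytical ones. Because the sample is tiny relative to the run, the simulation treats each draw as an independent Bernoulli trial and does not need the run size itself.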
```python
import numpy as np

# Production run size (for context only: the sample is tiny relative to the
# run, so each draw is treated as an independent Bernoulli trial)
production_run_size = 1000000
sample_size = 100

# True proportion of defective components in the production run
true_defect_proportion = 0.05

# Number of Monte Carlo iterations
num_iterations = 10000

# Arrays to store the sample proportions and confidence intervals
sample_proportions = np.zeros(num_iterations)
confidence_intervals = np.zeros((num_iterations, 2))

# Critical value for a 95% confidence level
z_critical = 1.96

# Perform Monte Carlo simulation
for i in range(num_iterations):
    # Generate a random sample of components (0: good, 1: defective)
    sample = np.random.binomial(1, true_defect_proportion, sample_size)
    # Sample proportion of defective components
    sample_proportion = np.mean(sample)
    # Estimated standard error of the sample proportion
    standard_error = np.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)
    # 95% confidence interval (normal approximation)
    lower_bound = sample_proportion - z_critical * standard_error
    upper_bound = sample_proportion + z_critical * standard_error
    # Store the sample proportion and confidence interval
    sample_proportions[i] = sample_proportion
    confidence_intervals[i] = [lower_bound, upper_bound]

# Mean and standard deviation of the simulated sample proportions
mean_sample_proportion = np.mean(sample_proportions)
std_sample_proportion = np.std(sample_proportions)

# Analytical standard error, computed from the true proportion
analytical_standard_error = np.sqrt((true_defect_proportion * (1 - true_defect_proportion)) / sample_size)

# Display the results
print("Monte Carlo Simulation Results:")
print(f"Mean Sample Proportion: {mean_sample_proportion:.4f}")
print(f"Standard Deviation of Sample Proportions: {std_sample_proportion:.4f}")
print(f"Analytical Standard Error: {analytical_standard_error:.4f}")
```
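Example output (values vary slightly between runs because no random seed is set):
```
Monte Carlo Simulation Results:
Mean Sample Proportion: 0.0502
Standard Deviation of Sample Proportions: 0.0217
Analytical Standard Error: 0.0218
```
The mean of the simulated proportions is close to the true value of 0.05, and their standard deviation closely matches the analytical standard error, as the theory predicts.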
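The simulation loop stores every confidence interval but never uses them. A natural follow-up is to check how often the nominal 95% intervals actually contain the true proportion, known as their empirical coverage. The vectorized sketch below assumes the same settings plus a fixed seed. It draws the defect count of each sample directly with a single binomial draw, which is equivalent to summing 100 Bernoulli draws. With np = 5 the normal approximation is rough, so the coverage can fall noticeably short of 95%.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility
sample_size = 100
true_defect_proportion = 0.05
num_iterations = 10000
z_critical = 1.96

# Number of defectives in each simulated sample, converted to a proportion
defects = rng.binomial(sample_size, true_defect_proportion, size=num_iterations)
p_hat = defects / sample_size

# Wald interval for every simulated sample at once
standard_error = np.sqrt(p_hat * (1 - p_hat) / sample_size)
lower = p_hat - z_critical * standard_error
upper = p_hat + z_critical * standard_error

# Fraction of intervals that contain the true proportion
coverage = np.mean((lower <= true_defect_proportion) & (true_defect_proportion <= upper))
print(f"Empirical coverage: {coverage:.3f}")
```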
Operating Characteristic Curves
An operating characteristic (OC) curve plots the probability of accepting the production run based on the sample results for different true population proportions of defective components. To build an OC curve, we need to:
- Define the acceptance criteria:
  - Specify the acceptable quality level (AQL) and the rejectable quality level (RQL).
  - Determine the sample size ($n$) and acceptance number ($c$).
- Calculate the probability of acceptance for various true population proportions $p$ using the binomial distribution:
  $$P_a(p) = \sum_{d=0}^{c} \binom{n}{d} p^d (1-p)^{n-d}$$
- Plot the OC curve with true population proportions on the x-axis and probabilities of acceptance on the y-axis.
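The acceptance probability can also be computed straight from the binomial sum using only Python's standard library. The sketch below assumes the plan n = 100, c = 5 with AQL = 0.03 and RQL = 0.10 (the same values as the example that follows). It also reports the producer's risk, the probability of rejecting a lot at the AQL, and the consumer's risk, the probability of accepting a lot at the RQL. These are the two risks an OC curve is used to balance.

```python
from math import comb

def prob_acceptance(p, n, c):
    """P(at most c defectives in a sample of n) when the defect proportion is p."""
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))

n, c = 100, 5
AQL, RQL = 0.03, 0.10

producers_risk = 1 - prob_acceptance(AQL, n, c)  # a good lot is rejected
consumers_risk = prob_acceptance(RQL, n, c)      # a bad lot is accepted
print(f"Producer's risk at AQL: {producers_risk:.4f}")
print(f"Consumer's risk at RQL: {consumers_risk:.4f}")
```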
Here's an example of building an OC curve in Python:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

# Define the acceptance criteria
AQL = 0.03  # Acceptable Quality Level
RQL = 0.10  # Rejectable Quality Level
n = 100     # Sample size
c = 5       # Acceptance number

# Range of true population proportions
p_range = np.arange(0, 0.21, 0.01)

# Probability of acceptance (at most c defectives) for each true proportion
prob_acceptance = binom.cdf(c, n, p_range)

# Plot the OC curve
plt.figure(figsize=(8, 6))
plt.plot(p_range, prob_acceptance, linewidth=2)
plt.xlabel('True Population Proportion of Defective Components')
plt.ylabel('Probability of Acceptance')
plt.title('Operating Characteristic (OC) Curve')
plt.grid(True)
plt.axvline(AQL, linestyle='--', color='r', linewidth=1.5, label='AQL')
plt.axvline(RQL, linestyle='--', color='g', linewidth=1.5, label='RQL')
plt.legend()
plt.show()
```
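The OC curve evaluates a plan that has already been chosen. To go the other way and determine n and c from the acceptance criteria, a brute-force search works. The sketch below assumes a maximum producer's risk of 5% at the AQL and a maximum consumer's risk of 10% at the RQL. These targets, and the function name find_sampling_plan, are illustrative, not taken from the handbook.

```python
from scipy.stats import binom

def find_sampling_plan(aql, rql, alpha=0.05, beta=0.10, max_n=1000):
    """Smallest n (with its smallest c) such that the producer's risk at the AQL
    is at most alpha and the consumer's risk at the RQL is at most beta.
    Returns None if no plan with n <= max_n qualifies."""
    for n in range(1, max_n + 1):
        # Smallest c that accepts a lot at the AQL with probability >= 1 - alpha
        c = 0
        while binom.cdf(c, n, aql) < 1 - alpha:
            c += 1
        # A larger c would only raise acceptance at the RQL, so this c is best for n
        if binom.cdf(c, n, rql) <= beta:
            return n, c
    return None

plan = find_sampling_plan(aql=0.03, rql=0.10)
print(f"Sample size n = {plan[0]}, acceptance number c = {plan[1]}")
```

Tightening either risk target, or moving the AQL and RQL closer together, pushes the required sample size up.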
Alternatively, we can use the Poisson distribution as an approximation to the binomial distribution when the sample size is large and the proportion of defective components is small. The probability of acceptance using the Poisson distribution is given by:
$$P_a \approx \sum_{d=0}^{c} \frac{e^{-\lambda}\lambda^{d}}{d!}$$
where $\lambda = np$ is the average number of defective components per sample. [1]
Probability of Acceptance - Poisson vs. Binomial
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom, poisson

# Production run size (for context; the calculations below depend only on the sample)
production_run_size = 1000000
sample_size = 1000

# True proportion of defective components in the production run
true_defect_proportion = 0.01

# Average number of defective components per sample (lambda = n * p)
lambda_defects = sample_size * true_defect_proportion

# Range of the number of defective components
defects_range = np.arange(0, 30)

# Probability of each number of defective components: binomial distribution
prob_binomial = binom.pmf(defects_range, sample_size, true_defect_proportion)

# Probability of each number of defective components: Poisson distribution
prob_poisson = poisson.pmf(defects_range, lambda_defects)

# Plot the probability distributions
plt.figure(figsize=(8, 6))
plt.plot(defects_range, prob_binomial, 'bo-', label='Binomial')
plt.plot(defects_range, prob_poisson, 'ro-', label='Poisson')
plt.xlabel('Number of Defective Components')
plt.ylabel('Probability')
plt.title('Binomial vs Poisson Distribution')
plt.legend()
plt.grid(True)
plt.show()

# Probability of acceptance using the binomial distribution
c = 15  # Acceptance number
prob_acceptance_binomial = binom.cdf(c, sample_size, true_defect_proportion)
print(f"Probability of Acceptance (Binomial): {prob_acceptance_binomial:.4f}")

# Probability of acceptance using the Poisson distribution
prob_acceptance_poisson = poisson.cdf(c, lambda_defects)
print(f"Probability of Acceptance (Poisson): {prob_acceptance_poisson:.4f}")
```
```
Probability of Acceptance (Binomial): 0.9521
Probability of Acceptance (Poisson): 0.9513
```
In this example, we set the production run size to 1 million and the sample size to 1000. The true proportion of defective components is set to 0.01, which is relatively small.
We calculate the average number of defective components per sample using $\lambda = np$, where $n$ is the sample size and $p$ is the true defect proportion (here $\lambda = 1000 \times 0.01 = 10$).
We then generate a range of the number of defective components from 0 to 29 and calculate the probability of observing each number of defective components using both the binomial distribution (binom.pmf) and the Poisson distribution (poisson.pmf).
We plot the probability distributions to visually compare the binomial and Poisson distributions. When the sample size is large and the defect proportion is small, the Poisson distribution serves as a good approximation to the binomial distribution.
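To see that the approximation really depends on n and p, we can measure the largest gap between the two probability mass functions in two settings. The second setting (n = 50, p = 0.30) is a hypothetical counterexample chosen for contrast, and the function name max_pmf_gap is illustrative:

```python
import numpy as np
from scipy.stats import binom, poisson

def max_pmf_gap(n, p):
    """Largest absolute difference between the binomial(n, p) and Poisson(n*p) PMFs."""
    k = np.arange(0, n + 1)
    return np.max(np.abs(binom.pmf(k, n, p) - poisson.pmf(k, n * p)))

gap_small_p = max_pmf_gap(1000, 0.01)  # large n, small p: the case in this post
gap_large_p = max_pmf_gap(50, 0.30)    # small n, large p: approximation breaks down
print(f"n=1000, p=0.01: {gap_small_p:.4f}")
print(f"n=50,   p=0.30: {gap_large_p:.4f}")
```

In the second case, the Poisson distribution has variance np = 15, while the binomial variance is only np(1 − p) = 10.5. That mismatch in spread is what drives the larger gap.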
Finally, we calculate the probability of acceptance using both the binomial distribution (binom.cdf) and the Poisson distribution (poisson.cdf) for a given acceptance number ($c = 15$). The acceptance number represents the maximum number of defective components allowed in the sample for the production run to be accepted.
The output displays the probability of acceptance calculated with both distributions. The two values (0.9521 and 0.9513) are very close, showing that the Poisson distribution is a good approximation to the binomial distribution in this scenario.
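The acceptance rule can also be checked by simulation: draw many samples, count the defectives in each, and accept whenever the count is at most c. The sketch below assumes a hypothetical lot of exactly 1,000,000 components containing 10,000 defectives (p = 0.01). It draws each sample of 1000 without replacement using NumPy's hypergeometric sampler, which models sampling from a finite lot. Because the sample is only 0.1% of the lot, the result should land very close to the binomial value.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(seed=1)  # fixed seed for reproducibility
lot_size, lot_defectives = 1_000_000, 10_000  # hypothetical lot, p = 0.01
sample_size, c = 1000, 15
num_lots = 100_000

# NumPy counts draws from its first ("ngood") group, so pass the defectives there
defects = rng.hypergeometric(lot_defectives, lot_size - lot_defectives,
                             sample_size, size=num_lots)

# Accept the lot whenever the sample contains at most c defectives
pa_simulated = np.mean(defects <= c)
pa_binomial = binom.cdf(c, sample_size, lot_defectives / lot_size)
print(f"Simulated P(accept): {pa_simulated:.4f}")
print(f"Binomial P(accept):  {pa_binomial:.4f}")
```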
Conclusion
In this blog post, we explored the concepts of sampling, confidence intervals, and operating characteristic curves for assessing the quality or defect proportion in a large production run of electronic components. We discussed sampling methods, presented the formulas for confidence intervals and sample size determination, and provided Python code for simulating the sampling process and building OC curves.
By understanding these statistical techniques, we can make informed decisions about the quality of a large batch of electronic components based on a representative sample. The OC curves help us evaluate the performance of our sampling plan and balance the risks associated with accepting or rejecting the production run.
Remember, the choice between the binomial and Poisson distributions depends on the sample size and the expected proportion of defective components. When in doubt, use the binomial distribution: the Poisson distribution is only an approximation to it, so the binomial gives the more precise result.
I hope this blog post has provided you with valuable insights into assessing defective electronic components using statistical methods. Feel free to adapt the Python code and experiment with different scenarios to deepen your understanding of these concepts.
Happy sampling and analyzing!
References
[1] Juran, J. M. (1951). Juran's Quality Control Handbook. New York: McGraw Hill.