When we put these two results together, we find that the precision-to-tolerance ratio is, and always has been:

*P/T = 6 SD(E) / Tolerance = cos(α) / capability ratio*

Thus, the fallacy behind drawing Figure 4 or Figure 5 is that the tolerance lines encourage us to interpret a trigonometric function as a proportion. Whenever we do this we will always be wrong.

Why do the simple ratios of Figures 5 and 6 not work as proportions? It has to do with a fundamental property of random variables. Whenever we plot a histogram or a running record, the variation shown will always be a function of the standard deviation. This is not a problem when we are working with a single variable, but when we start combining variables the bogus comparisons creep in because the standard deviations do not add up.

Thus, if we are not careful to compare apples with apples, bogus proportions can creep in when we work with multiple variables. While the graphs will show variation as a function of the standard deviations, these standard deviations will never be additive. This complicates comparisons among the multiple variables.

Some software will plot a running record showing the repeated measurements of the standard with tolerance limits added. How should we interpret this graph?

The average for the data of Figure 1 is 13.495 in. However, since these observed values drift by 3/8 in. in the course of an hour, this average does not provide a useful estimate of the value of the measured insert. A wood yardstick would give us a more reliable measurement of the diameter of one of these inserts than is provided by this electronic vision system.

When the measurement process appears to be predictable, what do the average and standard deviation statistics represent?

The standard deviation statistic of 0.114 in. for Figure 1 does not tell us anything useful about the precision of this vision system, since it has been inflated by both the trend and the upsets.

Before we can talk about bias, we have to have a predictable measurement process.

As above, we traditionally report repeatability using the standard deviation statistic from the repeated measurements. However, a simple multiple of this quantity offers an alternative that is easier to explain and use. This alternative is the probable error, a concept that dates back to Bessel.

Did Richard Lyday need additional data in Figure 1 to know that he had a rubber ruler? Without a predictable measurement process there is no magic number of readings. Without predictability, there is no repeatability and no bias. So we cannot estimate these quantities regardless of how many data we collect.

Whenever your Type 1 repeatability study results in a set of values that display a lack of predictability, then, regardless of the technology involved, your measurement system is nothing more than a rubber ruler.

No, we cannot.

**Figure 4:** 50 measurements of a standard with specifications added

Since this interval includes zero, we have no detectable bias, and any bias present is likely to be less than 0.58 units. Since this is less than the probable error of 1.1 units, we can say that this test is unbiased in the neighborhood of 40.

### Question 7

**Figure 1:** *XmR* chart for 30 measurements of the diameter of one insert

Only a process behavior chart can answer the question raised by Eisenhart. We should always place our repeated measurements on an *XmR* chart to see if they have that degree of consistency that is essential in practice.

For Figure 2, the average is 0.32 units smaller than the accepted value of 40. With 24 *d.f.* the Student's t critical value is 1.711, which gives a 90% interval estimate for the difference of:
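As a check on the arithmetic, the 90% interval can be computed directly from the values reported above; this is a sketch of the calculation, not part of the original analysis:

```python
import math

# Values reported for Test Method 65 (Figure 2)
accepted = 40.0   # accepted value of the known standard
average = 39.68   # average of the 25 measurements
s = 1.684         # global standard deviation statistic
n = 25
t_crit = 1.711    # Student's t for a 90% two-sided interval, 24 d.f.

half_width = t_crit * s / math.sqrt(n)   # ~0.58 units
lower = (average - accepted) - half_width
upper = (average - accepted) + half_width
print(round(half_width, 2), round(lower, 2), round(upper, 2))
```

The interval runs from about –0.90 to 0.26 units, and since it includes zero there is no detectable bias.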

A Type 1 repeatability study starts with a “standard” item. This standard may be a known standard with an accepted value determined by some master measurement method, or it may be an item designated for use with the study (an “unknown standard”). The standard is measured repeatedly, within a short period of time, by a single operator using a single instrument. Finally, these repeated measurements are used to compute the average and standard deviation statistics, and these are used to characterize the measurement process. This technique can be traced as far back as Friedrich Wilhelm Bessel and his *Fundamenta Astronomiae*, published in 1818.

Can we test for bias?

With a predictable measurement process, the average statistic is your measurement system’s best estimate of the value of the measured item. When a known standard is used, the average will let you test for bias.

Let us now assume we have a predictable production process operating with a mean of 60 and a standard deviation of 1.00. A set of 50 product values plotted against the specifications might look like Figure 6.

How can a production process that consumes 67% of the tolerance be combined with a measurement system that consumes 65% of the tolerance and end up with a stream of product measurements that only use 93% of the tolerance?

When we do not place our repeated measures on an *XmR* chart, we are making a naive assumption that the measurement process is predictable. When we use the *XmR* chart, we are checking whether this assumption is reasonable. This is why any assessment of the measurement process that does not begin with a process behavior chart is inherently flawed.

### Question 2

When we understand that the P/T is a trigonometric function times a constant, we discover why this ratio is so hard to interpret.

Thus, to cut the uncertainty in half you will have to collect four times as many observations.

The probable error characterizes the median error of a measurement. A measurement will err by this amount or more at least half the time. As such, the probable error defines the essential resolution of a measurement and tells us how many digits should be recorded. (We will want our measurement increment to be about the same size as the probable error.)
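Assuming roughly normal measurement errors, the probable error is the median absolute error, about 0.675 times the standard deviation of the errors. A small simulation sketch (the error standard deviation here is illustrative, not from the text):

```python
import random
import statistics

random.seed(1)
sd_e = 1.0  # assumed measurement-error standard deviation (illustrative)
# Median absolute error of 200,000 simulated normal measurement errors:
median_abs_error = statistics.median(
    abs(random.gauss(0.0, sd_e)) for _ in range(200_000)
)
# A measurement errs by about 0.675 * SD(E) or more half the time.
print(round(median_abs_error, 3))
```

The simulated median lands near 0.675 × *SD(E)*, which is why the measurement increment should be about the size of the probable error.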

When a measurement process calls for multiple determinations and reports their average as the observed value, a repeatability study will require multiple determinations for each observation, with reloading between each set of multiple determinations.

### Summary

Since this graph makes a comparison that can only be misleading, it should be ignored. The *only* appropriate limits for the running record of values from a Type 1 repeatability study are those of an *XmR* chart.

So while small P/T ratios are good, large P/T ratios are not necessarily bad. This is why it is a mistake to use a P/T ratio to condemn a measurement process.

### Question 6

Here the product variation appears to consume 66.6% of the tolerance and there is very little room left for measurement error. So what happens when we combine the product values *Y* with the measurement errors *E* to get 50 product measurements, *X*?

**Figure 2:** *XmR* chart for 25 measurements with Test Method 65

At the same time, (6 *SD(E)*) defines one side of the right triangle. Thus, the ratio of (6 *SD(E)*) to (6 *SD(X)*) defines the cosine of the angle denoted by alpha in Figure 8.

Do we always need to use 50 measurements?

To illustrate how the plot of the repeatabilities vs. the tolerance is misleading, I will use a synthetic example. Here we’ll assume that the specifications are 60.0 ± 4.5 units, we have a known standard with an accepted value of 60, and the measurement errors have a mean of zero and a standard deviation of 0.98. Fifty observations of this standard from a Type 1 repeatability study plotted against the specifications might look like Figure 4.


## Questions About Type 1 Repeatability Studies

### How to avoid some pitfalls

**Figure 9:** Uncertainty and degrees of freedom

Richard Lyday wanted to evaluate a new vision system used to measure the diameters of steel inserts for steering wheels. He took a single insert and measured it 30 times over the course of one hour and got the diameters shown in Figure 1.

The graphs we draw determine the way we think. The way we think determines the words we use. The words we use determine the actions we take. If we start with the wrong graph, our reality becomes distorted and what we do may be skewed or even incorrect.

All bias is relative. If we perform our Type 1 repeatability study using a known standard that has an accepted value based on some master measurement method, and if our measurements do not display a lack of predictability, then we can compare our average value with the accepted value for the standard.

The repeated measurements of a Type 1 repeatability study belong on an *XmR* chart. The notions of repeatability and bias are predicated upon having a predictable measurement process. This is why the analysis of data from a Type 1 repeatability study should always start with an *XmR* chart.

Do we need to be concerned about the predictability of the measurement process?

**Figure 7:** Product measurements with specifications

How can you measure a part without loading it? Preparing an item for testing is part of the measurement process. Variation in preparation can contribute to measurement error. Even if parts are loaded automatically, loading the item is part of obtaining the measurement.

Should we reload the part between measurements?

The answer lies in the fact that both the 67% and the 65% are bogus proportions. Of the four graphs in this example, only Figure 7 is correct. Figures 4, 5, and 6 misrepresent reality. So, even though your software may give you Figure 4 or 5, these figures are bogus. They have always been bogus, and they will continue to be bogus until the Pythagorean theorem is no longer true.
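The reconciliation is the Pythagorean theorem: the fractions of the tolerance combine in quadrature, not by addition. A quick check with the example's assumed values (tolerance of 9 units, *SD(Y)* = 1.00, *SD(E)* = 0.98):

```python
import math

tolerance = 9.0          # specifications of 60.0 +/- 4.5 units
sd_y, sd_e = 1.00, 0.98  # product and measurement-error standard deviations

product_share = 6 * sd_y / tolerance      # ~0.667, the "67%"
measurement_share = 6 * sd_e / tolerance  # ~0.653, the "65%"
sd_x = math.sqrt(sd_y**2 + sd_e**2)       # variances add; standard deviations do not
combined_share = 6 * sd_x / tolerance     # ~0.933, the "93%"

# 0.667 + 0.653 = 1.32, but the shares actually combine in quadrature:
assert abs(math.hypot(product_share, measurement_share) - combined_share) < 1e-9
print(round(product_share, 3), round(measurement_share, 3), round(combined_share, 3))
```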

*Estimated probable error = 0.675 × estimated SD(E)*

A simple approach for quantifying measurement error that has been around for over 200 years has recently been packaged as a “Type 1 repeatability study.” This column considers various questions surrounding this technique.

This is why the precision-to-tolerance ratio should not be used to condemn a measurement process. It does not tell the whole story. Since money spent on measurement systems will always be overhead, we should be careful about condemning a measurement system based on a trigonometric function masquerading as a proportion.

The only correct graph to use with the results of a Type 1 repeatability study is an *XmR* chart such as those in Figures 1 and 2.

### Question 5

For the data of Figure 2, our estimate of the value for the known standard is 39.68 units, and our estimate for repeatability is *s* = 1.68 units.

When the measurement errors *E* are independent of the product values *Y*, then the *variance* of the product measurements *X* will be the sum of the *variance* for *Y* plus the *variance* for *E*.
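A small simulation sketch (using the example's illustrative values of *SD(Y)* = 1.00 and *SD(E)* = 0.98) shows the variances adding while the standard deviations fall short of adding:

```python
import random
import statistics

random.seed(42)
n = 100_000
y = [random.gauss(60.0, 1.00) for _ in range(n)]  # product values Y
e = [random.gauss(0.0, 0.98) for _ in range(n)]   # measurement errors E
x = [yi + ei for yi, ei in zip(y, e)]             # product measurements X

var_sum = statistics.pvariance(y) + statistics.pvariance(e)
print(round(statistics.pvariance(x), 2))  # ~1.96, close to Var(Y) + Var(E)
print(round(statistics.pstdev(x), 2))     # ~1.40, well short of 1.00 + 0.98
```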

Yes. Sixty years ago, Churchill Eisenhart, senior research fellow and chief of the Statistical Engineering Laboratory at the National Bureau of Standards, wrote the following about Type 1 repeatability studies: “Until a measurement process has been ‘debugged’ to the extent that it has attained a state of statistical control, it cannot be regarded, in any logical sense, as measuring anything at all.”

Consider the previous example. There we had a P/T ratio of 0.65. Based on this value, most arbitrary guidelines would condemn this measurement process. Yet here we have a process that, when operated predictably and on-target, is capable of producing essentially 100%-conforming product. Moreover, the current measurement system is adequate to allow this process to be operated up to its full potential. Here, there is no need to upgrade or change this measurement process.

Once you pass 10 degrees of freedom, you are in the region of diminishing returns. Between 10 and 30 degrees of freedom, your estimate of repeatability will congeal and solidify. The 25 data of Figure 2 give an estimate of repeatability having a coefficient of variation of 14%. Using twice as many data would have reduced the *CV* to 10%. This is why, historically, Type 1 repeatability studies have been based on 25–30 data.
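These figures follow from the usual large-sample approximation *CV* ≈ 1/√(2 · *d.f.*) for an estimate of dispersion (the formula is a standard approximation, not stated in the text). A quick check:

```python
import math

def cv_of_sd_estimate(df):
    """Approximate coefficient of variation of a dispersion estimate with df degrees of freedom."""
    return 1.0 / math.sqrt(2.0 * df)

print(round(cv_of_sd_estimate(24), 2))  # 25 data -> 24 d.f. -> CV ~ 0.14
print(round(cv_of_sd_estimate(49), 2))  # 50 data -> 49 d.f. -> CV ~ 0.10
```

Since the *CV* varies as the reciprocal square root of the degrees of freedom, halving the uncertainty requires roughly four times as many observations.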

### Question 8

In Figure 4, we see the repeated measurements, (60 + *E*), and the horizontal band shown is (6 *SD(E)*) wide. This part of Figure 4 is correct. It is the inclusion of the specifications on Figure 4 that creates a bogus comparison.

When repeated measures of the same thing display a lack of predictability, the statistics tell us nothing about the random process that is producing the measurements. Here the measurement process simply does not exist as a well-defined entity, and there is no such thing as repeatability or bias.

### Question 4

In Figure 9 the vertical axis shows the coefficient of variation (*CV*), which is the traditional measure of uncertainty for an estimator. The coefficient of variation is the ratio of the standard deviation of an estimator to the mean of that estimator.

With a predictable measurement process, the relationship between the uncertainty in an estimate of dispersion and the amount of data is shown in Figure 9.

Figure 2 shows the *XmR* chart for Test Method 65. A known standard having an accepted value of 40 was measured 25 times using Test Method 65. The average is 39.68, the average moving range is 2.15, and the global standard deviation statistic is 1.684. Here we find no evidence of unpredictability.

For Figure 2, our estimate of probable error is 0.675 (1.68) = 1.1 units. This means that these measurements are good to the nearest whole number of units. They will err by 1.1 units or more at least half the time. Thus, the results for Test Method 65 should be rounded to the nearest whole number; recording fractions of a unit will be excessive.

### Question 3

A failure to detect a bias means that any bias present is too small to be detected with the amount of data available. In this case, the t-test will place an upper bound on the size of any bias present.

Thus, the P/T ratio is *always* a trigonometric function divided by the capability ratio. And everyone who has had high school trigonometry knows that we cannot treat trigonometric functions like they are proportions. They simply do not add up. Never have, never will.

If we subtract the value of the standard, we get a plot of the measurement errors as shown in Figure 5.


When the measurement process appears to be unpredictable, what do the average and standard deviation statistics represent?

(39.68 – 40) ± 0.58 = –0.90 to 0.26

Here we see that this steel insert grew more than 1/4 in. in diameter over the course of one hour! The trend of these readings over time plus the three upsets are problems with the measurement system. These problems turn this new high-tech, gee-whiz vision system into a rubber ruler.

Specifications always apply to the product measurements, *X*. Thus the specifications, and the specified tolerance, belong on the hypotenuse of the right triangle as shown in Figure 8.

If an item is prepared (loaded) once and then measured multiple times, you will have “multiple determinations.” Multiple determinations do not reflect the repeatability of obtaining a single measurement. For this reason, we need to be careful to distinguish between repeated measurements and multiple determinations.

So the standard deviation of *X* will be the square root of 1.96, which is 1.40. Thus, the only correct way to show the relationship between the standard deviations of *X, Y,* and *E* is to use a right triangle as in Figure 8.
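With the right triangle in hand, the P/T ratio equals the cosine of alpha divided by the capability ratio (tolerance over 6 *SD(X)*). Checking this numerically with the example's values:

```python
import math

tolerance = 9.0
sd_y, sd_e = 1.00, 0.98
sd_x = math.sqrt(sd_y**2 + sd_e**2)   # sqrt(1.96) = 1.40

p_to_t = 6 * sd_e / tolerance         # precision-to-tolerance ratio, ~0.65
cos_alpha = (6 * sd_e) / (6 * sd_x)   # cosine of the angle alpha in Figure 8
capability = tolerance / (6 * sd_x)   # capability ratio for X

# The two expressions for P/T agree:
print(round(p_to_t, 3), round(cos_alpha / capability, 3))
```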

Descriptive statistics can only describe the data. Meaning has to come from the context for the data. When the data are an incoherent collection of different values, the statistics will not represent any underlying properties or characteristics.

With a predictable measurement process, the standard deviation statistic is an estimate of the repeatability of your measurement system. This is the state of affairs shown in Figure 3.

Nothing whatsoever.

So, can we use the P/T ratio to condemn a measurement process?

**Figure 8:** Relationships between *SD(X)*, *SD(Y)*, and *SD(E)*

Our predictable measurement process may be said to be biased relative to the master measurement method if and only if we find a detectable difference between the average and the accepted value for the standard. Here, we typically use the traditional t-test for a mean value. With a detectable bias, the best estimate for that bias will be the difference between the average and the accepted value.
