This test will quantify the chances that you can successfully fit *any* probability model to your data. By using this simple test to examine the assumptions behind all probability models, you can avoid making serious mistakes. This column will illustrate this test and explain why it works.

### Example 1

While a large *p*-value doesn’t guarantee that a probability model exists, or tell you which model might work, a small *p*-value provides a red light to save you from wasted effort.

When presented with a collection of data from operations or production, many will start their analysis by computing descriptive statistics and fitting a probability model to the data. But before you do this, there’s an easy test that you need to perform.

If no probability model exists that will fit your data, what do you do next? If the data have been collected in such a way that they should be homogeneous, then the lack of homogeneity indicated by the Ramirez-Runger test is a signal that some unknown cause is changing your process outcomes. So the question becomes, “How can we identify this unknown cause?”

One caveat is needed. Since the Ramirez-Runger test depends upon the time-order sequence of the data, it should always be used with data in their native ordering. Specifically, it can’t be used on data that have been rearranged into a ranking where the values are placed in ascending or descending order.
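The effect of reordering can be seen in a small sketch. The values below are hypothetical and the helper name `average_successive_difference` is an illustrative assumption, not part of the published test. Sorting leaves the overall standard deviation unchanged but shrinks the successive differences, so any comparison between the two measures is distorted.

```python
from statistics import mean, stdev

def average_successive_difference(values):
    """Average absolute difference between each value and the one before it."""
    return mean(abs(b - a) for a, b in zip(values, values[1:]))

# Hypothetical measurements, listed in the order they were collected.
native = [10, 12, 9, 11, 10, 13, 8, 11]
# The same values rearranged into a ranking (ascending order).
ranked = sorted(native)

# The global standard deviation ignores ordering, so it is identical for both.
print(stdev(native), stdev(ranked))

# The successive differences depend on ordering and collapse after sorting.
print(average_successive_difference(native))
print(average_successive_difference(ranked))
```

Because only the second measure changes, ranked data will tend to look far less homogeneous than they really are.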

### So what happens next?

The Ramirez-Runger test statistic is:

Thus, with a *p*-value of 0.238, it’s not unreasonable to think that you might find some probability model that would fit these data and describe the process outcomes.

### How this test works

These data are not sufficiently well-behaved to be represented by a single probability model. However, that doesn’t keep your software from drawing a bell-shaped curve over the histogram as in Figure 8.

This statistic will be approximately distributed according to an F-distribution having (*n* – 1) numerator degrees of freedom and 0.62(*n* – 1) denominator degrees of freedom. The probability of exceedance (the *p*-value) for this test statistic may be obtained in most spreadsheet programs. In Excel we use the formula:

= FDIST(test stat., num. d.f., denom. d.f.)

= FDIST(2.60, 199, 123)
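Outside a spreadsheet, the same upper-tail probability can be sketched in a few lines of Python. This is a minimal illustration rather than the column's own method: the function names `f_upper_tail` and `_beta_cf` are hypothetical, and the code relies on the standard identity that links the upper tail of the F-distribution to the regularized incomplete beta function, evaluated here by a continued fraction. In practice a statistics library (for example `scipy.stats.f.sf`) would normally be used instead.

```python
from math import exp, lgamma, log

def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    # Use whichever form of the continued fraction converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_upper_tail(f_stat, num_df, denom_df):
    """P(F > f_stat) for an F-distribution with the given degrees of freedom."""
    if f_stat <= 0.0:
        return 1.0
    x = denom_df / (denom_df + num_df * f_stat)
    return _reg_inc_beta(denom_df / 2.0, num_df / 2.0, x)

# Same inputs as the spreadsheet formula above.
p_value = f_upper_tail(2.60, 199, 123)
print(f"{p_value:.3g}")
```

Working with the upper tail directly, rather than computing one minus the cumulative probability, keeps very small *p*-values from being lost to rounding.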

We might be tempted to immediately test these data to see if they’re consistent with a normal probability model. However, before we try to test for a lack of fit with a specific probability model, it’s instructive to test the data to see if they’ll fit *any* probability model.

When the *p*-value for the Ramirez-Runger test is small, you’ll know that you can’t fit a probability model to your data. Nor can you estimate process parameters, compute confidence intervals, test hypotheses, or use any other statistical analysis technique that assumes such a model. Rather, you’ll need to use a more fine-grained approach, looking for the assignable causes of exceptional variation within the data themselves. And, of course, this will lead to the use of process behavior charts.

Ramirez-Runger tells us that, regardless of how pretty Figure 8 may look, there are only 13 chances in a billion that it’s correct. A more realistic representation of the process that produced the data for Example 1 would look something like Figure 9.

When the two measures of dispersion are not equivalent, it’s unreasonable to assume that the data came from a sequence of independent and identically distributed random variables. And when the data show evidence that they didn’t come from a sequence of independent and identically distributed random variables, the notion of a probability model vanishes. At this point, you should abandon all hope of ever fitting a reasonable probability model to your data.

This value may be interpreted as the probability that *any* probability model can be found that will actually describe the process outcomes. With 13 parts per billion as your chance for success, you can save yourself a lot of time and trouble by simply not trying to fit a probability model to these data. So, what do you do instead? This will be discussed below.

### Example 2

To do this, we use a test developed and published by Brenda Ramirez and George Runger in 2006. The test statistic compares two different measures of dispersion. The first of these is the standard deviation statistic computed using all of the data. The second of these is based on the average of the absolute differences between successive values, so it depends on the time order of the data. Here, the time order is given by reading the data in rows. Figure 3 lists the 199 successive differences.
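The comparison of the two dispersion measures can be sketched as follows. This is a hedged illustration, not a definitive implementation: the function name `ramirez_runger_statistic` is hypothetical, and the code assumes the statistic is the squared ratio of the global standard deviation to the average moving range divided by the bias-correction constant 1.128. The degrees of freedom follow the (*n* – 1) and 0.62(*n* – 1) values used with the F-distribution.

```python
from statistics import mean, stdev

# Assumption: 1.128 is the usual bias-correction constant for converting an
# average two-point moving range into an estimate of the standard deviation.
D2_FOR_PAIRS = 1.128

def ramirez_runger_statistic(values):
    """Return (statistic, numerator d.f., denominator d.f.).

    `values` must be in their native time order; at least two are required.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values in time order")
    # First measure: standard deviation computed from all of the data.
    global_sd = stdev(values)
    # Second measure: average absolute difference between successive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_moving_range = mean(moving_ranges)
    if avg_moving_range == 0:
        raise ValueError("successive differences are all zero")
    local_sd = avg_moving_range / D2_FOR_PAIRS
    statistic = (global_sd / local_sd) ** 2
    return statistic, n - 1, 0.62 * (n - 1)
```

When the data are homogeneous, the two measures estimate the same dispersion and the ratio stays near 1. Shifts or trends inflate the global standard deviation more than the successive differences, which pushes the ratio upward.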

So, while the process behavior chart remains the final arbiter of when a process is operated unpredictably, the Ramirez-Runger test provides a computation-based alternative that can keep you from making serious mistakes. If you’re not already starting your analysis with a process behavior chart, then the Ramirez-Runger test is the test to use before all other tests.