
STATISTICS IN PHOTOGRAPHIC SCIENCE?
by Professor Albert Rickmers
Department of Photographic Science and Instrumentation
School of Photographic Arts and Sciences
College of Graphic Arts and Photography

as published in The Photographic Scientist, March 1971, Vol. 2, Number 2

In Photographic Science and Engineering, as in any of the science areas, we are constantly called upon to answer questions and to contribute to the advancement of the "state-of-the-art". When a specific question has been well formulated, the knowledgeable person with the technical insight will usually know what has to be done to secure information which will lead to the answers. Experiments will be run, data collected, and then a report written on the findings from the experimental data.

Applied Statistics is concerned with the scientific methods for collecting, organizing, summarizing, presenting, and analyzing data, as well as drawing valid conclusions and making reasonable decisions on the basis of such analysis.

Instead of making every possible experimental run, we examine as few runs as possible; this subset is called a sample. If a sample is representative of a set of conditions, important conclusions about that set of conditions can often be inferred from analysis of the sample. Because such inferences cannot be absolutely certain, the language of probability is often used in stating conclusions. The phase of statistics dealing with the conditions under which such inferences are valid is called Statistical Inference.

Because experimental runs cost money, and the more runs an experiment requires the greater its cost, scientists and engineers are making increasing use of the phase of applied statistics called Design of Experiments. Here, before any data are collected, the experiment is designed to ensure that maximum information is secured from the minimum number of experimental runs. Equally important in a well-designed experiment is an internal estimate of the experimental error, which is always present, for valid conclusions cannot be drawn unless the changes found in the factors under test are statistically larger than that error. If the wrong number of experimental runs is made, two kinds of false conclusion can result: first, with too few runs, the experiment may be insensitive to small but important improvements; second, with too many runs, the experiment becomes oversensitive and points out improvements that have no value to the economics of the problem.

Applied statistics allows the experimenter to use the proper number of runs to give his experiment the proper sensitivity.
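A minimal sketch of how the number of runs sets sensitivity, under assumptions the article does not state: two conditions are compared, errors are normally distributed with a known common standard deviation sigma, and a two-sided test is used. The standard normal-approximation formula then gives the smallest true difference detected with probability 1 - beta (the power) at significance level alpha as $\delta = (z_{1-\alpha/2} + z_{1-\beta})\,\sigma\sqrt{2/n}$, where n is the number of runs per condition. The function name detectable_difference and the defaults alpha = 0.05 and power = 0.90 are hypothetical, illustrative choices.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(n_runs: int, sigma: float,
                          alpha: float = 0.05, power: float = 0.90) -> float:
    """Smallest true difference between two condition means that a two-sided
    z-test with n_runs replicates per condition detects with the given power.

    Assumes normal errors with a known, common standard deviation sigma.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # protects against false alarms
    z_beta = z.inv_cdf(power)           # protects against missed changes
    # Standard error of a difference of two means is sigma * sqrt(2 / n).
    return (z_alpha + z_beta) * sigma * sqrt(2 / n_runs)


# Sensitivity improves only with the square root of the number of runs.
for n in (2, 4, 8, 16, 32):
    print(n, round(detectable_difference(n, sigma=1.0), 2))
```

Because the detectable difference shrinks only with the square root of n, quadrupling the runs halves it. Choosing n so that it is about the smallest change worth acting on avoids both failures described above: too few runs miss important improvements, and too many flag differences too small to matter.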

Consider the following data table, which shows the results of a small experiment to see whether three different processing conditions have any effect on the residual hypo in the processed film:

PROCESSING CONDITION

                  A        B        C
REPLICATES       5.0     10.0     10.0
                 2.5     10.0     12.0
                 5.0      5.0      5.0

TOTALS          12.5     25.0     27.0
AVERAGE         4.17     8.33     9.00

It is easy to see that the results differ among the three processing conditions. A closer look at the data table shows that replicates of the same processing condition do not turn out alike either. The question then becomes whether the differences between processing conditions are larger than the differences between replicates of the same condition.

Statistics tells us that the differences between replicates can be used to obtain a measure of the experimental error present in the experiment. The differences between columns (the processing conditions) can be used to obtain a measure of the change in residual hypo caused by the type of processing. The comparison is better made on statistical grounds than by merely looking at the results and guessing.

Each result obtained, $X_{ij}$ (replicate $i$ under processing condition $j$), turns out as it did for three reasons: (a) the general level of hypo left in the film, $\mu$; (b) the amount due to the type of processing, $P_j$; and (c) the presence of experimental error, $E_{ij}$. This can be written as $X_{ij} = \mu + P_j + E_{ij}$, and a statistical technique known as Analysis of Variance (ANOVA) lets us calculate the amount of variation due to each cause. The analysis of variance produces the following ANOVA table:
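The additive model can be checked directly on the residual-hypo data. In this sketch (ours, not the article's), the general level is estimated by the grand average of all nine results, each processing effect by its column average minus the grand average, and each error term by whatever is left over. The dictionary layout and the names mu, effects and residuals are assumptions made for illustration.

```python
from statistics import mean

# Residual-hypo replicates from the data table, keyed by processing condition.
data = {
    "A": [5.0, 2.5, 5.0],
    "B": [10.0, 10.0, 5.0],
    "C": [10.0, 12.0, 5.0],
}

# Estimate the general level as the grand average of all nine results.
mu = mean(x for values in data.values() for x in values)

# Estimate each processing effect as column average minus grand average.
effects = {cond: mean(values) - mu for cond, values in data.items()}

# The error term is whatever the general level and the effect leave unexplained.
residuals = {
    cond: [x - mu - effects[cond] for x in values]
    for cond, values in data.items()
}

for cond, values in data.items():
    for x, e in zip(values, residuals[cond]):
        # Each result is reassembled exactly as level + effect + error.
        assert abs(mu + effects[cond] + e - x) < 1e-9
    print(cond, round(effects[cond], 2), [round(e, 2) for e in residuals[cond]])
```

The effects sum to zero across conditions and the residuals sum to zero within each column; the residuals are the raw material for the error term in the analysis of variance.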

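Source                   Sum of Squares   Degrees of Freedom   Mean Square
Between Proc.                 41.17                2               20.58
Within Column (error)         46.83                6                7.81
Total                         88.00                8                 ---

$$SS_{\text{Processing}} = \frac{1}{3}\left(12.5^2 + 25.0^2 + 27.0^2\right) - \frac{64.5^2}{9} = 503.42 - 462.25 = 41.17$$

$$
\begin{aligned}
SS_{\text{Error}} &= \sum_{j}\sum_{i}\left(X_{ij} - \bar{X}_j\right)^2 \\
&= (5.0 - 4.17)^2 + (2.5 - 4.17)^2 + (5.0 - 4.17)^2 \\
&\quad + (10.0 - 8.33)^2 + (10.0 - 8.33)^2 + (5.0 - 8.33)^2 \\
&\quad + (10.0 - 9.00)^2 + (12.0 - 9.00)^2 + (5.0 - 9.00)^2 \\
&\approx 46.83
\end{aligned}
$$

$$SS_{\text{Total}} = \sum_{j}\sum_{i} X_{ij}^2 - \frac{64.5^2}{9} = 550.25 - 462.25 = 88.00$$

where 64.5 is the grand total of all nine results, each column total is divided by the 3 replicates in that column, and $\bar{X}_j$ is the average of column $j$ ($12.5/3 \approx 4.17$, $25.0/3 \approx 8.33$, $27.0/3 = 9.00$). The degrees of freedom are $3 - 1 = 2$ for processing, $9 - 3 = 6$ for error, and $9 - 1 = 8$ in total.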
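The arithmetic behind the ANOVA table can be reproduced in a few lines. This is a minimal sketch of a one-way analysis of variance written for this data set, not a general-purpose statistics routine; the function name one_way_anova is hypothetical. It uses the correction-term shortcut (grand total squared over the number of results) for the between-condition and total sums of squares, and direct deviations from each column average for the error sum of squares.

```python
def one_way_anova(groups: list[list[float]]) -> dict[str, tuple[float, int]]:
    """Return (sum of squares, degrees of freedom) for a one-way layout."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    correction = sum(all_values) ** 2 / n_total  # (grand total)^2 / N

    ss_total = sum(x * x for x in all_values) - correction
    # Between-condition SS: each column total squared over its size.
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    # Within-column (error) SS: deviations from each column's own average.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    return {
        "Between Proc.": (ss_between, k - 1),
        "Within Column (error)": (ss_within, n_total - k),
        "Total": (ss_total, n_total - 1),
    }


residual_hypo = [[5.0, 2.5, 5.0], [10.0, 10.0, 5.0], [10.0, 12.0, 5.0]]
table = one_way_anova(residual_hypo)
for source, (ss, df) in table.items():
    ms = f"{ss / df:.2f}" if source != "Total" else "---"  # mean square = SS / df
    print(f"{source:<24}{ss:>8.2f}{df:>4}{ms:>8}")
```

The printed sums of squares, degrees of freedom and mean squares can be compared line by line with the ANOVA table; the between and within sums of squares add up to the total.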

The mean squares are simply the sums of squares divided by their degrees of freedom, and they are used to test whether the difference due to processing is greater than the error. This involves an F test, which is simply the ratio of the two mean squares: F = 20.58/7.81 = 2.64. To decide whether this ratio should be considered significant or due to chance, it is compared with a value from a table of the F distribution showing how large the ratio could be due to chance alone. The table value for 2 and 6 degrees of freedom at a confidence level of 0.95 is 5.14. Since 2.64 is smaller than 5.14, we are unable to say that a difference due to processing has been demonstrated; the experimental error present could be the major reason the numbers turned out as they did. The conclusion, then: there is no evidence of a difference in residual hypo among the three types of processing under study.
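Python's standard library has no F distribution, but when the numerator has 2 degrees of freedom, as here, the tail probability has a closed form: $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$, where $\nu$ is the error degrees of freedom. The sketch below relies on that special case (it does not apply to other numerator degrees of freedom) to recover the 5.14 table value at the 0.95 confidence level and the probability of an F ratio at least this large arising by chance alone. The function names are hypothetical.

```python
def f_tail_df1_2(f: float, df2: int) -> float:
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom.

    Closed form valid only when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha: float, df2: int) -> float:
    """Value exceeded with probability alpha under F(2, df2), by inverting the tail."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


ms_processing = 20.58  # mean square between processing conditions
ms_error = 7.81        # mean square within columns (experimental error)
f_ratio = ms_processing / ms_error

critical = f_critical_df1_2(alpha=0.05, df2=6)
p_value = f_tail_df1_2(f_ratio, df2=6)
print(f"F = {f_ratio:.2f}, critical F(0.95; 2, 6) = {critical:.2f}, p = {p_value:.2f}")
print("significant" if f_ratio > critical else "no evidence of a processing effect")
```

A p-value near 0.15 means a ratio this large would arise by chance about 15 percent of the time even if processing had no effect, which agrees with the article's conclusion.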

Photography, like other industries, depends upon the successful operation of processes that produce product of the proper quality level while keeping waste and cost down. Any process, regardless of the engineering that has gone into it, will produce a product that varies with time. Applied statistics gives photographic science a tool for monitoring the process and distinguishing random variability from assignable variability. Random variability is the "noise" of the system; assignable variability is the "signal" that tells the user that changes are taking place in the system and that something can be done to return the system to its original capability.
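One common way to separate the two kinds of variability is a Shewhart-style control chart. The sketch below is a simplified version under stated assumptions: it sets limits at the baseline average plus or minus three standard deviations, estimated from a period when the process is believed to be in control, and flags later readings outside those limits as candidates for an assignable cause. Practical charts usually estimate sigma from subgroup or moving ranges instead; the function names and all readings are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline: list[float], k: float = 3.0) -> tuple[float, float, float]:
    """Return (lower limit, center line, upper limit) from in-control baseline data."""
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    return center - k * sigma, center, center + k * sigma


def out_of_control(readings: list[float], limits: tuple[float, float, float]) -> list[int]:
    """Indices of readings outside the limits: possible assignable variability."""
    lower, _, upper = limits
    return [i for i, x in enumerate(readings) if x < lower or x > upper]


# Hypothetical readings of some product characteristic, for illustration only.
baseline = [1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97, 1.00]
new_readings = [1.01, 0.99, 1.12, 1.00]

limits = control_limits(baseline)
print([round(v, 3) for v in limits])
print(out_of_control(new_readings, limits))
```

Readings inside the limits are treated as noise and left alone; only a point outside them prompts a search for a cause, which is the practical form of the noise-versus-signal distinction.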

With automated production lines, product is turned out so fast that every item can no longer be inspected to see whether it meets quality standards. Here again, statistics provides the tools for taking samples from the production line and for making judgments about the entire lot from those few items. Sampling inspection of finished product tells us when to screen and sort, when to remake, and when to scrap. It also tells the operator when to leave the production line alone: overcontrolling a production line can be as harmful as undercontrolling it.
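Sampling inspection is often organized as a single sampling plan: inspect n items drawn from a lot and accept the lot if no more than c of them are defective. Assuming the lot is large relative to the sample (so the binomial distribution applies), the probability of accepting a lot whose true fraction defective is p can be computed as below. The plan values n = 50 and c = 2 and the function name are hypothetical, chosen only to show how the acceptance probability falls as quality worsens.

```python
from math import comb


def acceptance_probability(n: int, c: int, p: float) -> float:
    """P(accept) for a single sampling plan: at most c defectives among n items.

    Uses the binomial distribution, i.e. assumes the lot is large relative to n.
    """
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))


# Operating-characteristic (OC) curve for a hypothetical plan n = 50, c = 2.
for p in (0.01, 0.02, 0.05, 0.10):
    print(f"fraction defective {p:.2f}: P(accept) = {acceptance_probability(50, 2, p):.3f}")
```

Tabulating this curve before adopting a plan shows how often good lots would be rejected and bad lots accepted, which is the basis for deciding when to sort, remake, or scrap.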

The photographic scientist or engineer is constantly called upon to make decisions. Decisions are easily made when the answer is obvious, black or white, but when a decision must be made in the presence of doubt, in shades of gray, statistics can be used to make it. Statistical decisions also have the advantage that the likelihood of their being correct is known: they carry stated confidence levels, and an experiment can be planned so that changes of a stated size will be detected with a stated probability when they are present.

As Dr. Mason E. Wescott of the R.I.T. Statistics Department has often said, "Mathematics is the science of the certain, but statistics is the science of the uncertain." The scientist and the engineer work in a world of uncertainty, so to their knowledge of mathematics should be added the science of statistics. There are two major areas of statistics: Mathematical Statistics, the theory of the science, and Applied Statistics.