Introduction
William Sealy Gosset's work has proven fundamental to statistical inference as practiced today. Better known by his pseudonym, "Student," Gosset is associated with the discovery of the t-distribution and its use.[1,2] A chemist and statistician, he had a profound effect on the practice of statistics in industry and agriculture. He worked in a brewery, and his testing of very small patches led him to discover certain small-sample distributions, which in turn led to the development of Student's t-test. Gosset was led early in his career at Guinness to examine the relationship between the raw materials for beer and the finished product, and this activity naturally led him to learn the tools of statistical analysis. In 1908, he developed a fundamentally new approach to the classical problem of the theory of errors. His two contributions of that year, Student's t-distribution and the small-sample distribution of Pearson's correlation coefficient, placed him among the great men of the newly emerging field of statistical methodology.[3] The story of this advance is as instructive as it is interesting. This paper presents a detailed account of the development of the small-sample approach, a pathbreaking contribution by Gosset now known as Student's t-test.
A Biographical Glimpse
William Sealy Gosset (1876–1937) was born on June 13, 1876 in Canterbury, Kent, England. He was the first of five children of Colonel Frederic Gosset and Agnes Sealy Vidal. He was a very good student and won several scholarships. Gosset was a scholar of Winchester and later of New College, Oxford, where he obtained first classes in mathematical moderations (1897) and chemistry (1899).[1,2] In 1935, Gosset left Dublin to take up a leading scientific position at a new Guinness brewery in London, England. In 1937, he died of a heart attack in Beaconsfield, Buckinghamshire, England at the age of 61 years, still in the employment of Guinness.
A Humble Brewer
Guinness was interested in agricultural experimentation and hired scientists who could apply their expertise to the business. From 1893, Guinness's Brewery had a policy of recruiting brewers with scientific degrees (although only from Oxford or Cambridge), and it was decided that anyone wishing to make a mark as a brewer in the future must have training in "the application of science (chemistry and bacteriology) to the fermentation industries." Hence, Gosset joined the Guinness Brewery in Dublin on October 1, 1899 as a junior brewer and was the fifth scientist to be recruited as a brewer. The recruitment of scientists as brewers brought them very much into research. Given that Gosset had studied mathematics as well as chemistry at Oxford, it was perhaps only natural that he turned his attention to the use of mathematical methods in the brewing process. At Guinness, Gosset applied his statistical knowledge both in the brewery and on the farm to select the best-yielding varieties of barley. Problems of this type in the experimental brewery led him to turn his attention to "The error of the mean of a small sample."[4] At that time, mathematicians were not able to solve the problem of determining the error of the mean for a small sample. A solution was critically important for deriving valid results from the brewery's many experiments, which were conducted without adequate methods of sampling. Hence, in 1904 he wrote an internal report for Guinness on "The Application of the Law of Error to the Work of the Brewery" that emphasized the importance of using probability theory to set exact values based on the results of experiments in the brewery. He wrote another internal report in 1905, entitled "The Pearson Co-efficient of Correlation," which was also endorsed by the Guinness Board. Gosset's internal reports were especially interesting and illustrated the great utility of the new statistical methods introduced in the brewery.
Gosset's statistical work helped him become the head brewer, a more interesting title than a professor of statistics.[5]
The 'Faraday of Statistics' on Mean and Correlation Coefficient
Gosset, the "Faraday of statistics," was a highly influential figure in the development of modern statistical thinking. He used statistics to solve a wide range of problems connected with brewing, from barley production to yeast fermentation, that affected the quality of the product. One problem involved the selection of varieties of barley having maximum yields for given soil types and allowing for the vagaries of climate. His name may not be familiar, but he is known to the statistical world as "Student." To extend his knowledge, Gosset spent 1 year at the biometric laboratory of the leading statistician Karl Pearson at University College London. Reliable statistics require adequate sample size. He soon realized that Pearson's large-sample theory required refinement if it was to be useful for the small-sample problems arising in brewing. It was in 1908 that he laid the basis for his most famous breakthrough work, published in "Biometrika" and entitled "The Probable Error of a Mean."[2,6,7] He focused primarily on determining the likelihood that a sample mean approximates the mean of the population from which it was drawn. The "probable error" of a mean is, like the standard error, a specific estimate of the dispersion of its sampling distribution. Estimating this dispersion is today a foundational step of statistical inference, needed to draw conclusions about a population parameter from a sampled mean. In nearly all research, both the population mean and variance are unknown. Therefore, we must use the sample variance to specify the sampling distribution of the mean. He confronted the problem that using the sample variance to estimate the sampling distribution of the mean carries an error associated with the sample variance itself.
Further, this error is more likely to result in the underestimation of population variance because the sampling distribution of the variance is positively skewed.[8] Also, the "probable error," like any error associated with sampled means, increases as the sample size decreases, which matters most in small-sample research.[9] The unit normal table does not account for either the estimation of population variance or the fact that the error in this estimate depends on sample size. This limitation inspired Gosset to develop a set of valid probability tables for small sample sizes.[10,11] His fame today rests on a statistical test called Student's t-test.
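The tendency of the sample variance to fall below the population variance in small samples, noted above, can be checked with a short simulation. The following is a minimal sketch and is not taken from Gosset's papers: it assumes normally distributed data, a sample size of 4, and Python's pseudo-random generator, and the function name and all parameter values are illustrative choices.

```python
import random
import statistics


def fraction_underestimating(n=4, sigma=1.0, reps=20000, seed=1908):
    """Share of simulated normal samples whose variance falls below sigma**2.

    n, sigma, reps and seed are illustrative assumptions, not values
    taken from the article.
    """
    rng = random.Random(seed)
    below = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        # statistics.variance uses the n - 1 divisor (unbiased estimator)
        if statistics.variance(sample) < sigma ** 2:
            below += 1
    return below / reps


if __name__ == "__main__":
    print("share of samples with s^2 < sigma^2:", fraction_underestimating())
```

Although the sample variance is unbiased on average, under these assumptions roughly six samples in ten give a value below the true variance, which is the positive skew referred to in the text.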
The statistical methods available at the time ended with a version of the z test for means; even confidence intervals were not yet available. Gosset faced the standard difficulty in using the z test: he did not know the population standard deviation (σ). Moreover, field experiments give only small numbers of observations. Just replacing σ by s in the z statistic and calling the result roughly normal was not accurate enough. So, Gosset asked the key question: What is the exact sampling distribution of the statistic (x̄ − μ)/s? He also found the answer to his question and calculated a table of critical values for his new distribution.[12] We call it the t distribution, and the t-test is sometimes called "Student's t-test" in his honor.[13]
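Why the normal approximation was "not accurate enough" can be illustrated by simulation. The sketch below is an illustration under stated assumptions, not Gosset's own calculation: it uses the modern form of the statistic, (x̄ − μ)/(s/√n), draws normal samples of size 4 with a pseudo-random generator, and compares the normal cutoff 1.96 with the standard tabled t critical value 3.182 for 3 degrees of freedom. The function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def t_statistic(sample, mu):
    """Modern one-sample t statistic: (mean - mu) / (s / sqrt(n))."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 divisor
    return (mean - mu) / (s / math.sqrt(n))


def tail_fraction(n, cutoff, reps=20000, seed=1908):
    """Fraction of simulated |t| values above cutoff for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # The normal table promises 5% beyond +/-1.96; small samples exceed it far more often.
    print("P(|t| > 1.96),  n = 4:", tail_fraction(4, 1.96))
    # The tabled t critical value for 3 degrees of freedom restores about 5%.
    print("P(|t| > 3.182), n = 4:", tail_fraction(4, 3.182))
```

With four observations, the statistic lands outside ±1.96 well over 10% of the time rather than the nominal 5%, while the t cutoff brings the rate back to about 5%. That gap is what a table of critical values for the new distribution corrects.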
He wrote another paper in 1908 entitled "On the Probable Error of a Correlation Coefficient" after a simulated experiment in which he observed 750 sample values of r (sample size 4) from a bivariate normal population with no correlation.[14] Gosset did not actually succeed in deriving a sampling distribution for ρ. Subsequently, R. A. Fisher (1915) derived the sampling distribution of ρ, using a geometrical argument, and this work led to the famous Fisher-Gosset correspondence.[15]
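A sampling experiment of the kind described above can be imitated on a computer. The following is a minimal sketch only: it substitutes Python's pseudo-random normal generator for Gosset's actual sampling procedure, and the function names, seed, and defaults are illustrative assumptions. It draws 750 samples of size 4 from two independent normal variables and computes the correlation coefficient r for each.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_r(n=4, reps=750, seed=1908):
    """Return reps values of r from samples of size n with zero true correlation."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent of xs
        values.append(pearson_r(xs, ys))
    return values


if __name__ == "__main__":
    rs = simulate_r()
    mean_r = sum(rs) / len(rs)
    var_r = sum((r - mean_r) ** 2 for r in rs) / (len(rs) - 1)
    print("mean of r:", round(mean_r, 3), " variance of r:", round(var_r, 3))
```

Under these assumptions the simulated values of r spread widely over the whole interval from −1 to 1 even though the true correlation is zero, with a variance near 1/3. This shows how unreliable a correlation computed from four observations is.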
Gosset worked on a variety of statistical problems related to experimentation in agriculture and brewing.[16] He argued actively with other leading statisticians of his time, including Karl Pearson, R. A. Fisher, and Egon Pearson.[17] Sir Ronald Fisher,[15] a giant among statisticians, called Gosset "the Faraday of statistics," recognizing his ability to grasp general principles and apply them to problems of practical significance.
Why a Pseudonym as "Student"
Gosset's main paper, "The Probable Error of a Mean," was published in 1908. But to protect trade secrets, Guinness would not allow employees to publish the results of their research. The company wished to keep the advantages gained from employing statisticians a secret from its competitors. Gosset persuaded his bosses that there was nothing in his work that would benefit competitors; they allowed him to publish, but under the assumed name "Student".[1,2,3,4,6,14] Hence, anyone studying statistics encounters the name "Student" rather than that of the true author of the method. His most famous achievement is now referred to as Student's t-distribution, which might otherwise have been the Gosset t-distribution.
Conclusion
Gosset's work has proven fundamental to statistical inference as practiced today. The world of research changed greatly, entering an era characterized by small-sample research. His work marked the beginning of serious statistical inquiry into small-sample inference and forms the basis of the most frequently used statistical test in behavioral science today. It plays a crucial role in statistical analysis; for example, it is used to evaluate the effect of a medical treatment when we compare patients taking a new drug with a control group taking a placebo. It was also central to the development of quality control. Though regarded as a humble brewer, he is a statistical pioneer who deserves recognition for his breakthrough work.
Financial Support and Sponsorship
Nil.
Conflicts of Interest
There are no conflicts of interest.
REFERENCES
1. Pearson ES. Student: A Statistical Biography of William Sealy Gosset. Plackett RL, Barnard GA, editors. Oxford: Clarendon Press; New York: Oxford University Press; 1990.
2. Pearson ES. "Student" as statistician. Biometrika. 1939;30:210–50.
3. Tankard JW Jr. The Statistical Pioneers. Cambridge: Schenkman Publishing Co; 1984. p. 106.
4. Box JF. Guinness, Gosset, Fisher, and small samples. Stat Sci. 1987;2:45–52.
5. Moore DS. The Basic Practice of Statistics. 2nd ed. New York: WH Freeman and Company; 2000.
6. "Student". The probable error of a mean. Biometrika. 1908;6:1–25.
7. "Student". Student's Collected Papers. Pearson ES, Wishart J, editors. With a foreword by L. McMullen. Biometrika Office, University College; 1942.
8. Pearson ES, Adyanthâya NK. The distribution of frequency constants in small samples from non-normal symmetrical and skew populations. Biometrika. 1929;21:259–86.
9. Welch BL. "Student" and small sample theory. J Amer Statist Assoc. 1958;53:777–88.
10. Pfanzagl J, Sheynin O. Studies in the history of probability and statistics XLIV: A forerunner of the t-distribution. Biometrika. 1996;83:891–8.
11. Eisenhart C. On the transition from "Student's" z to "Student's" t. American Statistician. 1979;33:6–10.
12. "Student". New table for testing the significance of observations. Metron. 1925;5:105–8.
13. Box JF. Gosset, Fisher, and the t-distribution. American Statistician. 1981;35:61–6.
14. "Student". Probable error of a correlation coefficient. Biometrika. 1908;6:302–10.
15. Fisher RA. "Student". Ann Eugenics. 1939;9:1–9.
16. "Student". Statistics in biological research. Nature. 1929;124:93.
17. McMullen L. "Student" as a man. Biometrika. 1939;30:205–10.