### Resampling Stats in MATLAB: Preface

Resampling Stats is a system for carrying out computations in statistics and for conducting simulations. The computations relate to an area of statistics called "statistical inference" that deals with questions such as these:
• If an exit poll of 500 randomly selected voters in a national election shows that candidate A is favored by 41% of voters while candidate B trails with 35% of the vote, how confident can I be that candidate A will still be in the lead when all the votes are counted?
• A test of a blood-pressure-reducing drug in 50 subjects shows that it reduces blood pressure by an average of 9.5 mmHg, whereas a placebo (a sugar pill) shows a reduction of 1.2 mmHg in a second group of 50. Am I justified in concluding that the drug is effective?
• If the space shuttle flies its first 24 flights without an accident, do I have reason to believe that it is perfectly safe? If not, what is the accident rate I should use in planning future missions?
• An experiment in educational reform will give randomly selected families free tuition to private schools, while a control group of families will send their kids to public schools. The experiment is controversial and expensive; it's important to get meaningful results. How many families should be enrolled in the experiment?

Readers who have experience with statistics will recognize these questions as examples of the application of confidence intervals, hypothesis testing, and power computations. In conventional statistics courses, students are taught how to answer questions like these using a certain theoretical apparatus (based on "Normal distribution theory", the t-distribution, and so on). If things go right in the course, they also learn how to interpret the answers to such questions and how to recognize when they do not have enough information to answer the posed questions. (For instance, in the second and fourth examples above there is not enough information.)

Resampling provides another, conceptually easier way to carry out the computations. In the theory of statistics, resampling is important because it allows questions to be answered even in situations where the historically conventional methods do not apply. In the learning and teaching of statistics, resampling is valuable because it allows students to address the questions of statistical inference in a way where their intuition can be brought to bear, by designing and carrying out simple numerical experiments on the computer.
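As an illustration of such a numerical experiment, the exit-poll question from the list above can be sketched in a few lines. The code below is a minimal sketch in plain MATLAB; it does not use the Resampling Stats functions documented later in the book. The variable names, the number of trials, and the choice to treat the 500 observed responses as a stand-in for the full electorate are all assumptions made for illustration.

```matlab
% Hypothetical sketch: resample the exit poll to see how often A stays ahead of B.
rng(1);                                   % fix the seed so the run is repeatable
n = 500;                                  % poll size, from the example
% Encode the observed poll: 1 = candidate A (41%), 2 = candidate B (35%), 3 = other (24%)
poll = [ones(1,205), 2*ones(1,175), 3*ones(1,120)];
ntrials = 5000;                           % number of resampled polls (arbitrary choice)
aLeads = false(1, ntrials);
for k = 1:ntrials
    resample = poll(randi(n, 1, n));      % draw 500 voters with replacement
    aLeads(k) = sum(resample == 1) > sum(resample == 2);
end
fracALeads = mean(aLeads);                % share of resampled polls in which A leads
fprintf('A leads in %.1f%% of resampled polls\n', 100*fracALeads);
```

The fraction printed at the end is one rough measure of how much confidence the poll supports. It says nothing about whether the poll itself was a fair random sample, which is a matter of interpretation rather than computation.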

By making the computations more accessible, resampling has another important benefit: it allows students to move on to the important matters of how to interpret the numerical answers to their questions and how to know when there is not enough information to answer the question.
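The space shuttle question shows how a simple computation leads directly to a question of interpretation. The following is a minimal sketch in plain MATLAB, not the Resampling Stats functions; the candidate accident rates and the number of trials are hypothetical values chosen only for illustration. For each candidate rate it simulates many series of 24 flights and records how often a series comes through with no accident at all.

```matlab
% Hypothetical sketch: how often would 24 flights be accident-free at a given rate?
rng(2);                                   % fix the seed so the run is repeatable
nflights = 24;                            % number of flights, from the example
rates = [0.01 0.02 0.05 0.10 0.20];       % assumed per-flight accident rates to try
ntrials = 10000;                          % simulated 24-flight series per rate
fracClean = zeros(size(rates));
for j = 1:numel(rates)
    accidents = rand(ntrials, nflights) < rates(j);  % true marks a simulated accident
    fracClean(j) = mean(sum(accidents, 2) == 0);     % share of series with no accident
end
disp([rates; fracClean]');                % one row per rate: rate, share accident-free
```

Even an accident rate of 5% per flight yields a clean 24-flight record fairly often (about 29% of the time by direct calculation, since 0.95^24 is roughly 0.29), so a clean record alone is weak evidence of perfect safety.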

Resampling Stats was originally developed by Julian Simon during the period 1973-1990 as a stand-alone software package. As the benefits of the resampling approach to teaching statistics became more apparent, it seemed advisable to make the facilities of Resampling Stats available to a wider audience, and to allow users to employ Resampling Stats in a widely used computational environment.

There is a large community of people who use the MATLAB computer language. It is very widely used, for example, by engineering students and often used in teaching mathematics. MATLAB provides an integrated environment for technical computation: it provides facilities for drawing graphs, reading and saving data, and carrying out a tremendous range of numerical calculations. Since so many people already know MATLAB, or will need to learn it in order to carry out work in their chosen fields, MATLAB is a natural platform for Resampling Stats.

At the same time, we realize that for many students of Resampling Stats this will be their first encounter with MATLAB, and some will not use MATLAB for any other purpose. We have therefore worked hard to keep the original simplicity and ease of use of Resampling Stats. We do not assume that you have any previous knowledge of MATLAB. Readers with no previous MATLAB experience can get started with the tutorial in the Appendix.

The body of this book is divided into two parts. First, there is an introduction to the issues and terms of statistical inference done mainly through examples. This introduction is thoroughly integrated with computer examples using Resampling Stats in MATLAB. In addition to showing how resampling can be used to answer the simple, standard statistical inference questions found in traditional introductory statistics textbooks, we show cases where traditional introductory methods do not apply but where resampling techniques are straightforward extensions of the simple cases. The examples introduce and cover both the "hypothesis testing" framework for statistical inference and the Bayesian approach.

The second part of the book is documentation for the various Resampling Stats functions in MATLAB. This is arranged as a reference rather than a tutorial. Appendices provide a tutorial introduction to MATLAB and show how to perform the important operation of reading data into the MATLAB program.

This book is intended mainly to introduce, using examples, the resampling methodology and the Resampling Stats in MATLAB software. We attempt to provide enough conceptual background and definition of statistical terms to make the book self-contained. "Self-contained" is not, however, the same thing as "systematic" or "comprehensive." This book does not cover all methods of analysis in statistical inference, nor does it do more than touch on the very important areas of experimental design, descriptive statistics, and exploratory data analysis. The treatment of these areas is largely independent of the mathematical methods (resampling vs. conventional formulas) used for inference, although we believe that resampling is both more flexible and easier to learn.

Although the simple computer skills needed to use resampling are by no means trivial, we think they are far, far less formidable than the analytical mathematics that has been the bane of generations of statistics students who learned inference in the traditional way. Had computers been available 100 years ago, we think it likely that statistical inference would have developed with resampling as its foundation. As support for this entirely speculative statement, we note that one of the most important developments in traditional inference theory, the t-distribution, was developed early in the twentieth century by William Gosset based on resampling techniques (and tedious labor on hand calculators).