Introduction
From the Adapting Author – Introduction to the 1st Canadian Edition
In the era of digital devices, interactive learning has become a vital part of acquiring knowledge. The learning process for students of the gadget generation, who grow up with a wide range of digital devices, has been dramatically affected by the interactive features of available computer programs. These features can improve students’ mastery of the content by actively engaging them in the learning process. Although many commercial software packages exist, Microsoft Excel is still known as one of the fundamental tools in both teaching and learning statistical and quantitative techniques.
With these in mind, two new features have been added to this textbook. First, all examples in the textbook have been Canadianized. Second, unlike the majority of conventional economics and business statistics textbooks on the market, this textbook gives you a unique opportunity to learn the basic and most common applied statistical techniques in business interactively when using the web version. For each topic, a customized interactive template has been created. Within each template, you can repeatedly change selected inputs from the examples and observe how the entire process and its outcomes are automatically adjusted. As a result of this new interactive feature, the online textbook will enable you to learn actively by re-estimating and/or recalculating each example as many times as you want with different data sets. Consequently, you will observe how the associated business decisions will be affected. In addition, the most commonly used statistical tables that come with conventional textbooks, along with their distributional graphs, have been coded within these interactive templates. For instance, the interactive template for the standard normal distribution provides the value of z associated with any selected probability, along with a distribution graph that shows the probability as a shaded area. The interactive Excel templates enable you to reproduce these values and depict the associated graphs as many times as you want, a feature that is not offered by conventional textbooks. Editable files of these spreadsheets are available in the appendix of the web version of this textbook (http://opentextbc.ca/introductorybusinessstatistics/) for instructors and others who wish to modify them.
It is highly recommended that you use this new feature as you read each topic by changing the selected inputs in the yellow cells within the templates. Other than cells highlighted in yellow, the rest of the worksheets have been locked. In the majority of cases the return/enter key on your keyboard will execute the operation within each template. The F9 key on your keyboard can also be used to update the content of the template in some chapters. Please refer to the instructions within each chapter for further details on how to use these templates.
From the Original Author
There are two common definitions of statistics. The first is “turning data into information”; the second is “making inferences about populations from samples”. These two definitions are quite different, but between them they capture most of what you will learn in most introductory statistics courses. The first, “turning data into information”, is a good definition of descriptive statistics, the topic of the first part of this, and most, introductory texts. The second, “making inferences about populations from samples”, is a good definition of inferential statistics, the topic of the latter part of this, and most, introductory texts.
To reach an understanding of the second definition, an understanding of the first definition is needed; that is why we will study descriptive statistics before inferential statistics. To reach an understanding of how to turn data into information, an understanding of some terms and concepts is needed. This first chapter provides an explanation of the terms and concepts you will need before you can do anything statistical.
Before starting in on statistics, I want to introduce you to the two young managers who will be using statistics to solve problems throughout this book. Ann Howard and Kevin Schmidt just graduated from college last year, and were hired as “Assistants to the General Manager” at Foothill Mills, a small manufacturer of socks, stockings, and pantyhose. Since Foothill is a small firm, Ann and Kevin get a wide variety of assignments. Their boss, John McGrath, knows a lot about knitting hosiery, but is from the old school of management, and doesn’t know much about using statistics to solve business problems. We will see Ann or Kevin, or both, in every chapter. By the end of the book, they may solve enough problems, and use enough statistics, to earn promotions.
Data and information, samples and populations
Though we tend to use data and information interchangeably in normal conversation, we need to think of them as different things when we are thinking about statistics. Data is the raw numbers before we do anything with them. Information is the product of arranging and summarizing those numbers. A listing of the score everyone earned on the first statistics test I gave last semester is data. If you summarize that data by computing the mean (the average score), or by producing a table that shows how many students earned A’s, how many B’s, etc. you have turned the data into information.
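The difference between data and information can be sketched in a few lines of Python. The scores below are hypothetical, and the 10-point letter-grade scale is an assumption made only for this illustration; neither comes from an actual class.

```python
from collections import Counter
from statistics import mean

# Hypothetical test scores: this raw list is the "data".
scores = [91, 78, 85, 62, 74, 88, 95, 69, 81, 57]


def letter_grade(score):
    """Map a score to a letter grade using an assumed 10-point scale."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


# Summaries of the data are the "information".
average_score = mean(scores)
grade_counts = Counter(letter_grade(s) for s in scores)

print(f"Mean score: {average_score}")
for grade in "ABCDF":
    print(f"{grade}: {grade_counts[grade]}")
```

The list of ten numbers tells you little at a glance; the mean and the table of grade counts are what you would actually report.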
Imagine that one of Foothill Mills’ high-profile, but low-sales, products is Easy Bounce, a cushioned sock that helps keep basketball players from bruising their feet as they come down from jumping. John McGrath gave Ann and Kevin the task of finding new markets for Easy Bounce socks. Ann and Kevin have decided that a good extension of this market is college volleyball players. Before they start, they want to learn about what size socks college volleyball players wear. First they need to gather some data, maybe by calling some equipment managers from nearby colleges to ask how many of what size volleyball socks were used last season. Then they will want to turn that data into information by arranging and summarizing their data, possibly even comparing the sizes of volleyball socks used at nearby colleges to the sizes of socks sold to basketball players.
Some definitions and important concepts
It may seem obvious, but a population is all of the members of a certain group. A sample is some of the members of the population. The same group of individuals may be a population in one context and a sample in another. The women in your stat class are the population of “women enrolled in this statistics class”, and they are also a sample of “all students enrolled in this statistics class”. It is important to be aware of what sample you are using to make an inference about what population.
How exact is statistics? Upon close inspection, you will find that statistics is not all that exact; sometimes I have told my classes that statistics is “knowing when it’s close enough to call it equal”. When making estimations, you will find that you are almost never exactly right. If you make the estimations using the correct method, however, you will seldom be far wrong. The same idea goes for hypothesis testing. You can never be sure that you’ve made the correct judgement, but if you conduct the hypothesis test with the correct method, you can be sure that the chance you’ve made the wrong judgement is small.
A term that needs to be defined is probability. Probability is a measure of the chance that something will occur. In statistics, when an inference is made, it is made with some probability that it is wrong (or some confidence that it is right). Think about repeating some action, like using a certain procedure to infer the mean of a population, over and over and over. Inevitably, the procedure will sometimes give a faulty estimate; sometimes you will be wrong. The probability that the procedure gives the wrong answer is simply the proportion of the times that the estimate is wrong. The confidence is simply the proportion of times that the answer is right. The probability of something happening is expressed as the proportion of the time that it can be expected to happen. Proportions are written as decimal fractions, and so are probabilities. If the probability that Foothill Mills’ best salesperson will make the sale is 0.75, the sale is made three-quarters of the time.
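The idea of probability as a long-run proportion can be tried out with a small simulation. This is only a sketch: the function name `simulate_sales` and the fixed random seed are choices made for this illustration, and the 0.75 is the salesperson's probability from the example above.

```python
import random


def simulate_sales(probability, attempts, seed=0):
    """Return the proportion of simulated sales attempts that succeed."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    # Each attempt succeeds when a uniform draw from [0, 1) falls below the probability.
    successes = sum(rng.random() < probability for _ in range(attempts))
    return successes / attempts


# The proportion of successes settles near 0.75 as the number of attempts grows.
for attempts in (10, 100, 10_000):
    print(attempts, simulate_sales(0.75, attempts))
```

With only a handful of attempts the observed proportion can sit well away from 0.75; it is over many repetitions that the proportion and the probability come together.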
Why bother with statistics?
Reflect on what you have just read. What you are going to learn by learning statistics is the right way to make educated guesses. For most students, statistics is not a favourite course. It’s viewed as hard, or cosmic, or just plain confusing. By now, you should be thinking: “I could just skip stat, and avoid making inferences about what populations are like by always collecting data on the whole population and knowing for sure what the population is like.” Well, many things come back to money, and it’s money that makes you take stat. Collecting data on a whole population is usually very expensive, and often almost impossible. If you can make a good, educated inference about a population from data collected from a small portion of that population, you will be able to save yourself, and your employer, a lot of time and money. You will also be able to make inferences about populations for which collecting data on the whole population is virtually impossible. Learning statistics now will allow you to save resources later, and if the resources saved later are greater than the cost of learning statistics now, it will be worthwhile to learn statistics. It is my hope that the approach followed in this text will reduce the initial cost of learning statistics. If you have already had finance, you’ll understand it this way: this approach to learning statistics will increase the net present value of investing in learning statistics by decreasing the initial cost.
Imagine how long it would take and how expensive it would be if Ann and Kevin decided that they had to find out what size sock every college volleyball player wore in order to see if volleyball players wore the same size socks as basketball players. By knowing how samples are related to populations, Ann and Kevin can quickly and inexpensively get a good idea of what size socks volleyball players wear, saving Foothill a lot of money and keeping John McGrath happy.
There are two basic types of inferences that can be made. The first is to estimate something about the population, usually its mean. The second is to see if the population has certain characteristics; for example, you might want to infer whether a population has a mean greater than 5.6. This second type of inference, hypothesis testing, is what we will concentrate on. If you understand hypothesis testing, estimation is easy. There are many applications, especially in more advanced statistics, in which the difference between estimation and hypothesis testing seems blurred.
Estimation
Estimation is one of the basic inferential statistics techniques. The idea is simple: collect data from a sample and process it in some way that yields a good inference of something about the population. There are two types of estimates: point estimates and interval estimates. To make a point estimate, you simply find the single number that you think is your best guess of the characteristic of the population. As you can imagine, you will seldom be exactly correct, but if you make your estimate correctly, you will seldom be very far wrong. How to correctly make these estimates is an important part of statistics.
To make an interval estimate, you define an interval within which you believe the population characteristic lies. Generally, the wider the interval, the more confident you are that it contains the population characteristic. At one extreme, you have complete confidence that the mean of a population lies between −∞ and +∞, but that information has little value. At the other extreme, though you can feel comfortable that the population mean has a value close to that guessed by a correctly conducted point estimate, you have almost no confidence (“zero plus” to statisticians) that the population mean is exactly equal to the estimate. There is a trade-off between the width of the interval and the confidence that it contains the population mean. How to find a narrow range with an acceptable level of confidence is another skill learned when learning statistics.
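The trade-off between width and confidence can be seen numerically. The sketch below assumes, for illustration only, that the point estimate comes from a normal sampling distribution with a known standard error; the sock-size figures (10.2 and 0.3) and the function name `normal_interval` are hypothetical. How such intervals are properly built is the subject of later chapters.

```python
from statistics import NormalDist


def normal_interval(point_estimate, standard_error, confidence):
    """Two-sided interval around a point estimate, assuming a normal sampling distribution."""
    # z is the standard normal value that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin


# Hypothetical numbers: a sample mean sock size of 10.2 with a standard error of 0.3.
for confidence in (0.50, 0.90, 0.95, 0.99):
    low, high = normal_interval(10.2, 0.3, confidence)
    print(f"{confidence:.0%} confidence: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

Each step up in confidence buys a wider, and therefore less informative, interval.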
Hypothesis testing
The other type of inference is hypothesis testing. Though hypothesis testing and interval estimation use similar mathematics, they make quite different inferences about the population. Estimation makes no prior statement about the population; it is designed to make an educated guess about a population that you know nothing about. Hypothesis testing tests to see if the population has a certain characteristic—say a certain mean. This works by using statisticians’ knowledge of how samples taken from populations with certain characteristics are likely to look to see if the sample you have is likely to have come from such a population.
A simple example is probably the best way to get to this. Statisticians know that if the means of a large number of samples of the same size taken from the same population are averaged together, the mean of those sample means equals the mean of the original population, and that most of those sample means will be fairly close to the population mean. If you have a sample that you suspect comes from a certain population, you can test the hypothesis that the population mean equals some number, m, by seeing if your sample has a mean close to m or not. If your sample has a mean close to m, you can comfortably say that your sample is likely to be one of the samples from a population with a mean of m.
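One way to get a feel for “close to m or not” is to simulate many samples from a population whose mean really is m and see how often a sample mean lands at least as far from m as yours did. The sketch below is not the formal test developed later in the book. It assumes, purely for illustration, a normally distributed population with a known spread, and the function name and all numbers are hypothetical.

```python
import random


def share_of_samples_as_far(observed_mean, m, spread, sample_size, trials=5_000, seed=1):
    """Estimate how often a sample drawn from a population with mean m
    has a sample mean at least as far from m as observed_mean is."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    observed_gap = abs(observed_mean - m)
    as_far = 0
    for _ in range(trials):
        # Draw one sample from the assumed normal population and find its mean.
        sample = [rng.gauss(m, spread) for _ in range(sample_size)]
        sample_mean = sum(sample) / sample_size
        if abs(sample_mean - m) >= observed_gap:
            as_far += 1
    return as_far / trials


# Hypothetical case: is a sample of 25 sock sizes consistent with a population mean of m = 10?
print(share_of_samples_as_far(observed_mean=10.1, m=10, spread=1.5, sample_size=25))
print(share_of_samples_as_far(observed_mean=11.0, m=10, spread=1.5, sample_size=25))
```

A large share means samples like yours are common when the population mean is m; a very small share means your sample would be unusual for such a population.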
Sampling
It is important to recognize that there is another cost to using statistics, even after you have learned statistics. As we said before, you are never sure that your inferences are correct. The more precise you want your inference to be, either the larger the sample you will have to collect (and the more time and money you’ll have to spend on collecting it), or the greater the chance you must take that you’ll make a mistake. Basically, if your sample is a good representation of the whole population (if it contains members from across the range of the population in proportions similar to those in the population), the inferences made will be good. If you manage to pick a sample that is not a good representation of the population, your inferences are likely to be wrong. By choosing samples carefully, you can increase the chance of a sample which is representative of the population, and increase the chance of an accurate inference.
The intuition behind this is easy. Imagine that you want to infer the mean of a population. The way to do this is to choose a sample, find the mean of that sample, and use that sample mean as your inference of the population mean. If your sample happened to include all, or almost all, observations with values that are at the high end of those in the population, your sample mean will overestimate the population mean. If your sample includes roughly equal numbers of observations with “high” and “low” and “middle” values, the mean of the sample will be close to the population mean, and the sample mean will provide a good inference of the population mean. If your sample includes mostly observations from the middle of the population, you will also get a good inference. Note that the sample mean will seldom be exactly equal to the population mean; however, because most samples will have a rough balance between high and low and middle values, the sample mean will usually be close to the true population mean. The key to good sampling is to avoid choosing the members of your sample in a manner that tends to choose too many “high” or too many “low” observations.
There are three basic ways to accomplish this goal. You can choose your sample randomly, you can choose a stratified sample, or you can choose a cluster sample. While there is no way to ensure that a single sample will be representative, following the discipline of random, stratified, or cluster sampling greatly reduces the probability of choosing an unrepresentative sample.
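The three approaches can be sketched as follows, using the usual textbook meanings: a random sample draws members from the whole population, a stratified sample draws randomly from within each subgroup, and a cluster sample randomly picks whole subgroups and takes everyone in them. The colleges, the sock sizes, and the function names are all hypothetical, chosen only for this illustration.

```python
import random

# Hypothetical sock sizes of volleyball players, grouped by college.
players_by_college = {
    "College A": [8, 9, 9, 10, 11],
    "College B": [9, 10, 10, 11, 12],
    "College C": [7, 8, 9, 9, 10],
    "College D": [10, 10, 11, 12, 13],
}
all_players = [size for sizes in players_by_college.values() for size in sizes]


def simple_random_sample(population, size, rng):
    """Every member of the population has the same chance of being chosen."""
    return rng.sample(population, size)


def stratified_sample(strata, per_stratum, rng):
    """Draw the same number of members at random from every stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample


def cluster_sample(clusters, clusters_to_pick, rng):
    """Pick whole clusters at random and keep every member of the chosen clusters."""
    chosen = rng.sample(list(clusters), clusters_to_pick)
    sample = []
    for name in chosen:
        sample.extend(clusters[name])
    return sample


rng = random.Random(42)  # fixed seed so the run can be repeated
print("Random:    ", simple_random_sample(all_players, 6, rng))
print("Stratified:", stratified_sample(players_by_college, 2, rng))
print("Cluster:   ", cluster_sample(players_by_college, 2, rng))
```

Here the colleges serve as both the strata and the clusters, which keeps the example short; in practice the grouping used for stratifying need not be the one used for clustering.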
The sampling distribution
The thing that makes statistics work is that statisticians have discovered how samples are related to populations. This means that statisticians (and, by the end of the course, you) know that if all of the possible samples from a population are taken and something (generically called a “statistic”) is computed for each sample, something is known about how the new population of statistics computed from each sample is related to the original population. For example, if all of the samples of a given size are taken from a population, the mean of each sample is computed, and then the mean of those sample means is found, statisticians know that the mean of the sample means is equal to the mean of the original population.
There are many possible sampling distributions. Many different statistics can be computed from the samples, and each different original population will generate a different set of samples. The amazing thing, and the thing that makes it possible to make inferences about populations from samples, is that there are a few statistics which all have about the same sampling distribution when computed from the samples from many different populations.
You are probably still a little confused about what a sampling distribution is. It will be discussed more in the chapter on the Normal and t-distributions. An example here will help. Imagine that you have a population—the sock sizes of all of the volleyball players in the South Atlantic Conference. You take a sample of a certain size, say six, and find the mean of that sample. Then take another sample of six sock sizes, and find the mean of that sample. Keep taking different samples until you’ve found the mean of all of the possible samples of six. You will have generated a new population, the population of sample means. This population is the sampling distribution. Because statisticians often can find what proportion of members of this new population will take on certain values if they know certain things about the original population, we will be able to make certain inferences about the original population from a single sample.
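For a small population, the whole procedure can be carried out by brute force. The sketch below uses a hypothetical population of ten sock sizes (a real conference would have far more players) and lists every possible sample of six. Exact fractions are used so that the comparison of the two means is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population: the sock sizes of ten volleyball players.
population = [8, 9, 9, 10, 10, 10, 11, 11, 12, 13]
sample_size = 6

population_mean = Fraction(sum(population), len(population))

# One mean for every possible sample of six players: this list is the sampling distribution.
sample_means = [
    Fraction(sum(sample), sample_size)
    for sample in combinations(population, sample_size)
]
mean_of_sample_means = sum(sample_means) / len(sample_means)

# Proportion of sample means that fall within half a size of the population mean.
close = sum(abs(m - population_mean) <= Fraction(1, 2) for m in sample_means)

print("Number of possible samples:", len(sample_means))
print("Population mean:", float(population_mean))
print("Mean of the sample means:", float(mean_of_sample_means))
print("Share of sample means within 0.5 of the population mean:", close / len(sample_means))
```

The mean of the sample means matches the population mean exactly, and the last line shows the kind of proportion statement that statisticians make about a sampling distribution.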
Univariate and multivariate statistics and the idea of an observation
A population may include just one thing about every member of a group, or it may include two or more things about every member. In either case there will be one observation for each group member. Univariate statistics is concerned with making inferences about one-variable populations, like “what is the mean shoe size of business students?” Multivariate statistics is concerned with making inferences about the way that two or more variables are connected in the population, like “do students with high grade point averages usually have big feet?” What’s important about multivariate statistics is that it allows you to make better predictions. If you had to predict the shoe size of a business student and you had found out that students with high grade point averages usually have big feet, knowing the student’s grade point average might help. Multivariate statistics is powerful and finds applications in economics, finance, and cost accounting.
Ann Howard and Kevin Schmidt might use multivariate statistics if Mr McGrath asked them to study the effects of radio advertising on sock sales. They could collect a multivariate sample by collecting two variables from each of a number of cities—recent changes in sales and the amount spent on radio ads. By using multivariate techniques you will learn in later chapters, Ann and Kevin can see if more radio advertising means more sock sales.
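As a rough preview, here is what such a multivariate sample might look like, with one observation (two numbers) per city. The figures are invented for illustration, and the correlation coefficient used here is just one simple measure of how two variables move together, chosen as an assumption for this sketch; it is not necessarily the technique Ann and Kevin will use in later chapters.

```python
from math import sqrt

# Hypothetical observations, one per city:
# (radio ad spending in thousands of dollars, change in sock sales in percent).
cities = {
    "City 1": (2.0, 1.0),
    "City 2": (4.0, 2.5),
    "City 3": (5.0, 2.0),
    "City 4": (7.0, 4.0),
    "City 5": (9.0, 5.5),
}


def pearson_correlation(xs, ys):
    """Correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / sqrt(spread_x * spread_y)


ad_spending = [ads for ads, _ in cities.values()]
sales_change = [sales for _, sales in cities.values()]
print("Correlation:", round(pearson_correlation(ad_spending, sales_change), 3))
```

A value near +1 would suggest that cities with more radio advertising tend to show larger sales gains, a value near −1 the opposite, and a value near 0 little connection between the two.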
Conclusion
As you can see, there is a lot of ground to cover by the end of this course. There are a few ideas that tie most of what you learn together: populations and samples, the difference between data and information, and most important, sampling distributions. We’ll start out with the easiest part, descriptive statistics, turning data into information. Your professor will probably skip some chapters, or do a chapter toward the end of the book before one that’s earlier in the book. As long as you cover the chapters “Descriptive Statistics and frequency distributions”, “The normal and the t-distributions”, and “Making estimates”, that is all right.
You should learn more than just statistics by the time the semester is over. Statistics is fairly difficult, largely because understanding what is going on requires that you learn to stand back and think about things; you cannot memorize it all, you have to figure out much of it. This will help you learn to use statistics, not just learn statistics for its own sake.
You will do much better if you attend class regularly and if you read each chapter at least three times. First, the day before you are going to discuss a topic in class, read the chapter carefully, but do not worry if you do not understand everything. Second, soon after a topic has been covered in class, read the chapter again, this time going slowly, making sure you can see what is going on. Finally, read it again before the exam. Though this is a great statistics book, the stuff is hard, and no one understands statistics the first time.