General Statistics
Study Guide / Overview Introduction to Probability and Statistics
Introduction
Ever since the first hunter came back from the hunt and claimed that he saw 10,000 buffalo grazing beyond the trees, or the first scout was sent out to estimate and report to the tribe on the strengths and skills of an enemy tribe, men and women have been using statistics to make sense of a world too large to comprehend, too unclear to know for sure, and too distant and uncertain to predict.
The tools we use to gather, classify, describe, and make educated guesses about our vast world are grouped into a body of knowledge called probability and statistics.
Probability is the tool or science of defining uncertainty so as to narrow the gap between what is certain and what is uncertain. Sometimes, if done right, we become more sure of what is unsure, or more aware of the degree of chance of events and the nature of the outcomes by which they are governed.
Some statisticians believe that events are governed by laws and rules that can be learned and that, if used systematically, can give us a firmer grasp on uncertainty. The methods presented in this and other statistical texts have underlying assumptions. Students of this body of knowledge must learn not only when and how to use these tools but also the limitations that govern their application. The principles that govern most statistical inference are rooted in the scientific method of experimentation, in which a hypothesis is made about the characteristics being studied or evaluated. This hypothesis is tested through the rigor of experimentation, and the results are then evaluated against the premise of the hypothesis, on which conclusions or inferences are made with some acceptable degree of risk or uncertainty.
Statistics, even though its formulated techniques and methodologies make it seem more certain and more easily executed than probability, is rooted in probability. In most cases statistics attempts to use a small sample to say something about a population that is not well known. The uncertainty part of statistics examines how information is gathered through sampling, which if not done "right" will lead to false conclusions about the unknown population, and the level of subjective risk taken when making inferences from the sample. Even if we were to study the entire population, we cannot be certain that the population characteristics will remain unchanged in the future or be uniform throughout every aspect of its domain (range of values).
This is an online text, so chapters can be studied in any order; prior knowledge is supplied through links to the key concepts required to understand the subject being studied.
Chapter 1. Introduces statistics in a formal way, showing its relationship to probability, with some mention of the importance of randomness and sampling as an introduction to statistical experimental design.
Descriptive statistics
Chapter 2. Presents techniques used to summarize data, either graphically or with some simple statistical parameter (summary statement or value).
Chapter 3. Presents techniques to describe and model relationships between variables (attributes or characteristics) of a sample: correlation describes the strength and direction of the relationship, and regression models the relationship, if one exists.
Probability
Chapter 4. Introduces the science of probability, a branch of mathematics dealing with chance, that is, with the outcomes of uncertain events.
Chapter 5. Introduces various probability distributions that are classified as discrete, i.e. the values of the input variable cannot be easily broken into fractional parts.
Chapter 6. Introduces various probability distributions that are classified as continuous, i.e. the values of the input variable can be easily broken into fractional parts. The normal distribution is one of these.
Statistical Inference
Chapter 7. Provides techniques for making inferences about a population's mean and proportions, and also introduces hypothesis testing.
Chapter 8. Provides techniques for making inferences when comparing two population parameters.
Chapter 9. Addresses inference concerning the regression and correlation statistics presented in Chapter 3.
Chapter 10. Presents Analysis of Variance, an introduction to experimental design for studying treatment effects on an outcome statistically.
Chapter 11. Presents the chi-square analysis technique for evaluating categorical data (multiple nonparametric population study).
Chapter 12. Presents a collection of techniques used to measure and make inferences about the parameters of populations that are not normally distributed.
Other chapters, especially several on Decision Making, will be added in future editions of this online learning module.
The goal of this resource is to help readers learn by doing, so interactive statistical tools are presented in each chapter to help participants integrate concepts with the solution strategies for the chapter's related problems.
Probability is the branch of mathematics that deals with the laws of chance and is important when making statistical inferences or testing hypotheses.
Example 1.1 What is the probability of 3 women wearing pants walking successively into a restaurant?
This is a probability statement whose solution requires some knowledge of probability.
So any branch of statistics that deals with chance, uncertainty, or possible outcomes requires some basic knowledge of probability.
Probability is often expressed in terms of percents, proportions, fractions, or the decimal equivalents of fractions. A weather forecaster may say that there is a 10% chance of rain tomorrow (percent). The statement "I have a 50/50 chance of getting heads if I flip a fair coin" is a probability statement expressed as a proportion. If one reports that about 1/3 of the people attending tonight's game will be under 30 years old, then one is making a probability statement using a fraction. Often these fractions and ratios can be expressed as a decimal between 0 and 1.0. It is often necessary to be able to convert the other expressions of probability to their decimal equivalents.
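The conversions described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 50/50 statement is read here as "50 favorable chances to 50 unfavorable chances", which is an assumption about how the proportion is meant.

```python
def percent_to_decimal(percent):
    """Convert a percent such as 10 (meaning 10%) to a decimal probability."""
    return percent / 100


def chances_to_decimal(favorable, unfavorable):
    """Convert a 'favorable/unfavorable' statement such as 50/50 to a decimal.

    Assumption: the two numbers are the chances for and against the event.
    """
    return favorable / (favorable + unfavorable)


def fraction_to_decimal(numerator, denominator):
    """Convert a fraction such as 1/3 to its decimal equivalent."""
    return numerator / denominator


rain = percent_to_decimal(10)          # "10% chance of rain"
heads = chances_to_decimal(50, 50)     # "50/50 chance of heads"
under_30 = fraction_to_decimal(1, 3)   # "about 1/3 of the people"

for label, p in [("rain", rain), ("heads", heads), ("under 30", under_30)]:
    print(f"{label}: {p:.3f}")
```

Each result falls between 0 and 1.0, so the three statements can be compared on the same scale.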
Researchers often use experiments to study probability.
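One simple kind of experiment is to repeat a chance event many times and record how often an outcome occurs. The sketch below simulates flipping a single fair coin; the function name, the number of flips, and the seed are all illustrative assumptions, and a computer simulation stands in for a physical experiment.

```python
import random


def estimate_heads_probability(num_flips, seed=0):
    """Estimate P(heads) as the fraction of simulated fair-coin flips landing heads."""
    rng = random.Random(seed)  # seeded so the experiment can be repeated exactly
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips


estimate = estimate_heads_probability(10_000)
print(f"Estimated probability of heads: {estimate:.3f}")
```

With more flips the estimate tends to settle closer to 0.5, which is the idea behind studying probability through repeated trials.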
Section Question: What is the probability of getting two heads with one flip of 2 coins?
Statistics
Statistics is a branch of mathematics dealing with the collection, summary, and description of, and inference from, data, observations, or the results of an experiment.
There are 2 branches of statistics:
1. Descriptive Statistics - the organization, summary, and description of data
Example 1.8 Taking a collection of data and summarizing it into a graphical representation such as a bar chart
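As a rough sketch of this kind of summary, the Python snippet below counts how often each category appears in a small data set and prints a text bar chart. The survey responses are made-up illustrative data, not results from this guide.

```python
from collections import Counter

# Hypothetical raw data: favorite colors reported by 12 survey respondents.
responses = ["red", "blue", "blue", "green", "red", "blue",
             "green", "blue", "red", "blue", "green", "blue"]

# Organize and summarize: count the observations in each category.
counts = Counter(responses)

# Describe: print one bar per category, one '#' per observation.
for category, count in sorted(counts.items()):
    print(f"{category:<6} {'#' * count} ({count})")
```

The twelve raw observations are reduced to three bars, which is easier to read than the original list.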
The field of statistics is important because it is often not possible to collect all the data or observations for all the members of a group. Nor is it always possible to know every characteristic of interest for the entire group being studied.
The population is the collection of all elements of interest: the entire sample space, all the data for a group, or all members of a particular defined variable.
The population is defined by the researcher.
A defined variable is a characteristic that defines an attribute (for example, People, Cars, Age, Male, Height of Basketball Players, Colors of the Rainbow).
Any portion of the population, including the entire population, is considered a sample; however, people often use the term sample to indicate a portion of the population and not the entire population. The number of observations in a sample is referred to as the sample size and is often denoted by the letter n.
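The relationship between a population, a sample, and the sample size n can be sketched as follows. The population of 100 numbered members, the choice of n = 10, and the seed are assumptions made only for illustration.

```python
import random

# Hypothetical population defined by the researcher: members numbered 1 to 100.
population = list(range(1, 101))

n = 10                      # sample size
rng = random.Random(42)     # seeded so the same sample is drawn each run

# Draw n distinct members at random, without replacement.
sample = rng.sample(population, n)

print(f"Population size: {len(population)}")
print(f"Sample size n: {len(sample)}")
print(f"Sample: {sorted(sample)}")
```

Setting n equal to the population size would return every member, which matches the point that the entire population also counts as a sample.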