Section 1: Data Concepts and Analysis

Uses of Statistical Analysis
• Description and analysis
• Inference
• Assessing risk and probability
• Identifying important relationships

Descriptive Statistics
Measures of central tendency
• Mean
  o Sum of scores divided by number of scores
• Median
  o Order scores by size, then take the middle data point
• Mode
  o Most common score or category
When summarising data, use descriptive statistics: they reduce the amount of information in order to increase its clarity.

Deceptive Descriptive Statistics
Things to look out for:
• Precision versus accuracy
  o Precision refers to the exactitude with which something can be stated; accuracy is how well the information fits the truth
• The question being asked
  o The way in which the question is asked affects the answer
• Unit of analysis
  o The focus of the analysis can affect how accurately the result represents the whole population
• Mean versus median
  o The two convey different information; the typical value can seem larger or smaller depending on the central tendency measure chosen
• Units of measurement
  o The unit of measurement chosen can affect the output
• Percent versus absolute
  o e.g. a 10% raise sounds fair until one person earns $50,000 and another earns $500,000 (a $5,000 raise versus a $50,000 raise)
• Aggregation
  o How the data is aggregated or subdivided creates different impressions
• Measuring the right thing
  o What is measured can alter incentives

Correlation
The correlation coefficient can be used to summarise the nature of the relationship between two variables. r is called the Pearson correlation coefficient, and when using it the units of the two variables don't have to be the same. Pearson's correlation coefficient is a number between -1.0 and 1.0 and includes information about direction (positive or negative) and strength (magnitude of the number). It measures only linear relationships and is a poor summary of non-linear ones. Correlation is NOT causation.

Probability
Probability is the study of events and outcomes involving an element of uncertainty.
Probabilities are all between 0.0 and 1.0. Some events have inherent probabilities and others are inferred from past data.

Cumulative probabilities have important implications. If many variables are measured in a study and there are no true relationships, the chance of seeing a large correlation for one specific variable is fairly low, but the chance of finding a large correlation for at least one of the variables may be high, even though it is a fluke. Therefore it is very important not to assume that correlation means causation.

There are two types of multiple event probabilities: joint events (the probability of one event AND another event occurring) and disjoint events (the probability of one event OR another event occurring).

Expected value is the sum of the payoffs of all outcomes, each multiplied by its probability.

Law of large numbers: as the number of independent trials increases, the mean of the outcomes gets closer to the expected value.

Problems with Probability
Probabilities allow us to quantify future events and are thus important aids to good decision making, but we have to be aware of the errors that may arise in calculating and interpreting probabilities:
• Assuming events are independent when they are not
• Not understanding when events ARE independent
• Clusters happen
• The prosecutor's fallacy
  o We can't necessarily infer one probability from the other; we need to compare the probabilities
• Regression to the mean
• Statistical discrimination

Three different types of probability
• Classic probability is deduced from the properties of well-defined objects
• Frequentist probability is derived from the frequency of events
• Subjective probability expresses a belief about the likelihood of an event
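
The definitions of expected value and the law of large numbers given above can be illustrated with a short sketch. The example below is not from the notes: it assumes a fair six-sided die whose payoff is the number rolled, and the function names (expected_value, sample_mean) and the fixed random seed are hypothetical choices made only for illustration.

```python
import random

# Assumed example (not from the notes): a fair six-sided die whose payoff
# is simply the number shown on the face.
OUTCOMES = [1, 2, 3, 4, 5, 6]
PROBABILITIES = [1 / 6] * 6


def expected_value(payoffs, probs):
    """Sum of each payoff multiplied by its probability."""
    return sum(x * p for x, p in zip(payoffs, probs))


def sample_mean(n_trials, rng):
    """Mean payoff over n_trials independent rolls of the die."""
    rolls = [rng.choice(OUTCOMES) for _ in range(n_trials)]
    return sum(rolls) / n_trials


ev = expected_value(OUTCOMES, PROBABILITIES)  # mathematically 3.5

if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary fixed seed so runs are repeatable
    for n in (10, 1_000, 100_000):
        # With more independent trials the sample mean tends to settle
        # closer to the expected value.
        print(n, round(sample_mean(n, rng), 3), "expected:", round(ev, 3))
```

With only 10 rolls the sample mean can sit well away from 3.5, while with 100,000 rolls it is typically within a few hundredths of it. Any single small sample can still be far off, which is why the law is a statement about large numbers of trials.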
