## Statistics Project, Statistics


Project description

For all questions below, please use the QoG dataset on the e-Learning site (the reduced one that we used in class, the Basic one, or the Standard one) or another dataset related to your research interests. If you use your own dataset, also submit the data in .Rdata format or any format that Deducer reads easily, such as .csv or .sav. Please do not choose exactly the same variables that were used as examples in the lecture slides on that topic. You are welcome to work in groups on getting the Deducer output, but whenever a question asks you to choose a variable, each person should choose a different variable and write his or her own answers.

Make a table summarizing the basic descriptive statistics for three variables. The table should include the mean, median, minimum and maximum values, standard deviation, and the number of observations on which these statistics were calculated (Valid N). If possible, choose interval (ratio-scale) variables. If you choose ordinal or nominal variables, indicate which descriptive statistics can be used for those variables. (Deducer command: Analysis – Descriptives)
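For readers who want to check the Deducer output by hand, the statistics listed above can be sketched in a few lines of Python. This is only an illustration: the data values and the `describe` helper are hypothetical and are not part of the QoG dataset or of Deducer.

```python
import statistics

# Hypothetical values for one interval variable; None marks a missing observation.
values = [2.5, 3.0, None, 4.5, 5.0, 7.0, None, 9.5]

def describe(raw):
    """Return the descriptive statistics requested above for one variable."""
    valid = [v for v in raw if v is not None]  # Valid N counts non-missing values only
    return {
        "valid_n": len(valid),
        "mean": statistics.mean(valid),
        "median": statistics.median(valid),
        "min": min(valid),
        "max": max(valid),
        "sd": statistics.stdev(valid),  # sample standard deviation (divides by n - 1)
    }

summary = describe(values)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that missing observations are dropped before anything is computed, which is why Valid N can be smaller than the number of rows in the dataset.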

Create a separate histogram for each of the three variables to see how the data is distributed over the range and highlight any features of the distribution that you should keep in mind in future analysis. For example, are the observations evenly distributed across the range, are there more at lower or higher levels, do you observe any outliers? (Deducer command: Plots-Plot builder-Histogram)

Choose (or create) at least one nominal variable that divides your observations into at least two groups (for example, democracies/dictatorships or low/medium/high development). Create box plots that compare the groups for any interval variable.

(Deducer command: Plots-Plot builder-Group boxplot)

Describe what you observe about the distribution of the data and the differences between the groups by comparing the box plots.

Select two interval variables that you think may be related. Create a scatterplot for the two interval variables using the ‘scatter smooth’ template.

(Deducer command: Plots-Plot builder and under smoothing select "lm: linear model")

Discuss the relationship between these two variables based on the scatterplot. Does it appear to be positive/negative/no relationship? Describe ONE potential causal path from the independent variable to the dependent variable. (Remember scatterplots only show possible relationships, but process-tracing must be used to establish causation.)
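The straight line drawn by the "lm: linear model" smoother is an ordinary least-squares fit. A minimal sketch of that calculation, using made-up data and a hypothetical `fit_line` helper, shows how the sign of the slope and of the correlation coefficient corresponds to a positive or negative relationship:

```python
import math

# Hypothetical paired observations (x = independent, y = dependent variable).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def fit_line(xs, ys):
    """Least-squares slope and intercept, plus Pearson's correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)                      # spread of x
    syy = sum((b - mean_y) ** 2 for b in ys)                      # spread of y
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))  # co-variation
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

slope, intercept, r = fit_line(x, y)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

A positive slope indicates that higher values of x tend to go with higher values of y, and a negative slope indicates the opposite. Neither the slope nor r says anything about the causal path.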

Calculate a 95% confidence interval around the mean for any interval variable.

(Deducer command: Analysis – One Sample Test, tick "One-sample t-test")

Report the mean of the variable, the lower bound of the confidence interval (95% CI Lower), and the upper bound of the confidence interval (95% CI Upper). Explain precisely how the confidence interval should be interpreted.
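For reference, a 95% confidence interval around a mean from a one-sample t-test is conventionally computed as shown below. This assumes Deducer follows the standard Student's t procedure. The symbols are notation introduced here and do not appear in the Deducer output.

$$
\bar{x} \;\pm\; t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
$$

Here $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the Valid N, and $t_{0.975,\,n-1}$ is the 97.5th percentile of the t distribution with $n-1$ degrees of freedom.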

Create a graph showing the confidence intervals around the group means of any interval variable, using a nominal variable that divides your observations into exactly two groups (for example, democracies/dictatorships, male/female, old/young).

(Deducer command: Plots – Plot builder – Mean (in Templates), Variable = interval variable, By = variable that defines the two groups)

Now do a difference of means test on the same variable with the same two groups.

(Deducer command: Analysis – Two Sample Test, Outcomes = interval variable, Factor = nominal variable that defines your two groups; only "T-test" should be ticked)

State the null and alternative hypotheses for this hypothesis test. What is the mean for each group (e.g. democracies and dictatorships)? What is the p-value? What do you conclude based on this p-value (using a 95% confidence level)?
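To see where the numbers in the two-sample output come from, the test statistic can be sketched by hand. The example below assumes the unequal-variance (Welch) form of the t-test, which is the default of R's `t.test`. Whether Deducer's dialog uses exactly this form is an assumption. The two groups and the `welch_t` helper are hypothetical. The p-value itself needs the t distribution's cumulative function, which the Python standard library does not provide, so it is left out here.

```python
import math
import statistics

# Hypothetical values of one interval variable in two groups.
group_a = [6.1, 7.4, 5.9, 8.2, 7.0]
group_b = [4.8, 5.5, 6.0, 4.2, 5.1, 5.6]

def welch_t(a, b):
    """Group means, Welch t statistic, and Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Squared standard error of each group mean: sample variance / group size.
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return mean_a, mean_b, t, df

mean_a, mean_b, t, df = welch_t(group_a, group_b)
print(f"mean A={mean_a:.2f}, mean B={mean_b:.2f}, t={t:.3f}, df={df:.2f}")
```

The further the t statistic is from zero, the smaller the p-value reported by the test.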

Please save the graphs you created in a Word (.doc or .docx) document, together with the answers to the questions, and upload the document on the e-Learning site.