Statistical Analysis Assignment-SCU Assignment Help Part C
MAT10251 STATISTICAL ANALYSIS
Data Analysis Project – Part C
(Marked out of 40 but worth 20% of final assessment)
Due: Week 12 Tuesday 17 May 2016
Objectives: 1 to 5
Topics: 7 to 9
Purpose: To answer questions about used car prices by applying your knowledge of statistical inference and regression and correlation and to communicate the results.
Part C Preparation
While the submission date for Part C is Tuesday 17 May 2016, you should be working on Part C during Weeks 9 to 11.
It is recommended that you follow this timetable:
- Question 1 covering Topic 7 should be attempted in Week 9
- Question 2 covering Topic 8 should be attempted in Week 10
- Question 3 covering Topic 9 should be attempted in Week 11
Task 1 Part C - Appendix Statistical Inference and Regression and Correlation (26 marks)
The following statistical tasks should appear as appendices to your written answer. This should include all necessary steps and appropriate Excel, or equivalent, output.
These appendices should come after your written answer within your single word document for Part C.
In preparing your appendices you may use one of the following formats:
- Word with Excel output added.
- Handwritten with Excel output added. This will then need to be scanned and added to your word document.
Choose a level of significance for any hypothesis test and a level of confidence for any confidence interval. Enter these values on page 2 of the Part C cover sheets along with the sample number from Part A.
Use your sample and appropriate statistical inference and regression and correlation techniques to answer the following questions.
Question 1 Statistical Inference Topic 7 (10 marks)
Your relative or friend asks you if used car prices are generally higher for cars with automatic transmission than those with manual.
Use Price and Transmission data (where A = Automatic transmission, M = Manual transmission) for all cars in your sample and an appropriate statistical inference technique to answer the following question:
On average, is the price of cars of the specified make and model, for sale in the specified state, higher for those with automatic transmission than for those with manual transmission?
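As a cross-check on the Excel output, the comparison of two means can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the separate-variance (Welch) form of the two-sample t statistic, which corresponds to Excel's "t-Test: Two-Sample Assuming Unequal Variances" tool, and you may decide that a pooled-variance test suits your sample better. The function name `welch_t` and the prices below are invented for the example and are not part of the assignment data.

```python
import math
import statistics


def welch_t(sample_a, sample_m):
    """Return (t, df) for the difference mean(sample_a) - mean(sample_m).

    Uses the separate-variance (Welch) t statistic and the
    Welch-Satterthwaite approximation for the degrees of freedom.
    """
    n_a, n_m = len(sample_a), len(sample_m)
    mean_a, mean_m = statistics.mean(sample_a), statistics.mean(sample_m)
    # statistics.variance is the sample variance (divides by n - 1).
    se2_a = statistics.variance(sample_a) / n_a
    se2_m = statistics.variance(sample_m) / n_m
    t = (mean_a - mean_m) / math.sqrt(se2_a + se2_m)
    df = (se2_a + se2_m) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_m ** 2 / (n_m - 1)
    )
    return t, df


# Hypothetical prices in dollars, invented for illustration only.
automatic = [15500, 17200, 16800, 18900, 16100, 17500]
manual = [14200, 15100, 13900, 16000, 14800]

t_stat, df = welch_t(automatic, manual)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A positive t statistic points in the direction of the alternative hypothesis (automatic dearer than manual). The one-tail p-value still has to come from Excel or from t tables, because the Python standard library has no t distribution.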
Question 2 Simple Linear Regression model Topic 8 (7 marks)
Your friend or relative asks you how the value of the car that they decide to purchase will depreciate in value.
Use Age (independent variable) and Price (dependent variable) to model the relationship between age of a used car and its price.
Then, to answer how the value of the car that your friend or relative decides to purchase will depreciate, explore this relationship by:
- Plotting the data with a scatter plot.
- Calculating the least squares regression line, correlation coefficient and coefficient of determination.
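The quantities asked for here can also be computed directly from their definitions, which is a useful check on the Excel Regression output. The sketch below is a minimal illustration, not a replacement for Excel: the function name `simple_regression` and the age and price figures are hypothetical.

```python
import math


def simple_regression(x, y):
    """Return (slope, intercept, r, r_squared) for the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                     # change in price per extra year of age
    intercept = mean_y - slope * mean_x   # predicted price at age 0
    r = sxy / math.sqrt(sxx * syy)        # correlation coefficient
    return slope, intercept, r, r ** 2


# Hypothetical sample, invented for illustration only.
ages = [1, 2, 3, 4, 5, 6, 8, 10]                                  # years
prices = [24000, 22500, 20000, 18500, 17000, 15500, 12000, 9500]  # dollars

slope, intercept, r, r_squared = simple_regression(ages, prices)
print(f"Price = {intercept:.0f} + ({slope:.0f}) x Age")
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
```

With Age as the independent variable, the slope estimates the average change in price for each additional year of age, so a negative slope is the depreciation per year under the linear model.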
Question 3 Multiple Linear Regression model Topic 9 (9 marks)
Your relative or friend now wants to know what other factors may have an influence on price.
To explore this, add Kilometres and Transmission as additional independent variables to the regression model developed in Question 2. Then explore the relationship between these variables by:
- Calculating the multiple regression equation, multiple correlation coefficient, and coefficient of multiple determination.
- Using appropriate tests to determine which independent variables make a significant contribution to the regression model.
Hence, determine which independent variables to include in your model.
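To see what the multiple regression calculation involves, the following sketch fits Price on Age, Kilometres and Transmission by solving the least squares normal equations. Several things here are assumptions made for illustration: Transmission is coded as a dummy variable (1 for A, 0 for M), Kilometres is expressed in thousands to keep the arithmetic well scaled, the helper names `solve` and `fit` are hypothetical, and all data values are invented. The significance tests on individual coefficients are not reproduced; those come from the Excel output.

```python
import math


def solve(a, b):
    """Solve the linear system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(rows, y):
    """Return ([b0, b1, ...], R^2) for the least squares fit of y on the columns of rows."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coeffs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coeffs, d)) for d in design]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return coeffs, 1 - sse / sst


# Hypothetical sample, invented for illustration only.
ages = [1, 2, 3, 4, 5, 6, 8, 10]
km_thousands = [12, 35, 41, 70, 66, 98, 120, 155]
transmission = ["A", "M", "A", "A", "M", "A", "M", "M"]
prices = [24000, 22500, 20000, 18500, 17000, 15500, 12000, 9500]

auto_dummy = [1 if t == "A" else 0 for t in transmission]  # A = 1, M = 0
rows = list(zip(ages, km_thousands, auto_dummy))

coeffs, r_squared = fit(rows, prices)
multiple_r = math.sqrt(r_squared)  # multiple correlation coefficient
print("b0, b_age, b_km, b_auto =", [round(c, 1) for c in coeffs])
print(f"R^2 = {r_squared:.3f}, multiple R = {multiple_r:.3f}")
```

With this coding, the coefficient on the dummy variable estimates the average price difference between automatic and manual cars of the same age and kilometres.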
- You may need to transform or manipulate the given data, before using Excel for the corresponding statistical calculations.
- Use Excel for the statistical calculations. You do not need to repeat any Excel calculations by hand. However, make sure that you define your random variables and include any steps not given by Excel. For example, in a hypothesis test include the null and alternative hypotheses, along with the decision to reject or not reject the null hypothesis.
- Mention any assumptions you need to make.
- In Question 2 fit a linear model even if from your scatter plot you decide that a non-linear relationship better fits the data or that no apparent relationship exists. However, mention this in your written answer and/or corresponding appendix.
- In Statistical Question 3 while there may be interaction between independent variables, you are not required to add interaction terms to your model or test for interaction.
- Similarly in Statistical Question 3 while there may be collinearity of pairs of independent variables, you are not required to consider this or calculate a variance inflation factor (VIF).
- Comment on why each test or interval has been chosen.
- Make sure you interpret intervals and write a conclusion to hypothesis tests.
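As an illustration of the steps that Excel does not supply, the hypotheses for the two kinds of test in Part C might be written as follows. This is one possible formulation, not a required layout. The symbols are assumptions introduced here: \mu_A and \mu_M are the population mean prices of automatic and manual cars, \beta_j is the population regression coefficient of the j-th independent variable, and \alpha is your chosen level of significance.

```latex
\begin{align*}
\text{Question 1:}\quad & H_0: \mu_A \le \mu_M, \qquad H_1: \mu_A > \mu_M \\
\text{Question 3, for each } j:\quad & H_0: \beta_j = 0, \qquad H_1: \beta_j \ne 0 \\
\text{Decision:}\quad & \text{reject } H_0 \text{ if } p\text{-value} < \alpha
\end{align*}
```

The Question 1 test is one-tailed because the question asks only whether automatic cars are dearer, so the one-tail p-value is the relevant one. The coefficient tests in Question 3 are two-tailed.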
Task 2 – Part C - Written Answer – Emails or Letter (14 marks)
For each question present the results of your calculations, with your interpretation and conclusion, as part of a letter or email to your friend or relative.
Use the instructions given on pages four and five of the Part C coversheets.
This should be 500 to 900 words and three to seven pages.
It should be submitted as a Word file with Excel output embedded.
Make sure you:
- Introduce each question and put it in context.
- Answer the questions in non-statistical language.
- Present the result of your procedures, intervals and/or tests without unnecessary statistical jargon.
- Include conclusions which answer the given questions.
In particular, for Question 2:
- Explain the choice of independent and dependent variables.
- Include your graph.
- From your scatter plot discuss any apparent relationship between age and price. Comment on the strength, shape and sign of the relationship.
- Interpret the gradient and vertical intercept of the simple linear regression equation.
- Discuss and interpret the values of the correlation coefficient and coefficient of determination. In particular, are these values consistent with your graph?
- Mention any concerns you may have about the validity of your results due to a non-linear relationship, extreme values, etc.
- Provide an answer on how the value of the car that your friend or relative decides to purchase will depreciate.
In particular, for Question 3:
- Interpret the values of the multiple regression coefficients. Compare these with the corresponding values in the simple linear regression model.
- Discuss and interpret the values of the multiple correlation coefficient and coefficient of multiple determination. In particular, compare these with the corresponding values for the simple linear regression model.
- Include and justify a recommendation on which independent variables to include in your model.
Marking Criteria – Part C
Read these marking criteria carefully and consider them when preparing your Part C submission. See the marking and feedback sheet, page 3 Part C coversheets, for allocation of marks.
- For the statistical inference calculations (Questions 1 and 3) marks will be given for:
  - Choice of appropriate statistical technique/s.
  - Random variable/s defined.
  - Correct hypotheses for any tests.
  - Correct statistical calculations, including Excel.
  - Correct interpretation of results.
- To obtain full marks your graph (Question 2) must be correct, including correct labels on both axes and a title. Marks will be deducted if:
  - Graph incorrect.
  - Excel not used.
  - Axes incorrectly or not labelled.
  - Incorrect independent and dependent variables.
  - No title.
  - Scale on axes distorts graphs.
- For the regression and correlation coefficients (Questions 2 and 3) use either:
  - The Regression command in Data Analysis and copy the resultant tables.
  - Or the simple/multiple regression command in PhStat and copy the resultant tables.
  - Or for simple linear regression (Question 2) insert a trendline on a scatter plot, with both the equation and the R² value showing; you will then need to manually calculate the value of r.
- For the regression and correlation coefficients (Questions 2 and 3) marks will be deducted if Excel is not used and also for incorrect equations or coefficients, so check:
  - Your independent and dependent variables.
  - Your sample size.
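If you use the trendline option, note that an Excel trendline displays R² but not r. For simple linear regression, r is the square root of R² with the same sign as the slope of the fitted line. A minimal sketch of that manual step follows; the function name `r_from_trendline` and the example figures are hypothetical.

```python
import math


def r_from_trendline(slope, r_squared):
    """Correlation coefficient for simple linear regression.

    r has the magnitude sqrt(R^2) and the sign of the slope.
    """
    return math.copysign(math.sqrt(r_squared), slope)


# Hypothetical trendline: Price = 25000 - 1500 x Age, with R^2 = 0.81.
r = r_from_trendline(-1500.0, 0.81)
print(f"r = {r:.2f}")
```

This shortcut applies only to a model with one independent variable. In multiple regression the multiple correlation coefficient is reported without a sign.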
Written Answer - Emails or Letter
- 500 to 900 words and three to seven pages - marks will be deducted if this is greatly exceeded.
- To obtain full marks, your written answer must:
  - Be well structured and analysed.
  - Clearly communicate the results of the Excel output in language appropriate for your audience.
  - Include an introduction to each question and your conclusions.
  - Include appropriate Excel output.
  - Answer the questions in non-statistical language.
- Marks will be deducted if:
  - There is little or no comment on, or interpretation of, the Excel output.
  - Unnecessary statistical jargon and equations appear.
  - It is confusing or not readable.
- For each major spelling and/or grammatical error half a mark will be deducted, up to a maximum of two marks.
- Also up to two marks may be deducted for poor structure and presentation.