**Data Analysis #4**

**ST314**

**Total: 40 points**

**Due: Tuesday, November 17th at 11:59pm**


*Please download, complete, and upload as a PDF or Word document in Canvas by the due date.*

*No other format will be accepted. Typing or entering answers by hand is accepted as long as solutions are neatly given and the document is uploaded as a PDF.*

*If you download this Data Analysis as a Word Document, you may type out your answers below each question as they appear in the assignment or type in your answer in the space provided. If you are downloading this Data Analysis as a PDF, you can write out your answers in the space provided. If you are using a separate sheet of paper or Word document to type out your answers, please make sure answers are clearly labeled.*


*This analysis covers material from the Week 6 and 7 Course Notes, Chapters 9 and 10 in the text, and the R code provided below and in the instructions on Canvas.*


### Part I of Data Analysis 4 – Two Sample T Test

# Upload data set called EPAFE2016Data.

fueldata = read.csv(file.choose(), header= TRUE)

head(fueldata)

# Make a box plot to compare the combined fuel efficiency of cars made by American companies and international companies for 2016.

boxplot(fueldata$CombFE~fueldata$International, horizontal = TRUE, col = "light green", main = "Combined Fuel Efficiency for 2016 Vehicles:\nAmerican vs International Car Companies")

# Get Summary Statistics

# American Car Companies Average, Standard Deviation and Sample Size

USMean = mean(fueldata$CombFE[fueldata$International=="American"])

USMean

USStdDev = sd(fueldata$CombFE[fueldata$International=="American"])

USStdDev

USsamplesize = length(fueldata$CombFE[fueldata$International=="American"])

USsamplesize

# International Car Companies Average, Standard Deviation and Sample Size

INTMean = mean(fueldata$CombFE[fueldata$International=="International"])

INTMean

INTStdDev = sd(fueldata$CombFE[fueldata$International=="International"])

INTStdDev

INTsamplesize = length(fueldata$CombFE[fueldata$International=="International"])

INTsamplesize

# Perform a Two Sample T Test

t.test(fueldata$CombFE~fueldata$International, conf.level=0.90)

#####################################################################

# OPTIONAL Part 2 Data Table

#####################################################################

# Create a 2X2 table of International vs Guzzler Status.

table(fueldata$International, fueldata$Guzzler)

#####################################################################

# Part 3 ANOVA

#####################################################################

# Create a boxplot of Estimated Annual Fuel Cost vs Fuel Types.

boxplot(fueldata$EPACalculatedAnnualFuelCost~fueldata$Fuel, horizontal = TRUE, col = "light blue", main = "Estimated Annual Fuel Costs\nfor 2016 Vehicles among Fuel Types")

# Test whether any means differ from each other with an Overall F test.

mod = aov(fueldata$EPACalculatedAnnualFuelCost~fueldata$Fuel)

summary(mod)

# If the F statistic in the ANOVA is significant, perform a multiple comparisons test to see which fuel types are significantly different.

# Tukey's multiple comparisons test

TukeyHSD(mod, conf.level = 0.95)


Before you begin this analysis you need to read the instructions and obtain the data set under the Data Analysis #4 link in Canvas.

**Part I:** (20 points) Each year the EPA does an analysis on the current models of vehicles sold in the United States. The data provided in the data set EpaFE2016Data.csv is a subset of this analysis. If you are curious, you may access the full data set from the EPA website http://www.fueleconomy.gov/feg/download.shtml.

Using the data set EpaFE2016Data.csv, compare the average combined (city and highway) fuel efficiency of the 2016 car models for American car companies and International car companies.

*Do these data provide strong evidence of a difference between the average fuel efficiency of cars made by US companies and International companies?*

Assume that conditions for inference are satisfied. Use a significance level of 0.10.

Use the R code and instructions under the Data Analysis #4 link in Canvas to obtain descriptive statistics, a side-by-side box plot and results from a two sample t test.

- (3 points) Include a side-by-side box plot of the data.
  - Is there visual evidence that the average fuel efficiency is different between the US and International car companies? Explain.
- (2 points) Provide an organized table of the summary statistics. Include the sample means, standard deviations and sample sizes.
- (2 points) State the null and alternative hypotheses to answer the question of interest.
- (3 points) From the summary statistics, calculate the test statistic and degrees of freedom “by hand”. *Show work. Conservative degrees of freedom are okay.*
- (1 point) Obtain a p-value based on your calculated test statistic and degrees of freedom from a t table. *(You may use either method to get your df.)* Show work.
- (3 points) From the summary statistics, calculate the 90% Confidence Interval “by hand”. Show work.
- (2 points) Obtain a p-value from the t test and a confidence interval using R. Paste the output. Are your answers different? Why or why not?
- (4 points) Using the R output from the previous part, give a __four part conclusion__ (shown in notes) and thoroughly answer the question of interest.
- Optional: Append Code to the end of your document. Not graded.
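The “by hand” steps in Part I can also be checked in R from the summary statistics alone. The following is a minimal sketch, assuming the unpooled standard error and the conservative degrees of freedom min(n1, n2) - 1; the function name `two_sample_t_by_hand` and the numbers in the example call are hypothetical and are not taken from EpaFE2016Data.csv.

```r
# Two-sample t test and confidence interval from summary statistics.
# Assumes the unpooled standard error; df defaults to the conservative
# choice min(n1, n2) - 1 (an assumption, pass another df if preferred).
two_sample_t_by_hand <- function(mean1, sd1, n1, mean2, sd2, n2,
                                 conf.level = 0.90,
                                 df = min(n1, n2) - 1) {
  se     <- sqrt(sd1^2 / n1 + sd2^2 / n2)     # standard error of the difference
  t_stat <- (mean1 - mean2) / se              # test statistic under H0: mu1 = mu2
  p_val  <- 2 * pt(-abs(t_stat), df)          # two-sided p-value
  t_crit <- qt(1 - (1 - conf.level) / 2, df)  # critical value for the interval
  ci     <- (mean1 - mean2) + c(-1, 1) * t_crit * se
  list(t = t_stat, df = df, p.value = p_val, conf.int = ci)
}

# Hypothetical summary statistics, for illustration only
res <- two_sample_t_by_hand(mean1 = 24.1, sd1 = 5.2, n1 = 40,
                            mean2 = 26.3, sd2 = 6.0, n2 = 55)
res
```

By default `t.test` uses the Welch-Satterthwaite degrees of freedom rather than the conservative value, so its p-value and interval can differ slightly from a calculation like this one.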

**Part II. (11 points)** In 1978 Congress established a Gas Guzzler Tax to discourage the production and purchase of fuel-inefficient vehicles. Every vehicle currently produced is labeled as a Guzzler if its fuel efficiency in MPG is below a certain amount. Trucks, SUVs and Minivans were uncommon in 1978, so they are exempt from the Tax. Read more: http://www.epa.gov/fueleconomy/guzzler/

In the data set EpaFE2016Data.csv, only 4 out of the 249 vehicles made by American companies are considered “Guzzlers”, whereas 47 out of the 633 vehicles made by International companies are considered “Guzzlers”.

**Is there evidence that the proportion of Guzzlers among the 2016 car models differs between American and International car companies?** Use a significance level of 0.01.

- (1 point) State the null and alternative hypotheses to answer the question of interest.
- (2 points) Check conditions. If they are not met, state so and why. *Proceed either way.*
- (2 points) Calculate the test statistic. *Show work.*
- (1 point) Obtain a p-value based on your calculated test statistic.
- (2 points) Calculate the 99% Confidence Interval “by hand”. Show work.
- (3 points) Give a __four part conclusion__ (shown in notes) and thoroughly answer the question of interest.
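The Part II calculations follow the usual two-proportion z procedure, which can be sketched in R as a check on hand work. This is a minimal sketch under stated assumptions: the test statistic uses the pooled proportion, the interval uses the unpooled standard error, and the condition check requires at least `min_count` successes and failures in each group (the threshold of 10 is an assumption, so use the one from the course notes). The function name `two_prop_z_by_hand` and the counts in the example call are hypothetical.

```r
# Two-proportion z test and confidence interval from counts.
two_prop_z_by_hand <- function(x1, n1, x2, n2, conf.level = 0.99,
                               min_count = 10) {
  p1 <- x1 / n1
  p2 <- x2 / n2

  # Condition check: successes and failures in each group
  # (the threshold min_count is an assumption)
  counts <- c(x1, n1 - x1, x2, n2 - x2)
  conditions_met <- all(counts >= min_count)

  # Test statistic uses the pooled proportion under H0: p1 = p2
  p_pool  <- (x1 + x2) / (n1 + n2)
  se_test <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z       <- (p1 - p2) / se_test
  p_val   <- 2 * pnorm(-abs(z))               # two-sided p-value

  # Confidence interval uses the unpooled standard error
  se_ci  <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z_crit <- qnorm(1 - (1 - conf.level) / 2)
  ci     <- (p1 - p2) + c(-1, 1) * z_crit * se_ci

  list(z = z, p.value = p_val, conf.int = ci,
       conditions_met = conditions_met)
}

# Hypothetical counts, for illustration only
res <- two_prop_z_by_hand(x1 = 30, n1 = 200, x2 = 60, n2 = 250)
res
```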


**Part III. (9 points)** Included in the EPA analysis of 2016 vehicles is a variable that estimates the annual fuel cost. It is assumed that the required fuel type for each vehicle might have an impact on the annual fuel cost. There are five different fuel types:

DU = Diesel, ultra low sulfur (15 ppm, maximum)

G = Gasoline (Regular Unleaded Recommended)

GM = Gasoline (Mid Grade Unleaded Recommended)

GP = Gasoline (Premium Unleaded Recommended)

GPR = Gasoline (Premium Unleaded Required)

*Do the EPA data provide evidence that at least one average annual fuel cost differs among the fuel types DU, G, GM, GP and GPR? If so, which are different?*

Assume that conditions for inference are satisfied. Use a significance level of 0.05.

Use the R code under the Data Analysis #4 Instructions to obtain a graphical display, perform a Single Factor ANOVA F test and perform a test of multiple comparisons.

- (2 points) From the side-by-side box plot, does there look to be a difference in the average annual fuel cost among the different fuel types? Include the plot and explain your reasoning.
- (2 points) State the appropriate null and alternative hypotheses for the ANOVA F test.
- (3 points) Use the F statistic and p-value from the ANOVA table to state whether there is a significant difference between the average annual fuel costs of at least two of the fuel types.
  - Paste R output.
  - Include a statement in regards to your significance level.
  - Include a statement about the strength of evidence in favor of the alternative.
- (2 points) Using the Tukey’s Multiple Comparison procedure output, are there any individual comparisons that are significant at the 0.05 significance level?
  - Paste R output.
  - List all comparisons that are significant. Which significant comparison has the largest difference in annual cost? Give the estimate.
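
Picking out the significant Tukey comparisons can be done by reading the printed output, or by filtering the matrix that `TukeyHSD` returns. The sketch below uses R's built-in `PlantGrowth` data as a stand-in for the fuel data, so the variable names `weight` and `group` are not from this assignment; with the assignment's model, the element name of the returned list would follow the factor term in that model instead.

```r
# Stand-in example on the built-in PlantGrowth data (not the EPA data)
mod_demo <- aov(weight ~ group, data = PlantGrowth)
tuk <- TukeyHSD(mod_demo, conf.level = 0.95)$group  # matrix: diff, lwr, upr, p adj

# Keep only comparisons with adjusted p-value below 0.05
sig <- tuk[tuk[, "p adj"] < 0.05, , drop = FALSE]

# Order the significant comparisons by the size of the estimated difference
sig[order(-abs(sig[, "diff"])), , drop = FALSE]
```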