show all the descriptive statistics
Q1: By providing an example, discuss the roles of decision trees in Big Data Analytics.
Q2: Given the yearly sales in the yearly_sales.csv file, complete the following:
Show all the descriptive statistics of sales_total, including its standard deviation and variance.
Compute the correlation between number_of_order and sales_total.
Plot a scatter graph of sales_total against number_of_order.
Perform a linear regression of sales_total on number_of_order.
Draw the line of best fit (abline) over your graph.
Perform the t test as shown below and state your conclusion.
Perform the ANOVA test as shown below and state your conclusion.
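The first five tasks could be approached along the following lines in R. This is only a minimal sketch. The small data frame is invented for illustration and stands in for yearly_sales.csv. The data frame name myData is borrowed from the ANOVA command later in this sheet. Treating sales_total as the response variable in the regression is an assumption.

```r
# Hypothetical toy data standing in for yearly_sales.csv (values are invented).
# With the real file, something like this would be used instead:
#   myData <- read.csv("yearly_sales.csv")
myData <- data.frame(
  sales_total     = c(120, 250, 310, 90, 400, 275, 180, 330),
  number_of_order = c(2, 4, 5, 1, 7, 4, 3, 6)
)

# Descriptive statistics (min, quartiles, median, mean, max)
print(summary(myData$sales_total))

# summary() does not report spread, so compute it separately
sd_sales  <- sd(myData$sales_total)   # sample standard deviation
var_sales <- var(myData$sales_total)  # sample variance
print(c(sd = sd_sales, var = var_sales))

# Pearson correlation between the two columns
r <- cor(myData$number_of_order, myData$sales_total)
print(r)

# Scatter plot with number_of_order on the x axis
plot(myData$number_of_order, myData$sales_total,
     xlab = "number_of_order", ylab = "sales_total",
     main = "sales_total vs number_of_order")

# Simple linear regression: sales_total as a function of number_of_order
fit <- lm(sales_total ~ number_of_order, data = myData)
print(summary(fit))

# Overlay the fitted line on the existing scatter plot
abline(fit, col = "red")
```

abline() draws on the plot that is currently open, so it has to be called after plot().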
T test
This tests the mean of a single group; here we have sales_total.
t.test(sales_total, mu = 249) # R command for t test
H0: mu = 249 # null hypothesis
H1: mu ≠ 249 # alternative hypothesis
Significance level = 0.05
Reject H0 if p-value is <= 0.05
Do not Reject H0 if p-value is > 0.05
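One way the one-sample t test and the usual decision rule (reject H0 when the p-value is at or below the significance level) might be coded is sketched here. The sales_total values are invented for illustration, and the names alpha, result and conclusion are hypothetical.

```r
# Hypothetical sales_total values (invented; not taken from yearly_sales.csv)
sales_total <- c(120, 250, 310, 90, 400, 275, 180, 330)

alpha <- 0.05  # significance level

# One-sample, two-sided t test of H0: mu = 249
result <- t.test(sales_total, mu = 249)
print(result)

# Apply the decision rule to the reported p-value
if (result$p.value <= alpha) {
  conclusion <- "Reject H0: the mean differs from 249"
} else {
  conclusion <- "Do not reject H0: no evidence the mean differs from 249"
}
print(conclusion)
```

Failing to reject H0 does not prove that the mean equals 249. It only means the sample gives no strong evidence against it.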
ANOVA test
ANOVA is used to test the equality of means across two or more groups; here we have Male and Female.
Anova(lm(sales_total ~ factor(gender), data = myData)) # R command for ANOVA; Anova() needs library(car), base R offers anova()
H0: There is no difference in mean sales_total between Male and Female.
H1: There is a difference in mean sales_total between Male and Female.
Significance level = 0.05
Reject H0 if p-value is <= 0.05
Do not Reject H0 if p-value is > 0.05
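A possible sketch of the ANOVA step uses base R's anova(), so that no extra package is required. Anova() with a capital A is not part of base R and is usually obtained from the car package. The data frame contents are invented for illustration, and the names fit, tab, p_value, alpha and conclusion are hypothetical.

```r
# Hypothetical toy data (invented; not taken from yearly_sales.csv)
myData <- data.frame(
  sales_total = c(120, 250, 310, 90, 400, 275, 180, 330),
  gender      = c("F", "M", "F", "M", "F", "M", "F", "M")
)

alpha <- 0.05  # significance level

# Fit a linear model with gender as a categorical predictor
fit <- lm(sales_total ~ factor(gender), data = myData)

# ANOVA table: first row is the gender effect, second row the residuals
tab <- anova(fit)
print(tab)

# p-value of the F test for the gender effect
p_value <- tab[["Pr(>F)"]][1]

# H0: the mean sales_total is the same for both genders
if (p_value <= alpha) {
  conclusion <- "Reject H0: mean sales_total differs by gender"
} else {
  conclusion <- "Do not reject H0: no evidence of a difference by gender"
}
print(conclusion)
```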