dataset <- read.csv("MarketPlace_Sales_Data.csv")
dataset <- dataset[, 1:11]
head(dataset, 10)
## Sales Ad_Spend Email_Campaigns Website_Traffic Avg_Rating Discount_Pct
## 1 431.59 9.84 14 257.97 3.85 9.38
## 2 502.24 9.55 7 255.41 3.40 10.01
## 3 408.97 18.55 8 236.68 2.63 2.39
## 4 300.93 56.57 7 56.35 4.10 13.89
## 5 294.82 14.00 13 122.79 3.99 4.21
## 6 360.82 27.15 5 152.42 4.76 22.17
## 7 347.69 10.30 9 261.69 4.16 4.70
## 8 276.15 6.79 9 167.70 3.71 22.55
## 9 393.03 60.54 9 130.73 3.91 6.56
## 10 208.80 14.11 4 99.58 4.41 8.82
## Competitor_Price_Index Inventory_Level Social_Media_Engagement Season
## 1 85.62 697 27.26 Winter
## 2 105.69 625 67.88 Winter
## 3 106.04 874 40.99 Spring
## 4 103.77 872 38.75 Spring
## 5 112.59 1203 58.97 Spring
## 6 114.66 545 41.46 Summer
## 7 94.00 1729 16.90 Summer
## 8 83.50 677 48.91 Summer
## 9 115.98 616 48.47 Fall
## 10 90.63 1233 65.41 Fall
## Product_Type
## 1 Home
## 2 Home
## 3 Home
## 4 Home
## 5 Home
## 6 Home
## 7 Home
## 8 Home
## 9 Home
## 10 Home
dataset$Season_Spring <- ifelse(dataset$Season == "Spring", 1, 0)
dataset$Season_Summer <- ifelse(dataset$Season == "Summer", 1, 0)
dataset$Season_Fall <- ifelse(dataset$Season == "Fall", 1, 0)
Interpretation: Choosing ‘Winter’ as the reference category is a strategic decision. Winter often represents the peak sales season for an e-commerce company, likely driven by holiday shopping. By setting this peak season as the reference, the model’s coefficients for Season_Spring, Season_Summer, and Season_Fall will directly measure sales performance relative to the company’s most critical period, making the seasonal impact immediately clear.
dataset$Product_Type_Home <- ifelse(dataset$Product_Type == "Home", 1, 0)
dataset$Product_Type_Fashion <- ifelse(dataset$Product_Type == "Fashion", 1, 0)
Interpretation: Selecting ‘Electronics’ as the reference category is a strategic decision. Electronics are often high-ticket, high-volume products that can serve as a primary driver of revenue and website traffic. By using Electronics as the baseline, the coefficients for Product_Type_Home and Product_Type_Fashion will directly measure how their sales performance compares against this core, high-revenue category. This allows MarketPlace to quickly assess if other categories are underperforming or overperforming relative to this key business driver.
Model1 <- lm(Sales ~ Ad_Spend + Email_Campaigns + Website_Traffic +
Avg_Rating + Discount_Pct + Competitor_Price_Index +
Inventory_Level + Social_Media_Engagement +
Season_Spring + Season_Summer + Season_Fall +
Product_Type_Home + Product_Type_Fashion,
data = dataset)
summary(Model1)
##
## Call:
## lm(formula = Sales ~ Ad_Spend + Email_Campaigns + Website_Traffic +
## Avg_Rating + Discount_Pct + Competitor_Price_Index + Inventory_Level +
## Social_Media_Engagement + Season_Spring + Season_Summer +
## Season_Fall + Product_Type_Home + Product_Type_Fashion, data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -94.160 -16.390 0.177 16.939 75.765
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.730e+01 5.019e+00 11.418 < 2e-16 ***
## Ad_Spend 2.482e+00 1.860e-02 133.435 < 2e-16 ***
## Email_Campaigns 1.038e+00 1.425e-01 7.282 4.02e-13 ***
## Website_Traffic 1.201e+00 4.612e-03 260.385 < 2e-16 ***
## Avg_Rating 1.475e+01 8.360e-01 17.643 < 2e-16 ***
## Discount_Pct 2.066e-01 6.316e-02 3.271 0.00108 **
## Competitor_Price_Index 4.101e-01 2.668e-02 15.370 < 2e-16 ***
## Inventory_Level -4.879e-02 9.002e-04 -54.198 < 2e-16 ***
## Social_Media_Engagement 2.207e-01 2.136e-02 10.331 < 2e-16 ***
## Season_Spring -1.977e+01 1.111e+00 -17.794 < 2e-16 ***
## Season_Summer -3.034e+01 1.155e+00 -26.258 < 2e-16 ***
## Season_Fall -1.989e+01 1.315e+00 -15.125 < 2e-16 ***
## Product_Type_Home -2.972e+01 9.825e-01 -30.246 < 2e-16 ***
## Product_Type_Fashion -4.469e+01 1.018e+00 -43.901 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.31 on 3585 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.9632, Adjusted R-squared: 0.9631
## F-statistic: 7218 on 13 and 3585 DF, p-value: < 2.2e-16
coef_table <- summary(Model1)$coefficients
kable(coef_table,
digits = 3,
caption = "Model 1: Full Regression Model of Monthly Sales",
col.names = c("Estimate", "Std. Error", "t-value", "p-value"))
| Estimate | Std. Error | t-value | p-value | |
|---|---|---|---|---|
| (Intercept) | 57.302 | 5.019 | 11.418 | 0.000 |
| Ad_Spend | 2.482 | 0.019 | 133.435 | 0.000 |
| Email_Campaigns | 1.038 | 0.143 | 7.282 | 0.000 |
| Website_Traffic | 1.201 | 0.005 | 260.385 | 0.000 |
| Avg_Rating | 14.749 | 0.836 | 17.643 | 0.000 |
| Discount_Pct | 0.207 | 0.063 | 3.271 | 0.001 |
| Competitor_Price_Index | 0.410 | 0.027 | 15.370 | 0.000 |
| Inventory_Level | -0.049 | 0.001 | -54.198 | 0.000 |
| Social_Media_Engagement | 0.221 | 0.021 | 10.331 | 0.000 |
| Season_Spring | -19.766 | 1.111 | -17.794 | 0.000 |
| Season_Summer | -30.335 | 1.155 | -26.258 | 0.000 |
| Season_Fall | -19.895 | 1.315 | -15.125 | 0.000 |
| Product_Type_Home | -29.716 | 0.982 | -30.246 | 0.000 |
| Product_Type_Fashion | -44.693 | 1.018 | -43.901 | 0.000 |
dat_used <- model.frame(Model1)
# Components for the Global F-test
n <- nrow(dat_used) # Total number of observations used in the model
k <- 13 # Number of predictors in the model
y <- dat_used$Sales # The Sales values corresponding to the NA-free rows
# Sum of Squares
sst <- sum( (y - mean(y))^2 ) # Total Sum of Squares (based on used data)
sse <- sum( residuals(Model1)^2 ) # Error/Residual Sum of Squares
ssr <- sst - sse # Regression Sum of Squares (SST - SSE)
# Degrees of Freedom
df_reg <- k # k
df_res <- n - k - 1 # n - k - 1
df_tot <- n - 1 # n - 1
# Mean Squares and F-statistic
msr <- ssr / df_reg
mse <- sse / df_res
Fval <- msr / mse
pval <- pf(Fval, df_reg, df_res, lower.tail = FALSE)
kable(
data.frame(
Source = c("Regression","Residuals","Total"),
Df = c(df_reg, df_res, df_tot),
`Sum Sq`= c(ssr, sse, sst),
`Mean Sq`= c(msr, mse, NA),
`F value`= c(Fval, NA, NA),
`Pr(>F)`= c(pval, NA, NA)
),
digits = 4,
caption = "ANOVA Table with Single Overall F (Multiple Regression Model)"
)
| Source | Df | Sum.Sq | Mean.Sq | F.value | Pr..F. |
|---|---|---|---|---|---|
| Regression | 13 | 55456265 | 4265866.5750 | 7218.238 | 0 |
| Residuals | 3585 | 2118679 | 590.9845 | NA | NA |
| Total | 3598 | 57574945 | NA | NA | NA |
qf(0.95, df1 = 13, df2 = 3585)
## [1] 1.722883
Statistical Conclusion: The calculated F-statistic of 7218.24 is vastly greater than the critical F-value of 1.72. Correspondingly, the p-value is essentially zero, which is far less than the significance level of \(\alpha = 0.05\). Therefore, we reject the null hypothesis (\(H_0\)).
Interpretation for Tracy’s Team: The rejection of the null hypothesis is a crucial first step. It provides strong statistical evidence that the overall regression model is valid and that the chosen set of predictors, when taken together, has a highly significant ability to explain the variation in monthly sales.
Business Implications: This finding is a major validation of the analytics team’s efforts. It confirms for the executive team that the factors being tracked, from marketing spend to operational metrics, are indeed related to revenue performance. This result provides Tracy with a green light to proceed with the analysis, confident that the model is a reliable tool for identifying key sales drivers and informing strategic decisions, such as the $20 million marketing budget allocation.
summary_m1 <- summary(Model1)
coef_table <- summary_m1$coefficients
p_values <- coef_table[, 4]
significance_indicator <- ifelse(p_values < 0.05, "Yes", "No")
coef_report_df <- data.frame(
Variable = rownames(coef_table),
Estimate = coef_table[, 1],
Std_Error = coef_table[, 2],
t_statistic = coef_table[, 3],
P_value = p_values,
Significant_at_0_05 = significance_indicator,
row.names = NULL # Remove default R row names
)
kable(
coef_report_df,
digits = 4,
caption = "Comprehensive Coefficient Table for Model 1",
align = 'lrrrrr'
)
| Variable | Estimate | Std_Error | t_statistic | P_value | Significant_at_0_05 |
|---|---|---|---|---|---|
| (Intercept) | 57.3022 | 5.0186 | 11.4179 | 0.0000 | Yes |
| Ad_Spend | 2.4819 | 0.0186 | 133.4355 | 0.0000 | Yes |
| Email_Campaigns | 1.0378 | 0.1425 | 7.2821 | 0.0000 | Yes |
| Website_Traffic | 1.2010 | 0.0046 | 260.3848 | 0.0000 | Yes |
| Avg_Rating | 14.7494 | 0.8360 | 17.6427 | 0.0000 | Yes |
| Discount_Pct | 0.2066 | 0.0632 | 3.2713 | 0.0011 | Yes |
| Competitor_Price_Index | 0.4101 | 0.0267 | 15.3698 | 0.0000 | Yes |
| Inventory_Level | -0.0488 | 0.0009 | -54.1975 | 0.0000 | Yes |
| Social_Media_Engagement | 0.2207 | 0.0214 | 10.3307 | 0.0000 | Yes |
| Season_Spring | -19.7663 | 1.1108 | -17.7944 | 0.0000 | Yes |
| Season_Summer | -30.3354 | 1.1553 | -26.2581 | 0.0000 | Yes |
| Season_Fall | -19.8947 | 1.3154 | -15.1247 | 0.0000 | Yes |
| Product_Type_Home | -29.7157 | 0.9825 | -30.2464 | 0.0000 | Yes |
| Product_Type_Fashion | -44.6931 | 1.0180 | -43.9009 | 0.0000 | Yes |
ad_spend_row <- coef_table["Ad_Spend", ]
t_statistic_ad_spend <- ad_spend_row["t value"]
p_value_ad_spend <- ad_spend_row["Pr(>|t|)"]
print(paste("Ad_Spend t-statistic:", round(t_statistic_ad_spend, 4)))
## [1] "Ad_Spend t-statistic: 133.4355"
print(paste("Ad_Spend p-value:", format(p_value_ad_spend, scientific = TRUE)))
## [1] "Ad_Spend p-value: 0e+00"
Statistical Conclusion: Since the p-value (approx. 0) is far less than the significance level of \(\alpha = 0.05\), we reject the null hypothesis (\(H_0\)). There is overwhelming statistical evidence that advertising spend has a significant linear relationship with sales, even after accounting for all other factors.
Business Meaning (Advertising Effectiveness): This result provides a clear, data-driven justification for the marketing budget. It demonstrates that advertising expenditures are not just a cost center but a significant and positive driver of revenue. This finding directly addresses the CFO’s concerns, confirming that marketing expenditures are generating statistically significant returns.
coef_ad_spend <- coef_table["Ad_Spend", "Estimate"]
cat("Coefficient Value:", coef_ad_spend)
## Coefficient Value: 2.481939
Interpretation: For every $1,000 increase in
monthly digital advertising spend (a 1-unit increase in
Ad_Spend), sales are predicted to increase
by $2,481.90, holding all other variables constant.
Business Implication: This coefficient suggests a Return on Ad Spend (ROAS) of approximately 2.48:1. For every $1 spent on advertising, MarketPlace generates about $2.48 in sales revenue, which is a strong, positive return. This evidence provides a clear justification for the marketing budget and supports the case for increasing ad spend, as it appears to be a highly effective revenue driver.
coef_avg_rating <- coef_table["Avg_Rating", "Estimate"]
cat("Coefficient Value:", coef_avg_rating)
## Coefficient Value: 14.74939
Interpretation: For every one-point increase in the average customer product rating (on its 1-5 scale), monthly sales are predicted to increase by $14,749.40, holding all other variables constant.
Business Implication (Importance of Customer Satisfaction): This result quantifies the immense value of customer satisfaction and product quality. It shows that a seemingly small improvement in customer ratings—which might be the difference between an average 3.5-star product and a 4.5-star product—can have a substantial impact on revenue.
coef_season_summer <- coef_table["Season_Summer", "Estimate"]
cat("Coefficient Value:", coef_season_summer)
## Coefficient Value: -30.3354
Interpretation: Compared to Winter (the reference season), Summer sales are predicted to be lower by $30,335.40 per month, controlling for all other factors.
Business Implication (Seasonal Strategy): This quantifies a significant and costly seasonal slump. The model predicts that, even for the exact same product with the same marketing support, sales will be substantially lower in the summer.
coef_product_fashion <- coef_table["Product_Type_Fashion", "Estimate"]
cat("Coefficient Value:", coef_product_fashion)
## Coefficient Value: -44.69313
Interpretation: Compared to Electronics (the reference category), Fashion products are predicted to have lower sales by $44,693.10 per month, holding everything else constant.
Business Implication (Product Strategy): This highlights a major structural challenge. The model shows that even if a Fashion product and an Electronics product had the exact same ad spend, website traffic, discount percentage, and customer rating, the Fashion product is still predicted to sell $44.69k less.
Model2 <- lm(Sales ~ Season_Spring + Season_Summer + Season_Fall + Product_Type_Home +
Product_Type_Fashion +
Ad_Spend + Website_Traffic + Avg_Rating, data = dataset)
summary(Model2)
##
## Call:
## lm(formula = Sales ~ Season_Spring + Season_Summer + Season_Fall +
## Product_Type_Home + Product_Type_Fashion + Ad_Spend + Website_Traffic +
## Avg_Rating, data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -136.760 -22.003 1.224 23.940 98.302
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 73.17274 5.30226 13.80 <2e-16 ***
## Season_Spring -20.06259 1.55018 -12.94 <2e-16 ***
## Season_Summer -31.69613 1.61159 -19.67 <2e-16 ***
## Season_Fall -20.40885 1.83441 -11.13 <2e-16 ***
## Product_Type_Home -28.28912 1.37072 -20.64 <2e-16 ***
## Product_Type_Fashion -43.43353 1.42047 -30.58 <2e-16 ***
## Ad_Spend 2.48348 0.02597 95.62 <2e-16 ***
## Website_Traffic 1.20136 0.00644 186.56 <2e-16 ***
## Avg_Rating 14.29123 1.16700 12.25 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 33.95 on 3590 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.9281, Adjusted R-squared: 0.928
## F-statistic: 5796 on 8 and 3590 DF, p-value: < 2.2e-16
coef_table <- summary(Model2)$coefficients
kable(coef_table,
digits = 3,
caption = "Model 2: Regression Model of Monthly Sales with Reduced Predictors",
col.names = c("Estimate", "Std. Error", "t-value", "p-value"))
| Estimate | Std. Error | t-value | p-value | |
|---|---|---|---|---|
| (Intercept) | 73.173 | 5.302 | 13.800 | 0 |
| Season_Spring | -20.063 | 1.550 | -12.942 | 0 |
| Season_Summer | -31.696 | 1.612 | -19.668 | 0 |
| Season_Fall | -20.409 | 1.834 | -11.126 | 0 |
| Product_Type_Home | -28.289 | 1.371 | -20.638 | 0 |
| Product_Type_Fashion | -43.434 | 1.420 | -30.577 | 0 |
| Ad_Spend | 2.483 | 0.026 | 95.620 | 0 |
| Website_Traffic | 1.201 | 0.006 | 186.560 | 0 |
| Avg_Rating | 14.291 | 1.167 | 12.246 | 0 |
se_m1 <- summary(Model1)$sigma
print(se_m1)
## [1] 24.31017
se_m2 <- summary(Model2)$sigma
print(se_m2)
## [1] 33.94863
adj_r_sq_m1 <- summary(Model1)$adj.r.squared
print(adj_r_sq_m1)
## [1] 0.9630679
adj_r_sq_m2 <- summary(Model2)$adj.r.squared
print(adj_r_sq_m2)
## [1] 0.9279769
Recommendation: Model 1 is the strongly recommended model for Tracy’s team to use for business decisions.
Justification:
Discount_Pct in the model, the team cannot analyze the
effectiveness of promotional pricing strategies.Ad_Spend, Email_Campaigns, and
Social_Media_Engagement individually. The completeness of
Model 1 is essential for this strategic analysis and far outweighs the
minor parsimony of Model 2.new_product_data <- data.frame(
Ad_Spend = 25,
Email_Campaigns = 10,
Website_Traffic = 150,
Avg_Rating = 4.5,
Discount_Pct = 15,
Competitor_Price_Index = 105,
Inventory_Level = 1000,
Social_Media_Engagement = 70,
# Set dummy variables according to Season = Fall and Product_Type = Home
Season_Spring = 0, # Not Spring
Season_Summer = 0, # Not Summer
Season_Fall = 1, # Is Fall
Product_Type_Home = 1, # Is Home
Product_Type_Fashion = 0 # Not Fashion
)
point_prediction <- predict(Model1, newdata = new_product_data)
print(paste("Predicted Sales: $", round(point_prediction, 2), "thousand"))
## [1] "Predicted Sales: $ 339.46 thousand"
Finding: The predicted monthly sales for this new product is $339,460.
# 95% Prediction Interval for a single new product's sales
prediction_interval <- predict(Model1, newdata = new_product_data, interval = "prediction", level = 0.95)
print(prediction_interval)
## fit lwr upr
## 1 339.4607 291.7286 387.1928
Interpretation Statement: We are 95% confident that the actual sales for this single new product in its first month will fall between $291,730 and $387,190.
Business Planning Explanation:
# 95% Confidence Interval for the average sales of products with these characteristics
confidence_interval <- predict(Model1, newdata = new_product_data, interval = "confidence", level = 0.95)
print(confidence_interval)
## fit lwr upr
## 1 339.4607 336.8961 342.0253
Interpretation Statement: The 95% confidence interval for the average sales of all products with these characteristics is from $336,900 to $342,025.
Strategic Planning Explanation:
Full Model (Model 1) Equation: \(Sales = \beta_0 + \beta_1 \text{Ad_Spend} + \beta_2 \text{Email_Campaigns} + \beta_3 \text{Website_Traffic} + \beta_4 \text{Avg_Rating} + ... + \beta_{13} \text{Product_Type_Fashion} + \epsilon\)
Reduced Model (Model 3 - excludes the 3 marketing variables) Equation: \(Sales = \beta_0 + \beta_3 \text{Website_Traffic} + \beta_4 \text{Avg_Rating} + \beta_5 \text{Discount_Pct} + ... + \beta_{13} \text{Product_Type_Fashion} + \epsilon\)
Statistical Hypotheses:
Business Interpretation:
\(H_0\): After accounting for website traffic, ratings, pricing, inventory, season, and product type, the combined effect of Ad Spend, Email Campaigns, and Social Media Engagement does not significantly help explain sales. Essentially, these specific marketing activities, as a group, add no extra explanatory value.
\(H_1\): At least one of these three marketing activities significantly adds to the explanation of sales, even when controlling for all other factors in the model. The marketing portfolio provides significant additional insight.
Model3_Reduced <- lm(Sales ~ Website_Traffic + Avg_Rating + Discount_Pct +
Competitor_Price_Index + Inventory_Level + Season_Spring +
Season_Summer + Season_Fall + Product_Type_Home +
Product_Type_Fashion,
data = dataset)
summary_m3 <- summary(Model3_Reduced)
r_sq_m3 <- summary_m3$r.squared
sse_m3 <- sum(residuals(Model3_Reduced)^2)
cat("R-squared for Model 3 (Reduced):", round(r_sq_m3, 4), "\n")
## R-squared for Model 3 (Reduced): 0.7791
cat("SSE for Model 3 (SSE_R):", round(sse_m3, 2), "\n")
## SSE for Model 3 (SSE_R): 12716986
partial_f_test_result <- anova(Model3_Reduced, Model1)
print(partial_f_test_result)
## Analysis of Variance Table
##
## Model 1: Sales ~ Website_Traffic + Avg_Rating + Discount_Pct + Competitor_Price_Index +
## Inventory_Level + Season_Spring + Season_Summer + Season_Fall +
## Product_Type_Home + Product_Type_Fashion
## Model 2: Sales ~ Ad_Spend + Email_Campaigns + Website_Traffic + Avg_Rating +
## Discount_Pct + Competitor_Price_Index + Inventory_Level +
## Social_Media_Engagement + Season_Spring + Season_Summer +
## Season_Fall + Product_Type_Home + Product_Type_Fashion
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 3588 12716986
## 2 3585 2118679 3 10598306 5977.8 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
partial_F_stat <- partial_f_test_result$F[2]
partial_p_value <- partial_f_test_result$`Pr(>F)`[2]
cat("\nPartial F-statistic:", round(partial_F_stat, 4), "\n")
##
## Partial F-statistic: 5977.769
cat("p-value:", format(partial_p_value, scientific = TRUE, digits=4), "\n")
## p-value: 0e+00
Statistical Conclusion: Since the p-value (essentially 0) is much smaller than the significance level \(\alpha = 0.05\), we reject the null hypothesis (\(H_0\)).
Business Recommendation to the CFO: The partial F-test provides extremely strong statistical evidence (F = 5977.8, p < 0.001) that the group of marketing variables (Ad Spend, Email Campaigns, Social Media Engagement) jointly adds significant explanatory power to the sales model, even after considering all other factors. This means these marketing activities, taken together, are crucial drivers of sales performance. The recommendation is to maintain or potentially increase investment in these areas, as they are demonstrably linked to revenue. Reducing this budget would likely lead to a significant decrease in sales, according to the model.
Full Model (Model 1): Includes all 13 predictors.
Reduced Model (Model 3): Excludes the 3 marketing variables (Ad_Spend, Email_Campaigns, Social_Media_Engagement). This is the same reduced model used in the Part 7 partial F-test.
sse_full_model1 <- sum(residuals(Model1)^2)
cat("SSE for the Full Model (SSE_F - Model 1):", round(sse_full_model1, 2))
## SSE for the Full Model (SSE_F - Model 1): 2118679
sse_reduced_model3 <- sum(residuals(Model3_Reduced)^2)
cat("SSE for the Reduced Model (SSE_R - Model 3):", round(sse_reduced_model3, 2))
## SSE for the Reduced Model (SSE_R - Model 3): 12716986
partial_r_squared_marketing <- (sse_reduced_model3 - sse_full_model1) / sse_reduced_model3
cat("Partial R^2 =",
paste0("(", round(sse_reduced_model3, 2), " - ", round(sse_full_model1, 2), ") / ",
round(sse_reduced_model3, 2)))
## Partial R^2 = (12716985.85 - 2118679.48) / 12716985.85
cat("Coefficient of Partial Determination for Marketing Variables:", round(partial_r_squared_marketing, 4))
## Coefficient of Partial Determination for Marketing Variables: 0.8334
Interpretation: After controlling for all other variables in the model (including website traffic, average rating, pricing, inventory, season, and product type), the three main marketing controllable variables (Ad Spend, Email Campaigns, and Social Media Engagement) collectively explain 83.34% of the sales variance that those other factors failed to account for. This extremely high value demonstrates that our marketing activities provide substantial and unique insight into sales performance. They are not redundant with other metrics.
Business Decision-Making Implication: This is a very high value and indicates that these three marketing activities provide substantial and unique explanatory power beyond what the other operational and market factors capture. Removing them would significantly weaken the model’s ability to explain sales variations. Therefore, these marketing variables are critically important, provide unique insights, and should absolutely remain in the model used for analysis and decision-making regarding budget allocation and sales forecasting. They represent key levers the company can pull to influence revenue.
coefs_model1 <- summary(Model1)$coefficients
season_coeffs <- c(
Winter = 0, # Reference category explicitly set to 0
Spring = coefs_model1["Season_Spring", "Estimate"],
Summer = coefs_model1["Season_Summer", "Estimate"],
Fall = coefs_model1["Season_Fall", "Estimate"]
)
season_ranking <- sort(season_coeffs, decreasing = TRUE)
season_table <- data.frame(
Season = names(season_ranking),
Expected_Sales_Adjustment_vs_Winter_Thousand_USD = unname(season_ranking),
row.names = NULL
)
kable(season_table,
digits = 2,
caption = "Ranking of Seasons by Expected Sales Impact (Relative to Winter)",
col.names = c("Season", "Sales Adjustment vs. Winter ($k)"),
align = 'lr')
| Season | Sales Adjustment vs. Winter ($k) |
|---|---|
| Winter | 0.00 |
| Spring | -19.77 |
| Fall | -19.89 |
| Summer | -30.34 |
Interpretation: Based on the coefficients of Model 1, holding all other factors constant:
winter_summer_diff <- season_coeffs["Winter"] - season_coeffs["Summer"]
abs(winter_summer_diff)
## Winter
## 30.3354
summer_p_value <- coefs_model1["Season_Summer", "Pr(>|t|)"]
summer_p_value
## [1] 3.82288e-139
Business Implication: The model confirms a significant seasonal slump in Summer compared to the peak Winter season. On average, Summer sales are expected to be about $30.34k lower than Winter sales, even when controlling for marketing, product type, and other factors. This statistically significant difference is crucial for inventory planning (avoiding Summer overstock) and marketing strategy (potentially shifting budget or using targeted promotions to mitigate the Summer downturn).
# H0: Season_Spring - Season_Fall = 0
spring_fall_test <- linearHypothesis(Model1, "Season_Spring - Season_Fall = 0")
spring_fall_p_value <- spring_fall_test$`Pr(>F)`[2]
spring_fall_p_value
## [1] 0.9159895
Business Implication: Although the point estimates differ slightly (-19.77k for Spring vs. -19.89k for Fall), the difference is not statistically significant (p = 0.916). For practical business purposes, MarketPlace can treat Spring and Fall as having a similar impact on sales compared to Winter. This could simplify seasonal planning, allowing for similar inventory levels and marketing approaches during these two “shoulder” seasons.
product_coeffs <- c(
Electronics = 0, # Reference category explicitly set to 0
Home = coefs_model1["Product_Type_Home", "Estimate"],
Fashion = coefs_model1["Product_Type_Fashion", "Estimate"]
)
product_ranking <- sort(product_coeffs, decreasing = TRUE)
product_table <- data.frame(
Product_Type = names(product_ranking),
Sales_Adjustment_vs_Electronics_Thousand_USD = unname(product_ranking),
row.names = NULL
)
kable(product_table,
digits = 2,
caption = "Ranking of Product Types by Expected Sales Impact (Relative to Electronics)",
col.names = c("Product Type", "Sales Adjustment vs. Electronics ($k)"),
align = 'lr')
| Product Type | Sales Adjustment vs. Electronics ($k) |
|---|---|
| Electronics | 0.00 |
| Home | -29.72 |
| Fashion | -44.69 |
Interpretation: Based on Model 1 coefficients, holding all other factors constant:
home_fashion_diff <- product_coeffs["Home"] - product_coeffs["Fashion"]
home_fashion_diff
## Home
## 14.97747
# H0: Product_Type_Home - Product_Type_Fashion = 0
home_fashion_test <- linearHypothesis(Model1, "Product_Type_Home - Product_Type_Fashion = 0")
home_fashion_p_value <- home_fashion_test$`Pr(>F)`[2]
home_fashion_p_value
## [1] 1.435708e-50
Business Implication: Home products are expected to generate significantly more sales (approx. $14.98k more per month) than Fashion products, even after controlling for marketing spend, seasonality, ratings, etc. This difference is statistically significant (p < 0.001).
Should MarketPlace invest more in one category over the other? Not necessarily based solely on this coefficient. While Home goods generate more revenue on average for the same level of other inputs, this doesn’t automatically mean they are more profitable or have higher growth potential. The Fashion category might have higher margins, faster inventory turnover, or appeal to a different, valuable customer segment.
Recommendation: MarketPlace should investigate why Fashion underperforms relative to Home goods in terms of sales volume generated per unit of input (like ad spend or traffic). Is it product assortment, pricing strategy within the category, conversion rates on the website for fashion items, or overall market demand? Simply shifting investment away from Fashion might be premature without understanding the root cause and the category’s overall contribution to profitability and customer acquisition. A deeper dive into the Fashion category’s specific performance metrics is warranted.