Armaan Chowdhry
Product Manager

Case Study: Time Series Analysis of Starbucks Sales Data

Posted 15 December 2022
Monitoring-Auditing

1. Introduction

This study examines Starbucks sales over time, with the primary goal of characterizing the trend and identifying what drives changes in sales from quarter to quarter. Quarterly observations obtained from WRDS (Wharton Research Data Services) are used to estimate four regression models. Sales are modeled both in levels and in logs, and further models use lagged differences to capture time-dependent effects. Together, these models separate the long-run trend from short-run and lagged dynamics.

2. Data and Methodology

Data Source and Variables

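The analysis uses quarterly data for Starbucks sales, with 76 observations in the primary dataset. Models that use lagged or differenced variables lose observations at the start of the sample (74 in Model 3 and 71 in Model 4). Key variables include: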

  • Sales: Actual sales figures.
  • Log(Sales): Logarithmic transformation of sales to analyze growth in percentage terms.
  • Time: A sequential variable representing the time period.
  • Lagged and Differenced Variables:
    • lag1: Sales lagged by one quarter.
    • Diff1: The first difference of lagged sales, i.e. the change recorded one quarter earlier.
    • lag4_diff: The four-quarter lag of the differenced variable, used in the multivariate regression model to assess seasonal or delayed effects.
  • X Variables: In the final multivariate model, two explanatory variables (denoted as X Variable 1 and X Variable 2) are introduced to capture additional dynamics in the lagged effects.
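The lagged and differenced variables listed above can be built along the following lines. This is a minimal sketch in plain Python, not the author's actual workflow: the helper names `lag` and `diff`, the choice to difference log sales rather than sales in levels, and the sample figures are all assumptions made for illustration.

```python
import math

def lag(series, k):
    """Shift a series back by k periods, padding the start with None."""
    n = len(series)
    k = min(k, n)
    return [None] * k + list(series[: n - k])

def diff(series):
    """First difference: each value minus the previous one (None where undefined)."""
    if not series:
        return []
    out = [None]
    for prev, cur in zip(series, series[1:]):
        out.append(None if prev is None or cur is None else cur - prev)
    return out

# Illustrative quarterly figures only; these are NOT the WRDS data.
sales = [100.0, 104.0, 109.0, 113.0, 119.0, 125.0, 130.0, 137.0]

log_sales = [math.log(s) for s in sales]
lag1 = lag(sales, 1)          # sales one quarter earlier
d_log = diff(log_sales)       # quarter-over-quarter change (assumed to be in logs)
diff1 = lag(d_log, 1)         # the change recorded one quarter earlier
lag4_diff = lag(d_log, 4)     # the change recorded four quarters earlier

# Rows that have a value for lag4_diff and could enter a regression on it.
usable = [i for i, v in enumerate(lag4_diff) if v is not None]
print(len(sales), "observations,", len(usable), "usable with lag4_diff")
```

Under this construction, differencing costs one observation and each lag costs as many as its length, so a 76-quarter sample would leave 74 rows with `diff1` and 71 with `lag4_diff`, which is in line with the observation counts reported for Models 3 and 4.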

Methodological Approach

The study estimates four key models:

  • Model 1 (Sales vs. Time) – A simple linear regression exploring how sales change over time.
  • Model 2 (Log(Sales) vs. Time) – A regression where the natural logarithm of sales is modeled as a function of time, indicating the growth rate.
  • Model 3 (First Differenced Regression, Diff1) – A model that uses the first-differenced lag variable Diff1 as its regressor to capture short-run adjustments.
  • Model 4 (Multivariate Regression with Lagged Differences) – A regression that uses a four-quarter lag differenced variable along with two predictors (X Variable 1 and X Variable 2) to better explain sales dynamics.

For each model, both the regression statistics and the ANOVA results are examined.
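For the single-regressor models, the quantities reported in the regression summaries can be computed directly from the closed-form least-squares formulas. The sketch below is one way to do it in plain Python; the function name `simple_ols` and the toy data are assumptions for illustration, and the source does not say which software produced its output.

```python
import math

def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares and return summary statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    fitted = [intercept + slope * xi for xi in x]
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_reg = ss_total - ss_resid

    ms_resid = ss_resid / (n - 2)              # residual mean square, df = n - 2
    return {
        "intercept": intercept,
        "slope": slope,
        "r_square": ss_reg / ss_total,
        "std_error": math.sqrt(ms_resid),      # "Standard Error" in the summary block
        "f_stat": ss_reg / ms_resid,           # regression df is 1
        "slope_se": math.sqrt(ms_resid / sxx), # standard error of the slope
    }

# Illustrative data only (not the WRDS series): a linear trend plus a small wiggle.
time = list(range(1, 13))
sales = [3.0 * t + 10.0 + (0.5 if t % 2 else -0.5) for t in time]

fit = simple_ols(time, sales)
t_stat = fit["slope"] / fit["slope_se"]
print(f"slope={fit['slope']:.4f}  R Square={fit['r_square']:.4f}  t={t_stat:.2f}")
```

With a single regressor the squared t statistic on the slope equals the F statistic, which is why the slope's P-value and Significance F coincide in Models 1 to 3.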

3. Regression Results

Model 1: Sales as a Function of Time

Regression Statistics
  Multiple R          0.979562298
  R Square            0.959542295
  Adjusted R Square   0.958995569
  Standard Error      233.2115638
  Observations        76

ANOVA
               df   SS            MS            F             Significance F
  Regression    1   95454137.77   95454137.77   1755.070623   2.71413E-53
  Residual     74   4024684.878   54387.63348
  Total        75   99478822.65

Coefficients
               Coefficients    Standard Error   t Stat         P-value       Lower 95%      Upper 95%
  Intercept    -428.5281095    54.03477737      -7.930598224   1.75211E-11   -536.1947536   -320.8614653
  time         51.08638972     1.219432916      41.89356303    2.71413E-53   48.6566166     53.51616284

[plot-model-1]

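Interpretation: The R Square of 0.9595 indicates that approximately 96% of the variability in sales is explained by the time trend. The time coefficient of 51.09 means that sales rise by about 51 units per quarter on average (in whatever units the sales series is reported), and the coefficient is statistically significant (p ≈ 2.7E-53).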
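The summary statistics for Model 1 can be cross-checked against its ANOVA table with a few lines of arithmetic. The sketch below only re-derives figures from values already printed in the output; the variable names are chosen for illustration.

```python
import math

# Values copied from the Model 1 output.
ss_regression = 95454137.77
ss_residual = 4024684.878
ss_total = 99478822.65
df_regression, df_residual = 1, 74
time_coef, time_se = 51.08638972, 1.219432916

r_square = ss_regression / ss_total            # share of variation explained
ms_regression = ss_regression / df_regression  # mean square = SS / df
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
std_error = math.sqrt(ms_residual)             # standard error of the regression
t_stat = time_coef / time_se                   # t statistic on the time coefficient

print(f"R Square        {r_square:.6f}")
print(f"F               {f_stat:.3f}")
print(f"Standard Error  {std_error:.4f}")
print(f"t Stat (time)   {t_stat:.4f}")
```

Each derived figure agrees with the reported one to the printed precision, and the squared t statistic matches F, as expected for a regression with one predictor.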

Model 2: Log(Sales) as a Function of Time

Regression Statistics
  Multiple R          0.972249052
  R Square            0.94526822
  Adjusted R Square   0.944528601
  Standard Error      0.248189138
  Observations        76

ANOVA
               df   SS            MS            F             Significance F
  Regression    1   78.72501322   78.72501322   1278.048105   1.96161E-48
  Residual     74   4.558240769   0.061597848
  Total        75   83.28325399

Coefficients
               Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
  Intercept    5.130345746    0.057505059      89.2155543    4.44712E-77   5.015764414   5.244927078
  time         0.046394255    0.001297749      35.74979868   1.96161E-48   0.043808434   0.048980076

[plot-model-2]

Interpretation: Because the dependent variable is the natural logarithm of sales, the time coefficient measures proportional rather than absolute change. The coefficient of 0.0464 implies average growth of about 4.7% per quarter (exp(0.0464) - 1), and it is statistically significant (p ≈ 2.0E-48), consistent with a steady growth rate over the sample.
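The log-model slope can be converted into growth rates as follows. This is a small sketch using only the reported coefficient and its 95% interval; the helper name `pct_growth` and the annualization over four quarters are choices made here for illustration, not figures from the source.

```python
import math

time_coef = 0.046394255                       # slope from the Model 2 output
ci_low, ci_high = 0.043808434, 0.048980076    # reported 95% interval for the slope

def pct_growth(log_slope, periods=1):
    """Percentage growth implied by a log-linear slope over a number of periods."""
    return (math.exp(log_slope * periods) - 1.0) * 100.0

quarterly = pct_growth(time_coef)
annual = pct_growth(time_coef, periods=4)     # four quarters compounded
doubling_quarters = math.log(2.0) / time_coef # quarters needed for sales to double

print(f"Quarterly growth: {quarterly:.2f}% "
      f"(95% interval {pct_growth(ci_low):.2f}% to {pct_growth(ci_high):.2f}%)")
print(f"Annualized growth: {annual:.1f}%")
print(f"Doubling time: {doubling_quarters:.1f} quarters")
```

Reading the slope directly as a percentage (4.64%) is a close approximation for small coefficients; exponentiating gives the exact compounded rate.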

Model 3: First Differenced Regression (Diff1)

Regression Statistics
  Multiple R          0.393399556
  R Square            0.154763212
  Adjusted R Square   0.143023812
  Standard Error      0.076189131
  Observations        74

ANOVA
               df   SS            MS            F             Significance F
  Regression    1   0.076525799   0.076525799   13.18323033   0.000525376
  Residual     72   0.41794442    0.005804784
  Total        73   0.49447022

Coefficients
               Coefficients    Standard Error   t Stat         P-value       Lower 95%      Upper 95%
  Intercept    0.066879078     0.010138108      6.596800656    6.06927E-09   0.046669128    0.087089025
  Diff1        -0.381343864    0.105028181      -3.630871842   0.000525376   -0.590713716   -0.171974012

[plot-model-3]

Interpretation: This model uses Diff1, the first difference of lagged sales, as its regressor and finds a statistically significant negative relationship (coefficient -0.381, p ≈ 0.0005). Overall explanatory power is low (R Square ≈ 15%), but the negative coefficient suggests that a change in one quarter tends to be partly offset afterwards, consistent with short-term corrections in sales.
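To make the idea of a short-term correction concrete, the sketch below traces what the Model 3 coefficients would imply if the dependent variable is the current quarter's change and Diff1 is the previous quarter's change. That reading is an assumption: the output does not name the dependent variable, and the function name and the size of the shock are hypothetical.

```python
intercept = 0.066879078   # from the Model 3 output
slope = -0.381343864      # coefficient on Diff1

# ASSUMPTION: change_t = intercept + slope * change_{t-1}
# The change that reproduces itself from one quarter to the next:
steady_state = intercept / (1.0 - slope)

def path_after_shock(shock, quarters):
    """Quarterly changes following a one-off deviation from the steady state."""
    change = steady_state + shock
    path = [change]
    for _ in range(quarters):
        change = intercept + slope * change
        path.append(change)
    return path

path = path_after_shock(shock=0.05, quarters=6)   # hypothetical shock size
deviations = [change - steady_state for change in path]

print(f"Steady-state change: {steady_state:.4f}")
for quarter, dev in enumerate(deviations):
    print(f"quarter {quarter}: deviation {dev:+.5f}")
```

Under that assumption, each deviation is about 38% of the previous one with the opposite sign, so an unusually large change is followed by a smaller swing the other way and the effect fades within a few quarters.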

Model 4: Multivariate Regression with Lagged Differences

Regression Statistics
  Multiple R          0.903462253
  R Square            0.816244042
  Adjusted R Square   0.810839455
  Standard Error      0.034053583
  Observations        71

ANOVA
               df   SS            MS            F             Significance F
  Regression    2   0.350278222   0.175139111   151.0280144   9.64292E-26
  Residual     68   0.078855963   0.001159647
  Total        70   0.429134185

Coefficients
                  Coefficients    Standard Error   t Stat         P-value       Lower 95%      Upper 95%
  Intercept       0.008398948     0.005852967      1.434989891    0.155873564   -0.003280465   0.020078361
  X Variable 1    -0.060218919    0.053957254      -1.116048622   0.268329667   -0.167888943   0.047451105
  X Variable 2    0.804704249     0.052375719      15.3640706     8.36055E-24   0.700190128    0.909218369

[plot-model-4]

Interpretation: The multivariate model explains about 82% of the variability in sales differences (R Square = 0.8162, Adjusted R Square = 0.8108), indicating a strong fit. X Variable 1 is not statistically significant (p ≈ 0.27), while X Variable 2 is highly significant (coefficient 0.805, p ≈ 8.4E-24). This suggests that the second predictor, possibly a key lagged or seasonal factor, plays an important role in explaining changes in sales.
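With two predictors, Adjusted R Square and the individual t statistics carry more of the interpretation. The sketch below re-derives them from the reported Model 4 figures. The 95% critical value is backed out of the reported confidence interval for X Variable 2 rather than taken from a t table, which is a shortcut chosen here to keep the example dependency-free.

```python
n_obs, n_predictors = 71, 2
r_square = 0.816244042
ss_regression, ss_residual = 0.350278222, 0.078855963

df_residual = n_obs - n_predictors - 1        # 68, as in the ANOVA table
adj_r_square = 1.0 - (1.0 - r_square) * (n_obs - 1) / df_residual
f_stat = (ss_regression / n_predictors) / (ss_residual / df_residual)

# (coefficient, standard error) pairs copied from the output.
coefficients = {
    "Intercept": (0.008398948, 0.005852967),
    "X Variable 1": (-0.060218919, 0.053957254),
    "X Variable 2": (0.804704249, 0.052375719),
}
t_stats = {name: coef / se for name, (coef, se) in coefficients.items()}

# Half-width of the reported interval divided by the standard error
# recovers the critical t value used for the 95% bounds.
t_crit = (0.909218369 - 0.700190128) / (2 * 0.052375719)
significant = {name: abs(t) > t_crit for name, t in t_stats.items()}

print(f"Adjusted R Square {adj_r_square:.6f}, F {f_stat:.2f}, critical t {t_crit:.3f}")
for name, t in t_stats.items():
    print(f"{name:13s} t = {t:8.3f}  significant at 5%: {significant[name]}")
```

Only X Variable 2 clears the critical value, which matches the P-values in the table and the confidence intervals: the intervals for the intercept and X Variable 1 both include zero.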

4. Discussion

Taken in sequence, the four models highlight several features of the Starbucks sales data:

• Time Trend and Growth:
  Both the level and log models (Models 1 and 2) show a strong upward trend in sales. The high R Square values (0.9595 and 0.9453) and the significant time coefficients indicate that a time trend alone accounts for most of the variation in sales.
• Log Transformation Insights:
  Model 2’s log transformation expresses the trend as a growth rate rather than an absolute increase, which makes relative changes comparable across the sample.
• Short-Run Dynamics:
  Model 3, which uses the first difference (Diff1), suggests that deviations from the long-run trend trigger adjustments. Although the model’s explanatory power is low, the significant negative coefficient points to possible mean-reversion in the short-term dynamics of sales.
• Lagged Effects and Multivariate Analysis:
  The multivariate model (Model 4) introduces lagged difference variables and distinguishes between two predictors. The high overall fit (R Square = 0.8162) indicates that lagged factors matter for explaining variability in sales changes. The highly significant positive coefficient on X Variable 2 points to its influential role, while the insignificance of X Variable 1 shows that not every lagged factor is relevant.
• Diagnostic Plots:
  The line fit and residual plots, currently placeholders, are intended to assess goodness of fit and potential violations of model assumptions. The Time Line Fit Plot and Time Residual Plot will help reveal patterns or autocorrelation in the error terms, while the fit and residual plots for X Variable 1 and X Variable 2 will show how reliable the multivariate regression is.

5. Conclusion

The regression analyses of the Starbucks sales data show a strong, statistically significant time trend in both the level and log-transformed models. The short-run correction model (Model 3) and the multivariate lagged model (Model 4) add detail on the dynamics around that trend. Notably, the strong influence of X Variable 2 in Model 4 suggests that some lagged factors have a substantial impact on sales behavior. Future work may include a closer look at autocorrelation and at seasonal adjustment, both of which could clarify how persistent the sales trends are.

Next Steps:

• Insert and analyze the respective diagnostic plots ([Time Line Fit Plot], [Time Residual Plot], [X Variable 1 Residual Plot], [X Variable 1 Line Fit Plot], [X Variable 2 Residual Plot], and [X Variable 2 Line Fit Plot]).
• Validate model assumptions such as homoscedasticity and absence of autocorrelation.
• Consider incorporating additional explanatory variables to further refine the model.
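One common first check for autocorrelation in regression residuals is the Durbin-Watson statistic. The sketch below is a plain-Python version with made-up residuals. The study does not report residuals or say which test it intends to use, so both the choice of statistic and the sample values are assumptions.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum((cur - prev) ** 2 for prev, cur in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Illustrative residuals only; these are NOT taken from the fitted models.
drifting = [1.0, 0.8, 0.6, 0.3, -0.1, -0.4, -0.7, -0.9]       # neighbors move together
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]    # neighbors flip sign

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)

print(f"drifting residuals:    DW = {dw_drifting:.2f}")
print(f"alternating residuals: DW = {dw_alternating:.2f}")
```

The statistic ranges from 0 to 4. Values near 2 indicate little first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation. Trend regressions such as Models 1 and 2 are the natural place to start, since residuals around a fitted trend often drift.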

This case study provides a starting point for understanding how Starbucks’ sales have evolved and a basis for further analysis and forecasting.

© 2025 Armaan Chowdhry. All rights reserved.