Time Series Regression Analysis with ChatGPT (GPT-4)

The following chart is one of hundreds that I use in my day job as Chief Eurozone Economist at Pantheon Macroeconomics. It plots a normalised Z-score index of surveyed new manufacturing orders in Germany alongside year-over-year growth in factory orders, ex-major orders. It's worth spelling out what this chart means in the world of economic research and forecasting. The factory orders numbers are so-called hard data, which in this case means that they are official numbers of real activity reported by the statistical office. The PM new orders index, by contrast, is my home-cooked index of so-called soft data. Specifically, these are survey data, compiled by the likes of the EU Commission, IFO, S&P, and national statistical offices. We're only interested in these numbers to the extent that they tell us something about the official (hard) new orders data, which in turn could help us pin down trends in industrial production, exports, GDP growth, employment and so on.

Eyeballing the chart, the two series look coincident, but note that the surveys are released ahead of the official data, so we always have survey numbers one to two months ahead of the official data. In other words, when it comes time to forecast new orders for December, we will already have survey data for that month. This should, in theory, help us forecast the official new orders data more accurately.

I uploaded these data to a GPT that I created to help me analyse and forecast time series (the AS Time Series and Forecast GPT) and asked it to run some analysis and models on these data. I have included the entire output below, with the accompanying charts created by the GPT. You can see the output without charts here. My thoughts on the process and the output are summarised below:

  • This is the second time that I have analysed this dataset, and the first time it was much easier to get to the chart I wanted. As you can see in the output below, the GPT got confused by my requests to use “yy” labels on the x-axis, and it also got the order of the data wrong, running it right to left instead of left to right. I will say, though, that it is pretty easy and intuitive to get the GPT to change the layout, labels and other aspects of the chart. I am thinking it might make sense to create a separate GPT for more tailor-made charts, leaving the time series GPT to do simple plots.

  • The actual time-series analysis was very easy to run, and it was simple and quick to get the GPT to run different models with different specifications, such as omitting observations (the Covid years) or smoothing the dependent variable. This is where this technology really empowers the analyst. I could have done all this with the help of Excel or statistical software, but nowhere near as quickly as I did here. Remember also, as I showed previously, that the GPT can export the results and estimates to a CSV file, which is very powerful.

You should read the whole output below to get a real feel for the kind of analysis you can do with GPT-4, and for the dialogue you have with it as you go along, but if you're just interested in the end result, here is my GPT's summary of the work I asked it to do:

Certainly, here's a summary of the regression analyses conducted, ordered by their coefficient of determination (R-squared):

  1. Regression with Three-Month Average of Factory Orders (Excluding 2020 and 2021):

    • Equation: factoryorders_3m_avg = 3.0669 + 9.1779 * new_orders

    • R-squared: 0.718

    • Observations: Excluding 2020 and 2021 data

  2. Regression with Factory Orders (Excluding 2020 and 2021):

    • Equation: factoryorders_ex_major_orders = 3.1455 + 9.1413 * new_orders

    • R-squared: 0.694

    • Observations: Excluding 2020 and 2021 data

  3. Regression with Three-Month Average of Factory Orders (Full Sample):

    • Equation: factoryorders_3m_avg_full = 3.0514 + 9.5042 * new_orders

    • R-squared: 0.659

    • Observations: Full dataset including 2020 and 2021

  4. Initial Regression with Factory Orders (Full Sample):

    • Equation: factoryorders_ex_major_orders = 3.1838 + 9.8213 * new_orders

    • R-squared: 0.641

    • Observations: Full dataset including 2020 and 2021

The R-squared value, which indicates the proportion of variance in the dependent variable that is predictable from the independent variable, varies across different models and data subsets. The highest R-squared value was observed when the three-month average of factory orders was used as the dependent variable, excluding the 2020 and 2021 data, suggesting a stronger explanatory power of the model under these conditions.

If you need further insights or analysis based on these regressions, feel free to ask!

Full GPT-4 output

Anonymous

Hi, this dataset contains the monthly data for my proprietary new orders index "new_orders" and year-over-year growth in German factory orders, ex-major orders, "factoryorders_ex_major_orders". The date column "date" is in ascending order, with the most recent date (30/11/2023) first. There are no observations for "factoryorders_ex_major_orders" for November 2023.

AS Time Series and Forecast GPT

The dataset contains three columns: "date", "new_orders", and "factoryorders_ex_major_orders". The dates are indeed in ascending order with the most recent date first (30/11/2023). It's noted that for November 2023, there is no observation for "factoryorders_ex_major_orders".

How would you like to proceed with this data? We can analyze the statistical properties, check for stationarity, perform a linear regression analysis, or develop a forecast model. Please specify your requirements.

Anonymous

Yes, please provide summary statistics of the data and check them for stationarity.

AS Time Series and Forecast GPT

Summary Statistics

The dataset includes 271 observations for "new_orders" and 270 observations for "factoryorders_ex_major_orders". Here's a summary of these variables:

New Orders (new_orders):

  • Mean: -0.16

  • Standard Deviation: 0.96

  • Minimum: -3.40

  • 25th Percentile: -0.72

  • Median: -0.09

  • 75th Percentile: 0.54

  • Maximum: 1.97

Factory Orders Ex Major (factoryorders_ex_major_orders):

  • Mean: 1.67

  • Standard Deviation: 11.71

  • Minimum: -38.25

  • 25th Percentile: -3.26

  • Median: 1.64

  • 75th Percentile: 6.30

  • Maximum: 79.25
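For readers who want to reproduce these statistics outside the GPT, here is a minimal pandas sketch. It assumes the upload is a table with the three columns named above and dates written as dd/mm/yyyy, most recent first; the rows below are made-up placeholders, not the actual data. Sorting oldest-first at this stage also avoids the right-to-left ordering problem that shows up in the charts later.

```python
import pandas as pd

# Made-up placeholder rows (NOT the real data): dd/mm/yyyy dates, most recent
# month first, and no factory-orders value yet for the latest month.
raw = pd.DataFrame({
    "date": ["30/11/2023", "31/10/2023", "30/09/2023", "31/08/2023"],
    "new_orders": [-1.2, -1.0, -0.8, -0.9],
    "factoryorders_ex_major_orders": [None, -4.1, -3.5, -5.0],
})

# Parse day-first dates, then sort oldest-to-newest so that charts and
# rolling averages run forward in time.
df = raw.assign(date=pd.to_datetime(raw["date"], format="%d/%m/%Y"))
df = df.sort_values("date").reset_index(drop=True)

# describe() gives count, mean, std, min, quartiles and max for each column.
# Missing values are skipped, which is why the counts can differ by column
# (271 vs 270 in the output above).
summary = df[["new_orders", "factoryorders_ex_major_orders"]].describe()
print(summary.round(2))
```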

Stationarity Check

Stationarity of the series was assessed using the Augmented Dickey-Fuller (ADF) test:

New Orders (new_orders):

  • Test Statistic: -5.19

  • p-value: 0.000009 (Significantly low, suggesting stationarity)

  • Critical Values:

    • 1%: -3.46

    • 5%: -2.87

    • 10%: -2.57

Factory Orders Ex Major (factoryorders_ex_major_orders):

  • Test Statistic: -4.55

  • p-value: 0.00016 (Significantly low, suggesting stationarity)

  • Critical Values:

    • 1%: -3.46

    • 5%: -2.87

    • 10%: -2.57

Both series appear to be stationary as the test statistics are lower than the critical values and the p-values are significantly small. This indicates that the mean, variance, and covariance of the series do not change over time.

Would you like to proceed with further analysis such as regression or forecasting?
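The GPT does not show its code. In Python the standard tool is `statsmodels.tsa.stattools.adfuller`, which adds lagged differences and MacKinnon p-values. To show what the test statistic measures, here is a stripped-down Dickey-Fuller regression in NumPy: regress the first difference on a constant and the lagged level, and take the t-statistic on the lagged level. It omits the augmentation lags and p-values, so treat it as an illustration rather than a substitute; the simulated series are stand-ins for the real data, and -2.86 is the approximate asymptotic 5% critical value for the constant-only case.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic on gamma in  dy_t = alpha + gamma * y_{t-1} + e_t.

    Simplified Dickey-Fuller regression with a constant and no lagged
    differences (the "augmented" part of ADF is omitted).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # dy_t = y_t - y_{t-1}
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant, lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])

# Approximate asymptotic 5% critical value, constant-only case.
CRIT_5PCT = -2.86

rng = np.random.default_rng(0)
noise = rng.standard_normal(270)             # stationary by construction
walk = np.cumsum(rng.standard_normal(270))   # unit root by construction

for name, series in [("white noise", noise), ("random walk", walk)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject unit root" if stat < CRIT_5PCT else "cannot reject unit root"
    print(f"{name}: DF = {stat:.2f} -> {verdict}")
```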

Anonymous

Can you plot the data in the same chart with "new_orders" on the left-hand y-axis and "factoryorders_ex_major_orders" on the right-hand y-axis? Can you set the left-hand axis to +3.0 to -3.5 and the right-hand axis to +18 to -15? Can you format the x-axis in "yy" format?

AS Time Series and Forecast GPT

The chart above visualizes the "new_orders" and "factoryorders_ex_major_orders" over time. The left-hand y-axis, representing "new_orders", is scaled from -3.5 to 3.0, and the right-hand y-axis, for "factoryorders_ex_major_orders", ranges from -15 to 18. The x-axis is formatted in the "yy" format to display the years.

This visualization helps in comparing the trends and patterns between these two datasets over the same time period. If there's any specific aspect you'd like to explore further or any other analysis you need, feel free to let me know!

Anonymous

Can you re-do this chart with monthly observations on the x-axis?

AS Time Series and Forecast GPT

The updated chart now displays the "new_orders" and "factoryorders_ex_major_orders" with monthly observations on the x-axis, formatted as 'yy-mm'. The left-hand y-axis corresponds to "new_orders", scaled from -3.5 to 3.0, and the right-hand y-axis represents "factoryorders_ex_major_orders", ranging from -15 to 18.

To enhance readability, the number of x-axis labels has been limited and rotated for clarity. This layout provides a more detailed view of the monthly variations in both datasets. If further analysis or adjustments are needed, please let me know!

Anonymous

Thanks, can you run the x-axis in the opposite direction (oldest to newest from left to right) and re-format the labels on the x-axis to "yy"?

AS Time Series and Forecast GPT

The chart has been updated to display the data with the x-axis running from the oldest to the newest data (from left to right). The x-axis now only shows the year in the 'yy' format. On this chart, "new_orders" is represented on the left-hand y-axis (scaled from -3.5 to 3.0), and "factoryorders_ex_major_orders" is on the right-hand y-axis (ranging from -15 to 18).

The number of labels on the x-axis has been limited, and they are rotated for better readability. This arrangement offers a clear chronological view of the data trends. If you have any more requests or need further analysis, feel free to ask!

Anonymous

Thanks, can you re-plot this chart with monthly observations?

AS Time Series and Forecast GPT

The chart is now updated to display the "new_orders" and "factoryorders_ex_major_orders" with monthly observations, running from the oldest (left) to the newest (right). The x-axis shows the dates in 'yy-mm' format.

The left-hand y-axis corresponds to "new_orders" (scaled from -3.5 to 3.0), and the right-hand y-axis represents "factoryorders_ex_major_orders" (ranging from -15 to 18). The labels on the x-axis have been limited in number and rotated to enhance readability.

This layout provides a detailed month-by-month view of the data, in chronological order. If you need any further adjustments or additional analysis, please let me know.
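For comparison with the back-and-forth above, here is a minimal matplotlib sketch of the chart being requested: two y-axes with the requested ranges, data sorted oldest to newest, every monthly observation plotted, and x-axis labels in two-digit year ("yy") form. The random data and the output filename `orders_chart.png` are placeholders, not part of the original workflow.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder monthly data, most recent month first as in the uploaded file.
rng = np.random.default_rng(1)
n = 59
df = pd.DataFrame({
    "date": pd.date_range("2019-01-01", periods=n, freq="MS")[::-1],
    "new_orders": rng.normal(0.0, 1.0, n),
    "factoryorders_ex_major_orders": rng.normal(0.0, 8.0, n),
})

# Oldest month on the left, newest on the right.
df = df.sort_values("date")

fig, ax_left = plt.subplots(figsize=(9, 4))
ax_right = ax_left.twinx()  # second y-axis sharing the same x-axis

ax_left.plot(df["date"], df["new_orders"], color="tab:blue")
ax_right.plot(df["date"], df["factoryorders_ex_major_orders"], color="tab:red")

ax_left.set_ylim(-3.5, 3.0)
ax_right.set_ylim(-15, 18)
ax_left.set_ylabel("new_orders")
ax_right.set_ylabel("factoryorders_ex_major_orders")

# Every monthly point is plotted, but ticks sit at the start of each year
# and are labelled with a two-digit year ("yy").
ax_left.xaxis.set_major_locator(mdates.YearLocator())
ax_left.xaxis.set_major_formatter(mdates.DateFormatter("%y"))

fig.savefig("orders_chart.png", dpi=150, bbox_inches="tight")
```

Separating the plotting resolution (monthly points) from the label resolution (yearly ticks) is what lets the chart show monthly observations while keeping the "yy" labels.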

Anonymous

I would like to run a regression with "factoryorders_ex_major_orders" as the dependent y variable and "new_orders" as the independent x variable. Can you do that? Can you output the estimated equation as well as a scatter plot?

AS Time Series and Forecast GPT

Regression Analysis

The linear regression analysis was conducted with "factoryorders_ex_major_orders" as the dependent variable and "new_orders" as the independent variable. The estimated regression equation is:

\[ \text{factoryorders\_ex\_major\_orders} = 3.1838 + 9.8213 \times \text{new\_orders} \]

This equation suggests that for every unit increase in "new_orders", "factoryorders_ex_major_orders" increases by approximately 9.8213 units, holding all else constant.

Regression Summary

  • R-squared: 0.641 - This indicates that approximately 64.1% of the variability in "factoryorders_ex_major_orders" is explained by "new_orders".

  • F-statistic: 477.5 - This is significant, suggesting that the model is statistically significant.

  • p-value: Extremely low (<< 0.05), indicating that the relationship between "new_orders" and "factoryorders_ex_major_orders" is statistically significant.

  • Coefficients: Both the intercept and the coefficient for "new_orders" are significant.

Scatter Plot with Regression Line

The scatter plot visualizes the relationship between "new_orders" and "factoryorders_ex_major_orders". The red line represents the regression line, which captures the linear relationship between these two variables.

This analysis provides a robust statistical understanding of how changes in "new_orders" are associated with changes in "factoryorders_ex_major_orders". If you require further analysis or a different approach, please let me know.
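The GPT reports the equation, R-squared and F-statistic but not the code behind them. A minimal NumPy sketch of the same simple regression follows; the `simple_ols` helper is illustrative, not the GPT's implementation. It drops months where either variable is missing (such as November 2023, when only the survey value exists), and with one regressor the F-statistic equals R²/(1 − R²) × (n − 2).

```python
import numpy as np

def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, R-squared, F)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))  # drop months missing either value
    x, y = x[keep], y[keep]
    X = np.column_stack([np.ones_like(x), x])  # intercept column, regressor
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - (a + b * x)
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    n = len(y)
    f_stat = r2 / (1.0 - r2) * (n - 2)  # one regressor plus an intercept
    return a, b, r2, f_stat

# Toy example; the last month has no value for y, so it is dropped.
a, b, r2, f = simple_ols([0, 1, 2, 3, 4], [1, 3, 2, 5, np.nan])
print(f"a = {a:.4f}, b = {b:.4f}, R^2 = {r2:.4f}, F = {f:.4f}")
# -> a = 1.1000, b = 1.1000, R^2 = 0.6914, F = 4.4815
```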

Anonymous

Can you plot "factoryorders_ex_major_orders" in a chart alongside its forecast?

AS Time Series and Forecast GPT

To plot "factoryorders_ex_major_orders" alongside its forecast, I will first need to create a forecast model for it. This could be done using various methods, such as a linear regression model, ARIMA, or another suitable time series forecasting model.

Could you specify the following details to proceed with the forecasting:

  1. The forecasting method you prefer (e.g., linear regression, ARIMA, etc.).

  2. The time horizon for the forecast (e.g., how many months ahead you want to forecast).

  3. Any specific model parameters you want to use, if you have a preference.

Once I have this information, I can generate the forecast and then plot both the actual and forecasted values for "factoryorders_ex_major_orders".

Anonymous

I meant, can you plot "factoryorders_ex_major_orders" in the chart as the forecast you just generated with "new_orders" ?

AS Time Series and Forecast GPT

The chart above displays the actual values of "factoryorders_ex_major_orders" alongside the forecasted values derived from the regression model based on "new_orders". The blue line represents the actual data, and the red dashed line represents the forecasted values.

This visualization allows you to compare how the forecasted values align with the actual historical data, providing insight into the effectiveness of the regression model in capturing the relationship between "new_orders" and "factoryorders_ex_major_orders".

If there are any specific adjustments or further analyses you need, please let me know.
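The red dashed "forecast" line is presumably the fitted value from the estimated equation for each month. The same arithmetic is what makes the survey useful in practice: because new_orders is available one to two months before the official figure, plugging the latest survey reading into the equation gives a provisional estimate for months such as November 2023. A minimal sketch using the full-sample coefficients reported above; the input value of -1.0 is hypothetical.

```python
# Full-sample coefficients reported by the GPT:
# factoryorders_ex_major_orders = 3.1838 + 9.8213 * new_orders
INTERCEPT = 3.1838
SLOPE = 9.8213

def fitted_factory_orders(new_orders_value: float) -> float:
    """Year-over-year factory orders growth implied by a new_orders reading."""
    return INTERCEPT + SLOPE * new_orders_value

# Hypothetical survey reading for a month with no official data yet.
print(round(fitted_factory_orders(-1.0), 4))  # -> -6.6375
```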

Anonymous

Can you re-run the regression above, omitting outliers during Covid? Specifically, can you re-run the regression omitting values of 2020 and 2021, and see whether the coefficient of determination increases?

AS Time Series and Forecast GPT

The regression analysis was re-run for "factoryorders_ex_major_orders" against "new_orders", omitting the data from 2020 and 2021 to exclude the Covid period outliers. The updated regression equation is:

\[ \text{factoryorders\_ex\_major\_orders} = 3.1455 + 9.1413 \times \text{new\_orders} \]

Comparison with Previous Model:

  • Original Model R-squared: 0.641

  • New Model R-squared (without Covid data): 0.694

The coefficient of determination (R-squared) has increased from 0.641 to 0.694, indicating an improvement in the model's explanatory power when the Covid period is excluded. This suggests that the relationship between "new_orders" and "factoryorders_ex_major_orders" is better captured without the outlier effects of the Covid years.

New Regression Summary:

  • F-statistic: 554.2, indicating the model is statistically significant.

  • p-value: Extremely low, confirming the significance of the relationship.

  • Coefficients: Both the intercept and the slope are significant.

This analysis shows that excluding the outlier years (2020 and 2021) enhances the model's ability to explain the variance in "factoryorders_ex_major_orders". If you need further analysis or a different approach, please let me know.
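Excluding the Covid years amounts to a boolean filter on the calendar year before refitting. A pandas sketch, assuming the dates have already been parsed; the frame is a placeholder built from an exact line so the refit is easy to check, and `np.polyfit` stands in for whatever regression routine the GPT used.

```python
import numpy as np
import pandas as pd

# Placeholder data: 48 months (2019-2022) lying exactly on y = 3 + 9x.
df = pd.DataFrame({"date": pd.date_range("2019-01-01", periods=48, freq="MS")})
df["new_orders"] = np.linspace(-1.0, 1.0, len(df))
df["factoryorders_ex_major_orders"] = 3.0 + 9.0 * df["new_orders"]

# Keep every month outside calendar years 2020 and 2021.
ex_covid = df[~df["date"].dt.year.isin([2020, 2021])]

# Degree-1 polyfit returns [slope, intercept].
slope, intercept = np.polyfit(
    ex_covid["new_orders"], ex_covid["factoryorders_ex_major_orders"], 1
)
print(len(df), len(ex_covid), round(slope, 4), round(intercept, 4))
# -> 48 24 9.0 3.0
```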

Anonymous

Can you create a new series which is the three-month average of "factoryorders_ex_major_orders" and re-run the regression above, excluding observations in 2020 and 2021? Can you plot the new estimated forecast alongside "factoryorders_ex_major_orders"?

AS Time Series and Forecast GPT

The regression analysis was re-run for a three-month average of "factoryorders_ex_major_orders" against "new_orders", excluding the data from 2020 and 2021. The updated regression equation for this analysis is:

\[ \text{factoryorders\_3m\_avg} = 3.0669 + 9.1779 \times \text{new\_orders} \]

Regression Summary for 3-Month Average:

  • R-squared: 0.718, indicating that about 71.8% of the variability in the three-month average of "factoryorders_ex_major_orders" is explained by "new_orders".

  • F-statistic: 617.4, which is significant, suggesting that the model is statistically robust.

  • p-value: Extremely low, indicating that the relationship between "new_orders" and the three-month average of "factoryorders_ex_major_orders" is statistically significant.

  • Coefficients: Both the intercept and the slope are significant.

Chart: Actual vs Forecasted Values

The chart above displays the actual values of the three-month average of "factoryorders_ex_major_orders" alongside the forecasted values derived from the regression model based on "new_orders". The blue line represents the actual data, while the red dashed line represents the forecasted values.

This visualization shows how the forecasted values align with the actual data, providing insights into the model's performance in capturing the relationship between "new_orders" and the three-month average of "factoryorders_ex_major_orders".

If there are any further adjustments, analyses, or different approaches you would like to explore, please let me know.
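The GPT does not say how it built the three-month average. A trailing mean (the current month plus the two before it) is the usual choice, and that is an assumption here. Two practical points: the series must be sorted oldest-first before rolling, or the "trailing" window looks forward in time, and the first two months have no average, which trims the sample slightly. The values below are placeholders.

```python
import numpy as np
import pandas as pd

# Placeholder y/y growth rates, oldest first; NaN marks the latest month,
# for which no official figure exists yet.
y = pd.Series([2.0, 5.0, -1.0, 4.0, np.nan])

# Trailing three-month mean; a window containing NaN yields NaN because
# min_periods defaults to the window length.
y_3m = y.rolling(window=3).mean()
print(y_3m.round(4).tolist())
```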

Anonymous

Can you re-run this regression with the full sample, that is to say including the 2020 and 2021 values, and plot the results in a chart as above?

AS Time Series and Forecast GPT

The regression analysis was re-run for the three-month average of "factoryorders_ex_major_orders" against "new_orders" for the full dataset, including the values from 2020 and 2021. The updated regression equation for this full sample analysis is:

\[ \text{factoryorders\_3m\_avg\_full} = 3.0514 + 9.5042 \times \text{new\_orders} \]

Regression Summary for 3-Month Average (Full Sample):

  • R-squared: 0.659, indicating that about 65.9% of the variability in the three-month average of "factoryorders_ex_major_orders" is explained by "new_orders".

  • F-statistic: 513.9, which is significant, suggesting that the model is statistically robust.

  • p-value: Extremely low, indicating that the relationship between "new_orders" and the three-month average of "factoryorders_ex_major_orders" is statistically significant.

  • Coefficients: Both the intercept and the slope are significant.

Chart: Actual vs Forecasted Values (Full Sample)

The chart above displays the actual values of the three-month average of "factoryorders_ex_major_orders" alongside the forecasted values derived from the regression model based on "new_orders" for the full dataset. The blue line represents the actual data, while the red dashed line represents the forecasted values.

This visualization shows how the forecasted values for the full sample, including the outlier years of 2020 and 2021, align with the actual data, providing insights into the model's performance in capturing the relationship between "new_orders" and the three-month average of "factoryorders_ex_major_orders".

If there are any further adjustments, analyses, or different approaches you would like to explore, please let me know.

Anonymous

Can you summarize all the regressions you've run above, ordering them by coefficient of determination?

AS Time Series and Forecast GPT

Certainly, here's a summary of the regression analyses conducted, ordered by their coefficient of determination (R-squared):

  1. Regression with Three-Month Average of Factory Orders (Excluding 2020 and 2021):

    • Equation: factoryorders_3m_avg = 3.0669 + 9.1779 * new_orders

    • R-squared: 0.718

    • Observations: Excluding 2020 and 2021 data

  2. Regression with Factory Orders (Excluding 2020 and 2021):

    • Equation: factoryorders_ex_major_orders = 3.1455 + 9.1413 * new_orders

    • R-squared: 0.694

    • Observations: Excluding 2020 and 2021 data

  3. Regression with Three-Month Average of Factory Orders (Full Sample):

    • Equation: factoryorders_3m_avg_full = 3.0514 + 9.5042 * new_orders

    • R-squared: 0.659

    • Observations: Full dataset including 2020 and 2021

  4. Initial Regression with Factory Orders (Full Sample):

    • Equation: factoryorders_ex_major_orders = 3.1838 + 9.8213 * new_orders

    • R-squared: 0.641

    • Observations: Full dataset including 2020 and 2021

The R-squared value, which indicates the proportion of variance in the dependent variable that is predictable from the independent variable, varies across different models and data subsets. The highest R-squared value was observed when the three-month average of factory orders was used as the dependent variable, excluding the 2020 and 2021 data, suggesting a stronger explanatory power of the model under these conditions.

If you need further insights or analysis based on these regressions, feel free to ask!
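As a quick consistency check on this summary: with a single regressor, the F-statistic and R-squared are linked by F = R²/(1 − R²) × (n − 2). The GPT never reports n, so the observation counts below are inferred rather than reported: 270 months of factory orders, less 24 months when 2020 and 2021 are dropped, and less two more for the start of the three-month average. With those counts, the implied F-statistics match the reported ones to within what rounding R² to three decimals would produce.

```python
# (R-squared, reported F, inferred observations) for each regression above.
# The observation counts are inferred from the sample description, not
# reported by the GPT.
models = {
    "3m avg, ex 2020-21": (0.718, 617.4, 244),
    "raw, ex 2020-21": (0.694, 554.2, 246),
    "3m avg, full sample": (0.659, 513.9, 268),
    "raw, full sample": (0.641, 477.5, 270),
}

implied = {}
for name, (r2, f_reported, n) in models.items():
    # One regressor plus an intercept: F = R^2 / (1 - R^2) * (n - 2).
    implied[name] = r2 / (1 - r2) * (n - 2)
    print(f"{name}: implied F = {implied[name]:.1f}, reported F = {f_reported}")
```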