December 23 - ChatGPT 4 in action, on Eurozone inflation

This year’s advent calendar project was primarily done with ChatGPT 3.5 and DALL·E 2. I only recently acquired access to GPT-4, which is vastly more powerful, especially in its ability to respond to real-time events—it is trained on more recent data—to create more detailed and realistic pictures and, as I will show here, to analyse data. So I thought I would pivot in this entry and show an example of some empirical and statistical economic analysis I’ve done with GPT-4.

The first step was to create my own GPT optimised for time series analysis on data uploaded in Excel or CSV format. This didn’t take long, though it is not clear to me that my little GPT is any better than the generic "Data Analysis" GPT. In theory, though, it should become better over time at the tasks I am interested in.

For the second step, I uploaded an Excel file with the month-to-month change in the Eurozone HICP from March 1999 to November 2023. Specifically, this is the month-to-month change in the non-seasonally adjusted euro area inflation index. Note that inflation data in the Eurozone are, as a rule, unadjusted.

My first prompt described the data and asked the GPT to generate a forecast model.

“Hi, the data in column "hicpmm" is the month-to-month change in Eurozone HICP prices. Can you create a statistical forecast model for these data?”

Given that I had asked this GPT to specialise in time series analysis, it immediately set out to chew through the data using the tools of the trade: checking for stationarity, decomposing the data into trend, seasonality and residuals, and proposing a forecast model.

The GPT quickly identified that the series is stationary in statistical terms, reporting the result of an ADF (Augmented Dickey-Fuller) test, and it also correctly identified that these data have a strong seasonal component, noting that:

Given the stationary nature of the series and the insights from the decomposition, we can consider models like ARIMA (AutoRegressive Integrated Moving Average) or its variations. ARIMA is suitable for non-seasonal data, while SARIMA (Seasonal ARIMA) is an extension that handles seasonality.

Given that the data shows some level of seasonality, a SARIMA model could be a good choice. SARIMA models are specified with the parameters (p, d, q)(P, D, Q)s, where:

  • p and seasonal P are the autoregressive terms.

  • d and seasonal D are the differencing orders.

  • q and seasonal Q are the moving average terms.

  • s is the seasonality period.

The GPT then suggested the following steps:

  1. Determine the optimal parameters for the SARIMA model.

  2. Fit the model to the data.

  3. Validate the model, for example, using time series cross-validation.

  4. Forecast future values.

I asked it to proceed, and it ran into trouble, outputting an error in its computation. I am not sure whether this was because my particular GPT could not access the computing power needed to complete the task at that time, whether it analysed itself into too many iterations—an infinite loop—or a combination of the two.

I tried to help by prompting it to show me the equation for the SARIMA model that it had estimated, but it then proceeded to give me only a theoretical representation of the model. I then decided to be more concrete. I asked:

“What are the statistically significant autoregressive terms in this model between 1 and 12?”

To which the GPT finally managed to respond:

The statistically significant autoregressive (AR) terms in the model, among the first 12 lags, are as follows:

  • AR1:

    • Coefficient: 0.87770

    • P-Value: 6.71×10^-88

The AR1 term is statistically significant, as indicated by its very low p-value, far below the common significance level of 0.05. This term suggests a strong influence of the previous month's value on the current month's HICP change. Other AR terms between 2 and 12 were either not included in the model or not statistically significant.

I then asked:

“If you omit AR1, what autoregressive terms between 2 and 12 are statistically significant?”

The GPT confirmed what I suspected, namely that this month-to-month series is basically an AR(1) process.

After omitting the first autoregressive term (AR1) and re-fitting the SARIMA model, there are no statistically significant autoregressive terms between lags 2 and 12. This indicates that, without considering the immediate previous month (AR1), none of the subsequent months (from the second to the twelfth lag) have a statistically significant impact on the current month's HICP change in the adjusted model.

I then returned to the seasonality of the data, asking the GPT:

“The data "hicpmm" is non-seasonally adjusted. Can you seasonally adjust it, and show the results in a chart?”

Interestingly, the GPT had to go for the second-best solution:

To seasonally adjust the "hicpmm" data, we can employ a method such as the X-13ARIMA-SEATS, provided by the U.S. Census Bureau, or use simpler methods like moving averages or seasonal differencing. Since the X-13ARIMA-SEATS method might not be available in this environment, I'll use a simpler method. One common approach is to calculate and subtract the seasonal component obtained from the decomposition of the series.

I'll perform the following steps:

  1. Decompose the series to extract the seasonal component.

  2. Subtract the seasonal component from the original series to get the seasonally adjusted data.

  3. Visualize the original and seasonally adjusted series in a chart for comparison.

I didn’t press it on whether it would be possible to run the X-13ARIMA-SEATS analysis, and if so, how. It is possible that I would have to specifically provide this methodology to the GPT to allow it to run such models. The GPT produced the following chart.

To complete the analysis, I asked the GPT to output the seasonally adjusted HICP in a CSV file, and it replied:

The seasonally adjusted "hicpmm" series has been saved as a CSV file. You can download it using the link below:

Download the Seasonally Adjusted HICPMM CSV File

Summary

My initial thoughts on this, and on what will undoubtedly be further examples to come, are as follows.

  • I could have done this analysis in Excel, or with the help of statistical software such as EViews or Stata. But I doubt that I would have been able to do it as quickly and as intuitively as with GPT. All I am doing here is having a conversation, asking the GPT to run different tasks, effectively like asking a research assistant to do specific work.

  • If you look under the hood of what the GPT above is doing, it is effectively translating my prompts into executable Python code, and it is giving me full access to the code. This is extremely powerful, and empowering, especially for analysts like me who don’t know how to code, at least not well.

  • It is not clear to me how much elasticity this environment has. But it seems clear to me that the way you set up the data, the nature of your prompts, and how narrowly or broadly you define your GPTs are key performance variables for the quality of the analysis. Another way to put this is that the more prepared you are in terms of what you want the GPT to do, the better output it will produce. This seems trivial, but it is empowering for people working with this technology.

  • More broadly, the power of GPT-4 compared to GPT-3.5 raises some serious questions about AI, and whether this will be a democratic technology or an exclusive one. If you have an iPhone 15, you’re not going to be significantly more capable than me with my iPhone 12. But the way AI is currently evolving, exemplified by the difference between GPT-4 and GPT-3.5, raises serious questions about the exclusivity of the technology. This is particularly the case if you look at the difference between those with relatively more computing power and those without it. Will it be the case that rich people who can afford to pay for the best AI will simply become so much quicker and more capable than everyone else? Or will we reach a level where some form of capability becomes widely available and scales more or less equally across socioeconomic classes?