Can AI beat your analyst? It depends

Finance researchers look under the hood of machine learning models to find out when they beat market pros, and when they don’t
Can a machine learning analyst outperform a human? Research shows that it depends on how the program's built and what companies you're focused on.

In the race to integrate artificial intelligence into investment banking, analysts may see machine learning as a “magic bullet” for improving trades, but research has found that how the bullet’s built makes a huge difference.

In “Expectations Matter: When (not) to Use Machine Learning Earnings Forecasts,” University of Georgia finance researchers Zhongjin (Gene) Lu and John L. Campbell evaluated more than 3,000 machine learning model configurations to determine when, why, and for whom machine forecasts beat analysts.

Lu, Campbell and their co-authors — Harrison Ham at Clemson University and Katherine Wood at Oklahoma State University — published their findings this February in Management Science.

The team tackled the question of what makes a “good” machine analyst after noticing how many different models were being used in “man versus machine” finance research.

“When I dug into the debate about whether analysts or machines or the analysts teamed with machines are better, it seemed that people were testing different machine learning models when they made their arguments,” said Lu, an associate professor of finance in UGA’s Terry College of Business. “There is no standard machine learning model that they’re testing. Depending on what they advocate, they are using different machine learning models.”

There are dozens of variables that can be tweaked to build a machine-learning bot that analyzes firm performance and makes earnings predictions, Lu said. The team tested all 3,024 combinations against historical predictions made by analysts to see which combination of variables made the most accurate prediction bots.
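
To see how a handful of modeling choices multiplies into thousands of configurations, consider the minimal sketch below. The option names and values are hypothetical placeholders, not the study's actual grid. The point is only that the number of configurations is the product of the number of options for each choice.

```python
from itertools import product

# Hypothetical design choices for illustration; not the study's actual grid.
GRID = {
    "loss": ["mae", "mse"],
    "cross_validation": ["time_series", "random_k_fold"],
    "approach": ["indirect", "direct"],
    "algorithm": ["linear", "random_forest", "gradient_boosting"],
}


def enumerate_configs(grid):
    """Return every combination of options as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


configs = enumerate_configs(GRID)
print(len(configs))  # 2 * 2 * 2 * 3 = 24
```

Each dict in `configs` would correspond to one model to train and score against the analysts' historical forecasts. Adding more choices, or more options per choice, grows the count multiplicatively, which is how a grid reaches sizes like the 3,024 configurations the team trained.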

“We essentially spent five years of computing time to go through and train all these models,” Lu said, describing the strain the project put on UGA’s Georgia Advanced Computing Resource Center. “Sometimes we actually maxed out their computing resources. That’s pretty amazing.”

Lu and Campbell relied on external data-processing capacity leased by the college to complete the project. They found that 80% of the machine learning variable combinations failed to beat human analysts when forecasting earnings.

Three choices determined how accurate a machine learning model was when making earnings predictions: a mean absolute error loss function, time-series cross-validation, and an indirect approach that corrects existing analyst forecasts rather than generating them from whole cloth. Other variables could be tweaked to cut down on the computing time, Lu added.
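
Two of these choices can be sketched in a few lines. A mean absolute error loss weighs each miss in proportion to its size, so one extreme quarter counts for less than it would under a squared-error loss. Time-series cross-validation only ever validates on periods that come after the training data, so the model never sees the future. The function names and numbers below are illustrative assumptions, and the expanding-window split is one common scheme, not necessarily the one used in the study.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average size of the miss; each error counts in proportion to its size."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mean_squared_error(actual, predicted):
    """Average squared miss; large errors dominate the total."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def expanding_window_splits(n_periods, min_train):
    """Yield (train_indices, test_index) pairs where training data
    always comes strictly before the period being tested."""
    for t in range(min_train, n_periods):
        yield list(range(t)), t


# Made-up earnings numbers with one extreme quarter at the end.
actual = [1.0, 1.1, 0.9, 1.0, 6.0]
predicted = [1.0, 1.0, 1.0, 1.0, 1.0]

print(mean_absolute_error(actual, predicted))
print(mean_squared_error(actual, predicted))

for train, test in expanding_window_splits(n_periods=5, min_train=3):
    print(train, "->", test)
```

In this toy data the single outlier accounts for nearly all of the squared-error total but a smaller share of the absolute-error total, which is why the choice of loss function changes what a model learns to prioritize.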

The main strength of the 20% of machine-learning models that did beat human analysts’ predictions is that they corrected for intrinsic human biases, such as optimism or pessimism, and the influence of past failures and successes.

“We didn’t see this success when the model just generated an earnings prediction,” Lu said. “If you don’t feed the model an existing analyst’s forecast, then the machine does not perform as well as a human. But if you allow the machine to serve as a correction mechanism — to pick up the biases in the forecast and then correct it on the margins — you can see improvement.”
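
The correction mechanism Lu describes can be illustrated with a deliberately simple sketch. Here the "model" is just a constant shift learned from an analyst's past misses, a stand-in for the far richer machine learning models in the study. All function names and earnings figures are made up for illustration.

```python
from statistics import mean, median


def fit_bias(past_forecasts, past_actuals):
    """Estimate the analyst's typical miss from past periods.

    The median of (actual - forecast) is the constant shift that
    minimizes mean absolute error on the past data.
    """
    return median(a - f for f, a in zip(past_forecasts, past_actuals))


def correct(forecast, bias):
    """Indirect approach: adjust the analyst's forecast instead of replacing it."""
    return forecast + bias


def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Made-up earnings-per-share history for an analyst who runs optimistic.
past_forecasts = [1.00, 1.20, 0.90, 1.10]
past_actuals = [0.90, 1.05, 0.85, 0.95]
bias = fit_bias(past_forecasts, past_actuals)

# Later periods, not used when estimating the bias.
new_forecasts = [1.30, 1.00]
new_actuals = [1.18, 0.91]
corrected = [correct(f, bias) for f in new_forecasts]

raw_mae = mean_absolute_error(new_actuals, new_forecasts)
corrected_mae = mean_absolute_error(new_actuals, corrected)
print(f"raw MAE: {raw_mae:.3f}, corrected MAE: {corrected_mae:.3f}")
```

Because the analyst's forecast already carries most of the information, the correction only needs to learn the systematic part of the miss. A direct approach would instead have to predict earnings from scratch, without the forecast as an input.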

The team found the most improvement when the models were predicting earnings at small firms that don’t receive much industry attention or coverage in the financial press.

“For large firms, the amount the model outperformed human analysts was, depending on your perspective, modest,” Lu said. “Some people might even say it was small — meaning that analysts are doing a pretty good job.”

But with smaller firms, human analysts are less likely to pick up information and cues from fellow analysts and professional networks.

“If you are the only analyst covering a firm, your predictions are not being peer reviewed. You’re not able to compare your work to other analysts’ predictions,” Lu said. “Versus, if you work on Starbucks, for instance, there are 20 other analysts doing the same thing. You learn from each other and fact-check each other. That process is generally going to lead to more accurate forecasts.”

Being one of only a few analysts covering a firm allows an analyst’s biases to take greater hold in their predictions. More bias means more room for improvement by machines, Lu said.

As research into the ways machine-learning models can help improve market predictions continues, Lu is hopeful that the work he and his team performed will give researchers a baseline model they can tweak and test. He and his co-authors included their code in the paper, so it’s available for other academics to use.

“What was lacking in the literature was a scientific, systematic evaluation of the impact of these choices on the models’ performance,” Lu said. “We’re making this available so everyone can use it in the future and can build on it. That’s what we are hoping for, that we provided a benchmark model that can move the literature forward.”