Artificial Intelligence Confronts a 'Reproducibility' Crisis

Sometimes, basic information is missing because it’s proprietary—an issue especially for industry labs.

Getting the best results often involves tuning thousands of little knobs, what Dodge calls a form of “black magic.” Picking the best model often requires a large number of experiments.

The vast computational requirements—millions of experiments running on thousands of devices over days—combined with unavailable code, made the system “very difficult, if not impossible, to reproduce, study, improve upon, and extend,” they wrote in a paper published in May. (The Facebook team ultimately succeeded.) The AI2 research proposes a solution to that problem.

You can still report the best model you obtained after, say, 100 experiments—the result that might be declared “state of the art”—but you also would report the range of performance you would expect if you only had the budget to try it 10 times, or just once.
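One way to picture budget-aware reporting is the rough sketch below. It is not taken from the AI2 paper; it assumes each experiment is an independent draw from the scores already observed (sampling with replacement), and the function name `expected_best` and the simulated scores are hypothetical. Under those assumptions, it computes the average best score you would expect to see with a budget of 1, 10, or 100 experiments. It covers only that average, not the full range of outcomes.

```python
import random


def expected_best(scores, budget):
    """Expected best score among `budget` draws, with replacement, from `scores`.

    Assumption (ours, for illustration): every experiment is an independent
    draw from the empirical distribution of the scores already observed.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    if not scores:
        raise ValueError("scores must not be empty")

    ordered = sorted(scores)
    total = len(ordered)
    expected = 0.0
    for rank, score in enumerate(ordered, start=1):
        # Probability that the best of `budget` draws is exactly this
        # ranked score: P(max <= this one) minus P(max <= previous one).
        prob = (rank / total) ** budget - ((rank - 1) / total) ** budget
        expected += score * prob
    return expected


if __name__ == "__main__":
    # Hypothetical accuracies from 100 simulated experiments.
    rng = random.Random(0)
    observed = [rng.uniform(0.70, 0.90) for _ in range(100)]
    print("best of all 100 runs:", round(max(observed), 3))
    for budget in (1, 10, 100):
        print("budget", budget, "->", round(expected_best(observed, budget), 3))
```

With a budget of one, the figure is simply the average score across all experiments; as the budget grows, it climbs toward the single best result, which is the number that usually gets reported.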

Instead, the idea is to offer a road map to reach the same conclusions as the original research, especially when that involves deciding which machine-learning system is best for a particular task.

A side benefit, he adds, is that the approach could encourage greener research, given that training a large model can generate emissions comparable to those of a car over its lifetime.

