Machine learning is being used to help solve development problems with promising results, say researchers who have produced a roadmap to steer future projects away from common pitfalls.
Increasingly popular in rich countries, machine learning is a type of artificial intelligence (AI) in which computers learn — without being explicitly programmed — by finding statistical associations within vast quantities of data. But using it to solve development problems has been more difficult.
Despite a number of attempts to apply it to tackle poverty, famine or displacement, “we have yet to see successful stories of machine learning truly advancing development”, according to Maria De-Arteaga, of the Machine Learning and Public Policy program at Carnegie Mellon University in Pittsburgh, Pennsylvania, in the United States.
De-Arteaga spoke at the UNESCO conference Tech4Dev in Switzerland last week (27-29 June), where other researchers presented projects that show the technology’s promise.
Stefano Ermon, Assistant Professor in the Department of Computer Science at Stanford University in the US state of California, told the meeting that he believes machine learning could solve the “data drought” in developing countries.
“We are getting access to a lot of new data streams from satellites, from social media, from phones,” he said. “They clearly contain a lot of information about the outcomes we care about … they have largely not been used because they are massive and they are unstructured.
“But AI is really booming. There has been a lot of progress in terms of developing new methods to make sense of this data.”
Machine learning leaves it to the computer to work out a formula that describes the structure of a mass of data. One form, supervised machine learning, works by feeding input datasets, such as weather and climate data, into the computer along with output data, such as figures on crop yield. By examining the millions of possible links between these data, the computer figures out how they are connected. From then on it can, in theory, predict future crop yield from new weather and climate data.
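To make the idea concrete, here is a deliberately simplified sketch of supervised learning in Python. It is not the method used by any of the researchers quoted here: the rainfall and yield figures are made up, the function names `fit_line` and `predict` are hypothetical, and a single straight line stands in for the far richer models and data used in real projects.

```python
# Toy illustration of supervised learning. All figures are invented
# for the example and do not come from any real survey or project.

def fit_line(xs, ys):
    """Learn y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How strongly the input and output vary together.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much the input varies on its own.
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned relationship to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: seasonal rainfall (mm) paired with
# the crop yield (tonnes per hectare) observed in the same season.
rainfall = [300.0, 400.0, 500.0, 600.0, 700.0]
crop_yield = [1.5, 2.0, 2.5, 3.0, 3.5]

# "Training": the computer works out the relationship from examples.
model = fit_line(rainfall, crop_yield)

# "Prediction": estimate the yield for a rainfall value it has not seen.
forecast = predict(model, 550.0)
print(f"Predicted yield: {forecast:.2f} t/ha")
```

The program is never told how rainfall relates to yield; it infers the relationship from paired examples and then applies it to a new input. Real systems follow the same pattern with many more inputs and far more data.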
Ermon described his team’s success predicting crop productivity in the United States in the months before harvest: their accuracy was comparable to the US Department of Agriculture’s on-the-ground surveys. The model has now been used in Argentina, Brazil and India, he said, and is being extended to Africa.
But he admitted there were problems adapting to smaller field sizes and poorer ground data with which to train the system.
Ermon’s team also developed a tool that matches survey data with features within satellite images of 30 African countries. It can now predict some indices of development from these data, such as access to electricity, piped water and sewage.
“For a few crude measures of infrastructure quality this model seems to work pretty well,” he said.
Wesley van der Heijden, a master’s student at Tilburg University in the Netherlands, described attempts to predict hunger in Ethiopia without the need for costly surveys. The computer worked out correlations between satellite, economic and demographic data, on the one hand, and hunger predictions from the Famine Early Warning Systems Network (FEWS NET), on the other. The results worked for predicting urgent hunger situations but not for less serious ones, he said.
And Leonardo Milano, a senior data scientist at the Geneva-based Internal Displacement Monitoring Centre, said its machine learning system has learned to extract information about internally displaced people from 5,000 media reports daily. This produces smaller, manageable quantities of information that humans can analyse.
“We are transitioning from an active information search to a supervised information extraction,” he told the meeting.
De-Arteaga was more cautious, warning of the dangers of adopting a “trickle down” approach in which a successful machine learning project from the developed world is repurposed for poorer countries.
“What happens as a result [of this approach] is that the tools we have encode the infrastructure and context of developed regions,” she said, and so such tools could end up being biased or inappropriate in poorer countries.
It is also important to think more carefully about which local problems machine learning will actually be useful for, according to De-Arteaga. “While this might sound very obvious, it is very often missing from ML4D projects,” she said.
De-Arteaga is working with colleagues to collate research on what they call a new research area, Machine Learning for Development (ML4D). She told the conference that the roadmap, to be published in a peer-reviewed journal, defines what ML4D is and who should be involved, outlines how it should be approached and identifies its research challenges, such as data reliability. One aim is to avoid repeating mistakes made in other development fields, she said.