Access to water is a fundamental human right, and it is one of the United Nations’ Sustainable Development Goals (SDG 6). Around the world, nearly one billion people (mostly in Africa and Asia) rely on rural water points, such as hand pumps or taps, for their daily water use. These water points are a central part of community life. Unfortunately, after about three years of service, these water points tend to break. In fact, it’s estimated that at any given time, roughly 25% of the world’s water points are not functioning.
In addition to health problems, the lack of access to drinkable water has huge negative impacts on other aspects of life. Without functioning water points, people are forced to walk long distances, 30 minutes to an hour, wait in line for their daily water supply, and then carry it all the way back to their homes. This task usually falls to women, and because it can occupy a substantial part of their day, it has large impacts on gender equity and education.
The Data-Driven Solution
DataRobot’s customer, the Global Water Challenge, wanted to understand why these breaks were occurring, so they began gathering data for the first time. Although there had been massive investments in water point construction, no one had a complete picture of water point functionality. Data was scattered across multiple sources, even within one country, and generally collected in different formats and mediums.
Enter Brian Banks, Director of Strategic Initiatives at the Global Water Challenge. Brian wanted to bring the existing data together in a way that was both holistic and useful. But what data is the right data to collect? Brian spent nearly two years traveling around the world asking experts this question: “How do we create a data standard for water points?”
Out of these conversations, the team built what is now known as The Water Point Data Exchange (WPDx), the first harmonized database of water points from around the world. WPDx allows countries and organizations to share their water data, resulting in a database that grew from tens of thousands of data points to over half a million today.
Consolidating the data was a huge task, and once it was complete, it raised the question: What do we do with it? Brian is not a data scientist, but he knew the data held useful insights that went beyond simple dashboarding. Brian tried all of the ‘data for good’ routes available to him: free consulting, cloud resources, and even working for months to set up a hackathon. Some results were interesting, but none had the impact he was hoping for, and in every case, since Brian couldn’t code, he couldn’t work with the code-based products they left him.
When Brian started working with DataRobot, things changed. In a few hours, Brian was able to upload his data from WPDx and build a model to answer some of the important questions he’d been looking for, such as, “Can we predict which water point will be broken in the future?” In an afternoon, he was able to accomplish on his own what other groups had attempted to do over the course of a year.
Working with DataRobot, Brian built models for 13 countries and began integrating these predictions into a web app that maps out which water points are working (or not working), along with metadata on the type of water point, the water source, location, repair priority, and (crucially) which water points are likely to be broken in the future.
What Comes Next?
The models Brian built with DataRobot are some of the first, if not the first, large-scale uses of machine learning on water point data. The initial response from governments has been overwhelmingly positive, as these tools help focus resources in resource-constrained environments. Now we’re exploring other areas where DataRobot can help address issues with distributed infrastructure and how we can actively work with key stakeholders on the ground to improve the data coming into the tool.
We are continuing to work with local governments on the ground to train users on the new tools, share how machine learning can help them in their daily lives, and collect constant feedback to ensure the data is useful. We’ve seen an enormous response from the first six-month pilot in Sierra Leone. Today, they are using the output of the platform to inform the planning process for repairs, maintenance, and new construction of water points, impacting nearly two million citizens across the country. And we are in the process of expanding this program to other countries as well!
We’re very excited about this work, which addresses the important issue of access to water, and we are honored to be a part of this project with Brian and the Global Water Challenge. We have big dreams for how we can leverage automated machine learning to solve the world’s biggest challenges.
About the Author:
Chandler McCann is a Senior Data Scientist at DataRobot, where he leads the federal data science practice, as well as the AI for Good: Powered by DataRobot program. Chandler has over 15 years of experience in analytics and data science. He received his Master’s in Information and Data Science from UC Berkeley and his undergraduate degree in Materials Science Engineering from the University of Maryland. With GWC, Chandler has worked closely with the governments of Liberia and Sierra Leone to improve access to water and has a passion for leveraging AI for societies’ toughest challenges.