This is part 1 of a series of blog posts detailing the development of the Relationships by DataRobot web app.
A colleague of mine gave a talk not too long ago about a Stanford study called How Couples Meet and Stay Together. The study was done by Michael Rosenfeld, a sociology professor who studies, among many other things, how the Internet impacts society. The study itself tracks several thousand couples over the course of several years. Most of the work (both academic and popular) using this data is related to what kinds of people meet and how they meet: interracial couples, same-sex couples, online dating, and so on.
We were interested in identifying the characteristics of couples that tend to be predictive of long-lasting relationships; predicting how likely a given couple is to stay together; and most importantly, building a simple quiz application that any couple could use to determine how long their own relationship might last.
Why did we do this?
First, people need to know more about machine learning. You can’t read the news these days without stumbling across an article bemoaning the supposed rise of the machines -- how all of our jobs are going to be consumed by robot overlords and how this will result in the collapse of modern society. Selfishly, if people knew more about machine learning, then my Facebook feed would be far less irritating.
Second, machine learning is coming to the masses. Where it once took years in school and even more years of practical training, modern tools like DataRobot can bring machine learning within reach of everyone, with much less training required. While we used Python for the data preparation in this case, a tool like Alteryx, Trifacta, or Talend could have reduced the coding burden for that bit as well.
Finally, I’ve been married for 17 years (as of today, as a matter of fact), and I’m intensely interested in relationships and what makes them last. I learned so much throughout this process, and I’m excited to share our findings with people.
What is it, and what isn't it?
What it is
- Fun. Share your score and see how your friends stack up. Don’t take this too seriously.
- Based on solid science. We used out-of-sample validation, systematically tuned machine learning models, and dozens of benchmarks against competing approaches.
- Accessible to aspiring data scientists. We used DataRobot to do 100% of the modeling work -- no coding required.
- Open source -- we’ll be making all the code available to anyone who wants to look at it.
What it isn’t
- Relationship advice. We’re not relationship experts and we’re not trying to give you advice about yours. The factors that we considered may or may not be causal. We’re just using the correlation to make a prediction.
- An exhaustive study of relationships -- we’re not social scientists.
- Perfectly descriptive of your relationship. The prediction that we make is just a prediction based on data that we didn’t collect.
- A complex data science project. We didn’t set out to extend the frontiers of data science here. The dataset is small, and the data science is relatively straightforward.
So here it is!
- Introduction and Background to Relationships by DataRobot
- Preparing the Data for Relationships by DataRobot
- Building the Models for Relationships by DataRobot
- An Inside Look at the Design Process for Relationships by DataRobot