DataRobot was proud to be a sponsor of the 2019 Queen City Hackathon in Charlotte, North Carolina. Although DataRobot did not officially compete in the hackathon, we had the opportunity to do a test-run of the DataRobot platform against the dataset used in the hackathon.
Within minutes of starting the competition, DataRobot was already modeling the data. And with a limited amount of cognitive effort using autopilot mode, we submitted our predictions. Our approach was highly competitive (2nd place) and on par with some of the Queen City’s best data scientists.
The 2019 Queen City Hackathon hosted by the Analytics and Big Data Society is Charlotte’s largest hackathon, dedicated to using the power of analytics to improve the lives of people in the city of Charlotte, NC.
This year’s focus was on opioid substance abuse, which has become a serious national crisis that affects public health as well as social and economic welfare. Every day, over 130 people in the United States die because of opioid-related overdoses. Meanwhile, millions more are opioid addicted. The Centers for Disease Control and Prevention estimates that the total "economic burden" of prescription opioid misuse alone in the United States is $78.5 billion a year, including the costs of healthcare, lost productivity, addiction treatment, and criminal justice involvement.
Hackathon participants were challenged to use data from the Centers for Medicare and Medicaid Services to predict whether patients would complete their treatment (binary classification) for opioid substance abuse as well as a patient’s length of stay at a treatment center (regression variable).
Within minutes of uploading the CMS dataset, DataRobot surfaced valuable exploratory insights, such as the predictive power of each of the data elements and the distribution of the target -- length of stay. Most of the patients in this study either stayed at a treatment center for longer than 30 days or less than 10 days.
Next, DataRobot automatically built dozens of models against the training dataset. The best performing model included data preparation steps such as the transformation of categorical features using ordinal encoding and also multiple feature engineering steps before feeding an XGBoost algorithm.
DataRobot was also able to generate explanations for why certain patients received their corresponding predictions for the length of stay. As seen below, the top three patients had a predicted length of stay of 39-40 days. However, the reasons for driving these predictions varied from patient to patient. In the real world, these kinds of prediction explanations, when applied to new prospective patients, could make an empirical score much more explainable to a healthcare professional, and perhaps may even impact treatment protocols for a given patient.
In summary, what DataRobot was able to do in a short amount of time was run a comprehensive set of modeling approaches, extract most of the signal from the input dataset, and provide a very legitimate benchmark for modeling. In fact, we believe in an automation-first philosophy when it comes to data science. Data scientists armed with DataRobot can build a highly accurate model in minutes to solve 75% of the challenges. With the extra productivity that DataRobot gives them, they can now focus their expertise on the 25% that requires specific domain expertise, feature engineering, and model tuning to see if additional gains are possible. The best models will often have the best engineered features and feature interactions, which come from data scientists who leverage their strong understanding of the problem. By allowing this experimentation, DataRobot allows data scientists to test out several iterations of model building, allowing teams to focus on the business objectives.
Artificial intelligence can’t act on its own. Solving a public health crisis like opioid addiction requires guidance from new, talented individuals entering the workforce, such as the hundreds of hackers who worked through the night on this important social issue. It is our view at DataRobot that automation, combined with human creativity and the ability to iterate quickly, will ultimately force-multiply impact. In the context of the opioid epidemic, this means identifying the prevention efforts and enabling healthcare, law enforcement, and judicial personnel with the most effective, evidence-based decision making.
About the Author:
Omkar Kharkar is a Customer Facing Data Scientist at DataRobot, where he is responsible for working with clients and customers to help them understand the DataRobot platform and effectively applying DataRobot for machine learning use cases. Prior to DataRobot, Omkar worked at Ernst & Young, where he helped financial institutions apply advanced analytics to solve business problems and has spent several years in the Financial Services industry.