In release 5.2, DataRobot brings the next generation of automated feature engineering capabilities to its flagship Automated Machine Learning and Automated Time Series products.
The difference between the models of legend and those of mediocrity lies in one place: feature engineering. Feature engineering is the critical step on the path to value and often determines the success or failure of an AI project. However, enterprises adopting machine learning often experience significant challenges going from raw data to a deployable model. This is for a number of reasons:
- Finding the Data: Data is rarely in one place. It’s usually spread out across multiple source systems and tables and requires integration before it can even be used.
- Feature Engineering: Transformations are often needed to make features useful for models. This “feature engineering” often requires engineering skills and significant trial and error to get right.
- Domain Expertise: Beyond engineering skills, a keen understanding of the business and its data is also needed. For example, in medical use cases, domain expertise is needed to know that two differently named pharmaceuticals are actually the same drug.
Simply put, feature engineering is a labor-intensive, human-driven process that is extremely time-consuming and prone to error. Given its importance, an alternative to manual feature engineering is needed.
DataRobot’s Next-Generation Automated Feature Engineering
Current approaches to feature engineering can be greatly accelerated using AI and automation. By handing the arduous and repetitive tasks to machines, humans can focus on managing the process and step in where business understanding needs to be incorporated.
At DataRobot we have always believed in automation first, which is why we’ve invested so much in Automated Feature Engineering throughout the DataRobot Enterprise AI Platform. These capabilities take the form of:
- Exploratory data analysis to prepare basic features from raw data.
- Specialized automated feature engineering and reduction for time series data.
- DataRobot blueprints that optimize features for the unique requirements of each and every algorithm in our library.
As confirmed by our customers, this combination of approaches achieves incredible results for every individual model evaluated, but we’re not stopping there.
More than two years ago, we started looking at revolutionary ways to inject an even higher level of automation into Feature Engineering. After exploring the best products in the market, we concluded that our vision could only be realized by building something in-house. With that, we formed a new team, led by our Head of R&D in Singapore, Kenny Chua, focused entirely on building the next generation of Automated Feature Engineering.
The first wave of results from this investment now emerges in the 5.2 release of DataRobot. In this version, we have automated the discovery and extraction of explanatory features from multiple related datasets. We are calling this new capability Feature Discovery. DataRobot Feature Discovery will allow you to build better machine learning models in less time without first needing to integrate data from multiple disparate sources.
Feature Discovery is All About Relationships
The introduction of AI Catalog enables users to access a shared catalog of data assets from a wide variety of source systems and locations. Users can now access all the datasets they need for their projects in one place. Not only can they access all of this data but they can also use their domain expertise to inform DataRobot of known relationships between the datasets.
Above: How DataRobot incorporates large numbers of related datasets into a single AI project.
DataRobot Feature Discovery uses these often complex relationships to intelligently generate large numbers of new and useful features from all the datasets in play. This gives each and every model tested a much broader set of relevant features upon which to base predictions. And the results we’ve seen are staggering.
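To make the idea of relationship-driven features concrete, here is a minimal sketch in plain Python. It is not DataRobot's implementation or API. It only illustrates the general technique of following a declared one-to-many relationship and summarizing the related rows as new columns on the primary dataset. The tables, the column names, and the `aggregate_features` helper are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical primary dataset: one row per customer, including the target.
customers = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
    {"customer_id": 3, "churned": 0},
]

# Hypothetical secondary dataset: many transactions per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 5.0},
]


def aggregate_features(primary, secondary, key, value):
    """Follow a one-to-many relationship on `key` and add the
    count, sum, and mean of `value` to every primary row."""
    grouped = defaultdict(list)
    for row in secondary:
        grouped[row[key]].append(row[value])

    enriched = []
    for row in primary:
        values = grouped.get(row[key], [])
        enriched.append({
            **row,
            f"{value}_count": len(values),
            f"{value}_sum": sum(values),
            # The mean is undefined when there are no related rows,
            # so it is left as a missing value.
            f"{value}_mean": mean(values) if values else None,
        })
    return enriched


features = aggregate_features(
    customers, transactions, key="customer_id", value="amount"
)
for row in features:
    print(row)
```

Each customer row gains three derived columns without the two tables ever being merged by hand. An automated system would try many such aggregations across many relationships and then keep only the features that help a given model.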
We Beat Other Tools on Accuracy and Performance
Throughout our journey, we’ve constantly measured the performance of our engineered features against other products in the market. We recently compared our capabilities with Featuretools from Feature Labs. In that comparison, DataRobot achieved better accuracy, ran 85% faster, and used fewer than half as many generated features across a variety of different models. This is because DataRobot intelligently focuses on creating the right features for each and every model evaluated, versus a 'one size fits all' approach that creates large numbers of valueless and redundant features. In addition, DataRobot Feature Discovery is able to operate on datasets that have complex, multi-key, or many-to-many relationships to support a wider variety of real-world use cases. We strongly believe that DataRobot's Feature Discovery, and all of its additional Automated Feature Engineering capabilities, will be the new bar to beat in the market.
DataRobot’s next-generation Automated Feature Engineering with Feature Discovery provides AI-accelerated transformation of data into machine learning assets. This allows you to build better machine learning models in less time and increase the pace of innovation with AI. If you’re as excited as we are and want to try out our next generation of automated feature engineering capabilities, reach out for a demo today.
About the Author:
Richard Tomlinson is a Director in Product Marketing at DataRobot where he works closely with product, marketing, and sales teams to drive adoption and enablement of data management and data engineering capabilities in the DataRobot AI platform. Richard has been working in the data warehouse, BI and analytics space for over 20 years with the last eight years focused on Hadoop and cloud platforms. He is based in Chicago but is originally from the UK and has a degree in statistics from the London School of Economics.