In this Data Scientist Spotlight, you’re going to meet Sergey Yurgenson, the Director of Advanced Data Science Services at DataRobot. Sergey is a Kaggle Grandmaster who was named one of the top ten Kaggle data scientists in 2012. He has been with DataRobot since the early days and has a lot of insight and experience to share.
Sergey, can you tell us a bit about your background?
You know, people come to data science from all walks of life, and I started out in experimental physics. Surprisingly, in addition to making me fluent in math and statistics, it gave me some skills very relevant to data science; for example, it taught me that a theory is useful only if it can predict or describe something in reality. A data scientist works on the same principle because what we are doing is analyzing real-life data.
Later, I worked at the Harvard Medical School in the Department of Neurobiology, where I learned some more helpful lessons (though different from those in physics): for instance, real data is very dirty. When you repeat an experiment in physics, you expect to get, more or less, the same result. In biology, every result is slightly different; reactions vary with each subject and with each situation, and they are affected by multiple factors, much like in data science. I think that all the discoveries I’ve made in my professional life have made me more suitable to be a data scientist.
Is that how you got into data science and joined DataRobot?
Funny you should ask: actually, I became a data scientist almost by accident. I used to participate in various online competitions and stumbled upon Kaggle one day. Through Kaggle, I met Jeremy, who eventually founded DataRobot. At first, Jeremy and I competed against each other, until we did a one-eighty and combined our efforts to create a team, which turned out to be quite successful. That, in a nutshell, is how I joined DataRobot.
Can you explain what a data scientist does?
We don’t all do the same things. There are so many different fields that are combined under the umbrella label of “data science” that you could meet several people doing seemingly unrelated activities and find out they all are data scientists. For example, there are data science researchers who develop algorithms, and machine learning engineers who implement algorithms to make them run more efficiently. Someone could be an industry data scientist who is building models using libraries of algorithms. Then, there are more business-oriented data scientists who are helping businesses frame a problem in a way that can be solved by data science and machine learning.
What do you do as a data scientist at DataRobot?
I would say that I’m an applied data scientist, at least right now. My main responsibility is to work with our business clients. I help them understand where data science is best used, or how they can transform the needs of their business into a form that can be solved by data science and predictive modeling. I am the person who can build those models for them, explain the models, and help our clients see the business value of the product.
What advice do you have for people who want to pursue a career in data science?
There are a lot of resources out there, so what you do depends on the type of data science you wish to pursue and on your personality type. I, for one, like to learn by doing. So, if you are like me, you might try pursuing a data science project yourself, compete on Kaggle, or do your own coding. These are great ways to learn the tricks and tasks that you wouldn't learn from a book, and you will know with much more certainty whether you actually like data science.
You’ve seen data science grow and change over the years. Where do you see it going in the future?
It’s very hard to predict, but I see a lot of differentiation happening in the field, so we probably won’t be growing the “unicorn of data science”: some single wonder breed of data scientist that wears every hat. Instead, I think, there will be further specialization moving forward, and maybe even different names for those specialists within the field.
How can today’s businesses become AI-driven?
Well, that’s a very popular question these days, yet the answer is not always as popular. Everyone wants to be AI-driven, but it requires a high level of commitment. It’s not the kind of project you can simply try out to see if it will work, and if it doesn’t work, try again at a later time. If you’re going for it, you have to go all in, and if at first it doesn’t work, you have to push and push and keep trying. There absolutely will be obstacles to overcome, so to become AI-driven, you will need some “extra drive”.
How does automation impact the work of data scientists?
Again, it depends on what type of data science you’re in and on the technical advances you’re working with. Let’s start with an example: Remember when spreadsheets were first introduced? Most accountants got very apprehensive, even scared for their jobs, because they thought accounting could now be done without knowing any math or writing or organizing anything; any high schooler could type in the numbers and do an accountant’s work. But the profession of accounting did not disappear, did it? It became less about doing math and more about knowing the business side of the field.
So, if you are an applied data scientist who thinks your work is entirely limited to building models, you are putting yourself in a dangerous position. Data science is more than this. It also involves communicating with businesses and framing problems, such as the more complex problems that machines can’t solve on their own. If you are a more technically oriented data scientist, it involves tasks like feature engineering, which usually requires a nuanced understanding of the data. Basically, automation replaces a small part of your activity, making you more productive, and lets you devote more time to the activities that require imagination and creativity.
What is the most memorable data set that you’ve worked with?
It was very early in my competing days on Kaggle. In the previous competitions, I had built my models from scratch, but this time I used a machine learning algorithm, a neural network, for the first time ever. The competition was to analyze cosmic images (images of galaxies and stars) with the purpose of figuring out the distribution of dark matter in the universe. It combined a lot of “firsts”: the first dark matter competition on Kaggle; one of my first competitions on Kaggle; one of the first that used machine learning approaches, data science approaches, and image analysis. I finished in second place.
Do you have a favorite DataRobot feature?
I’m one of those data scientists who try to tweak their models, and I like using DataRobot to do that. I especially like advanced tuning. Even though DataRobot can build models automatically, we can still apply feature selection, blending, and advanced tuning, tools that are already built into DataRobot. I like that with all the automation, I can still play with algorithms.
What do you like to do outside of DataRobot?
Among other things, I like to go places with my family. My wife and I have traveled to many countries in Europe. We’re always planning future trips, and right now Chile looks interesting. It’s just so naturally gorgeous, so we’d love to see it with our own eyes.