Alexey Grigorev is a lead data scientist at OLX Group. Starting as a software engineer focused on Java, he stumbled on something that made him want to switch to data science, so he took a Master’s in Business Intelligence. In this chat, I probe him about his thought process, as well as his advice for others looking to emulate his career.
Interview conducted by Eugene Yan
Eugene Yan (EY): What does your role as a lead data scientist consist of?
Alexey Grigorev (AG): My role includes too many things and involves overseeing anything related to machine learning. I work a lot with infrastructure, such as making sure that once we have a model, we can serve it to real users. I also mentor a lot of people, such as data analysts or engineers who want to get into machine learning. I help them with training their first model and then deploying it.
EY: What made you decide to switch to data science?
AG: It’s funny to talk about this, because I actually saw a course video by Andrew Ng, and thought, “OK, this is what I want to do.” This led me to taking more courses, also on Coursera, and chatting with a couple of companies who were looking for data scientists.
However, everyone was telling me that I didn’t have enough education and the background wasn’t a good fit. This was how I decided that I needed to get a Master’s. Back then, I didn’t know that Business Intelligence is not really data science (laughs). But, I still had a couple of courses on machine learning, and the BI courses were also helpful.
EY: What did you have to demonstrate to secure your first role as a data scientist at SearchMetrics after your Master’s?
AG: I remember that interview. It was long, 2.5 hours, and it was pretty tough. Most of the time, we talked about my thesis, which was about math information retrieval.
My takeaway is that it’s good to have a project to talk about. It doesn’t have to be a thesis, you don’t really have to do a Master’s. But you should have a real-life project, a real application of machine learning. This is especially important if you don’t have a lot of experience in your CV, if it’s your first full-time job. It can be a thesis, a course project, a side project, a Kaggle competition. If you have something to talk about, it’s a big plus; it makes the conversation smoother.
EY: After two stints as a data scientist, you’re now a lead data scientist. For people who’ve had a few years of experience, what does it mean to be senior?
AG: Specific to data science, I think a senior is someone who can do a project end-to-end. This includes talking to stakeholders, figuring out if ML is actually the right tool, translating requirements into the language of ML, understanding if the problem is worth solving, and breaking a big, ambiguous problem into smaller tasks for other members of the team.
It doesn’t mean that they are a rockstar who can do everything, but they’ll need to assess the situation—is it really worth spending time working on this problem? Do we need ML, or is something simpler good enough? They might work with data engineers, or build the data pipelines themselves. Then, they’ll train the model, and serve it.
To become a senior, the communication and problem framing aspects are essential. Some people might call this position a lead data scientist. For me, the main distinction is that a senior is mostly involved in one project. They’re making all the decisions in one project. They’re still very hands-on, spending more than 50% of the time coding.
For a lead, it’s more projects, less hands-on. It’s more communication with multiple stakeholders.
EY: What are the common challenges and pitfalls that most data scientists trip on?
AG: People often underestimate the amount of time it takes to deploy a model. Not just the deployment, but building data pipelines, etc. Also, we don’t spend enough time making sure that we’re solving the right problem, making sure that what we do actually matters. If you spend half a year developing this great model that nobody cares about, then you’ve just wasted half a year.
Ask yourself, why are we doing this? What kind of problem are we trying to solve? Who is the user? How will they use it? Are they going to use it the way we imagine, or will they do something different? Having this conversation with the user is very important.
EY: Looking back on your career, what are some things that were a waste of time? And what were some things that you think you should have done earlier?
AG: The Master’s, I don’t think it was really necessary. Maybe back then it kind of was, but now, it’s not necessary for sure.
One thing that helped was starting to freelance in parallel with my studies. That gave me a lot of projects, and gave me a great portfolio. That was helpful to me. While freelancing is not for everyone, if you’re studying and have some free time, it’s probably a good idea to freelance a bit.
Another thing is this document I have for each project. From the very first project meeting, I capture everything in a document. What’s the problem? Why do they think ML is a good solution? What does success look like? What are the next steps? Every time we follow-up on a topic, I capture it in the document. And over time, it captures the history of how this problem evolves.
I started doing this at OLX; I didn’t do this previously. But now thinking back, I should have started doing this even when I was freelancing.
EY: In the data field, things progress very quickly. How do you decide what to learn? How do you know if it’ll be relevant and worth your time?
AG: I remember trying. I had one RSS reader to subscribe to arXiv RSS. It was basically impossible to keep up. I also had a folder on Dropbox, and I called it “To Read”. At some point, it became half a gigabyte. I remember that day, when I decided that I know I’m not going to read this, that was a good day (laughs).
At some point, I just thought to myself: Do I really need all this information? What am I going to do with it? Just realizing that there’s no way to digest all this information, it helps a lot.
How to choose what to learn? I don’t know, I just try to focus on the problem that I’m solving. Whatever works for that problem, I try to find.
Also, for the last 2–3 years, I’ve been trying to learn things outside of data science. A bit of marketing, how to speak, how to read. If you like something, you learn, and when you stop liking it, maybe you’ve learned enough, and you move on.
EY: Recently, you started the datatalks community. Why and how do you see it growing?
AG: It somehow happened naturally. I’m writing a book, and one of the readers asked, “Is there a place where I can talk about this book?” I realized that there’s actually no such place, and decided to create a place for that.
I also get a lot of questions on LinkedIn, Twitter, email, Quora. I try to answer these questions, but it doesn’t scale. That was another reason I started the community Slack. People can ask these questions in public, and we can answer them. Maybe if I start by showing an example, more people can help with the questions?
The community also hosts meetups. One reason why I do this is that, sometimes I want to talk at a conference, I submit a great proposal, and I’m rejected (laughs). It’s disappointing, and I think, why do I have to submit a proposal? Can’t I just talk about it myself? And this is how it happened, maybe I can just host meetups.
Also, a friend asked, do you know of a place where I can give a talk? Why yes I do! This was the SageMaker event that you attended. It was our first talk.
EY: Currently, Alexey hosts one talk a week. I don’t know how he keeps up with the cadence, but it’s a great time to join datatalks.club.
Eugene Yan works at the intersection of machine learning and product to build ML systems. He’s currently an Applied Scientist at Amazon. Previously, he led the data science teams at Lazada and uCare.ai. He writes on how to be effective at data science, machine learning, and career at eugeneyan.com and tweets at @eugeneyan. Follow him on Twitter or subscribe to his weekly newsletter to learn more about this space.