Over the last 9 years, we’ve worked on hundreds of AI/ML projects with our clients. The vast majority of these projects turned out to be very successful, but we have certainly made mistakes along the way. Lots of them.
We’ve learned from every one of these mistakes so that we don’t repeat them. From data issues and APIs to building ML models and getting them into production, we’ve messed up at every stage. Over time, however, these lessons have proven invaluable to our success in delivering value to our clients. Quite simply, we’ve already made the mistakes so our clients don’t have to.
Here are our Top 5 Lessons Learned over the years.
1. Data Issues
The most important ingredient when building AI/ML solutions is the data. Each project is unique and requires different types of data, but we run into issues when a) there just isn’t enough historical data to create a good model, b) the data is not reliable and has quality issues, or c) there are not enough data points for the model.
We have run into issues in all of these areas when building ML models. When you are getting started with AI/ML projects, whether with your in-house team or with a partner, your first step should be evaluating your data quality. Only when your data is in a good place should you attempt to start these projects, as doing so leads to far fewer issues with your ML models.
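As an illustration of what a first data evaluation might look like, here is a minimal sketch in Python. The function name `data_quality_report`, the thresholds `min_rows` and `max_missing_rate`, and the sample records are all hypothetical; the right checks and cutoffs depend on the project.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def data_quality_report(rows: List[Row], min_rows: int = 1000,
                        max_missing_rate: float = 0.2) -> Dict[str, Any]:
    """Summarize two basic readiness checks: data volume and missing values.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    # Collect every column name that appears in at least one record.
    columns = sorted({key for row in rows for key in row})
    missing_rate: Dict[str, float] = {}
    for col in columns:
        # Treat absent keys, None, and empty strings as missing.
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        missing_rate[col] = missing / len(rows)
    return {
        "row_count": len(rows),
        "enough_rows": len(rows) >= min_rows,
        "missing_rate": missing_rate,
        "problem_columns": [c for c in columns
                            if missing_rate[c] > max_missing_rate],
    }


# Hypothetical sample records, used only to show the call.
sample = [
    {"age": 30, "city": "Jacksonville"},
    {"age": None, "city": "Jacksonville"},
    {"age": 41, "city": ""},
    {"age": 25},
]
print(data_quality_report(sample, min_rows=3, max_missing_rate=0.3))
```

A report like this will not tell you whether the data is good enough on its own, but it makes the volume and reliability questions concrete before any modeling starts.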
2. Realistic Expectations
Over the last few years, the topic of AI and Machine Learning has gone mainstream. On one hand, this is great, since people are more aware of ways to apply AI/ML solutions and how they can benefit their business. On the other hand, expectations can sometimes be unrealistic. Many project stakeholders think that AI can solve all of their problems, that it will deliver huge returns, and that an AI project can be implemented in just a few weeks.
The reality is that building successful AI/ML solutions is a process. It takes an upfront investment and some time before you can see the true ROI. However, once the solution is in production, the gains can be significant. Although these results can be huge over time, it is important to set realistic expectations and use an iterative approach. We prefer to take small bites and show early wins for our clients. If your goal is to eventually get to 90% efficiency or cost reductions, start with 60%, then 70%, etc., until you hit your goal.
3. Choosing the Right Model
Most of the time, when there is a news story about AI/ML achieving some amazing feat (for instance, identifying breast cancer in slides better than a pathologist), there is a neural network behind the scenes. These incredible models are loosely inspired by the way a human brain functions, learning from examples much like a toddler learns to speak. Needless to say, when clients who’ve been researching AI come to us, they want a neural network.
Choosing a model before starting a data science project would be like buying a car without thinking about how it’ll be used; a convertible Corvette sounds great until you’re facing a Midwest winter! To figure out which model makes the most sense for the situation, we start with an assessment of what data is available and what the model needs to achieve, so that the model’s output aligns with business goals. If you need to understand how the model makes its decisions, whether to generate adoption or to meet legal compliance requirements, tree-based models come in handy. If you’re modeling behavior types, an unsupervised clustering model might better suit your needs. Being flexible and allowing the data and the problems to guide the analysis results in models that can be converted to business value.
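To make the point about explainable tree-based models concrete, here is a toy sketch in Python of the smallest possible tree: a single split on one feature (often called a decision stump). It is a stand-in for illustration only, not a production model, and the names `fit_stump`, `explain`, and `days_overdue` as well as the sample data are hypothetical.

```python
from typing import List, Tuple


def fit_stump(values: List[float], labels: List[int]) -> Tuple[float, int]:
    """Fit a one-split tree on a single numeric feature with 0/1 labels.

    Returns (threshold, label_above): values greater than the threshold are
    predicted as label_above, all other values as the opposite label.
    """
    if not values or len(values) != len(labels):
        raise ValueError("values and labels must be non-empty and equal length")
    best = None  # (training errors, threshold, label_above)
    for threshold in sorted(set(values)):
        for label_above in (0, 1):
            errors = sum(
                1 for v, y in zip(values, labels)
                if (label_above if v > threshold else 1 - label_above) != y
            )
            # Keep the first split with the fewest training errors.
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    return best[1], best[2]


def explain(threshold: float, label_above: int, feature_name: str) -> str:
    """Render the fitted rule as a sentence a stakeholder can read."""
    return (f"if {feature_name} > {threshold}: predict {label_above}, "
            f"else predict {1 - label_above}")


# Hypothetical data: days an invoice is overdue, and whether it defaulted.
days = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
defaulted = [0, 0, 0, 1, 1, 1]
threshold, label_above = fit_stump(days, defaulted)
print(explain(threshold, label_above, "days_overdue"))
```

The entire model is one readable rule, which is the property that makes tree-based approaches attractive when adoption or compliance depends on explaining a decision. Real tree models stack many such splits, but each prediction can still be traced through them.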
4. Putting a Model into Production
There is a seemingly endless number of traps that need to be avoided when productionalizing an AI/ML model. Unfortunately, some of these traps can be fatal, causing the model to simply not work. Others are more subtle, causing sporadic problems that are difficult to troubleshoot and may cause users to lose confidence in the model.
In practice, we have found there generally isn’t ‘one thing’ that causes issues with a model post-deployment. However, most issues we have encountered in the productionalization process come from the simple fact that models are developed in a lab environment. The data has been pre-cleaned: missing values have been removed or replaced, outliers have been removed, and so on. The data has been sterilized, all dependencies are available, the model has been created, and everything works perfectly. The reality is that production is often anything but sterile, and model developers often underestimate how different these two environments will be.
So what is the solution? We find it is important to think about and plan for production while you are still in the safe environment of the lab. Early planning for production will not only lead to valuable insights about how the model will be used, but will also save time troubleshooting issues post-deployment.
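One way to plan for production while still in the lab is to package the cleaning rules with the model, so that unsterilized inputs are handled the same way at scoring time. The Python sketch below assumes a single numeric feature, median imputation for missing values, and clipping outliers to the range seen in training; the function names `fit_feature_stats` and `prepare_for_scoring` are hypothetical, and a real project may call for different rules.

```python
import math
from statistics import median
from typing import Any, Dict, List


def fit_feature_stats(training_values: List[float]) -> Dict[str, float]:
    """Record, in the lab, the statistics used to clean one feature."""
    return {
        "median": median(training_values),
        "low": min(training_values),
        "high": max(training_values),
    }


def prepare_for_scoring(raw: Any, stats: Dict[str, float]) -> float:
    """Turn one raw production value into a value the model can score."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        # Missing or unparseable input: fall back to the training median.
        return stats["median"]
    if math.isnan(value):
        return stats["median"]
    # Clip outliers to the range observed during training.
    return min(max(value, stats["low"]), stats["high"])


# Hypothetical training values and messy production inputs.
stats = fit_feature_stats([10.0, 20.0, 30.0, 40.0])
for raw in [None, "n/a", "35", 1000]:
    print(repr(raw), "->", prepare_for_scoring(raw, stats))
```

Silently substituting values can hide upstream problems, so in practice it may also be worth logging or counting how often each fallback is triggered.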
5. Not Enough Coffee
Ok, ok. Maybe this isn’t the most technical lesson learned, but since you got this far down in the article, we thought we might as well give you some humor. Ask any data scientist or software developer what their “must have” tools are, and most would agree that ample amounts of coffee are at the top of the list. Our #1 mantra here at NLP Logix is “Data Science is a Team Sport”, but our #2 mantra is “…but coffee first.”