Our new expert spotlight series showcases the great minds of SMG—those who keep us innovating, growing, and driving meaningful change. We sat down with Eric Lemmon, Data Science Lead, to get the scoop on predictive analytics, text sentiment precision, and what it’s like to work with such a multidisciplinary team.
Describe your day-to-day at work.
Our team's mission is to infuse advanced analytics into SMG's offerings. That usually means product enhancements, so I often work with several other teams (like Product, Marketing, and Client Insights) to identify upcoming opportunities and market needs that may benefit from a data science perspective. Downstream in our delivery pipeline, I also work with our Architecture and Engineering teams to come up with creative ways to integrate our machine learning models into production and expedite their delivery.
One of my favorite parts is staying close to the data. I routinely wear my data engineering hat by assisting with general data access/delivery and data pipelines across our many data platforms: Hadoop, Spark, Kafka, SQL Server, MongoDB, etc.
What are the biggest challenges of your job?
Working with data scientists! Just being lighthearted, of course, but there is actually a lot of truth in that. There is a huge divide between a data scientist's approach to problem-solving and that of the technical disciplines, such as engineering and architecture. This gap is my niche: I try to get everyone across teams speaking the same language and understanding how to deliver the end solution.
Also, the turnaround time for our team's deliverables varies greatly, anything from a multiple-month project down to a couple of hours. We are always on the lookout for ways to fit more naturally into the rest of our regimented agile engineering process.
What do you love most about your job?
That's easy: the team. And I don't just mean the Data Science team (since that would be overtly biased!). I am referring to the organic, cross-functional teams that spring up across SMG to overcome pressing challenges. It’s a beautiful thing when you pull together subject matter experts spanning science, technology, and business disciplines, on three continents, to deliver value to our clients.
A recent example of this is how we discovered a deficiency in our text analytics sentiment performance for a specific industry, and we came together as a multidisciplinary team to tackle it head-on. The team ended up using state-of-the-art deep learning techniques to boost sentiment precision considerably, in a very short time.
What is your biggest accomplishment at SMG?
After building out our operational risk machine learning models for food safety and pest detection, we needed to rapidly deploy and demonstrate those capabilities. To do that, we assembled a small team of technical experts to build out a data & ML pipeline across several big data systems.
This allowed us to safely tap into our live data streams, apply our ML models, and show a dashboard of potential food safety and pest issues across all of our restaurant, grocery, and convenience store clients simultaneously. We went from concept to production in just a few days. With all of those sophisticated systems in play, that could only have been done efficiently (and with zero production impact!) by a seasoned, highly skilled team.
How does data science continue to evolve, and what is SMG doing to innovate along with it?
Data science is rapidly evolving on many fronts. One topic that is booming right now is explainability: a "show-your-work" mindset for machine learning models that gives insight into why a model produced a certain outcome. This heightened transparency is very important to us, and we are employing state-of-the-art techniques internally that we hope to later build into our products. This not only builds trust with our users, but also gives us critical clues as to how to curate our training data to remove any unintended bias.
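To make the "show-your-work" idea concrete, here is a minimal sketch of one simple form of explainability: a linear text classifier that reports each word's contribution alongside its prediction. The weights, bias, and function name are hypothetical and chosen only for illustration. This is not SMG's model or method, and explaining deep learning models generally calls for more involved techniques.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical weights for a toy linear "food safety" classifier.
# Positive weights push toward "issue", negative weights push away.
WEIGHTS: Dict[str, float] = {"raw": 2.0, "sick": 3.0, "friendly": -1.0}
BIAS = -2.0


def predict_with_explanation(comment: str) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the predicted probability and each word's contribution to it."""
    contributions: Dict[str, float] = {}
    for word in comment.lower().split():
        if word in WEIGHTS:
            contributions[word] = contributions.get(word, 0.0) + WEIGHTS[word]
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    # Rank words by how strongly they moved the score, in either direction.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


prob, reasons = predict_with_explanation(
    "friendly staff but I got sick from raw chicken"
)
for word, contribution in reasons:
    print(f"{word}: {contribution:+.1f}")
print(f"probability of issue: {prob:.2f}")
```

Because the score is a plain sum, every prediction can be traced back to the words that drove it, which is also the kind of clue that helps spot a training set leaning on the wrong signals.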
Another area that is crucial for any organization practicing supervised learning is training cost. Acquiring training data is usually very expensive, especially for classification problems with a highly imbalanced distribution of values. For example, we built classifiers to detect food safety issues in open-ended text comments, even though actual food safety cases only occur once in every several thousand surveys. We have billions of comments at our disposal, but it is impractical to label each of those for food safety issues. So instead, we used active learning techniques to pare down our data and get enough positive cases to be useful for training and refinement, then used transfer learning to layer our custom logic atop industry-leading models.
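As a rough illustration of the active learning step, the sketch below uses uncertainty sampling, one common strategy: score every unlabeled comment with the current model and send only the most ambiguous ones to human labelers. The interview does not say which strategy the team used, so treat this as an assumption. The `toy_score` function and its hint words are hypothetical stand-ins for a real classifier.

```python
from typing import Callable, Dict, List, Tuple


def select_for_labeling(
    comments: List[str],
    score: Callable[[str], float],
    budget: int,
) -> List[Tuple[str, float]]:
    """Pick the `budget` comments whose scores are closest to 0.5."""
    scored = [(comment, score(comment)) for comment in comments]
    # The smaller the distance from 0.5, the less sure the model is.
    scored.sort(key=lambda pair: abs(pair[1] - 0.5))
    return scored[:budget]


# Hypothetical stand-in for a trained model: a few hint words raise the score.
HINT_WORDS: Dict[str, float] = {"sick": 0.45, "raw": 0.3, "hair": 0.2}


def toy_score(comment: str) -> float:
    words = comment.lower().split()
    return min(1.0, 0.02 + sum(HINT_WORDS.get(word, 0.0) for word in words))


comments = [
    "great service and friendly staff",
    "the chicken was raw inside",
    "i got sick after eating here",
    "parking lot was full",
]
picked = select_for_labeling(comments, toy_score, budget=2)
for comment, probability in picked:
    print(f"{probability:.2f}  {comment}")
```

In practice the loop repeats: the newly labeled comments are added to the training set, the model is retrained, and the next batch is selected, so labeling effort concentrates on the rare, informative cases instead of the billions of routine ones.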
How did you end up in this line of work?
During my last few years as an architect, my work gravitated away from general technical architecture and toward data architecture and engineering. That coincided with our former leader of data science needing my skill set, so I jumped at the chance to help him build out his team.
It has been very rewarding to play a role in building out capabilities that genuinely help others (such as our food safety alerting). A quote from John D. Rockefeller captures it much more eloquently: "Don't be afraid to give up the good to go for the great."