More data isn’t always better. Sometimes it’s just more. Collecting large amounts of data without a strategy often leaves teams with a tangle of data sets to unravel. Large data dumps waste time and cause frustration when the information they contain is not relevant. To avoid these snags, teams can use smaller, more targeted data sets to identify insights effectively.
Over the past decade, Ben Webster, NLP Modeling and Analytics Team Lead, has seen a shift in how teams interact with data. The question used to be: “Do I have enough data?” At the time, most companies had only just reached the point of having sufficient data for machine learning tasks. The better question now is: “Is the data relevant to the use case?” Webster starts by investigating the use case (the product or solution) to ensure that the data captures what is needed to support it and nothing more. This means filtering out bad or incomplete data, isolating recent data, and focusing only on the data attributes that drive the use case.
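The three filtering steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Webster's actual pipeline: the function name `select_relevant`, the field names (`text`, `label`, `created`, `agent`), the cutoff date, and the sample records are all hypothetical.

```python
from datetime import date


def select_relevant(records, required, since, keep, date_field="created"):
    """Return complete, recent records reduced to the attributes in `keep`.

    All parameter names are illustrative assumptions, not a real API.
    """
    selected = []
    for rec in records:
        # Step 1: filter out incomplete data (a required field is missing or empty).
        if any(rec.get(field) in (None, "") for field in required):
            continue
        # Step 2: isolate recent data (records without a date count as incomplete).
        created = rec.get(date_field)
        if created is None or created < since:
            continue
        # Step 3: keep only the attributes that drive the use case.
        selected.append({field: rec.get(field) for field in keep})
    return selected


# Hypothetical support-ticket records for a text classification use case.
records = [
    {"id": 1, "text": "refund request", "label": "billing",
     "created": date(2024, 5, 1), "agent": "a7"},
    {"id": 2, "text": "", "label": "billing",
     "created": date(2024, 5, 3), "agent": "b2"},   # incomplete: empty text
    {"id": 3, "text": "login fails", "label": "access",
     "created": date(2019, 1, 9), "agent": "a7"},   # too old
    {"id": 4, "text": "reset password", "label": "access",
     "created": date(2024, 6, 2), "agent": "c1"},
]

relevant = select_relevant(
    records,
    required=("text", "label"),
    since=date(2024, 1, 1),
    keep=("text", "label"),
)
print(relevant)
```

Here four records shrink to two, each carrying only the two attributes the assumed use case needs. Which fields count as required, and how recent is recent enough, would depend on the use case being investigated.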