Data Science - Simple is Best

Breaking news! It's not always the fancy tools that get you the results. Sometimes simple data investigation techniques are both the easiest and most appropriate.

Well, whilst I admit this is not exactly breaking news, it is something we sometimes lose sight of. We have all these tools in our data science toolkits, and we are tempted to let loose and apply as many of them as we can get away with.

Even though a client has likely engaged you to do some fancy footwork in statistical analysis or machine learning, the questions they want answered don't always require such a big-bang approach. Delving into the data with a series of targeted SQL queries may be all that is required to uncover the value they seek. Sometimes even a handful of charts embedded in an Excel workbook (perish the thought) is sufficient.

I've found that clients are usually impressed by the results of queries designed to answer their most pressing questions. We often get carried away trying to add 'value' when we should first be addressing the fundamentals. As long as the queries are correctly targeted, and the report you submit sets out the methodology and results clearly, concisely and thoughtfully (ideally with a chart or two), you have achieved what you set out to do.

What you can do is use the must-have 'Next Steps' section of your report to list avenues worth pursuing once the current investigation concludes. This is your opportunity to outline your vision for further work that could bring greater clarity or depth to the issue at hand. These avenues may well include deeper analyses such as predictive analytics, forecasting and other machine learning techniques. Be aware, though, that if the client is not disposed to committing more resources to the problem, or if your suggestions stray too far from the core issue, the proposals may not be taken up.

Answering the client's most pressing questions is what really matters. How the answers are found is not critical, and machine learning should not be used just to appear impressive. Of course, some questions genuinely require ML, but when they don't, a simpler method may be faster and less error-prone.

Good luck and happy data discovery!
