I have worked in data management and data quality for several years. When I ask people what their data management processes are, they often reply, “Well, we have some of our data stored in a database and other data stored on file shares with proper permissions.” This isn’t data management; it’s data storage. If you or your organization don’t have good, clean data, you are most definitely not ready for machine learning. Data management should be your first step before diving into any other data project.
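To make “good, clean data” a little more concrete, here is a minimal sketch of the kind of check I have in mind. The `quality_report` function, the field names, and the sample records are all made up for illustration; real data management involves much more than this (ownership, lineage, validation rules, and so on).

```python
def quality_report(rows, required_fields):
    """Count rows with missing required values and exact duplicate rows."""
    missing = 0
    duplicates = 0
    seen = set()
    for row in rows:
        # A required field counts as missing if it is absent, None, or empty.
        if any(row.get(field) in (None, "") for field in required_fields):
            missing += 1
        # Two rows are duplicates only if every field and value matches.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": len(rows), "missing_required": missing, "duplicates": duplicates}


# Hypothetical records, invented for this example.
records = [
    {"id": 1, "email": "a@example.com", "country": "DE"},
    {"id": 2, "email": "", "country": "FR"},
    {"id": 1, "email": "a@example.com", "country": "DE"},
    {"id": 3, "email": "c@example.com", "country": None},
]

report = quality_report(records, ["email", "country"])
print(report)
```

If a report like this shows many missing or duplicated rows, that is a sign the data needs work before any modeling starts.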

Now suppose you do have good data management and your data is tagged for machine learning. Even then, give yourself a pause and consider whether a fairly standard regression approach can solve the problem. I am not saying you shouldn’t get to ML soon, but start with the simplest thing that could work instead of running full speed toward machine learning.
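As a rough illustration of “the simplest thing that could work”, the sketch below fits a straight line with ordinary least squares and compares it against an even simpler baseline that always predicts the training mean. The numbers are invented for the example, and the helper names (`fit_line`, `mean_abs_error`) are my own, not from any library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Made-up data: one input feature and one numeric target.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [12, 19, 31, 42, 48, 61]
test_x = [7, 8]
test_y = [70, 79]

slope, intercept = fit_line(train_x, train_y)
line_pred = [slope * x + intercept for x in test_x]
naive_pred = [mean(train_y)] * len(test_x)  # always predict the training mean

line_mae = mean_abs_error(test_y, line_pred)
naive_mae = mean_abs_error(test_y, naive_pred)
print(f"line MAE: {line_mae:.2f}, mean-baseline MAE: {naive_mae:.2f}")
```

If a simple fit like this already gets the error low enough for your purpose, a heavier model has to beat it to justify the extra complexity.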

There are requirements to meet before you get to ML: lots of data cleaning, data management, and so on. Beyond that, there are many real-world tasks and problems that people, businesses, and organizations try to solve day in and day out. In several scenarios it can be beneficial to make machines learn, and here are some of them.

  1. Lack of sufficient human expertise in a domain (e.g., simulating navigation in unknown territories or even on other planets).
  2. Scenarios and behavior can keep changing over time (e.g., availability of infrastructure in an organization, network connectivity, and so on).
  3. Humans have sufficient expertise in the domain but it is extremely difficult to formally explain or translate this expertise into computational tasks (e.g., speech recognition, translation, scene recognition, cognitive tasks, and so on).
  4. Addressing domain-specific problems at scale, with huge volumes of data and too many complex conditions and constraints.
  5. A messenger or chatbot for finding and buying sustainable products and services, powered by human-assisted AI and using data capture and machine learning.

Many people still struggle to tell problems that can be solved with basic programming apart from those that might benefit from ML. An unfortunate corollary is that the same people believe ML will magically turn piles of manually maintained Excel and PDF files into insight. I usually tell people to start with the basics, try out regression first, and only then move on to machine learning (random forests, SVMs, etc.) and […]
