Why should I learn Hadoop if I want to get into data mining and big data?
Posted on November 28, 2015, 12:19:12 PM
There was a treasure hunter who was told that there was a treasure trove of diamonds hidden beneath a massive desert of shifting sands. At his disposal was a myriad of tools including satellites, sonars, teams of scientists, and a large sand-moving excavator.
He wondered: Should I learn how to drive the excavator to find treasure and move Big Sand?
Your initial question is actually two questions in one.
You need to find the treasure and move the sand.
The excavator will help with both, but it is mostly about moving Big Sand.
1) You don't need Hadoop to get into data mining.
Period. Hadoop is not a treasure hunting / data mining tool. It will not help you 'sift the sand' in anything more than rudimentary ways. It is not meant to do analytics. It is closer to the domain of data warehousing than analytics.
2) Hadoop might help if you want to get into Big Data.
Hadoop is a data storage, processing and management ecosystem. If you want to get into the infrastructure side of data, you could learn Hadoop. But remember that Hadoop is just one brand of excavator (albeit an open-source one, at least for now). It is by no means the only brand out there. As the other answer alluded to, you should focus on the principles and thinking in the space, not the tools.
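To make 'processing' a bit more concrete: Hadoop's classic processing model is MapReduce. A map step turns each input record into key/value pairs, the framework groups the pairs by key (the shuffle), and a reduce step combines each group. Below is a minimal sketch of that model in plain Python, run in memory on one machine. It is not Hadoop's actual Java API, and the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical, chosen for illustration. Notice how little treasure hunting it does. It moves and counts the sand, and any real analysis still has to be built on top.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Hypothetical mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group every emitted value by its key, as the framework does
    # between the map and reduce steps (here, simply in memory).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Hypothetical reducer: combine all values emitted for one key.
    return key, sum(values)


def run_job(records: Iterable[str]) -> dict[str, int]:
    # Chain map -> shuffle -> reduce over all records.
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sand = ["big sand big data", "sift the sand"]
    print(run_job(sand))
```

On a real cluster the same three steps are spread across many machines and the data lives in a distributed file system, which is the 'moving Big Sand' part. Deciding what to count, and what the counts mean, is still the data miner's job.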
In my travels I have met some excellent data miners, including a 1st place winner of Kaggle, a few analytics department heads from listed companies, and guys quietly working in the background on things that none of us will ever hear about because it is a source of competitive advantage that companies will never publicise. Depending on the industry and applications in question, 'big' data can be just one important piece of the puzzle, but the focus certainly isn't on Hadoop per se.