I need something like Flume, but less on the infrastructure side and more on the receiving, accumulation, and preparation side.
Say I have web logs. I can get them near Hadoop easily. I can get them into Hadoop pretty easily. And then it's like, what's next?
- Bot Classification
- Dimension blow-out
- Core value extraction
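
To make the three steps above concrete, here is a minimal sketch of what I mean for a single log line. Everything in it is an assumption for illustration: the Apache/NGINX "combined" log format, the `BOT_MARKERS` list, the `prepare` function name, and my reading of "dimension blow-out" as expanding the timestamp and URL into separate fields. It is not how any existing tool does it.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Assumed input: one line in the "combined" access log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately naive marker list; real bot detection needs more.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_bot(agent):
    """Bot classification: substring match on the user agent."""
    agent = agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def prepare(line):
    """Turn one raw log line into a flat record, or None if it does not parse."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    url = urlsplit(m["url"])
    return {
        # Bot classification
        "is_bot": is_bot(m["agent"]),
        # Dimension blow-out: one timestamp and one URL become several fields
        "date": ts.date().isoformat(),
        "hour": ts.hour,
        "path": url.path,
        "query": parse_qs(url.query),
        # Core value extraction
        "status": int(m["status"]),
        "bytes": 0 if m["bytes"] == "-" else int(m["bytes"]),
    }
```

In a Hadoop setting something like `prepare` would run inside a mapper (for example via Hadoop Streaming), one call per input line, with unparseable lines counted and skipped.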