Pig Advantages and Disadvantages

Apache Pig is a dataflow language built on top of Hadoop to make it easier to process, clean, and analyze 'big data' without having to write vanilla MapReduce jobs.

Good old join, distinct, union, and many more commands are already built into the language.
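
As a rough sketch of what those built-in commands look like in Pig Latin (the file names and schemas below are made up for illustration, not taken from any real dataset):

```pig
-- Hypothetical inputs: tab-separated files with assumed schemas.
users    = LOAD 'users.tsv'    AS (user_id:int, name:chararray);
visits_a = LOAD 'visits_a.tsv' AS (user_id:int, url:chararray);
visits_b = LOAD 'visits_b.tsv' AS (user_id:int, url:chararray);

-- UNION: concatenate the two visit logs into one relation.
all_visits = UNION visits_a, visits_b;

-- DISTINCT: drop duplicate (user_id, url) tuples.
unique_visits = DISTINCT all_visits;

-- JOIN: attach user names to their visits via the shared key.
joined = JOIN users BY user_id, unique_visits BY user_id;

-- Print the result to the console.
DUMP joined;
```

Each statement names an intermediate relation and feeds it into the next one, which is what gives a Pig script its step-by-step dataflow feel.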

So what Pig offers that a relational database does not is its applicability to 'big data': it can crunch large files with ease, and it does not need structured data.

Data analysis matters because, as the original paper puts it very well, data analysis is the 'inner loop' of product innovation.

(So much for the analogy.) The Pig paper also introduces the basic motivation for Pig and why it is useful, and as you read the paper you realize that the processing pipeline is actually a directed acyclic graph. The paper goes a little more in depth in

