Tuesday, August 23, 2011

Pig

We are creating infrastructure to support ad-hoc analysis of very large data sets. Parallel processing is the name of the game. Our system runs on a cluster computing architecture, on top of which sit several layers of abstraction that ultimately bring the power of parallel computing into the hands of ordinary users. The layers in between automatically translate user queries into efficient parallel evaluation plans, and orchestrate their execution on the raw cluster hardware.
The highest abstraction layer in Pig is a query language interface, whereby users express data analysis tasks as queries, in the style of SQL or Relational Algebra.
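
To make the idea of translating a query into a parallel evaluation plan a little more concrete, here is a minimal sketch in Python. It is not how Pig itself is implemented. It only illustrates the general pattern for one query, roughly "group visits by URL and count them": split the data, compute partial counts independently, then merge. The `VISITS` data and the function names `partition`, `count_partial`, and `group_count` are made up for illustration, and a local thread pool stands in for the cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: (user, url) page-visit records.
VISITS = [
    ("amy", "cnn.com"),
    ("bob", "cnn.com"),
    ("amy", "bbc.co.uk"),
    ("cat", "cnn.com"),
    ("bob", "bbc.co.uk"),
    ("amy", "cnn.com"),
]

def partition(records, n):
    """Split the records into n roughly equal chunks, one per worker."""
    return [records[i::n] for i in range(n)]

def count_partial(chunk):
    """Per-partition step: count visits per URL within one chunk."""
    return Counter(url for _user, url in chunk)

def group_count(records, workers=3):
    """Evaluate 'group by url, count' as partial counts plus a merge."""
    chunks = partition(records, workers)
    # Assumption: threads stand in for the machines of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_partial, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge step: add up the per-chunk counts
    return dict(total)

if __name__ == "__main__":
    print(group_count(VISITS))
```

The point of the layering is that the user only states the query (group by URL, count), while the partitioning and merging are decided by the layers underneath.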
