Apache REEF™

What is REEF?

REEF, the Retainable Evaluator Execution Framework, is our approach to simplifying and unifying the lower layers of big data systems on modern resource managers.

For managers like Apache YARN, Apache Mesos, Google Omega, and Facebook Corona, REEF provides a centralized control plane abstraction that can be used to build a decentralized data plane for supporting big data systems. Special consideration is given to graph computation and machine learning applications, both of which require data retention on allocated resources to execute multiple passes over the data.

More broadly, applications that run on YARN need a variety of data-processing tasks, such as data shuffle, group communication, aggregation, and checkpointing. Rather than reimplementing these for each application, REEF aims to provide them as a library, so that they can be reused by higher-level applications and tuned for a specific domain, e.g., machine learning.

In that sense, our long-term vision is that REEF will mature into a Big Data Application Server that hosts a variety of toolkits and applications on modern resource managers.

How can I get started?

The official home for the REEF (and Tang and Wake) source code is at the Apache Software Foundation. You can check out the current code via:

$ git clone git://git.apache.org/reef.git

or directly access its GitHub page here.

Detailed information about REEF and how to use it can be found in the FAQ and the Tutorial.

If you wish to contribute, start at the Contributing tutorial or the Committer Guide!

Further questions?

Please visit our Frequently Asked Questions page or use our Mailing List to send us a question!