Hurricane: kicking skew out of analytics

Jun 23, 2018

Hurricane is a high-performance, large-scale data analytics system that tames skew in novel ways. Hurricane performs adaptive work partitioning based on the load nodes observe at runtime: an overloaded node can spawn clones of its tasks at any point during their execution, with each clone processing a subset of the original data. This allows the system to adapt to load imbalance and dynamically adjust task parallelism to handle skew gracefully. The design is supported by spreading data across all nodes and allowing any node to retrieve it in a decentralized way. The result is that Hurricane automatically balances load across tasks, ensuring fast completion times.
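
To make the cloning idea concrete, here is a minimal sketch, in Python, of load-based task cloning. All names (Task, maybe_clone, the threshold knob) are illustrative assumptions, not Hurricane's API; the paper describes the actual cloning policy.

```python
from dataclasses import dataclass

@dataclass
class Task:
    lo: int   # first unprocessed record in this task's input range
    hi: int   # one past the last record

    def remaining(self) -> int:
        return self.hi - self.lo

def maybe_clone(task: Task, avg_remaining: float, threshold: float = 2.0):
    """Spawn a clone if this task lags far behind the cluster average.

    `threshold` is a hypothetical tuning knob. Because data is spread
    across all nodes and fetched in a decentralized way, the clone can
    pull its half of the input without a repartitioning step.
    """
    if task.remaining() > threshold * avg_remaining:
        mid = (task.lo + task.hi) // 2
        clone = Task(mid, task.hi)   # clone takes the second half
        task.hi = mid                # original keeps the first half
        return clone
    return None
```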

For more details, you can find the paper here.

Chaos: the paper that pushed the limits

Dec 21, 2015

Chaos is a system for analytics on big graphs using small clusters. It builds on the X-Stream single-machine graph processing system but scales out to multiple machines. Chaos treats the aggregate storage of all machines as a single flat disk and uses work stealing to balance the load across the nodes of the cluster. It exposes the familiar scatter-gather-apply programming model.
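
A minimal sketch of how partition-level work stealing might look over such a shared flat disk follows; the names and the random-victim policy are assumptions for illustration, not Chaos's implementation.

```python
import random
from collections import deque

class Node:
    def __init__(self, name, partitions):
        self.name = name
        self.queue = deque(partitions)   # partitions this node plans to process

    def steal_from(self, peers):
        """When idle, take a partition from a random busy peer. Because every
        machine sees the same aggregate 'flat disk', the thief can read the
        stolen partition's data directly from shared storage."""
        busy = [p for p in peers if p.queue]
        if not busy:
            return None
        return random.choice(busy).queue.pop()

def run(nodes):
    done = []
    while any(n.queue for n in nodes):
        for n in nodes:
            part = n.queue.popleft() if n.queue else n.steal_from(nodes)
            if part is not None:
                done.append((n.name, part))   # 'process' the partition
    return done

nodes = [Node("a", [0, 1, 2, 3]), Node("b", [4])]
print(run(nodes))   # node "b" steals from "a" once its own queue drains
```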

For more details, you can find the paper here.

X-Stream: the paper that started it all

Feb 3, 2015

X-Stream is a system for analytics on big graphs using small clusters of machines. X-Stream is based on the observation that sequential access to data is fastest on all storage media: main memory, SSD, and magnetic disk. It is therefore built around a storage-centric, streaming design: graph algorithms are restructured to stream data sequentially from storage.
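
As a rough illustration, here is a toy, in-memory version of an edge-centric scatter-gather iteration. The function names and the hop-count update rule are assumptions chosen for brevity; the point is that edges and updates are consumed as sequential streams, while only the (much smaller) vertex state is accessed randomly.

```python
def scatter(edges, state):
    """Stream every edge once, emitting one update per edge."""
    for src, dst in edges:
        yield dst, state[src]                # propagate the source's value

def gather(updates, state):
    """Stream the updates, folding each into its target vertex."""
    for dst, value in updates:
        state[dst] = min(state[dst], value + 1)   # e.g. a hop-count step

# One iteration on a toy graph; vertex 0 is the source.
edges = [(0, 1), (1, 2), (0, 2)]
state = {0: 0, 1: float("inf"), 2: float("inf")}
updates = list(scatter(edges, state))        # scatter phase: one edge stream
gather(updates, state)                       # gather phase: one update stream
print(state)                                 # {0: 0, 1: 1, 2: 1}
```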

For more details, you can find the paper here.