
BlazingSQL review: Fast ETL for GPU-based data science

March 18, 2021


BlazingSQL is a GPU-accelerated SQL engine built on top of the RAPIDS ecosystem. It lets standard SQL queries be distributed across GPU clusters and feeds the results directly into GPU-accelerated visualization and machine learning libraries. In short, BlazingSQL provides the ETL (extract, transform, load) portion of an all-GPU data science workflow.
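As a rough illustration of that ETL role, here is a minimal CPU-only sketch that uses Python's built-in sqlite3 module as a stand-in for the SQL engine. The function name `sql_etl`, the `trips` table, its columns, and the sample rows are all hypothetical. In BlazingSQL itself, the comparable steps would be registering data with `BlazingContext.create_table` and querying it with `BlazingContext.sql`, which returns a GPU-resident cuDF DataFrame rather than a list of tuples; this sketch does not reproduce that API.

```python
import sqlite3


def sql_etl(rows):
    """Hypothetical helper: load raw rows, then filter and aggregate them with SQL.

    sqlite3 is only a CPU stand-in for a GPU SQL engine such as BlazingSQL.
    """
    conn = sqlite3.connect(":memory:")
    # Load step: register the raw data as a table.
    conn.execute("CREATE TABLE trips (passenger_count INTEGER, fare REAL, distance REAL)")
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    # Transform step: drop bad records, derive a feature, aggregate per group.
    query = """
        SELECT passenger_count,
               AVG(fare / distance) AS fare_per_mile,
               COUNT(*) AS n_trips
        FROM trips
        WHERE distance > 0
        GROUP BY passenger_count
        ORDER BY passenger_count
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample data, not taken from the review.
raw = [
    (1, 10.0, 2.0),
    (1, 12.0, 4.0),
    (2, 30.0, 5.0),
    (2, 8.0, 0.0),  # zero distance: removed by the WHERE clause
]
features = sql_etl(raw)
print(features)  # [(1, 4.0, 2), (2, 6.0, 1)]
```

In an all-GPU workflow, the query result would stay in GPU memory and be handed to a library such as cuML; here it is simply a Python list that a CPU model could consume.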

RAPIDS is a suite of open source software libraries and APIs, incubated by Nvidia, that uses CUDA and is based on the Apache Arrow columnar memory format. cuDF, part of RAPIDS, is a Pandas-like DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data on GPUs.

For distributed SQL query execution, BlazingSQL draws on Dask, which is an open source tool that can scale Python packages to multiple machines. Dask can distribute data and computation over multiple GPUs, either in the same system or in a multi-node cluster. Dask integrates with RAPIDS cuDF, XGBoost, and RAPIDS cuML for…
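Dask's role in distributed query execution can be pictured with a simplified sketch of the pattern a distributed GROUP BY typically follows: each partition computes partial aggregates on its own, and a final step merges them. This is a conceptual, CPU-only stand-in written with the Python standard library, not Dask's or BlazingSQL's API. The function names are hypothetical, and a thread pool plays the part of the GPUs or cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(partition):
    """Per-partition step: aggregate only this slice into (sum, count) per key."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in partition:
        acc[key][0] += value
        acc[key][1] += 1
    return dict(acc)


def combine(partials):
    """Reduce step: merge the partial (sum, count) pairs, then finish the mean."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (s, c) in part.items():
            total[key][0] += s
            total[key][1] += c
    return {key: s / c for key, (s, c) in sorted(total.items())}


def distributed_mean(partitions, workers=2):
    """Hypothetical driver: threads stand in for GPUs; Dask would schedule tasks on a cluster."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    return combine(partials)


# Hypothetical data split into three unequal partitions.
partitions = [
    [("a", 1.0), ("b", 4.0)],
    [("a", 3.0)],
    [("b", 2.0), ("b", 3.0)],
]
print(distributed_mean(partitions))  # {'a': 2.0, 'b': 3.0}
```

The partitions exchange (sum, count) pairs rather than their own means because averaging per-partition means is wrong when partitions differ in size: for key "b" the partition means are 4.0 and 2.5, whose average is 3.25, while the true mean is 3.0.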


