BlazingSQL review: Fast ETL for GPU-based data science


BlazingSQL is a GPU-accelerated SQL engine built on top of the RAPIDS ecosystem. It lets standard SQL queries be distributed across GPU clusters and feeds the results directly into GPU-accelerated visualization and machine learning libraries. In short, BlazingSQL provides the ETL portion of an all-GPU data science workflow.

RAPIDS is a suite of open source software libraries and APIs, incubated by Nvidia, that uses CUDA and is based on the Apache Arrow columnar memory format. The cuDF library, part of RAPIDS, is a Pandas-like DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data on GPUs.
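To make the "Pandas-like" description concrete, here is a minimal sketch of a join, filter, and aggregate step. It is an illustration rather than code from BlazingSQL or the RAPIDS documentation: the sample tables, column names, and the `xdf` alias are all hypothetical, and the sketch falls back to Pandas on machines without cuDF, on the assumption that the two libraries share these particular calls.

```python
try:
    import cudf as xdf  # RAPIDS GPU DataFrame library (needs an Nvidia GPU)
except ImportError:
    import pandas as xdf  # CPU fallback so the sketch still runs anywhere

# Hypothetical sample data, invented for this illustration.
orders = xdf.DataFrame({
    "customer_id": [1, 2, 1, 3, 2],
    "amount": [20.0, 35.5, 12.5, 50.0, 4.5],
})
customers = xdf.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["west", "east", "west"],
})

# Join: attach each order to its customer's region.
joined = orders.merge(customers, on="customer_id", how="inner")

# Filter: keep only orders above a threshold.
large = joined[joined["amount"] > 10.0]

# Aggregate: total order amount per region.
totals = large.groupby("region")["amount"].sum().sort_index()

# Bring the (small) result back to host memory as a plain dict.
totals_host = totals.to_pandas() if hasattr(totals, "to_pandas") else totals
result = totals_host.to_dict()
print(result)
```

The same three steps could be written as a single SQL statement with a JOIN, a WHERE clause, and a GROUP BY, which is the kind of query BlazingSQL is described as running on top of this layer.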

For distributed SQL query execution, BlazingSQL draws on Dask, which is an open source tool that can scale Python packages to multiple machines. Dask can distribute data and computation over multiple GPUs, either in the same system or in a multi-node cluster. Dask integrates with RAPIDS cuDF, XGBoost, and RAPIDS cuML for…


