Speed Up Data Analytics on GPUs
Learn how to accelerate analytics for large data sets on Google Cloud by tapping into NVIDIA GPUs – no code changes required.
Optimize your analytics workflows with GPUs
Slow data processing is a common bottleneck, especially with the large datasets required to train modern AI models. This video introduces GPU-accelerated data analytics workflows on Google Cloud that let you transform large datasets quickly and reach insights sooner.
Get started with accelerated data analytics using NVIDIA cuDF
This hands-on lab demonstrates how to use NVIDIA cuDF within a Google Cloud Colab Enterprise environment to dramatically accelerate `pandas` data analytics workflows with zero code changes.
- Set up a cloud environment: Configure and connect to a GPU-accelerated runtime using Colab Enterprise runtime templates.
- Accelerate pandas instantly: Use NVIDIA's drop-in accelerators (cuDF for `pandas`, cuML for `scikit-learn`) to speed up existing code without modification (see the sketch after this list).
- Benchmark performance: Run a data analytics pipeline on both CPU and GPU to quantify and visualize the speedups.
- Integrate with cloud storage: Practice reading and writing large Parquet datasets at high speed directly to and from Google Cloud Storage (GCS).
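
The zero-code-change pattern at the heart of the lab can be sketched in a few lines. The snippet below is an illustration, not the lab notebook itself: the GCS bucket, file paths, and column names (`passenger_count`, `fare_amount`) are hypothetical placeholders, and reading `gs://` paths with pandas assumes the `gcsfs` package is installed.

```python
# Minimal sketch of the zero-code-change pattern.
#
# In a notebook, enable the accelerator BEFORE importing pandas:
#   %load_ext cudf.pandas
# In a plain script, the equivalent is:
import cudf.pandas
cudf.pandas.install()

import time
import pandas as pd  # now transparently backed by cuDF on the GPU

# Read a Parquet dataset straight from Google Cloud Storage.
# (Requires gcsfs; the bucket and path are placeholders.)
df = pd.read_parquet("gs://your-bucket/trips.parquet")

# The same pandas code you already have, now running on the GPU.
start = time.perf_counter()
summary = (
    df.groupby("passenger_count")["fare_amount"]
      .agg(["mean", "count"])
      .sort_values("mean", ascending=False)
)
print(summary.head())
print(f"groupby took {time.perf_counter() - start:.3f}s")

# Write the results back to GCS.
summary.to_parquet("gs://your-bucket/fare_summary.parquet")
```

To produce the CPU baseline for the benchmark step, run the same pipeline without loading the accelerator; recent cuDF releases also ship a `%%cudf.pandas.profile` cell magic that reports which operations ran on the GPU.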
Fixing pandas memory errors: 3 easy solutions
"Out of Memory" (OOM) errors are a common roadblock; they happen when pandas tries to load a file that consumes more RAM than your system has available. This blog covers three solutions to the issue: Swap Space is a simple fix that uses your slow hard drive as extra memory, which allows you to load the full dataset, but impacts performance. Sampling is a very fast, memory-light option for quick analysis, but it only uses a portion of your data, sacrificing full fidelity. NVIDIA cuDF with UVM uses the GPU and CPU RAM together to process the entire dataset at high speed without data loss.