Speed Up Data Analytics on GPUs
Learn how to accelerate analytics for large data sets on Google Cloud by tapping into NVIDIA GPUs – no code changes required.
Optimize your analytics workflows with GPUs
Slow data processing is a common bottleneck, especially with the large datasets required to train modern AI models. This video introduces GPU-accelerated data analytics workflows on Google Cloud that let you transform large datasets quickly and reach insights sooner.
Get started with accelerated data analytics using NVIDIA cuDF
This hands-on lab demonstrates how to use NVIDIA cuDF within a Google Cloud Colab Enterprise environment to dramatically accelerate `pandas` data analytics workflows with zero code changes.
- Set up a cloud environment: Configure and connect to a GPU-accelerated runtime using Colab Enterprise runtime templates.
- Accelerate pandas instantly: Use NVIDIA's drop-in accelerators (cuDF for `pandas`, cuML for `scikit-learn`) to speed up existing code without modification (see the sketch after this list).
- Benchmark performance: Run a data analytics pipeline on both CPU and GPU to quantify and visualize the speedups.
- Integrate with cloud storage: Practice reading and writing large Parquet datasets at high speed directly to and from Google Cloud Storage (GCS).
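
The zero-code-change pattern at the heart of the lab can be sketched in a few lines. The snippet below is an illustration, not the lab notebook itself: the GCS bucket, file paths, and column names (`passenger_count`, `fare_amount`) are hypothetical placeholders, and reading `gs://` paths with pandas assumes the `gcsfs` package is installed.

```python
# Minimal sketch of the zero-code-change pattern.
#
# In a notebook, enable the accelerator BEFORE importing pandas:
#   %load_ext cudf.pandas
# In a plain script, the equivalent is:
import cudf.pandas
cudf.pandas.install()

import time
import pandas as pd  # now transparently backed by cuDF on the GPU

# Read a Parquet dataset straight from Google Cloud Storage.
# (Requires gcsfs; the bucket and path are placeholders.)
df = pd.read_parquet("gs://your-bucket/trips.parquet")

# The same pandas code you already have, now running on the GPU.
start = time.perf_counter()
summary = (
    df.groupby("passenger_count")["fare_amount"]
      .agg(["mean", "count"])
      .sort_values("mean", ascending=False)
)
print(summary.head())
print(f"groupby took {time.perf_counter() - start:.3f}s")

# Write the results back to GCS.
summary.to_parquet("gs://your-bucket/fare_summary.parquet")
```

To produce the CPU baseline for the benchmark step, run the same pipeline without loading the accelerator; recent cuDF releases also ship a `%%cudf.pandas.profile` cell magic that reports which operations ran on the GPU.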
Fixing pandas memory errors: 3 easy solutions
"Out of Memory" (OOM) errors are a common roadblock; they happen when pandas tries to load a file that consumes more RAM than your system has available. This blog covers three solutions to the issue: Swap Space is a simple fix that uses your slow hard drive as extra memory, which allows you to load the full dataset, but impacts performance. Sampling is a very fast, memory-light option for quick analysis, but it only uses a portion of your data, sacrificing full fidelity. NVIDIA cuDF with UVM uses the GPU and CPU RAM together to process the entire dataset at high speed without data loss.