Google just released a new AI agent for data scientists on Colab, and it’s free to use


published 4 March 2025

Google Labs has made Data Science Agent available to all Colab users in a bid to help automate data analysis processes.

Google Colab is a free, cloud-hosted Jupyter Notebook where users can write and execute Python code.

According to the tech giant, the Data Science Agent can help users cut research and data analysis times from weeks to minutes by removing the need for manual setup tasks such as importing libraries, loading data, and writing boilerplate code.

"In fact, a scientist at Lawrence Berkeley National Laboratory working on a global tropical wetland methane emissions project estimated Google’s Data Science Agent reduced their analysis and processing time from one week to five minutes," said the firm.

Data Science Agent is powered by Gemini 2.0 and lets users describe the analysis they want to see in plain language. It then generates the necessary code, library imports, and analysis in a working Colab notebook.

Rather than isolated code snippets, Data Science Agent produces complete, executable Colab notebooks, Google said. Users can modify the results, customizing and extending the generated code.

"We’ve heard awesome feedback from trusted testers like 'Data Science Agent generated concise and high-quality code, effectively corrected errors, and proved to be user-friendly' and 'I've already started exploring Data Science Agent and think it's a great product. This access will greatly help me streamline my data workflows and uncover valuable insights'," Google said.

Under the hood of Google's Data Science Agent

Data Science Agent works by orchestrating a composite flow that mimics a typical data scientist’s workflow, using the large language model (LLM) for task decomposition and planning.

A flow consists of individual atomic flows or subflows that specialize in concrete data science tasks such as data cleaning, data exploration, and data plotting.

Each atomic flow is a sequence of individual steps, using code execution and execution output feedback to complete a subtask and communicate downstream.

Combining natural-language-to-code generation, the planning and reasoning capabilities of LLMs, and code execution enables agentic capabilities such as self-refinement, error correction, and summarization, Google said.
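Google has not published the agent's internals, so the following is only a rough sketch of how a composite flow of atomic flows with execution feedback might be structured. Every name here (`AtomicFlow`, `run_composite`, the step functions, the `last_error` key) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

State = dict  # shared context handed from one flow to the next


@dataclass
class AtomicFlow:
    """One specialized subtask, made up of sequential steps (hypothetical)."""
    name: str
    steps: list[Callable[[State], State]]
    max_attempts: int = 2

    def run(self, state: State) -> State:
        for step in self.steps:
            for attempt in range(1, self.max_attempts + 1):
                try:
                    state = step(state)
                    break
                except Exception as err:
                    # Execution feedback: record the failure so the retry can see it
                    state = {**state, "last_error": repr(err)}
                    if attempt == self.max_attempts:
                        raise
        # Communicate downstream which subtasks have finished
        return {**state, "completed": state.get("completed", []) + [self.name]}


def clean(state: State) -> State:
    return {**state, "rows": [r for r in state["rows"] if r is not None]}


def explore(state: State) -> State:
    rows = state["rows"]
    return {**state, "mean": sum(rows) / len(rows)}


def flaky_plot(state: State) -> State:
    # Fails on the first attempt, then succeeds once error feedback is present
    if "last_error" not in state:
        raise ValueError("no axis label")
    return {**state, "plot": f"bar chart of {len(state['rows'])} rows"}


def run_composite(flows: list[AtomicFlow], state: State) -> State:
    for flow in flows:
        state = flow.run(state)
    return state


plan = [
    AtomicFlow("data_cleaning", [clean]),
    AtomicFlow("data_exploration", [explore]),
    AtomicFlow("data_plotting", [flaky_plot]),
]
result = run_composite(plan, {"rows": [4, None, 6, 8]})
print(result["completed"], result["mean"], result["plot"])
```

In a real agent, the plan would come from the LLM and the retry would involve regenerating code in response to the error, rather than calling the same function again.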

Notebooks are generated dynamically by AI and include generated code, code outputs such as plots, and text cells. They can be tailored by specifying the libraries, visualization types, algorithms, or evaluation metrics using natural language in the task description.

Users can also generate code for handling missing values, outliers, inconsistencies, and formatting issues, and create visualizations and summary statistics to understand the distribution, relationships, and characteristics of their data.
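As an illustration of the kind of cleaning code such a request might produce, here is a minimal pandas sketch. The data, column names, and the choices of median imputation and the 1.5 × IQR clipping rule are all assumptions made for this example, not output from the agent.

```python
import pandas as pd

# Made-up sample data with one missing value and one obvious outlier
df = pd.DataFrame({
    "site": ["A", "B", "C", "D", "E"],
    "reading": [10.0, 12.0, None, 11.0, 250.0],
})

# Missing values: fill with the column median
df["reading"] = df["reading"].fillna(df["reading"].median())

# Outliers: clip anything beyond 1.5 * IQR from the quartiles
q1, q3 = df["reading"].quantile([0.25, 0.75])
iqr = q3 - q1
df["reading"] = df["reading"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Summary statistics to inspect the cleaned distribution
summary = df["reading"].describe()
print(summary)
```

Whether to clip, drop, or keep outliers depends on the dataset, which is the sort of preference a user could state in the task description.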

Similarly, they can implement hypothesis testing, correlation analysis, and other statistical techniques to draw meaningful insights from their data.

Meanwhile, the agent's predictive modeling support lets users build and evaluate machine learning models for regression or classification tasks, based on their data and objective.

Google said future plans for Data Science Agent include interactive elements for user feedback and customization within the notebook generation process and enhancing natural language understanding for more flexible input descriptions.

It also plans to support additional data types for more complex tasks, expand the range of supported data science tasks and algorithms, and allow larger file uploads.

Emma Woollacott is a freelance journalist writing for publications including the BBC, Private Eye, Forbes, Raconteur and specialist technology titles.