AI

BigQuery’s AI.AGG() Aggregates Unstructured Data in One SQL Line

BigQuery’s AI.AGG() preview adds natural-language aggregation to SQL. Watch the token-cost amplification, NULL-struct edge cases, and a disabled daily quota.

Published

7 hours ago

June 30, 2026

Logan Pierce

Google put AI.AGG() into preview for BigQuery on May 18, 2026, a new function that takes a natural-language instruction and runs it as an aggregate over millions of rows of unstructured or multimodal data inside a single SQL call. The function sits beside BigQuery’s existing managed AI lineup (AI.CLASSIFY(), AI.IF(), AI.SCORE(), and the general AI.GENERATE()) and ships as the first one designed to summarise across rows rather than score them one at a time.

The official walkthrough pairs the function with two demos (an Apache Spark log dump and a pet supply catalogue), but the operational details sit beneath the demos. AI.AGG() uses a multi-level batching pipeline that can raise token spend above the raw row count. It drops entire rows on a single NULL field inside a STRUCT. And it lands in a BigQuery environment whose daily token-quota control was turned off a week after it went generally available.

What AI.AGG() Does in One SQL Line

AI.AGG() accepts a natural-language instruction alongside a column of unstructured data and returns a single summarised value for each group, much like SUM() returns a single number per group. Google’s deep dive gives three sample questions the function is built to answer:

Customer feedback themes. The top three feature requests hidden in negative product reviews.
Operational signals. The most frequent user errors and where to start investigating them.
Agent reliability. The scenarios where an automated agent consistently fails to resolve customer issues.

Image inputs work too, via external object tables over Cloud Storage buckets. The pet supply demo pairs AI.AGG() with AI.CLASSIFY() to turn unstructured product descriptions into a structured category column that can be grouped, counted, and re-summarised in standard SQL. Output is always a string, even when the prompt asks for JSON or Markdown. The database engine does not enforce the structure the model returns, and LLM responses can shift between runs.

Other managed AI functions in BigQuery (AI.IF(), AI.SCORE(), AI.GENERATE_TEXT() under the AI.GENERATE() umbrella) cover scalar and row-level work, per the BigQuery generative AI overview. AI.AGG() is the first in the lineup that aggregates across rows with natural-language instructions. The docs list it as the latest entry in the managed AI family.

For BigQuery users, the preview is the first chance to ask natural-language questions of structured and unstructured data in the same SQL script. The accompanying docs page lists AI.AGG() alongside AI.CLASSIFY(), AI.IF(), AI.SCORE(), and AI.GENERATE() under the umbrella of managed AI functions. The function is available in preview to all BigQuery users per Google’s deep dive.

BigQuery AI.AGG function natural-language SQL aggregation

How AI.AGG() Handles LLM Context Windows

Large language models cannot accept millions of rows in a single forward pass, so AI.AGG() solves the context-window problem by splitting input rows into batches, summarising each batch, then aggregating those intermediate summaries into a final answer. The full AI.AGG() function walkthrough spells out the rule: the function ‘automatically divid[es] your input rows into batches, aggregating those batches, and then aggregating the results of those batches into a final answer.’ The two-step summarise-then-aggregate pattern is the function’s core trick. It lets a SQL script handle tables far larger than any single model call could swallow. Each batch summary is itself a model call, which is what makes the token math interesting later.

The pipeline never splits a single row across batches. If one row is too large for the model’s context window, the function skips it rather than truncating it. Google’s deep dive tells teams to keep each individual row smaller than the context window, since ‘Many smaller rows will give AI.AGG() more flexibility with how to batch each row.’

Model selection is explicit. With no endpoint set, AI.AGG() defaults to a recent model. Production pipelines can pass a short-form endpoint such as gemini-2.5-flash (resolved in the query execution region) or a fully-qualified name when the desired model lives outside that region. The deep dive frames explicit selection as the safer choice for production, since pinning the model removes drift across runs. Defaulting to ‘a recent model’ means a silent model upgrade can change results without anyone touching the query.

Preview since: May 18, 2026 (BigQuery release notes).
Inputs: text, references to text files, image data, and arrays of those types.
Outputs: always a string, even when the prompt asks for JSON or Markdown.
Multimodal output: not supported.
Default model: a recent Gemini if no endpoint is set.

Token Costs Stack, and the Quota Is Off

The multi-level pipeline that solves the context-window problem also raises the bill. Google warns that ‘the total input tokens sent to the model may be higher than the raw tokens in your starting table,’ because the same content is summarised once per batch and again at the final stage. The mitigation is the same one teams would write for any LLM call: pre-filter upstream and add a LIMIT clause before AI.AGG() runs.

Cost control became more urgent a few weeks later. The BigQuery release notes page shows daily token quotas for BigQuery’s generative AI functions reaching general availability on June 8, 2026. The same notes then show the feature ‘temporarily disabled’ on June 15, 2026. Google states it is ‘working to restore this feature as soon as possible.’

AI.AGG() is exactly the function a daily token cap would catch first, since its multi-level batching is the path that pushes tokens above the raw row count. Without the quota, the only safeguard is upstream filtering. Teams wiring the function into production pipelines will want to monitor job statistics the same way they watch slot usage. BigQuery exposes per-job model invocation counts in the same job statistics screen used for scalar AI functions. The Google team did not list a fix date for the disabled quota in the June 15 entry.

Until the quota returns, AI.AGG() sits in a strange spot. It is the most token-intensive managed AI function in BigQuery, by design. And it is the one running without the cost-control knob the rest of the managed AI lineup got for two weeks in June. Engineers adopting the function will need to enforce their own pre-filtering discipline, log the failed-row counts from BigQuery job statistics, and pin a specific model endpoint in production queries so a silent model upgrade does not change results.

NULLs, Errors, and the Silent Row Drop

AI.AGG() skips NULL input rows without processing them. The less obvious rule is what happens to structured inputs: AI.AGG() concatenates STRUCT fields in the same way the standard CONCAT() function does. So a single NULL field inside a STRUCT turns the entire row into NULL and drops it. The deep dive recommends IFNULL() fallbacks around any STRUCT field that might be missing. The result is the same: ‘guaranteeing that every product is taken into account even if its description is blank.’

On errors, AI.AGG() returns partial results. Rows the model rejects are excluded from the final output, and BigQuery job statistics show the count of failed rows for inspection. Outputs that depend on the model’s structured format are not deterministic across runs, which means the JSON returned by a categorisation prompt can shift between executions.

The combination matters. A team running AI.AGG() over a STRUCT with sparse fields can lose rows without any warning, then see the model return confidently on the rows that survived. Best practice is to scan the table for NULLs in any column AI.AGG() will pass through a STRUCT, and wrap those columns with IFNULL() fallbacks. Without that discipline, the function can quietly return a summary that looks reasonable but excludes a meaningful slice of the catalogue.

NULL input rows are skipped silently. No warning, no error.
One NULL field in a STRUCT drops the entire row. The whole struct is treated as NULL.
Wrap nullable fields in IFNULL(). Guarantees the row enters the aggregation.
AI.AGG() returns partial results on errors. Invalid input or model failures do not abort the query.
Failed row counts live in BigQuery job statistics. Same place scalar AI functions report.

What Google’s Two Demos Actually Show

The first demo is a public Apache Spark standard INFO log dataset from Loghub, a research collection of log files from distributed systems. Spark clusters can run into memory thrashing, clock drift, or broadcast bottlenecks without ever throwing a FATAL error, so the same logs that look clean on a dashboard can hide expensive slowdowns. The query groups logs by component and asks AI.AGG() to surface what looks normal alongside anything anomalous. The prompt explicitly tells the model it is allowed to say ‘everything is fine,’ which Google’s deep dive credits with preventing hallucinated errors. The demonstration runs on a public dataset, so teams can reproduce the same query in their own BigQuery projects to see how AI.AGG() ranks latent inefficiencies alongside clean runs.

The second demo uses cymbal_pets, a fictional pet supply shop in BigQuery’s public datasets. AI.AGG() first reads raw product names and descriptions to discover the categories in the catalogue, then returns those categories in JSON format. That JSON is passed to AI.CLASSIFY() to label every product. The labelled table is grouped by category for a final per-group AI.AGG() summary alongside conventional SQL row counts.

The same project also includes a Cloud Storage bucket of product photos, which the demo passes through an external object table. Because AI.AGG() supports image inputs natively, the function returns an aggregated summary of the visual content of the entire collection. That is the same shape of call that turns a folder of product images into a category description.

Dataset	What AI.AGG() does	Notes from the demo
Apache Spark INFO logs from Loghub	Groups by component and summarises normal operations alongside anomalies	Prompt tells the model it can say ‘everything is fine’ to limit hallucinated errors
cymbal_pets product catalogue	Discovers categories, labels products with AI.CLASSIFY(), then summarises per category	Combines AI.AGG() output with conventional SQL aggregations such as row counts

Frequently Asked Questions

When did AI.AGG() become available in BigQuery?

Google’s BigQuery release notes list AI.AGG() as a new feature in Preview, dated May 18, 2026. The same release notes entry describes the function as semantic aggregation over unstructured input data driven by natural language instructions.

Does AI.AGG() replace traditional SQL aggregates like SUM() and COUNT()?

No. AI.AGG() is an aggregate function, but instead of summing numbers, it summarises unstructured text and image data with a natural-language prompt. The pet supply demo pairs the function with conventional SQL aggregations, returning per-group row counts and AI-generated summaries side by side.

What happens if a single input row is too large for the model’s context window?

If a row outgrows the model’s context window, AI.AGG() drops it from the batch, because the multi-level pipeline batches whole rows and never splits them. The deep dive’s rule is to size rows to fit, since smaller rows let the function pick better batches.

Can I cap token spending on AI.AGG()?

Not at the moment. BigQuery release notes show daily token quotas for generative AI functions went generally available on June 8, 2026, then show the same feature temporarily disabled on June 15, 2026, with no restore date listed. Until it returns, the only built-in controls are upstream filtering and LIMIT clauses before AI.AGG() runs.

Does AI.AGG() work on images?

Yes. The function supports image inputs through external object tables over Cloud Storage buckets, and the pet supply demo aggregates a folder of product photos into a category summary. Multimodal output, such as generated images, is not supported, and the return value is always a string.