Local AI Infrastructure Notes (14/15) — Nano Banana 2 Real-World Cost

Three Token Types That Create the Gap Between the $0.067 Price Sheet and the $0.08 Invoice

Key Summary

Official price sheet: $0.067 per 1K-resolution image. Measured invoice: about $0.08/image.
The primary driver of the gap is thinking tokens. Budgeting at $0.067 instead of $0.08 underestimates the cost by ~16% ($0.013 / $0.08).
Back-calculation over a 347-image batch gives $0.075/image. That is clearly above the $0.067 fixed-cost assumption, and the per-image figure moves with thinking-length variance.
This post covers the token billing structure of Gemini 3.1 Flash Image, a redesigned budget formula, and three pipeline-level traps.

Problem Definition — Price Sheet and Invoice Do Not Match

The blog image auto-generation pipeline uses Gemini 3.1 Flash Image (Nano Banana 2). After a batch run across 121 Korean blog directories — 347 images total — the invoice came to 36,400 KRW (approximately $26 USD).

Applying the commonly cited unit price of $0.067 per 1K-resolution image, 347 images should cost roughly $23. The actual charge exceeded that by about $3, a ~12% gap. In a one-off batch this is dismissible; at monthly scale or across larger batches, it becomes a systematic forecasting error.
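The arithmetic behind this gap can be reproduced in a few lines. This is a minimal sketch that uses only the figures quoted above; the variable names are illustrative, and the $26 invoice is the rounded USD conversion stated in the post, so the results are approximate.

```python
# Figures taken from the batch described above.
IMAGE_COUNT = 347
SHEET_PRICE_USD = 0.067   # price-sheet unit price per 1K-resolution image
INVOICE_USD = 26.0        # 36,400 KRW, rounded USD conversion from the post

forecast_usd = IMAGE_COUNT * SHEET_PRICE_USD      # what the price sheet predicts
gap_usd = INVOICE_USD - forecast_usd              # invoice minus forecast
measured_unit_usd = INVOICE_USD / IMAGE_COUNT     # back-calculated unit cost

print(f"forecast:      ${forecast_usd:.2f}")
print(f"gap:           ${gap_usd:.2f} ({gap_usd / forecast_usd:.0%} over forecast)")
print(f"measured unit: ${measured_unit_usd:.3f}/image")
```

Running the same three lines after every batch gives a running record of how far the forecast coefficient drifts from the invoice.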

The root cause is not a missing line item in the price sheet. A single summary figure hides two of the three per-request token channels that are actually billed.

How It Works — Three Token Channels Billed Per Request

Gemini 3.1 Flash Image consumes tokens across three channels per request.

Channel	Unit Price	Per request (one 1K image)
Input (text/image)	$0.50 / 1M tokens	~200 tokens → $0.0001
Output (text + thinking)	$3.00 / 1M tokens	~500 tokens → $0.0015
Output (image)	$60.00 / 1M tokens	~1,300 tokens → $0.078

The critical entry is the middle row: thinking tokens. Before generating an image, the model performs internal reasoning, and that reasoning is counted as output tokens. At an average of ~500 tokens and a unit price of $3/1M, this adds approximately $0.0015 per image.

Summing all three channels, the measured unit cost is image tokens ($0.067–$0.078) plus roughly $0.0016 for thinking and input, which comes to approximately $0.08/image at the top of the range. The back-calculated figure of $0.075/image from the 347-image batch falls within this range, consistent with a mix that included images with shorter thinking sequences.
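The per-request sum can be written out explicitly. The sketch below uses the unit prices and typical token counts from the table above; `TokenUsage` and `request_cost_usd` are hypothetical names introduced for illustration and are not part of any SDK.

```python
from dataclasses import dataclass

# Unit prices in USD per 1M tokens, copied from the table above.
PRICE_INPUT = 0.50
PRICE_TEXT_OUTPUT = 3.00    # text + thinking
PRICE_IMAGE_OUTPUT = 60.00


@dataclass
class TokenUsage:
    """Hypothetical container for one request's token counts."""
    input_tokens: int
    thinking_tokens: int
    image_tokens: int


def request_cost_usd(usage: TokenUsage) -> float:
    """Sum the three billed channels for a single request."""
    total_micro_usd = (
        usage.input_tokens * PRICE_INPUT
        + usage.thinking_tokens * PRICE_TEXT_OUTPUT
        + usage.image_tokens * PRICE_IMAGE_OUTPUT
    )
    return total_micro_usd / 1_000_000


# Typical counts from the table: ~200 input, ~500 thinking, ~1,300 image tokens.
typical = TokenUsage(input_tokens=200, thinking_tokens=500, image_tokens=1_300)
print(f"${request_cost_usd(typical):.4f} per image")  # prints "$0.0796 per image"
```

Keeping the three channels as separate terms makes it easy to see which one moves when a batch comes in above or below forecast.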

Redesigned Budget Formula

Old formula: count × $0.067
Revised formula: count × $0.08

A single coefficient change, but it propagates through downstream operating parameters.

Single-batch ceiling at $30 → cap adjusted to approximately 375 images.
For simultaneous KR + EN publication, English posts reuse KR-generated images — additional image cost: $0.
5 new posts/week, average 30 images/post = 150 images → approximately $12/week.

The difference between $10/week (150 images at the price-sheet assumption) and $12/week (150 images at the measured baseline) is less a budget concern than a forecasting accuracy problem. A wrong coefficient produces recurring invoice variance; accumulated over time, that variance distorts batch-scaling decisions.
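The operating parameters above follow directly from the coefficient. A minimal sketch, assuming the $30 single-batch ceiling and the 5 posts × 30 images weekly volume from the list above; the function names are illustrative.

```python
import math

UNIT_COST_USD = 0.08      # measured operating coefficient
SHEET_PRICE_USD = 0.067   # price-sheet figure, kept for comparison only


def batch_cap(budget_usd: float, unit_cost_usd: float = UNIT_COST_USD) -> int:
    """Largest image count that stays within a single-batch budget."""
    # round() guards against float noise such as 374.99999999999994
    return math.floor(round(budget_usd / unit_cost_usd, 6))


def weekly_cost_usd(posts_per_week: int, images_per_post: int,
                    unit_cost_usd: float = UNIT_COST_USD) -> float:
    """Forecast weekly image spend for a given publishing volume."""
    return posts_per_week * images_per_post * unit_cost_usd


print(batch_cap(30))                                      # image cap for a $30 batch
print(f"{weekly_cost_usd(5, 30):.2f}")                    # measured coefficient
print(f"{weekly_cost_usd(5, 30, SHEET_PRICE_USD):.2f}")   # price-sheet coefficient
```

Passing the coefficient as a parameter rather than hard-coding it means a later re-measurement only changes one constant.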

Pipeline Considerations — Three Traps Beyond Cost

Three operational rules must be locked in at the pipeline level alongside the cost formula.

1. Korean-language Prompts Break In-Image Text Rendering

Passing Korean-language prompts to gemini-3.1-flash-image-preview produces garbled text within the generated image. A language-translation step (KR → EN) is a required preprocessing stage in the pipeline and cannot be skipped.
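One way to make the translation stage impossible to skip is a guard that refuses to pass Hangul through to the image model. This is a sketch under assumptions: `translate` is a hypothetical callable supplied by the pipeline (no particular translation service is implied), and the check covers only the main Hangul Unicode blocks.

```python
import re

# Hangul Syllables, Hangul Jamo, and Hangul Compatibility Jamo blocks.
_HANGUL = re.compile(r"[\uac00-\ud7a3\u1100-\u11ff\u3130-\u318f]")


def contains_hangul(text: str) -> bool:
    """Return True if the text contains any Korean characters."""
    return _HANGUL.search(text) is not None


def prepare_prompt(prompt: str, translate) -> str:
    """Return an English prompt, translating only when Hangul is present.

    `translate` is a hypothetical str -> str callable provided by the caller.
    """
    if not contains_hangul(prompt):
        return prompt
    translated = translate(prompt)
    # Fail loudly instead of sending a partially translated prompt.
    if contains_hangul(translated):
        raise ValueError("prompt still contains Korean text after translation")
    return translated
```

Raising on leftover Hangul, rather than silently continuing, keeps a failed translation from turning into a paid request with garbled in-image text.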

2. The Image Generation API Is `generate_content` — Not `generate_images`

Image generation uses the generate_content endpoint. generate_images is a valid Google API endpoint but targets Imagen, not Gemini Flash Image. Calling the wrong endpoint does not raise an error — it returns a different response modality silently. Additionally, response_modalities=['IMAGE'] must be explicitly set; without it, no image is returned.
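A call along these lines illustrates the rule. It is a sketch that assumes the `google-genai` Python SDK, an API key available in the environment, and the usual response shape of candidates containing parts with `inline_data`; the model id is the one named above, and `extract_image_bytes` and `generate_image` are illustrative helper names.

```python
def extract_image_bytes(response) -> list[bytes]:
    """Collect raw image bytes from the parts of a generate_content response."""
    images = []
    for candidate in response.candidates or []:
        content = candidate.content
        parts = content.parts if content and content.parts else []
        for part in parts:
            inline = getattr(part, "inline_data", None)
            if inline is not None and inline.data:
                images.append(inline.data)
    return images


def generate_image(prompt: str,
                   model: str = "gemini-3.1-flash-image-preview") -> list[bytes]:
    """Request an image via generate_content with the IMAGE modality set."""
    # Imported lazily so extract_image_bytes works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # assumes the API key is set in the environment
    response = client.models.generate_content(
        model=model,
        contents=prompt,
        config=types.GenerateContentConfig(response_modalities=["IMAGE"]),
    )
    images = extract_image_bytes(response)
    if not images:
        # Turn a silent "no image" response into an explicit failure.
        raise RuntimeError("response contained no image parts")
    return images
```

Checking for an empty result explicitly means a misconfigured modality surfaces as an exception in the pipeline instead of a missing file downstream.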

3. SynthID Watermark Cannot Be Disabled

Generated images automatically embed a SynthID watermark that cannot be turned off. It is not visible to the eye and is embedded in the image pixels themselves rather than in removable metadata. Commercial use is permitted, but the fact that AI-generation traces persist in the image is an input variable for any distribution-channel decision.

Constraints and Scope

Fixed unit cost: $0.08/image is the operating coefficient. $0.067 is retained for first-citation reference only.
Batch ceiling: 375 images per single run. Split into multiple runs if exceeded.
Debugging order: when the invoice diverges from the forecast — ① formula → ② API token usage fields → ③ invoice. In LLM billing, the default suspect is the formula, not the invoice.
Scope: this formula applies specifically to Gemini 3.1 Flash Image. Other multimodal image models differ in whether thinking tokens exist and how billing is structured; empirical measurement via token usage fields is required before applying any assumption.
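The batch-ceiling rule in the list above can be sketched as a simple chunking helper. The names `split_into_runs` and `jobs` are hypothetical; the ceiling of 375 comes from the $30 cap at $0.08/image.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")
BATCH_CEILING = 375  # images per run, from the rule above


def split_into_runs(items: Sequence[T],
                    ceiling: int = BATCH_CEILING) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `ceiling` items each."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    for start in range(0, len(items), ceiling):
        yield items[start:start + ceiling]


jobs = [f"image-{i}" for i in range(800)]  # hypothetical job identifiers
print([len(run) for run in split_into_runs(jobs)])  # prints [375, 375, 50]
```

Each run can then be invoiced and back-calculated separately, which keeps the forecast check at the same granularity as the budget ceiling.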

Open Questions

Can thinking token length be reduced through prompt design? (Prompt simplification vs. quality trade-off requires empirical measurement.)
How far can a caching layer that increases image-reuse rates push the cost curve down?
What distribution-channel policy constraints does SynthID watermark persistence create?

LLM-based image generation costs cannot be meaningfully summarized as a single number. Surfacing all per-request token channels and back-calculating from actual invoices is the minimum-cost path to accurate forecasting.



MaJu Tech Notes