Local AI Infrastructure Notes (13/15) — Auto-Generating Blog Images with Nano Banana 2

Implementation notes on integrating Gemini 3.1 Flash Image (Nano Banana 2) with the Blogger API to auto-generate and insert images into blog posts.


Key Takeaways

  • Nano Banana 2 (Gemini 3.1 Flash Image) generates images via the generate_content API — a different call structure from Imagen 4.
  • The pipeline parses blog HTML to automatically determine per-section image insertion positions.
  • Actual cost including thinking tokens is ~$0.08/img, higher than the documented reference price of $0.067/img.

Background

The Blogger API auto-publish pipeline was already in place. The goal was to add image generation as a pipeline stage. Text-only posts hurt readability, and manual image creation becomes inefficient at scale. The solution: insert an image generation step into the existing API pipeline.


Body

1. Model Selection — Imagen 4 vs Nano Banana 2

Imagen 4 (imagen-4.0-generate-001) was evaluated first. It uses the generate_images API at approximately $0.02/img. However, text rendering inside images was frequently corrupted, and diagram expressiveness was limited.

Nano Banana 2 (Gemini 3.1 Flash Image, gemini-3.1-flash-image-preview) uses the generate_content API. Its SDK call structure differs from the Imagen family, and it consumes thinking tokens alongside output tokens — a factor that materially affects cost.

Item                             | Imagen 4                | Nano Banana 2
Model ID                         | imagen-4.0-generate-001 | gemini-3.1-flash-image-preview
API                              | generate_images         | generate_content
Reference price (docs)           | ~$0.02/img              | ~$0.067/img
Actual price (thinking included) | -                       | ~$0.08/img
Text rendering                   | Frequent corruption     | Clean English text
Image quality                    | Simple illustration     | Precise diagrams

Cost note: The documented $0.067 does not account for thinking token overhead. With thinking tokens included, the measured cost is $0.08/img. Use $0.08 as the planning figure for bulk generation.
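The gap between the two prices is about 19% ((0.08 - 0.067) / 0.067). As a rough illustration of what that does to a bulk run, here is a minimal budgeting sketch. The function name and the default of 3 images per post (the per-post cap used later in these notes) are assumptions made for the example, not part of the original pipeline.

```python
DOCUMENTED_PRICE = 0.067  # USD per image, reference price from the docs
MEASURED_PRICE = 0.08     # USD per image, measured with thinking tokens included


def estimate_cost(posts: int, images_per_post: int = 3,
                  price: float = MEASURED_PRICE) -> float:
    """Estimate the USD cost of generating images for a batch of posts."""
    return round(posts * images_per_post * price, 2)


if __name__ == "__main__":
    planned = estimate_cost(100)
    documented = estimate_cost(100, price=DOCUMENTED_PRICE)
    print(f"planning figure: ${planned:.2f}, documented figure: ${documented:.2f}")
```

For 100 posts at 3 images each, the planning figure is $24.00 against $20.10 at the documented price.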


2. API Call Structure

Nano Banana 2 uses generate_content, and its response structure differs from Imagen 4.

import io

from google import genai
from google.genai import types
from PIL import Image

# generate_content and generate_images both live on client.models
# in the google-genai SDK.
client = genai.Client(api_key=API_KEY)

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=prompt,
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE", "TEXT"]
    ),
)

# Image parts carry inline_data; text parts do not.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        # inline_data.data is already raw image bytes, so no base64 decoding is needed.
        img = Image.open(io.BytesIO(part.inline_data.data))

Imagen 4's generate_images retrieves the image from response.generated_images. Nano Banana 2 requires iterating candidates[0].content.parts and extracting parts that carry inline_data. When using both models in the same codebase, image extraction logic must be branched per model.
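The per-model branching could be sketched as a single helper. This is an illustration only: the function name and the family labels are hypothetical, and the attribute paths (generated_images[0].image.image_bytes for Imagen, inline_data.data as raw bytes for Gemini) are assumptions based on the google-genai SDK's response objects.

```python
def extract_image_bytes(response, family: str) -> bytes:
    """Return the first image's raw bytes from a response of the given model family."""
    if family == "imagen":
        # generate_images: images are listed under response.generated_images.
        return response.generated_images[0].image.image_bytes
    if family == "gemini":
        # generate_content: scan the parts for the first one carrying inline_data.
        for part in response.candidates[0].content.parts:
            inline = getattr(part, "inline_data", None)
            if inline is not None and inline.data:
                return inline.data
        raise ValueError("no image part found in response")
    raise ValueError(f"unknown model family: {family}")
```

Keeping the branch in one place means the rest of the pipeline only ever handles bytes, whichever model produced them.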


3. Prompt Engineering — The Korean Text Trap

Passing Korean blog titles directly into prompts causes Korean characters to render corrupted inside the generated image. In some cases, meaningless character strings resembling CSS code appeared in the output.

Resolution strategy:

  1. translate_title_to_concept maps Korean technical terms to English visual concepts:
       • "에이전트" → "AI agent"
       • "데이터 흐름" → "data flow"
       • "온톨로지" → "ontology"

  2. strip_korean forcibly removes any remaining Korean characters after the mapping step.

  3. Explicit language constraint in the prompt: All text must be in English only. No Korean, no Japanese, no Chinese characters.

With the two-pass removal (mapping, then stripping) and the prompt constraint applied, the probability of Korean characters appearing in the image approaches zero.
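The measures above can be sketched as follows. Only the function names translate_title_to_concept and strip_korean, the three mapping entries, and the constraint sentence come from these notes. The function bodies, the Hangul ranges in the regular expression, the build_prompt helper, and its prompt wording are assumptions made for illustration.

```python
import re

# Mapping table; the real table is presumably larger than these three entries.
CONCEPT_MAP = {
    "에이전트": "AI agent",
    "데이터 흐름": "data flow",
    "온톨로지": "ontology",
}

# Hangul syllables, Hangul Jamo, and Hangul compatibility Jamo.
HANGUL_RE = re.compile(r"[\uac00-\ud7a3\u1100-\u11ff\u3130-\u318f]+")

LANGUAGE_CONSTRAINT = (
    "All text must be in English only. "
    "No Korean, no Japanese, no Chinese characters."
)


def translate_title_to_concept(title: str) -> str:
    """First pass: replace known Korean terms with English visual concepts."""
    # Longer keys go first so multi-word terms win over shorter substrings.
    for ko in sorted(CONCEPT_MAP, key=len, reverse=True):
        title = title.replace(ko, CONCEPT_MAP[ko])
    return title


def strip_korean(text: str) -> str:
    """Second pass: drop any Hangul left over and collapse whitespace."""
    return " ".join(HANGUL_RE.sub(" ", text).split())


def build_prompt(title: str) -> str:
    """Hypothetical helper: combine both passes with the language constraint."""
    concept = strip_korean(translate_title_to_concept(title))
    return f"Technical illustration of: {concept}. {LANGUAGE_CONSTRAINT}"
```

A title such as "에이전트 메모리 데이터 흐름" becomes "AI agent data flow": the unmapped term "메모리" survives the first pass and is removed by the second.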

Automatic visual theme detection:

Keywords extracted from section content are injected as visual hints into the prompt.

def build_visual_hint(section_text: str) -> str:
    """Pick a visual hint for the prompt from keywords in the section text."""
    text = section_text.lower()  # match English keywords regardless of case
    # "파이프라인" = pipeline, "아키텍처" = architecture
    if "flow" in text or "파이프라인" in text:
        return "Connected nodes with data flowing between components"
    if "architecture" in text or "아키텍처" in text:
        return "Modular system architecture with interconnected blocks"
    return "Clean technical diagram on white background"

4. Structural Analysis — Automatic Image Insertion Positioning

Rather than placing all images at the top, the pipeline parses HTML structure to select contextually appropriate positions.

Parsing:

from bs4 import BeautifulSoup

def extract_sections(html: str) -> list[dict]:
    """Split a post into sections keyed by its h2/h3 headings."""
    soup = BeautifulSoup(html, "html.parser")
    sections = []
    for tag in soup.find_all(["h2", "h3"]):
        content = []
        # Collect sibling nodes until the next heading starts a new section.
        for sib in tag.next_siblings:
            if sib.name in ["h2", "h3"]:
                break
            content.append(sib.get_text())
        body = " ".join(content)
        sections.append({
            "title": tag.get_text(),
            "body": body,
            "length": len(body),  # character count used by the placement rules
        })
    return sections

Placement rules:

  • Header image: 1 image always inserted at the top of the post.
  • Section images: priority given to sections with 200+ characters of body text containing technical keywords (structure, pipeline, architecture, flow, etc.); maximum 2 images.
  • Total image cap: 3 images per post.
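One way to turn the placement rules into code is sketched below. It takes the section dictionaries produced by extract_sections. The function names and the ranking order (keyword hits first, then body length) are assumptions, since the notes only say that such sections get priority.

```python
TECH_KEYWORDS = ("structure", "pipeline", "architecture", "flow")


def select_image_sections(sections: list[dict], min_length: int = 200,
                          max_section_images: int = 2) -> list[str]:
    """Return titles of the sections that should receive an image."""

    def keyword_hits(section: dict) -> int:
        body = section["body"].lower()
        return sum(body.count(keyword) for keyword in TECH_KEYWORDS)

    # Eligible: long enough and containing at least one technical keyword.
    eligible = [
        s for s in sections
        if s["length"] >= min_length and keyword_hits(s) > 0
    ]
    # Assumed priority: more keyword hits first, then longer bodies.
    ranked = sorted(eligible, key=lambda s: (keyword_hits(s), s["length"]),
                    reverse=True)
    return [s["title"] for s in ranked[:max_section_images]]


def plan_positions(sections: list[dict]) -> list[str]:
    """One header image plus up to two section images (3 per post at most)."""
    return ["header"] + select_image_sections(sections)
```

The returned list can be fed straight into the insertion step: "header" for the top image, heading titles for the rest.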

5. Rate Limit Management

Nano Banana 2 Tier 1 limits:

Limit                     | Value
RPM (requests per minute) | 100
TPM (tokens per minute)   | 200,000
RPD (requests per day)    | 1,000

Each image consumes approximately 1,300 tokens (prompt + output + thinking), placing the TPM ceiling at roughly 153 images per minute (200,000 / 1,300). The actual bottleneck is therefore the RPM limit of 100. With a safety margin applied, intervals are set to 2 seconds between images (at most 30 requests per minute) and 3 seconds between posts.

At up to 3 images per post, the RPD limit of 1,000 covers at least 333 posts per day, which is enough to complete bulk processing within a single day.
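The arithmetic and the fixed intervals can be sketched as below. The Throttle class and its injectable clock and sleep parameters are hypothetical, added so the pacing logic can be checked without real waiting. The limits, the token estimate, and the 2-second interval are the figures quoted above.

```python
import time

RPM_LIMIT = 100
TPM_LIMIT = 200_000
TOKENS_PER_IMAGE = 1_300  # prompt + output + thinking, approximate


def max_images_per_minute() -> int:
    """The tighter of the RPM limit and the TPM-derived ceiling."""
    tpm_ceiling = TPM_LIMIT // TOKENS_PER_IMAGE  # 153
    return min(RPM_LIMIT, tpm_ceiling)


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval: float,
                 clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


if __name__ == "__main__":
    image_throttle = Throttle(2.0)  # 2 seconds between images
    for _ in range(3):
        image_throttle.wait()
        print("generate image here")
```

A second Throttle(3.0) instance could pace posts in the same way.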


6. Blogger API Integration

Generated images are base64-encoded and embedded directly in the Blogger HTML. This eliminates the need for external image hosting and bypasses Blogger's image upload constraints.

def insert_image_to_html(html: str, img_bytes: bytes, position: str) -> str:
    """Embed img_bytes as a data-URI <img>; position is "header" or a heading title."""
    b64 = base64.b64encode(img_bytes).decode()
    soup = BeautifulSoup(html, "html.parser")
    img_tag = soup.new_tag(
        "img",
        src=f"data:image/png;base64,{b64}",
        style="width:100%;max-width:800px;",
    )
    if position == "header":
        # A post body may be a fragment with no <body>, so fall back to the root.
        (soup.body or soup).insert(0, img_tag)
    else:
        # extract_sections yields both h2 and h3 titles, so match either level.
        target = soup.find(["h2", "h3"], string=position)
        if target:
            target.insert_after(img_tag)
    return str(soup)

When updating via the Blogger API in batch mode, a 10-second interval between requests is mandatory to avoid rate limit errors.
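A batch update loop with that interval might look like the sketch below. It assumes a Blogger API v3 service object built with google-api-python-client, for example build("blogger", "v3", credentials=...), and uses posts().patch to replace only the content field. The function name update_posts and its parameters are hypothetical.

```python
import time


def update_posts(service, blog_id: str, updates, interval: float = 10.0,
                 sleep=time.sleep) -> list[str]:
    """Patch each (post_id, html) pair, pausing `interval` seconds between requests."""
    updated = []
    for index, (post_id, html) in enumerate(updates):
        if index > 0:
            sleep(interval)  # no pause is needed before the first request
        service.posts().patch(
            blogId=blog_id,
            postId=post_id,
            body={"content": html},
        ).execute()
        updated.append(post_id)
    return updated
```

Passing sleep as a parameter keeps the pacing testable. In production the default time.sleep applies.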


Lessons Learned

Imagen 3 deprecated: Initial attempts used Imagen 3, which had already been discontinued. Always verify model availability against the latest API documentation before selecting a model.

generate_images vs generate_content confusion: Within the same Google AI SDK, Imagen-family and Gemini-family models have different call structures. Response parsing logic must be branched per model family.

Residual Korean text: Korean terms absent from the translate_title_to_concept mapping table pass through to the prompt and produce corrupted text in the image. The strip_korean function was added as a second-pass safeguard.

Underestimated thinking token cost: Budgeting against the documented price ($0.067) leads to cost overruns against actual spend ($0.08). Thinking token overhead must be factored into budget planning.


Conclusion

Three things drive the Nano Banana 2 auto-image generation pipeline.

  1. API structure: generate_content and generate_images differ in call signature and response parsing.
  2. Prompt engineering: Two-pass Korean removal and automatic visual theme detection control image quality.
  3. Cost planning: Budget against the measured price ($0.08/img, thinking tokens included).

The pipeline applies identically to new post publishing. The section analysis logic and prompt mapping table are extensible to accommodate additional post types.

Series overview: Series index
