AI Operations Economics Series (4 parts)

Cost, routing, caching, context — production LLM ops decisions

Prerequisites: Coding Agents in Practice (recommended)
Next series: LLM Core Study Series (6 parts)

All parts

1. AI Operations Economics (1/4) — Token Cost Structure and Measurement Pitfalls
   "Token rate × usage" looks simple, but the actual bill always diverges from that simple…
2. AI Operations Economics (2/4) — Model Routing: The Cost / Quality / Latency Triangle
   "The most expensive model" is not the answer; over 80% of tasks can hit the same outcom…
3. AI Operations Economics (3/4) — Prompt Caching Guide: 1-hour vs 5-minute Cache
   Caching does not always save money. It pays off only if the hit rate is high enough; otherwise…
4. AI Operations Economics (4/4) — Context Management Patterns: auto-compact, Memory, RAG Cost Comparison
   Context is cost. There are three ways to shrink it: compress, externalize, or retrieve.

Recommended pace

Each part takes 25–40 minutes on average. One to three parts per week is the sweet spot for retention.

MaJu Tech Notes