Gaëtan Wittebolle.FR
Claude Code, the guide
Chapter 15 / 16

Part 6 · Bonus, a power user's setup

Generating images from Claude Code

Chapter 15 · 15 min read

🎯 This chapter is for you if you want to generate images directly from Claude Code, without leaving your terminal. Required level: have Claude Code installed and know how to create a file.

The problem

Claude Code can't generate images on its own. You need to plug an external tool into it via an MCP (Model Context Protocol) server.

AI image generators output stock-photo material by default. To get results that actually look like your brand, you need precise prompts and reproducible workflows.

In this chapter, we'll:

  1. Build an MCP server that connects Gemini Image to Claude Code
  2. Create skills that encode the right prompts
  3. Centralize the techniques in a shared file

I could have shipped this as an npm package. I'd rather hand you the source code directly: you install your own MCP server at home, you tweak it however you want, zero dependency on my repo.

The simplest way to get through this chapter: skim it, then send the whole thing to Claude Code so it can walk you through how the MCP is built and help you set up your Gemini API key securely.

Step 1, build the "Nano Banana" MCP server

An MCP server is a small program that exposes tools to Claude Code via the MCP protocol. We'll create one that connects to the Gemini Image API.

Prerequisites

  • A Gemini API key (free at aistudio.google.com)
  • Node.js installed

Create the project

mkdir -p ~/.claude/mcp-servers/nano-banana/src
cd ~/.claude/mcp-servers/nano-banana

package.json

Create the package.json file:

{
  "name": "nano-banana-mcp",
  "version": "1.0.0",
  "description": "MCP server for Gemini image generation and editing",
  "type": "module",
  "bin": {
    "nano-banana": "./build/index.js"
  },
  "scripts": {
    "build": "tsc && chmod 755 build/index.js"
  },
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.12.0",
    "zod": "^3.25.0"
  },
  "devDependencies": {
    "@types/node": "^22.0.0",
    "typescript": "^5.7.0"
  }
}

tsconfig.json

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "Node16",
    "moduleResolution": "Node16",
    "outDir": "./build",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules"]
}

src/index.ts, the server skeleton

The whole MCP server is a single file (~200 lines in full). Copy the skeleton below into src/index.ts:

#!/usr/bin/env node

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import * as fs from "fs";
import * as path from "path";

const GEMINI_API_BASE =
  "https://generativelanguage.googleapis.com/v1beta/models";
const DEFAULT_MODEL = "gemini-3.1-flash-image-preview";

const VALID_ASPECT_RATIOS = [
  "1:1",
  "2:3",
  "3:2",
  "3:4",
  "4:3",
  "4:5",
  "5:4",
  "9:16",
  "16:9",
  "21:9",
] as const;

const VALID_SIZES = ["512", "1K", "2K"] as const;

function getApiKey(): string {
  const key = process.env.GEMINI_API_KEY;
  if (!key) {
    throw new Error(
      "GEMINI_API_KEY not set. Get a key at https://aistudio.google.com/",
    );
  }
  return key;
}

function resolveOutputPath(outputPath: string): string {
  if (path.isAbsolute(outputPath)) return outputPath;
  const cwd = process.env.NANO_BANANA_CWD || process.cwd();
  return path.resolve(cwd, outputPath);
}

const server = new McpServer({
  name: "nano-banana",
  version: "1.0.0",
});

// The full code (with the callGemini, extractImageFromResponse,
// saveImage functions, and the two tools generate_image / edit_image) is in
// the public repo. See the link at the bottom of the chapter.

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
  console.error("Nano Banana MCP server running on stdio");
}

main().catch((error) => {
  console.error("Fatal error:", error);
  process.exit(1);
});
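The helpers elided above live in the author's repo, but their likely shape can be sketched. In this hypothetical version, `buildRequest` and `extractImageBase64` are invented names; the endpoint path and the `inlineData` response field follow the public Gemini REST API:

```typescript
import * as fs from "fs";

// Hypothetical sketch of the elided helpers -- not the author's actual code.

// Build the REST request for a text-to-image generateContent call.
function buildRequest(model: string, prompt: string, apiKey: string) {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent?key=${apiKey}`,
    body: { contents: [{ parts: [{ text: prompt }] }] },
  };
}

// Pull the first base64-encoded image out of a generateContent response:
// images come back as inlineData parts, alongside optional text parts.
function extractImageBase64(response: any): string | null {
  const parts = response?.candidates?.[0]?.content?.parts ?? [];
  for (const part of parts) {
    if (part.inlineData?.data) return part.inlineData.data;
  }
  return null;
}

// Decode the base64 payload and write it to disk.
function saveImage(base64Data: string, outputPath: string): void {
  fs.writeFileSync(outputPath, Buffer.from(base64Data, "base64"));
}
```

A tool handler would chain these: build the request, POST it with fetch, then extract and save the image.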

Build and config

cd ~/.claude/mcp-servers/nano-banana
npm install
npm run build

Then add the server to your Claude Code MCP config (~/.claude/mcp_servers.json):

{
  "nano-banana": {
    "command": "node",
    "args": ["/Users/YOUR_USER/.claude/mcp-servers/nano-banana/build/index.js"],
    "env": {
      "GEMINI_API_KEY": "your-gemini-api-key"
    }
  }
}
⚠️ Replace /Users/YOUR_USER/ with your own home path, and put in your real Gemini API key.

Restart Claude Code. Test it:

> Generate an image of a sunset

If a PNG shows up, you're good.

Step 2, understand the two tools

generate_image

Creates an image from scratch based on a text prompt.

| Parameter | Description | Default |
|---|---|---|
| prompt | Description of the image to make | (required) |
| aspect_ratio | 1:1, 4:5, 16:9, 9:16, etc. | 1:1 |
| size | 512, 1K, 2K | 1K |
| output_path | Output path for the file | generated-image.png |

Use it for: illustrations, moods, landscapes, project visuals.
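Under the hood, Claude Code invokes the tool through an MCP tools/call request. A simplified example (the parameter values are illustrative):

```json
{
  "method": "tools/call",
  "params": {
    "name": "generate_image",
    "arguments": {
      "prompt": "Sunset over the Atlantic coast, long exposure",
      "aspect_ratio": "16:9",
      "size": "1K",
      "output_path": "images/sunset.png"
    }
  }
}
```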

edit_image

Modifies an existing image with a text prompt. This is the most powerful tool: you can take a real photo of yourself and change the setting around you.

| Parameter | Description |
|---|---|
| image_path | Path of the source image to edit |
| prompt | Description of the modifications |
| aspect_ratio | Output ratio |
| size | 512, 1K, 2K |

Step 3, the secret behind prompts that work

The prompt makes all the difference between a "stock photo" output and something believable.

The trap: the naive prompt

"Photo of a Claude Code workshop with people and a whiteboard"

Result: forced smiles, perfect lighting, symmetry, zero grain. Unusable.

The fix: speak in photo specs

The model understands photography vocabulary. Giving it camera specs changes the result completely:

Shot on Sony A7III with 35mm f/1.8 lens at ISO 1600.
Shallow depth of field, background slightly soft.
Natural window light from the left, soft shadows.
Visible film grain, slight chromatic aberration at frame edges.
Rule of thirds composition, not centered.

Presets by context

| Context | Lens | ISO | Light |
|---|---|---|---|
| Coworking interior | 35mm f/1.8 | 1600 | Window, soft shadows |
| Conference | 85mm f/2.8 | 3200 | Stage spots, contrasty |
| Outdoor day | 50mm f/2.0 | 400 | Natural, golden hour |
| Travel/lifestyle | 24-50mm f/2.0 | 400-800 | Golden hour, 4500K |

The anti-AI block

Add at the end of any photorealistic prompt:

This must look like real candid photography, not AI-generated.
Imperfections: slightly uneven lighting,
natural skin texture with visible pores,
clothes with real fabric folds and wrinkles,
no perfectly symmetrical composition.

Step 4, create the /photo-gaetan skill

This skill takes one of your real pro photos as a base and drops it into a new setting via edit_image.

The workflow

  1. Pick a source photo whose lighting and angle are close to the final result
  2. Build a 4-block prompt:
    • Block 1: Subject preservation ("Keep this exact person unchanged")
    • Block 2: Scene with imperfections (mismatched chairs, someone on their phone...)
    • Block 3: Photo specs (lens, ISO, grain)
    • Block 4: Anti-AI (imperfections, asymmetry)
  3. Call edit_image with the right parameters
  4. Review and iterate if needed
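The 4-block workflow above can be sketched as a tiny helper (the interface and function names are hypothetical, not part of the skill):

```typescript
// Hypothetical sketch of the 4-block prompt assembly.
interface PhotoPromptBlocks {
  preserve: string; // Block 1: "Keep this exact person unchanged..."
  scene: string;    // Block 2: scene with deliberate imperfections
  specs: string;    // Block 3: camera, lens, ISO, grain
  antiAi: string;   // Block 4: anti-AI imperfection block
}

function buildEditPrompt(blocks: PhotoPromptBlocks): string {
  // Blank lines between blocks keep the structure readable for the model.
  return [blocks.preserve, blocks.scene, blocks.specs, blocks.antiAi]
    .map((block) => block.trim())
    .join("\n\n");
}
```

The skill would then pass the assembled string as the prompt parameter of edit_image.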

Important rule: lighting and angle proximity

The bigger the gap between your source photo and the requested scene, the more the model will alter the face. For better results:

  • Outdoor golden-hour scene: outdoor source photo with warm light
  • Indoor office scene: source photo with window light
  • Conference scene: source photo with contrasty lighting

Full prompt example

Keep this exact person unchanged: face, glasses, hair,
facial hair, clothing, skin tone.
Do not alter any facial features.

Place him in a real workshop setting. He is standing,
turned slightly toward a whiteboard behind him, one hand
raised. The whiteboard is slightly out of focus.

The room is a real French coworking space: mismatched chairs,
a long wooden table with 4-5 attendees, some with stickers
on their laptop lids, one person has a coffee cup, another
is taking notes on paper. Not everyone is paying attention.
Diverse mixed-gender audience.

Shot on Sony A7III with 35mm f/1.8 lens, shallow depth of
field, natural window light from the left, visible grain
at ISO 1600. Rule of thirds, not centered.
⚠️ Text in the image: never ask for readable text (on a whiteboard, a screen...). The model produces gibberish. Separate the visual (AI) from the text (HTML or overlay).

Step 5, create the /image skill

This skill generates visuals without any identifiable person: article illustrations, SaaS project visuals, carousel backgrounds, YouTube thumbnail backdrops.

Available templates

| Template | Use case | Ratio |
|---|---|---|
| --template article | Article/blog illustration | 16:9 |
| --template project | Conceptual visual for a SaaS | 16:9 |
| --template carousel-cover | Slide 1 background | 4:5 |
| --template thumbnail-bg | YouTube thumbnail background | 16:9 |

Realism tuning for project visuals

The visual has to be appealing without slipping into stock-photo land. Think "photo taken by a happy customer", not "photoshoot by the marketing team".

Add light imperfections: uneven ivy, mismatched chairs, untrimmed grass. Without going full run-down.

Step 6, centralize the techniques

Create an image-techniques.md file that both skills read:

brand/prompts/image-techniques.md

Contains:
→ Brand palette (prompt suffix)
→ Photo presets by context
→ Anti-AI block
→ Templates by visual type
→ Constraints (no text, diversity)

When the model changes, you update one file instead of editing every skill.
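A minimal skeleton for that file, based on the contents listed above (the exact headings are up to you):

```markdown
# image-techniques.md -- shared by /photo-gaetan and /image

## Brand palette (prompt suffix)
(your colors and mood keywords here)

## Photo presets by context
- Coworking interior: 35mm f/1.8, ISO 1600, window light, soft shadows
- Conference: 85mm f/2.8, ISO 3200, stage spots, contrasty
- Outdoor day: 50mm f/2.0, ISO 400, natural light, golden hour

## Anti-AI block
This must look like real candid photography, not AI-generated. [...]

## Templates by visual type
- article 16:9, project 16:9, carousel-cover 4:5, thumbnail-bg 16:9

## Constraints
- Never ask for readable text in the image
- Diverse, mixed-gender people in group scenes
```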

The 3 rules after 30 tests

  1. Separate visual and text. AI generates the image, HTML displays the text on top. The model can't write in French.
  2. Speak in photo specs. Lens, ISO, grain, depth of field. The model understands this vocabulary and the result changes completely.
  3. Start from a real photo for portraits. From scratch never looks close enough. edit_image with a source photo whose lighting matches the final result gives the best output.
⚠️ Limits to know about:

  • The model can't generate public figures (Gemini blocks this)
  • Never mention body hair in a prompt (the model adds more instead of less)
  • Face resemblance is never perfect; the closer the source lighting is to the result, the better
  • No reproducibility: the same prompt gives a different result each time. Save the good outputs

✅ Chapter recap

  1. Build the MCP server: 3 files (package.json, tsconfig.json, index.ts), npm install && npm run build, then add the MCP config
  2. Build /photo-gaetan: a skill that uses your real photos as a base and builds 4-block prompts
  3. Build /image: a skill with templates for each visual type
  4. Centralize in image-techniques.md: a shared file with all the presets and constraints
  5. Always separate AI visuals from HTML text: the model can't write
