This chapter is for you if you want to generate images directly from Claude Code, without leaving your terminal. Required level: have Claude Code installed and know how to create a file.
The problem
Claude Code can't generate images on its own. You need to plug an external tool into it via an MCP (Model Context Protocol) server.
AI image generators output stock-photo material by default. To get results that actually look like your brand, you need precise prompts and reproducible workflows.
In this chapter, we'll:
- Build an MCP server that connects Gemini Image to Claude Code
- Create skills that encode the right prompts
- Centralize the techniques in a shared file
I could have shipped this as an npm package. I'd rather hand you the source code directly: you install your own MCP server at home, you tweak it however you want, zero dependency on my repo.
The simplest way to get through this chapter: skim it, then send the whole thing to Claude Code so it can walk you through how the MCP is built and help you set up your Gemini API key securely.
Step 1, build the "Nano Banana" MCP server
An MCP server is a small program that exposes tools to Claude Code via the MCP protocol. We'll create one that connects to the Gemini Image API.
Prerequisites
- A Gemini API key (free at aistudio.google.com)
- Node.js installed
Create the project
mkdir -p ~/.claude/mcp-servers/nano-banana/src
cd ~/.claude/mcp-servers/nano-banana
package.json
Create the package.json file:
{
"name": "nano-banana-mcp",
"version": "1.0.0",
"description": "MCP server for Gemini image generation and editing",
"type": "module",
"bin": {
"nano-banana": "./build/index.js"
},
"scripts": {
"build": "tsc && chmod 755 build/index.js"
},
"dependencies": {
"@modelcontextprotocol/sdk": "^1.12.0",
"zod": "^3.25.0"
},
"devDependencies": {
"@types/node": "^22.0.0",
"typescript": "^5.7.0"
}
}
tsconfig.json
{
"compilerOptions": {
"target": "ES2022",
"module": "Node16",
"moduleResolution": "Node16",
"outDir": "./build",
"rootDir": "./src",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true
},
"include": ["src/**/*"],
"exclude": ["node_modules"]
}
src/index.ts, the full code
This is the entire MCP server in a single file (~200 lines). Copy it into src/index.ts:
#!/usr/bin/env node
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import * as fs from "fs";
import * as path from "path";
const GEMINI_API_BASE =
"https://generativelanguage.googleapis.com/v1beta/models";
const DEFAULT_MODEL = "gemini-3.1-flash-image-preview";
const VALID_ASPECT_RATIOS = [
"1:1",
"2:3",
"3:2",
"3:4",
"4:3",
"4:5",
"5:4",
"9:16",
"16:9",
"21:9",
] as const;
const VALID_SIZES = ["512", "1K", "2K"] as const;
function getApiKey(): string {
const key = process.env.GEMINI_API_KEY;
if (!key) {
throw new Error(
"GEMINI_API_KEY not set. Get a key at https://aistudio.google.com/",
);
}
return key;
}
function resolveOutputPath(outputPath: string): string {
if (path.isAbsolute(outputPath)) return outputPath;
const cwd = process.env.NANO_BANANA_CWD || process.cwd();
return path.resolve(cwd, outputPath);
}
const server = new McpServer({
name: "nano-banana",
version: "1.0.0",
});
// The full code (with the callGemini, extractImageFromResponse,
// saveImage functions, and the two tools generate_image / edit_image) is in
// the public repo. See the link at the bottom of the chapter.
async function main() {
const transport = new StdioServerTransport();
await server.connect(transport);
console.error("Nano Banana MCP server running on stdio");
}
main().catch((error) => {
console.error("Fatal error:", error);
process.exit(1);
});
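The helpers elided above live in the public repo, but to give a feel for them: extracting the image from a Gemini response is mostly unwrapping nested JSON. Here is a minimal sketch, assuming the public generateContent response shape (base64 image under candidates[].content.parts[].inlineData); treat it as an illustration, not the repo's exact code:

```typescript
// Sketch only: the repo's extractImageFromResponse may differ.
// Field names assume the public generateContent JSON response.
interface GeminiPart {
  text?: string;
  inlineData?: { mimeType: string; data: string }; // data is base64
}
interface GeminiResponse {
  candidates?: { content?: { parts?: GeminiPart[] } }[];
}

function extractImageFromResponse(response: GeminiResponse): Buffer {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const imagePart = parts.find((p) => p.inlineData?.data);
  if (!imagePart?.inlineData) {
    throw new Error("No image found in Gemini response");
  }
  return Buffer.from(imagePart.inlineData.data, "base64");
}
```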
Build and config
cd ~/.claude/mcp-servers/nano-banana
npm install
npm run build
Then add the server to your Claude Code MCP config (~/.claude/mcp_servers.json):
{
"nano-banana": {
"command": "node",
"args": ["/Users/YOUR_USER/.claude/mcp-servers/nano-banana/build/index.js"],
"env": {
"GEMINI_API_KEY": "your-gemini-api-key"
}
}
}
Replace /Users/YOUR_USER/ with your home path, and put in your real Gemini API key.
Restart Claude Code. Test it:
> Generate an image of a sunset
If a PNG shows up, you're good.
Step 2, understand the two tools
generate_image
Creates an image from scratch based on a text prompt.
| Parameter | Description | Default |
|---|---|---|
| prompt | Description of the image to make | (required) |
| aspect_ratio | 1:1, 4:5, 16:9, 9:16, etc. | 1:1 |
| size | 512, 1K, 2K | 1K |
| output_path | Output path for the file | generated-image.png |
Use it for: illustrations, moods, landscapes, project visuals.
edit_image
Modifies an existing image with a text prompt. This is the most powerful tool: you can take a real photo of yourself and change the setting around you.
| Parameter | Description |
|---|---|
| image_path | Path of the source image to edit |
| prompt | Description of the modifications |
| aspect_ratio | Output ratio |
| size | 512, 1K, 2K |
Step 3, the secret behind prompts that work
The prompt makes all the difference between a "stock photo" output and something believable.
The trap: the naive prompt
"Photo of a Claude Code workshop with people and a whiteboard"
Result: forced smiles, perfect lighting, symmetry, zero grain. Unusable.
The fix: speak in photo specs
The model understands photography vocabulary. Giving it camera specs changes the result completely:
Shot on Sony A7III with 35mm f/1.8 lens at ISO 1600.
Shallow depth of field, background slightly soft.
Natural window light from the left, soft shadows.
Visible film grain, slight chromatic aberration at frame edges.
Rule of thirds composition, not centered.
Presets by context
| Context | Lens | ISO | Light |
|---|---|---|---|
| Coworking interior | 35mm f/1.8 | 1600 | Window, soft shadows |
| Conference | 85mm f/2.8 | 3200 | Stage spots, contrasty |
| Outdoor day | 50mm f/2.0 | 400 | Natural, golden hour |
| Travel/lifestyle | 24-50mm f/2.0 | 400-800 | Golden hour, 4500K |
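These presets are easy to encode once and reuse across prompts. A sketch (the values come from the table above; the key names and exact sentence wording are mine):

```typescript
// Sketch: the presets table as data, plus a helper that renders the
// photo-spec sentences to append to any prompt.
type Preset = { lens: string; iso: string; light: string };

const PRESETS: Record<string, Preset> = {
  "coworking-interior": { lens: "35mm f/1.8", iso: "1600", light: "natural window light, soft shadows" },
  conference: { lens: "85mm f/2.8", iso: "3200", light: "stage spotlights, contrasty" },
  "outdoor-day": { lens: "50mm f/2.0", iso: "400", light: "natural light, golden hour" },
  "travel-lifestyle": { lens: "24-50mm f/2.0", iso: "400-800", light: "golden hour, 4500K" },
};

function photoSpecs(context: string): string {
  const p = PRESETS[context];
  if (!p) throw new Error(`Unknown preset: ${context}`);
  return (
    `Shot on Sony A7III with ${p.lens} lens at ISO ${p.iso}. ` +
    `${p.light}. Visible film grain. Rule of thirds composition, not centered.`
  );
}
```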
The anti-AI block
Add at the end of any photorealistic prompt:
This must look like real candid photography, not AI-generated.
Imperfections: slightly uneven lighting,
natural skin texture with visible pores,
clothes with real fabric folds and wrinkles,
no perfectly symmetrical composition.
Step 4, create the /photo-gaetan skill
This skill takes one of your real pro photos as a base and drops it into a new setting via edit_image.
The workflow
- Pick a source photo whose lighting and angle are close to the final result
- Build a 4-block prompt:
  - Block 1: Subject preservation ("Keep this exact person unchanged")
  - Block 2: Scene with imperfections (mismatched chairs, someone on their phone...)
  - Block 3: Photo specs (lens, ISO, grain)
  - Block 4: Anti-AI (imperfections, asymmetry)
- Call edit_image with the right parameters
- Review and iterate if needed
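The four blocks can be assembled mechanically before calling edit_image. A sketch, with illustrative block contents to adapt to your own photo and scene:

```typescript
// Sketch: assembling the 4-block edit_image prompt described above.
// Each block's wording is an example, not a fixed recipe.
const blocks = [
  // Block 1: subject preservation
  "Keep this exact person unchanged: face, glasses, hair, clothing, skin tone. Do not alter any facial features.",
  // Block 2: scene with imperfections
  "Place him in a real coworking space: mismatched chairs, a long wooden table, one attendee on their phone.",
  // Block 3: photo specs
  "Shot on Sony A7III with 35mm f/1.8 lens, shallow depth of field, natural window light from the left, visible grain at ISO 1600.",
  // Block 4: anti-AI
  "This must look like real candid photography, not AI-generated: slightly uneven lighting, natural skin texture, no perfectly symmetrical composition.",
];

const editPrompt = blocks.join("\n\n");
```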
Important rule: lighting and angle proximity
The bigger the gap between your source photo and the requested scene, the more the model will alter the face. For better results:
- Outdoor golden-hour scene: outdoor source photo with warm light
- Indoor office scene: source photo with window light
- Conference scene: source photo with contrasty lighting
Full prompt example
Keep this exact person unchanged: face, glasses, hair,
facial hair, clothing, skin tone.
Do not alter any facial features.
Place him in a real workshop setting. He is standing,
turned slightly toward a whiteboard behind him, one hand
raised. The whiteboard is slightly out of focus.
The room is a real French coworking space: mismatched chairs,
a long wooden table with 4-5 attendees, some with stickers
on their laptop lids, one person has a coffee cup, another
is taking notes on paper. Not everyone is paying attention.
Diverse mixed-gender audience.
Shot on Sony A7III with 35mm f/1.8 lens, shallow depth of
field, natural window light from the left, visible grain
at ISO 1600. Rule of thirds, not centered.
Text in the image: never ask for readable text (on a whiteboard, a screen...). The model produces gibberish. Separate the visual (AI) from the text (HTML or overlay).
Step 5, create the /image skill
This skill generates visuals without any identifiable person: article illustrations, SaaS project visuals, carousel backgrounds, YouTube thumbnail backdrops.
Available templates
| Template | Use case | Ratio |
|---|---|---|
| --template article | Article/blog illustration | 16:9 |
| --template project | Conceptual visual for a SaaS | 16:9 |
| --template carousel-cover | Slide 1 background | 4:5 |
| --template thumbnail-bg | YouTube thumbnail background | 16:9 |
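Inside the skill, each template can simply be a record fixing the ratio and a base prompt. A sketch (names and ratios mirror the table; the base prompts are illustrative placeholders):

```typescript
// Sketch: /image templates as data. The base prompts are placeholders.
const TEMPLATES = {
  article: { ratio: "16:9", base: "Editorial illustration for a blog article about" },
  project: { ratio: "16:9", base: "Conceptual product visual for" },
  "carousel-cover": { ratio: "4:5", base: "Background image for a carousel cover slide about" },
  "thumbnail-bg": { ratio: "16:9", base: "YouTube thumbnail background for" },
} as const;

function imagePrompt(template: keyof typeof TEMPLATES, subject: string) {
  const t = TEMPLATES[template];
  return { prompt: `${t.base} ${subject}`, aspect_ratio: t.ratio };
}
```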
Realism tuning for project visuals
The visual has to be appealing without slipping into stock-photo land. Think "photo taken by a happy customer", not "photoshoot by the marketing team".
Add light imperfections: uneven ivy, mismatched chairs, untrimmed grass. Without going full run-down.
Step 6, centralize the techniques
Create an image-techniques.md file that both skills read:
brand/prompts/image-techniques.md
Contains:
→ Brand palette (prompt suffix)
→ Photo presets by context
→ Anti-AI block
→ Templates by visual type
→ Constraints (no text, diversity)
When the model changes, you update one file instead of editing every skill.
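Each skill can then pull just the section it needs at runtime instead of hard-coding it. A sketch, assuming the shared file uses one markdown heading per section (readSection is an illustrative helper, not part of the repo):

```typescript
// Sketch: extract one "## Heading" section from the shared techniques file.
function readSection(markdown: string, heading: string): string {
  const lines = markdown.split("\n");
  const start = lines.findIndex((l) => l.trim() === `## ${heading}`);
  if (start === -1) return "";
  const rest = lines.slice(start + 1);
  const end = rest.findIndex((l) => l.startsWith("## "));
  return rest.slice(0, end === -1 ? rest.length : end).join("\n").trim();
}

// Usage idea: readSection(fs.readFileSync(file, "utf8"), "Anti-AI block")
```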
The 3 rules after 30 tests
- Separate visual and text. AI generates the image, HTML displays the text on top. The model can't write in French.
- Speak in photo specs. Lens, ISO, grain, depth of field. The model understands this vocabulary and the result changes completely.
- Start from a real photo for portraits. From scratch never looks close enough. edit_image with a source photo whose lighting matches the final result gives the best output.
Limits to know about
- The model can't generate public figures (Gemini blocks this)
- Never mention body hair in a prompt (the model adds more instead of less)
- Face resemblance is never perfect. The closer the source lighting is to the result, the better
- No reproducibility: same prompt = different result. Save the good outputs
Chapter recap
1. Build the MCP server: 3 files (package.json, tsconfig.json, index.ts), npm install && npm run build, add the MCP config
2. Build /photo-gaetan: a skill that uses your real photos as a base and builds 4-block prompts
3. Build /image: a skill with templates for each visual type
4. Centralize in image-techniques.md: a shared file with all the presets and constraints
5. Always separate AI visuals and HTML text: the model can't write