This chapter is for you if you want to generate images directly from Claude Code, without leaving your terminal. Required level: have Claude Code installed and know how to create a file.
The problem
Claude Code can't generate images on its own. You need to plug an external tool into it via an MCP (Model Context Protocol) server.
AI image generators output stock-photo material by default. To get results that actually look like your brand, you need precise prompts and reproducible workflows.
In this chapter, we'll:
- Build an MCP server that connects Gemini Image to Claude Code
- Create skills that encode the right prompts
- Centralize the techniques in a shared file
I could have shipped this as an npm package. I'd rather hand you the source code directly: you install your own MCP server at home, you tweak it however you want, zero dependency on my repo.
The simplest way to get through this chapter: skim it, then send the whole thing to Claude Code so it can walk you through how the MCP is built and help you set up your Gemini API key securely.
Step 1, build the "Nano Banana" MCP server
An MCP server is a small program that exposes tools to Claude Code via the MCP protocol. We'll create one that connects to the Gemini Image API.
Prerequisites
- A Gemini API key (free at aistudio.google.com)
- Node.js installed
Create the project
mkdir -p ~/.claude/mcp-servers/nano-banana/src
cd ~/.claude/mcp-servers/nano-banana
package.json
Create the package.json file:
{
"name": "nano-banana-mcp",
"version": "1.0.0",
"description": "MCP server for Gemini image generation and editing",
"type": "module",
"bin": {
"nano-banana": "./build/index.js"
},
"scripts": {
"build": "tsc && chmod 755 build/index.js"
},
"dependencies": {
"@modelcontextprotocol/sdk": "^1.12.0",
"zod": "^3.25.0"
},
"devDependencies": {
"@types/node": "^22.0.0",
"typescript": "^5.7.0"
}
}
tsconfig.json
{
"compilerOptions": {
"target": "ES2022",
"module": "Node16",
"moduleResolution": "Node16",
"outDir": "./build",
"rootDir": "./src",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true
},
"include": ["src/**/*"],
"exclude": ["node_modules"]
}
src/index.ts, the full code
This is the entire MCP server in a single file (~200 lines). Copy it into src/index.ts:
#!/usr/bin/env node
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import * as fs from "fs";
import * as path from "path";
const GEMINI_API_BASE =
"https://generativelanguage.googleapis.com/v1beta/models";
const DEFAULT_MODEL = "gemini-3.1-flash-image-preview";
const VALID_ASPECT_RATIOS = [
"1:1",
"2:3",
"3:2",
"3:4",
"4:3",
"4:5",
"5:4",
"9:16",
"16:9",
"21:9",
] as const;
const VALID_SIZES = ["512", "1K", "2K"] as const;
function getApiKey(): string {
const key = process.env.GEMINI_API_KEY;
if (!key) {
throw new Error(
"GEMINI_API_KEY not set. Get a key at https://aistudio.google.com/",
);
}
return key;
}
function resolveOutputPath(outputPath: string): string {
if (path.isAbsolute(outputPath)) return outputPath;
const cwd = process.env.NANO_BANANA_CWD || process.cwd();
return path.resolve(cwd, outputPath);
}
const server = new McpServer({
name: "nano-banana",
version: "1.0.0",
});
// The full code (with the callGemini, extractImageFromResponse,
// saveImage functions, and the two tools generate_image / edit_image) is in
// the public repo. See the link at the bottom of the chapter.
async function main() {
const transport = new StdioServerTransport();
await server.connect(transport);
console.error("Nano Banana MCP server running on stdio");
}
main().catch((error) => {
console.error("Fatal error:", error);
process.exit(1);
});
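The helpers elided above live in the public repo, but to give a feel for them: extracting the image from a Gemini response is mostly unwrapping nested JSON. Here is a minimal sketch, assuming the public generateContent response shape (base64 image under candidates[].content.parts[].inlineData); treat it as an illustration, not the repo's exact code:

```typescript
// Sketch only: the repo's extractImageFromResponse may differ.
// Field names assume the public generateContent JSON response.
interface GeminiPart {
  text?: string;
  inlineData?: { mimeType: string; data: string }; // data is base64
}
interface GeminiResponse {
  candidates?: { content?: { parts?: GeminiPart[] } }[];
}

function extractImageFromResponse(response: GeminiResponse): Buffer {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const imagePart = parts.find((p) => p.inlineData?.data);
  if (!imagePart?.inlineData) {
    throw new Error("No image found in Gemini response");
  }
  return Buffer.from(imagePart.inlineData.data, "base64");
}
```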
Build and config
cd ~/.claude/mcp-servers/nano-banana
npm install
npm run build
Then add the server to your Claude Code MCP config (~/.claude/mcp_servers.json):
{
"nano-banana": {
"command": "node",
"args": ["/Users/YOUR_USER/.claude/mcp-servers/nano-banana/build/index.js"],
"env": {
"GEMINI_API_KEY": "your-gemini-api-key"
}
}
}
Replace /Users/YOUR_USER/ with your home path, and put in your real Gemini API key.
Restart Claude Code. Test it:
> Generate an image of a sunset
If a PNG shows up, you're good.
Step 2, understand the two tools
generate_image
Creates an image from scratch based on a text prompt.
| Parameter | Description | Default |
|---|---|---|
| prompt | Description of the image to make | (required) |
| aspect_ratio | 1:1, 4:5, 16:9, 9:16, etc. | 1:1 |
| size | 512, 1K, 2K | 1K |
| output_path | Output path for the file | generated-image.png |
Use it for: illustrations, moods, landscapes, project visuals.
edit_image
Modifies an existing image with a text prompt. This is the most powerful tool: you can take a real photo of yourself and change the setting around you.
| Parameter | Description |
|---|---|
| image_path | Path of the source image to edit |
| prompt | Description of the modifications |
| aspect_ratio | Output ratio |
| size | 512, 1K, 2K |
Step 3, the secret behind prompts that work
The prompt makes all the difference between a "stock photo" output and something believable.
The trap: the naive prompt
"Photo of a Claude Code workshop with people and a whiteboard"
Result: forced smiles, perfect lighting, symmetry, zero grain. Unusable.
The fix: speak in photo specs
The model understands photography vocabulary. Giving it camera specs changes the result completely:
Shot on Sony A7III with 35mm f/1.8 lens at ISO 1600.
Shallow depth of field, background slightly soft.
Natural window light from the left, soft shadows.
Visible film grain, slight chromatic aberration at frame edges.
Rule of thirds composition, not centered.
Presets by context
| Context | Lens | ISO | Light |
|---|---|---|---|
| Coworking interior | 35mm f/1.8 | 1600 | Window, soft shadows |
| Conference | 85mm f/2.8 | 3200 | Stage spots, contrasty |
| Outdoor day | 50mm f/2.0 | 400 | Natural, golden hour |
| Travel/lifestyle | 24-50mm f/2.0 | 400-800 | Golden hour, 4500K |
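These presets are easy to encode once and reuse across prompts. A sketch (the values come from the table above; the key names and exact sentence wording are mine):

```typescript
// Sketch: the presets table as data, plus a helper that renders the
// photo-spec sentences to append to any prompt.
type Preset = { lens: string; iso: string; light: string };

const PRESETS: Record<string, Preset> = {
  "coworking-interior": { lens: "35mm f/1.8", iso: "1600", light: "natural window light, soft shadows" },
  conference: { lens: "85mm f/2.8", iso: "3200", light: "stage spotlights, contrasty" },
  "outdoor-day": { lens: "50mm f/2.0", iso: "400", light: "natural light, golden hour" },
  "travel-lifestyle": { lens: "24-50mm f/2.0", iso: "400-800", light: "golden hour, 4500K" },
};

function photoSpecs(context: string): string {
  const p = PRESETS[context];
  if (!p) throw new Error(`Unknown preset: ${context}`);
  return (
    `Shot on Sony A7III with ${p.lens} lens at ISO ${p.iso}. ` +
    `${p.light}. Visible film grain. Rule of thirds composition, not centered.`
  );
}
```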
The anti-AI block
Add at the end of any photorealistic prompt:
This must look like real candid photography, not AI-generated.
Imperfections: slightly uneven lighting,
natural skin texture with visible pores,
clothes with real fabric folds and wrinkles,
no perfectly symmetrical composition.
Step 4, create the /photo-gaetan skill
This skill takes one of your real pro photos as a base and drops it into a new setting via edit_image.
The workflow
- Pick a source photo whose lighting and angle are close to the final result
- Build a 4-block prompt:
  - Block 1: Subject preservation ("Keep this exact person unchanged")
  - Block 2: Scene with imperfections (mismatched chairs, someone on their phone...)
  - Block 3: Photo specs (lens, ISO, grain)
  - Block 4: Anti-AI (imperfections, asymmetry)
- Call edit_image with the right parameters
- Review and iterate if needed
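The four blocks can be assembled mechanically before calling edit_image. A sketch, with illustrative block contents to adapt to your own photo and scene:

```typescript
// Sketch: assembling the 4-block edit_image prompt described above.
// Each block's wording is an example, not a fixed recipe.
const blocks = [
  // Block 1: subject preservation
  "Keep this exact person unchanged: face, glasses, hair, clothing, skin tone. Do not alter any facial features.",
  // Block 2: scene with imperfections
  "Place him in a real coworking space: mismatched chairs, a long wooden table, one attendee on their phone.",
  // Block 3: photo specs
  "Shot on Sony A7III with 35mm f/1.8 lens, shallow depth of field, natural window light from the left, visible grain at ISO 1600.",
  // Block 4: anti-AI
  "This must look like real candid photography, not AI-generated: slightly uneven lighting, natural skin texture, no perfectly symmetrical composition.",
];

const editPrompt = blocks.join("\n\n");
```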
Important rule: lighting and angle proximity
The bigger the gap between your source photo and the requested scene, the more the model will alter the face. For better results:
- Outdoor golden-hour scene: outdoor source photo with warm light
- Indoor office scene: source photo with window light
- Conference scene: source photo with contrasty lighting
Full prompt example
Keep this exact person unchanged: face, glasses, hair,
facial hair, clothing, skin tone.
Do not alter any facial features.
Place him in a real workshop setting. He is standing,
turned slightly toward a whiteboard behind him, one hand
raised. The whiteboard is slightly out of focus.
The room is a real French coworking space: mismatched chairs,
a long wooden table with 4-5 attendees, some with stickers
on their laptop lids, one person has a coffee cup, another
is taking notes on paper. Not everyone is paying attention.
Diverse mixed-gender audience.
Shot on Sony A7III with 35mm f/1.8 lens, shallow depth of
field, natural window light from the left, visible grain
at ISO 1600. Rule of thirds, not centered.
Text in the image: never ask for readable text (on a whiteboard, a screen...). The model produces gibberish. Separate the visual (AI) from the text (HTML or overlay).
Step 5, create the /image skill
This skill generates visuals without any identifiable person: article illustrations, SaaS project visuals, carousel backgrounds, YouTube thumbnail backdrops.
Available templates
| Template | Use case | Ratio |
|---|---|---|
| --template article | Article/blog illustration | 16:9 |
| --template project | Conceptual visual for a SaaS | 16:9 |
| --template carousel-cover | Slide 1 background | 4:5 |
| --template thumbnail-bg | YouTube thumbnail background | 16:9 |
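Inside the skill, each template can simply be a record fixing the ratio and a base prompt. A sketch (names and ratios mirror the table; the base prompts are illustrative placeholders):

```typescript
// Sketch: /image templates as data. The base prompts are placeholders.
const TEMPLATES = {
  article: { ratio: "16:9", base: "Editorial illustration for a blog article about" },
  project: { ratio: "16:9", base: "Conceptual product visual for" },
  "carousel-cover": { ratio: "4:5", base: "Background image for a carousel cover slide about" },
  "thumbnail-bg": { ratio: "16:9", base: "YouTube thumbnail background for" },
} as const;

function imagePrompt(template: keyof typeof TEMPLATES, subject: string) {
  const t = TEMPLATES[template];
  return { prompt: `${t.base} ${subject}`, aspect_ratio: t.ratio };
}
```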
Realism tuning for project visuals
The visual has to be appealing without slipping into stock-photo land. Think "photo taken by a happy customer", not "photoshoot by the marketing team".
Add light imperfections: uneven ivy, mismatched chairs, untrimmed grass. Without going full run-down.
Step 6, centralize the techniques
Create an image-techniques.md file that both skills read:
brand/prompts/image-techniques.md
Contains:
→ Brand palette (prompt suffix)
→ Photo presets by context
→ Anti-AI block
→ Templates by visual type
→ Constraints (no text, diversity)
When the model changes, you update one file instead of editing every skill.
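Each skill can then pull just the section it needs at runtime instead of hard-coding it. A sketch, assuming the shared file uses one markdown heading per section (readSection is an illustrative helper, not part of the repo):

```typescript
// Sketch: extract one "## Heading" section from the shared techniques file.
function readSection(markdown: string, heading: string): string {
  const lines = markdown.split("\n");
  const start = lines.findIndex((l) => l.trim() === `## ${heading}`);
  if (start === -1) return "";
  const rest = lines.slice(start + 1);
  const end = rest.findIndex((l) => l.startsWith("## "));
  return rest.slice(0, end === -1 ? rest.length : end).join("\n").trim();
}

// Usage idea: readSection(fs.readFileSync(file, "utf8"), "Anti-AI block")
```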
The 3 rules after 30 tests
- Separate visual and text. AI generates the image, HTML displays the text on top. The model can't write in French.
- Speak in photo specs. Lens, ISO, grain, depth of field. The model understands this vocabulary and the result changes completely.
- Start from a real photo for portraits. From scratch never looks close enough. edit_image with a source photo whose lighting matches the final result gives the best output.
Limits to know about
- The model can't generate public figures (Gemini blocks this)
- Never mention body hair in a prompt (the model adds more instead of less)
- Face resemblance is never perfect. The closer the source lighting is to the result, the better
- No reproducibility: same prompt = different result. Save the good outputs
Chapter recap
1. Build the MCP server: 3 files (package.json, tsconfig.json, index.ts), npm install && npm run build, add the MCP config
2. Build /photo-gaetan: a skill that uses your real photos as a base and builds 4-block prompts
3. Build /image: a skill with templates for each visual type
4. Centralize in image-techniques.md: a shared file with all the presets and constraints
5. Always separate AI visuals and HTML text: the model can't write