Multimodal | AI Training Hub

Claude Track

Module 17

ShopMate Sees the Products: Maya uploads a product photo and ShopMate writes the description from the image -- no manual brief needed. She also uploads a competitor's catalogue page and gets a comparison report. The product upload workflow that used to take 10 minutes per item now takes 90 seconds.

Multimodal Capabilities

All Claude 4 models support vision input. Send images, screenshots, charts, diagrams, and PDF pages for analysis alongside text.

Image Analysis

Send an image in JPEG, PNG, GIF, or WebP format, up to 5MB per image. Claude can describe the content, read text that appears in the image (OCR), identify objects, and answer questions about what the image shows.
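
The format and size limits above can be checked before a request is sent. The following is a minimal sketch, not part of ShopMate: the helper name `image_block` and the constants are hypothetical, and it assumes the 5MB figure refers to the raw file size.

```python
import base64
from pathlib import Path

# Media types listed in the text above, keyed by file extension.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}
# Assumption: the 5MB limit applies to the file's raw bytes.
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def image_block(image_path: str) -> dict:
    """Return a base64 image content block, or raise ValueError if the file is unsuitable."""
    path = Path(image_path)
    media_type = MEDIA_TYPES.get(path.suffix.lower())
    if media_type is None:
        raise ValueError(f"Unsupported image format: {path.name}")
    raw = path.read_bytes()
    if len(raw) > MAX_IMAGE_BYTES:
        raise ValueError(f"{path.name} is {len(raw)} bytes, over the {MAX_IMAGE_BYTES}-byte limit")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(raw).decode(),
        },
    }
```

The returned dictionary can be placed directly in the `content` list of a user message, next to a text block carrying the question.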

Document Understanding

Pass PDF pages as images for extraction of tables, charts, and structured data. Ideal for invoices, forms, research papers, and technical diagrams.
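
One way to pass several PDF pages in a single request is to label each page image with a short text block, so the answer can refer to pages by number. This sketch assumes the pages have already been rendered to PNG files by some external tool (the rendering step is not shown), and the function name `pdf_pages_message` is hypothetical.

```python
import base64
from pathlib import Path


def pdf_pages_message(page_png_paths: list[str], question: str) -> dict:
    """Build one user message holding labelled page images followed by the question."""
    content = []
    for number, page_path in enumerate(page_png_paths, start=1):
        data = base64.b64encode(Path(page_path).read_bytes()).decode()
        # A text label before each image lets the reply cite "Page 2", etc.
        content.append({"type": "text", "text": f"Page {number}:"})
        content.append({
            "type": "image",
            # Assumption: every page was rendered as PNG.
            "source": {"type": "base64", "media_type": "image/png", "data": data},
        })
    # The question goes last, after all the pages it refers to.
    content.append({"type": "text", "text": question})
    return {"role": "user", "content": content}
```

The result would go in the `messages` list of a request, for example with a question such as "Extract the line items from the invoice table as CSV."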

Chart and Data Extraction

Send screenshots of dashboards or charts and ask Claude to extract the underlying data, identify trends, or generate code that reproduces the visualisation.
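
Extracted chart data is easier to use downstream if the reply is requested as JSON and parsed defensively. The sketch below is illustrative only: the prompt wording, the `series`/`points` schema, and the name `parse_chart_reply` are assumptions, not a format Claude is guaranteed to follow.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker, which replies sometimes include

# Hypothetical prompt to send as the text block alongside the chart screenshot.
CHART_PROMPT = (
    "Extract the data points from this chart. Reply with JSON only, shaped like "
    '{"series": [{"name": "<label>", "points": [[x, y], ...]}]}.'
)


def parse_chart_reply(reply_text: str) -> dict:
    """Parse the reply as JSON, tolerating a code fence wrapped around it."""
    text = reply_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag) ...
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # ... and everything from the closing fence onwards.
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict) or "series" not in data:
        raise ValueError("Reply JSON does not contain a 'series' key")
    return data
```

Values read off a chart image are estimates, so it is worth spot-checking them against the source data where that is available.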

UI Screenshot Analysis

Share screenshots of UIs for debugging, accessibility review, or automated testing. Claude can identify elements, suggest improvements, and generate test selectors.

ShopMate -- Photo-to-Description

Python -- shopmate/vision/photo_description.py

# shopmate/vision/photo_description.py
# Maya uploads a product photo -> ShopMate writes the description
import base64
import anthropic
from pathlib import Path
client = anthropic.Anthropic()

SYSTEM = """You are a ThreadCo copywriter. When given a product photo:
1. Identify the garment type, colours, print/design, and apparent material
2. Write a 2-sentence product description in ThreadCo's voice:
   - Friendly, direct, never corporate
   - Always mention sustainability
   - No exclamation marks
   - Forbidden words: vibrant, perfect, stylish, must-have"""

def describe_from_photo(image_path: str, extra_info: str = "") -> str:
    """Generate a product description directly from a product photo."""
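    path = Path(image_path)
    data = base64.b64encode(path.read_bytes()).decode()
    # Map the extension (case-insensitively) to a supported media type;
    # anything else is assumed to be JPEG.
    mime = {".png": "image/png", ".gif": "image/gif", ".webp": "image/webp"}.get(
        path.suffix.lower(), "image/jpeg")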
    resp = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=150,
        system=SYSTEM,
        messages=[{"role":"user", "content":[
            {"type":"image","source":{"type":"base64","media_type":mime,"data":data}},
            {"type":"text", "text": f"Write the ThreadCo product description.{' Extra info: ' + extra_info if extra_info else ''}"}
        ]}]
    )
    return resp.content[0].text

def batch_describe_photos(photo_dir: str) -> list[dict]:
    """Process a folder of product photos in one go."""
    results = []
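    # Include .jpeg and .png files as well as .jpg; sort so the run order is repeatable
    for path in sorted(Path(photo_dir).iterdir()):
        if path.suffix not in {".jpg", ".jpeg", ".png"}:
            continue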
        desc = describe_from_photo(str(path))
        results.append({"file": path.name, "description": desc})
        print(f"{path.name}: {desc[:80]}...")
    return results

# Usage: Maya drags her product photos into a folder, runs one command
# results = batch_describe_photos("new_season_photos/")
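
The competitor comparison from the scenario at the top of this module can follow the same pattern with two images in one message. This is a rough sketch under stated assumptions: `comparison_request` and `_jpeg_block` are hypothetical names, both files are assumed to be JPEGs, and the prompt wording and `max_tokens` value are placeholders.

```python
import base64
from pathlib import Path


def _jpeg_block(image_path: str) -> dict:
    """Base64 image block. Assumption: the file is a JPEG."""
    data = base64.b64encode(Path(image_path).read_bytes()).decode()
    return {"type": "image",
            "source": {"type": "base64", "media_type": "image/jpeg", "data": data}}


def comparison_request(own_photo: str, competitor_page: str) -> dict:
    """Build keyword arguments for client.messages.create()."""
    return {
        "model": "claude-sonnet-4-6",  # same model ID as the listing above
        "max_tokens": 600,             # placeholder; a report needs more room than a description
        "messages": [{"role": "user", "content": [
            # Text labels tell the model which image is which.
            {"type": "text", "text": "Image 1: our product photo."},
            _jpeg_block(own_photo),
            {"type": "text", "text": "Image 2: a page from a competitor's catalogue."},
            _jpeg_block(competitor_page),
            {"type": "text", "text": "Compare our product with the closest item on the "
                                     "competitor's page: design, apparent material, and "
                                     "any visible price."},
        ]}],
    }
```

With a client created as in the listing above, the call would be `client.messages.create(**comparison_request("tee.jpg", "competitor_page.jpg"))`, where both file names are examples.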
