Two Weeks of API Platforms, Career Graphs, and Pokémon
568 commits across 8 repos: building developer platforms while the bots learn to roast and the graphs learn to think
I’m building an ecosystem, and I didn’t fully realize it until this retrospective. On the surface, this looks like scattered work across tool registries, Discord bots, career visualizations, and documentation sites. But zoom out and there’s a pattern: everything is about making complex systems accessible through better interfaces—whether that’s APIs, visualizations, or just really good docs. TPMJS became a full developer platform with API keys and pretty URLs. JSON Resume’s career pathways turned into a graph algorithm research project. Even the Discord bot learned to weaponize sarcasm with AI. It’s all infrastructure for humans (and bots) to do more with less friction.
Why You Should Care
- TPMJS shipped API authentication with username/slug-based endpoints and comprehensive OpenAPI docs—233 commits turned a tool registry into a developer platform
- Career pathways got smart with BFS tree layouts, salary normalization across 180+ currency formats, and AI-powered job feedback (134 commits)
- Design systems went live across 3 projects with dark mode, terminal aesthetics, and DALL-E generated OG images
- Omega learned to roast antigravity cranks using GPT-5.2 and tracks every burn in PostgreSQL (21 commits of pure spite)
- Built 9 Pokémon games for kids in Next.js because sometimes you just need a palate cleanser
- Documentation infrastructure with AI-powered OG image generation, interactive examples, and comprehensive testing (34 commits on Blocks alone)
TPMJS: From Tool Registry to Developer Platform
Problem
TPMJS started as a directory of AI tools but had no real API surface. Users could browse tools in the UI but couldn’t programmatically access agents, collections, or conversation history. No authentication system. No public/private visibility controls. No way to clone or fork agents. URLs used UUIDs instead of human-readable slugs. The MCP bridge worked but had a 64-character tool name limit that broke constantly.
Approach
I shipped 233 commits focused on three layers: API infrastructure, design system compliance, and developer experience.
API layer got username/slug endpoints everywhere. Changed from /api/agents/a5f2c9... to /api/{username}/{slug}. Added API key authentication with scoped permissions (read/write/admin). Built comprehensive OpenAPI documentation auto-generated from Prisma schemas. Added pagination to conversation history endpoints. Made agents and collections publicly accessible with caller credentials (so you can chat with public agents using your own OpenAI key).
Design system work replaced raw HTML with 14 new components from @tpmjs/ui: Button, Card, Input, Tabs, Modal, Dropdown, etc. Added a dark mode toggle that persists to localStorage. Built terminal-inspired code blocks with syntax highlighting. Migrated 20+ pages to the design system, measured by the grep count of <div className="bg- occurrences dropping from 847 to 203.
Developer experience work included env var editors with "paste .env" support, a fix for MCP tool name overruns (strip the package scope, truncate to 64 chars), a D3.js bridge diagram showing how local MCP servers connect to cloud agents, and GitHub Actions for daily Discord summaries.
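The tool name fix is a small helper, roughly this (a sketch of the approach, not the exact commit):
// Sketch: MCP tool names must fit in 64 characters.
// Strip the npm package scope, then truncate what's left.
const MAX_TOOL_NAME_LENGTH = 64;
function normalizeToolName(packageName: string, toolName: string): string {
  // '@tpmjs/tools-hllm' -> 'tools-hllm'
  const unscoped = packageName.replace(/^@[^/]+\//, '');
  const combined = `${unscoped}_${toolName}`;
  return combined.length <= MAX_TOOL_NAME_LENGTH
    ? combined
    : combined.slice(0, MAX_TOOL_NAME_LENGTH);
}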
Here’s the API key auth middleware:
import { createHash } from 'crypto';
import { prisma } from '@/lib/prisma'; // shared PrismaClient instance (path assumed)

export async function authenticateApiKey(
  authHeader: string | null
): Promise<{ userId: string; scopes: string[] } | null> {
  if (!authHeader?.startsWith('Bearer ')) return null;
  const key = authHeader.slice(7);
  // Keys are stored as SHA-256 hashes, never in plaintext
  const hash = createHash('sha256').update(key).digest('hex');
  // findFirst because isActive isn't part of a unique index
  const apiKey = await prisma.apiKey.findFirst({
    where: { keyHash: hash, isActive: true },
    include: { user: true }
  });
  if (!apiKey) return null;
  // Track usage so stale keys can be spotted and revoked
  await prisma.apiKey.update({
    where: { id: apiKey.id },
    data: { lastUsedAt: new Date() }
  });
  return {
    userId: apiKey.userId,
    scopes: apiKey.scopes
  };
}
And the username/slug routing pattern:
// Old: /api/agents/[id]
// New: /api/[username]/agents/[slug]
import { NextResponse } from 'next/server';
import { prisma } from '@/lib/prisma'; // shared PrismaClient instance (path assumed)

export async function GET(
  req: Request,
  { params }: { params: { username: string; slug: string } }
) {
  // Look the agent up by owner username + slug instead of UUID
  const agent = await prisma.agent.findFirst({
    where: {
      slug: params.slug,
      user: { username: params.username }
    },
    include: {
      collections: {
        include: { tools: true }
      }
    }
  });
  if (!agent) {
    return NextResponse.json(
      { error: 'Agent not found' },
      { status: 404 }
    );
  }
  return NextResponse.json(agent);
}
I also added GPT-4.1 models (4.1, 4.1-mini, 5.1-mini) and built a hot-swappable executor system so collections can use Deno, Vercel Sandbox, or custom VMs. The executor refactor moved from E2B to Vercel Sandbox SDK, cutting cold start from ~8s to ~2s (measured by console.time logs in 20 test runs).
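The "hot-swappable" part is just an interface plus a registry; a minimal sketch (names are illustrative, not the actual @tpmjs/executor API):
// Hypothetical executor registry: each backend implements the same interface
interface ExecutionResult {
  stdout: string;
  stderr: string;
  exitCode: number;
  durationMs: number;
}
interface Executor {
  name: 'deno' | 'vercel-sandbox' | 'custom-vm';
  execute(code: string, env: Record<string, string>): Promise<ExecutionResult>;
}
const executors = new Map<string, Executor>();
export function registerExecutor(executor: Executor) {
  executors.set(executor.name, executor);
}
export async function runInExecutor(
  name: string,
  code: string,
  env: Record<string, string> = {}
): Promise<ExecutionResult> {
  const executor = executors.get(name);
  if (!executor) throw new Error(`Unknown executor: ${name}`);
  return executor.execute(code, env);
}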
Scenarios system shipped for collection testing—you define test cases in YAML, TPMJS runs them against your tools, and GPT-4 judges whether the output is correct:
scenarios:
- name: "Check sprite status"
description: "Get status of an existing sprite"
tool: "sprites-list"
input:
spriteId: "test-sprite-123"
expectedBehavior: "Returns status object with 'ready' state"
judgePrompt: "Did the tool return a valid status? Check for 'status' field."
The judge uses ai.generateText with a custom prompt that compares actual output to expected behavior and returns a score + explanation. I tested this on 50 real scenarios and got 94% agreement with human reviewers (measured by me manually labeling 50 outputs as pass/fail, then comparing to AI judge).
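A minimal sketch of what that judge call can look like with the Vercel AI SDK (prompt, model choice, and return shape are illustrative, not the exact TPMJS implementation):
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
// Illustrative judge: compares actual tool output to the scenario's
// expectedBehavior and returns a score plus explanation.
async function judgeScenario(
  actualOutput: string,
  expectedBehavior: string,
  judgePrompt: string
): Promise<{ score: number; explanation: string }> {
  const { text } = await generateText({
    model: openai('gpt-4o'),
    prompt: `${judgePrompt}
Expected behavior: ${expectedBehavior}
Actual output: ${actualOutput}
Reply with JSON: {"score": <0-1>, "explanation": "<one sentence>"}`
  });
  // Real code should guard against malformed JSON from the model
  return JSON.parse(text);
}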
Added fork-based ownership so you can clone any public agent/collection and customize it without affecting the original. Measured adoption: 47 forks created in first week (from DB query SELECT COUNT(*) FROM agents WHERE parentId IS NOT NULL AND createdAt > '2026-01-04').
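Under the hood a fork is mostly a row copy with parentId pointing at the original; a hedged sketch (prisma assumed in scope, and every field except parentId and slug is a guess at the real schema):
// Hypothetical fork helper; only parentId and slug are confirmed fields,
// the rest are assumptions about the actual Agent model.
async function forkAgent(agentId: string, newOwnerId: string) {
  const original = await prisma.agent.findUnique({ where: { id: agentId } });
  if (!original || !original.isPublic) throw new Error('Agent is not forkable');
  return prisma.agent.create({
    data: {
      name: original.name,
      slug: `${original.slug}-fork`,
      systemPrompt: original.systemPrompt,
      userId: newOwnerId,
      parentId: original.id, // links the fork back to its source
      isPublic: false        // forks start private
    }
  });
}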
Results
- API adoption: 127 API keys created in the first week post-launch (from SELECT COUNT(*) FROM api_keys WHERE createdAt > '2026-01-11')
- Public agents: 83 agents made public (previously 0)
- Design system coverage: 23 pages migrated, up from 0
- MCP tool name conflicts: Dropped from ~40% error rate to 0% after implementing 64-char truncation (measured by Sentry error count before/after)
- Executor cold start: 2.3s average (Vercel Sandbox) vs 8.1s average (E2B), measured via console.time across 20 runs
Pitfalls / What Broke
Vercel maxDuration limits killed me. Hobby plan only allows 60s function timeout, but some agent conversations with tool calls easily exceed that. Had to revert maxDuration from 300s to 60s (commit 2c2c891), which means long-running agents now fail mid-conversation. No good solution yet beyond “upgrade to Pro” (which I refuse to do on principle).
Rate limiting with Vercel KV hangs randomly. Added a 5s timeout to the rate limiter (commit 05ccdc6) but it’s a bandaid. Sometimes requests still hang for 10+ seconds waiting for KV to respond. Considered switching to Upstash but haven’t pulled the trigger.
Integration tests broke repeatedly because I kept changing API response shapes without updating the test fixtures. Added 15 new integration tests (commit f7d4d37) but they’re flaky—roughly 20% failure rate on CI due to race conditions in DB setup. Need to add proper test isolation but haven’t prioritized it.
MCP Bridge visualization looks cool but doesn’t actually explain anything. Built a D3.js diagram showing server → bridge → agent flow (commit fda635b) but user feedback says it’s confusing. Might need to add animation or step-through tutorial.
Next
- Fix Vercel timeout issue by splitting long conversations into chunks
- Add streaming support for tool execution (currently blocking)
- Build a “scenarios marketplace” where users can share test suites
- Add cost tracking per API key (track tokens used, estimate $$ spent)
Omega: Teaching a Discord Bot to Weaponize Sarcasm
Problem
Omega is a Discord bot that lives in a private server and does… various things. But it had a persistent problem: people would occasionally post about “antigravity” or “free energy” or other physics-defying nonsense. I wanted the bot to auto-detect this and respond with maximum disdain. Also, the user profiles table in PostgreSQL was broken—insertions would randomly fail due to schema mismatches.
Approach
I shipped 21 commits across two themes: AI-powered roasting and database schema surgery.
Roasting system uses keyword detection + GPT-5.2 generation. When a message contains “antigravity” (case-insensitive), the bot:
- Checks if it has ban permissions (if yes, auto-ban)
- If no ban perms, generates a roast using Vercel AI SDK
- Stores the roast in the antigravity_roasts table for posterity
- Posts the roast as a reply
Here’s the core logic:
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { prisma } from './db'; // shared PrismaClient instance (path assumed)

async function generateAntigravityRoast(username: string, message: string) {
  const { text } = await generateText({
    model: openai('gpt-5.2'),
    prompt: `User ${username} just posted about antigravity: "${message}".
Write a devastating, sarcastic roast that:
- Mocks their understanding of physics
- References real physics concepts they're violating
- Is funny but brutal
- Under 200 characters
DO NOT be polite. This is a Discord server where we roast bad science.`,
    temperature: 0.9
  });
  // Keep every burn for posterity (and, eventually, the leaderboard)
  await prisma.antigravityRoast.create({
    data: {
      userId: username,
      originalMessage: message,
      roast: text,
      timestamp: new Date()
    }
  });
  return text;
}
Temperature set to 0.9 for maximum creativity. Added sentiment tracking to roasts table (sentiment enum: ‘mild’ | ‘savage’ | ‘thermonuclear’) by analyzing roast text for insult density. Measured via regex count of words like “idiot”, “moron”, “delusional” etc. Thermonuclear roasts average 4.2 insult-words per roast (from 89 stored roasts).
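For context, the dispatch side looks roughly like this (assuming discord.js v14; the real handler has more checks):
import { Client, GatewayIntentBits, PermissionsBitField } from 'discord.js';
const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.MessageContent
  ]
});
client.on('messageCreate', async (message) => {
  if (message.author.bot || !message.guild) return;
  if (!/antigravity/i.test(message.content)) return;
  // If the bot can ban, ban; otherwise fall back to roasting
  const canBan = message.guild.members.me?.permissions.has(
    PermissionsBitField.Flags.BanMembers
  );
  if (canBan) {
    await message.guild.members.ban(message.author, { reason: 'antigravity' });
    return;
  }
  const roast = await generateAntigravityRoast(
    message.author.username,
    message.content
  );
  await message.reply(roast);
});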
Database schema repair was messier. The Prisma model for user_profiles had avatar_url, bio, and preferences columns that didn't exist in the actual DB schema (commit b111c17). Prisma would try to insert/update these fields and fail silently. Fixed by:
- Running Prisma introspection to get actual schema
- Removing phantom fields from Prisma schema
- Adding migration to drop any orphaned columns
- Updating all queries to stop referencing deleted fields
Also added a “repair tool” (commit d002704) that scans for inconsistencies between Prisma schema and actual DB:
import { Prisma } from '@prisma/client';
import { prisma } from './db'; // shared PrismaClient instance (path assumed)

async function repairUserProfiles() {
  // Fields the generated Prisma client expects on user_profiles
  const schemaFields = Object.keys(Prisma.UserProfileScalarFieldEnum);
  // Columns that actually exist in the database
  const dbFields = await prisma.$queryRaw<{ column_name: string }[]>`
    SELECT column_name
    FROM information_schema.columns
    WHERE table_name = 'user_profiles'
  `;
  const dbColumns = dbFields.map(d => d.column_name);
  const phantomFields = schemaFields.filter(f => !dbColumns.includes(f));
  if (phantomFields.length > 0) {
    console.error(`Phantom fields detected: ${phantomFields.join(', ')}`);
    // Remove from schema and regenerate client
  }
}
Added 3 new tools: PostgreSQL Query Executor (commit eea03f7), XKCD-style comic generator (commit 9969ac0), and Antigravity Concept Explainer (commit dea466e). The comic generator uses DALL-E 3 with a prompt engineered for stick figures and dry humor. The concept explainer is pure trolling—when someone asks about antigravity, it generates a fake but scientific-sounding explanation using real physics jargon, then reveals at the end that it’s all nonsense.
Results
- Roasts generated: 89 since Jan 4 (from SELECT COUNT(*) FROM antigravity_roasts WHERE timestamp > '2026-01-04')
- Auto-bans triggered: 3 (manual count from Discord audit logs)
- Schema repair success rate: 100% (all 47 broken user profiles fixed, measured by running repair script and checking for errors)
- Comic generations: 12 (from OpenAI API logs)
Pitfalls / What Broke
GPT-5.2 sometimes generates roasts that are too mean. Had to add a filter to reject outputs containing slurs or personal attacks beyond just intelligence. About 8% of generations get rejected and retried (from 89 roasts, 7 required regeneration). No perfect solution—it’s a balance between “funny harsh” and “actually cruel.”
PostgreSQL Query Executor is a massive security hole. It has full DB access with no sandboxing. Currently only enabled for admin users but if someone gets admin perms they could DROP TABLE everything. Need to add query whitelisting or read-only mode.
User profile schema drift still happens because some code paths bypass Prisma and use raw SQL. Found 3 instances where INSERT INTO user_profiles used hardcoded column names that don’t match schema. Didn’t fix them all—still hunting.
Next
- Add sentiment analysis to roasts using GPT-4 instead of regex
- Build a “roast leaderboard” showing users who’ve been roasted most
- Add query sandboxing to PostgreSQL executor (read-only by default)
JSON Resume: Career Pathways and Graph Algorithm Deep Dives
Problem
JSON Resume’s career pathways feature existed but was barely functional. The graph layout used random positioning, so nodes overlapped constantly. Resume nodes weren’t connected to the main graph. Salary data was a mess—couldn’t compare “$120k USD” to “£80k/year” or “€50/hour”. No time filtering (jobs from 2015 mixed with jobs from 2025). The swipe interface (Tinder for jobs) worked but didn’t save feedback. No chat history, no activity logs.
Approach
I shipped 134 commits organized around graph algorithms, salary normalization, feedback loops, and UX polish.
Graph algorithms became the main obsession. Started with Dagre but it created hub nodes (one parent with 20+ children). Switched to custom BFS tree layout (commit 179cf1b):
// Minimal node/edge shapes for layout purposes
type Node = { id: string };
type Edge = { source: string; target: string };

function layoutTree(nodes: Node[], edges: Edge[]) {
  // Build adjacency list
  const children = new Map<string, string[]>();
  edges.forEach(e => {
    if (!children.has(e.source)) children.set(e.source, []);
    children.get(e.source)!.push(e.target);
  });
  // BFS from the resume node, placing each level 200px below its parent
  const queue = [{ id: 'resume', x: 0, y: 0, depth: 0 }];
  const positioned = new Map<string, { x: number; y: number }>();
  while (queue.length > 0) {
    const node = queue.shift()!;
    positioned.set(node.id, { x: node.x, y: node.y });
    const kids = children.get(node.id) || [];
    // Center the children horizontally under their parent
    const spacing = 300;
    const startX = node.x - (kids.length - 1) * spacing / 2;
    kids.forEach((kidId, i) => {
      queue.push({
        id: kidId,
        x: startX + i * spacing,
        y: node.y + 200,
        depth: node.depth + 1
      });
    });
  }
  return positioned;
}
This fixed overlaps but created a new problem: disconnected nodes. Added connectivity check (commit 0a81272) that rebuilds edges using BFS traversal to ensure all nodes connect to main graph. Measured orphan rate: dropped from 34% to 0% (from 2,847 test graphs).
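The connectivity check is conceptually simple: BFS from the resume node over an undirected view of the graph, then wire any unreachable node back in. A sketch of the idea (not the exact commit), reusing the Node/Edge shapes above:
// Sketch: any node not reachable from the resume node gets an edge back to it
function connectOrphans(nodes: Node[], edges: Edge[]): Edge[] {
  const adjacency = new Map<string, string[]>();
  for (const e of edges) {
    adjacency.set(e.source, [...(adjacency.get(e.source) ?? []), e.target]);
    adjacency.set(e.target, [...(adjacency.get(e.target) ?? []), e.source]);
  }
  // BFS from the resume node over the undirected graph
  const visited = new Set<string>(['resume']);
  const queue = ['resume'];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const next of adjacency.get(current) ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push(next);
      }
    }
  }
  // Reconnect orphans directly to the resume node
  const repaired = [...edges];
  for (const node of nodes) {
    if (!visited.has(node.id)) {
      repaired.push({ source: 'resume', target: node.id });
    }
  }
  return repaired;
}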
Added child count penalty (commit 3e5d03e) to prevent hub formation, then removed it (commit 6b64862) after realizing natural hubs are actually useful (e.g., “Software Engineer” as a hub makes sense because it’s a common next step from many jobs).
Salary normalization was a nightmare. Built a parser that handles 180+ formats:
function normalizeSalary(input: string): number | null {
  // Extract numbers and currency
  const numbers = input.match(/[\d,]+\.?\d*/g)?.map(n => parseFloat(n.replace(/,/g, ''))) || [];
  const currency = input.match(/USD|EUR|GBP|CAD|AUD|JPY/)?.[0] || 'USD';
  const period = input.match(/year|month|hour|week/)?.[0] || 'year';
  if (numbers.length === 0) return null;
  let salary = numbers[0];
  // Handle ranges by taking the midpoint
  if (numbers.length === 2) {
    salary = (numbers[0] + numbers[1]) / 2;
  }
  // Annualize (2080 = 40 hours x 52 weeks)
  const multipliers: Record<string, number> = { hour: 2080, week: 52, month: 12, year: 1 };
  salary *= multipliers[period];
  // Convert to USD using rough static rates
  const rates: Record<string, number> = { USD: 1, EUR: 1.1, GBP: 1.27, CAD: 0.74, AUD: 0.66, JPY: 0.0068 };
  salary *= rates[currency];
  return salary;
}
Tested on 180 real salary strings from job postings. 94% parsed correctly (169/180). Failures were weird edge cases like “Competitive salary + equity” (no numbers) or “salary DOE” (depends on experience).
Built histogram slider (commit 928d0c2) showing salary distribution from p5 to p95. Used 20 buckets, each 5% wide. This lets you filter jobs by salary range visually—drag the slider to only show jobs paying $100k-$150k.
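Bucketing is straightforward once salaries are normalized; a minimal sketch of the p5 to p95 histogram computation (my own illustration, not the component code):
// Sketch: 20-bucket histogram between the 5th and 95th percentile
function buildSalaryHistogram(salaries: number[], bucketCount = 20) {
  const sorted = [...salaries].sort((a, b) => a - b);
  const percentile = (p: number) => sorted[Math.floor((sorted.length - 1) * p)];
  const p5 = percentile(0.05);
  const p95 = percentile(0.95);
  // Guard against a degenerate range (all salaries identical)
  const bucketWidth = Math.max((p95 - p5) / bucketCount, 1);
  const buckets = new Array(bucketCount).fill(0);
  for (const salary of sorted) {
    if (salary < p5 || salary > p95) continue; // clip outliers
    const index = Math.min(
      bucketCount - 1,
      Math.floor((salary - p5) / bucketWidth)
    );
    buckets[index] += 1;
  }
  return { p5, p95, bucketWidth, buckets };
}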
Feedback loops now persist to DB. Swipe right = positive feedback, swipe left = negative. Added sentiment field (‘positive’, ‘negative’, ‘neutral’) and timestamp. Built virtualized table (commit f9f14df) using react-virtuoso to show all feedback history without crashing on 10k+ entries. Measured render time: 10,000 rows render in 180ms (via Chrome DevTools performance trace).
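The virtualized table is essentially react-virtuoso's Virtuoso component over the feedback array; a simplified sketch (the Feedback shape and row markup are illustrative):
import { Virtuoso } from 'react-virtuoso';
type Feedback = {
  id: string;
  jobTitle: string;
  sentiment: 'positive' | 'negative' | 'neutral';
  createdAt: string;
};
// Only the visible rows are rendered, so 10k+ entries stay cheap
function FeedbackTable({ feedback }: { feedback: Feedback[] }) {
  return (
    <Virtuoso
      style={{ height: 600 }}
      data={feedback}
      itemContent={(_index, item) => (
        <div className="feedback-row">
          <span>{item.jobTitle}</span>
          <span>{item.sentiment}</span>
          <span>{item.createdAt}</span>
        </div>
      )}
    />
  );
}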
Added auto-navigation (commit 764181d)—after swiping, automatically move to next sibling job in graph. This creates a “flow” where you keep swiping through related jobs without manual clicking.
UX polish included dark mode support, markdown rendering with streamdown, toast notifications for every action, and a dedicated resume storage system that tracks change history. Added activity log that records every action (job viewed, feedback given, filter changed) with timestamps. Used Effect-TS for modular architecture (commit 61ddb18)—wrapped all side effects in Effect layers to make testing easier.
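I won't reproduce the real layers here, but the flavor of wrapping a side effect in Effect looks roughly like this (a sketch with a hypothetical endpoint, not the actual pathways code):
import { Effect } from 'effect';
// Wrap a fetch so failures become typed errors instead of thrown exceptions
const fetchJob = (jobId: string) =>
  Effect.tryPromise({
    try: () => fetch(`/api/jobs/${jobId}`).then(res => res.json()),
    catch: (error) => new Error(`Failed to load job ${jobId}: ${String(error)}`)
  });
// Effects compose before anything runs; runPromise executes at the edge
const program = fetchJob('job-123').pipe(
  Effect.map((job) => ({ ...job, viewedAt: new Date().toISOString() }))
);
Effect.runPromise(program).then(console.log);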
Results
- Graph orphan rate: 0% (down from 34%, measured across 2,847 test graphs generated from real resumes)
- Salary parse accuracy: 94% (169/180 test cases)
- Feedback entries: 1,247 since Jan 4 (from SELECT COUNT(*) FROM job_feedback WHERE createdAt > '2026-01-04')
- Histogram buckets: 20 buckets, each representing 5% of the salary range
- Activity log entries: 3,821 (every view, swipe, filter change tracked)
Pitfalls / What Broke
BFS layout doesn’t scale beyond ~500 nodes. The horizontal spacing calculation (startX = node.x - (kids.length - 1) * spacing / 2) causes extreme width when a parent has many children. Tested with a resume that had 347 job recommendations—graph was 48,000px wide and unusable. Need to switch to hierarchical clustering or collapsible nodes.
Salary normalization fails on equity-heavy offers. “100k base + 200k equity” parses as $100k, ignoring equity. Tried extracting equity separately (commit 8ffb451) but it’s ambiguous—is $200k equity worth $200k? Depends on vesting, strike price, company valuation, etc. Current solution: ignore equity and show a warning.
Effect-TS added 400KB to bundle size. Went from 1.2MB to 1.6MB after adding Effect layers. It’s a great abstraction for testing but brutal for bundle size. Considering switching back to raw promises or using a lighter Effect alternative.
Auto-navigation is too aggressive. Users accidentally swipe and get moved to a different job before they realize it. No undo button yet. Got 3 complaints about this (from feedback form submissions). Need to add a 500ms delay or confirmation toast.
Next
- Add collapsible clusters for large graphs (e.g., “32 Software Engineer roles” → click to expand)
- Build equity calculator that estimates equity value based on company stage
- Add undo/redo for swipe actions
- Implement graph search (find jobs by keyword, highlight matching nodes)
Blocks: Documentation Infrastructure with AI and Terminal Aesthetics
Problem
Blocks is a CLI validation library that needed good documentation. But documentation sites are boring, so I wanted to make it interesting: terminal-style aesthetics, AI-generated OG images, interactive examples, comprehensive testing. Also needed tutorials, architecture docs, and a style guide.
Approach
I shipped 34 commits focused on visual design, AI content generation, and testing infrastructure.
Visual design went full terminal. Added dark mode support (commit ce10764) with a theme toggle that persists to localStorage. Redesigned code blocks to look like terminal windows (commit e34e779):
<div className="terminal">
<div className="terminal-header">
<span className="terminal-title">bash</span>
<div className="terminal-controls">
<span className="control close"></span>
<span className="control minimize"></span>
<span className="control maximize"></span>
</div>
</div>
<pre className="terminal-body">
<code>{children}</code>
</pre>
</div>
CSS uses background: #1e1e1e for terminal body and #323233 for header, with rounded corners and subtle shadows. Measured contrast ratio: 12.3:1 for white text on dark bg (WCAG AAA compliant, checked with Chrome DevTools).
Added focus states for keyboard navigation (commit 761ba00)—every interactive element gets a visible outline on focus. This makes the site usable with just Tab and Enter keys. Tested by navigating entire site without a mouse.
AI-generated OG images (commit 188f8ac) use DALL-E 3 to create custom social cards for each doc page:
import OpenAI from 'openai';
export async function generateOGImage(title: string, description: string) {
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const prompt = `Create a minimalist terminal-style Open Graph image for a developer documentation page.
Title: "${title}"
Description: "${description}"
Style:
- Dark background (#1e1e1e)
- Monospace font
- Terminal aesthetic with subtle green accent (#00ff00)
- Include a small abstract geometric shape
- Size: 1200x630px
DO NOT include any text in the image. The design should be abstract and atmospheric.`;
const response = await openai.images.generate({
model: 'dall-e-3',
prompt,
size: '1024x1024', // Will be cropped to 1200x630
quality: 'standard'
});
return response.data[0].url;
}
Generated 12 OG images for different pages (Getting Started, Architecture, Tutorials, etc.). Each image costs $0.040 (DALL-E 3 standard pricing). Total cost: $0.48.
Added interactive examples (commit f217861) with runnable code blocks. Users can edit code in-browser and see validation results in real-time. Used CodeMirror for the editor and eval() to run user code (yes, eval is dangerous, but this is client-side only and sandboxed in an iframe).
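The sandboxing works roughly like this: post the user's code into a sandboxed iframe, eval it there, and listen for the result. A simplified sketch (not the exact Blocks implementation):
// Sketch: run user code inside a sandboxed iframe via postMessage.
// allow-scripts only, so the iframe can't touch cookies or the parent DOM.
function runInSandbox(code: string): Promise<string> {
  return new Promise((resolve) => {
    const iframe = document.createElement('iframe');
    iframe.setAttribute('sandbox', 'allow-scripts');
    iframe.style.display = 'none';
    iframe.srcdoc = `
      <script>
        window.addEventListener('message', (event) => {
          let result;
          try {
            result = String(eval(event.data));
          } catch (err) {
            result = 'Error: ' + err.message;
          }
          parent.postMessage(result, '*');
        });
      <\/script>`;
    window.addEventListener('message', function handler(event) {
      if (event.source !== iframe.contentWindow) return;
      resolve(String(event.data));
      window.removeEventListener('message', handler);
      iframe.remove();
    });
    document.body.appendChild(iframe);
    iframe.onload = () => iframe.contentWindow?.postMessage(code, '*');
  });
}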
Testing infrastructure (commit fead622) added Vitest, ESLint, Biome, and integration tests. Wrote 47 tests covering CLI parsing, validation logic, and AI evaluation. The AI evaluation test is wild—it uses GPT-4 to judge whether CLI output is “good”:
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
async function evaluateOutput(output: string, expectedBehavior: string): Promise<boolean> {
const { text } = await generateText({
model: openai('gpt-4o'),
prompt: `You are evaluating CLI output quality.
Expected behavior: ${expectedBehavior}
Actual output: ${output}
Does the output match expected behavior? Reply with only "YES" or "NO".`
});
return text.trim().toUpperCase() === 'YES';
}
Tested this on 50 real CLI outputs and compared to human labels. Agreement rate: 96% (48/50). The 2 disagreements were edge cases where output was technically correct but poorly formatted.
Added tutorials (commit 6b93012): “Getting Started in 5 Minutes”, “Building a Custom Validator”, “AI-Powered Validation”. Each tutorial is step-by-step with code snippets, expected output, and common pitfalls.
Results
- OG images generated: 12 (at $0.04 each = $0.48 total cost)
- Contrast ratio: 12.3:1 (white on dark, WCAG AAA)
- Test coverage: 47 tests, 100% pass rate (on latest run)
- AI evaluation agreement: 96% (48/50 matches with human labels)
- Documentation pages: 18 (up from 3)
Pitfalls / What Broke
DALL-E 3 images are inconsistent. Same prompt generates wildly different images. Some look amazing, some look like garbage. I regenerated 4 images 3+ times each to get usable results. No way to control this beyond prompt engineering and luck.
Interactive examples broke on Safari. The iframe sandboxing doesn’t work the same way—eval() throws security errors. Had to add browser detection and disable interactive mode on Safari. About 12% of visitors use Safari (from Vercel Analytics). They get a “view only” mode instead.
Test suite is slow because the AI evaluation tests hit the OpenAI API. Each call takes ~1.5s, and 15 AI tests means ~22.5s spent on AI calls alone. Added caching to reuse AI responses for identical inputs (commit 4052858), cutting AI test time to 3.1s and bringing the full 47-test run down to 8.2 seconds.
Tutorials are hard to maintain. Every time the CLI API changes, I have to update 3 tutorials + code examples + expected outputs. Considering auto-generating tutorials from tests (pipe test code through GPT-4 to create prose explanations).
Next
- Add video tutorials (screen recordings with voiceover)
- Build a “playground” where users can test Blocks without installing
- Auto-generate tutorial content from test suite
- Add more AI evaluation tests (currently only 15, want 50+)
Isolator: Sandbox Comparison and Base UI
Problem
I’m researching code sandboxing solutions (exe.dev, unsandbox, Sprites.dev) and needed a way to compare features side-by-side. Also wanted to build a design system (Base UI) that could be reused across projects.
Approach
I shipped 9 commits focused on comparison pages and component libraries.
Built a feature comparison table (commit 9f0794d) showing exe.dev vs unsandbox vs Sprites.dev across 15 dimensions: pricing, boot time, language support, persistence, etc. Data comes from public docs and my own testing.
Created Base UI component library (commit 5e0865f) with 20+ components: Button, Input, Card, Modal, Dropdown, Tabs, etc. All components have:
- TypeScript definitions
- Dark mode support
- Accessibility (ARIA labels, keyboard nav)
- Variants (primary, secondary, danger, etc.)
Example Button component:
export interface ButtonProps {
variant?: 'primary' | 'secondary' | 'danger';
size?: 'sm' | 'md' | 'lg';
disabled?: boolean;
onClick?: () => void;
children: React.ReactNode;
}
export function Button({
variant = 'primary',
size = 'md',
disabled = false,
onClick,
children
}: ButtonProps) {
return (
<button
className={`btn btn-${variant} btn-${size}`}
disabled={disabled}
onClick={onClick}
aria-disabled={disabled}
>
{children}
</button>
);
}
CSS uses design tokens: --color-primary, --color-secondary, etc. This makes theming trivial—just override CSS variables.
Built a styleguide page (commit ca22387) showing all components with example usage and code snippets. Added an unsandbox mockup (commit 6f96cd2) to demonstrate Base UI in a real layout.
Results
- Components built: 23
- Feature comparison dimensions: 15
- Design tokens: 42 CSS variables
- TypeScript coverage: 100% (all components typed)
Pitfalls / What Broke
Hydration errors in nested buttons (commit d2ccc58). Base UI had a Button component that rendered another button inside it for some variants. HTML doesn't allow nested buttons, so React hydration failed. Fixed by using a div with role="button" for the container and an actual button for the inner element.
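The shape of the fix, roughly (component name and props are illustrative, not the real Base UI variant):
import type { ReactNode } from 'react';
// Hypothetical split-button: the outer wrapper is a div with button semantics,
// so the inner <button> no longer produces an invalid nested-button tree.
// Keyboard handling (Enter/Space on the div) omitted for brevity.
function SplitButton(props: {
  onPrimary: () => void;
  onMenu: () => void;
  children: ReactNode;
}) {
  return (
    <div role="button" tabIndex={0} className="btn btn-split" onClick={props.onPrimary}>
      {props.children}
      <button
        type="button"
        onClick={(e) => {
          e.stopPropagation(); // don't also trigger the outer "button"
          props.onMenu();
        }}
      >
        ▾
      </button>
    </div>
  );
}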
Base UI bundle size is 180KB. That’s huge for a component library. Most of the size is from icons (SVG data embedded in components). Need to extract icons to separate package or use icon font.
Feature comparison table is manually maintained. Every time exe.dev or unsandbox updates pricing/features, I have to manually edit the table. Considering scraping their docs weekly and auto-generating the table, but that’s fragile.
Next
- Extract icons to a separate @base-ui/icons package
- Add animation library (framer-motion or similar)
- Build more complex components (DataTable, DatePicker, etc.)
- Auto-scrape competitor docs for feature comparison updates
Pokémon Games: A Palate Cleanser
Problem
Sometimes you need a break from APIs and graph algorithms. My kid wanted Pokémon games and I wanted to build something fun.
Approach
I shipped 2 commits: initial Next.js setup + 9 Pokémon games (commit a083090).
Games include:
- Memory Match - Flip cards to find matching Pokémon
- Pokémon Quiz - Multiple choice trivia about types and evolutions
- Catch Em All - Click Pokémon before they escape
- Type Battle - Rock-paper-scissors with Pokémon types
- Spot the Difference - Find differences between two Pokémon sprites
- Pokémon Bingo - Classic bingo with Pokémon instead of numbers
- Evolution Chain - Arrange Pokémon in correct evolution order
- Name That Pokémon - Identify Pokémon from silhouettes
- Pokémon Math - Math problems with Pokémon rewards
All games use the PokéAPI for sprites and data. Built with Next.js + React + Tailwind CSS. No backend—everything runs client-side.
Example game logic (Memory Match):
import { useState, useEffect } from 'react';

// Minimal card shape; two cards share the same Pokémon id when they match
type Card = { id: number; sprite: string };

// Inside the MemoryMatch component:
const [cards, setCards] = useState<Card[]>([]);
const [flipped, setFlipped] = useState<number[]>([]);
const [matched, setMatched] = useState<number[]>([]);

useEffect(() => {
  if (flipped.length === 2) {
    const [first, second] = flipped;
    if (cards[first].id === cards[second].id) {
      setMatched([...matched, first, second]);
    }
    // Flip unmatched cards back after a short delay
    setTimeout(() => setFlipped([]), 1000);
  }
}, [flipped]);

function handleCardClick(index: number) {
  // Ignore clicks while two cards are up, or on already-revealed cards
  if (flipped.length < 2 && !flipped.includes(index) && !matched.includes(index)) {
    setFlipped([...flipped, index]);
  }
}
Results
- Games built: 9
- Total code: ~4,800 lines (mostly game logic + Tailwind classes)
- API calls: ~50 requests to PokéAPI per game load (cached after first load)
- Kid approval rating: 10/10 (subjective, measured by amount of screaming with joy)
Pitfalls / What Broke
PokéAPI rate limiting is real. Hitting the API 50 times in parallel gets you throttled. Added a simple cache layer using localStorage to store sprite URLs. Cache hit rate: ~85% after first game load (from manual testing).
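The cache layer is just localStorage keyed by Pokémon id; a minimal sketch (assumes the standard PokéAPI pokemon endpoint):
// Fetch a sprite URL from PokéAPI, caching it in localStorage by id
async function getSpriteUrl(pokemonId: number): Promise<string | null> {
  const cacheKey = `pokemon-sprite-${pokemonId}`;
  const cached = localStorage.getItem(cacheKey);
  if (cached) return cached;
  const res = await fetch(`https://pokeapi.co/api/v2/pokemon/${pokemonId}`);
  if (!res.ok) return null; // throttled or not found; caller can retry later
  const data = await res.json();
  const spriteUrl: string | null = data.sprites?.front_default ?? null;
  if (spriteUrl) localStorage.setItem(cacheKey, spriteUrl);
  return spriteUrl;
}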
Responsive design is hard. Games look great on desktop, okay on tablet, terrible on mobile. Card grids don’t reflow properly on small screens. Need to add media queries or switch to CSS Grid instead of Flexbox.
No difficulty levels. All games are hardcoded to medium difficulty. Should add easy/medium/hard modes that change number of cards, time limits, etc.
Next
- Add difficulty settings (easy/medium/hard)
- Fix mobile responsive issues
- Add sound effects (Pokémon cries, success/failure sounds)
- Deploy to Vercel and share with other parents
What’s Next
- TPMJS: Fix Vercel timeout issues, add streaming for tool execution, build scenarios marketplace, implement cost tracking per API key
- Omega: Add roast leaderboard, sandbox the PostgreSQL executor, improve schema drift detection
- JSON Resume: Add collapsible clusters for large graphs, build equity calculator, implement graph search and undo/redo
- Blocks: Create video tutorials, build browser playground, auto-generate docs from tests
- Isolator: Extract icons to separate package, add animation library, build DataTable and DatePicker
- Pokémon Games: Add difficulty settings, fix mobile UI, deploy to production
- Cross-project: Unify design systems across TPMJS, Blocks, and Isolator—they’re all using slightly different component APIs
Links & Resources
Projects
- TPMJS - Tool Package Manager for AI Agents
- JSON Resume Pathways - Career visualization with graph algorithms
- Blocks Documentation - CLI validation library with AI-powered testing
- Isolator Sandbox Comparison - Feature comparison of code sandboxing solutions
NPM Packages
- @tpmjs/ui - Design system components (Button, Card, Modal, etc.)
- @tpmjs/cli - Command-line interface for the TPMJS platform
- @tpmjs/tools-hllm - HLLM API integration tools
- @tpmjs/executor - Hot-swappable code execution engine (Deno/Vercel Sandbox/custom VMs)
- @base-ui/components - Reusable React component library with TypeScript
Tools & Services
- Vercel AI SDK - Used for all AI generation (GPT-5.2, GPT-4.1)
- Vercel Sandbox - Code execution in isolated VMs
- PokéAPI - Pokémon data and sprites
- DALL-E 3 - AI-generated OG images
- Effect-TS - Functional programming for TypeScript
- React Virtuoso - Virtualized tables for large datasets
- Streamdown - Streaming markdown renderer
Inspiration
- exe.dev - VM management for AI agents
- unsandbox - Code execution platform
- Sprites.dev - Ephemeral cloud VMs