Slideshow Studio — turn a folder of images into a Ken Burns slideshow
Slideshow Studio is a small weekend app I built: drop in a folder of images, get back a 1080p Ken Burns MP4. No build step, no JavaScript framework, no database. FastAPI on the backend, a single vanilla HTML page on the frontend, ffmpeg doing the heavy lifting. I use it for fashion / product rolls where I want clean motion, predictable framing, and a one-click render.
The hardest problem: the image shakes
Anyone who has used ffmpeg's zoompan filter knows the feeling: you slow-pan an image over a few seconds, and instead of buttery motion you get pixel-by-pixel hops like an old film loop. The reason is that zoompan snaps the crop position to whole source pixels on every frame. With a slow camera (zooming from 1.0 to 1.06 over 4 seconds at 30 fps is a delta of 0.0005 per frame), a 1920×1280 source simply doesn't have enough resolution to represent smooth motion. Each frame the crop moves by exactly 0 or 1 pixel, with no in-between.
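To make the rounding concrete, here is a simplified Python model. It is an assumption for illustration and does not reproduce zoompan's source. It centers a crop that is src_w / zoom wide, snaps the crop's left edge to a whole source pixel each frame, and reports how far that edge jumps in output pixels. The function name crop_origin_steps and its defaults (30 fps, 4 s, zoom 1.0 to 1.06) are hypothetical. The 9000 px case is what a 6000 px short edge gives a 1920×1280 image.

```python
def crop_origin_steps(src_w: int, out_w: int = 1920, fps: int = 30,
                      seconds: float = 4.0, zoom_end: float = 1.06) -> list[float]:
    """Per-frame jump of a centered crop's left edge, in output pixels.

    Simplified model: the crop origin is rounded to a whole source pixel
    every frame, then the crop (src_w / zoom wide) is scaled to out_w.
    """
    frames = round(fps * seconds)        # 120 frames at 30 fps over 4 s
    dz = (zoom_end - 1.0) / frames       # 0.0005 zoom per frame
    steps: list[float] = []
    prev_x = 0                           # frame 0: zoom 1.0, origin at 0
    for n in range(1, frames + 1):
        zoom = 1.0 + dz * n
        crop_w = src_w / zoom
        x = round((src_w - crop_w) / 2)  # snap the origin to a whole pixel
        steps.append((x - prev_x) * out_w / crop_w)  # source px -> output px
        prev_x = x
    return steps


for src in (1920, 9000):
    s = crop_origin_steps(src)
    print(f"{src} px source: jumps of {min(s):.2f} to {max(s):.2f} output px")
```

In this model the ideal motion is about 0.45 to 0.48 output pixels per frame for both sources. With the 1920 px source, the actual jumps alternate between 0 and about 1 pixel, which produces the judder. With the 9000 px source, they stay between roughly 0.4 and 0.7 pixels.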
My fix is dumb but effective: pre-scale the source so its short edge is at least 6000 px before feeding it to zoompan. Zoompan still rounds to whole pixels, but each source pixel is now only a fraction of a pixel in the 1080p output (1080/6000 = 0.18). The per-frame rounding error therefore stays well below one displayed pixel, and the motion reads as smooth. One line of code, a few extra seconds per render, and nobody asks 'why does the video judder?' anymore.
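Below is a minimal sketch of the pre-scale step. It assumes the renderer builds an ffmpeg argument list in Python and already knows the image size. It also assumes the image has already been fitted to the output aspect ratio, because zoompan scales its iw/zoom × ih/zoom window straight to s= and would otherwise stretch the picture. The helper names prescale_size and kenburns_cmd, the centered-zoom expressions, lanczos scaling, and libx264 output are all illustrative choices. The zoompan options used (z, x, y, d, s, fps) are the filter's documented ones.

```python
def prescale_size(w: int, h: int, short_edge: int = 6000) -> tuple[int, int]:
    """Upscale (never downscale) so the shorter side is at least short_edge."""
    if min(w, h) >= short_edge:
        return w, h
    if w <= h:
        return short_edge, -(-h * short_edge // w)   # integer ceiling division
    return -(-w * short_edge // h), short_edge


def kenburns_cmd(src: str, dst: str, w: int, h: int,
                 out_w: int = 1920, out_h: int = 1080, fps: int = 30,
                 seconds: float = 4.0, zoom_end: float = 1.06) -> list[str]:
    """Build an ffmpeg argv for a centered slow zoom on one still image."""
    sw, sh = prescale_size(w, h)
    frames = round(fps * seconds)                 # zoompan emits d frames per input image
    dz = (zoom_end - 1.0) / frames                # per-frame zoom increment
    vf = (
        f"scale={sw}:{sh}:flags=lanczos,"         # the pre-scale to a high-res source
        f"zoompan=z='min(zoom+{dz:.6f},{zoom_end})'"
        ":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"  # keep the crop centered
        f":d={frames}:s={out_w}x{out_h}:fps={fps}"
    )
    return ["ffmpeg", "-y", "-i", src, "-vf", vf,
            "-c:v", "libx264", "-pix_fmt", "yuv420p", dst]
```

Running it is then a matter of subprocess.run(kenburns_cmd("slide.png", "slide.mp4", 1920, 1080), check=True). The sizes are computed with integer math so that floating-point error can never push the short edge to 5999 or 6001.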
Crop or letterbox
Every image can have a different aspect ratio from the output, and there are two schools of thought: cover-crop or contain-with-bars. I expose both as a toggle. In Crop mode the image fills the frame but may lose part of the subject at the edges; it's clean and simple when the ratios match. In Letterbox mode the image keeps its ratio, and the leftover bands are filled with a blurred copy of itself. The subtle bit is that the Ken Burns direction pool has to change too. If you letterbox a vertical image and then pan horizontally, the viewer's eye runs straight across the blurred bands. So I bias the pool toward the image's long axis, and everything stays in frame.
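Both modes map onto standard ffmpeg filters. The function below is a hypothetical helper (fit_filter, with an arbitrarily chosen blur radius of 20) that returns a filtergraph string. Crop is scale with force_original_aspect_ratio=increase followed by a centered crop. Letterbox overlays a decrease-fitted copy on a blurred, cover-cropped copy of the same frame. This is one plausible way to express the two modes, not necessarily the app's exact graph.

```python
def fit_filter(mode: str, out_w: int = 1920, out_h: int = 1080, blur: int = 20) -> str:
    """Return an ffmpeg filtergraph that fits one image into out_w x out_h."""
    size = f"{out_w}:{out_h}"
    # Cover: scale until both sides reach the frame, then center-crop the excess.
    cover = f"scale={size}:force_original_aspect_ratio=increase,crop={size}"
    if mode == "crop":
        return cover
    if mode == "letterbox":
        return (
            "split[bg][fg];"
            f"[bg]{cover},boxblur={blur}[blurred];"           # blurred fill for the bands
            f"[fg]scale={size}:force_original_aspect_ratio=decrease[sharp];"  # whole image fits
            "[blurred][sharp]overlay=(W-w)/2:(H-h)/2"          # center the sharp copy on top
        )
    raise ValueError(f"unknown mode: {mode!r}")
```

The letterbox graph has one input and one output, so it can be passed with -vf just like the crop chain.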
Stack: just enough
- Backend: FastAPI + Uvicorn, about 140 lines of REST + static serving.
- Renderer: ffmpeg + ffprobe over subprocess. ~190-line pipeline.
- Frontend: a single HTML file, Tailwind from a CDN, SortableJS from a CDN for drag-reorder. No bundler, no node_modules.
I went out of my way to avoid frameworks. This is an internal tool, one user, runs on localhost. Every layer of abstraction is something I'd have to debug when my battery is dying and I'm chasing an 11 PM deliverable. Vanilla JS plus a Tailwind CDN gets me a respectable UI in 30 minutes, and 6 months from now I can still read it.
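Because the renderer is just ffmpeg and ffprobe over subprocess, the probing side can stay this small. The sketch below assumes only the first video stream's width and height are needed, for example to size the pre-scale or pick a pan direction. The function names are hypothetical; the ffprobe flags are standard ones.

```python
import json
import subprocess


def probe_cmd(path: str) -> list[str]:
    """ffprobe argv that prints the first video stream's size as JSON."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "stream=width,height", "-of", "json", path]


def parse_probe(output: str) -> tuple[int, int]:
    """Extract (width, height) from ffprobe's JSON output."""
    stream = json.loads(output)["streams"][0]
    return int(stream["width"]), int(stream["height"])


def probe_size(path: str) -> tuple[int, int]:
    """Run ffprobe and return (width, height); raises if ffprobe fails."""
    result = subprocess.run(probe_cmd(path), capture_output=True, text=True, check=True)
    return parse_probe(result.stdout)
```

Keeping parse_probe separate from the subprocess call means the parsing can be checked without ffprobe installed.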
Small UX touches
A few things I let myself polish, because I'm also the user:
- Drag-to-reorder uses a FLIP animation — thumbnails slide into place instead of snapping. SortableJS does the heavy lifting; I just added a few transitions.
- Replace/delete buttons appear only on hover; clicking a thumbnail opens a full-size preview. Nothing fights for space when it isn't needed.
- The output gallery shows a thumbnail for each rendered video. Double-click a filename to rename it inline; hover to reveal × (delete) and ↓ (download). When you switch videos, the preview resizes to that video's native aspect ratio.
- Per-slide duration is a 1–10 second slider; speed is a Slow/Medium/Fast dropdown; output ratio is 1:1, 16:9, 9:16, 4:3, 3:4. Every setting previews live in the main viewport.
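These settings are small enough to validate in a plain dataclass. The sketch assumes "1080p" means a 1080 px short edge for every ratio, so 16:9 gives 1920×1080, 9:16 gives 1080×1920, and 4:3 gives 1440×1080. RenderSettings, its field names, and the 4-second default are hypothetical.

```python
from dataclasses import dataclass

RATIOS = {"1:1": (1, 1), "16:9": (16, 9), "9:16": (9, 16), "4:3": (4, 3), "3:4": (3, 4)}
SPEEDS = ("slow", "medium", "fast")


@dataclass(frozen=True)
class RenderSettings:
    seconds_per_slide: int = 4   # slider range is 1 to 10 seconds
    speed: str = "medium"        # Slow / Medium / Fast dropdown
    ratio: str = "16:9"

    def __post_init__(self) -> None:
        if not 1 <= self.seconds_per_slide <= 10:
            raise ValueError("seconds_per_slide must be between 1 and 10")
        if self.speed not in SPEEDS:
            raise ValueError(f"unknown speed: {self.speed!r}")
        if self.ratio not in RATIOS:
            raise ValueError(f"unknown ratio: {self.ratio!r}")

    def output_size(self, short_edge: int = 1080) -> tuple[int, int]:
        """(width, height) with the short edge fixed and the long edge kept even."""
        rw, rh = RATIOS[self.ratio]
        long_edge = short_edge * max(rw, rh) // min(rw, rh)
        long_edge -= long_edge % 2
        return (long_edge, short_edge) if rw >= rh else (short_edge, long_edge)
```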
Render pipeline
Each slide runs through one of two paths and is rendered to its own clip; build_slideshow then concatenates the clips with -c copy, so there is no second re-encode.
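Concatenating without re-encoding is what ffmpeg's concat demuxer does with -c copy. The sketch below shows the list file and the command. It assumes every per-slide clip was encoded with the same codec, resolution, frame rate, and pixel format, because stream copy cannot reconcile differences between clips. concat_command is a hypothetical name and is not the app's build_slideshow.

```python
from pathlib import Path


def concat_command(clips: list[str], list_path: str, out_path: str) -> list[str]:
    """Write a concat-demuxer list file and return the ffmpeg argv that joins the clips."""
    lines = []
    for clip in clips:
        # Absolute path; a single quote inside the name becomes '\'' in the list syntax.
        escaped = str(Path(clip).resolve()).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_path), "-c", "copy", str(out_path)]
```

-safe 0 is required here because the concat demuxer treats absolute paths as unsafe by default.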
The Ken Burns direction is randomized per slide from a pool of {pan horizontal, pan vertical, zoom in, zoom out}. The pool is filtered by the current image: portrait biases vertical, landscape biases horizontal, and square gets all four.
What's next
The code is MIT-licensed and on GitHub. I might add a 'random' duration preset so each slide gets a different length, and an option to drop in a music track with crossfades at the head and tail. But the core is solid: drop images, click render, get a non-shaking video. Some days that's all you need.