Generate TikTok, Reels, and Shorts from JSON and AI Prompts

Generate TikTok, Reels, and YouTube Shorts from JSON prompts, AI video workflows, and Zvid REST API payloads.

Published June 9, 2026

You can generate TikTok, Instagram Reels, and YouTube Shorts from one JSON template by separating AI prompt output from platform render settings. Keep the script, scenes, captions, media choices, and timing in a reusable JSON data model, then render platform-specific variants by changing the Zvid project resolution, layout spacing, text length, and final CTA. With Zvid, the public REST API receives a structured video project as JSON, so the same backend can submit a tiktok, instagram-reel, or youtube-short render without rebuilding the video by hand.

The practical workflow is: create one short-form video template, map your content data into a Zvid payload, submit it to POST https://api.zvid.io/api/render/api-key, store the job ID, and poll GET https://api.zvid.io/api/jobs/{id} until the render is ready. Keep the Getting Started guide, Authentication guide, JSON Structure overview, resolution presets reference, and submit render job endpoint open while you build the first version.

If this is your first Zvid project, start with the guide to generate a video from JSON. For richer short-form scenes, pair this workflow with the tutorials on adding B-roll automatically with JSON and adding subtitles to video with JSON.

[Figure: Developer workflow for generating TikTok, Reels, and Shorts from JSON]

A reusable JSON template lets your backend generate vertical video variants without manual editing.

Start with reusable source data

Do not make the first template platform-specific. Model the story once, then let the render layer decide the output preset.

A useful source record for automated social videos usually includes:

  • Hook text for the first second.
  • Scene list with start time, duration, headline, and supporting media.
  • Optional B-roll URLs, product images, screenshots, or generated stills.
  • Caption text or word-level subtitle data.
  • Brand colors, font choices, and safe-area margins.
  • CTA text and destination.
  • Output targets such as TikTok, Reels, Shorts, or a square preview.

That source data is not the final render payload. Your application can keep it small, validate it, and compile it into a Zvid project only when a render is needed.
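
As a rough illustration, the source record could be modeled as a small typed structure with a validation pass before anything is compiled into a render payload. This is only a sketch: the class names, field names, and the https-only rule are assumptions made for this example and are not part of the Zvid API.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    role: str            # e.g. "hook", "problem", "proof", "steps", "cta"
    headline: str
    duration: float      # seconds
    media_url: str = ""  # optional B-roll, product image, or generated still


@dataclass
class SourceRecord:
    hook: str
    scenes: list[Scene]
    cta_text: str
    cta_url: str
    targets: list[str] = field(default_factory=lambda: ["tiktok"])


def validate_record(record: SourceRecord) -> list[str]:
    """Return a list of problems; an empty list means the record can be compiled."""
    problems = []
    if not record.hook.strip():
        problems.append("hook is empty")
    if not record.scenes:
        problems.append("at least one scene is required")
    for i, scene in enumerate(record.scenes):
        if scene.duration <= 0:
            problems.append(f"scene {i} has a non-positive duration")
        # Assumption for this sketch: only https media URLs are accepted.
        if scene.media_url and not scene.media_url.startswith("https://"):
            problems.append(f"scene {i} media URL is not https")
    if not record.targets:
        problems.append("no output targets")
    return problems
```

Returning a list of problems instead of raising on the first one lets a review UI or log show everything that needs fixing in a single pass.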

Where AI video prompts fit

AI video workflows often start with a prompt, but the renderer should receive structured JSON. Use JSON prompts when an AI step needs to return a hook, scene list, caption copy, visual notes, and clip ideas in a predictable format. Then validate that JSON before building the Zvid payload.

A practical AI prompt can ask for plain fields such as hook, scenes, caption, brollIdea, and cta. Your app can turn those fields into video generation instructions, choose approved images and videos, and render the result through the API. This keeps the creative prompt useful without letting a model invent unsupported render fields.
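
One way to enforce that boundary is to parse the model output and reject anything outside the agreed fields before it reaches the template compiler. The sketch below uses the field names mentioned above (hook, scenes, caption, brollIdea, cta). Which of them are required is an assumption for this example, and `parse_ai_output` is a hypothetical helper name.

```python
import json

ALLOWED_FIELDS = {"hook", "scenes", "caption", "brollIdea", "cta"}
REQUIRED_FIELDS = {"hook", "scenes", "cta"}  # assumption: adjust to your template


def parse_ai_output(raw: str) -> dict:
    """Parse model output and reject anything outside the agreed fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unknown = set(data) - ALLOWED_FIELDS
    if unknown:
        # The model tried to add fields the template does not know about.
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["scenes"], list) or not data["scenes"]:
        raise ValueError("scenes must be a non-empty list")
    return data
```

Because render fields such as resolution are not in the allow-list, a model that tries to set them fails validation instead of silently changing the output.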

If your workflow uses n8n, Make.com, a custom JavaScript worker, or another automation tool, keep the same boundary: AI creates structured content, your backend validates it, and Zvid renders the final short-form video.

This is also the safest way to use an AI video generator for TikTok-style content. Let AI tools help with video content ideas, text prompts, captions, clip notes, or a first JSON2Video outline. Then convert the result into clean JSON, explicit timestamps, approved media URLs, and a renderable API payload. The goal is not to let a prompt act like a video editor; the goal is to automate video creation with data your system can review.

For AI video content, keep the model's job narrow: generate video ideas, draft captions, suggest scene order, or describe images and videos your system should fetch. AI-assisted videos become easier to control when the final render step still depends on explicit JSON instead of a free-form prompt.

Use platform presets as render variants

Zvid's resolution presets include tiktok, instagram-reel, and youtube-short, which are vertical formats intended for short-form video. The value belongs in the project-level resolution field. That means your template compiler can reuse the same scene data and create a variant by changing one output preset plus any layout-specific adjustments.

For example, your backend can keep a target list like this:

[
  { "platform": "tiktok", "resolution": "tiktok" },
  { "platform": "instagram_reel", "resolution": "instagram-reel" },
  { "platform": "youtube_short", "resolution": "youtube-short" }
]

Each variant should still get a short layout pass. The canvas dimensions are similar, but the platform context can differ. Keep the render template reusable, but do not assume every platform needs identical copy.
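
A template compiler could turn that target list into one project per platform by copying the base project and changing the project-level resolution. The sketch below assumes the base project is a plain dictionary. `build_variants` is a hypothetical helper, and suffixing the project name with the platform is a convention chosen for this example.

```python
import copy

TARGETS = [
    {"platform": "tiktok", "resolution": "tiktok"},
    {"platform": "instagram_reel", "resolution": "instagram-reel"},
    {"platform": "youtube_short", "resolution": "youtube-short"},
]


def build_variants(base_project: dict, targets: list[dict]) -> dict[str, dict]:
    """Return one independent project per platform, keyed by platform name."""
    variants = {}
    for target in targets:
        project = copy.deepcopy(base_project)  # never mutate the shared template
        project["resolution"] = target["resolution"]
        # Illustrative naming convention so jobs are easy to tell apart later.
        project["name"] = f'{base_project.get("name", "video")}-{target["platform"]}'
        variants[target["platform"]] = project
    return variants
```

The deep copy matters: per-platform layout adjustments made later to one variant will not leak into the others or into the base template.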

[Figure: Workflow for generating short-form social videos from one JSON template]

The template owns the story; each render variant owns the output preset and final layout checks.

Submit the render through the Zvid API

For the public API, wrap the Zvid project in a top-level payload field and submit it with your API key:

curl -X POST https://api.zvid.io/api/render/api-key \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d @short-form-render.json

Then check the job, replacing YOUR_JOB_ID with the job ID returned by the submit call:

curl -X GET "https://api.zvid.io/api/jobs/YOUR_JOB_ID" \
  -H "x-api-key: YOUR_API_KEY"

A render worker can loop over your targets, build one payload per target, submit each job, and store the returned job ID beside the source record.
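
A minimal Python sketch of that worker, using only the standard library, might look like the following. The endpoint URL, the x-api-key header, and the top-level payload wrapper come from this article. The function names are hypothetical, and because the response shape is not described here, the sketch returns each raw response so you can read the job ID according to the endpoint reference.

```python
import json
import urllib.request

RENDER_URL = "https://api.zvid.io/api/render/api-key"


def build_render_request(project: dict, api_key: str) -> urllib.request.Request:
    """Wrap the project in the top-level payload field and prepare the POST."""
    body = json.dumps({"payload": project}).encode("utf-8")
    return urllib.request.Request(
        RENDER_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def submit_all(projects: dict, api_key: str, sender=send) -> dict:
    """Submit one job per platform and return the raw response for each."""
    return {
        platform: sender(build_render_request(project, api_key))
        for platform, project in projects.items()
    }
```

Passing `sender` as a parameter keeps the loop testable without network access; in production the default `send` performs the real request.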

Automate the workflow with JavaScript or no-code tools

The automation layer does not need to be complicated. A JavaScript service, n8n workflow, Make.com scenario, or queue worker can collect source records, call an AI tool for JSON prompts when needed, build the Zvid REST API payload, and submit one render job per output target.

For clip sourcing, keep approved images and videos in your own media library, CMS, product feed, or licensed stock workflow. Store the final media URLs in the source record before rendering. The JSON template should reference media that your system is allowed to use and that the API can fetch.

Some teams keep prompt templates in GitHub, expose approved actions through an MCP server, or call a third-party API for narration, images, or avatar assets before rendering. That can work as long as the final Zvid payload stays simple to debug: one source record, one template, one set of timestamps, and one generated video job per platform.

This article is not a "best AI video generator" roundup. Tools such as Canva, ElevenLabs, HeyGen, or an internal LLM workflow may help with design review, voiceover, avatar footage, or script drafts, but the renderer still needs clean JSON and reachable media URLs. Use those tools upstream, then submit the final video generation payload to Zvid.

Copy-paste Zvid payload for a short-form template

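The project below renders a vertical short-form template demo. It uses a single SVG layer so the example is easy to copy and inspect, but the same project can include TEXT, IMAGE, VIDEO, GIF, audio, and subtitles when your workflow needs richer assets. When you submit it to the public API, place this project inside the top-level payload field described in the previous section.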

{
  "name": "short-form-json-template-demo",
  "resolution": "tiktok",
  "duration": 10,
  "frameRate": 30,
  "outputFormat": "mp4",
  "backgroundColor": "#07111F",
  "visuals": [
    {
      "type": "SVG",
      "width": 1080,
      "height": 1920,
      "track": 1,
      "svg": "<svg width='1080' height='1920' viewBox='0 0 1080 1920' xmlns='http://www.w3.org/2000/svg'><defs><linearGradient id='bg' x1='0' y1='0' x2='1' y2='1'><stop offset='0' stop-color='#07111F'/><stop offset='1' stop-color='#261640'/></linearGradient><linearGradient id='accent' x1='0' y1='0' x2='1' y2='0'><stop offset='0' stop-color='#2DD4BF'/><stop offset='0.55' stop-color='#FADD46'/><stop offset='1' stop-color='#FB7185'/></linearGradient></defs><rect width='1080' height='1920' fill='url(#bg)'/><rect x='74' y='80' width='932' height='1760' rx='54' fill='#0C172A' stroke='#263755' stroke-width='4'/><rect x='118' y='138' width='844' height='16' rx='8' fill='url(#accent)'/><text x='540' y='245' text-anchor='middle' fill='#FFFFFF' font-family='Arial' font-size='62' font-weight='800'>One JSON Template</text><text x='540' y='306' text-anchor='middle' fill='#BFD0EE' font-family='Arial' font-size='34'>Render vertical variants for every channel</text><rect x='148' y='404' width='784' height='180' rx='34' fill='#10233A' stroke='#2DD4BF' stroke-width='3'/><text x='540' y='474' text-anchor='middle' fill='#2DD4BF' font-family='Arial' font-size='42' font-weight='800'>Hook</text><text x='540' y='530' text-anchor='middle' fill='#FFFFFF' font-family='Arial' font-size='34'>Stop rebuilding short videos by hand</text><rect x='148' y='650' width='784' height='350' rx='34' fill='#111D35' stroke='#FADD46' stroke-width='3'/><text x='540' y='730' text-anchor='middle' fill='#FADD46' font-family='Arial' font-size='42' font-weight='800'>Scenes</text><rect x='210' y='790' width='660' height='44' rx='22' fill='#243757'/><rect x='210' y='864' width='540' height='44' rx='22' fill='#20314F'/><rect x='210' y='938' width='620' height='44' rx='22' fill='#20314F'/><rect x='148' y='1072' width='784' height='250' rx='34' fill='#151B31' stroke='#FB7185' stroke-width='3'/><text x='540' y='1150' text-anchor='middle' fill='#FB7185' font-family='Arial' font-size='42' font-weight='800'>Variants</text><rect x='206' y='1210' width='184' height='58' rx='29' fill='#263755'/><rect x='448' y='1210' width='184' height='58' rx='29' fill='#263755'/><rect x='690' y='1210' width='184' height='58' rx='29' fill='#263755'/><text x='298' y='1249' text-anchor='middle' fill='#FFFFFF' font-family='Arial' font-size='25' font-weight='800'>TikTok</text><text x='540' y='1249' text-anchor='middle' fill='#FFFFFF' font-family='Arial' font-size='25' font-weight='800'>Reels</text><text x='782' y='1249' text-anchor='middle' fill='#FFFFFF' font-family='Arial' font-size='25' font-weight='800'>Shorts</text><rect x='176' y='1418' width='728' height='222' rx='42' fill='#F8FAFC'/><text x='540' y='1504' text-anchor='middle' fill='#07111F' font-family='Arial' font-size='44' font-weight='800'>Submit JSON</text><text x='540' y='1564' text-anchor='middle' fill='#334155' font-family='Arial' font-size='31'>Render once per output preset</text><rect x='290' y='1696' width='500' height='70' rx='35' fill='url(#accent)'/><text x='540' y='1743' text-anchor='middle' fill='#07111F' font-family='Arial' font-size='30' font-weight='800'>Ready for automation</text></svg>"
    }
  ]
}

[Figure: Zvid JSON payload visual for short-form social video automation]

This payload visual is generated from the same Zvid API payload shown above.

To render an Instagram Reels variant, keep the body of the project the same and change the project-level resolution to instagram-reel. For a YouTube Shorts variant, change it to youtube-short. In production, your compiler can also shorten the CTA, swap the final screen, or adjust safe-area margins before submitting the job.

Map the same scenes into each platform

A good short-form JSON template has stable scene roles:

  • Hook: the first visual statement, usually under two seconds.
  • Problem: why the viewer should care.
  • Proof: product shot, data point, transformation, or example.
  • Steps: a short sequence of actions or benefits.
  • CTA: the final instruction or next step.

Those roles can become timeline elements with explicit enterBegin, enterEnd, exitBegin, and exitEnd values. A template compiler can calculate these values from scene durations, then place text and media on the vertical canvas.
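
The timing calculation can be sketched as a small function that lays scenes end to end. The exact meaning of the four fields is an assumption here: this sketch treats enterBegin to enterEnd as the entrance window and exitBegin to exitEnd as the exit window, with a hypothetical `transition` length. Check the JSON Structure overview for the precise semantics before relying on it.

```python
def scene_timings(durations: list[float], transition: float = 0.3) -> list[dict]:
    """Lay scenes end to end and derive enter/exit windows for each one."""
    timings = []
    start = 0.0
    for duration in durations:
        end = start + duration
        # Cap the transition at half the scene so enter and exit never overlap.
        fade = min(transition, duration / 2)
        timings.append({
            "enterBegin": round(start, 3),
            "enterEnd": round(start + fade, 3),
            "exitBegin": round(end - fade, 3),
            "exitEnd": round(end, 3),
        })
        start = end  # the next scene begins where this one ends
    return timings
```

For example, `scene_timings([2, 3, 5], transition=0.5)` produces three windows that cover 0 to 2, 2 to 5, and 5 to 10 seconds, which matches a 10-second project duration.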

[Figure: Diagram mapping source data to short-form JSON video timeline layers]

Scene roles make the template easier to adapt than a pile of one-off timeline edits.

Keep captions and overlays inside safe areas

Vertical video templates fail when text sits too close to the edges or when captions compete with platform UI. Treat the center of the frame as the safest content zone.

Use fixed text boxes instead of unconstrained text. If your source content comes from an AI system, apply the same rules: maximum hook length, maximum caption length, allowed CTA patterns, and required media aspect ratios.
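
Those rules are easy to express as a pre-render check. The character limits below are hypothetical placeholders, not platform or Zvid requirements; tune them to the text boxes in your own template.

```python
# Hypothetical per-field limits, in characters, for one template's text boxes.
LIMITS = {"hook": 40, "caption": 90, "cta": 24}


def check_text_limits(fields: dict, limits: dict = LIMITS) -> list[str]:
    """Return one message per field whose text is longer than its limit."""
    violations = []
    for name, limit in limits.items():
        text = fields.get(name, "")
        if len(text) > limit:
            violations.append(f"{name} is {len(text)} characters; limit is {limit}")
    return violations
```

A record that fails this check can be sent back to the AI step for a shorter rewrite or flagged for human review instead of being rendered with overflowing text.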

For subtitles, use structured timing so the captions match the video rhythm. The subtitle tutorial on timed captions with JSON is a useful next step if your template needs word-level caption control.

Many short-form teams talk about a 3-second rule for TikTok-style hooks. Treat that as a creative reminder, not an API requirement: the opening scene should communicate the topic quickly, and the JSON template should make that first caption short enough to read.

Compare one-template and one-off workflows

One-off editing works for a few posts. It breaks down when your product needs many variants, multiple languages, product-feed videos, or AI-generated scripts.

[Figure: Comparison of manual short-form video editing and JSON template automation]

A reusable template creates leverage when the same video structure needs many variants.

Use a reusable JSON template when:

  • The same story format appears across many records.
  • Your app needs to render several platform variants.
  • Creators or AI systems supply structured content upstream.
  • The team needs repeatable layout, timing, captions, and CTA behavior.
  • Completed renders must connect back to a source record or campaign.

Manual editing still makes sense for flagship creative. For repeatable social videos, a JSON video template gives developers a cleaner system boundary.

How it works in a backend service

A backend implementation can stay small:

  1. Receive a source record from a CMS, product feed, AI workflow, or form.
  2. Validate required fields such as hook, scenes, media URLs, and CTA.
  3. Build a Zvid project for each target output.
  4. Submit each project to the Zvid render endpoint.
  5. Store each job ID with the source record and target platform.
  6. Poll the job status endpoint and save completed video URLs.

For higher-volume workflows, read the guide on bulk video generation with an API. The core idea is the same: source data creates payloads, the API handles render jobs, and your product stores status.

This step-by-step pattern is useful whether you are building a fully automated short-form video content system or a marketer-facing tool that still requires approval before publication.
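
The polling step in that list can be sketched as a generic loop. The job status response format is not described in this article, so the sketch takes the fetch call and the "is it finished?" check as parameters. `poll_job` and its defaults are hypothetical choices for this example.

```python
import time


def poll_job(fetch_status, is_finished, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_status() until is_finished(result) is true or attempts run out."""
    for attempt in range(max_attempts):
        result = fetch_status()
        if is_finished(result):
            return result
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next status check
    raise TimeoutError(f"job not finished after {max_attempts} checks")
```

In a real worker, `fetch_status` would issue the GET request to the jobs endpoint with your API key, and `is_finished` would inspect whichever status field the endpoint reference documents. A bounded attempt count keeps a stuck job from blocking the worker indefinitely.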

Common mistakes

The most common mistake is trying to make one final video file serve every channel. Keep the source story reusable, but render variants for each target.

Other mistakes include:

  • Letting AI-generated hooks exceed the safe text box.
  • Placing captions too low on the vertical frame.
  • Reusing horizontal image crops without checking vertical composition.
  • Changing resolution without reviewing line breaks.
  • Treating B-roll timing as optional when captions refer to it.
  • Submitting jobs without storing the payload version.
  • Making every field editable instead of locking template structure.

The fix is to enforce constraints before rendering: keep text short, use stable scene roles, validate media, and generate one payload per output target.
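
For the payload-version problem in particular, one lightweight option is to store a content fingerprint next to each job. The sketch below hashes a canonical JSON form of the project with SHA-256. The function name and the decision to use a hash as the version are assumptions for this example.

```python
import hashlib
import json


def payload_fingerprint(project: dict) -> str:
    """Return a stable SHA-256 hex digest for a project dictionary."""
    # Sorted keys and fixed separators make the digest independent of key order.
    canonical = json.dumps(project, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Saving this digest with the job ID makes it possible to tell later whether two renders came from the same payload, even if the template code has changed in between.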

When to use Zvid

Use Zvid when your application needs to generate short-form videos from structured JSON through an API. It is a strong fit for AI video apps, content automation tools, e-commerce promos, educational explainers, and agency workflows where the same template needs many variants.

[Figure: Use cases for automated TikTok, Reels, and Shorts generation with JSON]

Short-form automation works best when structured records need consistent creative output.

Zvid is especially useful when you want:

  • JSON-controlled scenes, timing, media, and output format.
  • Backend render jobs instead of manual exports.
  • Repeatable vertical templates for TikTok, Reels, and Shorts.
  • Caption, B-roll, and CTA logic that can be tested once and reused.
  • A clean API workflow that connects render jobs to your product database.

Start with one template and one platform preset. Once the render looks right, add the other output targets and tighten the text and media constraints where the video layout needs protection.

FAQs

Can I generate TikTok, Reels, and Shorts from one JSON template?

Yes. Keep the source story and scene data reusable, then render a separate Zvid payload for each output preset such as tiktok, instagram-reel, and youtube-short.

Is one video file enough for every short-form platform?

Usually no. One reusable template is useful, but each platform variant should be rendered and reviewed with its own output preset, margins, text length, and CTA treatment.

What should the JSON template include?

Start with hook text, scene roles, media URLs, caption data, CTA copy, brand styles, output target, and timing rules. Your backend can compile that source data into the final Zvid API payload.

How does Zvid render a short-form video from JSON?

Your application submits a JSON project to the Zvid render endpoint. Zvid creates a render job, and your backend checks the job status endpoint until the completed video is ready.

Which resolution should I use for vertical short-form video?

Use the preset that matches the target output, such as tiktok, instagram-reel, or youtube-short. These presets are defined in Zvid's resolution preset reference.

How do people make AI shorts from JSON?

A common workflow is to ask an LLM for structured JSON output, validate that output, attach approved images and videos, then submit a render payload to a video generation API.

Where should clips for TikTok edits come from?

Use clips, images, and audio your team owns, licenses, or is allowed to use. Store the approved asset URLs in your source record before building the Zvid payload.

How do I keep AI-generated short videos from breaking layouts?

Validate the AI output before rendering. Limit hook length, caption length, CTA length, scene count, media aspect ratio, and allowed template fields.

Do I need a video editor for every variant?

No. A video editor is still useful for flagship creative, but repeatable TikTok, Instagram Reels, and YouTube Shorts workflows can use JSON templates, approved media, and API rendering to produce consistent automated output.

What is the best first implementation?

Start with a single vertical template, one payload, and one output preset. After that render works, add platform variants and automate the loop across source records.
