xAI shipped Grok Imagine Video 1.5 this week, claiming the top spot on the Image-to-Video Arena leaderboard with a +52 Elo jump over 1.0, beating Sora 2, Veo 3.1, Seedance 2.0 and Kling. The headline for working creators is price: $4.20 per minute via the Imagine API, versus $30 per minute for Sora 2 Pro and $12 for Veo 3.1, with native synchronized audio and lip-sync generated in a single pass. Output is 480p or 720p at 24fps, clips run 1–15 seconds, and a 6-second 720p clip renders in about 25 seconds on the Fast tier.
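To put those per-minute prices in per-clip terms, here is a minimal Python sketch. It assumes billing scales linearly with clip length, which the announcement does not confirm, and the function and dictionary names are purely illustrative.

```python
# Per-minute API prices quoted above, in USD.
PRICE_PER_MINUTE = {
    "Grok Imagine Video 1.5": 4.20,
    "Veo 3.1": 12.00,
    "Sora 2 Pro": 30.00,
}


def clip_cost(price_per_minute: float, seconds: float) -> float:
    """Cost of one clip, ASSUMING linear per-second billing (not confirmed)."""
    return round(price_per_minute * seconds / 60, 2)


if __name__ == "__main__":
    # 15 seconds is the maximum clip length quoted for Grok Imagine Video 1.5.
    for model, price in PRICE_PER_MINUTE.items():
        print(f"{model}: ${clip_cost(price, 15):.2f} per 15-second clip")
```

Under that assumption, a maximum-length 15-second clip works out to roughly $1.05 on Grok Imagine Video 1.5, $3.00 on Veo 3.1 and $7.50 on Sora 2 Pro.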
Runway dropped Seedance 2.0 Fast onto its API on June 5, giving you ByteDance's cinematic motion model with keyframe control, reference images, reference videos and generated audio in one pipeline. It covers text-to-video, image-to-video and video-to-video at 480p or 720p, with durations from 4 to 15 seconds. If you already build in Runway, this is a same-day add to your toolkit.
Also fresh on the Runway API, Aleph 2.0 (model id aleph2) went live June 2 and edits existing videos from text prompts, with optional keyframe images placed at specific timestamps. It accepts input clips from 2 to 30 seconds and up to 5 keyframe images. This is the post-production lever a lot of creators have been waiting for: surgical edits on real footage instead of regenerating from scratch.
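If you queue edits in bulk, a local pre-flight check against those limits can catch bad jobs before they reach the API. The sketch below is only that: `EditJob` and `validate` are hypothetical names, not Runway's request schema, and the rule that keyframe timestamps must fall inside the clip is an assumption.

```python
from dataclasses import dataclass, field

# Limits quoted above for Aleph 2.0.
MIN_CLIP_SECONDS = 2.0
MAX_CLIP_SECONDS = 30.0
MAX_KEYFRAMES = 5


@dataclass
class EditJob:
    """Hypothetical local job description, NOT Runway's request format."""
    prompt: str
    clip_seconds: float
    keyframe_timestamps: list[float] = field(default_factory=list)


def validate(job: EditJob) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not MIN_CLIP_SECONDS <= job.clip_seconds <= MAX_CLIP_SECONDS:
        problems.append(
            f"clip is {job.clip_seconds}s, expected "
            f"{MIN_CLIP_SECONDS}-{MAX_CLIP_SECONDS}s"
        )
    if len(job.keyframe_timestamps) > MAX_KEYFRAMES:
        problems.append(
            f"{len(job.keyframe_timestamps)} keyframes, max is {MAX_KEYFRAMES}"
        )
    # Assumption: a keyframe timestamp should land inside the clip.
    for t in job.keyframe_timestamps:
        if not 0 <= t <= job.clip_seconds:
            problems.append(f"keyframe at {t}s falls outside the clip")
    return problems


if __name__ == "__main__":
    job = EditJob("turn the daytime street into night", 45.0, [3.0])
    for problem in validate(job):
        print("rejected:", problem)
```

Returning a list of problems rather than raising on the first one lets a batch script report everything wrong with a job in a single pass.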
YouTube's detection systems are now flagging the patterns behind mass-produced AI videos, and thousands of faceless AI channels have had monetization suspended under the inauthentic-content policy. The economics still tempt: production costs have collapsed to under $3 per video and finance/tech niches command $15–$40 per thousand views, but the success rate sits at roughly 3 percent. Treat low-effort batch uploading as a demonetization risk, not a strategy.
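The temptation is easy to quantify. As a rough sketch, assuming the $15–$40 figure is what actually reaches the creator per thousand views (it may be an advertiser-side rate instead), the break-even point per video looks like this; the function name is illustrative.

```python
import math


def break_even_views(cost_per_video: float, revenue_per_thousand: float) -> int:
    """Views needed for one video to cover its own production cost.

    ASSUMES the per-thousand figure is creator revenue and that the
    channel is still monetized when the views arrive.
    """
    return math.ceil(cost_per_video * 1000 / revenue_per_thousand)


if __name__ == "__main__":
    cost = 3.00  # upper bound on production cost quoted above, in USD
    for rate in (15.00, 40.00):
        views = break_even_views(cost, rate)
        print(f"${rate:.0f} per thousand views: break even at {views} views")
```

At $3 per video that is 200 views at the low end of the range and 75 at the high end, which is why the model looks attractive on paper. The arithmetic stops applying the moment a channel loses monetization.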
TikTok is rolling out invisible watermarking, a platform-readable signal embedded in the media that survives re-uploads and re-edits even when visible labels and metadata are stripped. It's being applied to content from TikTok's own AI tools like AI Editor Pro and to anything carrying C2PA credentials, part of an effort that has labeled over 1.3 billion videos so far. If you're laundering AI clips through re-uploads to dodge labels, that loophole is closing.
The latest ComfyUI updates bring block prefetch and async LoRA loading for notably faster LTX performance, plus a new Wan2.5 Image-to-Image API node for image editing. There's also a Union IC-LoRA combining depth and canny edge control into one LoRA with downsampled latent processing to cut memory and speed inference, and LTXVSaveConditioning/LoadConditioning nodes that let you encode a prompt once and reuse it across runs. Real compute savings if you're iterating heavily.
NVIDIA's RTX optimizations for ComfyUI deliver 3x faster generation with 60% less VRAM using NVFP4, and 2x faster with 40% less VRAM on NVFP8, alongside an RTX Video node that upscales to 4K in seconds. Weight streaming lets ComfyUI spill into system RAM when VRAM runs out, putting bigger LTX-2 graphs within reach of mid-range cards. For self-hosters, this is the difference between "can't run it" and "runs tonight."
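For a back-of-the-envelope feel for those numbers, the sketch below applies the quoted VRAM reductions as flat percentages. That is a simplifying assumption: the headline figures are best-case and will vary by model and graph. The 16 GB baseline and the function name are hypothetical.

```python
# VRAM reductions quoted above, in percent.
VRAM_SAVED_PERCENT = {"NVFP4": 60, "NVFP8": 40}


def vram_after_savings(baseline_gb: float, percent_saved: int) -> float:
    """Estimated VRAM need, ASSUMING the quoted saving applies uniformly."""
    return round(baseline_gb * (100 - percent_saved) / 100, 1)


if __name__ == "__main__":
    baseline_gb = 16.0  # hypothetical workload, not a measured figure
    for fmt, saved in VRAM_SAVED_PERCENT.items():
        needed = vram_after_savings(baseline_gb, saved)
        print(f"{fmt}: about {needed} GB instead of {baseline_gb} GB")
```

On that reading, a graph that needs 16 GB today would want roughly 6.4 GB with NVFP4 or 9.6 GB with NVFP8, before weight streaming is even considered.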
Wan 2.7, released under Apache 2.0 like its predecessors, generates 1080p up to 15 seconds with text-to-video, image-to-video and native audio built in, and runs locally in ComfyUI. Practical minimum is 24GB VRAM, with GGUF quantized builds available for lower-end hardware at the cost of speed and quality. For commercial work without per-minute API fees, this remains the open-source pick to standardize on.