Loads and samples video frames with accurate timestamps
Sends a precise multi-task prompt to LLaVA (frame analysis + transcription)
Extracts and formats the output cleanly into visual events and speech
Uses a default prompt to auto-analyze the uploaded video
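
To make the sampling and output-formatting steps concrete, here is a minimal sketch in plain Python. It is not the original implementation: the function names, the one-sample-every-N-seconds rule, and the `VISUAL EVENTS:` / `SPEECH:` section headers are all assumptions made for illustration, and the LLaVA call itself is left out because the source does not show how the model is invoked.

```python
from typing import Dict, List, Tuple


def sample_frame_times(frame_count: int, fps: float, every_s: float) -> List[Tuple[int, float]]:
    """Return (frame_index, timestamp_seconds) pairs, about one per `every_s` seconds.

    Hypothetical helper: timestamps are derived as index / fps, which assumes
    a constant frame rate.
    """
    if fps <= 0 or every_s <= 0:
        raise ValueError("fps and every_s must be positive")
    step = max(1, round(fps * every_s))  # number of frames between samples
    return [(i, i / fps) for i in range(0, frame_count, step)]


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    return f"{minutes:02d}:{rem_ms // 1000:02d}.{rem_ms % 1000:03d}"


def split_sections(reply: str) -> Dict[str, List[str]]:
    """Split a model reply into visual events and speech.

    Assumes the prompt asked for two sections headed 'VISUAL EVENTS:' and
    'SPEECH:' (an illustrative convention, not one taken from the source).
    """
    sections: Dict[str, List[str]] = {"visual_events": [], "speech": []}
    current = None
    for line in reply.splitlines():
        text = line.strip()
        if not text:
            continue
        header = text.rstrip(":").upper()
        if header == "VISUAL EVENTS":
            current = "visual_events"
        elif header == "SPEECH":
            current = "speech"
        elif current is not None:
            # Drop leading list markers such as "- " or "* ".
            sections[current].append(text.lstrip("-* ").strip())
    return sections
```

In use, the sampled indices would select which frames to read from the video, the formatted timestamps would label each frame in the prompt, and `split_sections` would be applied to whatever text the model returns. Lines that appear before the first recognized header are ignored.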