Our Models are built on the latest AI research

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

November 10, 2024

BitNet a4.8: 4-bit Activations for 1-bit LLMs

November 10, 2024

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

November 9, 2024

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

November 8, 2024

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

November 5, 2024

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

November 1, 2024

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

November 1, 2024

Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model

October 30, 2024

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

October 25, 2024
