ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
November 10, 2024
BitNet a4.8: 4-bit Activations for 1-bit LLMs
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
November 9, 2024
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
November 8, 2024
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
November 5, 2024
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
November 1, 2024
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
November 1, 2024
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model
October 30, 2024
October 25, 2024