JanusFlow is a framework that unifies image understanding and generation in a single model. Its architecture is deliberately simple: it combines an autoregressive language model with rectified flow, a generative modeling method that learns a velocity field transporting noise to data. The key finding is that rectified flow can be trained directly within the large language model framework, with no need for complex architectural modifications.
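
To make the rectified flow idea concrete, here is a minimal sketch of the training target and a fixed-step Euler sampler in plain Python. It is an illustration under stated assumptions, not the JanusFlow implementation: the convention that noise sits at t=0 and data at t=1, the step count, and all function names (`interpolate`, `target_velocity`, `flow_matching_loss`, `euler_sample`, `oracle_velocity`) are hypothetical choices made for this example.

```python
def interpolate(noise, data, t):
    """Point on the straight line from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def target_velocity(noise, data):
    """Velocity along that straight line; it is constant in t."""
    return [d - n for n, d in zip(noise, data)]


def flow_matching_loss(predicted_velocity, noise, data):
    """Mean squared error between a predicted velocity and the target."""
    target = target_velocity(noise, data)
    squared = [(p - v) ** 2 for p, v in zip(predicted_velocity, target)]
    return sum(squared) / len(squared)


def euler_sample(velocity_fn, noise, num_steps=30):
    """Integrate dz/dt = v(z, t) from t=0 to t=1 with fixed Euler steps.

    velocity_fn stands in for the trained model (assumption: it takes the
    current state and the time and returns a velocity of the same length).
    """
    z = list(noise)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity_fn(z, t)
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    return z


def oracle_velocity(data):
    """Toy stand-in for a model: the exact velocity toward one data point."""
    def velocity_fn(z, t):
        return [(d - zi) / (1.0 - t) for d, zi in zip(data, z)]
    return velocity_fn


noise = [0.5, -1.0, 2.0]
data = [1.0, 0.0, -1.0]
sample = euler_sample(oracle_velocity(data), noise)
```

With the toy oracle, the sampler follows the straight line from `noise` and ends at `data` up to floating-point error. In a real model the velocity would come from the network's prediction, conditioned on the text prompt.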

Model Summary
JanusFlow is a unified multimodal large language model (MLLM) that decouples visual encoding for multimodal understanding from visual encoding for generation. It is built on DeepSeek-LLM-1.3b-base and uses a separate module for each task. For multimodal understanding, it uses the SigLIP-L vision encoder, which takes 384 x 384 image input. For image generation, it uses rectified flow together with SDXL-VAE to generate 384 x 384 images. The released checkpoint is the EMA (exponential moving average) checkpoint obtained after pre-training and supervised fine-tuning.
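
For readers unfamiliar with EMA checkpoints, the sketch below shows the general averaging rule on a flat list of weights. It is a generic illustration only: the decay value and the function name `ema_update` are assumptions for this example, and the source does not state the settings used for JanusFlow.

```python
def ema_update(ema_weights, current_weights, decay=0.999):
    """Blend the running average with the latest weights.

    decay is a hypothetical value; closer to 1.0 means slower, smoother updates.
    """
    return [
        decay * e + (1.0 - decay) * w
        for e, w in zip(ema_weights, current_weights)
    ]


# Toy run: the average drifts toward weights that stay fixed at 2.0.
ema = [0.0]
for _ in range(200):
    ema = ema_update(ema, [2.0], decay=0.9)
```

The averaged weights, rather than the weights from the last optimizer step, are what an EMA checkpoint stores.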

Model Download
We release Janus to the public to support a broader and more diverse range of research in both academic and commercial communities. Use of this model is subject to the terms outlined in the License section. Commercial usage is permitted under these terms.

| Model | Sequence Length | Download |
|---|---|---|
| Janus-1.3B | 4096 | 🤗 Hugging Face |
| JanusFlow-1.3B | 4096 | 🤗 Hugging Face |

License
This code repository is licensed under the MIT License. Use of the JanusFlow models is subject to the DeepSeek Model License.

