JanusFlow

JanusFlow is a framework that unifies image understanding and generation in a single model. Its streamlined architecture combines an autoregressive language model with rectified flow, a state-of-the-art method in generative modeling. The key finding is that rectified flow can be trained directly within the large language model framework, with no need for intricate architectural changes.
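To make the rectified flow idea concrete, here is a minimal sketch in plain Python. It is not the JanusFlow implementation. It only shows the generic recipe: interpolate linearly between a noise sample and a data sample, regress a velocity toward their difference, and generate by integrating that velocity with Euler steps. The function names, the step count, and the closed-form toy velocity field (which stands in for a trained network) are all illustrative assumptions.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Regression target for the velocity along the straight path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, num_steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for a trained model: the exact velocity field
    when the data distribution is a single point `target`."""
    def velocity_fn(x, t):
        # t never reaches 1 inside euler_sample, so the division is safe.
        return [(b - a) / (1.0 - t) for a, b in zip(x, target)]
    return velocity_fn


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]
target = [1.0, -2.0, 0.5, 3.0]
sample = euler_sample(make_toy_velocity(target), noise)
```

In this toy setting the velocity is constant along each path, so the Euler integration lands on `target` regardless of the step count. With a learned velocity field the paths are only approximately straight, and the number of steps becomes a trade-off between quality and speed.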

Model Summary

JanusFlow is a unified multimodal large language model (MLLM) that decouples the visual encoders used for understanding and for generation. It is built on DeepSeek-LLM-1.3b-base and uses a distinct module for each task. For multimodal understanding, it uses the SigLIP-L vision encoder, which takes 384 x 384 image input. For image generation, it uses rectified flow together with SDXL-VAE to produce images at the same 384 x 384 resolution. The provided checkpoint is the EMA (exponential moving average) checkpoint obtained after pre-training and supervised fine-tuning.
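For readers unfamiliar with EMA checkpoints, the sketch below shows the usual update rule: after each training step, a shadow copy of the weights is moved a small fraction of the way toward the current weights, and the shadow copy is what gets released. The dictionary layout, the `ema_update` name, and the decay values are assumptions for illustration. The source does not state which decay JanusFlow used.

```python
def ema_update(ema_params, params, decay=0.999):
    """Return new EMA weights: decay * old_ema + (1 - decay) * current."""
    return {
        name: decay * ema_params[name] + (1.0 - decay) * params[name]
        for name in params
    }


# Toy usage with a single scalar "weight" and an exaggerated decay of 0.5
# so the effect is visible after a few steps.
ema = {"w": 0.0}
for _ in range(3):
    current = {"w": 1.0}  # pretend the trained weight stays at 1.0
    ema = ema_update(ema, current, decay=0.5)
```

After three updates the shadow weight has moved from 0.0 through 0.5 and 0.75 to 0.875, illustrating how the EMA smoothly tracks the trained weights.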

Model Download

We release Janus to the public to support a broader and more diverse range of research in both academic and commercial communities. Use of this model is subject to the terms outlined in the License section. Commercial usage is permitted under these terms.

Model	Sequence Length	Download
Janus-1.3B	4096	🤗 Hugging Face
JanusFlow-1.3B	4096	🤗 Hugging Face

License

This code repository is licensed under the MIT License. Use of the JanusFlow models is subject to the DeepSeek Model License.
