Qwen-Image
By the Qwen Team, Alibaba Cloud
Qwen-Image is now available!
Open-source Advanced Text-to-Image Generative Model

Introduction
We are thrilled to release Qwen-Image, an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. Experiments show strong general capabilities in both image generation and editing, with exceptional performance in text rendering, especially for Chinese.

🚀 Multimodal AI Capabilities
Part of the Qwen (Tongyi Qianwen) model series, offering powerful text-to-image generation with exceptional understanding of complex prompts
🌟 Open Source Innovation
Part of Alibaba's commitment to open-source AI development, allowing researchers and developers to build upon and extend its capabilities
🔍 Comprehensive Model Family
Works alongside other Qwen models for text, vision, and multimodal applications, providing a complete ecosystem for AI development
Quick Start
Choose your preferred Qwen image model:
Option 1: Using the latest Qwen VLo model
The new Qwen VLo model supports both text-to-image and image-to-image generation, with a progressive generation feature.
pip install "dashscope>=1.20.7"
import dashscope
from dashscope import ImageSynthesis

# Set your API key
dashscope.api_key = "YOUR_API_KEY"

# Text-to-image generation
response = ImageSynthesis.call(
    model='qwen-vlo',
    prompt='A coffee shop entrance features a chalkboard sign reading "Qwen Coffee 😊 $2 per cup"',
    negative_prompt='blurry, low quality',
    n=1,  # Number of images to generate
    size='1024*1024',  # Image size
    steps=50  # Diffusion steps
)

# Save the generated image
if response.status_code == 200:
    with open('qwen_vlo_result.png', 'wb') as f:
        f.write(response.output.images[0].image)
    print('Image saved successfully!')
else:
    print(f'Failed to generate image: {response.message}')
Option 2: Using Qwen-Image with diffusers
Install the latest version of diffusers:
pip install git+https://github.com/huggingface/diffusers
The following code snippet illustrates how to use the model to generate images from text prompts:
from diffusers import DiffusionPipeline
import torch

model_name = "Qwen/Qwen-Image"

# Load the pipeline (bfloat16 on GPU, float32 on CPU)
if torch.cuda.is_available():
    torch_dtype = torch.bfloat16
    device = "cuda"
else:
    torch_dtype = torch.float32
    device = "cpu"

pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
pipe = pipe.to(device)

# Quality-boosting suffix appended to the prompt
positive_magic = {
    "en": ", Ultra HD, 4K, cinematic composition.",  # for English prompts
    "zh": ", 超清,4K,电影级构图"  # for Chinese prompts
}

# Generate image
prompt = '''A coffee shop entrance features a chalkboard sign reading "Qwen Coffee 😊 $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197".'''
negative_prompt = " "  # use a blank prompt when nothing needs to be suppressed

# Supported aspect ratios (width, height)
aspect_ratios = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1140),
    "3:4": (1140, 1472)
}

width, height = aspect_ratios["16:9"]

image = pipe(
    prompt=prompt + positive_magic["en"],
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.Generator(device=device).manual_seed(42)
).images[0]

image.save("example.png")
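If GPU memory is limited, a further option (not part of the original quick start) is diffusers' built-in CPU offloading, which keeps only the currently active sub-model on the GPU. The sketch below assumes the same Qwen/Qwen-Image checkpoint and that accelerate is installed; generation is slower but fits on smaller cards.

from diffusers import DiffusionPipeline
import torch

# Load in bfloat16 as before, but do NOT move the whole pipeline to the GPU.
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)

# Standard diffusers memory helper (requires `accelerate`): each sub-model is
# moved to the GPU only while it runs, then returned to CPU afterwards.
pipe.enable_model_cpu_offload()

image = pipe(
    prompt='A storefront sign that reads "Qwen-Image", watercolor style',
    negative_prompt=" ",
    width=1328,
    height=1328,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.manual_seed(42)
).images[0]
image.save("example_offload.png")
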
Show Cases
Superior Text Rendering
One of its standout capabilities is high-fidelity text rendering across diverse images. Whether the text is in an alphabetic language like English or a logographic script like Chinese, Qwen-Image preserves typographic details, layout coherence, and contextual harmony with stunning accuracy. Text isn't just overlaid; it's seamlessly integrated into the visual fabric.
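
As a quick illustration (not from the official quick start), the pipeline loaded in Option 2 above can be reused with a text-heavy bilingual prompt; the prompt and file names below are only assumptions for demonstration.

# Continues from the Option 2 quick start: reuses `pipe`, `positive_magic`,
# `aspect_ratios`, and `device` defined there. The prompt is illustrative only.
text_prompt = (
    'A bookstore window display with a sign that reads "New Arrivals This Week", '
    'and below it a small poster announcing "晚间读书会 8:00 PM" beside a stack of novels.'
)

width, height = aspect_ratios["3:4"]
poster = pipe(
    prompt=text_prompt + positive_magic["en"],
    negative_prompt=" ",
    width=width,
    height=height,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.Generator(device=device).manual_seed(0)
).images[0]
poster.save("text_rendering_demo.png")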

Artistic Style Support
Beyond text, Qwen-Image excels at general image generation with support for a wide range of artistic styles. From photorealistic scenes to impressionist paintings, from anime aesthetics to minimalist design, the model adapts fluidly to creative prompts, making it a versatile tool for artists, designers, and storytellers.
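
For example, continuing with the quick-start pipeline, the same scene can be rendered in several styles simply by switching a style phrase in the prompt; the style list below is illustrative, not an official preset.

# Continues from the Option 2 quick start (`pipe` and `device` defined there).
# Style phrases are free-form; these are just examples.
base_scene = "a quiet harbor at dawn with small fishing boats"
styles = {
    "photorealistic": "photorealistic, natural lighting, 35mm photograph",
    "impressionist": "impressionist oil painting, visible brush strokes",
    "anime": "anime style, clean line art, soft cel shading",
    "minimalist": "minimalist flat design, limited color palette"
}

for name, style in styles.items():
    img = pipe(
        prompt=f"{base_scene}, {style}",
        negative_prompt=" ",
        width=1328,
        height=1328,
        num_inference_steps=50,
        true_cfg_scale=4.0,
        generator=torch.Generator(device=device).manual_seed(7)
    ).images[0]
    img.save(f"style_{name}.png")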

Advanced Image Editing
When it comes to image editing, Qwen-Image goes far beyond simple adjustments. It enables advanced operations such as style transfer, object insertion or removal, detail enhancement, text editing within images, and even human pose manipulation—all with intuitive input and coherent output. This level of control brings professional-grade editing within reach of everyday users.
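
In diffusers, instruction-based editing is exposed through a separate checkpoint and pipeline class rather than the base text-to-image pipeline. The sketch below assumes the Qwen/Qwen-Image-Edit checkpoint, the QwenImageEditPipeline class, and an input image named storefront.png; treat these names and parameters as assumptions to check against your installed version.

import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

# Assumed editing checkpoint and pipeline class; adjust if your diffusers
# version exposes them under different names.
pipe = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

source = Image.open("storefront.png").convert("RGB")  # hypothetical input image
edited = pipe(
    image=source,
    prompt='Replace the chalkboard text with "Grand Opening" and add warm evening lighting',
    negative_prompt=" ",
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.manual_seed(42)
).images[0]
edited.save("storefront_edited.png")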

Image Understanding
But Qwen-Image doesn't just create or edit—it understands. It supports a suite of image understanding tasks, including object detection, semantic segmentation, depth and edge (Canny) estimation, novel view synthesis, and super-resolution. These capabilities, while technically distinct, can all be seen as specialized forms of intelligent image editing, powered by deep visual comprehension.

Together, these features make Qwen-Image not just a tool for generating pretty pictures, but a comprehensive foundation model for intelligent visual creation and manipulation—where language, layout, and imagery converge.