GLM-Image: Open-Source Image Generator with Advanced Text-to-Image Capabilities

GLM-Image is an open-source image generator that has drawn interest for its technical design. Zhipu previously released GLM, an open-source large language model that scored well on many benchmarks and received positive feedback. Rumors of an image generator followed, and that model has now been released. It is available on FAL: https://fal.ai/models/fal-ai/glm-image.

The system’s architecture separates the "thinking" stage of image generation from the rendering stage. A 9-billion-parameter autoregressive model interprets the instructions and handles complex, knowledge-rich prompts of the kind that typically challenge pure diffusion models. Its output is then passed to a 7-billion-parameter diffusion decoder, which renders the final image. This two-step process, supplemented by a dedicated Glyph Encoder, appears to be designed for accurate text-to-image translation. The model also includes built-in image editing and style transfer. According to the developers, its overall quality matches leading diffusion models and surpasses them on complex or data-intensive tasks. Reported benchmark results are strong, in some cases better than expected.
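
The two-stage split described above can be pictured with a toy sketch. This is not GLM-Image code: every name here (Plan, plan_image, render_image, generate) is hypothetical, the "tokens" are placeholder integers, and treating quoted spans as text to render is only an assumption about what a glyph-aware stage might care about. It shows the data flow only: one stage interprets the prompt, a second stage renders from that interpretation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    """Hypothetical intermediate output of the autoregressive stage."""
    semantic_tokens: List[int]  # placeholder for the planned image content
    glyph_text: List[str]       # strings assumed to be rendered inside the image


def plan_image(prompt: str) -> Plan:
    """Stage 1 stand-in: interpret the prompt (toy logic, not a real model)."""
    # Assumption: double-quoted spans are text that should appear in the image.
    glyph_text = prompt.split('"')[1::2]
    # Toy "tokens": one integer per word, derived from the word length.
    tokens = [len(word) for word in prompt.replace('"', " ").split()]
    return Plan(semantic_tokens=tokens, glyph_text=glyph_text)


def render_image(plan: Plan, steps: int = 4) -> dict:
    """Stage 2 stand-in: a real decoder would run iterative denoising here."""
    return {
        "conditioned_on": len(plan.semantic_tokens),
        "rendered_text": list(plan.glyph_text),
        "denoising_steps": steps,
    }


def generate(prompt: str) -> dict:
    """Chain the two stages: plan first, then render from the plan."""
    return render_image(plan_image(prompt))


if __name__ == "__main__":
    print(generate('A shop sign that says "OPEN" at night'))
```

The point of the separation is that the renderer never sees the raw prompt, only the plan, so prompt understanding and pixel generation can be handled by different models.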

However, practical testing suggests that image quality and editing capabilities still need improvement. Initial tests, such as generating an image of a squirrel, produced somewhat pale results. The editing features need further evaluation, and their effectiveness is not yet clear. Code and weights are available at https://github.com/zai-org/GLM-Image, and sample images can be found at the links below. Users are encouraged to test the model and share their experiences.

Links:
https://chat.z.ai/
https://fal.ai/models/fal-ai/glm-image
https://fal.ai/models/fal-ai/glm-image/image-to-image
https://github.com/zai-org/GLM-Image
https://z.ai/blog/glm-image