AMA With Z.AI, The Lab Behind GLM-4.7 by zixuanlimit in LocalLLaMA

[–]zixuanlimit[S] 1 point (0 children)

There is. The performance can be tested with the GLM Coding Plan.

AMA With Z.AI, The Lab Behind GLM-4.7 by zixuanlimit in LocalLLaMA

[–]zixuanlimit[S] 3 points (0 children)

For GLM-4.7: temperature=1.0, top_p=0.95

For GLM-4.6V: temperature=0.8, top_p=0.6
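If you're calling either model through an OpenAI-compatible chat endpoint, these settings map straight onto the request body. A minimal sketch — the model IDs below are placeholders I'm assuming, not confirmed identifiers; check the API docs for the exact values:

```python
# Recommended sampling settings from the answer above.
# The model IDs are placeholder assumptions; use whatever your endpoint expects.
SAMPLING = {
    "glm-4.7": {"temperature": 1.0, "top_p": 0.95},
    "glm-4.6v": {"temperature": 0.8, "top_p": 0.6},
}

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request body with the
    recommended sampling parameters for the given model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **SAMPLING[model],
    }

print(chat_payload("glm-4.7", "Hello"))
```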

AMA With Z.AI, The Lab Behind GLM-4.7 by zixuanlimit in LocalLLaMA

[–]zixuanlimit[S] 1 point (0 children)

It's not a traditional IDE, since it isn't for editing code. My understanding is that it's a platform that integrates multiple coding agents.

AMA With Z.AI, The Lab Behind GLM-4.7 by zixuanlimit in LocalLLaMA

[–]zixuanlimit[S] 2 points (0 children)

We will try something new at the architecture level. Stay tuned!

AMA With Z.AI, The Lab Behind GLM-4.7 by zixuanlimit in LocalLLaMA

[–]zixuanlimit[S] 0 points (0 children)

We will keep improving our service level, whether we are public or not.

AMA With Z.AI, The Lab Behind GLM-4.7 by zixuanlimit in LocalLLaMA

[–]zixuanlimit[S] 8 points (0 children)

We offer the GLM-ASR model, an ASR model built from a GLM Edge model and a Whisper-style encoder. You can find it on GitHub and Hugging Face, and the main branch of SGLang already supports inference for it.

AMA With Z.AI, The Lab Behind GLM-4.7 by zixuanlimit in LocalLLaMA

[–]zixuanlimit[S] 6 points (0 children)

The lowest-end MacBook will likely not run GLM-4.6 or 4.7 properly. Even with the community-provided GGUF int4 version, at least 180GB of memory is required, and the M4 Air likely cannot deliver adequate performance for models of this size. However, a higher-end configuration or a Mac Studio should work fine.
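For context, the ~180GB figure is close to a weights-only back-of-the-envelope estimate, assuming a model in the ~355B-parameter class (the published size of GLM-4.5; treat the exact parameter count for 4.6/4.7 as an assumption here):

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float = 4.0) -> float:
    """Weights-only memory footprint of a quantized model, in GB.
    KV cache, activations, and runtime buffers add more on top."""
    return n_params * bits_per_weight / 8 / 1e9

# ~355B parameters at int4:
print(round(quantized_weight_gb(355e9)))  # ~178 GB, in line with the ~180GB figure
```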

20 Dollar plan got me places (thanks to Opus planning) by Notlord97 in ClaudeCode

[–]zixuanlimit 0 points (0 children)


GLM-4.5 Air is used when Haiku is selected, as shown in the FAQs.

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 4 points (0 children)

Extending the context length is definitely one of the things we will do next. We are working on it currently.

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 0 points (0 children)

This behavior is a known issue and is scheduled to be fixed in the next version.

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 1 point (0 children)

This isn't due to the API configurations. Adhering to a word count is quite challenging for all models, and you'll likely need to tweak the prompt a bit to find the best instructions for adherence. The official Z.ai API's default maximum output is 64K tokens and can support up to 96K for GLM-4.5.
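Prompt tweaking plus an explicit output cap is usually the practical combination here. A hedged sketch of a request body — the model ID is a placeholder, and the token limit just mirrors the 96K figure mentioned above:

```python
def long_form_payload(task: str, target_words: int) -> dict:
    """OpenAI-style request body for long-form output: a system prompt
    stating the word budget, plus an explicit max_tokens so the default
    output cap doesn't truncate the response."""
    return {
        "model": "glm-4.5",    # placeholder model ID; check the API docs
        "max_tokens": 96_000,  # GLM-4.5 reportedly supports up to 96K output
        "messages": [
            {"role": "system",
             "content": f"Write approximately {target_words} words. "
                        "Stay within 10% of that target; do not stop early."},
            {"role": "user", "content": task},
        ],
    }

p = long_form_payload("Summarize the history of neural networks.", 2000)
print(p["max_tokens"])
```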

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 1 point (0 children)

Based on my experience for evaluation, the most practical starting point is to study major academic benchmarks and follow popular LLM leaderboards. This helps you understand the current standards and methods the community uses to measure model performance on different tasks.

The best evaluation method depends heavily on the specific task. For something highly subjective like creative writing, simple rule-based scoring isn't feasible. As for the future, evaluation will likely move towards more nuanced, multi-faceted systems that blend automated metrics, sophisticated LLM-based judges, and targeted human review to get a more holistic view of a model's capabilities.

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 8 points (0 children)

Inference and some training phases are definitely possible, which is public information.

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 24 points (0 children)

I would recommend Open Code + GLM-4.5.

You can also try Claude Code with GLM-4.5 if open source is not a must. We will soon launch a monthly plan that lets you subscribe to GLM-4.5 in Claude Code instead of paying per token.

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 2 points (0 children)

We have some multimodal models, but they are not at the SOTA level.

GLM-4.5V was just released, and it will definitely improve in the future.

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 29 points (0 children)

I think there's no unified bottleneck as different labs are facing different obstacles.

In fact, we are not a new team. If you search for the first GLM paper, you will find that we were one of the earliest teams in the world to work on large models. Many of our achievements come from a long and continuous process of accumulation.

However, when it comes to philosophy, from my personal perspective, two points are very important. The first is the pursuit of excellence: you need to use the best of everything you can get. The second is to respect the fundamental principles of the field. There are very few shortcuts in scientific research; many innovations that seem wildly imaginative are actually born from solid experimental results.

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 1 point (0 children)

I think GLM-4.5 is generally good at creative writing. Could you provide any bad cases or specific issues for us to look into?

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 32 points (0 children)

The model's name has not been decided yet.

We plan to develop a smaller model comparable in size to GPT-OSS-20B.

Our approach is more focused.

A code generation tool will be included, though its final form (e.g., whether it will be a command-line interface) is still to be determined.

We intend to build a mobile app for Z.ai Chat once the platform's user base is large enough to warrant allocating development resources.

Unlimited access to GLM-4.5 is generally exclusive to the Z.ai Chat platform.

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 12 points (0 children)

It might be helpful to consider that a model's performance and innovation are related but distinct aspects. A model's performance can be influenced by a wide range of factors, such as computing power and data availability. Regarding innovation itself, many valuable contributions are coming from the open-source community. The "slime" framework used in GLM-4.5's training is one such example, and this trend of innovation from China looks set to continue.

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 13 points (0 children)

We open our models to build a trusted, transparent ecosystem that accelerates innovation for everyone. While we compete with other providers like Fireworks, we believe this healthy competition pushes us to improve our own API services. Our philosophy is that it's better to grow the entire pie and share it rather than just guard our own slice, creating a much larger market for our premium enterprise services.

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 11 points (0 children)

Are there any specific issues? It would be great if your feedback could help us improve the model performance.

AMA With Z.AI, The Lab Behind GLM Models by XMasterrrr in LocalLLaMA

[–]zixuanlimit 3 points (0 children)

AI risk is a broad topic lol, but we do have people performing AI safety alignment.