The stack trace shows that it runs out of memory during dequantization within an MoE inference step. A quick estimate suggests that it makes no sense for a sequence this short to be allocating 526 GB of memory; this is definitely a bug, not a fundamental limitation.
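A back-of-envelope version of that estimate is sketched below. The sequence length, hidden size, per-expert intermediate size and expert count are illustrative assumptions, not values from the model in question; substitute your own config. The sketch compares the activations for a short sequence, and even the worst case of dequantizing every expert's three projection matrices to bf16 at once, against the observed 526 GB.

```python
# Back-of-envelope memory estimate for an MoE layer's dequantization step.
# All dimensions below are ASSUMED for illustration; replace with your model's config.

GB = 10 ** 9  # decimal gigabytes, matching the "526 GB" figure


def tensor_bytes(*shape: int, bytes_per_elem: int = 2) -> int:
    """Size in bytes of a dense tensor with the given shape (default bf16, 2 bytes/elem)."""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem


seq_len = 2048    # assumed "short" sequence length
hidden = 7168     # assumed hidden size
inter = 2048      # assumed per-expert intermediate size
n_experts = 256   # assumed number of routed experts

# Activations entering the MoE block: one hidden vector per token.
act = tensor_bytes(seq_len, hidden)

# Worst case: dequantize gate, up and down projections of every expert to bf16 at once.
per_expert = 3 * tensor_bytes(hidden, inter)
all_experts = n_experts * per_expert

observed = 526 * GB  # allocation reported in the OOM

print(f"activations:         {act / GB:8.3f} GB")
print(f"all experts (bf16):  {all_experts / GB:8.3f} GB")
print(f"observed allocation: {observed / GB:8.3f} GB")
print(f"observed / worst case: {observed / all_experts:.1f}x")
```

Under these assumptions even the pessimistic "dequantize everything" case is more than an order of magnitude below the observed allocation, which is what points toward a bug rather than a sizing problem.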
Looking at the forward pass implementation of MoEGate, we find: