_check_compression_model_feasibility() called get_model_context_length() without passing config_context_length, so custom endpoints that do not support /models API queries always fell through to the 128K default, ignoring auxiliary.compression.context_length in config.yaml. Fix: read auxiliary.compression.context_length from config and pass it as config_context_length (highest-priority hint) so the user-configured value is always respected regardless of API availability. Fixes #8499
524 KiB
524 KiB