vllm.v1.worker.mamba_utils ¶
get_mamba_groups ¶
get_mamba_groups(
kv_cache_config: KVCacheConfig,
) -> tuple[list[int], MambaSpec]
Source code in vllm/v1/worker/mamba_utils.py
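Based only on the signature above, a helper like this would scan the KV-cache groups in `kv_cache_config` and pick out the ones backed by a `MambaSpec`. The sketch below is a minimal, self-contained illustration using simplified stand-in types; it is not the vLLM implementation.

```python
# Minimal sketch; the dataclasses are simplified stand-ins (assumptions),
# not the real vLLM KVCacheConfig / MambaSpec definitions.
from dataclasses import dataclass


@dataclass
class MambaSpec:              # stand-in for vLLM's MambaSpec
    state_shape: tuple


@dataclass
class KVCacheGroup:           # stand-in for one KV-cache group entry
    kv_cache_spec: object


@dataclass
class KVCacheConfig:          # stand-in for vLLM's KVCacheConfig
    kv_cache_groups: list


def get_mamba_groups(kv_cache_config: KVCacheConfig) -> tuple[list[int], MambaSpec]:
    """Return the indices of groups whose spec is a MambaSpec, plus that spec."""
    group_ids = [
        i
        for i, group in enumerate(kv_cache_config.kv_cache_groups)
        if isinstance(group.kv_cache_spec, MambaSpec)
    ]
    # Assumes at least one mamba group exists and all share the same spec.
    spec = kv_cache_config.kv_cache_groups[group_ids[0]].kv_cache_spec
    return group_ids, spec
```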
mamba_copy_block_for_qwen_next ¶
mamba_copy_block_for_qwen_next(
kv_cache_config: KVCacheConfig,
mamba_group_ids: list[int],
src_block_idx: int,
dest_block_idx: int,
accept_token_bias: int,
req_state: CachedRequestState,
forward_context: dict[str, Any],
)
Source code in vllm/v1/worker/mamba_utils.py
postprocess_mamba ¶
postprocess_mamba(
scheduler_output: SchedulerOutput,
kv_cache_config: KVCacheConfig,
input_batch: GPUInputBatch,
requests: dict[str, CachedRequestState],
mamba_state_idx: dict[str, int],
forward_context: dict[str, Any],
)
If a block is converted from a partial block to a full block in this step, copy the state from the running-state block to the new full block.
Source code in vllm/v1/worker/mamba_utils.py
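The sketch below illustrates one way such a partial-to-full promotion could be detected and handled. The names (`prev_computed_tokens`, `block_size`, the state tensor layout) are assumptions chosen for illustration, not vLLM's actual internals.

```python
# Hedged sketch of the partial-block -> full-block promotion described above.
import torch


def maybe_promote_block(
    mamba_state: torch.Tensor,   # hypothetical layout: [num_blocks, *state_dims]
    running_block_idx: int,      # block holding the in-progress (running) state
    new_full_block_idx: int,     # block that just filled up in this step
    prev_computed_tokens: int,   # tokens computed before this step
    num_computed_tokens: int,    # tokens computed after this step
    block_size: int,
) -> None:
    # A block is "converted from partial to full" exactly when this step
    # pushes the token count across a block_size boundary.
    if num_computed_tokens // block_size > prev_computed_tokens // block_size:
        # Snapshot the running state into the newly completed block.
        mamba_state[new_full_block_idx].copy_(mamba_state[running_block_idx])
```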
preprocess_mamba ¶
preprocess_mamba(
scheduler_output: SchedulerOutput,
kv_cache_config: KVCacheConfig,
cache_config: CacheConfig,
mamba_state_idx: dict[str, int],
input_batch: GPUInputBatch,
requests: dict[str, CachedRequestState],
forward_context: dict[str, Any],
)
Copy the mamba state from the previous step to the last (1 + num_speculative_blocks) block.
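A rough sketch of the hand-off described by the docstring, assuming a per-block state tensor and a per-request block table; the destination indexing is one reading of "the last (1 + num_speculative_blocks) block", and the helper name is hypothetical.

```python
# Illustrative sketch only; tensor layout, block-table shape, and the
# destination index are assumptions based on the docstring above.
import torch


def carry_over_mamba_state(
    mamba_state: torch.Tensor,       # hypothetical layout: [num_blocks, *state_dims]
    block_table: list[int],          # this request's block ids, in order
    prev_state_block: int,           # block holding the previous step's state
    num_speculative_blocks: int,
) -> int:
    # Destination: the block that sits (1 + num_speculative_blocks) from the
    # end of the request's block table (one reading of the docstring above).
    dest_block = block_table[-(1 + num_speculative_blocks)]
    if dest_block != prev_state_block:
        mamba_state[dest_block].copy_(mamba_state[prev_state_block])
    return dest_block
```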