Process Runs
Lifecycle, state machine, and process execution
Definition
A Process Run is a single concrete execution of a process, with a tracked lifecycle, step history, linked LLM calls, and artifacts.
Process Template (processes table)
  ↓ process_runs table (one row per run)
    ↓ process_run_steps
       llm_calls
       process_artifacts
State Machine
queued → running → waiting_user → succeeded / degraded / failed / cancelled
queued
In the queue, waiting to be executed
running
Actively executing; the LLM is working
waiting_user
Waiting for user input
succeeded
Completed successfully, output delivered
degraded
Completed via a fallback (model swap, timeout)
failed
Error occurred; an incident ticket is created
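The lifecycle above can be sketched as a small state machine. The transition table below is an illustration inferred from the state list, not the actual implementation — `advance` and `ALLOWED` are hypothetical names:

```python
from enum import Enum

class RunStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    WAITING_USER = "waiting_user"
    SUCCEEDED = "succeeded"
    DEGRADED = "degraded"
    FAILED = "failed"
    CANCELLED = "cancelled"

# Hypothetical transition table: terminal states have no outgoing edges.
ALLOWED = {
    RunStatus.QUEUED: {RunStatus.RUNNING, RunStatus.CANCELLED},
    RunStatus.RUNNING: {RunStatus.WAITING_USER, RunStatus.SUCCEEDED,
                        RunStatus.DEGRADED, RunStatus.FAILED,
                        RunStatus.CANCELLED},
    RunStatus.WAITING_USER: {RunStatus.RUNNING, RunStatus.CANCELLED},
}

def advance(current: RunStatus, nxt: RunStatus) -> RunStatus:
    """Validate a status transition before persisting it."""
    if nxt not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current.value} -> {nxt.value}")
    return nxt
```

Encoding the table this way makes illegal transitions (e.g. reviving a `succeeded` run) fail loudly instead of silently corrupting run state.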
Execution Flow (from process_runner.py)
1. Load Process Template
# bot/app/services/process_runner.py:345-347
process = await runtime.db.get_process_by_code(process_code)
if process is None:
    raise ValueError(f"Unknown process: {process_code}")

2. Build Context (RAG + Personal Data)
# bot/app/services/process_runner.py:352-401
needs_rag = metadata_obj.get("rag") is True
needs_personal = metadata_obj.get("personal_data") is True
if needs_rag:
    rag_result = await runtime._build_rag_context(input_text)
if needs_personal and user_db_id:
    docs = await runtime.db.list_table_documents(user_id=user_db_id)

3. Agent Selection
# bot/app/services/process_runner.py:414-426
from .agent_selection import select_agent
agent_sel = await select_agent(
    runtime.db,
    intent="",
    process_code=process_code,
)
if agent_sel and agent_sel.get("selected_agent_code"):
    agent_code_used = agent_sel["selected_agent_code"]
    effective_system_prompt = agent_sel["assembled_system_prompt"]

4. Model Selection + Fallback
# bot/app/services/process_runner.py:452-494
best_available = runtime._pick_best_chat_model(
    installed_models,
    preferred=process_model,
    min_rank=min_model_rank,
)
if best_available is None:
    # Relax min_rank if no suitable model is found
    relaxed_best = runtime._pick_best_chat_model(
        installed_models, preferred=process_model, min_rank=0
    )
5. LLM Call with Timeout
# bot/app/services/process_runner.py:573-584
llm_timeout = max(30.0, float(db_timeout_profile["llm_timeout_seconds"]))
response = await asyncio.wait_for(
    runtime.chat_llm(
        model=llm_request["model"],
        messages=llm_request["messages"],
        options=llm_request["options"],
    ),
    timeout=primary_timeout,
)

6. Memory Fallback (on OOM)
# bot/app/services/process_runner.py:680-774
if runtime._is_ollama_memory_error(detail):
    fallback_candidates = _collect_lighter_model_candidates(
        runtime, models=installed_models, current=process_model
    )
    for fallback_model in fallback_candidates:
        # Reduce num_predict and try a lighter model
        llm_request["options"]["num_predict"] = adjusted_num_predict
        response = await runtime.chat_llm(...)

7. Timeout Fallback
# bot/app/services/process_runner.py:835-952
except asyncio.TimeoutError:
    timeout_fallback_candidates = _collect_lighter_model_candidates(...)
    for fallback_model in timeout_fallback_candidates:
        response = await asyncio.wait_for(
            runtime.chat_llm(...),
            timeout=timeout_fallback_timeout_seconds,
        )
if response is None:
    # Degraded response with fallback text
    response = {"degraded": True, "error": timeout_error}
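The timeout-then-fallback pattern boils down to `asyncio.wait_for` plus a candidate loop. A self-contained sketch, with model names and timings invented for illustration:

```python
import asyncio

async def slow_model(name: str, delay: float) -> dict:
    """Stand-in for an LLM call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return {"model": name, "text": "ok"}

async def call_with_fallback() -> dict:
    # The primary model exceeds its budget; lighter candidates get a shot.
    candidates = [("big-model", 0.2), ("small-model", 0.0)]
    for name, delay in candidates:
        try:
            return await asyncio.wait_for(slow_model(name, delay), timeout=0.05)
        except asyncio.TimeoutError:
            continue
    # Nothing answered in time: return a degraded marker instead of raising.
    return {"degraded": True, "error": "timeout"}

result = asyncio.run(call_with_fallback())
```

Here `big-model` times out and `small-model` answers, so `result` carries the fallback model's response; if every candidate timed out, the caller would receive the degraded marker instead of an exception.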
8. Log LLM Call
# bot/app/services/process_runner.py:990-1006
await runtime.db.log_llm_call(
    scope=f"process:{process_code}",
    model=process_model,
    endpoint="/api/chat",
    request_json=llm_request,
    response_json=response,
    http_status=200,
    latency_ms=latency_ms,
    process_id=str(process["id"]),
    agent_code=_pr_agent_code,
)

ProcessExecutionResult
# bot/app/services/process_runner.py:73-79
@dataclass
class ProcessExecutionResult:
    text: str               # Reply sent to the user
    model: str              # Model that was actually used
    raw: dict[str, Any]     # Raw LLM response
    degraded: bool = False  # True if a fallback was used

degraded=True means the process finished via a fallback: a timeout, a memory error, or model-rank relaxation.
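A caller-side sketch of how the degraded flag might be consumed; `render_reply` and its warning text are illustrative, not part of the bot:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ProcessExecutionResult:
    text: str
    model: str
    raw: dict[str, Any] = field(default_factory=dict)
    degraded: bool = False

def render_reply(result: ProcessExecutionResult) -> str:
    # Surface fallback execution to the user instead of hiding it.
    if result.degraded:
        return f"{result.text}\n\n(answered by fallback model {result.model})"
    return result.text
```

Keeping the flag on the result object lets the delivery layer decide how loudly to announce degradation, without the runner knowing about chat formatting.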
DB Tables
process_runs
- id, process_id, status
- input_payload, output_payload
- started_at, completed_at
- error_text, degraded_flag
process_run_steps
- id, run_id, step_name
- status, started_at
- input, output
llm_calls
- scope, model, endpoint
- request_json, response_json
- latency_ms, http_status
- process_id, agent_code
process_artifacts
- run_id, artifact_type
- file_path, metadata
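A minimal in-memory sqlite sketch of the run/step relationship. Column types and constraints are assumptions for illustration; the production schema lives in the bot's own migrations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE process_runs (
    id INTEGER PRIMARY KEY,
    process_id INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    input_payload TEXT,
    output_payload TEXT,
    started_at TEXT,
    completed_at TEXT,
    error_text TEXT,
    degraded_flag INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE process_run_steps (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES process_runs(id),
    step_name TEXT NOT NULL,
    status TEXT NOT NULL,
    started_at TEXT,
    input TEXT,
    output TEXT
);
""")
conn.execute(
    "INSERT INTO process_runs (id, process_id, status) VALUES (1, 42, 'succeeded')"
)
conn.executemany(
    "INSERT INTO process_run_steps (run_id, step_name, status) VALUES (?, ?, ?)",
    [(1, "build_context", "succeeded"), (1, "llm_call", "succeeded")],
)
# One run fans out into N step rows keyed by run_id.
steps = conn.execute(
    "SELECT step_name FROM process_run_steps WHERE run_id = 1 ORDER BY id"
).fetchall()
```

The same `run_id` foreign-key pattern links `llm_calls` and `process_artifacts` back to a run, which is what makes the per-run trace in the next section queryable.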
Observability
Every process run leaves a trace:
process_runs (1 row)
  ↓ process_run_steps (N rows)
     llm_calls (N rows)
     automation_events
     process_artifacts
Incident Tickets
On a timeout or exception, an incident ticket is created automatically:
# bot/app/services/process_runner.py:953-960
await create_incident_ticket(
    runtime,
    check_name=f"process:{process_code}",
    error_text=str(timeout_error)[:300],
    context={
        "process_code": process_code,
        "model": process_model,
        "type": "timeout",  # or "exception"
    },
)