Process Runs
Lifecycle, state machine, and process execution
Definition
A Process Run is a single concrete execution of a process, with a tracked lifecycle, step history, linked LLM calls, and artifacts.
Process Template (processes table)
  ↓ process_runs table (one row per run)
    ↓ process_run_steps
       llm_calls
       process_artifacts
State Machine
queued → running → waiting_user → succeeded / degraded / failed / cancelled
queued
In the queue, waiting to be executed
running
Actively executing; the LLM is working
waiting_user
Waiting for user input
succeeded
Completed successfully, output delivered
degraded
Completed via a fallback (model swap, timeout)
failed
Error occurred; an incident ticket is created
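The lifecycle above can be sketched as a small state machine. The transition table below is an illustration inferred from the state list, not the actual implementation — `advance` and `ALLOWED` are hypothetical names:

```python
from enum import Enum

class RunStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    WAITING_USER = "waiting_user"
    SUCCEEDED = "succeeded"
    DEGRADED = "degraded"
    FAILED = "failed"
    CANCELLED = "cancelled"

# Hypothetical transition table: terminal states have no outgoing edges.
ALLOWED = {
    RunStatus.QUEUED: {RunStatus.RUNNING, RunStatus.CANCELLED},
    RunStatus.RUNNING: {RunStatus.WAITING_USER, RunStatus.SUCCEEDED,
                        RunStatus.DEGRADED, RunStatus.FAILED,
                        RunStatus.CANCELLED},
    RunStatus.WAITING_USER: {RunStatus.RUNNING, RunStatus.CANCELLED},
}

def advance(current: RunStatus, nxt: RunStatus) -> RunStatus:
    """Validate a status transition before persisting it."""
    if nxt not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current.value} -> {nxt.value}")
    return nxt
```

Encoding the table this way makes illegal transitions (e.g. reviving a `succeeded` run) fail loudly instead of silently corrupting run state.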
Execution Flow (from process_runner.py)
1. Load Process Template
# bot/app/services/process_runner.py:345-347
process = await runtime.db.get_process_by_code(process_code)
if process is None:
    raise ValueError(f"Unknown process: {process_code}")

2. Build Context (RAG + Personal Data)
# bot/app/services/process_runner.py:352-401
needs_rag = metadata_obj.get("rag") is True
needs_personal = metadata_obj.get("personal_data") is True
if needs_rag:
    rag_result = await runtime._build_rag_context(input_text)
if needs_personal and user_db_id:
    docs = await runtime.db.list_table_documents(user_id=user_db_id)

3. Agent Selection
# bot/app/services/process_runner.py:414-426
from .agent_selection import select_agent
agent_sel = await select_agent(
    runtime.db,
    intent="",
    process_code=process_code,
)
if agent_sel and agent_sel.get("selected_agent_code"):
    agent_code_used = agent_sel["selected_agent_code"]
    effective_system_prompt = agent_sel["assembled_system_prompt"]

4. Model Selection + Fallback
# bot/app/services/process_runner.py:452-494
best_available = runtime._pick_best_chat_model(
    installed_models,
    preferred=process_model,
    min_rank=min_model_rank,
)
if best_available is None:
    # Relax min_rank if no suitable model is found
    relaxed_best = runtime._pick_best_chat_model(
        installed_models, preferred=process_model, min_rank=0
    )
5. LLM Call with Timeout
# bot/app/services/process_runner.py:573-584
llm_timeout = max(30.0, float(db_timeout_profile["llm_timeout_seconds"]))
response = await asyncio.wait_for(
    runtime.chat_llm(
        model=llm_request["model"],
        messages=llm_request["messages"],
        options=llm_request["options"],
    ),
    timeout=primary_timeout,
)

6. Memory Fallback (on OOM)
# bot/app/services/process_runner.py:680-774
if runtime._is_ollama_memory_error(detail):
    fallback_candidates = _collect_lighter_model_candidates(
        runtime, models=installed_models, current=process_model
    )
    for fallback_model in fallback_candidates:
        # Reduce num_predict and try a lighter model
        llm_request["options"]["num_predict"] = adjusted_num_predict
        response = await runtime.chat_llm(...)

7. Timeout Fallback
# bot/app/services/process_runner.py:835-952
except asyncio.TimeoutError:
    timeout_fallback_candidates = _collect_lighter_model_candidates(...)
    for fallback_model in timeout_fallback_candidates:
        response = await asyncio.wait_for(
            runtime.chat_llm(...),
            timeout=timeout_fallback_timeout_seconds,
        )
if response is None:
    # Degraded response with fallback text
    response = {"degraded": True, "error": timeout_error}
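The timeout-then-fallback pattern boils down to `asyncio.wait_for` plus a candidate loop. A self-contained sketch, with model names and timings invented for illustration:

```python
import asyncio

async def slow_model(name: str, delay: float) -> dict:
    """Stand-in for an LLM call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return {"model": name, "text": "ok"}

async def call_with_fallback() -> dict:
    # The primary model exceeds its budget; lighter candidates get a shot.
    candidates = [("big-model", 0.2), ("small-model", 0.0)]
    for name, delay in candidates:
        try:
            return await asyncio.wait_for(slow_model(name, delay), timeout=0.05)
        except asyncio.TimeoutError:
            continue
    # Nothing answered in time: return a degraded marker instead of raising.
    return {"degraded": True, "error": "timeout"}

result = asyncio.run(call_with_fallback())
```

Here `big-model` times out and `small-model` answers, so `result` carries the fallback model's response; if every candidate timed out, the caller would receive the degraded marker instead of an exception.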
8. Log LLM Call
# bot/app/services/process_runner.py:990-1006
await runtime.db.log_llm_call(
    scope=f"process:{process_code}",
    model=process_model,
    endpoint="/api/chat",
    request_json=llm_request,
    response_json=response,
    http_status=200,
    latency_ms=latency_ms,
    process_id=str(process["id"]),
    agent_code=_pr_agent_code,
)

ProcessExecutionResult
# bot/app/services/process_runner.py:73-79
@dataclass
class ProcessExecutionResult:
    text: str               # Reply sent to the user
    model: str              # Model that was actually used
    raw: dict[str, Any]     # Raw LLM response
    degraded: bool = False  # True if a fallback was used

degraded=True means the process finished via a fallback: a timeout, a memory error, or model-rank relaxation.
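A caller-side sketch of how the degraded flag might be consumed; `render_reply` and its warning text are illustrative, not part of the bot:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ProcessExecutionResult:
    text: str
    model: str
    raw: dict[str, Any] = field(default_factory=dict)
    degraded: bool = False

def render_reply(result: ProcessExecutionResult) -> str:
    # Surface fallback execution to the user instead of hiding it.
    if result.degraded:
        return f"{result.text}\n\n(answered by fallback model {result.model})"
    return result.text
```

Keeping the flag on the result object lets the delivery layer decide how loudly to announce degradation, without the runner knowing about chat formatting.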
DB Tables
process_runs
- id, process_id, status
- input_payload, output_payload
- started_at, completed_at
- error_text, degraded_flag
process_run_steps
- id, run_id, step_name
- status, started_at
- input, output
llm_calls
- scope, model, endpoint
- request_json, response_json
- latency_ms, http_status
- process_id, agent_code
process_artifacts
- run_id, artifact_type
- file_path, metadata
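A minimal in-memory sqlite sketch of the run/step relationship. Column types and constraints are assumptions for illustration; the production schema lives in the bot's own migrations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE process_runs (
    id INTEGER PRIMARY KEY,
    process_id INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    input_payload TEXT,
    output_payload TEXT,
    started_at TEXT,
    completed_at TEXT,
    error_text TEXT,
    degraded_flag INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE process_run_steps (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES process_runs(id),
    step_name TEXT NOT NULL,
    status TEXT NOT NULL,
    started_at TEXT,
    input TEXT,
    output TEXT
);
""")
conn.execute(
    "INSERT INTO process_runs (id, process_id, status) VALUES (1, 42, 'succeeded')"
)
conn.executemany(
    "INSERT INTO process_run_steps (run_id, step_name, status) VALUES (?, ?, ?)",
    [(1, "build_context", "succeeded"), (1, "llm_call", "succeeded")],
)
# One run fans out into N step rows keyed by run_id.
steps = conn.execute(
    "SELECT step_name FROM process_run_steps WHERE run_id = 1 ORDER BY id"
).fetchall()
```

The same `run_id` foreign-key pattern links `llm_calls` and `process_artifacts` back to a run, which is what makes the per-run trace in the next section queryable.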
Observability
Every process run leaves a trace:
process_runs (1 row)
  ↓ process_run_steps (N rows)
     llm_calls (N rows)
     automation_events
     process_artifacts
Incident Tickets
On a timeout or exception, an incident ticket is created automatically:
# bot/app/services/process_runner.py:953-960
await create_incident_ticket(
    runtime,
    check_name=f"process:{process_code}",
    error_text=str(timeout_error)[:300],
    context={
        "process_code": process_code,
        "model": process_model,
        "type": "timeout",  # or "exception"
    },
)