Getting PHP, a C binary, Python, and Ollama to reliably hand off through a live encoding server — and the non-obvious failures along the way.
This is Part 2. Part 1 covers the architecture design and the constraints that shaped every decision here.
Debugging a Black Box Pipeline
The most frustrating aspect of this project was that the core pipeline — PHP invoking a C binary via exec() — runs silently in the background:
exec("/home/cms/exec/transcoding_ai $list_idx $mp3_url $output_filename > /dev/null &");
The trailing > /dev/null & discards stdout and detaches the process. When something goes wrong, there's no error output, no stack trace, no signal. The CMS admin sees a successful submission. The encoding server logs nothing. The Python API receives no request. You're debugging a black box.
This forced me to build observability into every layer before writing any feature code.
Step 1: Establishing Ground Truth on Each Server
Before touching the AI pipeline, I verified the existing pipeline worked end-to-end. This sounds obvious, but it surfaced two bugs that would have wasted days later.
Finding 1: The C binary didn't exist yet.
I had written transcoding_ai.c and assumed it was compiled. It wasn't.
$ ls -la /home/cms/exec/transcoding_ai
ls: cannot access '/home/cms/exec/transcoding_ai': No such file or directory
This was the reason the AI pipeline never fired — PHP called an executable that didn't exist, exec() silently failed, and everything looked fine from the CMS.
# Compile on the 227 server
cd /home/cms/exec/
gcc -o transcoding_ai transcoding_ai.c \
    $(mysql_config --cflags --libs) \
    -lm
chmod +x transcoding_ai
Note: I initially tried -lcurl since the C code makes HTTP requests. This failed with /usr/bin/ld: cannot find -lcurl. The fix was realizing that the code used popen() to invoke curl as a shell command — not libcurl as a linked library. -lcurl was never needed.
Finding 2: The wrong server was being targeted.
The ASP layer assigns content to either server T1 (227) or T2 (226) based on load balancing:
If transcoder1_count < transcoder2_count Then
    transcoder_kind = "T1"
    transcoder_url = "http://192.168.1.227/transcoding_ai.php"
Else
    transcoder_kind = "T2"
    transcoder_url = "http://192.168.1.226/transcoding_ai.php"
End If
I had been SSH'd into 226 looking for logs while the actual traffic was hitting 227. The MySQL job table on 226 showed nothing because no jobs had been routed there. Once I checked the transcoder_kind column in MSSQL for the test content, the correct server was immediately obvious.
Lesson: Always verify which physical server is actually handling a request before assuming the code is wrong.
Step 2: The $cms_rows < 3 Deadlock
After compiling the binary and confirming the correct server, the pipeline still didn't fire. Checking the MySQL queue:
mysql -u cms -pcms cms -e \
"SELECT COUNT(*) as running FROM cms WHERE status='Y';"
# → 3
The existing queue guard was blocking everything:
$cms_sql = "SELECT idx FROM cms WHERE status = 'Y'";
$cms_rows = mysqli_num_rows(mysqli_query($mysqlCon, $cms_sql));
if ($cms_rows < 3) { // 3 < 3 = FALSE
    exec($shell_exec); // Never reached
}
The two Linux servers were actively encoding HD video — a job that takes 30–60 minutes. The status='Y' counter stayed at 3, permanently blocking any new jobs, including the MP3 AI job.
My first fix was wrong: I added a direct curl call from PHP to the Python API, bypassing the C binary entirely. This appeared to work — the Python API received the request — but the MP3 download failed with a 404. The CDN URL didn't exist yet because the C binary hadn't run ffmpeg to produce the MP3 file.
❌ Wrong flow:
PHP → Python API immediately → curl CDN URL → 404 (file doesn't exist yet)
✅ Correct flow:
PHP → C binary → ffmpeg MP3 encoding → FTP to CDN → Python API → CDN URL exists
The fix was to understand what the queue guard was actually protecting. It existed for CPU-intensive HD video encoding. MP3 extraction is audio-only (-vn flag, no video codec) and runs in a fraction of the time. The guard should never have applied to it.
// Final logic: two separate code paths
if (substr($output_filename, -4) === '.mp3' && $ai_learn_YN === 'Y') {
    // MP3 + AI: always run, regardless of queue depth
    $shell_exec = "/home/cms/exec/transcoding_ai $list_idx $mp3_url $output_filename > /dev/null &";
    exec($shell_exec, $out, $err);
} else if ($cms_rows < 3) {
    // Video/other: respect queue limit
    $mp3_url = "not_mp3";
    $shell_exec = "/home/cms/exec/transcoding_ai $list_idx $mp3_url $output_filename > /dev/null &";
    exec($shell_exec, $out, $err);
}
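For context, the MP3 extraction the C binary drives looks roughly like this — a Python sketch of the equivalent ffmpeg invocation, not the actual C code; the codec and bitrate flags are assumptions:

import subprocess

# -vn drops the video stream, so no video codec runs at all —
# this is why MP3 extraction finishes in a fraction of an HD encode's time
subprocess.run([
    "ffmpeg", "-i", "source_video.mp4",  # hypothetical input
    "-vn",                                # audio only
    "-acodec", "libmp3lame",              # assumed codec
    "-b:a", "128k",                       # assumed bitrate
    "output.mp3",
], check=True)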
Step 3: PM2 Port Collision
While debugging, I had started the Python server under PM2 for persistence. This introduced a subtle problem: a leftover PM2 process was still holding port 8001 even after I switched to starting the server manually in a separate terminal.
$ pm2 list
┌────┬────────────┬──────┬────┬─────────┐
│ id │ name       │ mode │ ↺  │ status  │
├────┼────────────┼──────┼────┼─────────┤
│ 0  │ sermon-api │ fork │ 3  │ stopped │ ← ghost process
│ 1  │ sermon-api │ fork │ 0  │ online  │ ← memory: 4.1mb (wrong)
└────┴────────────┴──────┴────┴─────────┘
The 4.1mb memory figure was the tell — a properly loaded Python server with Whisper imports should use 40–50mb minimum at idle. The PM2 process was a zombie that had claimed the port but wasn't actually running the Flask app.
Health check requests from the monitoring loop were returning 200 (hitting the zombie), masking the real problem. Manual python api_server.py in a terminal would fail silently because the port was already bound.
# Full cleanup required
pm2 delete all
pm2 kill
pm2 cleardump
After this, running python api_server.py directly in a terminal (with venv activated) became the stable operating mode. For this pilot, the simplicity and observability of a foreground process outweigh the convenience of a process manager.
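A cheap pre-flight check would have made the bound-port failure loud instead of silent. This is a sketch, not part of the original api_server.py — it probes the port before the Flask app starts:

import socket

# Fail loudly if something (e.g., a ghost PM2 process) already holds the port
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    probe.bind(("0.0.0.0", 8001))
except OSError:
    raise SystemExit("Port 8001 already bound - check `pm2 list` for ghost processes")
finally:
    probe.close()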
Step 4: The Python API Handoff
With the PHP logic fixed, the C binary compiled, and the port conflict resolved, I ran the first successful end-to-end manual test:
# On the 227 server, invoking the binary directly
/home/cms/exec/transcoding_ai \
    917700 \
    "https://online.goodtv.co.kr/mp3/2026/YBD/STYBD_20260426.mp3" \
    "STYBD_20260426.mp3"
Output on the Linux server:
====== Python AI API call ======
[OK] Python API call succeeded (HTTP 202)
Simultaneous output on the Python server (my PC):
[QUEUE] [Job:2746c8fe|ko] STYBD_20260426.mp3
[OK] lang/ko/ pipeline loaded
[Job:2746c8fe|ko] downloading | Downloading MP3
[OK] Download complete: STYBD_20260426.mp3 (24.2 MB)
[Job:2746c8fe|ko] stt | STT transcription + text refinement (lang=ko)...
The first automated trigger from a real CMS upload followed the next day after the PHP fix was deployed.
The Python API: Async Job Architecture
The Flask server receives a POST and returns 202 Accepted in milliseconds. All processing happens in a background thread. This is essential — the PHP script has a response timeout, and the AI pipeline takes 20–40 minutes.
from flask import Flask, request, jsonify
import threading
import time
import uuid

app = Flask(__name__)
JOB_STORE = {}                     # in-memory job store (pilot scale)
JOB_STORE_LOCK = threading.Lock()  # guards all JOB_STORE access

@app.route("/process_url", methods=["POST"])
def process_url():
    body = request.get_json(force=True)
    mp3_url = body.get("mp3_url", "").strip()
    mp3_filename = body.get("mp3_filename", "").strip()
    lang = body.get("lang", "ko").strip().lower()

    # Duplicate job prevention
    with JOB_STORE_LOCK:
        running = [
            j for j in JOB_STORE.values()
            if j.get("mp3") == mp3_filename
            and j.get("lang") == lang
            and j["status"] not in ("done", "error")
        ]
        if running:
            return jsonify({"error": "already processing",
                            "job_id": running[0]["job_id"]}), 409

    job_id = str(uuid.uuid4())
    with JOB_STORE_LOCK:
        JOB_STORE[job_id] = {
            "job_id": job_id,
            "mp3": mp3_filename,
            "lang": lang,
            "status": "queued",
            "started_at": time.time(),
        }

    # run_pipeline is the background worker defined elsewhere in api_server.py
    threading.Thread(
        target=run_pipeline,
        args=(job_id, mp3_url, mp3_filename, lang),
        daemon=True
    ).start()

    return jsonify({"job_id": job_id, "status": "queued"}), 202
Key design choices:
- In-memory job store: sufficient for pilot scale; Redis would be the production replacement
- TTL-based cleanup: completed jobs are garbage-collected after 1 hour via a recurring timer thread
- Whisper singleton: the model (~6GB VRAM) is loaded once and reused across jobs via double-checked locking to prevent race conditions on concurrent requests
- STT semaphore: a threading.Semaphore(1) limits concurrent Whisper inference to one job, preventing GPU OOM (the last three mechanisms are sketched below)
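A minimal sketch of those three mechanisms, assuming the JOB_STORE and JOB_STORE_LOCK names from the handler above; the cleanup interval and model name are illustrative:

import threading
import time
from faster_whisper import WhisperModel

_whisper_model = None
_whisper_lock = threading.Lock()
_stt_semaphore = threading.Semaphore(1)  # at most one Whisper job on the GPU

def get_whisper():
    # Double-checked locking: the fast path skips the lock once the model is loaded
    global _whisper_model
    if _whisper_model is None:
        with _whisper_lock:
            if _whisper_model is None:
                _whisper_model = WhisperModel("large-v3", device="cuda",
                                              compute_type="float16")
    return _whisper_model

JOB_TTL_SECONDS = 3600  # keep finished jobs queryable for one hour

def _cleanup_jobs():
    # Garbage-collect finished jobs past the TTL, then re-arm the timer
    now = time.time()
    with JOB_STORE_LOCK:
        for job_id in list(JOB_STORE):
            job = JOB_STORE[job_id]
            if job["status"] in ("done", "error") and now - job["started_at"] > JOB_TTL_SECONDS:
                del JOB_STORE[job_id]
    timer = threading.Timer(300, _cleanup_jobs)  # re-check interval is illustrative
    timer.daemon = True
    timer.start()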
The VRAM Contention Problem
During testing, STT for a 40-minute sermon was taking 25+ minutes — far longer than expected for an RTX 3060 at float16 precision.
nvidia-smi revealed the issue:
Memory-Usage: 9167MiB / 12288MiB
GPU-Util: 35%
Processes: ollama.exe (two instances), python.exe
Ollama had loaded gemma4:e4b (≈9.6GB) into VRAM for the LLM stages. Whisper large-v3 (≈6GB) was competing for the same 12GB budget, spilling to shared system memory and degrading throughput significantly.
The fix: unload Ollama before running STT, then allow it to reload for the LLM stages.
def _run_stt(mp3_path, transcript_file, lang="ko"):
    import subprocess
    # Free VRAM before loading Whisper
    subprocess.run(["ollama", "stop", "gemma4:e4b"], capture_output=True)
    with _stt_semaphore:
        # ... Whisper inference ...
        pass
    # Ollama reloads automatically when next called
This reduced STT time for a 40-minute sermon from ~25 minutes to ~8 minutes.
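To confirm the unload actually freed memory before Whisper loads, nvidia-smi's CSV query mode is enough — a sketch; the helper name is mine:

import subprocess

def free_vram_mib() -> int:
    # Query free GPU memory in MiB without parsing the full nvidia-smi table
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=memory.free",
        "--format=csv,noheader,nounits",
    ])
    return int(out.decode().strip().splitlines()[0])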
Monitoring Under Production Conditions
The foreground Flask server prints all pipeline stages to stdout. For the pilot, this is the monitoring interface:
192.168.1.227 - - "POST /process_url HTTP/1.1" 202
[QUEUE] [Job:a319d4a9|ko] STHNK_20260427.mp3
[OK] Download complete: STHNK_20260427.mp3 (22.6 MB)
STT: 338 segments [01:28, 3.80 segments/s]
[OK] Refinement complete: 334 sentences
[BIBLE] 하나님 (God): 30 corrections
[BIBLE] 성령 (Holy Spirit): 12 corrections
[OK] Context correction complete
Done! Elapsed: 651.4s
Upsert complete: 27 items (cumulative: 310)
[OK] Pipeline complete: STHNK_20260427.mp3
Status is also queryable via GET /status/{job_id} — useful when monitoring from a second terminal without watching the log stream.
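The route shape comes from the line above; the handler body is a minimal sketch over the same JOB_STORE:

@app.route("/status/<job_id>", methods=["GET"])
def status(job_id):
    # Read under the lock so a concurrent worker update can't race us
    with JOB_STORE_LOCK:
        job = JOB_STORE.get(job_id)
    if job is None:
        return jsonify({"error": "unknown job_id"}), 404
    return jsonify(job), 200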
One key lesson about PowerShell and tqdm: tqdm uses carriage return (\r) to overwrite the current line. Windows PowerShell renders \r differently than a Linux terminal, making it look like the STT has stalled when it's actually progressing normally. A quick GPU utilization check (nvidia-smi) confirms actual activity. Adding flush=True to all print calls resolved the visual lag in the application logs.
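One way to apply flush=True globally without touching every call site — a sketch, not necessarily how api_server.py does it:

import functools

# Shadow the builtin so every later print in this module flushes immediately,
# keeping PowerShell output in step with actual pipeline progress
print = functools.partial(print, flush=True)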
In Part 3, I'll cover the AI pipeline internals in detail: how Bible-specific STT correction was implemented, why the "dumber" LLM produced better sermon structuring results, and what the Pinecone vector schema looks like for multi-pastor RAG retrieval.
Stack: PHP · C (gcc) · Python · Flask · threading · faster-whisper · Ollama · Pinecone · NVIDIA RTX 3060