2026-05-19-00-11-40 Add Ubuntu voice-over workflow

2026-05-19 00:22:10 +08:00
parent 6f63ae714c
commit ee8a28da78
12 changed files with 1034 additions and 0 deletions
--- a/Tools_scripts_XunFei-Ubuntu/README.md
+++ b/Tools_scripts_XunFei-Ubuntu/README.md
@@ -0,0 +1,38 @@
+# Tools_scripts_XunFei-Ubuntu
+
+Ubuntu version of the voice-over tools, using Bash + Python + ffmpeg instead of PowerShell.
+
+## Install
+
+```bash
+sudo apt update
+sudo apt install -y python3 python3-pip ffmpeg
+python3 -m pip install -r Tools_scripts_XunFei-Ubuntu/requirements-ubuntu.txt
+```
+
+## Environment
+
+```bash
+export XF_APPID="your_app_id"
+export XF_APIKEY="your_api_key"
+export XF_APISECRET="your_api_secret"
+```
+
+## Generate Voice
+
+```bash
+./Tools_scripts_XunFei-Ubuntu/synthesize_xfyun_super_tts.sh \
+  --script 配音稿.md \
+  --output-dir 02_audio/super_tts \
+  --voice x5_lingfeiyi_flow \
+  --speed 50
+```
+
+## Build Final Video
+
+```bash
+python3 Tools_scripts_XunFei-Ubuntu/build_final_video_ubuntu.py \
+  --video input.mp4 \
+  --audio-dir 02_audio/super_tts \
+  --output 05_outputs/final_voiceover.mp4
+```
--- a/Tools_scripts_XunFei-Ubuntu/build_final_video_ubuntu.py
+++ b/Tools_scripts_XunFei-Ubuntu/build_final_video_ubuntu.py
@@ -0,0 +1,232 @@
+#!/usr/bin/env python3
+"""Build a final voice-over video on Ubuntu with ffmpeg."""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import subprocess
+from pathlib import Path
+
+
+AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".aac", ".flac", ".ogg"}
+
+
+def run(cmd: list[str]) -> None:
+    print("+ " + " ".join(cmd))
+    subprocess.run(cmd, check=True)
+
+
+def require_tool(name: str) -> str:
+    path = shutil.which(name)
+    if not path:
+        raise SystemExit(f"{name} is required. Install it with: sudo apt install -y ffmpeg")
+    return path
+
+
+def media_duration(path: Path) -> float:
+    result = subprocess.check_output(
+        [
+            "ffprobe",
+            "-v",
+            "error",
+            "-show_entries",
+            "format=duration",
+            "-of",
+            "default=nw=1:nk=1",
+            str(path),
+        ],
+        text=True,
+    ).strip()
+    return float(result)
+
+
+def audio_files(audio_dir: Path) -> list[Path]:
+    files = [
+        path
+        for path in sorted(audio_dir.iterdir())
+        if path.is_file() and path.suffix.lower() in AUDIO_EXTS
+    ]
+    if not files:
+        raise FileNotFoundError(f"No audio files found in {audio_dir}")
+    return files
+
+
+def concat_audio_dir(audio_dir: Path, work_dir: Path, silence: float) -> Path:
+    work_dir.mkdir(parents=True, exist_ok=True)
+    normalized: list[Path] = []
+    silence_path = work_dir / "silence.wav"
+    run(
+        [
+            "ffmpeg",
+            "-hide_banner",
+            "-loglevel",
+            "error",
+            "-y",
+            "-f",
+            "lavfi",
+            "-t",
+            f"{silence:.3f}",
+            "-i",
+            "anullsrc=channel_layout=stereo:sample_rate=48000",
+            "-c:a",
+            "pcm_s16le",
+            str(silence_path),
+        ]
+    )
+
+    for index, src in enumerate(audio_files(audio_dir), start=1):
+        dst = work_dir / f"audio_{index:02d}.wav"
+        run(
+            [
+                "ffmpeg",
+                "-hide_banner",
+                "-loglevel",
+                "error",
+                "-y",
+                "-i",
+                str(src),
+                "-vn",
+                "-ar",
+                "48000",
+                "-ac",
+                "2",
+                "-c:a",
+                "pcm_s16le",
+                str(dst),
+            ]
+        )
+        normalized.append(dst)
+
+    concat_items: list[Path] = []
+    for index, item in enumerate(normalized):
+        concat_items.append(item)
+        if index != len(normalized) - 1 and silence > 0:
+            concat_items.append(silence_path)
+
+    list_path = work_dir / "audio_concat.txt"
+    with list_path.open("w", encoding="utf-8") as handle:
+        for item in concat_items:
+            escaped = item.resolve().as_posix().replace("'", "'\\''")
+            handle.write(f"file '{escaped}'\n")
+
+    out_audio = work_dir / "combined_voice.wav"
+    run(
+        [
+            "ffmpeg",
+            "-hide_banner",
+            "-loglevel",
+            "error",
+            "-y",
+            "-f",
+            "concat",
+            "-safe",
+            "0",
+            "-i",
+            str(list_path),
+            "-c:a",
+            "pcm_s16le",
+            str(out_audio),
+        ]
+    )
+    return out_audio
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Combine one video with voice-over audio.")
+    parser.add_argument("--video", type=Path, required=True, help="Source video path.")
+    parser.add_argument("--audio", type=Path, default=None, help="Single voice-over audio file.")
+    parser.add_argument("--audio-dir", type=Path, default=None, help="Directory of ordered audio files.")
+    parser.add_argument("--output", type=Path, default=Path("05_outputs/final_voiceover.mp4"))
+    parser.add_argument("--work-dir", type=Path, default=Path("04_intermediate/ubuntu_voiceover"))
+    parser.add_argument("--silence", type=float, default=0.35, help="Gap seconds between audio files.")
+    parser.add_argument("--width", type=int, default=1920)
+    parser.add_argument("--height", type=int, default=1080)
+    parser.add_argument("--fps", type=int, default=30)
+    parser.add_argument("--crf", type=int, default=20)
+    parser.add_argument("--preset", default="medium")
+    parser.add_argument("--video-speed", type=float, default=None, help="Override automatic speed.")
+    return parser.parse_args()
+
+
+def main() -> int:
+    args = parse_args()
+    require_tool("ffmpeg")
+    require_tool("ffprobe")
+
+    if not args.video.exists():
+        raise FileNotFoundError(args.video)
+    if (args.audio is None) == (args.audio_dir is None):
+        raise SystemExit("Use exactly one of --audio or --audio-dir.")
+
+    args.work_dir.mkdir(parents=True, exist_ok=True)
+    args.output.parent.mkdir(parents=True, exist_ok=True)
+
+    audio_path = args.audio if args.audio else concat_audio_dir(args.audio_dir, args.work_dir, args.silence)
+    if not audio_path or not audio_path.exists():
+        raise FileNotFoundError(audio_path)
+
+    video_duration = media_duration(args.video)
+    audio_duration = media_duration(audio_path)
+    if video_duration <= 0 or audio_duration <= 0:
+        raise RuntimeError("Invalid media duration.")
+
+    speed = args.video_speed if args.video_speed is not None else video_duration / audio_duration
+    if speed <= 0:
+        raise ValueError("--video-speed must be greater than 0.")
+
+    print(f"video_duration={video_duration:.3f}s")
+    print(f"audio_duration={audio_duration:.3f}s")
+    print(f"video_speed={speed:.6f}x")
+
+    vf = (
+        f"[0:v]setpts=PTS/{speed:.8f},fps={args.fps},"
+        f"scale={args.width}:{args.height}:force_original_aspect_ratio=decrease,"
+        f"pad={args.width}:{args.height}:(ow-iw)/2:(oh-ih)/2:black,"
+        "setsar=1,format=yuv420p[v];"
+        "[1:a]aresample=48000,apad[a]"
+    )
+    run(
+        [
+            "ffmpeg",
+            "-hide_banner",
+            "-y",
+            "-i",
+            str(args.video),
+            "-i",
+            str(audio_path),
+            "-filter_complex",
+            vf,
+            "-map",
+            "[v]",
+            "-map",
+            "[a]",
+            "-t",
+            f"{audio_duration:.3f}",
+            "-c:v",
+            "libx264",
+            "-preset",
+            args.preset,
+            "-crf",
+            str(args.crf),
+            "-c:a",
+            "aac",
+            "-b:a",
+            "192k",
+            "-ar",
+            "48000",
+            "-ac",
+            "2",
+            "-movflags",
+            "+faststart",
+            str(args.output),
+        ]
+    )
+    final_duration = media_duration(args.output)
+    print(f"output={args.output}")
+    print(f"final_duration={final_duration:.3f}s")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
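For reference, the list file that `concat_audio_dir` writes can be reproduced as a pure function, which makes the silence interleaving and the quote escaping easy to check without running ffmpeg. This is a sketch that mirrors the loop in the script; `build_concat_lines` is a hypothetical name and the paths are illustrative.

```python
from __future__ import annotations


def build_concat_lines(clips: list[str], silence_path: str, silence: float) -> list[str]:
    """Return the lines of an ffmpeg concat-demuxer list file (hypothetical helper).

    Like concat_audio_dir(), a silence clip is placed between consecutive
    clips but not after the last one, and each single quote in a path is
    written as '\\'' (close quote, escaped quote, reopen quote).
    """
    items: list[str] = []
    for index, clip in enumerate(clips):
        items.append(clip)
        if index != len(clips) - 1 and silence > 0:
            items.append(silence_path)
    return ["file '" + item.replace("'", "'\\''") + "'" for item in items]


for line in build_concat_lines(
    ["/tmp/work/audio_01.wav", "/tmp/work/it's.wav"], "/tmp/work/silence.wav", 0.35
):
    print(line)
```

Running it prints three lines, with the silence file in the middle and the apostrophe in `it's.wav` written as `'\''`.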
--- a/Tools_scripts_XunFei-Ubuntu/check_audio_duration.sh
+++ b/Tools_scripts_XunFei-Ubuntu/check_audio_duration.sh
@@ -0,0 +1,21 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+TARGET="${1:-02_audio}"
+
+if ! command -v ffprobe >/dev/null 2>&1; then
+  echo "ffprobe is required. Install it with: sudo apt install -y ffmpeg" >&2
+  exit 1
+fi
+
+if [[ ! -e "$TARGET" ]]; then
+  echo "Path not found: $TARGET" >&2
+  exit 1
+fi
+
+find "$TARGET" -type f \( -iname '*.mp3' -o -iname '*.wav' -o -iname '*.m4a' -o -iname '*.aac' -o -iname '*.flac' -o -iname '*.ogg' \) -print0 |
+  sort -z |
+  while IFS= read -r -d '' file; do
+    duration="$(ffprobe -v error -show_entries format=duration -of default=nw=1:nk=1 "$file")"
+    printf '%8.3fs  %s\n' "$duration" "$file"
+  done
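`check_audio_duration.sh` prints one line per clip but no total. When deciding whether the narration is too long, it can help to add the clips up together with the gaps that `build_final_video_ubuntu.py` inserts between them (`--silence`, 0.35 s by default). A minimal sketch, assuming the script's `'%8.3fs  %s'` output format; `parse_durations` and `estimated_voiceover_seconds` are hypothetical names.

```python
from __future__ import annotations

import re

# One line of check_audio_duration.sh output, e.g. "  12.345s  02_audio/1-intro.mp3"
# (printf format '%8.3fs  %s': right-aligned seconds, "s", two spaces, path).
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)s  (.+)$")


def parse_durations(lines: list[str]) -> list[float]:
    """Collect per-clip durations in seconds; lines that do not match are skipped."""
    durations: list[float] = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            durations.append(float(match.group(1)))
    return durations


def estimated_voiceover_seconds(durations: list[float], silence: float = 0.35) -> float:
    """Sum of the clips plus one silence gap between each pair of consecutive clips."""
    if not durations:
        return 0.0
    return sum(durations) + silence * (len(durations) - 1)


sample = ["  12.500s  02_audio/1-intro.mp3\n", "   3.250s  02_audio/2-demo.mp3\n"]
print(f"{estimated_voiceover_seconds(parse_durations(sample)):.3f}s")  # 16.100s
```

On real output, the lines can come from `subprocess.run([...], capture_output=True, text=True).stdout.splitlines()` run against the shell script.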
--- a/Tools_scripts_XunFei-Ubuntu/requirements-ubuntu.txt
+++ b/Tools_scripts_XunFei-Ubuntu/requirements-ubuntu.txt
@@ -0,0 +1 @@
+websocket-client>=1.8.0
--- a/Tools_scripts_XunFei-Ubuntu/synthesize_xfyun_super_tts.sh
+++ b/Tools_scripts_XunFei-Ubuntu/synthesize_xfyun_super_tts.sh
@@ -0,0 +1,5 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+python3 "$SCRIPT_DIR/xfyun_tts_ubuntu.py" --mode super "$@"
--- a/Tools_scripts_XunFei-Ubuntu/synthesize_xfyun_tts.sh
+++ b/Tools_scripts_XunFei-Ubuntu/synthesize_xfyun_tts.sh
@@ -0,0 +1,5 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+python3 "$SCRIPT_DIR/xfyun_tts_ubuntu.py" --mode normal "$@"
--- a/Tools_scripts_XunFei-Ubuntu/xfyun_tts_ubuntu.py
+++ b/Tools_scripts_XunFei-Ubuntu/xfyun_tts_ubuntu.py
@@ -0,0 +1,356 @@
+#!/usr/bin/env python3
+"""Generate XFYUN TTS voice files on Ubuntu.
+
+This script supports both the normal XFYUN online TTS endpoint and the
+super-realistic TTS endpoint used by the PowerShell workflow.
+"""
+
+from __future__ import annotations
+
+import argparse
+import base64
+import email.utils
+import hashlib
+import hmac
+import json
+import os
+import re
+import sys
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any
+from urllib.parse import quote, urlparse
+
+
+NORMAL_TTS_URL = "wss://tts-api.xfyun.cn/v2/tts"
+SUPER_TTS_URL = "wss://cbm01.cn-huabei-1.xf-yun.com/v1/private/mcd9m97e6"
+
+
+@dataclass(frozen=True)
+class ScriptSegment:
+    index: str
+    title: str
+    text: str
+
+
+def safe_filename(value: str) -> str:
+    cleaned = re.sub(r'[\\/:*?"<>|]', "", value).strip()
+    return cleaned or "segment"
+
+
+def find_default_script(cwd: Path) -> Path:
+    candidates = sorted(cwd.glob("*.md"))
+    preferred = [
+        path
+        for path in candidates
+        if path.name.startswith("配音稿") or path.name.lower().startswith("voice")
+    ]
+    fallback = [
+        path
+        for path in candidates
+        if not path.name.startswith("配音生成工作流") and path.name.lower() != "readme.md"
+    ]
+    selected = (preferred or fallback or candidates)
+    if not selected:
+        raise FileNotFoundError("Cannot find script Markdown file. Use --script to specify one.")
+    return selected[0]
+
+
+def load_segments(script_path: Path) -> list[ScriptSegment]:
+    content = script_path.read_text(encoding="utf-8-sig")
+    pattern = re.compile(
+        r"(?ms)^##\s+([1-9])\.\s+(.+?)\r?\n(.*?)(?=^##\s+[1-9]\.\s+|\Z)"
+    )
+    matches = pattern.findall(content)
+    if not matches:
+        raise ValueError("Cannot find sections like '## 1. title' in script Markdown.")
+
+    segments: list[ScriptSegment] = []
+    metadata = re.compile(r"^(说明|时长|备注|镜头|画面|音色|语速|输出|提示)[:：]")
+    for index, title, body in matches:
+        lines = []
+        for raw_line in body.splitlines():
+            line = raw_line.strip()
+            if not line or line.startswith("#") or metadata.match(line):
+                continue
+            lines.append(line)
+        text = "\n".join(lines).replace("\t", " ").strip()
+        if not text:
+            raise ValueError(f"Section {index} has no readable text.")
+        segments.append(ScriptSegment(index=index, title=title.strip(), text=text))
+    return segments
+
+
+def build_auth_url(request_url: str, api_key: str, api_secret: str) -> str:
+    uri = urlparse(request_url)
+    host_name = uri.hostname or ""
+    path = uri.path or "/"
+    date = email.utils.formatdate(usegmt=True)
+    signature_origin = f"host: {host_name}\ndate: {date}\nGET {path} HTTP/1.1"
+    digest = hmac.new(
+        api_secret.encode("utf-8"),
+        signature_origin.encode("utf-8"),
+        hashlib.sha256,
+    ).digest()
+    signature = base64.b64encode(digest).decode("ascii")
+    authorization_origin = (
+        f'api_key="{api_key}", algorithm="hmac-sha256", '
+        f'headers="host date request-line", signature="{signature}"'
+    )
+    authorization = base64.b64encode(authorization_origin.encode("utf-8")).decode("ascii")
+    return (
+        f"{request_url}?authorization={quote(authorization)}"
+        f"&date={quote(date)}&host={quote(host_name)}"
+    )
+
+
+def require_websocket():
+    try:
+        import websocket  # type: ignore
+    except ImportError as exc:
+        raise SystemExit(
+            "Missing dependency: websocket-client. Install it with:\n"
+            "  python3 -m pip install -r Tools_scripts_XunFei-Ubuntu/requirements-ubuntu.txt"
+        ) from exc
+    return websocket
+
+
+def recv_json(socket: Any) -> dict[str, Any]:
+    message = socket.recv()
+    if isinstance(message, bytes):
+        message = message.decode("utf-8")
+    return json.loads(message)
+
+
+def synthesize_normal(
+    *,
+    text: str,
+    out_file: Path,
+    app_id: str,
+    api_key: str,
+    api_secret: str,
+    voice: str,
+    speed: int,
+    volume: int,
+    pitch: int,
+) -> None:
+    websocket = require_websocket()
+    url = build_auth_url(NORMAL_TTS_URL, api_key, api_secret)
+    socket = websocket.create_connection(url, timeout=30)
+    audio = bytearray()
+    try:
+        payload = {
+            "common": {"app_id": app_id},
+            "business": {
+                "aue": "lame",
+                "sfl": 1,
+                "auf": "audio/L16;rate=16000",
+                "vcn": voice,
+                "speed": speed,
+                "volume": volume,
+                "pitch": pitch,
+                "bgs": 0,
+                "tte": "UTF8",
+                "reg": "2",
+                "rdn": "0",
+            },
+            "data": {
+                "status": 2,
+                "text": base64.b64encode(text.encode("utf-8")).decode("ascii"),
+            },
+        }
+        socket.send(json.dumps(payload, ensure_ascii=False, separators=(",", ":")))
+        while True:
+            response = recv_json(socket)
+            if response.get("code", 0) != 0:
+                raise RuntimeError(
+                    f"XFYUN normal TTS failed: code={response.get('code')}, "
+                    f"message={response.get('message')}"
+                )
+            data = response.get("data") or {}
+            if data.get("audio"):
+                audio.extend(base64.b64decode(data["audio"]))
+            if data.get("status") == 2:
+                break
+    finally:
+        socket.close()
+
+    if not audio:
+        raise RuntimeError("No audio data returned by XFYUN normal TTS.")
+    out_file.write_bytes(audio)
+
+
+def synthesize_super(
+    *,
+    text: str,
+    out_file: Path,
+    app_id: str,
+    api_key: str,
+    api_secret: str,
+    voice: str,
+    speed: int,
+    volume: int,
+    pitch: int,
+    raw_text: bool,
+) -> None:
+    websocket = require_websocket()
+    url = build_auth_url(SUPER_TTS_URL, api_key, api_secret)
+    socket = websocket.create_connection(url, timeout=30)
+    audio = bytearray()
+    request_text = text if raw_text else base64.b64encode(text.encode("utf-8")).decode("ascii")
+    try:
+        payload = {
+            "header": {"app_id": app_id, "status": 2},
+            "parameter": {
+                "oral": {
+                    "oral_level": "mid",
+                    "spark_assist": 1,
+                    "remain": 1,
+                },
+                "tts": {
+                    "vcn": voice,
+                    "speed": speed,
+                    "volume": volume,
+                    "pitch": pitch,
+                    "bgs": 0,
+                    "reg": 0,
+                    "rdn": 0,
+                    "rhy": 0,
+                    "watermask": 0,
+                    "implicit_watermark": False,
+                    "audio": {
+                        "encoding": "lame",
+                        "sample_rate": 24000,
+                        "channels": 1,
+                        "bit_depth": 16,
+                        "frame_size": 0,
+                    },
+                },
+            },
+            "payload": {
+                "text": {
+                    "encoding": "utf8",
+                    "compress": "raw",
+                    "format": "plain",
+                    "status": 2,
+                    "seq": 0,
+                    "text": request_text,
+                }
+            },
+        }
+        socket.send(json.dumps(payload, ensure_ascii=False, separators=(",", ":")))
+        while True:
+            response = recv_json(socket)
+            header = response.get("header") or {}
+            if header and header.get("code", 0) != 0:
+                raise RuntimeError(
+                    f"XFYUN super TTS failed: code={header.get('code')}, "
+                    f"message={header.get('message')}, sid={header.get('sid')}"
+                )
+            if response.get("code", 0) != 0:
+                raise RuntimeError(
+                    f"XFYUN super TTS failed: code={response.get('code')}, "
+                    f"message={response.get('message')}"
+                )
+            payload_audio = ((response.get("payload") or {}).get("audio") or {})
+            if payload_audio.get("audio"):
+                audio.extend(base64.b64decode(payload_audio["audio"]))
+            if header.get("status") == 2 or payload_audio.get("status") == 2:
+                break
+    finally:
+        socket.close()
+
+    if not audio:
+        raise RuntimeError("No audio data returned by XFYUN super TTS.")
+    out_file.write_bytes(audio)
+
+
+def validate_range(name: str, value: int) -> None:
+    if value < 0 or value > 100:
+        raise ValueError(f"{name} must be between 0 and 100.")
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Generate XFYUN TTS audio on Ubuntu.")
+    parser.add_argument("--script", type=Path, default=None, help="Markdown script path.")
+    parser.add_argument("--output-dir", type=Path, default=Path("02_audio/xfyun_tts"))
+    parser.add_argument("--mode", choices=["normal", "super"], default="super")
+    parser.add_argument("--voice", default=None, help="XFYUN vcn voice name.")
+    parser.add_argument("--speed", type=int, default=50)
+    parser.add_argument("--volume", type=int, default=70)
+    parser.add_argument("--pitch", type=int, default=50)
+    parser.add_argument("--raw-text", action="store_true", help="Use raw text for super TTS.")
+    parser.add_argument("--overwrite", action="store_true", help="Overwrite existing mp3 files.")
+    parser.add_argument("--dry-run", action="store_true", help="Only parse script and print plan.")
+    return parser.parse_args()
+
+
+def main() -> int:
+    args = parse_args()
+    validate_range("speed", args.speed)
+    validate_range("volume", args.volume)
+    validate_range("pitch", args.pitch)
+
+    script_path = args.script or find_default_script(Path.cwd())
+    segments = load_segments(script_path)
+    voice = args.voice or ("xiaoyan" if args.mode == "normal" else "x5_lingfeiyi_flow")
+
+    print(f"script={script_path}")
+    print(f"mode={args.mode}")
+    print(f"voice={voice}")
+    print(f"segments={len(segments)}")
+    for segment in segments:
+        print(f"  {segment.index}. {segment.title} ({len(segment.text)} chars)")
+
+    if args.dry_run:
+        return 0
+
+    app_id = os.environ.get("XF_APPID")
+    api_key = os.environ.get("XF_APIKEY")
+    api_secret = os.environ.get("XF_APISECRET")
+    if not app_id or not api_key or not api_secret:
+        raise SystemExit("Please set XF_APPID, XF_APIKEY and XF_APISECRET first.")
+
+    args.output_dir.mkdir(parents=True, exist_ok=True)
+    for segment in segments:
+        out_file = args.output_dir / f"{segment.index}-{safe_filename(segment.title)}.mp3"
+        if out_file.exists() and not args.overwrite:
+            print(f"skip existing: {out_file}")
+            continue
+        print(f"synthesizing {segment.index}: {segment.title}")
+        if args.mode == "normal":
+            synthesize_normal(
+                text=segment.text,
+                out_file=out_file,
+                app_id=app_id,
+                api_key=api_key,
+                api_secret=api_secret,
+                voice=voice,
+                speed=args.speed,
+                volume=args.volume,
+                pitch=args.pitch,
+            )
+        else:
+            synthesize_super(
+                text=segment.text,
+                out_file=out_file,
+                app_id=app_id,
+                api_key=api_key,
+                api_secret=api_secret,
+                voice=voice,
+                speed=args.speed,
+                volume=args.volume,
+                pitch=args.pitch,
+                raw_text=args.raw_text,
+            )
+        print(f"generated: {out_file}")
+
+    print("all voice files generated")
+    return 0
+
+
+if __name__ == "__main__":
+    try:
+        raise SystemExit(main())
+    except KeyboardInterrupt:
+        raise SystemExit(130)
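The signing step in `build_auth_url` is the usual suspect when authentication fails, and it is hard to test because it reads the current time. The sketch below is a variant with the date passed in, plus a helper that decodes the resulting URL so the signed `host`, `date`, and `authorization` fields can be inspected. Both function names and the `date` parameter are hypothetical; the string layout mirrors the script's own construction, so check iFlytek's WebSocket authentication documentation for the authoritative format.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
from urllib.parse import parse_qs, quote, urlparse


def build_auth_url_at(request_url: str, api_key: str, api_secret: str, date: str) -> str:
    """Same construction as build_auth_url(), but with a caller-supplied RFC 1123 date."""
    uri = urlparse(request_url)
    host_name = uri.hostname or ""
    path = uri.path or "/"
    # The three signed lines: host, date and the HTTP request line.
    signature_origin = f"host: {host_name}\ndate: {date}\nGET {path} HTTP/1.1"
    digest = hmac.new(api_secret.encode("utf-8"), signature_origin.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(authorization_origin.encode("utf-8")).decode("ascii")
    return f"{request_url}?authorization={quote(authorization)}&date={quote(date)}&host={quote(host_name)}"


def decode_auth_url(url: str) -> dict[str, str]:
    """Decode the query string for debugging: host, date and the plain authorization text."""
    query = {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}
    query["authorization"] = base64.b64decode(query["authorization"]).decode("utf-8")
    return query


url = build_auth_url_at("wss://tts-api.xfyun.cn/v2/tts", "demo_key", "demo_secret", "Tue, 19 May 2026 00:00:00 GMT")
print(decode_auth_url(url)["date"])  # Tue, 19 May 2026 00:00:00 GMT
```

Because the date is fixed, the same inputs always yield the same signature, which makes it possible to compare against a known-good request when debugging.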
--- a/工程分析/实现方案-2026-05-19-00-11-40.md
+++ b/工程分析/实现方案-2026-05-19-00-11-40.md
@@ -0,0 +1,31 @@
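+# Implementation Plan
+
+Start time: 2026-05-19-00-11-40
+
+## Ubuntu Tools Directory
+
+Added `Tools_scripts_XunFei-Ubuntu/`:
+
+- `requirements-ubuntu.txt`: Python dependencies for the Ubuntu scripts.
+- `xfyun_tts_ubuntu.py`: core iFlytek (XFYUN) TTS script; supports both normal TTS and super-realistic TTS.
+- `synthesize_xfyun_tts.sh`: Bash wrapper entry point for normal TTS.
+- `synthesize_xfyun_super_tts.sh`: Bash wrapper entry point for super-realistic TTS.
+- `check_audio_duration.sh`: small utility that lists audio durations.
+- `build_final_video_ubuntu.py`: combines a single video with the voice-over audio into the final video and automatically adjusts the picture speed to the voice-over duration.
+
+## Ubuntu Workflow Document
+
+Added `配音生成工作流-Ubuntu-Agent.md`:
+
+- Describes the directory layout, dependency installation, and environment variable setup.
+- Describes the voice-over script format.
+- Gives command examples for normal TTS, super-realistic TTS, audio duration checks, and video assembly.
+- Gives an Agent execution checklist and FAQ.
+
+## Key Implementation Points
+
+- The Python TTS script builds the iFlytek WebSocket authentication URL with HMAC-SHA256.
+- Voice-over script parsing supports the `## 1.` to `## 4.` section format (the regex accepts single-digit section numbers, 1 to 9).
+- `--dry-run` validates voice-over script parsing without iFlytek credentials.
+- The video assembly script measures the video and audio durations with `ffprobe` and applies `setpts=PTS/speed` so the picture matches the narration.
+- The output video uses H.264/AAC, yuv420p, and faststart for broad browser compatibility.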
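The speed rule in the last points can be checked with the numbers from the experience log (`85.636 / 64.272 = 1.332400`). The sketch below restates the calculation in `build_final_video_ubuntu.py`: the factor is video duration divided by narration duration, and `setpts=PTS/speed` divides every timestamp by it, so the retimed picture lasts `video / speed`, which equals the narration length. `speed_factor` and `video_filter` are hypothetical names for illustration.

```python
from __future__ import annotations


def speed_factor(video_seconds: float, audio_seconds: float, override: float | None = None) -> float:
    """Picture speed factor as computed by build_final_video_ubuntu.py."""
    if video_seconds <= 0 or audio_seconds <= 0:
        raise ValueError("durations must be positive")
    speed = override if override is not None else video_seconds / audio_seconds
    if speed <= 0:
        raise ValueError("speed must be greater than 0")
    return speed


def video_filter(speed: float, width: int = 1920, height: int = 1080, fps: int = 30) -> str:
    """Video half of the script's filtergraph: retime, resample fps, letterbox."""
    return (
        f"[0:v]setpts=PTS/{speed:.8f},fps={fps},"
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2:black,"
        "setsar=1,format=yuv420p[v]"
    )


speed = speed_factor(85.636, 64.272)
print(f"{speed:.6f}")            # 1.332400
print(f"{85.636 / speed:.3f}s")  # 64.272s: the retimed picture matches the narration
```

A factor above 1 speeds the picture up and below 1 slows it down; the experience log recommends re-editing the footage rather than forcing a large factor.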
--- a/工程分析/测试方案-2026-05-19-00-11-40.md
+++ b/工程分析/测试方案-2026-05-19-00-11-40.md
@@ -0,0 +1,42 @@
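+# Test Plan
+
+Start time: 2026-05-19-00-11-40
+
+## Static Checks
+
+- `python3 -m py_compile Tools_scripts_XunFei-Ubuntu/xfyun_tts_ubuntu.py Tools_scripts_XunFei-Ubuntu/build_final_video_ubuntu.py`
+- `bash -n Tools_scripts_XunFei-Ubuntu/synthesize_xfyun_tts.sh`
+- `bash -n Tools_scripts_XunFei-Ubuntu/synthesize_xfyun_super_tts.sh`
+- `bash -n Tools_scripts_XunFei-Ubuntu/check_audio_duration.sh`
+
+Result: all passed.
+
+## Functional Checks
+
+- Run `xfyun_tts_ubuntu.py --dry-run` on a temporary Markdown voice-over script and confirm that it parses 4 sections.
+- Generate a temporary test video and audio with ffmpeg, run `build_final_video_ubuntu.py`, and confirm that it outputs an H.264/AAC final video.
+- Use `ffprobe` to check the test output's duration, codecs, and audio stream.
+
+Results:
+
+- `xfyun_tts_ubuntu.py --dry-run` parses the 4-section voice-over script.
+- `websocket-client` 1.9.0 is installed and verified.
+- Without iFlytek credentials, the tool parses the voice-over script and then exits with the explicit message `Please set XF_APPID, XF_APIKEY and XF_APISECRET first.`
+- `build_final_video_ubuntu.py` successfully produced a final video from a temporary test video and two test audio clips.
+- Test output: H.264 video, AAC audio, 320x180, about 2.55 seconds.
+- `check_audio_duration.sh` lists the durations of the test audio files.
+
+## Repository Checks
+
+- `git diff --check`
+- `git status --short`
+
+Results:
+
+- `git diff --check` passed.
+- The pending changes contain only the Ubuntu tools directory, the Ubuntu workflow document, and the engineering-analysis documents.
+
+## Deployment Checks
+
+- This change mainly adds documentation and scripts and does not change the web service logic.
+- Following the existing process, re-run `docker compose -f docker_compose_huijutec.yaml up -d --build` and check the health endpoint.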
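The functional check that the dry run finds 4 sections can also be reproduced without the CLI or credentials. The sketch below copies the section regex and metadata filter from `load_segments` into a standalone function and runs it on an in-memory voice-over script; `parse_sections` is a hypothetical name, and it omits the original's tab replacement and empty-section error. Metadata prefixes such as `说明：` are matched literally, so they stay in Chinese.

```python
from __future__ import annotations

import re

# Same patterns as load_segments() in xfyun_tts_ubuntu.py.
SECTION_RE = re.compile(r"(?ms)^##\s+([1-9])\.\s+(.+?)\r?\n(.*?)(?=^##\s+[1-9]\.\s+|\Z)")
METADATA_RE = re.compile(r"^(说明|时长|备注|镜头|画面|音色|语速|输出|提示)[:：]")


def parse_sections(content: str) -> list[tuple[str, str, str]]:
    """Return (index, title, narration) tuples, dropping blank, heading and metadata lines."""
    sections = []
    for index, title, body in SECTION_RE.findall(content):
        lines = [
            line.strip()
            for line in body.splitlines()
            if line.strip() and not line.strip().startswith("#") and not METADATA_RE.match(line.strip())
        ]
        sections.append((index, title.strip(), "\n".join(lines)))
    return sections


SAMPLE = (
    "## 1. Intro\n说明：internal note, not read aloud\nHello.\n\n"
    "## 2. Upload\nText two.\n"
    "## 3. Segment\nText three.\n"
    "## 4. Export\nBye.\n"
)
for index, title, text in parse_sections(SAMPLE):
    print(index, title, repr(text))
```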
--- a/工程分析/经验记录.md
+++ b/工程分析/经验记录.md
@@ -239,3 +239,35 @@ B. Root cause: the screen recording is paced slowly, while the narration copy follows the introduction vi
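 C. Solution: use `ffprobe` to measure the original video and narration durations, speed the video up mildly by `85.636 / 64.272 = 1.332400`, mute the original audio and replace it with the new narration, and output in the compatible H.264/AAC format.

 D. Prevention: when producing an introduction video, fix the narration duration first, then adjust the footage within an acceptable speed range; if the required speed change goes beyond a natural range, re-edit the footage instead of forcing a speed-up.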
+
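+## 2026-05-19-00-11-40 Ubuntu Voice-Over Workflow
+
+### 1. The PowerShell voice-over scripts cannot serve directly as the Ubuntu workflow
+
+A. Problem: the existing iFlytek voice-over scripts are written in PowerShell, which makes them awkward to run directly on Ubuntu, and their command examples do not follow Linux conventions.
+
+B. Root cause: the earlier workflow targeted Windows/PowerShell; script entry points, environment-variable setup, path syntax, and execute permissions all differ on Ubuntu.
+
+C. Solution: add `Tools_scripts_XunFei-Ubuntu`, which uses Bash wrapper entry points and a Python WebSocket client to implement normal TTS, super-realistic TTS, audio duration checks, and video assembly; add `配音生成工作流-Ubuntu-Agent.md` documenting the Ubuntu installation and execution steps.
+
+D. Prevention: for cross-platform workflows, maintain a separate entry-point document per platform instead of only swapping command snippets; at minimum cover dependency installation, environment variables, execute permissions, and common errors.
+
+### 2. Scripts must be testable without credentials
+
+A. Problem: real iFlytek TTS requests require `XF_APPID`, `XF_APIKEY`, and `XF_APISECRET`; without credentials, neither the voice-over script parsing nor the basic script logic could be verified.
+
+B. Root cause: the TTS network request and the voice-over script parsing were originally coupled, with no offline check path.
+
+C. Solution: add `--dry-run` to `xfyun_tts_ubuntu.py`, which parses the voice-over script and prints the section plan; the iFlytek credentials are checked, and the WebSocket request is sent, only when actually synthesizing.
+
+D. Prevention: any script that depends on external accounts, the network, or paid APIs should offer a dry-run or validate mode so structural checks can be done without credentials.
+
+### 3. Ubuntu video assembly must avoid codec compatibility problems
+
+A. Problem: audio segments from different sources may differ in codec, sample rate, and channel count; concatenating them directly can fail or produce an unplayable audio track.
+
+B. Root cause: the ffmpeg concat demuxer requires all inputs to share the same stream parameters, and multi-segment TTS audio is not guaranteed to match exactly.
+
+C. Solution: when merging an audio directory, `build_final_video_ubuntu.py` first converts each segment to 48 kHz stereo PCM WAV, then concatenates them and muxes the result with the video, producing a final H.264/AAC MP4.
+
+D. Prevention: normalize sample rate, channel layout, and codec before concatenating multiple audio segments; always output final videos as H.264/AAC/yuv420p/faststart.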
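The root cause in item 3 suggests a quick check before concatenation. The sketch below is a hypothetical helper, not part of the tools: it asks `ffprobe` for the first audio stream's codec, sample rate, and channel count, and compares those values across clips. A realistic mismatch is mixing clips from the two TTS modes, since `xfyun_tts_ubuntu.py` requests 16 kHz audio for normal TTS (`auf`) and 24 kHz for super-realistic TTS (`sample_rate`). The comparison is a separate function so it can be exercised without ffmpeg installed.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path

KEYS = ("codec_name", "sample_rate", "channels")


def probe_audio(path: Path) -> dict:
    """Read codec, sample rate and channel count of the first audio stream via ffprobe."""
    output = subprocess.check_output(
        [
            "ffprobe", "-v", "error", "-select_streams", "a:0",
            "-show_entries", "stream=codec_name,sample_rate,channels",
            "-of", "json", str(path),
        ],
        text=True,
    )
    streams = json.loads(output).get("streams") or []
    if not streams:
        raise ValueError(f"No audio stream in {path}")
    return {key: streams[0].get(key) for key in KEYS}


def params_match(probes: list[dict]) -> bool:
    """True when every clip has the same codec, sample rate and channel count."""
    signatures = {tuple(str(probe.get(key)) for key in KEYS) for probe in probes}
    return len(signatures) <= 1


# Pre-probed example values: a normal-TTS clip next to a super-TTS clip.
normal_clip = {"codec_name": "mp3", "sample_rate": "16000", "channels": 1}
super_clip = {"codec_name": "mp3", "sample_rate": "24000", "channels": 1}
print(params_match([super_clip, super_clip]))   # True
print(params_match([normal_clip, super_clip]))  # False
```

The build script normalizes every clip anyway, so this check is mainly useful for explaining a failed manual concat.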
--- a/工程分析/需求分析-2026-05-19-00-11-40.md
+++ b/工程分析/需求分析-2026-05-19-00-11-40.md
@@ -0,0 +1,29 @@
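+# Requirements Analysis
+
+Start time: 2026-05-19-00-11-40
+
+## User Requirements
+
+Create a set of voice-over tools and workflow documentation for Ubuntu:
+
+- `Tools_scripts_XunFei-Ubuntu`
+- `配音生成工作流-Ubuntu-Agent.md`
+
+## Current State
+
+- The scripts in the existing `待配音视频/Tools_scripts_XunFei` directory are PowerShell-based and mainly suited to Windows or to environments with PowerShell installed.
+- The existing workflow document consists mostly of PowerShell commands, which cannot be copied and run as-is on Ubuntu.
+- On Ubuntu, a Bash + Python + ffmpeg combination is a better fit.
+
+## Goals
+
+- Provide iFlytek normal TTS and super-realistic TTS scripts that run on Ubuntu.
+- Keep the original workflow's voice-over script format, voice selection, speed control, audio duration check, and video assembly steps.
+- Provide a video assembly script that replaces the voice-over of a single video and automatically adjusts the picture speed to the narration.
+- The new document clearly covers dependency installation, environment variable setup, common commands, testing, and FAQ.
+
+## Constraints
+
+- Do not write any iFlytek keys into the scripts.
+- Read credentials from the `XF_APPID`, `XF_APIKEY`, and `XF_APISECRET` environment variables.
+- Output directories for generated audio and video are user-specified, to avoid overwriting existing results.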
--- a/配音生成工作流-Ubuntu-Agent.md
+++ b/配音生成工作流-Ubuntu-Agent.md
@@ -0,0 +1,242 @@
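+# Voice-Over Generation Workflow: Ubuntu Agent
+
+This document guides an Agent through using `Tools_scripts_XunFei-Ubuntu` on Ubuntu to turn a voice-over script into iFlytek voice-over audio and combine it with a video into a final introduction video.
+
+## 1. Directory Layout
+
+The recommended structure is:
+
+```text
+project_dir/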
+  配音稿.md
+  Tools_scripts_XunFei-Ubuntu/
+    requirements-ubuntu.txt
+    xfyun_tts_ubuntu.py
+    synthesize_xfyun_tts.sh
+    synthesize_xfyun_super_tts.sh
+    check_audio_duration.sh
+    build_final_video_ubuntu.py
+```
+
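+Where:
+
+- `xfyun_tts_ubuntu.py`: core Python script; supports normal TTS and super-realistic TTS.
+- `synthesize_xfyun_tts.sh`: entry point for normal iFlytek TTS; default voice `xiaoyan`.
+- `synthesize_xfyun_super_tts.sh`: entry point for iFlytek super-realistic TTS; default voice `x5_lingfeiyi_flow`.
+- `check_audio_duration.sh`: lists the durations of mp3, wav, m4a, and other audio files in bulk.
+- `build_final_video_ubuntu.py`: combines a single video with the voice-over audio into the final video and automatically adjusts the picture speed to the narration duration.
+
+## 2. Install Dependencies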
+
+```bash
+sudo apt update
+sudo apt install -y python3 python3-pip ffmpeg
+python3 -m pip install -r Tools_scripts_XunFei-Ubuntu/requirements-ubuntu.txt
+chmod +x Tools_scripts_XunFei-Ubuntu/*.sh Tools_scripts_XunFei-Ubuntu/*.py
+```
+
+If you use a virtual environment:
+
+```bash
+python3 -m venv .venv-tts
+source .venv-tts/bin/activate
+python -m pip install -r Tools_scripts_XunFei-Ubuntu/requirements-ubuntu.txt
+```
+
+## 3. Configure iFlytek Environment Variables
+
+```bash
+export XF_APPID="your_app_id"
+export XF_APIKEY="your_api_key"
+export XF_APISECRET="your_api_secret"
+```
+
+To make them persistent, append them to `~/.bashrc`:
+
+```bash
+cat >> ~/.bashrc <<'EOF'
+export XF_APPID="your_app_id"
+export XF_APIKEY="your_api_key"
+export XF_APISECRET="your_api_secret"
+EOF
+source ~/.bashrc
+```
+
+The scripts never store the keys; do not commit real keys to the repository either.
+
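+## 4. Voice-Over Script Format
+
+The tool recognizes numbered section headings in the Markdown voice-over script:
+
+```markdown
+## 1. First section title
+Narration text for section 1.
+
+## 2. Second section title
+Narration text for section 2.
+
+## 3. Third section title
+Narration text for section 3.
+
+## 4. Fourth section title
+Narration text for section 4.
+```
+
+Notes:
+
+- Keep the headings as `## 1.` through `## 4.` (the parser accepts single-digit section numbers, 1 to 9).
+- Output filenames combine the section number and title, for example `1-First section title.mp3`.
+- Metadata lines starting with `说明：` (notes), `时长：` (duration), `备注：` (remarks), `镜头：` (shot), `画面：` (visuals), and similar prefixes are ignored; both the full-width `：` and the ASCII `:` colon are recognized.
+- Put only the final narration text in the body; do not include internal prompts.
+
+You can run a dry-run check first: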
+
+```bash
+python3 Tools_scripts_XunFei-Ubuntu/xfyun_tts_ubuntu.py \
+  --script 配音稿.md \
+  --dry-run
+```
+
+## 5. Normal TTS Synthesis
+
+Normal TTS is suited to quickly generating clear, stable Chinese voice-over.
+
+```bash
+./Tools_scripts_XunFei-Ubuntu/synthesize_xfyun_tts.sh \
+  --script 配音稿.md \
+  --output-dir 02_audio/tts_audio_xiaoyan \
+  --voice xiaoyan \
+  --speed 50 \
+  --volume 70 \
+  --pitch 50
+```
+
+## 6. Super-Realistic TTS Synthesis
+
+Super-realistic TTS is better suited to project reports, system introductions, and promotional videos.
+
+```bash
+./Tools_scripts_XunFei-Ubuntu/synthesize_xfyun_super_tts.sh \
+  --script 配音稿.md \
+  --output-dir 02_audio/super_tts_x5_lingfeiyi \
+  --voice x5_lingfeiyi_flow \
+  --speed 50 \
+  --volume 70 \
+  --pitch 50
+```
+
+If the API expects the text unencoded (plain text instead of base64), add `--raw-text`:
+
+```bash
+./Tools_scripts_XunFei-Ubuntu/synthesize_xfyun_super_tts.sh \
+  --script 配音稿.md \
+  --output-dir 02_audio/super_tts_raw \
+  --raw-text
+```
+
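+## 7. Voice and Speed Options
+
+- `--voice`: iFlytek speaker name, i.e. `vcn`.
+- `--speed`: speaking rate, `0-100` (values outside this range are rejected), default `50`.
+- `--volume`: volume, `0-100`, default `70`.
+- `--pitch`: pitch, `0-100`, default `50`.
+- `--overwrite`: overwrite existing audio files (by default, existing files are skipped).
+
+Recommendations:
+
+- System introductions: prefer super-realistic TTS, e.g. `x5_lingfeiyi_flow`.
+- Quick proofreading: use normal TTS, e.g. `xiaoyan`.
+- To shorten the final video: trim the script first, then raise `--speed` to `55-60`.
+
+## 8. Check Audio Duration
+
+```bash
+./Tools_scripts_XunFei-Ubuntu/check_audio_duration.sh 02_audio/super_tts_x5_lingfeiyi
+```
+
+If the combined duration is too long:
+
+- First, trim the voice-over script.
+- Next, raise `--speed` slightly.
+- Only then adjust the video speed.
+
+Do not speed the video up heavily to keep pace with an overly long narration; the picture will look unnatural.
+
+## 9. Build the Final Video
+
+If you have one complete screen recording and a set of voice-over segments: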
+
+```bash
+python3 Tools_scripts_XunFei-Ubuntu/build_final_video_ubuntu.py \
+  --video 待配音视频/ISISeg-介入导丝视频分割工作台-使用展示.mp4 \
+  --audio-dir 02_audio/super_tts_x5_lingfeiyi \
+  --output 05_outputs/ISISeg-系统使用视频-配音版.mp4
+```
+
+If you already have a single merged narration audio file:
+
+```bash
+python3 Tools_scripts_XunFei-Ubuntu/build_final_video_ubuntu.py \
+  --video input.mp4 \
+  --audio voiceover.mp3 \
+  --output 05_outputs/final_voiceover.mp4
+```
+
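+The script:
+
+- Reads the video and narration durations with `ffprobe`.
+- Automatically computes the picture speed factor (video duration divided by narration duration).
+- Drops the original video's audio track and keeps only the new narration.
+- Outputs an H.264/AAC, yuv420p, faststart MP4.
+
+Common parameters:
+
+- `--width 1920 --height 1080`: output resolution; the picture is scaled to fit and padded with black bars.
+- `--fps 30`: output frame rate.
+- `--silence 0.35`: seconds of silence inserted between voice-over segments when using `--audio-dir`.
+- `--video-speed 1.25`: set the picture speed manually, overriding the automatic calculation.
+
+## 10. Agent Execution Checklist
+
+1. Confirm that `Tools_scripts_XunFei-Ubuntu` exists.
+2. Check that `ffmpeg`, `ffprobe`, and `python3` are available.
+3. Install the dependencies in `requirements-ubuntu.txt`.
+4. Check `XF_APPID`, `XF_APIKEY`, and `XF_APISECRET`.
+5. Create or read the voice-over script and validate its sections with `--dry-run`.
+6. Choose normal or super-realistic TTS based on the use case.
+7. Use a separate output directory per voice and speed so previews do not overwrite each other.
+8. After generating the audio, run `check_audio_duration.sh`.
+9. Assemble the final video with `build_final_video_ubuntu.py`.
+10. Use `ffprobe` to check the final video's duration, codecs, and audio stream.
+
+## 11. FAQ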
+
+### Missing dependency: websocket-client
+
+Run:
+
+```bash
+python3 -m pip install -r Tools_scripts_XunFei-Ubuntu/requirements-ubuntu.txt
+```
+
+### Please set XF_APPID, XF_APIKEY and XF_APISECRET first
+
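+The current shell does not have the iFlytek credential environment variables. Set them and re-run the script:
+
+```bash
+export XF_APPID="your_app_id"
+export XF_APIKEY="your_api_key"
+export XF_APISECRET="your_api_secret"
+```
+
+### Cannot find script Markdown file
+
+Specify the voice-over script explicitly with `--script 配音稿.md`.
+
+### ffmpeg or ffprobe not found
+
+Run: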
+
+```bash
+sudo apt install -y ffmpeg
+```
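Steps 2 to 4 of the Agent checklist can be scripted as a preflight check before real synthesis (the `--dry-run` path needs none of the credentials). The sketch below is hypothetical: `preflight` is not one of the provided tools. It reports every missing executable, a missing `websocket` module (the import name of `websocket-client`), and any missing credential, rather than stopping at the first problem. The lookups are injectable so the function can be tested without touching the real system.

```python
from __future__ import annotations

import importlib.util
import os
import shutil
from collections.abc import Callable, Mapping

REQUIRED_TOOLS = ("ffmpeg", "ffprobe", "python3")
REQUIRED_ENV = ("XF_APPID", "XF_APIKEY", "XF_APISECRET")


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], str | None] = shutil.which,
    has_module: Callable[[str], bool] = lambda name: importlib.util.find_spec(name) is not None,
) -> list[str]:
    """Return human-readable problems; an empty list means ready to synthesize."""
    problems = [f"missing tool: {tool}" for tool in REQUIRED_TOOLS if which(tool) is None]
    if not has_module("websocket"):  # import name provided by the websocket-client package
        problems.append("missing Python package: websocket-client")
    problems += [f"missing environment variable: {name}" for name in REQUIRED_ENV if not env.get(name)]
    return problems


for problem in preflight():
    print(problem)
```

An empty output means the checklist items 2 to 4 are satisfied on the current machine.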