Purpose
This tool converts files between Apple .webarchive format and MHTML/MHT format.
It is useful when you have saved a web page in one format but need to open, share, or store it in another format. For example, Safari often uses .webarchive, while many browsers and email-based tools use .mhtml or .mht.
The converter works in both directions:
- .mhtml or .mht to .webarchive
- .webarchive to .mhtml
It is designed to be simple to run from the command line and does not require any external Python packages.
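A .webarchive file is a property list (plist). Its top-level dictionary holds a WebMainResource entry, an optional WebSubresources list, and an optional WebSubframeArchives list for embedded frames. These are the keys the script reads and writes. The minimal sketch below builds such a dictionary in memory with the standard plistlib module and reads it back. The URLs and contents are made up for illustration, and archives saved by Safari may carry additional keys.

```python
import plistlib

# Minimal WebArchive-style dictionary: one main HTML resource and one
# stylesheet subresource. URLs and contents are illustrative only.
archive = {
    "WebMainResource": {
        "WebResourceURL": "https://example.com/index.html",
        "WebResourceMIMEType": "text/html",
        "WebResourceTextEncodingName": "utf-8",
        "WebResourceData": b'<link rel="stylesheet" href="style.css"><p>Hello</p>',
    },
    "WebSubresources": [
        {
            "WebResourceURL": "https://example.com/style.css",
            "WebResourceMIMEType": "text/css",
            "WebResourceTextEncodingName": "utf-8",
            "WebResourceData": b"p { color: navy; }",
        }
    ],
}

# Serialize as a binary plist (the script's default output) and parse it back.
blob = plistlib.dumps(archive, fmt=plistlib.FMT_BINARY, sort_keys=False)
loaded = plistlib.loads(blob)

main = loaded["WebMainResource"]
print(main["WebResourceMIMEType"])             # text/html
print(type(main["WebResourceData"]).__name__)  # bytes
print(len(loaded["WebSubresources"]))          # 1
```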
Main Benefits
The tool helps users:
- Convert saved web pages between common archive formats
- Keep the main page and related resources together
- Choose an output file name when needed
- Rely on automatic detection of the conversion direction
- Use a few optional settings for special cases
Because it uses only Python’s standard library, setup is minimal.
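On the MHTML side, the page and its resources travel together as a single MIME multipart/related message. The sketch below illustrates that container using only Python's standard email package. It is not the script's own writer: build_mhtml assembles the headers by hand and base64-encodes every part. The sketch packs a hypothetical page and stylesheet into one message, then parses the bytes back and lists each part's Content-Location.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

html = '<html><head><link rel="stylesheet" href="style.css"></head><body>Hi</body></html>'
css = "body { color: navy; }"

# Start with the HTML page as a single-part message...
msg = EmailMessage()
msg["Subject"] = "Example page"
msg.set_content(html, subtype="html",
                headers=["Content-Location: https://example.com/index.html"])
# ...then add_related() converts it to multipart/related and appends the CSS part.
msg.add_related(css, subtype="css",
                headers=["Content-Location: https://example.com/style.css"])

raw = msg.as_bytes()

# Parse the bytes back and list each resource with its location.
parsed = BytesParser(policy=policy.default).parsebytes(raw)
print(parsed.get_content_type())  # multipart/related
for part in parsed.iter_parts():
    print(part.get_content_type(), part["Content-Location"])
# text/html https://example.com/index.html
# text/css https://example.com/style.css
```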
Requirements
To use this tool, you need:
- Python 3.9 or newer
- The webarchive_mhtml_converter.py script
- A .webarchive, .mhtml, or .mht file to convert
No additional libraries need to be installed.
Basic Usage
The simplest way to use the tool is to provide the input file.
For an MHTML file:
```bash
python webarchive_mhtml_converter.py page.mhtml
```
This creates:
page.webarchive
For a WebArchive file:
```bash
python webarchive_mhtml_converter.py page.webarchive
```
This creates:
page.mhtml
The tool detects the input file type from the extension and automatically chooses the opposite output format.
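The detection rule can be sketched in a few lines. This condensed version uses hypothetical names (opposite_target, default_output); the script splits the same logic across kind_from_suffix, infer_target, and default_output_path. It shows three things: .mht is accepted alongside .mhtml, the suffix comparison ignores case, and the default output path keeps the input's directory and stem.

```python
from pathlib import Path
from typing import Optional

# Same suffix sets as MHTML_SUFFIXES / WEBARCHIVE_SUFFIXES in the script.
MHTML_SUFFIXES = {".mhtml", ".mht"}
WEBARCHIVE_SUFFIXES = {".webarchive"}


def opposite_target(path: Path) -> Optional[str]:
    """Return the output format implied by the input suffix, or None if unknown."""
    suffix = path.suffix.lower()  # case-insensitive, so PAGE.MHT also matches
    if suffix in WEBARCHIVE_SUFFIXES:
        return "mhtml"
    if suffix in MHTML_SUFFIXES:
        return "webarchive"
    return None


def default_output(path: Path) -> Path:
    target = opposite_target(path)
    if target is None:
        raise ValueError(f"Cannot infer direction for {path}; use --to")
    # with_suffix() swaps only the final suffix; directory and stem are kept.
    return path.with_suffix("." + target)


for name in ["page.mhtml", "notes/Page.MHT", "page.webarchive"]:
    print(name, "->", default_output(Path(name)).as_posix())
# page.mhtml -> page.webarchive
# notes/Page.MHT -> notes/Page.webarchive
# page.webarchive -> page.mhtml
```

Because Path.with_suffix replaces only the last suffix, an input such as page.v2.mhtml becomes page.v2.webarchive.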
Choosing an Output File Name
You can choose the output file path with -o or --output.
Example:
```bash
python webarchive_mhtml_converter.py page.mhtml -o converted.webarchive
```
Another example:
```bash
python webarchive_mhtml_converter.py page.webarchive -o converted.mhtml
```
This is useful when you want to keep the original file name unchanged or save the converted file in another folder.
Choosing the Output Format Manually
If the input file does not have a .webarchive, .mhtml, or .mht extension, you can specify the target format with --to (mhtml or webarchive).
To create a WebArchive file:
```bash
python webarchive_mhtml_converter.py page.dat --to webarchive -o page.webarchive
```
To create an MHTML file:
```bash
python webarchive_mhtml_converter.py page.dat --to mhtml -o page.mhtml
```
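In the default auto mode, the tool first looks at the input extension. If that extension is not recognized, it falls back to the extension of the -o path. --to is strictly required only when neither extension identifies a format, but it also makes the intended direction explicit.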
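The order in which the target is chosen can be sketched as follows. pick_target is a hypothetical, simplified stand-in for the script's infer_target. An explicit --to wins. Otherwise, a recognized input extension selects the opposite format. Failing that, a recognized -o extension names the target directly.

```python
from pathlib import Path
from typing import Optional

KIND_BY_SUFFIX = {".webarchive": "webarchive", ".mhtml": "mhtml", ".mht": "mhtml"}
OPPOSITE = {"webarchive": "mhtml", "mhtml": "webarchive"}


def pick_target(input_path: Path, output_path: Optional[Path] = None, to: str = "auto") -> str:
    # 1. An explicit --to always wins.
    if to != "auto":
        return to
    # 2. A recognized input extension selects the opposite format.
    input_kind = KIND_BY_SUFFIX.get(input_path.suffix.lower())
    if input_kind is not None:
        return OPPOSITE[input_kind]
    # 3. Otherwise the -o extension, if recognized, names the target directly.
    if output_path is not None:
        output_kind = KIND_BY_SUFFIX.get(output_path.suffix.lower())
        if output_kind is not None:
            return output_kind
    raise ValueError("Could not infer conversion direction. Use --to mhtml or --to webarchive.")


print(pick_target(Path("page.dat"), Path("page.webarchive"), to="webarchive"))  # webarchive
print(pick_target(Path("page.dat"), Path("page.mht")))                          # mhtml
print(pick_target(Path("page.webarchive")))                                     # mhtml
try:
    pick_target(Path("page.dat"))
except ValueError as err:
    print(err)  # Could not infer conversion direction. Use --to mhtml or --to webarchive.
```

The script validates the chosen target in a separate step. A combination such as an .mhtml input with -o out.mhtml is therefore rejected rather than silently reinterpreted.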
Optional Settings
Keeping CID CSS Links
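When converting from MHTML to WebArchive, the tool by default replaces stylesheet link elements whose href is a cid: URL (a reference to another part of the MHTML file by its Content-ID) with inline style elements that contain that stylesheet's CSS. According to the script's help text, this improves compatibility with WebArchive on iOS.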
To keep those links unchanged, use:
```bash
python webarchive_mhtml_converter.py page.mhtml --keep-cid-css-link
```
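The stylesheet adjustment is a text substitution. The simplified sketch below follows the same approach as replace_stylesheet_link in the script. It finds link tags whose rel contains "stylesheet" and whose href equals a given cid: URL, and replaces each one with a style element holding the CSS, after dropping a leading @charset rule. The function name replace_cid_link and the cid value are hypothetical. Like the script, the sketch uses regular expressions rather than an HTML parser, so it assumes ordinary, well-formed link tags.

```python
import re


def replace_cid_link(html: str, cid_url: str, css: str) -> str:
    """Replace <link rel="stylesheet" href="cid:..."> with an inline <style> block."""
    # @charset only makes sense at the start of a standalone stylesheet.
    css = re.sub(r'^\s*@charset\s+["\'][^"\']+["\']\s*;\s*', "", css, flags=re.I)
    style = '<style type="text/css">\n' + css + "\n</style>"
    href = re.compile(r'\bhref\s*=\s*(["\'])' + re.escape(cid_url) + r"\1", re.I)
    rel = re.compile(r'\brel\s*=\s*(["\'])[^"\']*stylesheet[^"\']*\1', re.I)

    def repl(match: re.Match) -> str:
        tag = match.group(0)
        # Only swap tags that are stylesheets AND point at this cid: URL.
        return style if href.search(tag) and rel.search(tag) else tag

    # A function replacement is inserted literally, so backslashes in CSS are safe.
    return re.sub(r"<link\b[^>]*>", repl, html, flags=re.I)


page = '<head><link rel="stylesheet" href="cid:css-1@example"></head>'
print(replace_cid_link(page, "cid:css-1@example", '@charset "utf-8";\nbody{color:red}'))
# <head><style type="text/css">
# body{color:red}
# </style></head>
```

With --keep-cid-css-link, this step is skipped. The link tags stay as they are, and the CSS remains only as a cid: subresource.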
Choosing the WebArchive Plist Format
When converting from MHTML to WebArchive, the tool writes the WebArchive as a binary property list (plist) by default.
To create an XML plist instead, use:
```bash
python webarchive_mhtml_converter.py page.mhtml --plist-format xml
```
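plistlib, which the script uses for both writing and reading, makes the difference between the two encodings easy to see. In this sketch, where the archive contents are made up, the binary form starts with the bplist00 magic bytes. The XML form is text, and the raw bytes of WebResourceData appear base64-encoded inside a data element. plistlib.loads detects the encoding automatically. As a result, the script's WebArchive-to-MHTML direction, which loads files with plistlib.load, can read either form.

```python
import plistlib

# A tiny archive; the contents are made up for illustration.
archive = {
    "WebMainResource": {
        "WebResourceURL": "https://example.com/",
        "WebResourceMIMEType": "text/html",
        "WebResourceData": b"<p>hi</p>",
    }
}

binary = plistlib.dumps(archive, fmt=plistlib.FMT_BINARY, sort_keys=False)
xml = plistlib.dumps(archive, fmt=plistlib.FMT_XML, sort_keys=False)

print(binary[:8])           # b'bplist00'
print(xml.splitlines()[0])  # b'<?xml version="1.0" encoding="UTF-8"?>'
print(b"<data>" in xml)     # True: bytes are stored as base64 text in XML

# plistlib.loads() detects the encoding, so both decode to the same dictionary.
print(plistlib.loads(binary) == plistlib.loads(xml) == archive)  # True
```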
Excluding Subframes
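When converting from WebArchive to MHTML, the tool by default also includes the resources of embedded frames. A WebArchive stores these frames as nested archives under WebSubframeArchives, and the tool walks them recursively.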
To exclude them, use:
```bash
python webarchive_mhtml_converter.py page.webarchive --no-subframes
```
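Subframe handling amounts to a recursive walk, because each entry in WebSubframeArchives is itself a complete archive. The simplified sketch below yields a label and a URL for each resource. walk_archive is a hypothetical name. The script's version is iter_archive_resources, which additionally type-checks entries and accepts the alternative WebSubResources spelling. The labels mirror the fallback locations the script uses when a resource has no URL.

```python
from typing import Any, Dict, Iterator, Tuple


def walk_archive(archive: Dict[str, Any], include_subframes: bool = True,
                 prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (label, URL): main resource first, then subresources, then subframes."""
    main = archive.get("WebMainResource")
    if isinstance(main, dict):
        yield f"{prefix}main-resource", main.get("WebResourceURL", "")
    for i, res in enumerate(archive.get("WebSubresources", []), start=1):
        yield f"{prefix}resource-{i}", res.get("WebResourceURL", "")
    if include_subframes:
        # Each subframe is a full archive, so recurse with a frame-specific prefix.
        for n, frame in enumerate(archive.get("WebSubframeArchives", []), start=1):
            yield from walk_archive(frame, True, f"{prefix}frame-{n}-")


page = {
    "WebMainResource": {"WebResourceURL": "https://example.com/"},
    "WebSubresources": [{"WebResourceURL": "https://example.com/logo.png"}],
    "WebSubframeArchives": [
        {"WebMainResource": {"WebResourceURL": "https://example.org/frame.html"}}
    ],
}
for label, url in walk_archive(page):
    print(label, url)
# main-resource https://example.com/
# resource-1 https://example.com/logo.png
# frame-1-main-resource https://example.org/frame.html
```

Passing include_subframes=False, which corresponds to --no-subframes, stops after the first two entries.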
Error Handling
If the tool cannot convert the file, it prints a message beginning with "error:" to standard error and exits with status 1.
Common reasons include:
- The input file does not exist
- The input path is not a file
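- The input extension already matches the requested output type, or the -o extension implies a different format than the target
- The conversion direction cannot be detected
- A direction-specific option is used with the wrong conversion direction (for example, --no-subframes when converting MHTML to WebArchive)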
These messages help identify what needs to be fixed before running the command again.
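The path and extension checks behind these messages can be sketched as a small function. check_paths is a hypothetical, simplified version of the script's validate_paths_and_direction. It returns the message instead of raising an exception, and it leaves out the option checks.

```python
import tempfile
from pathlib import Path
from typing import Optional

KIND_BY_SUFFIX = {".webarchive": "webarchive", ".mhtml": "mhtml", ".mht": "mhtml"}


def check_paths(input_path: Path, output_path: Path, target: str) -> Optional[str]:
    """Return an error message for a bad input/output combination, or None."""
    if not input_path.exists():
        return f"Input file does not exist: {input_path}"
    if not input_path.is_file():
        return f"Input path is not a file: {input_path}"
    input_kind = KIND_BY_SUFFIX.get(input_path.suffix.lower())
    output_kind = KIND_BY_SUFFIX.get(output_path.suffix.lower())
    # Converting a file into the format it already has is treated as a mistake.
    if input_kind == target:
        return f"Input extension already looks like the requested output type ({target})."
    # An explicit -o extension must agree with the chosen target.
    if output_kind is not None and output_kind != target:
        return f"Output extension implies {output_kind}, but target is {target}: {output_path}"
    return None


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "page.mhtml")
    src.write_text("placeholder")  # contents don't matter for these checks
    print(check_paths(src, Path(tmp, "page.webarchive"), "webarchive"))  # None
    print(check_paths(src, Path(tmp, "copy.mhtml"), "mhtml"))
    print(check_paths(Path(tmp, "missing.mht"), Path(tmp, "x.webarchive"), "webarchive"))
```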
Summary
webarchive_mhtml_converter.py is a small command-line tool for converting saved web pages between Apple WebArchive and MHTML/MHT formats.
It is best suited for users who need a practical way to move archived web pages between different browsers, systems, or workflows. Basic conversion requires only one command, while optional settings provide more control when needed.
```python
#!/usr/bin/env python3
"""
Unified converter for Apple .webarchive and MHTML/MHT files.

This combines the behavior of the two separate tools:
  - MHTML/MHT -> Apple WebArchive
  - Apple WebArchive -> MHTML/MHT

Requirements:
  - Python 3.9+
  - No external dependencies; uses only the Python standard library.

Examples:
  # Auto-detect direction from input extension
  python webarchive_mhtml_converter.py page.mhtml
  python webarchive_mhtml_converter.py page.webarchive

  # Explicit output path
  python webarchive_mhtml_converter.py page.mhtml -o page.webarchive
  python webarchive_mhtml_converter.py page.webarchive -o page.mhtml

  # Explicit conversion target
  python webarchive_mhtml_converter.py page.dat --to webarchive -o page.webarchive
  python webarchive_mhtml_converter.py page.dat --to mhtml -o page.mhtml

  # Direction-specific options
  python webarchive_mhtml_converter.py page.mhtml --keep-cid-css-link
  python webarchive_mhtml_converter.py page.webarchive --no-subframes
"""

from __future__ import annotations

import argparse
import base64
import email
import mimetypes
import plistlib
import re
import sys
import uuid
from email import policy
from email.message import EmailMessage, Message
from email.parser import BytesParser
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple
from urllib.parse import quote, urljoin, urlparse

Resource = Dict[str, Any]
Archive = Dict[str, Any]

MHTML_SUFFIXES = {".mhtml", ".mht"}
WEBARCHIVE_SUFFIXES = {".webarchive"}

TEXT_LIKE_MIME_PREFIXES = ("text/",)
TEXT_LIKE_MIME_TYPES = {
    "application/javascript",
    "application/ecmascript",
    "application/json",
    "application/xml",
    "application/xhtml+xml",
    "image/svg+xml",
}

DEFAULT_BINARY_MIME = "application/octet-stream"


# ---------------------------------------------------------------------------
# Shared helpers
# ---------------------------------------------------------------------------

def normalize_mime_type(mime_type: str) -> str:
    return (mime_type or DEFAULT_BINARY_MIME).split(";", 1)[0].strip().lower()


def is_text_like_mime(mime_type: str) -> bool:
    mt = normalize_mime_type(mime_type)
    return mt.startswith(TEXT_LIKE_MIME_PREFIXES) or mt in TEXT_LIKE_MIME_TYPES


def kind_from_suffix(path: Path) -> Optional[str]:
    suffix = path.suffix.lower()
    if suffix in WEBARCHIVE_SUFFIXES:
        return "webarchive"
    if suffix in MHTML_SUFFIXES:
        return "mhtml"
    return None


def default_output_path(input_path: Path, target: str) -> Path:
    if target == "mhtml":
        return input_path.with_suffix(".mhtml")
    if target == "webarchive":
        return input_path.with_suffix(".webarchive")
    raise ValueError(f"Unsupported target: {target}")


def infer_target(input_path: Path, output_path: Optional[Path], requested_target: str) -> str:
    """
    Return the output target: "mhtml" or "webarchive".

    In auto mode, the input extension is authoritative. If the input extension is
    unknown, the output extension is used as a fallback.
    """
    if requested_target != "auto":
        return requested_target

    input_kind = kind_from_suffix(input_path)
    if input_kind == "webarchive":
        return "mhtml"
    if input_kind == "mhtml":
        return "webarchive"

    if output_path is not None:
        output_kind = kind_from_suffix(output_path)
        if output_kind in {"mhtml", "webarchive"}:
            return output_kind

    raise ValueError(
        "Could not infer conversion direction. Use --to mhtml or --to webarchive."
    )


def validate_paths_and_direction(
    input_path: Path,
    output_path: Path,
    target: str,
    *,
    requested_target: str,
) -> None:
    if not input_path.exists():
        raise FileNotFoundError(f"Input file does not exist: {input_path}")
    if not input_path.is_file():
        raise ValueError(f"Input path is not a file: {input_path}")

    input_kind = kind_from_suffix(input_path)
    output_kind = kind_from_suffix(output_path)

    if input_kind == target:
        raise ValueError(
            f"Input extension already looks like the requested output type ({target}). "
            "Check --to or the input file extension."
        )

    if output_kind is not None and output_kind != target:
        raise ValueError(
            f"Output extension implies {output_kind}, but target is {target}: {output_path}"
        )

    if requested_target == "auto" and input_kind is None and output_kind is None:
        raise ValueError(
            "Could not infer conversion direction from extensions. Use --to explicitly."
        )


# ---------------------------------------------------------------------------
# MHTML/MHT -> Apple WebArchive
# ---------------------------------------------------------------------------

def normalize_cid(value: Optional[str]) -> Optional[str]:
    if not value:
        return None
    value = value.strip()
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    return value


def cid_url_from_part(part: EmailMessage) -> Optional[str]:
    cid = normalize_cid(part.get("Content-ID"))
    return f"cid:{cid}" if cid else None


def is_absolute_or_cid(url: str) -> bool:
    if url.startswith("cid:"):
        return True
    parsed = urlparse(url)
    return bool(parsed.scheme)


def text_encoding_from_content_type(content_type: str) -> str:
    msg = Message()
    msg["content-type"] = content_type
    return msg.get_content_charset() or "utf-8"


def fix_mime_type(url: str, mime: str) -> str:
    mime = normalize_mime_type(mime)
    path = urlparse(url).path.lower()
    guessed, _ = mimetypes.guess_type(path)

    if path.endswith(".css") and mime not in {"text/css", DEFAULT_BINARY_MIME}:
        return "text/css"
    if path.endswith(".css") and mime == DEFAULT_BINARY_MIME:
        return "text/css"
    if path.endswith(".js") and mime in {"text/plain", DEFAULT_BINARY_MIME}:
        return "application/javascript"
    if guessed and mime == DEFAULT_BINARY_MIME:
        return normalize_mime_type(guessed)
    return mime


def payload_bytes(part: EmailMessage) -> bytes:
    data = part.get_payload(decode=True)
    if data is not None:
        return data
    raw = part.get_payload()
    if isinstance(raw, str):
        enc = part.get_content_charset() or "utf-8"
        return raw.encode(enc, errors="replace")
    return b""


def choose_root_part(msg: EmailMessage, parts: List[EmailMessage]) -> EmailMessage:
    start = normalize_cid(msg.get_param("start"))
    if start:
        for part in parts:
            if normalize_cid(part.get("Content-ID")) == start:
                return part

    # RFC-compatible fallback for typical MHTML saved by browsers.
    for part in parts:
        if part.get_content_type() in {"text/html", "application/xhtml+xml"}:
            return part

    raise ValueError("No HTML root part found in MHTML")


def mhtml_part_url(part: EmailMessage, base_url: Optional[str], index: int) -> str:
    loc = (part.get("Content-Location") or "").strip()
    if loc:
        if is_absolute_or_cid(loc):
            return loc
        if base_url:
            return urljoin(base_url, loc)
        return loc

    cid = cid_url_from_part(part)
    if cid:
        return cid

    return f"mhtml-resource-{index}"


def make_web_resource(url: str, mime: str, data: bytes, *, content_type_header: Optional[str] = None) -> Resource:
    fixed_mime = fix_mime_type(url, mime)
    if is_text_like_mime(fixed_mime):
        data = data.rstrip(b"\x00")

    resource: Resource = {
        "WebResourceURL": url,
        "WebResourceMIMEType": fixed_mime,
        "WebResourceData": data,
    }
    if is_text_like_mime(fixed_mime):
        resource["WebResourceTextEncodingName"] = text_encoding_from_content_type(
            content_type_header or fixed_mime
        )
    return resource


def strip_inline_charset(css: str) -> str:
    # @charset is only meaningful as a stylesheet byte-stream marker; remove it for <style>.
    return re.sub(r'^\s*@charset\s+["\'][^"\']+["\']\s*;\s*', "", css, flags=re.I)


def replace_stylesheet_link(html: str, url: str, css_text: str) -> str:
    style_tag = '<style type="text/css">\n' + strip_inline_charset(css_text) + "\n</style>"

    def repl(match: re.Match[str]) -> str:
        tag = match.group(0)
        href_pat = r'\bhref\s*=\s*(["\'])' + re.escape(url) + r'\1'
        rel_pat = r'\brel\s*=\s*(["\'])[^"\']*stylesheet[^"\']*\1'
        if re.search(href_pat, tag, flags=re.I) and re.search(rel_pat, tag, flags=re.I):
            return style_tag
        return tag

    return re.sub(r"<link\b[^>]*>", repl, html, flags=re.I)


def inline_cid_stylesheets(archive: Archive) -> None:
    resources = [archive["WebMainResource"], *archive.get("WebSubresources", [])]

    cid_css: List[Tuple[str, str]] = []
    for resource in resources:
        if resource.get("WebResourceMIMEType") == "text/css" and str(
            resource.get("WebResourceURL", "")
        ).startswith("cid:"):
            enc = resource.get("WebResourceTextEncodingName") or "utf-8"
            css = (
                resource.get("WebResourceData", b"")
                .rstrip(b"\x00")
                .decode(enc, errors="replace")
            )
            cid_css.append((resource["WebResourceURL"], css))

    if not cid_css:
        return

    for resource in resources:
        if resource.get("WebResourceMIMEType") not in {"text/html", "application/xhtml+xml"}:
            continue
        enc = resource.get("WebResourceTextEncodingName") or "utf-8"
        html = (
            resource.get("WebResourceData", b"")
            .rstrip(b"\x00")
            .decode(enc, errors="replace")
        )
        for url, css in cid_css:
            html = replace_stylesheet_link(html, url, css)
        resource["WebResourceData"] = html.encode(enc, errors="replace")


def parse_mhtml(path: Path, *, inline_cid_css: bool = True) -> Archive:
    msg = BytesParser(policy=policy.default).parsebytes(path.read_bytes())
    if not msg.is_multipart():
        raise ValueError("Input is not multipart MHTML")

    parts = [p for p in msg.walk() if not p.is_multipart()]
    root = choose_root_part(msg, parts)

    snapshot_url = (msg.get("Snapshot-Content-Location") or "").strip() or None
    root_loc = (root.get("Content-Location") or "").strip() or snapshot_url
    root_url = root_loc or cid_url_from_part(root) or "about:blank"

    main_data = payload_bytes(root)
    main_mime = root.get_content_type() or "text/html"
    main_resource = make_web_resource(
        root_url,
        main_mime,
        main_data,
        content_type_header=root.get("Content-Type"),
    )

    subresources: List[Resource] = []
    seen_main_identity = id(root)

    for idx, part in enumerate(parts, start=1):
        if id(part) == seen_main_identity:
            # Add a cid: alias for the root only if the HTML might refer to it.
            cid = cid_url_from_part(part)
            if cid and cid != root_url:
                subresources.append(
                    make_web_resource(
                        cid,
                        main_mime,
                        main_data,
                        content_type_header=part.get("Content-Type"),
                    )
                )
            continue

        url = mhtml_part_url(part, root_url, idx)
        mime = part.get_content_type() or DEFAULT_BINARY_MIME
        data = payload_bytes(part)

        subresources.append(
            make_web_resource(
                url,
                mime,
                data,
                content_type_header=part.get("Content-Type"),
            )
        )

        # Alias Content-ID as cid:... when Content-Location differs.
        cid = cid_url_from_part(part)
        if cid and cid != url:
            subresources.append(
                make_web_resource(
                    cid,
                    mime,
                    data,
                    content_type_header=part.get("Content-Type"),
                )
            )

        # Keep raw relative Content-Location as an alias for snapshots that use it verbatim.
        raw_loc = (part.get("Content-Location") or "").strip()
        if raw_loc and raw_loc != url and not is_absolute_or_cid(raw_loc):
            subresources.append(
                make_web_resource(
                    raw_loc,
                    mime,
                    data,
                    content_type_header=part.get("Content-Type"),
                )
            )

    archive: Archive = {
        "WebMainResource": main_resource,
        "WebSubresources": subresources,
    }

    if inline_cid_css:
        inline_cid_stylesheets(archive)

    return archive


def convert_mhtml_to_webarchive(
    input_path: Path,
    output_path: Path,
    *,
    inline_cid_css: bool = True,
    plist_format: str = "binary",
) -> None:
    archive = parse_mhtml(input_path, inline_cid_css=inline_cid_css)
    fmt = plistlib.FMT_BINARY if plist_format == "binary" else plistlib.FMT_XML
    with output_path.open("wb") as f:
        plistlib.dump(archive, f, fmt=fmt, sort_keys=False)


# ---------------------------------------------------------------------------
# Apple WebArchive -> MHTML/MHT
# ---------------------------------------------------------------------------

def load_webarchive(path: Path) -> Archive:
    """
    Load .webarchive with plistlib.

    plistlib supports both XML plist and binary plist. In .webarchive files,
    WebResourceData is preserved as bytes.
    """
    with path.open("rb") as f:
        archive = plistlib.load(f)

    if not isinstance(archive, dict):
        raise ValueError("Invalid webarchive: root object is not a dictionary")
    if not isinstance(archive.get("WebMainResource"), dict):
        raise ValueError("Invalid webarchive: missing WebMainResource")
    return archive


def decode_webresource_data(value: Any) -> bytes:
    """
    Decode WebResourceData.

    With plistlib this should normally be bytes. Other forms are handled
    defensively for compatibility with unusual plist representations.
    """
    if value is None:
        return b""
    if isinstance(value, bytes):
        return value
    if isinstance(value, bytearray):
        return bytes(value)
    if isinstance(value, str):
        s = value.strip()
        # Defensive support for base64 strings.
        try:
            padded = s + ("=" * ((4 - len(s) % 4) % 4))
            decoded = base64.b64decode(padded, validate=True)
            if base64.b64encode(decoded).decode("ascii").rstrip("=") == s.rstrip("="):
                return decoded
        except Exception:
            pass
        # Defensive support for plain text strings.
        return value.encode("utf-8")
    if isinstance(value, dict):
        for key in ("bytes", "data", "base64", "WebResourceData"):
            if key in value:
                return decode_webresource_data(value[key])
    raise TypeError(f"Unsupported WebResourceData type: {type(value).__name__}")


def guess_mime_type(resource: Resource, data: bytes, is_main: bool) -> str:
    mime = resource.get("WebResourceMIMEType")
    if isinstance(mime, str) and mime.strip():
        return normalize_mime_type(mime)

    url = resource.get("WebResourceURL")
    if isinstance(url, str):
        guessed, _ = mimetypes.guess_type(url)
        if guessed:
            return normalize_mime_type(guessed)

    if is_main:
        return "text/html"

    # Basic signature detection.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"GIF87a") or data.startswith(b"GIF89a"):
        return "image/gif"
    if data.startswith(b"RIFF") and data[8:12] == b"WEBP":
        return "image/webp"
    if data.startswith(b"\x00\x00\x00") and b"ftypavif" in data[:32]:
        return "image/avif"
    if data.startswith(b"wOFF"):
        return "font/woff"
    if data.startswith(b"wOF2"):
        return "font/woff2"
    if data.lstrip().startswith((b"<svg", b"<?xml")):
        return "image/svg+xml"
    return DEFAULT_BINARY_MIME


def get_charset(resource: Resource, mime_type: str) -> Optional[str]:
    encoding = resource.get("WebResourceTextEncodingName")
    if isinstance(encoding, str) and encoding.strip():
        return encoding.strip()
    if is_text_like_mime(mime_type):
        return "utf-8"
    return None


def content_location(resource: Resource, fallback: str) -> str:
    url = resource.get("WebResourceURL")
    if isinstance(url, str) and url.strip():
        return url.strip()
    return fallback


def sanitize_header_value(value: str) -> str:
    """
    Keep Content-Location browser-friendly.

    Do not use email.header/Header or RFC 2047 encoded-word for URLs. Remove
    CR/LF to prevent header injection. Percent-encode non-ASCII only.
    """
    value = value.replace("\r", "").replace("\n", "")
    try:
        value.encode("ascii")
        return value
    except UnicodeEncodeError:
        # Keep normal URL punctuation readable; encode non-ASCII characters.
        return quote(value, safe=":/?#[]@!$&'()*+,;=%")


def fold_header_line(name: str, value: str, limit: int = 998) -> bytes:
    """
    Fold long header lines without RFC 2047 encoding.

    RFC 5322 hard limit is 998 octets per line. MHTML readers are usually
    happier with raw folded URLs than encoded-word URLs.
    """
    prefix = f"{name}: "
    raw = sanitize_header_value(value)
    line = prefix + raw
    encoded = line.encode("utf-8")
    if len(encoded) <= limit:
        return encoded + b"\r\n"

    # Fold on safe URL boundary characters where possible.
    out: List[bytes] = []
    current = prefix
    for token in re.split(r"([/?&=#.;,:_-])", raw):
        if token == "":
            continue
        candidate = current + token
        if len(candidate.encode("utf-8")) <= limit:
            current = candidate
            continue
        out.append(current.encode("utf-8") + b"\r\n")
        current = " " + token
    if current:
        out.append(current.encode("utf-8") + b"\r\n")
    return b"".join(out)


def make_content_type(mime_type: str, charset: Optional[str]) -> str:
    if charset and is_text_like_mime(mime_type):
        return f'{mime_type}; charset="{charset}"'
    return mime_type


def wrap_base64(data: bytes) -> bytes:
    """Base64-wrap to MIME's conventional 76-character lines."""
    return base64.encodebytes(data).replace(b"\n", b"\r\n")


def make_mhtml_part(
    *,
    data: bytes,
    mime_type: str,
    charset: Optional[str],
    location: str,
    content_id: Optional[str] = None,
) -> bytes:
    headers = bytearray()
    headers += fold_header_line("Content-Type", make_content_type(mime_type, charset))
    headers += b"Content-Transfer-Encoding: base64\r\n"
    headers += fold_header_line("Content-Location", location)
    if content_id:
        headers += fold_header_line("Content-ID", f"<{content_id}>")
    return bytes(headers) + b"\r\n" + wrap_base64(data)


def iter_archive_resources(
    archive: Archive,
    *,
    include_subframes: bool,
    prefix: str = "",
) -> Iterable[Tuple[Resource, bool, str]]:
    """
    Yield (resource, is_main, fallback_location).

    Main resource is yielded before subresources for each archive. Subframe
    archives are recursively included after subresources.
    """
    main = archive.get("WebMainResource")
    if isinstance(main, dict):
        yield main, True, f"{prefix}main-resource"

    subresources = archive.get("WebSubresources") or archive.get("WebSubResources") or []
    if isinstance(subresources, list):
        for i, resource in enumerate(subresources, start=1):
            if isinstance(resource, dict):
                yield resource, False, f"{prefix}resource-{i}"

    if include_subframes:
        subframes = archive.get("WebSubframeArchives") or []
        if isinstance(subframes, list):
            for frame_index, subarchive in enumerate(subframes, start=1):
                if isinstance(subarchive, dict):
                    frame_prefix = f"{prefix}frame-{frame_index}-"
                    yield from iter_archive_resources(
                        subarchive,
                        include_subframes=True,
                        prefix=frame_prefix,
                    )


def build_mhtml(
    archive: Archive,
    *,
    source_name: str,
    include_subframes: bool = True,
) -> bytes:
    boundary = f"----=_NextPart_{uuid.uuid4().hex}"
    boundary_bytes = boundary.encode("ascii")

    resources = list(
        iter_archive_resources(
            archive,
            include_subframes=include_subframes,
        )
    )
    if not resources:
        raise ValueError("Invalid webarchive: no resources found")

    main_resource, _, _ = resources[0]
    main_data = decode_webresource_data(main_resource.get("WebResourceData"))
    main_mime = guess_mime_type(main_resource, main_data, is_main=True)

    lines = bytearray()
    lines += b"MIME-Version: 1.0\r\n"
    lines += fold_header_line("Subject", f"Converted from {source_name}")
    lines += fold_header_line(
        "Content-Type",
        f'multipart/related; type="{main_mime}"; start="<main-resource>"; boundary="{boundary}"',
    )
    lines += b"\r\n"
    lines += b"This is a multi-part message in MIME format.\r\n"

    seen_locations = set()

    for resource_index, (resource, is_main, fallback_location) in enumerate(resources):
        data = decode_webresource_data(resource.get("WebResourceData"))
        if not data:
            continue

        mime_type = guess_mime_type(resource, data, is_main=is_main)
        charset = get_charset(resource, mime_type)
        location = content_location(resource, fallback_location)

        if location in seen_locations:
            continue
        seen_locations.add(location)

        content_id = "main-resource" if resource_index == 0 else None

        lines += b"\r\n--" + boundary_bytes + b"\r\n"
        lines += make_mhtml_part(
            data=data,
            mime_type=mime_type,
            charset=charset,
            location=location,
            content_id=content_id,
        )

    lines += b"\r\n--" + boundary_bytes + b"--\r\n"
    return bytes(lines)


def convert_webarchive_to_mhtml(
    input_path: Path,
    output_path: Path,
    *,
    include_subframes: bool = True,
) -> None:
    archive = load_webarchive(input_path)
    mhtml = build_mhtml(
        archive,
        source_name=input_path.name,
        include_subframes=include_subframes,
    )
    output_path.write_bytes(mhtml)


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert between Apple .webarchive and .mhtml/.mht files."
    )
    parser.add_argument("input", type=Path, help="Input .webarchive, .mhtml, or .mht file")
    parser.add_argument(
        "-o",
        "--output",
        type=Path,
        help="Output path. Default: input name with the opposite extension.",
    )
    parser.add_argument(
        "--to",
        choices=("auto", "mhtml", "webarchive"),
        default="auto",
        help=(
            "Output format. Default auto: .webarchive input becomes .mhtml; "
            ".mhtml/.mht input becomes .webarchive."
        ),
    )
    parser.add_argument(
        "--keep-cid-css-link",
        action="store_true",
        help=(
            "For MHTML -> WebArchive only: do not inline cid: stylesheet links. "
            "Default is to inline them for better iOS WebArchive compatibility."
        ),
    )
    parser.add_argument(
        "--plist-format",
        choices=("binary", "xml"),
        default="binary",
        help="For MHTML -> WebArchive only: plist output format. Default: binary.",
    )
    parser.add_argument(
        "--no-subframes",
        action="store_true",
        help="For WebArchive -> MHTML only: do not include WebSubframeArchives recursively.",
    )
    return parser


def run(args: argparse.Namespace) -> int:
    input_path: Path = args.input
    target = infer_target(input_path, args.output, args.to)
    output_path: Path = args.output or default_output_path(input_path, target)

    validate_paths_and_direction(
        input_path,
        output_path,
        target,
        requested_target=args.to,
    )

    if target == "webarchive":
        if args.no_subframes:
            raise ValueError("--no-subframes applies only to WebArchive -> MHTML conversion")
        convert_mhtml_to_webarchive(
            input_path,
            output_path,
            inline_cid_css=not args.keep_cid_css_link,
            plist_format=args.plist_format,
        )
    elif target == "mhtml":
        if args.keep_cid_css_link:
            raise ValueError("--keep-cid-css-link applies only to MHTML -> WebArchive conversion")
        if args.plist_format != "binary":
            raise ValueError("--plist-format applies only to MHTML -> WebArchive conversion")
        convert_webarchive_to_mhtml(
            input_path,
            output_path,
            include_subframes=not args.no_subframes,
        )
    else:
        raise ValueError(f"Unsupported target: {target}")

    print(f"wrote: {output_path}")
    return 0


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    try:
        return run(args)
    except Exception as e:
        print(f"error: {e}", file=sys.stderr)
        return 1


if __name__ == "__main__":
    raise SystemExit(main())
```