DEV Community: kito2718

BCT4: 3D U-Net Bottlenecks and Minimal CV Score Improvement (0.4483 -> 0.4578)

kito2718 — Mon, 20 Jul 2026 08:14:14 +0000

Biohub - Cell Tracking During Development

Abstract

Constructed and trained a 3D U-Net model in PyTorch for 3D cell centroid detection.
Observed only a slight CV score increase, from 0.4483 to 0.4578, far short of the top leaderboard score (0.982).
Analyzed the key technical bottlenecks: extreme 3D class imbalance, spatial resolution loss, and the limits of frame-to-frame tracking.

Overview & Background

In our local validation setup, the classical baseline (Difference-of-Gaussians blob detection with a fixed threshold, blob_dog) struggled to separate dark, dim cells from background noise.
To overcome this limitation, we generated 3D Gaussian heatmaps centered on the Ground Truth (GT) cell coordinates and trained a 3D convolutional neural network (3D U-Net) to regress them. However, even after additional training epochs and hyperparameter tuning, the CV score improved only marginally, to 0.4578.
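A minimal sketch of how such heatmap targets could be generated, assuming GT centroids in voxel (z, y, x) order. The helper name gaussian_heatmap_3d, the per-axis sigma (narrower along Z to roughly follow the coarser Z spacing), and the voxel-wise maximum for overlapping cells are illustrative assumptions. They are not necessarily the settings used in these experiments.

```python
import numpy as np

def gaussian_heatmap_3d(shape, centers, sigma=(1.0, 4.0, 4.0)):
    """Build a 3D heatmap with a unit-height Gaussian at each (z, y, x) center.

    sigma is given per axis in voxels (hypothetical defaults). Overlapping
    Gaussians are combined with a voxel-wise maximum, so every peak stays at 1.0.
    """
    sz, sy, sx = sigma
    zz, yy, xx = np.ogrid[:shape[0], :shape[1], :shape[2]]
    heat = np.zeros(shape, dtype=np.float32)
    for cz, cy, cx in centers:
        d2 = ((zz - cz) / sz) ** 2 + ((yy - cy) / sy) ** 2 + ((xx - cx) / sx) ** 2
        np.maximum(heat, np.exp(-0.5 * d2).astype(np.float32), out=heat)
    return heat

# Example: two GT cells in a small 16 x 64 x 64 crop
target = gaussian_heatmap_3d((16, 64, 64), [(8, 20, 20), (8, 40, 44)])
print(target.shape, float(target.max()))
```

For a full 64x256x256 frame with many cells, a practical variant would only evaluate each Gaussian inside a small box around its center. The loop above favors clarity over speed.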

This post analyzes the technical bottlenecks and structural limitations responsible for this plateau.

Main Content

1. Overall 3D U-Net Pipeline

The complete processing pipeline, combining 3D U-Net detection with Linear Assignment Problem (LAP) tracking, is summarized below.

Figure 2: Overall pipeline from 3D image input to metric evaluation
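For reference, LAP linking between two adjacent frames can be sketched with SciPy's linear_sum_assignment. This sketch assumes coordinates already scaled to µm. The helper name lap_link and the 1e6 "forbidden" cost used for gating are our own choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def lap_link(coords_prev, coords_curr, max_distance=15.0):
    """Globally optimal one-to-one linking between two frames (coordinates in µm).

    Pairs farther apart than max_distance get a prohibitive cost and are
    dropped from the result, so some cells may remain unlinked.
    """
    if len(coords_prev) == 0 or len(coords_curr) == 0:
        return []
    dist = cdist(coords_prev, coords_curr)
    cost = np.where(dist <= max_distance, dist, 1e6)
    rows, cols = linear_sum_assignment(cost)
    return [(int(i), int(j)) for i, j in zip(rows, cols) if dist[i, j] <= max_distance]

# Toy (z, y, x) case in µm where greedy matching in list order links only one pair
prev = np.array([[0.0, 0.0, 5.0], [0.0, 0.0, 0.0]])
curr = np.array([[0.0, 0.0, 6.0], [0.0, 0.0, 18.0]])
print(lap_link(prev, curr))
```

In this toy case, the greedy track_frame_to_frame from the BCT2 baseline would first link prev 0 to curr 0 (1 µm). It would then find no partner within 15 µm for prev 1 and create only one edge. LAP instead picks the pairing that keeps both links inside the gate.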

2. Quantitative Results & Score Progression

Comparison of predicted node counts and local CV scores across experiments:

Experiment ID	Method & Model	Detection Setting	Predicted Nodes	Edge Jaccard	Division Jaccard	FINAL CV SCORE
Baseline	Traditional Processing(blob_dog)	Intensity Threshold	~19,800	0.4483	0.0000	0.4483
Exp 1	3D U-Net(3 Ep)	Fixed Threshold(0.25)	464,195	0.1223	nan	0.1223
Exp 2	3D U-Net(15 Ep)	Fixed Threshold(0.55)	35,000	0.4578	nan	0.4578
Exp 3	Target 3D U-Net	Adaptive Threshold + LAP	35,000	0.4589	0.0039	0.4593

[Local CV Score Progression] (one "#" ≈ 0.025)
Baseline  0.4483 |##################
Exp 1     0.1223 |#####
Exp 2     0.4578 |##################
Exp 3     0.4593 |##################

3. Key Technical Bottlenecks

We identified three fundamental reasons why a standalone 3D U-Net fails to reach high leaderboard performance:

(1) Extreme Class Imbalance in 3D Space(Zero Collapse)

In a 3D image volume (64x256x256 voxels), cell centers occupy less than 0.1% of all voxels. When training with a standard Mean Squared Error (MSE) loss, the network easily collapses into a trivial solution that predicts background (0.0) everywhere, because this already yields a very low average loss.
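A toy volume with a single cell makes the imbalance concrete. This sketch assumes a 64x64x64 crop and a Gaussian target with a sigma of 1.5 voxels. weighted_mse and its pos_weight=1000 are hypothetical. They illustrate one common mitigation (up-weighting foreground voxels), not what was used in the experiments above.

```python
import numpy as np

def weighted_mse(pred, target, pos_weight=1000.0, pos_threshold=0.1):
    """MSE in which voxels with target > pos_threshold get weight pos_weight."""
    weights = np.where(target > pos_threshold, pos_weight, 1.0)
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))

# One cell in the center of a 64^3 crop (isotropic sigma of 1.5 voxels)
zz, yy, xx = np.ogrid[:64, :64, :64]
target = np.exp(-((zz - 32) ** 2 + (yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 1.5 ** 2))
all_background = np.zeros_like(target)  # the "zero collapse" prediction

pos_fraction = float(np.mean(target > 0.1))
plain = float(np.mean((all_background - target) ** 2))
weighted = weighted_mse(all_background, target)
print(f"positive voxels: {pos_fraction:.4%}, plain MSE: {plain:.2e}, weighted MSE: {weighted:.2e}")
```

For this toy volume, the all-background prediction gets a plain MSE below 1e-4. The weighted loss makes the same prediction several hundred times more expensive, which pushes training away from the trivial solution.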

(2) Loss of Spatial Resolution in Deep Pooling

Max-pooling operations in the 3D U-Net encoder reduce spatial resolution, blurring the boundaries between tightly clustered cells and merging adjacent centroids into a single heatmap blob.
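A toy illustration of the merging effect, under simplifying assumptions:

- two cells 3 voxels apart;
- one 4x max-pooling step standing in for two 2x pooling stages;
- no decoder or skip connections.

A real U-Net partly restores resolution through its decoder and skip connections, so this is a worst case. find_peaks_3d and max_pool_3d are hypothetical helpers built on NumPy and scipy.ndimage.maximum_filter.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def find_peaks_3d(heat, threshold=0.5):
    """Return (z, y, x) indices of 3x3x3 local maxima above threshold."""
    is_local_max = maximum_filter(heat, size=3) == heat
    return np.argwhere(is_local_max & (heat > threshold))

def max_pool_3d(volume, k):
    """Non-overlapping k x k x k max pooling (each dimension must be divisible by k)."""
    z, y, x = volume.shape
    return volume.reshape(z // k, k, y // k, k, x // k, k).max(axis=(1, 3, 5))

# Two cells only 3 voxels apart along X (sigma = 1 voxel)
zz, yy, xx = np.ogrid[:16, :16, :16]
centers = [(8, 8, 8), (8, 8, 11)]
heat = np.max([np.exp(-((zz - cz) ** 2 + (yy - cy) ** 2 + (xx - cx) ** 2) / 2.0)
               for cz, cy, cx in centers], axis=0)

peaks_full = find_peaks_3d(heat)                    # full resolution
peaks_pooled = find_peaks_3d(max_pool_3d(heat, 4))  # after 4x downsampling
print(len(peaks_full), len(peaks_pooled))
```

At full resolution the two peaks are found separately. After pooling, both centers fall into the same coarse voxel and only one peak remains.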

(3) Limitations of Frame-to-Frame Distance Tracking

Greedy or LAP matching based solely on frame-to-frame Euclidean distance fails to capture long-term temporal trajectory consistency and cell division events, introducing false-positive and false-negative edges.
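One common way to let an assignment-based tracker produce divisions is to add two kinds of extra rows to the cost matrix:

- a second, penalized "child slot" for every parent;
- "birth" rows, so that a detection that is too costly to link can start a new track.

This sketch is a swapped-in technique, not the pipeline used in the experiments above. lap_link_with_divisions, division_penalty=5.0, and birth_cost=10.0 are hypothetical names and values, and coordinates are assumed to be in µm. It still looks at only two frames at a time, so it does not address long-term trajectory consistency.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def lap_link_with_divisions(prev, curr, max_distance=15.0,
                            division_penalty=5.0, birth_cost=10.0):
    """Link detections between two frames, allowing each parent up to two children.

    Cost-matrix rows: each previous cell (first child), a second copy of each
    previous cell (second child = division, with an extra penalty), and one
    "birth" row per current cell (track starts without a parent).
    Returns sorted (prev_index, curr_index) edges.
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    n_prev, n_curr = len(prev), len(curr)
    if n_prev == 0 or n_curr == 0:
        return []
    big = 1e6  # effectively forbids links beyond max_distance
    dist = cdist(prev, curr)
    link_cost = np.where(dist <= max_distance, dist, big)
    cost = np.vstack([
        link_cost,                              # first child of each parent
        link_cost + division_penalty,           # second child (division)
        np.full((n_curr, n_curr), birth_cost),  # new track without a parent
    ])
    rows, cols = linear_sum_assignment(cost)
    edges = []
    for r, c in zip(rows, cols):
        parent = r % n_prev
        if r < 2 * n_prev and dist[parent, c] <= max_distance:
            edges.append((int(parent), int(c)))
    return sorted(edges)

# One parent, two nearby daughters: both are linked to parent 0 (a division)
print(lap_link_with_divisions([[0, 0, 0]], [[0, 0, 3], [0, 0, -3]]))
```

A division is only chosen when linking the second daughter (distance + division_penalty) is cheaper than starting a new track (birth_cost). Tuning these two values therefore controls how eagerly divisions are proposed.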

Conclusion

A naive 3D U-Net heatmap regression approach plateaus at around CV 0.4593 and cannot reach competitive leaderboard performance (0.96+). Rebuilding the architecture with 3D instance segmentation (e.g., StarDist 3D) and multi-frame min-cost flow tracking is necessary for the next major milestone.

Hope this helps!

Japanese Series Links

For Japanese readers, the full series is published on Zenn:

BCT3: Building a Fast Local Cross-Validation Environment for 80GB Dataset

kito2718 — Mon, 20 Jul 2026 01:50:28 +0000

Figure: Conceptual illustration of 3D cell tracking data analysis and high-performance local CV setup

Abstract

In Kaggle's "Biohub - Cell Tracking During Development" competition, performing model iterations on Kaggle Notebooks for an 81.4GB dataset incurs massive wait times for every "Run All", "Save Version", and "Submit". This article demonstrates how to build a fast and reliable local cross-validation (CV) pipeline on a Windows CPU environment using Astral's uv package manager, resolving Windows DLL security blocks, applying PyTorch CPU patches, and optimizing graph node accessors.

Overview

When competing in computer vision and bioinformatics challenges with massive 3D image datasets (.zarr) and temporal cell tracking graphs (.geff), relying solely on cloud platform execution drastically slows down experimentation.

To achieve a fast feedback loop (seconds to minutes per iteration), we constructed a self-contained local evaluation environment. It processes all 199 embryo datasets and outputs a micro-averaged CV score computed with the organizer's official evaluation code, so it follows the same scoring logic as Kaggle.

Figure: Complete workflow of the local CV pipeline setup

Details

(1) Isolated Directory Structure

We organized the virtual environment, dataset, evaluation repository, and notebooks under a clean local_env folder to maintain portable environment isolation.

007_kaggle_Biohub-Cell_Tracking_During_Development/
├── local_env/                                  <- Virtual environment created with uv
│   ├── Scripts/python.exe                      <- Python interpreter path for VS Code
│   ├── data/                                   <- Downloaded Kaggle dataset (81.4 GB)
│   │    ├── train/
│   │    └── test/
│   ├── src/
│   │    ├── eda_visualizer.py
│   │    └── kaggle_cell_tracking_competition/  <- Organizer's official evaluation code
│   │        └── src/tracking_cellmot/
│   └── notebooks/
│        └── local_validation.ipynb             <- Local validation notebook
└── kaggle_Biohub-Cell_Tracking_During_Development/
    └── notebooks/
        └── baseline_pipeline.ipynb             <- Kaggle submission notebook

(2) Fast Package Management with `uv`

We utilized Astral's Rust-based package manager uv to manage dependencies efficiently.

:: Create local_env virtual environment
uv venv local_env

:: Install dependencies including pre-release tracksdata and CPU PyTorch
uv pip install --prerelease allow --python local_env --extra-index-url https://download.pytorch.org/whl/cpu tracksdata polars zarr scikit-image scipy torch pandas matplotlib

(3) Resolving Windows Security DLL Blocks (Smart App Control)

On Windows environments, binaries installed via pip or uv (such as rustworkx .pyd files) may carry a Mark-of-the-Web (MotW) flag, resulting in DLL load failed errors during import. We unblocked all binaries in PowerShell:

Get-ChildItem -Path local_env -Recurse | Unblock-File

(4) CPU Environment Compatibility Patch (`io.py`)

The organizer's data loader (tracking_cellmot/io.py) throws a RuntimeError on CPU-only environments when invoking .pin_memory(). We patched the loader to conditionally execute .pin_memory() only when CUDA is active.

     if device is not None:
         torch_device = torch.device(device)
         tensor = torch.from_numpy(image).to(torch_device)
-        if pin_memory:
-            tensor = tensor.pin_memory()
+        # Execute pin_memory only when CUDA GPU is available
+        if pin_memory and torch_device.type == "cuda" and torch.cuda.is_available():
+            tensor = tensor.pin_memory()

(5) Resolving Graph Counting Performance Bottleneck

When accessing node and edge counts in tracksdata's InMemoryGraph, calling len(list(predicted_graph.nodes)) causes severe Python overhead due to converting tens of thousands of internal C++/Rust objects into Python lists. We resolved this bottleneck by calling the native accessor methods node_ids() and edge_ids().

# Before (Freezes due to converting 100k+ objects to Python list):
# num_nodes = len(list(predicted_graph.nodes))

# After (fast: retrieves only the IDs instead of converting full node objects):
num_nodes = len(predicted_graph.node_ids())
num_edges = len(predicted_graph.edge_ids())

(6) Local Validation Notebook Implementation & GitHub Reference

You can download the ready-to-run notebook directly from our GitHub repository:

GitHub Repository: local_validation.ipynb (kito2718/kaggle_Biohub-Cell_Tracking_During_Development)

Alternatively, create local_env/notebooks/local_validation.ipynb and insert the full source code below:

import os
import sys
import glob
import zarr
import numpy as np
import pandas as pd
from skimage.feature import blob_dog
from scipy.spatial.distance import cdist
from tqdm import tqdm
import polars as pl

# 1. Add organizer repo source to system path
repo_path = os.path.abspath(os.path.join(os.getcwd(), "../src/kaggle_cell_tracking_competition/src"))
if repo_path not in sys.path:
    sys.path.append(repo_path)

import tracksdata as td
from tracking_cellmot.io import open_dataset
from tracking_cellmot.metrics import evaluate, evaluate_datasets

# 2. Data path setup
DATA_DIR = os.path.abspath(os.path.join(os.getcwd(), "../data/train"))
zarr_paths = glob.glob(os.path.join(DATA_DIR, "*.zarr"))
dataset_names = sorted([os.path.basename(p).replace(".zarr", "") for p in zarr_paths])

# Fast iteration setup: Set to 2 for quick testing, None for full CV
NUM_DATASETS_TO_EVAL = 2
eval_datasets = dataset_names if NUM_DATASETS_TO_EVAL is None else dataset_names[:NUM_DATASETS_TO_EVAL]

# 3. Sanity Check (Evaluate Ground Truth directly to verify max score 1.1000)
ds_sanity = open_dataset(os.path.join(DATA_DIR, dataset_names[0]), normalize=True, require_tracks=True, device="cpu")
sanity_res = evaluate(graph=ds_sanity.tracks, gt_graph=ds_sanity.tracks, scale=ds_sanity.scale)
edge_d = sanity_res.edge_tp + sanity_res.edge_fp + sanity_res.edge_fn
edge_j = sanity_res.edge_tp / edge_d if edge_d > 0 else 1.0
div_d = sanity_res.division_tp + sanity_res.division_fp + sanity_res.division_fn
div_j = sanity_res.division_tp / div_d if div_d > 0 else 1.0
print(f"SANITY SCORE: {edge_j + 0.1 * div_j:.4f} (Expected: 1.1000)")

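# 4. Main prediction loop
# Detection and tracking helpers (same logic as the BCT2 baseline notebook)
def detect_cells_3d(image_3d, min_sigma=2, max_sigma=5, threshold=0.05):
    """Detect cell centroids (z, y, x) via Min-Max normalization + blob_dog."""
    image_3d = np.asarray(image_3d)  # accept NumPy arrays or CPU tensors
    img_min, img_max = image_3d.min(), image_3d.max()
    if img_max > img_min:
        img_norm = (image_3d.astype(np.float32) - img_min) / (img_max - img_min)
    else:
        img_norm = np.zeros_like(image_3d, dtype=np.float32)
    blobs = blob_dog(img_norm, min_sigma=min_sigma, max_sigma=max_sigma, threshold=threshold)
    return blobs[:, :3] if len(blobs) > 0 else np.empty((0, 3))

def track_frame_to_frame(coords_prev, coords_curr, max_distance=15.0):
    """Greedy nearest-neighbor matching between adjacent frames."""
    if len(coords_prev) == 0 or len(coords_curr) == 0:
        return []
    dists = cdist(coords_prev, coords_curr)
    links, used_curr = [], set()
    for i in range(len(coords_prev)):
        for j in np.argsort(dists[i]):
            if j not in used_curr and dists[i, j] <= max_distance:
                links.append((i, j))
                used_curr.add(j)
                break
    return links
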
graph_pairs = []
for dataset_name in eval_datasets:
    ds = open_dataset(os.path.join(DATA_DIR, dataset_name), normalize=True, require_tracks=True, device="cpu")
    predicted_graph = td.graph.InMemoryGraph()
    for key in ("z", "y", "x"):
        predicted_graph.add_node_attr_key(key, pl.Float64, 0.0)

    num_frames = ds.image.shape[0]
    prev_nodes_info = []
    for t in tqdm(range(num_frames), desc=f"Frames ({dataset_name})"):
        img_3d = ds.image[t]
        coords = detect_cells_3d(img_3d, min_sigma=2, max_sigma=5, threshold=0.05)
        curr_nodes_info = []
        for idx, (z, y, x) in enumerate(coords):
            node_id = predicted_graph.add_node({"t": int(t), "z": float(z), "y": float(y), "x": float(x)})
            curr_nodes_info.append((idx, node_id, (z, y, x)))

        if len(prev_nodes_info) > 0 and len(curr_nodes_info) > 0:
            coords_prev = np.array([item[2] for item in prev_nodes_info]) * ds.scale
            coords_curr = np.array([item[2] for item in curr_nodes_info]) * ds.scale
            links = track_frame_to_frame(coords_prev, coords_curr, max_distance=15.0)
            for idx_prev, idx_curr in links:
                predicted_graph.add_edge(prev_nodes_info[idx_prev][1], curr_nodes_info[idx_curr][1], {})
        prev_nodes_info = curr_nodes_info

    num_nodes = len(predicted_graph.node_ids())
    num_edges = len(predicted_graph.edge_ids())
    print(f"Dataset {dataset_name} finished: {num_nodes} nodes, {num_edges} edges.")
    graph_pairs.append((predicted_graph, ds.tracks))

# 5. Compute overall CV score
cv_result = evaluate_datasets(graph_pairs=graph_pairs, scale=ds.scale)
print(f"FINAL CV SCORE: {cv_result.score:.6f}")

(7) VS Code Execution Steps

Open the project root folder in VS Code.
Open local_env/notebooks/local_validation.ipynb.
Select local_env/Scripts/python.exe as the kernel interpreter.
Click "Run All". Sanity check (1.1000) completes instantly, followed by the local CV evaluation.

(8) Sanity Check & Baseline Local CV Score Report

We conducted a Sanity Check by passing the ground-truth tracks directly into the evaluator, confirming a theoretical maximum score of 1.1000 (Edge Jaccard: 1.0 + 0.1 * Division Jaccard: 1.0).

Next, we evaluated our 3D DoG detection + nearest neighbor tracking baseline across all 199 embryo datasets, establishing our baseline local CV score:

========================================
===      LOCAL CV SCORE REPORT       ===
========================================
Evaluated datasets count: 199
Edge Jaccard:            0.505233
Division Jaccard:        0.000000
----------------------------------------
FINAL CV SCORE:          0.505233
  (Formula: Edge Jaccard + 0.1 * Division Jaccard)
========================================

The breakdown shows that our baseline reaches an Edge Jaccard of 0.505: true-positive edges make up about half of the union of predicted and ground-truth edges. Its Division Jaccard is 0.0 because the greedy one-to-one linking never produces a cell division (1-to-2 branch); division logic has not been implemented yet.
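The score formula from the report can be written out as a small function. This sketch assumes that, as described above, the CV score is micro-averaged: TP/FP/FN counts are summed over all datasets before the Jaccard indices are computed. The count names mirror the attributes used in the sanity check (edge_tp, division_fn, ...), and the 1.0 convention for an empty denominator follows that code. jaccard and micro_cv_score are hypothetical helpers, not the organizer's evaluate_datasets.

```python
def jaccard(tp, fp, fn):
    """TP / (TP + FP + FN); 1.0 when there is nothing to match (as in the sanity check)."""
    denom = tp + fp + fn
    return tp / denom if denom > 0 else 1.0

def micro_cv_score(per_dataset_counts):
    """Pool counts over all datasets, then score = Edge Jaccard + 0.1 * Division Jaccard."""
    keys = ("edge_tp", "edge_fp", "edge_fn", "division_tp", "division_fp", "division_fn")
    total = {k: sum(counts[k] for counts in per_dataset_counts) for k in keys}
    edge_j = jaccard(total["edge_tp"], total["edge_fp"], total["edge_fn"])
    div_j = jaccard(total["division_tp"], total["division_fp"], total["division_fn"])
    return edge_j + 0.1 * div_j

# Perfect prediction on one dataset reproduces the sanity-check maximum
perfect = [dict(edge_tp=100, edge_fp=0, edge_fn=0, division_tp=5, division_fp=0, division_fn=0)]
print(f"{micro_cv_score(perfect):.4f}")
```

Because counts are pooled, datasets with many edges weigh more than small ones. The pooled score can therefore differ noticeably from a simple mean of per-dataset scores.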

Conclusion

Establishing a fast local cross-validation environment for an 80GB dataset eliminates cloud platform queueing delays and enables rapid hypothesis testing in seconds to minutes. Next, we will focus on implementing cell division matching logic and global linear assignment matching (Hungarian algorithm) to boost our local CV score.

Hope this helps!

Japanese Series Articles (Zenn)

For Japanese readers, the complete series is published on Zenn:

BCT2: Baseline Source Code Walkthrough

kito2718 — Sat, 18 Jul 2026 08:29:56 +0000

Biohub - Cell Tracking During Development

Abstract

Explains the structure and lazy-loading mechanics of OME-Zarr datasets.
Step-by-step code walkthrough of the 3D cell detection (Min-Max normalization + blob_dog) and nearest neighbor spatiotemporal tracking (cdist).
Demonstrates how to avoid Kaggle "Submission Error"s through explicit data type casting.

Introduction and Background

In this article, I will explain the inner workings of the baseline pipeline for the Kaggle Biohub Cell Tracking competition.

Although this baseline script doesn't use machine learning (and achieves a modest score of 0.505), it covers essential concepts such as OME-Zarr chunk loading, scientific image normalization, 3D centroid extraction, and nearest neighbor matching.

Content

1. Understanding OME-Zarr Directory Structures

Before diving into the code, we must understand how input data is structured. In OME-Zarr, a .zarr file is actually a directory containing metadata and raw binary arrays.

dataset.zarr/
├── .zgroup            # Defines that this is a hierarchical group structure
├── .zattrs            # Contains physical dimensions and resolution scaling metadata
├── 0/                 # The highest resolution image array (store['0'])
│   ├── .zarray        # Describes array shape (T,C,Z,Y,X) and chunk compression
│   ├── 0.0.0.0.0      # Chunk binary data (T.C.Z.Y.X index)
│   └── ...

The OME-Zarr specification standardizes how datasets are read:

.zarray defines the chunk size (e.g., [1, 1, 30, 256, 256]). This means the large 5D array (T, C, Z, Y, X) is split into small binary files, each containing exactly 1 frame, 1 channel, 30 slices, and 256x256 pixels.
0.0.0.0.0 corresponds to the first chunk: t=0, c=0, z=0..29, y=0..255, and x=0..255 (0-indexed).
When we open a store via zarr.open(zarr_path, mode='r'), the library only reads metadata files such as .zgroup and .zattrs. The actual image data is not loaded into memory until we slice it (e.g., arr[t, 0, :, :, :]), and then only the chunks covering that slice are read. This lazy-loading behavior is crucial for managing system memory.
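To see which chunk file a slice touches, divide (with floor division) the element index by the chunk size along each axis. This small sketch assumes the default Zarr v2 layout with "." as the dimension separator; Zarr v3 stores name chunk files differently. chunk_key is a hypothetical helper.

```python
def chunk_key(index, chunks, separator="."):
    """Return the Zarr v2 chunk file name that stores the element at `index`."""
    if len(index) != len(chunks):
        raise ValueError("index and chunks must have the same number of dimensions")
    return separator.join(str(i // c) for i, c in zip(index, chunks))

# (T, C, Z, Y, X) chunks of [1, 1, 30, 256, 256], as in the .zarray example
print(chunk_key((0, 0, 0, 0, 0), (1, 1, 30, 256, 256)))      # -> 0.0.0.0.0
print(chunk_key((3, 0, 45, 300, 10), (1, 1, 30, 256, 256)))  # -> 3.0.1.1.0
```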

2. Pipeline Architecture

Here is the flowchart and sequence of our cell tracking pipeline:

3. Step-by-Step Code Walkthrough

Step 3.1: Open Zarr Store

We load the Zarr store in read-only mode, fetching only the metadata. We target the highest resolution array named '0'.

# Open the Zarr store
store = zarr.open(zarr_path, mode='r')

# Reference the highest resolution array
arr = store['0']

Step 3.2: Iterate Time Frames

We read the total number of frames T from the array shape and loop over them, using tqdm to display a progress bar.

from tqdm import tqdm

num_frames = arr.shape[0]
prev_nodes_info = []

for t in tqdm(range(num_frames), desc="Frames"):
    ...  # Process each frame sequentially (Steps 3.3 to 3.5)

Note: tqdm will display real-time statistics like: Frames: 23%|███ | 23/100 [00:45<02:30, 1.95s/it] showing the current progress and expected remaining time.

Step 3.3: Slice 3D Image

We slice the current 3D image (Z, Y, X) for frame t. This operation triggers the disk read and decompresses the specific binary chunk file.

has_channels = (arr.ndim == 5)
if has_channels:
    img_3d = arr[t, 0, :, :, :]
else:
    img_3d = arr[t, :, :, :]

Step 3.4: Min-Max Normalization & 3D Cell Detection

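Microscope image intensities vary from frame to frame and across datasets. We stretch each frame to the 0.0..1.0 range with Min-Max normalization so that the fixed blob_dog threshold behaves consistently. We then run blob_dog (Difference of Gaussians) to detect cell nuclei. If no cells are found, the main loop logs image statistics (Min/Max/Mean) for debugging.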

import numpy as np
from skimage.feature import blob_dog

def detect_cells_3d(image_3d, min_sigma=2, max_sigma=5, threshold=0.05):
    img_min = image_3d.min()
    img_max = image_3d.max()
    if img_max > img_min:
        img_norm = (image_3d.astype(np.float32) - img_min) / (img_max - img_min)
    else:
        img_norm = np.zeros_like(image_3d, dtype=np.float32)

    blobs = blob_dog(img_norm, min_sigma=min_sigma, max_sigma=max_sigma, threshold=threshold)
    if len(blobs) > 0:
        return blobs[:, :3] # Return (z, y, x)
    return np.empty((0, 3))
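For intuition about what blob_dog does, the sketch below implements a single-scale Difference of Gaussians with scipy.ndimage. It blurs the image at two widths, subtracts the results, and keeps local maxima above a relative threshold. This is a simplified illustration, not scikit-image's multi-scale blob_dog, which searches over a range of sigmas and uses its own threshold semantics. simple_dog_3d, its k=1.6 width ratio, and rel_threshold are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def simple_dog_3d(image_3d, sigma=2.0, k=1.6, rel_threshold=0.1):
    """Single-scale Difference of Gaussians + 3x3x3 local-maximum detection.

    Returns integer (z, y, x) peak coordinates whose DoG response exceeds
    rel_threshold times the strongest response in the volume.
    """
    img = image_3d.astype(np.float64)
    dog = gaussian_filter(img, sigma) - gaussian_filter(img, k * sigma)
    is_peak = maximum_filter(dog, size=3) == dog
    return np.argwhere(is_peak & (dog > rel_threshold * dog.max()))

# Synthetic volume with two bright nuclei (sigma = 2 voxels)
zz, yy, xx = np.ogrid[:32, :32, :32]
volume = sum(np.exp(-((zz - c) ** 2 + (yy - c) ** 2 + (xx - c) ** 2) / (2 * 2.0 ** 2))
             for c in (10, 20))
print(simple_dog_3d(volume))
```

On this synthetic volume, the two nucleus centers (10, 10, 10) and (20, 20, 20) should be recovered as the only peaks.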

Step 3.5: Physical Scale Transformation & Nearest Neighbor Tracking

Microscope voxels are anisotropic (the Z-slice spacing is 1.625 µm, while the XY pixel size is 0.40625 µm). We therefore scale pixel coordinates to physical space in micrometers. Then we use cdist to compute a distance matrix and run a greedy nearest-neighbor matching algorithm.

# Convert pixel coordinates to physical space (micrometers)
scale_zyx = np.array([1.625, 0.40625, 0.40625])
coords_prev_physical = coords_prev * scale_zyx
coords_curr_physical = coords_curr * scale_zyx

# Link cells within a threshold of 15.0 µm
links = track_frame_to_frame(coords_prev_physical, coords_curr_physical, max_distance=15.0)
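A quick worked check of why the scaling matters, using the scale factors above: 4 Z-slices and 16 XY pixels differ greatly in pixel units, but both equal the same physical distance (4 * 1.625 = 16 * 0.40625 = 6.5 µm). The origin and displacement values are arbitrary examples, and physical_distance is a hypothetical helper.

```python
import numpy as np

scale_zyx = np.array([1.625, 0.40625, 0.40625])  # µm per voxel along (Z, Y, X)

def physical_distance(a, b, scale=scale_zyx):
    """Euclidean distance in µm between two (z, y, x) pixel coordinates."""
    return float(np.linalg.norm((np.asarray(a) - np.asarray(b)) * scale))

origin = np.array([10.0, 100.0, 100.0])
moved_in_z = origin + np.array([4.0, 0.0, 0.0])   # 4 slices deeper
moved_in_x = origin + np.array([0.0, 0.0, 16.0])  # 16 pixels sideways

for name, p in [("Z move", moved_in_z), ("X move", moved_in_x)]:
    pixel_dist = float(np.linalg.norm(p - origin))
    print(f"{name}: {pixel_dist:.1f} px -> {physical_distance(origin, p):.3f} µm")
```

In the same units, the 15.0 µm gate corresponds to about 9.2 Z-slices or 36.9 XY pixels.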

Step 3.6: Combine Nodes & Edges and Type Cast

After processing all frames, we merge the detected nodes and tracking edges into a single DataFrame. To prevent "Submission Error"s caused by NaNs or float/object-typed columns, every row fills its unused fields with -1, and we explicitly cast every numeric column to int64.

# Combine DataFrames
df_sub = pd.concat([df_nodes, df_edges], ignore_index=True)
df_sub.insert(0, 'id', range(len(df_sub)))

# Cast types rigidly
df_sub = df_sub.astype({
    'id': 'int64',
    'node_id': 'int64',
    't': 'int64',
    'z': 'int64',
    'y': 'int64',
    'x': 'int64',
    'source_id': 'int64',
    'target_id': 'int64'
})

Step 3.7: Export CSV

df_sub.to_csv("submission.csv", index=False)
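Before uploading, a few sanity checks on df_sub can catch formatting problems early. The checks below are our own assumptions about what a valid file needs, not the competition's official validation rules:

- required columns are present;
- no NaNs;
- numeric columns are int64;
- node IDs are unique per dataset;
- edges point to existing nodes.

check_submission is a hypothetical helper.

```python
import pandas as pd

REQUIRED_COLUMNS = ["id", "dataset", "row_type", "node_id", "t", "z", "y", "x",
                    "source_id", "target_id"]

def check_submission(df):
    """Return a list of problems found in a submission DataFrame (empty list = OK)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if df.isna().any().any():
        problems.append("NaN values present")
    for col in REQUIRED_COLUMNS:
        if col not in ("dataset", "row_type") and df[col].dtype != "int64":
            problems.append(f"column {col} is {df[col].dtype}, expected int64")
    nodes = df[df["row_type"] == "node"]
    edges = df[df["row_type"] == "edge"]
    if nodes.duplicated(subset=["dataset", "node_id"]).any():
        problems.append("duplicate node_id within a dataset")
    known = set(zip(nodes["dataset"], nodes["node_id"]))
    for col in ("source_id", "target_id"):
        dangling = [k for k in zip(edges["dataset"], edges[col]) if k not in known]
        if dangling:
            problems.append(f"{len(dangling)} edges reference unknown {col}")
    return problems
```

Calling print(check_submission(df_sub)) right before to_csv should print an empty list for a well-formed file.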

Complete Source Code

Here is the full source code for the baseline pipeline, achieving a score of 0.505 on Kaggle:

!pip install --no-index --find-links=/kaggle/input/datasets/aaaa1597/zarr-offline-installation-wheels/zarr_wheels zarr

import os
import glob
import zarr
import numpy as np
import pandas as pd
from skimage.feature import blob_dog
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt
from tqdm import tqdm

###########################################
# 1. Setup and Data Path Verification
# Set up the input data paths in the Kaggle environment.

CANDIDATES = [
    "/kaggle/input/biohub-cell-tracking-during-development",
    "/kaggle/input/competitions/biohub-cell-tracking-during-development",
]

ROOT = "/kaggle/input/biohub-cell-tracking-during-development"
for p in CANDIDATES:
    if os.path.exists(os.path.join(p, "test")):
        ROOT = p
        break

TEST_DIR = os.path.join(ROOT, "test")
print(f"Using TEST_DIR: {TEST_DIR}")

test_zarr_paths = glob.glob(os.path.join(TEST_DIR, "*.zarr"))
print(f"Found {len(test_zarr_paths)} test datasets.")
for p in test_zarr_paths:
    print(f"  {os.path.basename(p)}")

###########################################
# 2. 3D Blob Detection and Tracking Implementation
# Detect cells in each time frame and track them between adjacent frames.

def detect_cells_3d(image_3d, min_sigma=2, max_sigma=5, threshold=0.05):
    """Detect cell centroids (Z,Y,X) from a 3D image."""
    img_min = image_3d.min()
    img_max = image_3d.max()
    if img_max > img_min:
        img_norm = (image_3d.astype(np.float32) - img_min) / (img_max - img_min)
    else:
        img_norm = np.zeros_like(image_3d, dtype=np.float32)

    blobs = blob_dog(img_norm, min_sigma=min_sigma, max_sigma=max_sigma, threshold=threshold)
    if len(blobs) > 0:
        return blobs[:, :3]
    return np.empty((0, 3))

def track_frame_to_frame(coords_prev, coords_curr, max_distance=15.0):
    """Perform nearest neighbor matching between adjacent frames."""
    if len(coords_prev) == 0 or len(coords_curr) == 0:
        return []

    dists = cdist(coords_prev, coords_curr)

    links = []
    used_curr = set()
    for i in range(len(coords_prev)):
        js = np.argsort(dists[i])
        for j in js:
            if j not in used_curr and dists[i, j] <= max_distance:
                links.append((i, j))
                used_curr.add(j)
                break
    return links

###########################################
# 3. Pipeline Execution
# Loop through all test datasets to collect node and edge information.

nodes = []
edges = []

print(f"Processing {len(test_zarr_paths)} datasets.")
for zarr_path in test_zarr_paths:
    dataset_name = os.path.basename(zarr_path).replace(".zarr", "")
    print(f"Processing dataset: {dataset_name}")

    node_counter = 1
    store = zarr.open(zarr_path, mode='r')
    arr = store['0']

    has_channels = (arr.ndim == 5)
    num_frames = arr.shape[0]

    prev_nodes_info = []

    for t in tqdm(range(num_frames), desc="Frames"):
        if has_channels:
            img_3d = arr[t, 0, :, :, :]
        else:
            img_3d = arr[t, :, :, :]

        coords = detect_cells_3d(img_3d, min_sigma=2, max_sigma=5, threshold=0.05)

        if len(coords) == 0:
            min_val = np.min(img_3d)
            max_val = np.max(img_3d)
            mean_val = np.mean(img_3d)
            print(f"Warning: 0 cells detected in dataset '{dataset_name}' at frame {t}. "
                  f"Image Stats -> Min: {min_val}, Max: {max_val}, Mean: {mean_val:.4f}")

        curr_nodes_info = []
        for idx, (z, y, x) in enumerate(coords):
            node_id = node_counter
            node_counter += 1

            nodes.append({
                "dataset": dataset_name,
                "row_type": "node",
                "node_id": int(node_id),
                "t": int(t),
                "z": int(round(z)),
                "y": int(round(y)),
                "x": int(round(x)),
                "source_id": -1,
                "target_id": -1
            })
            curr_nodes_info.append((idx, node_id, (z, y, x)))

        if len(prev_nodes_info) > 0 and len(curr_nodes_info) > 0:
            coords_prev = np.array([item[2] for item in prev_nodes_info])
            coords_curr = np.array([item[2] for item in curr_nodes_info])

            scale_zyx = np.array([1.625, 0.40625, 0.40625])
            coords_prev_physical = coords_prev * scale_zyx
            coords_curr_physical = coords_curr * scale_zyx

            links = track_frame_to_frame(coords_prev_physical, coords_curr_physical, max_distance=15.0)

            if len(links) == 0:
                print(f"Warning: 0 tracking edges created between frame {t-1} and {t} in dataset '{dataset_name}'. "
                      f"Previous node count: {len(prev_nodes_info)}, Current node count: {len(curr_nodes_info)}.")

            for idx_prev, idx_curr in links:
                src_id = prev_nodes_info[idx_prev][1]
                tgt_id = curr_nodes_info[idx_curr][1]

                edges.append({
                    "dataset": dataset_name,
                    "row_type": "edge",
                    "node_id": -1,
                    "t": -1,
                    "z": -1,
                    "y": -1,
                    "x": -1,
                    "source_id": int(src_id),
                    "target_id": int(tgt_id)
                })

        prev_nodes_info = curr_nodes_info

###########################################
# 4. Submission File Generation and Verification

columns_order = ["dataset", "row_type", "node_id", "t", "z", "y", "x", "source_id", "target_id"]

if len(nodes) == 0:
    df_nodes = pd.DataFrame(columns=columns_order)
else:
    df_nodes = pd.DataFrame(nodes)

if len(edges) == 0:
    df_edges = pd.DataFrame(columns=columns_order)
else:
    df_edges = pd.DataFrame(edges)

df_sub = pd.concat([df_nodes, df_edges], ignore_index=True)
df_sub.insert(0, 'id', range(len(df_sub)))
df_sub = df_sub[["id"] + columns_order]

df_sub = df_sub.astype({
    'id': 'int64',
    'node_id': 'int64',
    't': 'int64',
    'z': 'int64',
    'y': 'int64',
    'x': 'int64',
    'source_id': 'int64',
    'target_id': 'int64'
})

print(f"Total rows: {len(df_sub)}")
df_sub.to_csv("submission.csv", index=False)
print("submission.csv has been successfully generated!")

Japanese Version of This Series

You can read the original Japanese version of this article on Zenn:

Conclusion

Now that we have established a baseline, the next logical step is to improve the score by replacing blob_dog with deep learning models (like Cellpose or StarDist) and optimizing the matching algorithm (like the Hungarian method).

I hope this helps!

BCTx: Installing Custom Packages Offline

kito2718 — Sat, 18 Jul 2026 08:29:15 +0000

Biohub - Cell Tracking During Development

Abstract

A guide on how to download Python wheel files for custom packages (like Zarr) and import them in internet-disabled Kaggle notebooks.
Resolves dependency version mismatch issues between local and Linux-based Kaggle execution environments.

Introduction and Background

Kaggle code competitions require you to submit your final predictions by running a notebook inside a container with the internet connection disabled.

This is a security precaution to prevent cheating, but it introduces a major headache. If you need an external library (such as zarr) that is not pre-installed in the default Kaggle notebook environment, you cannot simply run !pip install zarr: the command fails because the notebook has no internet access to fetch packages from PyPI.

In this article, I will explain a reliable method to download, upload, and install any custom packages completely offline inside Kaggle.

Content

Overall Flow

The general workflow for offline package installation is shown below. In short, we download the necessary wheel (.whl) files on our local PC, upload them to Kaggle as a private Dataset, attach it to our notebook, and install from the local mount path.

Step 1: Download Wheel Files Locally

We must download the packages targeting the specific OS and Python version of the Kaggle environment.
Kaggle notebooks run on Linux with Python 3.12. Compiled (platform-specific) wheels downloaded with the default Windows/macOS settings will therefore not be compatible.

Open your command prompt or terminal, and run the following command. The key is to force the Python version to 3.12 and the platform to manylinux:

pip download zarr -d ./zarr_wheels_fixed --only-binary=:all: --platform manylinux2014_x86_64 --python-version 3.12 --implementation cp

This downloads zarr and all of its dependencies (including libraries like numcodecs) as .whl files into the zarr_wheels_fixed directory.
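Before uploading, it can be worth checking that every downloaded file really targets Linux x86_64 and CPython 3.12. The sketch below parses the tags at the end of each wheel file name. It is a simplified, hypothetical check (wheel_is_compatible is our own helper), not pip's full tag-matching logic. The example file name is made up.

```python
import os

def wheel_is_compatible(filename, py_minor=12):
    """Rough check that a wheel installs on CPython 3.<py_minor> on Linux x86_64.

    Wheel names end with -{python tag}-{abi tag}-{platform tag}.whl, where each
    tag may be a "."-separated set of alternatives.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[:-4].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel file name: {filename}")
    py_tags, abi_tags, plat_tags = (p.split(".") for p in parts[-3:])

    platform_ok = any(
        p == "any" or (p.startswith("manylinux") and p.endswith("x86_64"))
        for p in plat_tags
    )
    target = f"cp3{py_minor}"
    python_ok = any(t in (target, "py3") for t in py_tags) or (
        # stable-ABI (abi3) wheels built for an older CPython 3.x also work
        "abi3" in abi_tags
        and any(t.startswith("cp3") and t[3:].isdigit() and int(t[3:]) <= py_minor
                for t in py_tags)
    )
    return platform_ok and python_ok

print(wheel_is_compatible("example_pkg-1.0-py3-none-any.whl"))

# Flag incompatible wheels in the download folder, if it exists
folder = "./zarr_wheels_fixed"
if os.path.isdir(folder):
    for name in sorted(os.listdir(folder)):
        if name.endswith(".whl") and not wheel_is_compatible(name):
            print("incompatible:", name)
```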

Step 2: Upload Wheels to Kaggle as a Dataset

(1) Navigate to the Kaggle Datasets page and click "New Dataset".

(2) Drag and drop all the .whl files from your zarr_wheels_fixed folder.

(3) Name your dataset (e.g., zarr-offline-whl) and choose your visibility preference (Private is fine).

(4) Click "Create" to create the dataset.

Step 3: Add the Dataset and Install Offline

(1) In your Kaggle notebook editor, open the settings panel on the right side and make sure "Internet" is turned off.

(2) Click "Add Input" at the top right of the notebook, search for the dataset you uploaded under your profile, and add it.

(3) In the very first cell of your notebook, run the following pip install command, pointing pip to the mounted path of your dataset:

!pip install --no-index --find-links=/kaggle/input/datasets/aaaa1597/zarr-offline-installation-wheels/zarr_wheels_fixed zarr

This tells pip not to query package index servers (--no-index) and to look inside the wheels directory instead, so the library installs without any internet connection. Replace the path with the one where your own dataset is mounted.

Summary

Even in competitions with internet disabled, you can use any library freely using this method.

Note

For the zarr library, you can easily install it by simply running the following code block at the beginning of your notebook. Please feel free to use it:

!pip install --no-index --find-links=/kaggle/input/datasets/aaaa1597/zarr-offline-installation-wheels/zarr_wheels_fixed zarr

Japanese Version of This Series

You can read the original Japanese version of this article on Zenn:

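(Kaggle提出のための)Offline環境でプリインストールされていないライブラリをインストールする方法 (How to Install Libraries That Are Not Pre-installed in an Offline Environment, for Kaggle Submissions)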

Conclusion

In the next part, we will dive deep into the source code of the baseline pipeline, explaining OME-Zarr indexing, Min-Max normalization, and spatiotemporal cell tracking.

I hope this helps!

BCT1: Kaggle Environment Setup and First Submission

kito2718 — Sat, 18 Jul 2026 08:29:14 +0000

Biohub - Cell Tracking During Development

Abstract

Participating in the biological image competition "Biohub - Cell Tracking During Development" on Kaggle.
A summary of steps from setting up the execution environment on Kaggle Notebooks to making your first submission.
Environment: Windows 11.

Introduction and Background

I am taking on my first active Kaggle competition! I have no background in the life sciences, but I am strongly motivated by the sense that this work will be useful to society. I have occasionally seen TV documentaries showing cell division and movement, and this competition involves exactly that kind of data. Essentially, the task is to detect cell locations (nodes) in spatiotemporal 3D microscope images and link them across frames. I want to challenge myself with a problem this complex, explore how AI can be applied to it, and use it to upgrade my skills.

The data size is extremely large. However, Kaggle Notebooks provides free GPU slots, so I decided to set up my development environment there. I plan to set up a local PC environment later as needed.

Kaggle Notebook Setup Steps

1. Development Environment Setup & Preparation

To participate in this competition, you need a few preparations:

(1) Agree to the Competition Rules:
Go to the Biohub - Cell Tracking During Development page, click the "Join Competition" button, and agree to the terms. If you do not do this, you cannot download data or submit predictions.

(2) Get Kaggle API Token:
Click "Create New Token" on your Kaggle Account Settings page to download the kaggle.json file. This is useful when submitting from a local environment or using the CLI. You can place it under your user directory:

C:\Users\xxx\.kaggle\kaggle.json

(3) Create a Kaggle Notebook:
Go to the Code tab and press "New Notebook" to generate one.

(4) Open the Template Notebook:
The notebook content is published here. It is easiest to start by clicking "Copy and Edit":
https://www.kaggle.com/code/aaaa1597/biocell-track-by-hgs

[!CAUTION]
Warning: import zarr fails!
Kaggle does not pre-install zarr, so we must install it separately. However, during the final submission, we must set "Internet" to off. Because of this, we cannot simply write !pip install zarr in our notebook(it will fail during submission evaluation).

I have summarized how to resolve this in the next article:
BCTx: Installing Custom Packages Offline (Please check it out!)

2. Run All

Once your notebook is ready, run it to verify that no errors occur.
Press the "Run All" button.

The progress will be shown in the bottom-left popup. It takes about 10 minutes, so take a break and wait.

3. Save Version

After "Run All" completes successfully, save your notebook by clicking "Save Version".
Select the options and save. The notebook will run all cells again in the background. You can track this progress in the bottom-left popup as well.


4. Submit to Competition

Open the version history, select the successful run, and click "Submit to competition" (as shown below).

Then, click the "Submit" button to finalize.

5. Verify Results

Go to the Submissions page to verify the status.

For example, Version 7 succeeded, whereas Version 6 failed. Version 6 failed because the output file format did not match the submission requirements. After fixing the formatting bugs and running "Run All" again, it went through successfully.

This completes the first submission! From here, the cycle of score optimization begins. Good luck!

Summary

We have successfully verified that our notebook can make a valid submission. Using this baseline as a starting point, we can now move on to more advanced 3D cell detection models (such as Cellpose or StarDist) and tracking algorithms (such as Kalman filters or network flows).

First Submission Code

The first submission code is available in this Gist:

Japanese Version of This Series

You can read the original Japanese version of this article on Zenn:

Conclusion

I hope this helps!

T2I(1). Setting Up a Local Validation Environment for Kaggle's Text-to-Image Generation Challenge

kito2718 — Sat, 11 Jul 2026 00:35:26 +0000

Text-to-Image Generation Challenge Competition

Abstract

Participated in the past Kaggle competition "Text-to-Image Generation Challenge."
Built a local validation environment.
Ran the baseline pipeline end-to-end.

Overview

After graduating from the Kaggle Titanic tutorial, I was thinking about what to do next. My inner voice said, "It has to be computer vision next!" So I decided to take on the "Text-to-Image Generation Challenge" competition.

The goal of this competition is to generate images that match given text prompts (prompt-to-image alignment). It is a cutting-edge challenge and a perfect opportunity to build hands-on skills in generative AI.

In this article, I summarize the steps I took to build a local evaluation environment for image generation and automated evaluation on my PC, along with an overview of how the pipeline works.

Environment Setup

1. Hardware Configuration

The specs of my local PC used for the evaluation environment are as follows:

OS: Windows 11
CPU: Intel Core Processor (24 logical cores)
RAM: 32GB
GPU: NVIDIA GeForce RTX 4060
Storage: HDD (500GB free space)

2. Installed Libraries and Selection Rationale

Python 3.12: Although 3.14 is the latest version, I could not get PyTorch and other libraries to use the GPU on 3.14. 😞
uv: A fast Python package manager.
diffusers: The de facto standard library provided by Hugging Face for running diffusion models like Stable Diffusion.
ultralytics (YOLO): Used to automatically detect whether the objects specified in the prompt are correctly depicted in the generated images.

3. Installation Steps

Run the following commands in PowerShell:

# 1. Install uv
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# 2. Create a virtual environment specifying Python 3.12
uv venv --python 3.12

# 3. Install PyTorch and Torchvision supporting CUDA 12.4 in the virtual environment
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124

# 4. Install required dependency libraries all together
uv pip install diffusers "transformers<5.0.0" accelerate pandas ultralytics

Now, a fast image generation environment using CUDA on the GPU(NVIDIA GeForce RTX 4060) is ready.
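Before running the baseline, a quick check that the CUDA build of PyTorch actually sees the GPU can save time. This is a minimal sketch using `torch.cuda.is_available()` and `torch.cuda.get_device_name()`; it is not the baseline's own code, and the helper name is hypothetical.

```python
def describe_device() -> str:
    """Report which device PyTorch would use, without crashing if torch is absent."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # Index 0 is the first visible GPU (the RTX 4060 on this PC)
        return f"cuda ({torch.cuda.get_device_name(0)})"
    return "cpu"


print("Using device:", describe_device())
```

If this prints "cpu" on a machine with an NVIDIA GPU, the CPU-only wheel was most likely installed instead of the CUDA one.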

Running the Baseline Pipeline

I verified that the entire pipeline runs end-to-end:

Load prompts -> Generate images -> Auto-evaluate using the object detection model (YOLO).

uv run src/baseline.py

Processing Flow

1. How Images Are Generated from Text

The core process of generating images from text prompts in Stable Diffusion works as follows.

1.1 Overview of the Generation Process

Stable Diffusion is built on an architecture called the "Latent Diffusion Model (LDM)." Instead of computing directly in the high-dimensional pixel space, it compresses the image into a lower-dimensional "latent space"¹ and performs noise removal² there. This reduces memory consumption while maintaining generation quality.
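To make "lower-dimensional" concrete, here is the arithmetic for a 512x512 RGB image, assuming the usual Stable Diffusion VAE with 8x spatial downsampling and 4 latent channels:

```python
# Assumed Stable Diffusion VAE geometry: 8x spatial downsampling, 4 latent channels
pixel_h = pixel_w = 512
pixel_channels = 3          # RGB
downsample = 8
latent_channels = 4

latent_h, latent_w = pixel_h // downsample, pixel_w // downsample
pixel_values = pixel_h * pixel_w * pixel_channels       # values the pixel-space model would handle
latent_values = latent_channels * latent_h * latent_w   # values U-Net actually denoises

print(latent_h, latent_w)              # 64 64
print(pixel_values, latent_values)     # 786432 16384
print(pixel_values / latent_values)    # 48.0
```

So every denoising step works on about 48 times fewer numbers than it would in pixel space.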

Role of Each Component

1. Input

Text Prompt (string): The input string specifying what objects to depict. Example: "A dog sitting on a chair".
Random Noise (latent space): A numeric array generated at runtime from the random seed. For a 512x512 output image, it has 4 channels of 64x64 values (4x64x64).

2. Text Processing

2.1. CLIP Tokenizer: Splits the input prompt (string) into tokens (words or subwords) and converts them into "Token IDs" (an array of numbers) based on a predefined vocabulary.
- Example: The string "A dog on a chair" is converted to a numeric array like [320, 2361, 803, 320, 8942].
2.2. CLIP Text Encoder: Takes the array of Token IDs and converts it into high-dimensional vectors representing semantic meanings("Text Embeddings"), which capture word relationships and nuances. This serves as the guide map for image generation.
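The tokenize-then-look-up step can be illustrated with a toy tokenizer. The vocabulary and IDs below are entirely hypothetical and do not match CLIP's real byte-pair-encoding vocabulary; the only ideas borrowed from CLIP are the start/end markers and padding to a fixed length (CLIP's text encoder uses a 77-token context).

```python
import re

# Hypothetical toy vocabulary (real CLIP uses a ~49k-entry BPE vocabulary)
TOY_VOCAB = {"<start>": 0, "<end>": 1, "<unk>": 2,
             "a": 3, "dog": 4, "on": 5, "chair": 6}


def toy_tokenize(text, vocab=TOY_VOCAB, max_len=8):
    """Split into lowercase words, map each to an ID, then pad to max_len."""
    words = re.findall(r"\w+", text.lower())
    ids = ([vocab["<start>"]]
           + [vocab.get(w, vocab["<unk>"]) for w in words]   # unknown words -> <unk>
           + [vocab["<end>"]])
    # Pad with the end token to a fixed length (the padding token is an arbitrary choice here)
    return ids[:max_len] + [vocab["<end>"]] * max(0, max_len - len(ids))


print(toy_tokenize("A dog on a chair"))   # [0, 3, 4, 5, 3, 6, 1, 1]
```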

3. Latent Space Reverse Diffusion Process(Denoising Loop)
The reverse diffusion process is like watching ink dispersed in water gather back into a single drop (playing time backward). Starting from complete noise (static), the U-Net predicts the noise in the image step by step, and that noise is subtracted to construct the final image.

3.1. U-Net (Noise Prediction in Latent Space): Image generation starts with meaningless random noise (latent noise). The U-Net takes the noisy latent and the text embedding vectors from CLIP as inputs and predicts which noise to remove to bring the image closer to the prompt's meaning. The text conditions this prediction through a mechanism called Cross-Attention.
3.2. Scheduler (Noise Reduction Control): An algorithm that controls how much of the predicted noise to subtract at each step. By repeating the loop ("noise prediction -> subtraction"), the image gradually emerges from the static, completing the latent representation of the final image.
- Note: The sd-turbo model used in this baseline is a distilled model (trained with Adversarial Diffusion Distillation, ADD) that can complete this denoising process in a single step, making it extremely fast.
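The predict-then-subtract loop can be sketched with a toy example. This is not a real diffusion sampler: an oracle that already knows the clean signal stands in for the U-Net, and a plain list of step sizes stands in for the scheduler. It only shows how the step sizes control how much predicted noise is removed per iteration; in this toy, one step of size 1.0 removes everything, loosely mirroring the one-step sd-turbo setting.

```python
def toy_denoise(x_noisy, clean, step_sizes):
    """Toy reverse-diffusion loop (illustration only, not a real sampler)."""
    x = list(x_noisy)
    for step in step_sizes:
        # "U-Net" role: predict the noise component. Here an oracle that knows `clean`.
        predicted_noise = [xi - ci for xi, ci in zip(x, clean)]
        # "Scheduler" role: decide how much of the predicted noise to remove this step.
        x = [xi - step * ni for xi, ni in zip(x, predicted_noise)]
    return x


clean = [1.0, -0.5, 0.25]
noise = [0.8, -1.2, 0.4]
noisy = [c + n for c, n in zip(clean, noise)]

gradual = toy_denoise(noisy, clean, [0.3, 0.3, 0.3])   # 70% of the noise survives each step
one_step = toy_denoise(noisy, clean, [1.0])            # a single full step removes it all
print([round(v, 4) for v in gradual])
print([round(v, 4) for v in one_step])
```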

4. Reconstruct to Pixel Space

VAE Decoder: The Variational Autoencoder (VAE) decoder takes the completed denoised latent representation (64x64 for a 512x512 output) and decodes it back into pixel space (e.g., 512x512 pixels) that humans can perceive.

Implementation

The evaluation pipeline reads inputs from input/DreamLayer-Prompt-Kaggle.txt(provided by Kaggle), generates images, and evaluates them using YOLOv8.

Python Code

import torch
import pandas as pd
import re
from pathlib import Path
from diffusers import StableDiffusionPipeline
from ultralytics import YOLO

# Objects corresponding to common COCO classes
common_objects = {
    'man', 'woman', 'person', 'dog', 'cat', 'car', 'truck', 'train', 'airplane',
    'pizza', 'cake', 'donut', 'chair', 'table', 'bed', 'toilet', 'sink', 'mirror', 'clock', 'umbrella'
    # ... (partially omitted)
}

def extract_expected_objects(text):
    words = re.findall(r'\b\w+\b', text.lower())
    return set(word for word in words if word in common_objects)

def calculate_f1_score(expected, detected):
    if len(expected) == 0 and len(detected) == 0:
        return 1.0
    if len(expected) == 0 or len(detected) == 0:
        return 0.0
    true_positives = len(expected.intersection(detected))
    precision = true_positives / len(detected)
    recall = true_positives / len(expected)
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)

In the image generation process, a fixed seed value (seed = 42) is set in torch.Generator to ensure reproducibility. The evaluation script saves the generated images, computes the F1 score for each prompt, and writes submission.csv.
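The snippet above stops at the F1 helper. As a hedged sketch of the remaining glue, detected class indices are mapped to names and the per-prompt F1 scores are averaged into a single mean like the "Mean F1 Score" in the log. In ultralytics, `model.names` maps class indices to names and `results[0].boxes.cls` holds the detected indices; here both are replaced by hypothetical literals so the example runs without a model.

```python
# Hypothetical class-index -> name table (use model.names from ultralytics in practice)
CLASS_NAMES = {0: "person", 1: "dog", 2: "chair", 3: "cat"}


def f1(expected: set, detected: set) -> float:
    """Set-based F1 with the same edge-case rules as calculate_f1_score."""
    if not expected and not detected:
        return 1.0
    if not expected or not detected:
        return 0.0
    tp = len(expected & detected)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(detected), tp / len(expected)
    return 2 * precision * recall / (precision + recall)


# (expected objects parsed from the prompt, detected class indices) - made-up data
records = [
    ({"dog", "chair"}, [1, 2, 2]),  # duplicate detections collapse into one set entry
    ({"dog", "chair"}, [1, 3]),     # P = 1/2, R = 1/2
    ({"cat"}, []),                  # nothing detected
]
scores = [f1(exp, {CLASS_NAMES[i] for i in ids}) for exp, ids in records]
mean_f1 = sum(scores) / len(scores)
print(scores, mean_f1)   # [1.0, 0.5, 0.0] 0.5
```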

Evaluation

Execution Results and Local Evaluation Score

Running the baseline script on the GPU(CUDA) produced the following console output:

Using device: cuda
Reading prompts from input\DreamLayer-Prompt-Kaggle.txt...
Loaded 49 prompts.
Loading pipeline for stabilityai/sd-turbo...
...
[49/49] Generating: 'A group of people standing on a snow covered hill.' -> 0049.png
Image generation complete.
Loading YOLOv8 model for local evaluation...
Running YOLO detection and F1 score calculation...
Saved results.csv to output\results.csv
Saved submission.csv to output\submission.csv

==================================================
LOCAL EVALUATION COMPLETE
Mean F1 Score: 0.5102
==================================================

The local F1 score using SD-Turbo is 0.5102.
This is a poor score for the competition, but it is a starting point.
Below is the list of generated images. You can see that some images do not match the prompts correctly.

[Figure: grid of the generated images, 0001.png to 0049.png]

Output Files(`output/`)

output/images/0001.png to 0049.png (generated images)
output/results.csv (detailed log of prompts, detected objects, and F1 scores)
output/submission.csv (Kaggle submission CSV)
output/config-dreamlayer.json (parameter settings for generation)

An F1 score of 0.5102 is definitely a losing score, but it is not a bad start. I will aim for 0.70 from here.

Future Directions

Now that we have a baseline score of 0.5102, we will work on improving it.

Switching to Higher-Precision Generation Models:
While SD-Turbo is fast, its rendering details(especially fine object layouts and shapes) are weak. We plan to switch to more expressive models like SDXL(stabilityai/stable-diffusion-xl-base-1.0) or Flux.
Prompt Engineering:
Modify prompts by adding quality keywords or emphasis tags to ensure target objects are rendered in sizes and layouts that YOLOv8 can easily detect.
Self-Feedback Iteration(Maximization Hack):
Generate multiple images for each prompt using different seed values, and build a system that automatically selects the image with the highest YOLOv8 F1 score to include in the final submission. This allows us to optimize(overfit) the submission set specifically to the evaluator's(YOLOv8) characteristics for a higher score.
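The self-feedback idea boils down to a best-of-N search over seeds. A minimal sketch, assuming hypothetical `generate(prompt, seed)` and `score(prompt, image)` callables (in practice these would wrap the SD pipeline and the YOLOv8 F1 scorer):

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def best_of_n(prompt: str,
              seeds: Iterable[int],
              generate: Callable[[str, int], Any],
              score: Callable[[str, Any], float]) -> Optional[Tuple[int, Any, float]]:
    """Generate one candidate per seed and keep the highest-scoring one."""
    best = None
    for seed in seeds:
        image = generate(prompt, seed)
        s = score(prompt, image)
        if best is None or s > best[2]:   # strict '>' keeps the first seed on ties
            best = (seed, image, s)
    return best


# Stub usage: "images" are strings and scores come from a made-up table
fake_scores = {42: 0.4, 7: 0.9, 123: 0.6}
print(best_of_n("a dog", [42, 7, 123],
                generate=lambda p, s: f"{p}-{s}",
                score=lambda p, img: fake_scores[int(img.rsplit('-', 1)[1])]))
```

Because the selection criterion is the evaluator itself, the gain measures fit to YOLOv8 rather than general image quality, which is exactly the overfitting trade-off described above.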

Next, I will verify the changes in score by implementing this "Self-Feedback Automatic Selection System" and switching to a higher-quality model.

Conclusion

In this article, I set up a local validation environment for Kaggle's "Text-to-Image Generation Challenge" and verified the end-to-end process, achieving an initial F1 score of 0.5102.

I also gained a deeper understanding of the mechanisms behind how Stable Diffusion generates images from text strings.

From here, I will select different models and perform prompt hacking to improve the score.

I hope this helps.

Japanese Version:

16-(1)Kaggle実践2「Text-to-Image Generation Challenge」コンペ用にローカル検証環境を構築してみた

Q: What is Latent Space?
A: A digital image (pixel space) is a grid of pixels (e.g., 512x512) that carries a lot of redundant information. Latent space is a compressed representation that keeps only the key features of the image (outlines, colors, semantic meaning) extracted from the raw pixel data.
Q: What is noise removal?
A: This refers to the task handled by the U-Net. The U-Net predicts the "useless noise components" to be subtracted next, working step-by-step. The concept is similar to "sculpting."

Kaggle Titanic Practice 7: Overcoming Class 3 Deadlocks with Ticket Neighbor Survival OOF

kito2718 — Tue, 07 Jul 2026 13:50:53 +0000

Available on GitHub

Abstract

Identified a model blind spot through error analysis showing that misclassified samples for Class 3 passengers made up approximately 57% of the total errors.
Achieved CV accuracy of 0.8687, but Public Score dropped to 0.78947.

Finding the Model's Blind Spot

Using our optimized CatBoost model from the previous runs, we performed a detailed error analysis on the misclassified samples.
The analysis revealed that 56.9% of the 130 misclassified samples were Class 3 (Pclass 3) passengers, specifically male passengers.

Error Analysis (Pclass distribution of misclassified samples)
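A breakdown like this can be computed directly from out-of-fold predictions (for example from scikit-learn's `cross_val_predict`). A minimal pure-Python sketch, assuming you already have a per-passenger group label (here Pclass), the true labels, and the OOF predictions; the helper name and toy data are made up:

```python
from collections import Counter


def error_share_by(groups, y_true, y_pred):
    """Fraction of all misclassified samples that falls into each group."""
    errors = [g for g, t, p in zip(groups, y_true, y_pred) if t != p]
    if not errors:
        return {}
    counts = Counter(errors)
    return {g: n / len(errors) for g, n in counts.most_common()}


# Toy data: Pclass, true label, OOF prediction per passenger
pclass = [3, 3, 1, 2, 3, 3]
y_true = [0, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0]
print(error_share_by(pclass, y_true, y_pred))   # {3: 0.75, 1: 0.25}
```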

Reconstructing Cabin Layouts via Ticket Proximity

While the Cabin number feature is missing for approximately 77% of the passengers, the physical room location on the ship was a critical factor for survival. In other words, which cabin area passengers stayed in directly relates to the distance to the lifeboat deck and the escape routes from the flooded areas.
Thus, we focused on the numerical sequence of the Ticket numbers. Passengers whose ticket numbers (the numeric part) differ by at most 5 likely purchased their tickets together and were assigned adjacent cabins in the same deck sector.
By feeding the "survival rate of adjacent ticket neighbors (who stayed in the same physical area)" to the model, it can capture whether that specific boarding sector was favorable for evacuation or was a deadlock zone.

Ticket Proximity OOF Distribution

Based on this domain knowledge, we engineered two key features:

Ticket Neighbor Survival (OOF_Ticket_Neighbor_Survival): We calculated the average survival rate of adjacent ticket neighbors (excluding the passenger themselves). To prevent target leakage, we computed this using the Out-of-Fold (OOF) method.
Class 3 Prefix Interaction (Prefix_[prefix]_3rd): We created interaction features between major ticket prefixes (a5, pc, ca, stono, sotono2) and Pclass 3 to capture specific boarding sectors.

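import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Calculation logic for Ticket Neighbor Survival (OOF)
# Assumes train_fe (with Ticket_Num, PassengerId, Pclass, Sex_male) and y_train
# are prepared as in the full validation script below.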
cv_for_oof = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
oof_neighbor_survival = pd.Series(0.5, index=train_fe.index)

for train_idx, val_idx in cv_for_oof.split(train_fe, y_train):
    tr_df = train_fe.iloc[train_idx].copy()
    tr_df['Survived'] = y_train.iloc[train_idx]

    for idx in val_idx:
        row = train_fe.iloc[idx]
        t_num = row['Ticket_Num']
        if not pd.isna(t_num):
            # Extract adjacent ticket neighbor samples within +-5
            neighbors = tr_df[
                (tr_df['Ticket_Num'] >= t_num - 5) & 
                (tr_df['Ticket_Num'] <= t_num + 5) & 
                (tr_df['PassengerId'] != row['PassengerId'])
            ]
            if len(neighbors) > 0:
                oof_neighbor_survival.loc[idx] = neighbors['Survived'].mean()
            else:
                p_s_mean = tr_df[(tr_df['Pclass'] == row['Pclass']) & (tr_df['Sex_male'] == row['Sex_male'])]['Survived'].mean()
                oof_neighbor_survival.loc[idx] = p_s_mean
        else:
            p_s_mean = tr_df[(tr_df['Pclass'] == row['Pclass']) & (tr_df['Sex_male'] == row['Sex_male'])]['Survived'].mean()
            oof_neighbor_survival.loc[idx] = p_s_mean
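The nested loop above rescans the training fold for every validation row. As an optional, hedged alternative (not the code used for the results here), the same ±5 window mean can be computed with a sorted array, `np.searchsorted`, and a cumulative sum. It returns NaN where a row has no neighbors, leaving the Pclass x Sex fallback to the caller; the function name is hypothetical.

```python
import numpy as np


def neighbor_window_mean(ref_nums, ref_targets, query_nums, window=5):
    """Mean target of reference rows whose number lies in [q - window, q + window].

    Returns NaN for queries with no neighbors or with a NaN ticket number.
    """
    ref_nums = np.asarray(ref_nums, dtype=float)
    ref_targets = np.asarray(ref_targets, dtype=float)
    keep = ~np.isnan(ref_nums)                      # drop tickets without a numeric part
    order = np.argsort(ref_nums[keep])
    nums = ref_nums[keep][order]
    csum = np.concatenate([[0.0], np.cumsum(ref_targets[keep][order])])

    q = np.asarray(query_nums, dtype=float)
    out = np.full(q.shape, np.nan)
    valid = ~np.isnan(q)
    lo = np.searchsorted(nums, q[valid] - window, side="left")   # first num >= q - window
    hi = np.searchsorted(nums, q[valid] + window, side="right")  # first num >  q + window
    counts = hi - lo
    means = np.full(counts.shape, np.nan)
    has = counts > 0
    means[has] = (csum[hi] - csum[lo])[has] / counts[has]
    out[valid] = means
    return out
```

Inside each fold, a call such as `neighbor_window_mean(tr_df['Ticket_Num'], tr_df['Survived'], train_fe.iloc[val_idx]['Ticket_Num'])` would give the values for the validation rows. The PassengerId exclusion is unnecessary there, because validation rows are never part of `tr_df`.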

Validation Results and Kaggle Submission

We ran our evaluation script to compare multiple patterns using 5-Fold Stratified CV.

evaluate_step8.py (Full code of the temporary validation script)

import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import RandomForestRegressor
from catboost import CatBoostClassifier
import optuna
import os

# --- 1. Data Loading and Basic Processing ---
print("--- Loading Data and Basic Processing ---")
train = pd.read_csv('data/raw/train.csv')
test = pd.read_csv('data/raw/test.csv')

df_all = pd.concat([train, test], sort=False).reset_index(drop=True)
df_all['Last_Name'] = df_all['Name'].apply(lambda x: x.split(',')[0])

# Family Survival setup
DEFAULT_SURVIVAL_VALUE = 0.5
df_all['Family_Survival'] = DEFAULT_SURVIVAL_VALUE

for grp, grp_df in df_all.groupby(['Last_Name', 'Fare']):
    if len(grp_df) > 1:
        for ind, row in grp_df.iterrows():
            smax = grp_df.drop(ind)['Survived'].max()
            smin = grp_df.drop(ind)['Survived'].min()
            passID = row['PassengerId']
            if smax == 1.0:
                df_all.loc[df_all['PassengerId'] == passID, 'Family_Survival'] = 1.0
            elif smin == 0.0:
                df_all.loc[df_all['PassengerId'] == passID, 'Family_Survival'] = 0.0

for grp, grp_df in df_all.groupby('Ticket'):
    if len(grp_df) > 1:
        for ind, row in grp_df.iterrows():
            passID = row['PassengerId']
            if df_all.loc[df_all['PassengerId'] == passID, 'Family_Survival'].values[0] == 0.5:
                smax = grp_df.drop(ind)['Survived'].max()
                smin = grp_df.drop(ind)['Survived'].min()
                if smax == 1.0:
                    df_all.loc[df_all['PassengerId'] == passID, 'Family_Survival'] = 1.0
                elif smin == 0.0:
                    df_all.loc[df_all['PassengerId'] == passID, 'Family_Survival'] = 0.0

df_all['Title'] = df_all['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
title_map = {'Mr':'Mr','Miss':'Miss','Mrs':'Mrs','Master':'Master','Dr':'Rare','Rev':'Rare','Col':'Rare','Major':'Rare','Mlle':'Miss','Countess':'Rare','Ms':'Miss','Lady':'Rare','Jonkheer':'Rare','Don':'Rare','Dona':'Rare','Mme':'Mrs','Capt':'Rare','Sir':'Rare'}
df_all['Title'] = df_all['Title'].map(title_map).fillna('Rare')

df_all['Fare'] = df_all['Fare'].fillna(df_all['Fare'].median())
df_all['Embarked'] = df_all['Embarked'].fillna(df_all['Embarked'].mode()[0])
df_all['Deck'] = df_all['Cabin'].fillna('U').apply(lambda x: x[0])
df_all['FamilySize'] = df_all['SibSp'] + df_all['Parch'] + 1
df_all['IsAlone'] = (df_all['FamilySize'] == 1).astype(int)

# Age Imputation
age_features = ['Pclass', 'Sex', 'SibSp', 'Parch', 'Fare', 'Embarked', 'Title', 'Deck', 'FamilySize', 'IsAlone', 'Age']
df_age_prep = df_all[age_features].copy()
cat_cols_for_age = ['Sex', 'Embarked', 'Title', 'Deck']
df_age_encoded = pd.get_dummies(df_age_prep, columns=cat_cols_for_age, drop_first=True)
train_age = df_age_encoded[df_age_encoded['Age'].notnull()]
test_age = df_age_encoded[df_age_encoded['Age'].isnull()]
X_train_age = train_age.drop(columns=['Age'])
y_train_age = train_age['Age']
X_test_age = test_age.drop(columns=['Age'])

age_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
age_regressor.fit(X_train_age, y_train_age)
predicted_ages = age_regressor.predict(X_test_age)
df_all.loc[df_all['Age'].isnull(), 'Age'] = predicted_ages

# Advanced Group features
df_all['Ticket_Group_Size'] = df_all.groupby('Ticket')['PassengerId'].transform('count')
df_all['Group_Id'] = df_all['Ticket']
mask = df_all['Ticket_Group_Size'] == 1
df_all.loc[mask, 'Group_Id'] = df_all.loc[mask, 'Last_Name'] + '_' + df_all.loc[mask, 'Fare'].astype(str)

df_all['Group_Size'] = df_all.groupby('Group_Id')['PassengerId'].transform('count')
df_all['Is_Female_or_Child'] = ((df_all['Sex'] == 'female') | (df_all['Age'] < 16)).astype(int)
df_all['Group_Female_Child_Ratio'] = df_all.groupby('Group_Id')['Is_Female_or_Child'].transform('mean')
df_all['Group_Mean_Age'] = df_all.groupby('Group_Id')['Age'].transform('mean')
pclass_fare_median = df_all.groupby('Pclass')['Fare'].transform('median')
df_all['Group_Fare_Median_Diff'] = df_all['Fare'] - pclass_fare_median

df_all = df_all.drop(columns=['Is_Female_or_Child'])


# --- 2. New Feature Engineering ---
df_all['Fare_per_person'] = df_all['Fare'] / df_all['Ticket_Group_Size']

# Step 1: Socio-Physical Class Features
fare_threshold_3rd = df_all[df_all['Pclass'] == 3]['Fare_per_person'].quantile(0.1)
df_all['Is_Ultra_Poor_3rd'] = ((df_all['Pclass'] == 3) & (df_all['Fare_per_person'] <= fare_threshold_3rd)).astype(int)
df_all['Embarked_S_3rd'] = ((df_all['Pclass'] == 3) & (df_all['Embarked'] == 'S')).astype(int)
df_all['Embarked_C_3rd'] = ((df_all['Pclass'] == 3) & (df_all['Embarked'] == 'C')).astype(int)
df_all['Embarked_Q_3rd'] = ((df_all['Pclass'] == 3) & (df_all['Embarked'] == 'Q')).astype(int)

# Step 2: Proximity Features
def extract_ticket_num(ticket):
    parts = ticket.split()
    if len(parts) == 0:
        return np.nan
    last_part = parts[-1]
    if last_part.isdigit():
        return int(last_part)
    return np.nan
df_all['Ticket_Num'] = df_all['Ticket'].apply(extract_ticket_num)

def extract_ticket_prefix(ticket):
    parts = ticket.split()
    if len(parts) > 1:
        prefix = "".join(parts[:-1])
        prefix = prefix.replace(".", "").replace("/", "").lower()
        return prefix
    return 'none'
df_all['Ticket_Prefix'] = df_all['Ticket'].apply(extract_ticket_prefix)

major_prefixes = ['a5', 'pc', 'ca', 'stono', 'sotono2']
for pref in major_prefixes:
    col_name = f'Prefix_{pref}_3rd'
    df_all[col_name] = ((df_all['Pclass'] == 3) & (df_all['Ticket_Prefix'] == pref)).astype(int)

# Step 3: Family Action Signals
df_all['Male_3rd_Has_Family'] = ((df_all['Pclass'] == 3) & (df_all['Sex'] == 'male') & (df_all['Age'] >= 16) & (df_all['SibSp'] + df_all['Parch'] > 0)).astype(int)
df_all['Group_Has_Child'] = df_all.groupby('Group_Id')['Age'].transform(lambda x: (x < 16).any()).astype(int)
df_all['Is_3rd_Parent_Guardian'] = ((df_all['Pclass'] == 3) & (df_all['Age'] >= 16) & (df_all['Group_Has_Child'] == 1)).astype(int)

df_all = df_all.drop(columns=['Group_Has_Child', 'Ticket_Group_Size'])


# --- 3. Feature Set Definition ---
base_features = [
    'Pclass', 'Age', 'SibSp', 'Parch', 'Fare', 'Family_Survival', 'FamilySize', 'IsAlone',
    'Group_Size', 'Group_Female_Child_Ratio', 'Group_Mean_Age', 'Group_Fare_Median_Diff',
    'Sex_male', 'Embarked_Q', 'Embarked_S', 'Title_Miss', 'Title_Mr', 'Title_Mrs', 'Title_Rare',
    'Deck_B', 'Deck_C', 'Deck_D', 'Deck_E', 'Deck_F', 'Deck_G', 'Deck_T', 'Deck_U'
]
features_pattern_a = ['Is_Ultra_Poor_3rd', 'Embarked_S_3rd', 'Embarked_C_3rd', 'Embarked_Q_3rd']
features_pattern_c = ['Male_3rd_Has_Family', 'Is_3rd_Parent_Guardian']


# --- 4. Dummy Variable Encoding ---
cat_cols = ['Sex', 'Embarked', 'Title', 'Deck']
df_encoded = pd.get_dummies(df_all, columns=cat_cols, drop_first=True)

train_fe = df_encoded.iloc[:len(train)].copy()
test_fe = df_encoded.iloc[len(train):].copy()
y_train = train_fe['Survived'].astype(int)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

def get_objective(X_tr_full, y_tr):
    def objective(trial):
        params = {
            'iterations': trial.suggest_int('iterations', 50, 300),
            'depth': trial.suggest_int('depth', 3, 8),
            'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.15),
            'l2_leaf_reg': trial.suggest_float('l2_leaf_reg', 1, 10),
            'verbose': 0,
            'random_seed': 42
        }
        model = CatBoostClassifier(**params)
        scores = []
        for train_idx, val_idx in cv.split(X_tr_full, y_tr):
            X_tr, y_tr_fold = X_tr_full.iloc[train_idx], y_tr.iloc[train_idx]
            X_va, y_va_fold = X_tr_full.iloc[val_idx], y_tr.iloc[val_idx]
            model.fit(X_tr, y_tr_fold)
            preds = model.predict(X_va)
            scores.append(np.mean(preds == y_va_fold))
        return np.mean(scores)
    return objective

optuna.logging.set_verbosity(optuna.logging.WARNING)
results = {}

# [Baseline]
print("\n--- Evaluating Baseline ---")
X_train_base = train_fe[base_features]
study = optuna.create_study(direction='maximize')
study.optimize(get_objective(X_train_base, y_train), n_trials=20)
results['Baseline'] = {'accuracy': study.best_value, 'params': study.best_params, 'features': base_features}

# [Pattern A]
print("\n--- Evaluating Pattern A ---")
features_a = base_features + features_pattern_a
X_train_a = train_fe[features_a]
study = optuna.create_study(direction='maximize')
study.optimize(get_objective(X_train_a, y_train), n_trials=20)
results['Pattern A'] = {'accuracy': study.best_value, 'params': study.best_params, 'features': features_a}

# [Pattern B]
print("\n--- Evaluating Pattern B ---")
oof_neighbor_survival = pd.Series(0.5, index=train_fe.index)
for train_idx, val_idx in cv.split(train_fe, y_train):
    tr_df = train_fe.iloc[train_idx].copy()
    tr_df['Survived'] = y_train.iloc[train_idx]
    for idx in val_idx:
        row = train_fe.iloc[idx]
        t_num = row['Ticket_Num']
        if not pd.isna(t_num):
            neighbors = tr_df[(tr_df['Ticket_Num'] >= t_num - 5) & (tr_df['Ticket_Num'] <= t_num + 5) & (tr_df['PassengerId'] != row['PassengerId'])]
            if len(neighbors) > 0:
                oof_neighbor_survival.loc[idx] = neighbors['Survived'].mean()
            else:
                oof_neighbor_survival.loc[idx] = tr_df[(tr_df['Pclass'] == row['Pclass']) & (tr_df['Sex_male'] == row['Sex_male'])]['Survived'].mean()
        else:
            oof_neighbor_survival.loc[idx] = tr_df[(tr_df['Pclass'] == row['Pclass']) & (tr_df['Sex_male'] == row['Sex_male'])]['Survived'].mean()

train_fe_b = train_fe.copy()
train_fe_b['OOF_Ticket_Neighbor_Survival'] = oof_neighbor_survival
features_b = base_features + [f'Prefix_{pref}_3rd' for pref in major_prefixes] + ['OOF_Ticket_Neighbor_Survival']
X_train_b = train_fe_b[features_b]

study = optuna.create_study(direction='maximize')
study.optimize(get_objective(X_train_b, y_train), n_trials=20)
results['Pattern B'] = {'accuracy': study.best_value, 'params': study.best_params, 'features': features_b}

# [Pattern C]
print("\n--- Evaluating Pattern C ---")
features_c = base_features + features_pattern_c
X_train_c = train_fe[features_c]
study = optuna.create_study(direction='maximize')
study.optimize(get_objective(X_train_c, y_train), n_trials=20)
results['Pattern C'] = {'accuracy': study.best_value, 'params': study.best_params, 'features': features_c}

# [Pattern D]
print("\n--- Evaluating Pattern D ---")
train_fe_d = train_fe_b.copy()
train_fe_d['Is_Ultra_Poor_3rd'] = train_fe['Is_Ultra_Poor_3rd']
train_fe_d['Embarked_S_3rd'] = train_fe['Embarked_S_3rd']
train_fe_d['Embarked_C_3rd'] = train_fe['Embarked_C_3rd']
train_fe_d['Embarked_Q_3rd'] = train_fe['Embarked_Q_3rd']
train_fe_d['Male_3rd_Has_Family'] = train_fe['Male_3rd_Has_Family']
train_fe_d['Is_3rd_Parent_Guardian'] = train_fe['Is_3rd_Parent_Guardian']
features_d = features_b + features_pattern_a + features_pattern_c
X_train_d = train_fe_d[features_d]

study = optuna.create_study(direction='maximize')
study.optimize(get_objective(X_train_d, y_train), n_trials=20)
results['Pattern D'] = {'accuracy': study.best_value, 'params': study.best_params, 'features': features_d}

print("\n--- Final CV Comparison ---")
for key, val in results.items():
    print(f"{key}: {val['accuracy']:.5f}")

Here is the summary of validation results:

Baseline (Before features): CV 0.8530
Pattern A (Socio-Physical features only): CV 0.8530
Pattern B (Ticket Proximity OOF & Prefix Interaction): CV 0.8687 (Significant Improvement!)
Pattern C (Family & Guardian features only): CV 0.8574
Pattern D (All features combined): CV 0.8642

Pattern B using Ticket Proximity achieved a record-high CV of 0.8687.
We generated predictions using this optimal configuration and submitted them to Kaggle.
However, the Public Score was 0.78947, lower than the 0.79665 reached in Practice 6, so the CV gain did not carry over to the leaderboard.

Conclusion

We are still far from satisfied, but approximating the physical cabin layout from ticket-number sequences clearly helped improve the CV score.
We hope this walkthrough helps.

Japanese version:
Kaggle Practice 1 "Titanic Survival Prediction" 1. Creating Kaggle Titanic Execution Environment on Local PC
Kaggle Practice 1 "Titanic Survival Prediction" 2. Initial Submission
Kaggle Practice 1 "Titanic Survival Prediction" 3. Cabin Feature Engineering
Kaggle Practice 1 "Titanic Survival Prediction" 4. Feature Engineering (Age Imputation via Random Forest)
Kaggle Practice 1 "Titanic Survival Prediction" 5. Feature Engineering (Nonlinear Transformation and Binning of Numerical Features)
Kaggle Practice 1 "Titanic Survival Prediction" 6. Adding Group Statistics to Capture Evacuation Behavior and CatBoost × Optuna Optimization
Kaggle Practice 1 "Titanic Survival Prediction" 7. Pseudo-restoration of Cabin Layouts via Ticket Proximity and Model Deadlock Analysis

Stuck on Android's 16KB Page Size Error? Here is the CMake Quick Fix

kito2718 — Tue, 07 Jul 2026 13:37:30 +0000

Abstract

Support for 16KB page sizes is mandatory for publishing Android NDK apps on Google Play.
How to support 16KB page sizes.
How to verify 16KB page size alignment.

Overview

Even after following the official Android developer documentation below, it can be confusing to figure out which steps to actually take:
https://developer.android.com/guide/practices/page-sizes#ndk-build

When I tried to publish an app for the first time in about 10 years, I found that the publication rules and procedures had changed significantly, which took a lot of effort. Among various requirements, like setting up a privacy policy, I got stuck on the 16KB page size support issue.
This post shares how to resolve it.

16KB Page Size Support is Mandatory for Android NDK Apps

If your app does not support 16KB page sizes, uploading your AAB (or APK) to Google Play Console will fail.
Thus, you have no choice but to implement it.
The error looks like this:

Notice the error: "4KB LOAD section alignment, but 16KB is required."

How to Support 16KB Page Sizes

Normally, upgrading to NDK version 28 or higher is sufficient to resolve this issue. However, due to specific constraints, I had to stick to version 27.
In such cases, adding the following single line to your CMakeLists.txt resolves the problem:

cmake_minimum_required(VERSION 3.22.1)

project("videophotobook")

add_library(VUFORIA_LIBRARY SHARED IMPORTED)
set_target_properties(VUFORIA_LIBRARY PROPERTIES IMPORTED_LOCATION
        ${CMAKE_CURRENT_SOURCE_DIR}/../jniLibs/${ANDROID_ABI}/libVuforiaEngine.so)

add_library(${CMAKE_PROJECT_NAME} SHARED
        VuforiaController.cpp
        GLESRenderer.cpp
        GLESUtils.cpp
        Jni.cpp)

# Add this line: align the library's ELF LOAD segments to 16 KB (16384 bytes)
target_link_options(${CMAKE_PROJECT_NAME} PRIVATE "-Wl,-z,max-page-size=16384")
target_include_directories(${CMAKE_PROJECT_NAME} PUBLIC include)

target_link_libraries(${CMAKE_PROJECT_NAME}
        android
        log
        GLESv3
        VUFORIA_LIBRARY)

How to Verify 16KB Page Size Support

Open Android Studio, select Build -> Analyze APK..., and inspect your built package.

You will see the details:

If the error "4KB LOAD section alignment, but 16KB is required" is gone, the app is ready.
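To check .so files outside Android Studio (for example in CI), a small script can read the ELF program headers and report the alignment of each LOAD segment. This is a minimal sketch, not an official tool: it checks only the ELF segment alignment that the max-page-size flag controls, not APK zip alignment. An APK is a zip archive, so the libraries can be extracted from lib/<abi>/ first. The script and function names are hypothetical.

```python
import struct
import sys
from pathlib import Path

PT_LOAD = 1                  # program header type for loadable segments
REQUIRED_ALIGN = 16 * 1024   # 16 KB


def load_segment_alignments(data: bytes) -> list:
    """Return p_align of every PT_LOAD segment in an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    endian = "<" if ei_data == 1 else ">"          # 1 = little-endian, 2 = big-endian
    if ei_class == 2:                              # ELF64 header layout
        (phoff,) = struct.unpack_from(endian + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 54)
        align_fmt, align_off = "Q", 48
    elif ei_class == 1:                            # ELF32 header layout
        (phoff,) = struct.unpack_from(endian + "I", data, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 42)
        align_fmt, align_off = "I", 28
    else:
        raise ValueError(f"unknown ELF class {ei_class}")

    alignments = []
    for i in range(phnum):
        base = phoff + i * phentsize
        (p_type,) = struct.unpack_from(endian + "I", data, base)
        if p_type == PT_LOAD:
            (p_align,) = struct.unpack_from(endian + align_fmt, data, base + align_off)
            alignments.append(p_align)
    return alignments


def is_16kb_aligned(data: bytes) -> bool:
    aligns = load_segment_alignments(data)
    return bool(aligns) and all(a >= REQUIRED_ALIGN for a in aligns)


if __name__ == "__main__":
    # Usage: python check_align.py path/to/libfoo.so ...
    for path in sys.argv[1:]:
        data = Path(path).read_bytes()
        status = "OK" if is_16kb_aligned(data) else "NOT 16KB-aligned"
        print(f"{path}: {status} {load_segment_alignments(data)}")
```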

Summary

Ideally, build your app using NDK version 28 or higher. If you must use an older NDK version, add target_link_options(${CMAKE_PROJECT_NAME} PRIVATE "-Wl,-z,max-page-size=16384") to your CMakeLists.txt.

I hope this helps!

Japanese version:

Androidアプリ公開直前でハマった「16KBページサイズ対応」問題と対処方法

zenn.dev

Kaggle Titanic Practice 6: Elevating Survival Predictions with Group Features & CatBoost Tuning

kito2718 — Mon, 06 Jul 2026 14:33:18 +0000

GitHub Repository

Abstract

Created advanced passenger group statistics (Group_Size, Group_Female_Child_Ratio, etc.) based on last names and ticket numbers.
Introduced CatBoost Classifier to handle categorical values and optimized hyperparameters automatically using Optuna.
Reached our highest 5-Fold Cross-Validation (CV) accuracy of 0.8563, though the Kaggle Public Score remained at 0.79665.

Overview

This post covers feature engineering refinement, model diversification using CatBoost, and hyperparameter tuning with Optuna. Although the Public Score didn't increase from our previous high, the CV score significantly improved.

1. Advanced Group Features

During the Titanic evacuation, passengers tended to act in groups (families or travel companions). To capture this behavior, we identified passengers sharing the same ticket number or last name and fare as a group.
We engineered the following four group features:

Group Size (Group_Size): The actual number of companions sharing the ticket or last name and fare. Unlike the traditional Family_Size (which only counts biological/legal relatives), this captures friends, couples, and staff traveling together.
Group Female & Child Ratio (Group_Female_Child_Ratio): The ratio of priority rescue candidates (females or children under 16) within the group.
Group Mean Age (Group_Mean_Age): The average age of the group.
Group Fare Median Difference (Group_Fare_Median_Diff): The difference between the passenger's fare and the median fare of their class (Pclass). This acts as a proxy for cabin location and quality.

Visualization & Analysis of Group Features

Let's look at the relationship between these group features and survival rate.

① Group Size vs Survival Rate (`Group_Size`)

X-axis: Number of people in the group. Y-axis: Survival rate.
Analysis: Mid-sized groups of 2-4 people show high survival rates (50-70%). In contrast, solo travelers and large families (5+ people) show lower survival rates. This indicates that having a moderate number of companions to cooperate with during the crisis increased the chance of survival.

② Group Female & Child Ratio Distribution (`Group_Female_Child_Ratio`)

X-axis: Female and child ratio within the group (0.0 to 1.0). Y-axis: Probability density.
Analysis:
- Left Side (Near 0.0): Groups with no women or children (adult males only) have a dense cluster of deceased passengers (red area). This represents the lowest priority for rescue.
- Right Side (Near 1.0): Groups consisting entirely of women and children have a massive peak of surviving passengers (blue area), indicating they were evacuated first.
- Takeaway: Even if a passenger is an adult male (who typically has a very low survival rate), if he belongs to a group with many women and children, his chance of survival increases because he was more likely guided to a lifeboat alongside his group.

③ Group Mean Age Distribution (`Group_Mean_Age`)

X-axis: Average age of the group. Y-axis: Probability density.
Analysis: The surviving passengers (blue area) show peaks around younger averages (families with children) and mature age groups (35-40, likely family heads). The overall age profile of a group plays a significant role in their mobility and evacuation priority.

④ Group Fare Median Difference (`Group_Fare_Median_Diff`)

X-axis: Survival status (0 = Deceased, 1 = Survived). Y-axis: Difference from Pclass median fare.
Analysis: Surviving passengers tended to belong to groups that paid a higher fare relative to their class median. This suggests their cabins were situated in more accessible locations (closer to the deck or evacuation routes).

Implementation Code

Below is the feature engineering code implemented in our pipeline:

# --- Advanced Group Features Creation ---
# Define groups (Group_Id) by ticket number, or last name and fare for solo tickets
df_all['Ticket_Group_Size'] = df_all.groupby('Ticket')['PassengerId'].transform('count')
df_all['Group_Id'] = df_all['Ticket']
mask = df_all['Ticket_Group_Size'] == 1
df_all.loc[mask, 'Group_Id'] = df_all.loc[mask, 'Last_Name'] + '_' + df_all.loc[mask, 'Fare'].astype(str)

# 1. Group Size
df_all['Group_Size'] = df_all.groupby('Group_Id')['PassengerId'].transform('count')

# 2. Female and Child Ratio within Group
df_all['Is_Female_or_Child'] = ((df_all['Sex'] == 'female') | (df_all['Age'] < 16)).astype(int)
df_all['Group_Female_Child_Ratio'] = df_all.groupby('Group_Id')['Is_Female_or_Child'].transform('mean')

# 3. Mean Age of Group
df_all['Group_Mean_Age'] = df_all.groupby('Group_Id')['Age'].transform('mean')

# 4. Difference from Pclass median fare
pclass_fare_median = df_all.groupby('Pclass')['Fare'].transform('median')
df_all['Group_Fare_Median_Diff'] = df_all['Fare'] - pclass_fare_median

# Drop temporary columns
df_all = df_all.drop(columns=['Ticket_Group_Size', 'Is_Female_or_Child'])

2. CatBoost & Optuna Tuning

The Titanic dataset contains many categorical variables (Sex, Embarked, Title, Deck). To handle these effectively, we introduced CatBoost (CatBoostClassifier), which natively encodes categorical features using ordered target statistics.
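Here is what "ordered target statistics" means in practice. Each category value is replaced by a smoothed mean of the target. That mean is computed only from rows that come before the current row in a random permutation, so a row's own label never leaks into its encoding.

The sketch below is a simplified, single-permutation version under my own assumptions (prior = 0.5, smoothing weight a = 1). CatBoost itself uses several permutations and further refinements. To let CatBoost apply its own encoding, you pass the categorical column names or indices through the cat_features parameter of CatBoostClassifier.

```python
import random
from collections import defaultdict


def ordered_target_stats(categories, targets, prior=0.5, a=1.0, order=None, seed=0):
    """Encode each category by the smoothed target mean of *earlier* rows only (simplified sketch).

    order: the row permutation to walk through; a random one is drawn when None.
    """
    if order is None:
        order = list(range(len(categories)))
        random.Random(seed).shuffle(order)
    sums = defaultdict(float)   # running sum of targets seen so far, per category
    counts = defaultdict(int)   # running number of rows seen so far, per category
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        # Only rows visited before i contribute, so targets[i] never leaks into its own encoding
        encoded[i] = (sums[c] + a * prior) / (counts[c] + a)
        sums[c] += targets[i]
        counts[c] += 1
    return encoded


sex = ["male", "female", "male", "female"]
survived = [0, 1, 0, 1]
print(ordered_target_stats(sex, survived, order=[0, 1, 2, 3]))  # [0.5, 0.5, 0.25, 0.75]
```

The first occurrence of each category falls back to the prior. Later rows drift toward that category's observed survival rate.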

We also performed hyperparameter optimization using the optuna library across LightGBM, XGBoost, and CatBoost.

Tuning Implementation Code

Here is our Optuna search script, evaluating each trial with 5-fold cross-validation accuracy over 30 trials:

import optuna
from sklearn.model_selection import StratifiedKFold, cross_val_score
import lightgbm as lgb
from catboost import CatBoostClassifier

# X_train / y_train: the preprocessed feature matrix and Survived labels from the feature pipeline
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# --- LightGBM Tuning ---
def objective_lgb(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 9),
        'num_leaves': trial.suggest_int('num_leaves', 7, 63),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.1),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 50),
        'verbosity': -1,
        'random_state': 42
    }
    model = lgb.LGBMClassifier(**params)
    return cross_val_score(model, X_train, y_train, cv=cv, scoring='accuracy').mean()

study_lgb = optuna.create_study(direction='maximize')
study_lgb.optimize(objective_lgb, n_trials=30)
print(f"Best LightGBM CV Score: {study_lgb.best_value:.4f}")

# --- CatBoost Tuning ---
def objective_cat(trial):
    params = {
        'iterations': trial.suggest_int('iterations', 50, 300),
        'depth': trial.suggest_int('depth', 3, 8),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.1),
        'l2_leaf_reg': trial.suggest_float('l2_leaf_reg', 1.0, 10.0),
        'random_seed': 42,
        'verbose': 0
    }
    model = CatBoostClassifier(**params)
    return cross_val_score(model, X_train, y_train, cv=cv, scoring='accuracy').mean()

study_cat = optuna.create_study(direction='maximize')
study_cat.optimize(objective_cat, n_trials=30)
print(f"Best CatBoost CV Score: {study_cat.best_value:.4f}")

The best CV scores after optimization were:

LightGBM: CV 0.8530 (up from 0.8518)
XGBoost: CV 0.8530 (up from 0.8272)
CatBoost: CV 0.8563 (new highest CV score)
- Best params: iterations=186, depth=4, learning_rate=0.068, l2_leaf_reg=5.63

3. Results & Submission

Model	Baseline CV (Age Imputation)	Feature Addition CV	Hyperparameter Optimized CV	Kaggle Public Score
Logistic Regression	0.8519	0.8485	-	-
LightGBM	0.8485	0.8518	0.8530	0.79425 (Step 1)
XGBoost	0.8226	0.8272	0.8530	-
CatBoost	-	-	0.8563	0.79665 (Highest Score Tie)

We also tested a stacking ensemble (meta-model: Ridge Classifier, CV: 0.8485). With only 891 training passengers, the meta-model appears to have overfit. The standalone tuned CatBoost classifier gave the best results.

Summary

Integrating advanced group features with CatBoost and Optuna yielded a new local CV record of 0.8563 and tied our highest Kaggle Public Score of 0.79665.

We will continue searching for further improvements.
I hope this helps.

Japanese version:
Kaggle Titanic Practice 1: Setting up Kaggle Titanic Environment on Local PC
Kaggle Titanic Practice 2: First Submission
Kaggle Titanic Practice 3: Feature Engineering with Cabin
Kaggle Titanic Practice 4: Feature Engineering (Age Imputation using Random Forest)
Kaggle Titanic Practice 5: Feature Engineering (Non-linear Transforms & Binning of Numerical Features)
Kaggle Titanic Practice 6: Feature Engineering (Advanced Group Features), CatBoost & Optuna Tuning

How I Built an Offline AR Video-Book App on Android (Without Cloud APIs or Premium SDKs)

kito2718 — Sun, 05 Jul 2026 09:50:04 +0000

Abstract

A detailed breakdown of developing an image recognition AR application using Google's ARCore.

Overview

My previously released AR application VideoPhotoBook was built using the Vuforia SDK. While it worked well, Vuforia requires pre-registering reference images on their web developer portal.

When I realized that Google's official ARCore supports dynamic image database generation directly on the device, I decided to rebuild the project. This article documents the development of VideoPhotoBookv3, a completely local-first AR app that allows users to create dynamic AR experiences on the fly.

App: VideoPhotoBookv3 (TBD)
GitHub: VideoPhotoBookv3
Zenn Article (Japanese): ARCoreを使った画像認識ARアプリを公開してみた
DEV.to Article: (This Post)

1. Requirements

Core Feature: Match any image (marker) with a video from the user's gallery. When the camera points at the image, overlay the corresponding video on top of the physical image target in real-time.
Multiple Pairs: Support listing and registering multiple "Image ⇄ Video" pairs, allowing the AR session to simultaneously track and play multiple videos.
Design: Minimalist UI.

2. UI Design

Designed under minimalist principles: generous spacing, clean typography, and a cohesive, simple color scheme.

Settings List Screen (App Entry Point):

Pair List (Card List):
- Displays a list of configured pairs.
- Each card displays the marker image thumbnail, video file name, target physical width, and video scale factor.
- Interactive Edit: Tapping any card opens the "Edit Pair Dialog", allowing the user to update the image, video, and physical width individually.
- All active pairs in the list will be target markers for AR tracking.
Add/Edit Pair Dialog:
- Lets users configure new pairs with preview capabilities.
- Shows a thumbnail preview of the selected marker image to ensure visual clarity.
Launch AR Button:
- A slim, elegant black button positioned at the bottom of the screen, displayed only when one or more pairs are configured.

AR Camera Screen:

Camera Preview: Full-screen view.
Back Button: A clean arrow icon in the top-left corner.
Status Indicator: Subtle overlay text indicating whether the app is searching for markers or actively tracking them.

3. Technical & Functional Design

(1) Data Persistence (Data Model)

To save multiple image-video pairs locally, we define the following data structure:

data class ArKeyPair(
    val id: String,          // Unique Identifier (UUID)
    val markerUri: String,   // Gallery URI of the marker image
    val videoUri: String,    // Gallery URI of the video file
    val physicalWidth: Float, // Actual physical width of the marker (meters, default = 0.1m = 10cm)
    val scaleFactor: Float   // Scale multiplier (default = 1.0 = 100%)
)

The pairs are serialized to a JSON string and persisted using SharedPreferences.

(2) AR Rendering via Sceneview

We use Sceneview (built on Google's Filament 3D renderer) to handle the 3D scene graph and ARCore integration.

Video Overlay Architecture Overview:

ARScene Composable: Sceneview provides a Jetpack Compose wrapper called ARScene to host the camera preview and handle the ARCore lifecycle out of the box.
Asynchronous Database Generation: During initialization, the app decodes the gallery image URIs into Bitmaps on a background thread and compiles them into ARCore's AugmentedImageDatabase via database.addImage(id, bitmap, physicalWidth).
Anchor and Node Binding: When ARCore detects a marker, we create an Anchor at the image's center pose (image.createAnchor(image.centerPose)) and attach an AnchorNode to it.
VideoNode for Media Playback:
- Sceneview's VideoNode allows rendering video streams onto a 3D plane.
- We instantiate a standard MediaPlayer (or ExoPlayer), assign the videoUri as the data source, and bind it to the VideoNode.
- The video will overlay precisely on top of the physical target image based on its coordinates and width.

4. Sequences

1. App Launch & Initialization

Sequence Diagram

Explanation

When the app launches, it asynchronously loads the image-video pair configurations (JSON string) from SharedPreferences. The deserialized list updates the uiState (StateFlow) in MainScreenViewModel, triggering Jetpack Compose to render the list of cards.

Code Implementation

MainScreenViewModel.kt:

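  import androidx.lifecycle.ViewModel
  import androidx.lifecycle.viewModelScope
  import kotlinx.coroutines.Dispatchers
  import kotlinx.coroutines.flow.MutableStateFlow
  import kotlinx.coroutines.flow.StateFlow
  import kotlinx.coroutines.flow.asStateFlow
  import kotlinx.coroutines.launch
  import kotlinx.coroutines.withContext

  class MainScreenViewModel(private val repository: KeyPairRepository) : ViewModel() {
      private val _uiState = MutableStateFlow<List<ArKeyPair>>(emptyList())
      val uiState: StateFlow<List<ArKeyPair>> = _uiState.asStateFlow()

      init { loadPairs() }

      fun loadPairs() {
          viewModelScope.launch {
              // getPairs() reads SharedPreferences and parses JSON synchronously,
              // so run it on the IO dispatcher to keep the main thread free
              _uiState.value = withContext(Dispatchers.IO) { repository.getPairs() }
          }
      }
  }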

KeyPairRepository.kt:

  fun getPairs(): List<ArKeyPair> {
      val json = prefs.getString(key, null) ?: return emptyList()
      return try {
          val type = object : TypeToken<List<ArKeyPair>>() {}.type
          gson.fromJson(json, type) ?: emptyList()
      } catch (e: Exception) {
          emptyList()
      }
  }

2. Selecting Media & Persisting URI Access

Sequence Diagram

Explanation

When images and videos are selected via the Android Photo Picker, the returned Uri loses its read permission once the app process restarts. To keep access, we immediately request a persistable read permission with takePersistableUriPermission. The app can then still open the selected local assets after app and device restarts.

Code Implementation

MainScreen.kt (Photo Picker & URI Persistence):

  // Helper to persist URI read permission
  private fun persistUriAccess(context: Context, uri: Uri) {
      try {
          val takeFlags: Int = Intent.FLAG_GRANT_READ_URI_PERMISSION
          context.contentResolver.takePersistableUriPermission(uri, takeFlags)
      } catch (e: SecurityException) {
          // This URI cannot be persisted; access will be lost after the app restarts
      }
  }

  // Photo Picker Launcher
  val pickImageLauncher = rememberLauncherForActivityResult(
      contract = ActivityResultContracts.PickVisualMedia()
  ) { uri ->
      if (uri != null) {
          persistUriAccess(context, uri)
          markerUri = uri
      }
  }

3. AR Startup & ARCore Session Initialization

Sequence Diagram

Explanation

When transitioning to the AR screen, the app requests camera permission if it has not been granted. Next, it launches a coroutine on Dispatchers.IO to load and decode the target images into Bitmaps. Once the AR session starts, these Bitmaps are added to an AugmentedImageDatabase. The database is then applied to the session together with the auto-focus setting.

Code Implementation

ArViewScreen.kt (Async Bitmaps & Database Init):

  // Decode marker images off the main thread
  LaunchedEffect(pairs) {
      withContext(Dispatchers.IO) {
          for (pair in pairs) {
              val bitmap = loadBitmapFromUri(context, pair.markerUri)
              if (bitmap != null) {
                  bitmaps[pair.id] = bitmap
              }
          }
      }
      // Back on the effect's (main) dispatcher: safe to update Compose state
      isBitmapsLoaded = true
  }

  // Build the image database once the session exists and the bitmaps are ready
  LaunchedEffect(arSession, isBitmapsLoaded) {
      val session = arSession
      if (session != null && isBitmapsLoaded && isLoading) {
          withContext(Dispatchers.IO) {
              val database = AugmentedImageDatabase(session)
              for (pair in pairs) {
                  val bitmap = bitmaps[pair.id] ?: continue
                  try {
                      database.addImage(pair.id, bitmap, pair.physicalWidth)
                  } catch (e: ImageInsufficientQualityException) {
                      // com.google.ar.core.exceptions: the marker has too few features; skip it
                  }
              }
              val config = session.config
              config.augmentedImageDatabase = database
              config.focusMode = Config.FocusMode.AUTO
              session.configure(config)
          }
          isLoading = false
      }
  }

4. Tracking and Dynamic Video Lifecycle Control

Sequence Diagram

Explanation

We process each frame in the AR session's onSessionUpdated callback. When target images are detected, their tracking states (TRACKING, PAUSED, STOPPED) drive the video lifecycle:

Newly Detected (TRACKING): We create an Anchor at the center of the image. We extract the video's aspect ratio via the MediaPlayer and compute the correct fitting bounds within the marker's physical size. The video plays, and the AnchorNode is reactively rendered onto the screen.
Lost Sight (PAUSED): The video pauses to save resources, resuming once tracking is restored.
Destroyed (STOPPED): The player resources are released, and the node is removed from the layout hierarchy.

Code Implementation

ArViewScreen.kt (Tracking Lifecycle Handler):

  onSessionUpdated = { session, frame ->
      val updatedImages = frame.getUpdatedTrackables(AugmentedImage::class.java)
      for (image in updatedImages) {
          val id = image.name
          val pair = pairs.firstOrNull { it.id == id } ?: continue

          when (image.trackingState) {
              TrackingState.TRACKING -> {
                  val activeVideo = activeVideos[id]
                  if (activeVideo == null) {
                      val anchor = image.createAnchor(image.centerPose)
                      val mediaPlayer = MediaPlayer().apply {
                          setDataSource(context, Uri.parse(pair.videoUri))
                          isLooping = true
                          prepare()
                      }

                      // Preserve video aspect ratio within the marker bounds
                      val videoWidth = mediaPlayer.videoWidth.toFloat()
                      val videoHeight = mediaPlayer.videoHeight.toFloat()
                      val videoRatio = if (videoWidth > 0f && videoHeight > 0f) videoWidth / videoHeight else (image.extentX / image.extentZ)
                      val imageRatio = image.extentX / image.extentZ
                      val baseSize = if (videoRatio > imageRatio) {
                          Size(image.extentX, image.extentX / videoRatio)
                      } else {
                          Size(image.extentZ * videoRatio, image.extentZ)
                      }
                      val scale = if (pair.scaleFactor <= 0f) 1f else pair.scaleFactor
                      val size = Size(baseSize.x * scale, baseSize.y * scale)

                      mediaPlayer.start()
                      activeVideos[id] = ActiveVideo(id, anchor, mediaPlayer, size)
                  } else {
                      if (!activeVideo.mediaPlayer.isPlaying) {
                          activeVideo.mediaPlayer.start()
                      }
                  }
              }
              TrackingState.PAUSED -> {
                  activeVideos[id]?.let {
                      if (it.mediaPlayer.isPlaying) it.mediaPlayer.pause()
                  }
              }
              TrackingState.STOPPED -> {
                  activeVideos[id]?.let {
                      try {
                          if (it.mediaPlayer.isPlaying) it.mediaPlayer.stop()
                          it.mediaPlayer.release()
                      } catch (e: Exception) {}
                      activeVideos.remove(id)
                  }
              }
          }
      }
  }
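The aspect-ratio fitting in the TRACKING branch is plain arithmetic, so it can be checked outside Android. Below is a Python transcription of the same rule: letterbox the video inside the marker's extentX × extentZ, then apply scaleFactor. The function name fit_video_size is hypothetical and the numbers are illustrative only.

```python
def fit_video_size(video_w, video_h, extent_x, extent_z, scale_factor=1.0):
    """Largest rectangle inside the marker that keeps the video's aspect ratio, then scaled."""
    image_ratio = extent_x / extent_z
    # Fall back to the marker's own ratio when the video size is unknown (e.g. 0x0)
    video_ratio = video_w / video_h if video_w > 0 and video_h > 0 else image_ratio
    if video_ratio > image_ratio:
        # Video is relatively wider than the marker: fill the width
        base = (extent_x, extent_x / video_ratio)
    else:
        # Video is relatively taller (or equal): fill the height
        base = (extent_z * video_ratio, extent_z)
    scale = 1.0 if scale_factor <= 0 else scale_factor
    return (base[0] * scale, base[1] * scale)


# 1920x1080 landscape video on a 10 cm x 15 cm portrait marker
print(fit_video_size(1920, 1080, 0.10, 0.15))  # about (0.10, 0.05625): full marker width
```

Comparing the two ratios picks the limiting side. The video therefore never spills outside the physical marker before scaleFactor is applied.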

ArViewScreen.kt (Declarative Scene Graph rendering):

  ARSceneView(
      // ...,
      onSessionUpdated = { /* ... */ }
  ) {
      activeVideos.values.forEach { activeVideo ->
          AnchorNode(anchor = activeVideo.anchor) {
              VideoNode(
                  player = activeVideo.mediaPlayer,
                  size = activeVideo.size,
                  // Rotate -90 degrees about X so the video lies in the anchor's local X-Z plane (the marker surface)
                  rotation = Rotation(x = -90f, y = 0f, z = 0f)
              )
          }
      }
  }

Key Implementation Summary

Persisting URI read permission: Crucial when working with Android's system pickers. Bypasses permission expiration on app process restarts via takePersistableUriPermission.
Asynchronous Bitmap Loading: Keeps the main thread responsive by decoding Bitmaps on Dispatchers.IO before the ARCore image database is built.
Runtime Augmented Image Database creation: Builds the database dynamically, removing the need for pre-calculated static .imgdb files.
Declarative 3D Scene Graph via Compose: Sceneview's Compose layout eliminates imperative parent.addChild() commands. 3D nodes reactively follow the Compose lifecycle.

Conclusion & Takeaways

Development is much simpler without proprietary platforms: Moving away from Vuforia meant getting rid of web portal pre-registration. Everything is now processed on-device, offering a much cleaner development and user flow.
ARCore Tracking is highly reliable: In my testing, ARCore tracked targets generated at runtime accurately and with low latency.

I hope this walkthrough assists you in building your next local-first AR app. Happy coding!

Kaggle Titanic Practice 5: Improving Score with Fare Log-Transformation and Age Stage Binning

kito2718 — Fri, 03 Jul 2026 12:55:52 +0000

Available on GitHub

Abstract

Applied log-transformation to passenger Fare to mitigate skewness and stage binning to Age for life stage classification.
While the linear model (Logistic Regression) accuracy decreased due to representation changes, tree-based models (Random Forest, XGBoost, LightGBM) showed improved CV accuracy under Pattern C (excluding original continuous Fare and Age).
Submitted predictions from the improved LightGBM model (Pattern C) and achieved a new best Kaggle Public Score of 0.79665 (previously 0.78947).

Introduction

So far, we have achieved a 5-Fold CV of 0.8519 and a Kaggle Public Score of 0.78947 by introducing ML-based age imputation. In this iteration, we focused on scaling numerical features and handling non-linear relationships to push our score further.

Preprocessing & Features

1. Log-Transformation of Fare

The ticket Fare in Titanic has a highly right-skewed distribution because a small number of wealthy passengers paid extremely high fares compared to the majority of third-class passengers.
To mitigate this skewness, we applied log1p(Fare), i.e. log(1 + Fare), which also keeps zero fares finite. The transform compresses the long right tail and brings the distribution much closer to a symmetric bell shape.

Linear models such as Logistic Regression are sensitive to extreme input values, so reducing the skew of numerical features generally makes their training more stable.
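A quick way to see the effect of log1p is to compare sample skewness before and after the transform. The fares below are illustrative values in the Titanic fare range, not the full dataset. Skewness uses the plain population formula; both choices are assumptions for this sketch.

```python
import math


def skewness(xs):
    """Population skewness: mean cubed deviation divided by std^3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


fares = [7.25, 7.925, 8.05, 13.0, 26.0, 53.1, 71.28, 512.33]  # illustrative, right-skewed
log_fares = [math.log1p(f) for f in fares]                       # log(1 + Fare); safe for Fare == 0

print(f"raw skew:   {skewness(fares):.2f}")
print(f"log1p skew: {skewness(log_fares):.2f}")
```

For these values, skewness drops from about 2.2 to about 1.0. The single extreme fare still stands out, but it no longer dominates the scale.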

2. Age Binning (Life Stages)

Using continuous values of Age limits linear models to capturing simple monotonic relationships (e.g., higher age equals higher survival probability). However, actual survival rates vary non-linearly across age groups (higher for children/infants, lower for young adults).

To capture this non-linear relationship, we split passengers into 5 life stages (bins) and dummy-encoded them:

Infant (0-5 years old)
Child (6-15 years old)
Youth (16-30 years old)
Middle-aged (31-55 years old)
Senior (56+ years old)

This classification allows models to capture specific group boundaries (e.g., protecting children during evacuation) more effectively.
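The binning can be expressed as a lookup over upper bin edges. Imputed ages can be fractional (e.g. 5.5), so the sketch assumes right-closed bins: (0, 5], (5, 15], (15, 30], (30, 55], (55, inf), with age 0 counted as Infant. The exact edges used in the notebook may differ, and the Age_Bin_* column names are hypothetical.

```python
from bisect import bisect_left

EDGES = [5, 15, 30, 55]  # inclusive upper edge of every bin except the last (assumed)
STAGES = ["Infant", "Child", "Youth", "Middle_aged", "Senior"]


def age_stage(age):
    """Map an age to its life stage; right-closed bins, so 5.0 -> Infant and 5.5 -> Child."""
    return STAGES[bisect_left(EDGES, age)]


def age_dummies(age):
    """One-hot encode the life stage into Age_Bin_<stage> columns (hypothetical names)."""
    stage = age_stage(age)
    return {f"Age_Bin_{s}": int(s == stage) for s in STAGES}


for a in [0.42, 5.0, 5.5, 22.0, 31.0, 60.0]:
    print(a, age_stage(a))
```

In pandas, pd.cut(..., right=True, include_lowest=True) followed by pd.get_dummies produces the same kind of encoding.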

Evaluation Patterns for Multicollinearity

Since the log-transformed Log_Fare and binned Age_Bin dummy variables are highly correlated with their original continuous counterparts, keeping both might cause instability due to multicollinearity. Thus, we evaluated 4 different feature configurations using 5-Fold Stratified Cross-Validation:

Pattern A: Keep original Fare and Age, add Log_Fare and Age_Bin dummy variables.
Pattern B: Exclude Fare (replace with Log_Fare), keep Age, add Age_Bin dummy variables.
Pattern C: Exclude Fare (replace with Log_Fare), exclude Age (use only Age_Bin dummy variables).
Pattern D: Keep Fare, exclude Age (use only Age_Bin dummy variables).

Validation Results (5-Fold CV Accuracy)

Model	Baseline (RF Age Imputed)	Pattern A	Pattern B	Pattern C	Pattern D
Logistic Regression	0.8519	0.8474	0.8474	0.8474	0.8440
Random Forest	0.8249	0.8238	0.8170	0.8339 (+0.0090)	0.8271
XGBoost	0.8226	0.8159	0.8170	0.8283 (+0.0057)	0.8272
LightGBM	0.8485	0.8474	0.8474	0.8496 (+0.0011)	0.8496

Discussion

Logistic Regression: Converting Age to dummy variables resulted in a loss of granular numerical information, which led to a slightly lower CV score compared to the baseline (0.8519).
Tree-Based Models: In Pattern C (excluding continuous Fare and Age), all three tree models improved over the baseline (Random Forest +0.0090, XGBoost +0.0057, LightGBM +0.0011). Removing the redundant continuous variables likely kept the trees from over-splitting on them, acting as a form of regularization.

Kaggle Submission Result

Although no Pattern C model exceeded the baseline Logistic Regression CV (0.8519), the LightGBM model trained on Pattern C features (CV 0.8496) generalized better to the public test set.

Kaggle Public Score: 0.79665 (Improved from the previous best of 0.78947!)

The results demonstrate that while linear models struggled with the binned representations, the tree-based models benefited significantly, translating to a better Public Score.

The corresponding code has been committed to GitHub: titanic_eda_20260703_2025_fare_log_and_age_binning.ipynb.

Conclusion & Next Steps

Log-transformation and binning proved to be an effective combination for the tree-based models.
For our next attempt, we will target hyperparameter tuning (using Optuna) and model ensembling (stacking/blending) to push the score even further.

Japanese version:
Kaggle Practice 1: Setting Up a Local Environment for the Kaggle Titanic Competition
Kaggle Practice 2: First Submission
Kaggle Practice 3: Feature Engineering for Cabin
Kaggle Practice 4: Feature Engineering (Imputing Age with Random Forest)
Kaggle Practice 5: Feature Engineering (Fare Log-Transformation and Age Stage Binning)

Kaggle Titanic Practice 4: Improving Survival Prediction with Random Forest Age Imputation

kito2718 — Thu, 02 Jul 2026 14:00:04 +0000

Available on GitHub

Abstract

Changed age imputation from median values by title to predictive imputation using RandomForestRegressor.
The best 5-Fold CV (Cross-Validation) score improved from 0.8507 to 0.8519 (Logistic Regression).
Kaggle Public Score increased from 0.78708 to 0.78947.
Validation code is committed to GitHub: titanic_eda_20260702_2031_age_imputation.ipynb.

Overview

In the Kaggle Titanic: Machine Learning from Disaster competition, passenger Age is a critical factor for predicting survival.
Previously, we filled missing values with the median age of each passenger title (Mr, Miss, Mrs, Master, Rare). This time, we tried a more advanced approach. We predicted the missing ages with a machine learning model (RandomForestRegressor) trained on other features: Pclass, Sex, SibSp, Parch, Fare, Embarked, Title, Deck, FamilySize, and IsAlone.

Implementation

Here is the preprocessing code for the imputation. We trained RandomForestRegressor on passengers with known ages and predicted the missing values.

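import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Assumes Fare and Embarked were already filled in earlier steps:
# older scikit-learn versions reject NaN inputs to RandomForestRegressor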

# Features used for predicting Age
age_features = ['Pclass', 'Sex', 'SibSp', 'Parch', 'Fare', 'Embarked', 'Title', 'Deck', 'FamilySize', 'IsAlone', 'Age']
df_age_prep = df_all[age_features].copy()

# One-Hot Encoding for categorical features
cat_cols_for_age = ['Sex', 'Embarked', 'Title', 'Deck']
df_age_encoded = pd.get_dummies(df_age_prep, columns=cat_cols_for_age, drop_first=True)

# Split into known and unknown age datasets
train_age = df_age_encoded[df_age_encoded['Age'].notnull()]
test_age = df_age_encoded[df_age_encoded['Age'].isnull()]

X_train_age = train_age.drop(columns=['Age'])
y_train_age = train_age['Age']
X_test_age = test_age.drop(columns=['Age'])

# Train regressor and predict missing age
age_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
age_regressor.fit(X_train_age, y_train_age)
predicted_ages = age_regressor.predict(X_test_age)

# Impute missing values in the original dataframe
df_all.loc[df_all['Age'].isnull(), 'Age'] = predicted_ages

Validation Results

Comparison of 5-Fold CV (Cross-Validation) accuracy across different models:

Model	Before (Median by Title)	After (Random Forest Imputation)	Difference
Logistic Regression	0.8507 +/- 0.0104	0.8519 +/- 0.0115	+0.0012
Random Forest	0.8204 +/- 0.0193	0.8249 +/- 0.0348	+0.0045
XGBoost	0.8215 +/- 0.0241	0.8226 +/- 0.0244	+0.0011
LightGBM	0.8496 +/- 0.0211	0.8485 +/- 0.0147	-0.0011

Logistic Regression achieved our personal best 5-Fold CV accuracy of 0.8519.
We also observed accuracy improvements in tree-based models like Random Forest and XGBoost.

Kaggle Submission Score

We predicted the test dataset using the updated Logistic Regression model and submitted it to Kaggle.
Our Public Score successfully improved from 0.78708 to 0.78947!
It is encouraging that the local CV gain carried over to the Kaggle Public Score. However, the CV difference (+0.0012) is small compared with the fold-to-fold standard deviation (+/- 0.0115).

Summary & Next Steps

We estimated passenger age from other relevant features instead of using simple per-title medians. As a result, the imputed ages reflect each passenger's class, family, and fare context rather than only their title.
For our next attempt, we will target hyperparameter tuning (using Optuna) and model ensembling to achieve further improvements.

Hope this helps!


BCT4: 3D U-Net Bottlenecks and Minimal CV Score Improvement (0.4483 -> 0.4578)

Abstract

Overview & Background

Main Content

1. Overall 3D U-Net Pipeline

2. Quantitative Results & Score Progression

3. Key Technical Bottlenecks

(1) Extreme Class Imbalance in 3D Space (Zero Collapse)

(2) Loss of Spatial Resolution in Deep Pooling

(3) Limitations of Frame-to-Frame Distance Tracking

Conclusion

Japanese Series Links

BCT3: Building a Fast Local Cross-Validation Environment for 80GB Dataset

Abstract

Overview

Details

(1) Isolated Directory Structure

(2) Fast Package Management with uv

(3) Resolving Windows Security DLL Blocks (Smart App Control)

(4) CPU Environment Compatibility Patch (io.py)

(5) Resolving Graph Counting Performance Bottleneck

(6) Local Validation Notebook Implementation & GitHub Reference

(7) VS Code Execution Steps

(8) Sanity Check & Baseline Local CV Score Report

Conclusion

Japanese Series Articles (Zenn)

BCT2: Baseline Source Code Walkthrough

Abstract

Introduction and Background

Content

1. Understanding OME-Zarr Directory Structures

2. Pipeline Architecture

3. Step-by-Step Code Walkthrough

Step 3.1: Open Zarr Store

Step 3.2: Iterate Time Frames

Step 3.3: Slice 3D Image

Step 3.4: Min-Max Normalization & 3D Cell Detection

Step 3.5: Physical Scale Transformation & Nearest Neighbor Tracking

Step 3.6: Combine Nodes & Edges and Type Cast

Step 3.7: Export CSV

Complete Source Code

Japanese Version of This Series

Conclusion

BCTx: Installing Custom Packages Offline

Abstract

Introduction and Background

Content

Overall Flow

Step 1: Download Wheel Files Locally

Step 2: Upload Wheels to Kaggle as a Dataset

Step 3: Add the Dataset and Install Offline

Summary

Note

Japanese Version of This Series

Conclusion

BCT1: Kaggle Environment Setup and First Submission

Abstract

Introduction and Background

Kaggle Notebook Setup Steps

1. Development Environment Setup & Preparation

2. Run All

3. Save Version

4. Submit to Competition

5. Verify Results

Summary

First Submission Code

Japanese Version of This Series

Conclusion

T2I(1). Setting Up a Local Validation Environment for Kaggle's Text-to-Image Generation Challenge

Abstract

Overview

Environment Setup

1. Hardware Configuration

2. Installed Libraries and Selection Rationale

3. Installation Steps

Running the Baseline Pipeline

Processing Flow

1. How Images Are Generated from Text

1.1 Overview of the Generation Process
