Calvin Z
How I Got MediaPipe Face Landmarker Running in the Browser with Zero Build Tools (And the Import Bug That Wasted My Week)

I built facecalculators.com, a free, privacy-first face shape detector that runs entirely in the browser. No uploads. No server. No account. All facial analysis happens locally via WebAssembly, and nothing biometric ever leaves the device.

The core of it is Google's MediaPipe Face Landmarker with the 478-point 3D model. Getting it working in a pure vanilla JavaScript environment, without a build tool, without React, and inside a WordPress theme, was not as straightforward as the docs make it look. This article documents every failure mode I hit and the exact solution that finally worked.

If you are building anything with MediaPipe in the browser and not using a bundler, this will save you real hours.

What the Docs Tell You vs. What Actually Works

The official Google documentation shows two ways to load the library via CDN.

Option A (script tag):

<script src="https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/vision_bundle.mjs"
  crossorigin="anonymous"></script>

Option B (dynamic import inside a module):

const { FaceLandmarker, FilesetResolver } =
  await import("https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision");

Both of these fail in specific and frustrating ways if you are not using a bundler.

Failure 1: The /wasm/index.js Path

A lot of older tutorials and Stack Overflow answers use this pattern for the FilesetResolver:

const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm/index.js"
);

This will silently fail in certain environments. The path resolves but the WASM runtime does not initialize correctly. You get no error. The landmarker just never loads. Camera permissions fire, the video stream starts, and then nothing comes back from detectForVideo().

The correct path drops the filename entirely:

const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);

That trailing directory path lets jsDelivr resolve the correct entry point. This alone fixed what looked like a camera failure but was actually a model initialization failure.
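Because this class of failure is silent, it also helps to make initialization fail loudly. The sketch below races any init promise against a timeout; fcWithTimeout is a hypothetical helper of mine, not part of the MediaPipe API:

```javascript
// Hypothetical guard: race an initialization promise against a timeout so a
// silently hanging WASM load surfaces as a real error instead of nothing.
function fcWithTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage sketch:
// const vision = await fcWithTimeout(
//   FilesetResolver.forVisionTasks(wasmPath), 10000, "FilesetResolver"
// );
```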

Failure 2: vision_bundle.mjs and the Default Export Problem

The docs also suggest vision_bundle.mjs as a script tag source. If you then try to use it as an ES module import, you will get an error because vision_bundle.mjs uses a default export.

// This fails with a module import error
import { FaceLandmarker, FilesetResolver } from
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/vision_bundle.mjs";

The named exports you need (FaceLandmarker, FilesetResolver, DrawingUtils) are not directly available through that path when you import it as a module.

The correct approach is to import from the package root, which jsDelivr resolves to the proper entry point:

const {
  FaceLandmarker,
  FilesetResolver,
  DrawingUtils
} = await import("https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.14");

Note the pinned version number. Using @latest works but can break if Google pushes a breaking change. Pin your version in production.

Failure 3: WordPress and type="module"

If you are loading your JavaScript inside WordPress and your script is not tagged as type="module", dynamic import() calls will fail silently in some browsers.

In your functions.php, enqueue the script with the module attribute:

function fc_enqueue_calculator_scripts() {
  if ( is_page_template( 'page-calculator.php' ) ) {
    wp_enqueue_script(
      'fc-calculator',
      get_stylesheet_directory_uri() . '/assets/js/calculator.js',
      [],
      '1.0.0',
      true
    );
    // Force type="module" on the script tag
    add_filter( 'script_loader_tag', function( $tag, $handle ) {
      if ( $handle === 'fc-calculator' ) {
        return str_replace( ' src=', ' type="module" src=', $tag );
      }
      return $tag;
    }, 10, 2 );
  }
}
add_action( 'wp_enqueue_scripts', 'fc_enqueue_calculator_scripts' );

Without type="module", the browser treats your JS file as a classic script. Dynamic imports work technically, but the module scope isolation breaks the way MediaPipe resolves its internal WASM paths, causing initialization to fail or behave inconsistently across browsers.

The Working Import Pattern

Here is the complete initialization sequence that works reliably in a vanilla JS ES module without any build step:

const FC_MP_CDN = "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.14";
const FC_MODEL_URL =
  "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task";

let fcLandmarker = null;

async function fcLoadMediaPipe() {
  const vision = await import(FC_MP_CDN);
  const { FaceLandmarker, FilesetResolver } = vision;

  const filesetResolver = await FilesetResolver.forVisionTasks(
    `${FC_MP_CDN}/wasm`
  );

  fcLandmarker = await FaceLandmarker.createFromOptions(filesetResolver, {
    baseOptions: {
      modelAssetPath: FC_MODEL_URL,
      delegate: "GPU"
    },
    runningMode: "VIDEO",
    numFaces: 1
  });

  return fcLandmarker;
}

That is the entire initialization. No npm. No webpack. No node_modules. Runs straight in the browser.

VIDEO Mode and IMAGE Mode on the Same Landmarker Instance

This is the part that no tutorial covers, because most tutorials only implement one analysis method.

On facecalculators.com, there are three input modes: live camera (VIDEO), photo upload (IMAGE), and manual measurements. The camera and photo upload both use the same fcLandmarker instance. Switching between them requires calling setOptions() to change the running mode.

If you do not do this, calling detect() on a VIDEO-mode landmarker throws an error, and calling detectForVideo() on an IMAGE-mode landmarker also throws an error.

// Camera (VIDEO) detection loop
function fcCameraDetectLoop(videoEl) {
  if (!fcLandmarker) return;
  const timestamp = performance.now();
  const result = fcLandmarker.detectForVideo(videoEl, timestamp);
  processLandmarkResult(result);
  requestAnimationFrame(() => fcCameraDetectLoop(videoEl));
}

// Photo upload (IMAGE) analysis
async function fcAnalyzePhoto(imageEl) {
  if (!fcLandmarker) return null;

  // Switch the shared instance to IMAGE mode
  await fcLandmarker.setOptions({ runningMode: "IMAGE" });

  const result = fcLandmarker.detect(imageEl);

  // Switch back to VIDEO mode for the camera to work again
  await fcLandmarker.setOptions({ runningMode: "VIDEO" });

  return result;
}

The setOptions() call is async and takes a moment. Do not call detect() before await setOptions() resolves or you will get a running mode conflict error.
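One way to enforce that ordering is to serialize every mode switch and its detection behind a single promise chain. fcRunInMode below is a hypothetical wrapper of mine, not a MediaPipe API; it only assumes the landmarker exposes the async setOptions() described above:

```javascript
// Hypothetical helper: serialize running-mode switches so a detect call can
// never overlap a pending setOptions(). Not part of the MediaPipe API.
let fcModeQueue = Promise.resolve();

function fcRunInMode(landmarker, mode, task) {
  // Chain onto the queue; each task waits for the previous one to finish.
  const run = fcModeQueue.then(async () => {
    await landmarker.setOptions({ runningMode: mode });
    return task(landmarker);
  });
  // Keep the chain alive even if a task rejects.
  fcModeQueue = run.catch(() => {});
  return run;
}

// Usage sketch:
// const result = await fcRunInMode(fcLandmarker, "IMAGE",
//   (lm) => lm.detect(imageEl));
```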

Stealth Preloading with Intersection Observer

The MediaPipe WASM runtime is around 6MB. If you initialize it on page load, it blocks visible page rendering on slower connections. If you wait for the user to click the camera button, there is a noticeable delay between tap and camera opening.

The solution is IntersectionObserver-based preloading. The landmarker starts downloading as soon as the detect button enters the viewport, which is effectively instant on most pages since the button is above the fold. By the time the user reads the page and decides to tap, the model is already loaded.

function fcStealthPreload() {
  const detectBtn = document.getElementById('fc-detect-btn');
  if (!detectBtn) return;

  const observer = new IntersectionObserver((entries) => {
    if (entries[0].isIntersecting) {
      fcLoadMediaPipe(); // Fire and forget; populates fcLandmarker when ready
      observer.disconnect();
    }
  }, { threshold: 0.1 });

  observer.observe(detectBtn);
}

// Also trigger on full page load as a backup
window.addEventListener('load', () => {
  if (!fcLandmarker) fcLoadMediaPipe();
});

// Start observing once DOM is ready
document.addEventListener('DOMContentLoaded', fcStealthPreload);

On a fast connection, MediaPipe is initialized before the user ever taps. On a slow connection, the camera opens and shows a brief loading state while the model finishes. Either way, the perceived performance is dramatically better than loading on tap.

The 478-Point Model vs. The Old 468-Point Mesh

Most tutorials still reference the old MediaPipe Face Mesh solution, which outputs 468 landmarks. The current Face Landmarker outputs 478 landmarks, with the 10 additional points covering the iris region.

For face shape classification, the iris landmarks are useful for measuring eye distance accurately, which is a key input for alignment detection (confirming the user is centered before capture). The outer iris boundary gives a more stable eye width measurement than guessing from the surrounding mesh points.
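As a sketch of that measurement: in the 478-point output, indices 468-472 are commonly cited as the left iris (468 the center) and 473-477 as the right iris (473 the center). Assuming normalized landmark objects with x/y fields, inter-iris distance is a one-liner; verify the indices against your model version before relying on them:

```javascript
// Sketch: inter-iris distance from the 478-point model's iris landmarks.
// Indices are the commonly cited ones (468/473 = iris centers); confirm
// against your model version before relying on them.
const FC_IRIS = { LEFT_CENTER: 468, RIGHT_CENTER: 473 };

function fcIrisDistance(landmarks) {
  const l = landmarks[FC_IRIS.LEFT_CENTER];
  const r = landmarks[FC_IRIS.RIGHT_CENTER];
  // Euclidean distance in normalized image coordinates.
  return Math.hypot(l.x - r.x, l.y - r.y);
}
```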

The model file for the 478-point landmarker is:

https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task

The older face mesh CDN paths from tutorials dated before 2023 point to deprecated models. If your tutorial uses @mediapipe/face_mesh as the npm package name rather than @mediapipe/tasks-vision, it is targeting the legacy solution.

Landmark Indices for Geometric Measurements

For a face shape classifier, the landmarks you care most about are structural, not expressive. These are the indices used in facecalculators.com for the primary measurements:

// Approximate landmark index references for geometric analysis
// These are stable across most detection results

const FC_LANDMARKS = {
  // Face width at cheekbones (approximately)
  LEFT_CHEEK:     234,
  RIGHT_CHEEK:    454,

  // Forehead width
  LEFT_FOREHEAD:  71,
  RIGHT_FOREHEAD: 301,

  // Jawline width
  LEFT_JAW:       132,
  RIGHT_JAW:      361,

  // Chin tip
  CHIN:           152,

  // Top of forehead hairline region
  TOP_HEAD:       10,

  // Key eye landmarks for alignment detection
  LEFT_EYE_OUTER:  33,
  RIGHT_EYE_OUTER: 263,
};

function fcMeasureWidth(landmarks, leftIndex, rightIndex) {
  const left  = landmarks[leftIndex];
  const right = landmarks[rightIndex];
  return Math.abs(left.x - right.x);
}

function fcMeasureHeight(landmarks, topIndex, bottomIndex) {
  const top    = landmarks[topIndex];
  const bottom = landmarks[bottomIndex];
  return Math.abs(top.y - bottom.y);
}

The landmark coordinates are normalized to the range 0.0 to 1.0 relative to the image dimensions. For ratio calculations (face length divided by cheekbone width, etc.), the normalization cancels out, so you do not need to convert to pixel values unless you are drawing on a canvas.
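To make the cancellation concrete, here is an illustrative ratio helper working directly on normalized coordinates. It reuses the index values from the FC_LANDMARKS table above but is not the site's actual classifier:

```javascript
// Illustrative ratio on normalized landmarks (0.0-1.0 range). The ratio is
// unitless, so no pixel conversion is needed. Indices mirror the
// FC_LANDMARKS table: 234/454 = cheeks, 10 = top of head, 152 = chin.
function fcFaceRatio(landmarks) {
  const cheekWidth = Math.abs(landmarks[234].x - landmarks[454].x);
  const faceLength = Math.abs(landmarks[10].y - landmarks[152].y);
  // Guard against degenerate detections.
  if (cheekWidth === 0) return null;
  return faceLength / cheekWidth;
}
```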

Alignment Detection Before Capture

One thing that significantly improves classification accuracy is only capturing when the face is reasonably centered and forward-facing. A face turned 20 degrees sideways will produce wrong measurements for a face shape classifier.

Three checks before enabling capture:

function fcCheckAlignment(landmarks) {
  const leftEye  = landmarks[FC_LANDMARKS.LEFT_EYE_OUTER];
  const rightEye = landmarks[FC_LANDMARKS.RIGHT_EYE_OUTER];

  // 1. Eye distance check - face must be close enough to fill the frame
  const eyeDistance = Math.abs(leftEye.x - rightEye.x);
  const eyeDistanceOk = eyeDistance > 0.25;

  // 2. Head roll - eyes should be roughly level (similar Y positions)
  const eyeYDiff = Math.abs(leftEye.y - rightEye.y);
  const rotationOk = eyeYDiff < 0.05;

  // 3. Vertical tilt - nose tip should be between eyes vertically
  const noseTip = landmarks[1];
  const eyeMidY = (leftEye.y + rightEye.y) / 2;
  const tiltOk  = Math.abs(noseTip.y - eyeMidY) < 0.15;

  return eyeDistanceOk && rotationOk && tiltOk;
}

When all three pass with 90%+ overall detection confidence, the UI unlocks the capture button. This is the pattern used in the live camera mode on facecalculators.com and it virtually eliminates bad captures.

What I Learned

The documentation gap around no-build-tool MediaPipe setups in the browser is real. Most resources assume npm and a bundler. The combination of:

  • Pointing the FilesetResolver at the /wasm directory (not /wasm/index.js) and importing the library from the jsDelivr package root
  • Using type="module" on the script tag
  • Switching runningMode via setOptions() when sharing one landmarker across VIDEO and IMAGE inputs
  • Preloading via IntersectionObserver instead of on-demand

...is not documented anywhere as a complete pattern. Hopefully this saves someone the week of debugging it cost me.

The live implementation is at facecalculators.com - completely free, no account needed. If you want to see the alignment lock, the X-Ray scan, and the Polaroid result card in action, open it on your phone.

Happy to answer questions in the comments about any of the implementation details above.
