You write a recursive function. It runs on one core. You want parallelism, so you add threading primitives, locks, channels, async/await. The code becomes 5x more complex for 4x more performance. What if the compiler could automatically parallelize your code — including recursive algorithms that seem inherently sequential? That's Bend.
What Bend Actually Does
Bend is a programming language where every program is automatically parallel. You write normal-looking functional code — recursion, pattern matching, higher-order functions — and the Bend runtime distributes it across all available CPU cores or GPU threads. No async, no mutexes, no manual thread management.
Bend achieves this through interaction nets, a computation model that is inherently parallel. The runtime (HVM2) can execute on CPU (multi-threaded) or on a CUDA GPU. The same program that uses one core on your laptop can use thousands of threads on an NVIDIA GPU, with zero code changes.
Created by HigherOrderCO, Bend is open-source under Apache 2.0.
Quick Start
# Install (requires the Rust toolchain)
cargo install hvm
cargo install bend-lang
A simple parallel program. The recursion forks into two independent calls; that branching shape is what Bend can parallelize, while a linear chain like n + sum(n - 1) forms a sequential dependency that no runtime can speed up:

# Bend syntax (Python-like)
def sum(depth, x):
  switch depth:
    case 0:
      return x
    case _:
      # The two halves are independent, so Bend evaluates them in parallel
      fst = sum(depth - 1, x * 2 + 0)
      snd = sum(depth - 1, x * 2 + 1)
      return fst + snd

def main():
  return sum(24, 0)

Run on CPU (all cores) or GPU:
bend run program.bend     # Rust interpreter (single-threaded)
bend run-c program.bend   # Multi-threaded C backend
bend run-cu program.bend  # CUDA GPU backend

Because every level of the recursion splits into two independent subproblems, the work distributes across all available cores or GPU threads.
3 Practical Use Cases
1. Parallel Tree Processing
# Binary tree operations — automatically parallel
type Tree:
  Leaf { value }
  Node { left, right }

def tree_sum(tree):
  match tree:
    case Tree/Leaf:
      return tree.value
    case Tree/Node:
      left = tree_sum(tree.left)
      right = tree_sum(tree.right)
      return left + right

def main():
  tree = gen_tree(24)    # ~16M leaves
  return tree_sum(tree)  # Parallel across all cores/GPU
Bend's runtime automatically processes left and right subtrees in parallel — recursively, at every level.
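The example assumes a gen_tree helper. A minimal sketch (a hypothetical implementation, not part of the original post or Bend's builtins) that builds a perfect binary tree of the given depth:

```bend
# Hypothetical helper: perfect binary tree with 2^depth leaves
def gen_tree(depth):
  switch depth:
    case 0:
      return Tree/Leaf { value: 1 }
    case _:
      # In a switch, the predecessor is bound under the name `depth-1`
      return Tree/Node { left: gen_tree(depth-1), right: gen_tree(depth-1) }
```

Bend also provides a dedicated `bend` construct made specifically for generating recursive structures like this.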
2. Parallel Map/Reduce
def parallel_map(f, list):
  match list:
    case List/Nil:
      return List/Nil
    case List/Cons:
      head = f(list.head)
      tail = parallel_map(f, list.tail)
      return List/Cons(head, tail)

def expensive_transform(x):
  # Some heavy computation (fibonacci assumed defined elsewhere)
  return x * x + fibonacci(20)

def main():
  data = range(0, 100000)
  return parallel_map(expensive_transform, data)

The list spine is built one cell at a time, but each f(list.head) call is independent of the rest, so the expensive work itself runs in parallel.
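Note that range is not a Bend builtin. A minimal recursive sketch (an assumed helper, not from the original post):

```bend
# Assumed helper: the list [start, start+1, ..., end-1]
def range(start, end):
  if start < end:
    return List/Cons(start, range(start + 1, end))
  else:
    return List/Nil
```

This builds the list spine sequentially; for very large inputs, a tree-shaped generator gives the runtime more parallelism to exploit.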
3. Sorting — Parallel by Default
def merge_sort(list):
  match list:
    case List/Nil:
      return List/Nil
    case List/Cons:
      match rest = list.tail:
        case List/Nil:
          return List/Cons(list.head, List/Nil)  # Single element: already sorted
        case List/Cons:
          (left, right) = split(List/Cons(list.head, List/Cons(rest.head, rest.tail)))
          sorted_left = merge_sort(left)    # Runs in parallel
          sorted_right = merge_sort(right)  # Runs in parallel
          return merge(sorted_left, sorted_right)
Merge sort is naturally parallel — Bend exploits this without any annotation.
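The snippet leaves split and merge undefined. One possible sketch (assumed implementations, not from the original post): split deals elements alternately into two halves, and merge interleaves two sorted lists.

```bend
# Assumed helper: deal elements alternately into two halves
def split(list):
  match list:
    case List/Nil:
      return (List/Nil, List/Nil)
    case List/Cons:
      (a, b) = split(list.tail)
      return (List/Cons(list.head, b), a)

# Assumed helper: merge two sorted lists into one sorted list
def merge(xs, ys):
  match xs:
    case List/Nil:
      return ys
    case List/Cons:
      match ys:
        case List/Nil:
          return List/Cons(xs.head, xs.tail)
        case List/Cons:
          if xs.head < ys.head:
            return List/Cons(xs.head, merge(xs.tail, List/Cons(ys.head, ys.tail)))
          else:
            return List/Cons(ys.head, merge(List/Cons(xs.head, xs.tail), ys.tail))
```

merge itself walks the lists sequentially; the parallelism in merge sort comes from the two independent merge_sort calls at every level of the recursion.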
Why This Matters
Modern hardware is massively parallel (64-core CPUs, GPUs with thousands of cores), but programming models are still sequential by default. Bend flips this: parallelism is the default, sequential execution is the special case. For compute-heavy workloads — simulations, data processing, tree algorithms — this means dramatic speedups with zero complexity cost.
Bend is still experimental, but the core idea is sound: if we change the computation model, we can get parallelism for free.