DEV Community: Josiah Bryan

1x -> 10x -> 100x Engineer with Claude Code and CLIs

Josiah Bryan — Fri, 17 Apr 2026 19:08:03 +0000

It is rather amusing to me that this is the sum total of a prompt needed to get Claude Code to take a task from my task system, find the code, look at the production database, debug the problem, propose a solution, update task status for review, test it, and push the fix to production.

Prompting has gone from memorize prompts and articulate myself -> assemble a bunch of skills in a single line:

/bizcoach-task bctask_gdn03rem36apsaoax7vei8qzy /prod-data /team-debug

Creating CLIs for the tools I use everyday (bizcoach and prod DB access) + custom skills to guide CC on all of this has been such an unlock - not just for code, but also for ingesting data from all the com channels - client regularly WhatsApp me bug reports (so I created whatsapp-cli to access WhatsApp - see https://github.com/josiahbryan/whatsapp-cli), then talk about the problems in meetings (so I pull transcripts via read-ai MCP), and also file bugs in ClickUp (so I use composio-cli to talk to ClickUp), and email me screenshots even (composio-cli -> Gmail) - all orchestrated by ClaudeCode + my task app. ... then I just tell another CC window to pull the new task and do it while I go eat lunch.

People used to talk about the 10x engineer. I'd wager I'm getting up to 100x, given that I no longer need 4x juniors + Project Manager + 2x seniors to run all this - payroll used to be $12K/month for me, now it's < $1K/month - and AI spend on tokens for coding is < $3K (Cursor Pro + Claude Pro + extra usages/overages). Fun times we're living in!

I AI-Roasted 40 Famous GitHub Repos — Here's What Vibe-Coded Projects Actually Look Like

Josiah Bryan — Sat, 11 Apr 2026 19:01:59 +0000

The Setup

I've been vibe-coding for a few months — building apps with Cursor, shipping fast, feeling great about it. Then I started wondering: is any of this code actually any good?

Professional code review tools exist, but they cost $24/seat/month and produce reports that read like compliance audits. I wanted something that would give me a straight answer — and maybe make me laugh while delivering bad news.

So I built RoastMyCode.ai. It analyzes any GitHub repo and gives it a letter grade (A+ through F) with specific findings, one-liner "burns," and fix suggestions.

Then I pointed it at 40 repos — everything from React and Next.js to explicitly vibe-coded projects and AI-generated Uber clones.

Here's what I found.

The A-Tier: Suspiciously Clean

create-t3-app (A-)

The verdict: "28,775 stars and they left ONE console.log? This is the coding equivalent of finding a single crumb on a pristine white carpet."

The best burn: "5,460 lines of code and the only sin is a single console.log. This developer either sold their soul or is secretly three senior engineers in a trench coat."

One issue found. One. The T3 stack earned its reputation.

Next.js (A)

Even our analyzer struggled to find things to complain about. Lesson: projects with strong opinionated structure score well because the code has guardrails built in.

The B-Tier: Solid But With Character

React (B+)

Yes, I roasted React. The verdict: "244k stars, 1 eval() — even Facebook can't resist the forbidden fruit."

One unsafe eval() call in 244K stars of code. Meta's engineers are annoyingly competent.

chatbot-ui (B+)

The most-cloned ChatGPT UI on GitHub (33K stars). The verdict: "33k stars for leaving console.log in prod? The internet has questionable taste."

Best burn: "We sent our best bug hunters into this codebase. They came back with two mosquito bites and existential dread."

bolt.diy (B-)

The open-source Bolt.new fork. 19K stars. The irony of a vibe coding tool getting roasted by another AI was chef's kiss.

The verdict: "19k stars, 5 issues, 15k lines — either you're TypeScript wizards or the bugs are really good at hide-and-seek."

Best burns:

"Found XSS in avatar fallback — apparently even profile pictures can't be trusted"
"NetlifyTab.tsx is so large it has its own ZIP code and congressional representative"
"Using 'any' type in TypeScript is like buying a Ferrari and removing the engine"

claude-task-master (B)

Every Cursor user's favorite task manager. The verdict: "This codebase is so clean it made our bug detector file a harassment complaint."

Then it found a memory leak in the destroy method: "Even your cleanup code needs cleanup."

The D and F Tier: Code Quality Emergencies

Onlook (D) — 25K stars, YC-backed

The verdict: "25k stars but still writing 600-line God files and leaving console.logs in prod like it's 2015."

7 issues found. Two God files over 500 lines each. This is what happens when a project grows fast without architecture reviews.

openv0 (F)

An AI UI component generator. The verdict: "Nearly perfect AI playground, but running eval() on GPT output is like giving your toddler a chainsaw."

Code injection vulnerability via eval() on untrusted OpenAI API responses. This is the kind of bug that ends up in a security blog post.

The Patterns

After analyzing 40 repos, five clear patterns emerged:

1. Error handling is universally weak

This was the #1 issue across almost every repo. AI-generated code handles the happy path perfectly and skips the sad path entirely. catch (e) { } — the sound of errors being silently murdered.

2. Architecture degrades past 50 files

Small AI-generated projects are surprisingly clean. But past ~50 files, the architecture starts collapsing. AI doesn't maintain a mental model of the whole system the way a senior engineer does.

3. Console.log is an epidemic

Almost every repo ships debug logging to production. It's the #1 most common finding — breadcrumbs from development that nobody swept up. One tool called it "turning your server logs into a developer's personal diary."

4. UI code is consistently better than backend code

AI tools are genuinely good at generating frontend code. The visual feedback loop helps — you can see when something is wrong. Backend code doesn't have that luxury.

5. Established starters produce measurably better code

Projects that started from create-t3-app, next-forge, or similar opinionated starters scored 20-30% higher than from-scratch projects. Starting with good patterns matters more than which AI tool you use.

What This Means for Vibe Coders

This isn't an argument against vibe coding. Half these repos scored B or above — that's solid code by any standard.

But there are practical takeaways:

Use established starters. The structure matters more than you think.
Get some form of code review. Even automated. The AI is great at generating code; it's not great at questioning its own work.
Pay extra attention to error handling. This is where AI falls down hardest. After every AI coding session, search for empty catch blocks.
Watch the architecture past 50 files. When your project grows, take time to organize. AI won't do it for you.

Try It Yourself

RoastMyCode is free for all public repos. Paste a GitHub URL, get a roast.

https://roastmycode.ai

I'm genuinely curious what grades your projects get. Drop your results in the comments — I'll discuss any interesting findings.

Tackling JSON Perplexity in LLM Outputs: A Weekend Project

Josiah Bryan — Mon, 15 Apr 2024 05:00:42 +0000

This weekend, I dove deep into a problem we often encounter in natural language processing (NLP): ensuring the accuracy and reliability of JSON outputs from large language models (LLMs), particularly when dealing with key/value pairs.

The Challenge

We frequently face the issue of not having direct methods to measure perplexity or log probabilities on function calls from LLMs. This makes it tough to trust the reliability of the JSON generated by these models, especially when it's critical to ensure that each key and value in our outputs not only makes sense but is also based on predictable patterns.

My Solution

To address this, I developed a robust JSON parser. The goal was to extract JSON directly from the stream of log probabilities provided by OpenAI when generating text outputs that contain JSON elements. This parser isn't just about pulling JSON out of the text—it's smart enough to calculate the perplexity and probabilities for each key/value, ensuring that what we get is as accurate as it can be. While JSON parsing can get a bit complex, and my solution isn't flawless, it has passed all my tests and is proving quite robust for my needs.

For example: for a given JSON object generated by an LLM, such as:

{ formalName: 'Josiah Bryan', nickname: 'Joey', ageGuess: 28 }

For that same object, my parser can generate a metadata object with the following data:

{
  formalName: {
    key: 'formalName',
    value: 'Josiah Bryan',
    keyProb: 0.999996,
    valueProb: 0.999957,
    keyPerplexity: 1.000001,
    valuePerplexity: 1.000014,
    finished: true
  },
  nickname: {
    key: 'nickname',
    value: 'Joey',
    keyProb: 0.999996,
    valueProb: 0.872926,
    keyPerplexity: 1.000004,
    valuePerplexity: 1.070314,
    finished: true
  },
  ageGuess: {
    key: 'ageGuess',
    value: 28,
    keyProb: 0.999994,
    valueProb: 0.594872,
    keyPerplexity: 1.000003,
    valuePerplexity: 1.681035,
    finished: true
  }
}

(The finished prop in this example is useful when parsing a stream of chunks. When parsing JSON from a firehose like that, the finished prop is false while the parser is still consuming more tokens for the value. Once the parser hits an end token (e.g. , or ", etc), it flips finished to true so you know the value is final.)

Why It's Cool

This is made practically useful part with a custom yup decorator to actively manage the model's output. If the parser detects that the perplexity of a generated content goes above our comfort threshold, it can automatically tweak the prompt or inject additional grounding into the model’s inputs. This ensures that the generated JSON is not only precise but also deeply rooted in factual accuracy.

For example, here's how the schema is specified with custom max perplexity values per field:

const schema = yup.object().shape({
    formalName: yup
        .string()
        .required()
        .description('Formal name')
        .perplexity({ max: 1.125 }),
    nickname: yup
        .string()
        .required()
        .description('Generated nickname')
        .perplexity({ max: 1.5 }),
    ageGuess: yup
        .number()
        .required()
        .description('Generated age guess')
        .perplexity({ max: 99 }),
});

Then, when passing that to the coaxLLm method, we can also include a callback to add more grounding when perplexity is too high on a given field:

const { content, object, objectWithMetadata, failure } = await coaxLlm({
    prompt,
    schema,
    logger,
    langfuseTrace,
    cacheMode: 'save',
    failureInjectCallback: async ({ type, path }) => {
        if (
            type === 'perplexity' && 
            ['nickname', 'formalName'].includes(path)
        ) {
            return [`My name is: "${authorization.user.name}"`];
        }

        return [];
    },
});

Just in time for a busy upcoming week, this tool has become an indispensable asset in my toolkit, enhancing the grounding of LLM outputs and significantly speeding up JSON generation—a win-win for any developer.

Check Out the Code

Interested in seeing this in action or integrating it into your own projects? Here’s the link to the full code on how to coax and re-ground the LLM effectively: coax-llm.js.

Bonus: Real-Time Streaming

This parser also works seamlessly with streaming outputs from LLMs. This means we can fetch JSON objects and log probabilities in real-time, without waiting for the entire text generation to complete. It’s efficient and allows for immediate adjustments or error handling, boosting both performance and reliability.

Dive Deeper

For those who love digging into the nuts and bolts, here’s a direct link to the parser itself: logprobsToAnnotatedJson.js.

While I haven’t made the underlying detailed benchwork public, the gists provided are self-contained and full of actionable insights. They're not just theoretical but are primed for real-world application, and I'm using them personally in production (pushing them to my k8s cluster tonight, even as I type.)

Looking forward to your thoughts and any feedback you might have!

React Draggable Bottom Panel

Josiah Bryan — Sat, 21 Aug 2021 19:01:32 +0000

TL;DR: Source for BottomPanel.jsx and BottomPanel.module.scss is at
https://gist.github.com/josiahbryan/c220708256f7c8d79760aff37f64948f.

Live Demo: https://josiahbryan.com/#/bottompanel-demo

I've been working on a couple of different projects lately, one involves working on the next-generation marketplace for fringe.us, and the other project is an app for a luxury driving service.

Both of these projects called for a bottom panel that can be partially exposed and then dragged/swiped up to reveal content.

I searched high and low and could not find any acceptable implementations of just such a UI component in React - which was rather shocking, I thought surely someone had solved this rather common UI paradigm already for React!

I found many implementations of the paradigm in non-web-React formats, here's a couple examples that show what I wanted:

React Native: https://github.com/enesozturk/rn-swipeable-panel
Flutter: https://github.com/enesozturk/rn-swipeable-panel

Both of those packages look beautiful and I would love to use them! However, the projects I'm working on require React in a browser, so those packages are not options.

I almost gave up on finding a solution, but yesterday I decided to give it one last try. I thought surely I can implement it myself! I first tried extracting the SwipeableDrawer component from @material-ui's source, but that proved incredibly painful and never got that working.

Then I tried writing a simple implementation of a drawer myself using react-swipeable's awesome hook. That worked okay, but the FPS (especially on mobile) was HORRIBLE. I'm talking ~10-~12 fps when dragging. NOT accetable.

Then, almost as if by providence, I stumbled upon this section in react-swipeable's docs: https://github.com/FormidableLabs/react-swipeable#how-to-use-touch-action-to-prevent-scrolling - that mentioned a package I hadn't looked at yet, use-gesture. By this point, I was exhausted from reading docs and thought that I would just glance at that package, but didn't think anything would be useful. Boy, was I wrong.

I read the docs in use-gesture and was subtly impressed. Then I found their examples page, which led me to their example for an "Action Sheet": https://codesandbox.io/embed/zuwji?file=/src/index.js&codemirror=1 - needless to say, I was incredibly impressed!

I set about porting their code with very minimal tweaks into a reusable BottomDrawer component that had the various extra niceties I wanted:

Drag handle at the top
Customizable open size / closed size
Scrollable content area inside the sheet

After a good two hours of banging my head against the keyboard, I finally solved all the things I needed and created the following beautiful component (screenshot is at the top of this post). I call it <BottomPanel> - I know, so original - my excuse is I like to KISS.

To see a live working example of this component, head over to my website:

https://josiahbryan.com/#/bottompanel-demo

Example of <BottomPanel> closed:

Example of <BottomPanel> open:

Usable like this:



<BottomPanel
    maxOpenHeight={window.innerHeight * 0.8} // px
    closedPanelSize={200} // px
>
    <LoremIpsum />
</BottomPanel>

You can find the full source for BottomPanel.jsx and the required styles (BottomPanel.module.scss) in the following gist:
https://gist.github.com/josiahbryan/c220708256f7c8d79760aff37f64948f.

Cheers!
-Josiah Bryan

LERPing and Cleaning Data to Improve AI Classification

Josiah Bryan — Fri, 27 Sep 2019 15:15:39 +0000

More Training

After my last post on WalkSafe and machine learning classification on running, I've spent a lot of time testing WalkSafe in real-world scenarios personally. I've been mostly favorably impressed with the performance of the classification, but there's been something in the back of my mind telling me I could do better.

I was experiencing a number of false-positives (Driving slow looked like Running, for example, or Walking fast looked like Running), so I decided to retrain my neural network to better generalize for unseen conditions and improve general classification performance from my last article.

Three Big Gains

1. Normalize

The first and biggest gain came when I realized that I was feeding raw speeds (15 m/s, for example) into the neural network and I discovered that it might perform better on 0-1 ranged data. So, I setup a simple normalization routine to normalize/unnormalize the data by setting a MAX speed. Basically, I took the raw speed points and did this for every point:

const inputSpeed = rawSpeed / MAX_SPEED

For my app, I've decided to use 33 m/s as a max speed, which is roughly 75 mph or 110 kph.

I did try experimenting with bucketing speeds (e.g. "snapping to a grid" or rounding to every 2 m/s), as well as averaging speeds together (average two readings into one). These were both done in an attempt to get the network to better generalize with unseen data. However, testing with datasets the network had not seen (and even recall tests) showed that bucketing and averaging produced significant DROPS in performance (recall and generalization.) Therefore, those techniques were discarded.

2. Training Set Structure

Another gain, albeit somewhat smaller, was made by changing the way I loaded by test data.

Originally, I loaded all the data from ~8 separate CSV files, then concatenated all those points into a single array, and finally made ngrams out of that array of points.

This had the unrealized effect of making ngrams out of two separate data sets - when one set ended and the new set was concatenated onto the end, an ngram could span both sets.

Therefore, in order not to "confuse" the network by feeding it training data that was not real, I changed the loading process to something like this:

const csvData = [
   getCsv('file1.csv'),
   getCsv('file2.csv'),
   getCsv('file3.csv')
];

const trainingData = csvData
  .map(lerpData) // see #3 "fill in the gaps", below
  .map(makeNgrams) // from last article: [1,2,3,4] into [[1,2],[3,4]]
  .reduce((list, ngrams) => list.concat(ngrams), []);

The end result is still a giant set of training data points in trainingData, but it doesn't concatenate the points from the different data sets together until after they've been properly transformed

3. Fill in the Gaps

The second largest fundamental generalization and classification gain was made when I realized that there were gaps in the GPS speed readings. Which, of course, is obvious in a real-world collection scenario. However, I came to the conclusion that training the network on a speed transition of 1m/s > 5m/s without any context as to how fast that transition happened would be to deprive it of valuable contextual information that could aid in classification.

In order to capture this concept of time, I decided to normalize the inputs so that every input into the network represented a finite set of time stamps with a finite interval between each input. (Before, every input was NOT guaranteed to have a finite, fixed interval between each input.)

In order to accomplish this "finite, fixed interval" guarantee, I used a very simple concept, Linear interpolation.

Thanks to mattdes on GitHub, I've found this lerp function (MIT licensed) useful in a number of my projects and I've reused it many times. Here it is in it's entirety:

//https://github.com/mattdesl/lerp/blob/master/index.js
function lerp(v0, v1, t) {
    return v0*(1-t)+v1*t
}

The entirety of my lerping routine to normalize my data is shown below, in hopes that perhaps someone else might find it useful.

In short, it takes a set of points that look like {speed:1.5, timestamp: '2019-09-26 02:53:02'}, and if the points are more than 1 second apart, this routine interpolates the speeds between the two points at 1-second steps.

The return list from this routine will be "guaranteed" to have data at 1 second intervals, so that every point into the neural network is guaranteed to have a difference of 1 second. This allows the network to better capture the idea of "speed of change" in the readings.

function lerpRawData(rawData) {
    const lerped = [];
    rawData.map((row, idx) => {

        const speed = parseFloat(row.speed);
        if(idx === rawData.length - 1) {
            // at end, don't do lerp
            lerped.push({ ...row });
            return;
        }

        // Already checked if we're at end, so this doesn't need check
        const nextIdx  = idx + 1,
            nextRow    = rawData[nextIdx],
            thisTime   = new Date(row.timestamp).getTime(),
            nextTime   = new Date(nextRow.timestamp).getTime(),
            nextSpeed  = parseFloat(nextRow.speed), 
            delta      = nextTime - thisTime;

        // Step between the two timestamps in 1000ms steps
        // and lerp the speeds between the timestamps based on percent distance
        for(let time=thisTime; time<nextTime; time+=1000) {
            const progress   = (time - thisTime) / delta;
            const interSpeed = lerp(speed, nextSpeed, progress);
            const interTimestamp = new Date(time);
            const d = {
                ...row,
                timestamp: interTimestamp,
                speed:     interSpeed,
                progress, // just for debugging
            };

            // Just for debugging
            if(time > thisTime && time < nextTime)
                d._lerped = true;

            lerped.push(d);
        }
    });
    return lerped;
}

4. Hidden Layers

I know the headline said three big gains, but it's worth mentioning here that an additional hidden layer appeared to aid in generalization as well. My hidden layer setup now looks like this:

hiddenLayers: [ inputSize * 2, inputSize * 1.5 ]

This produces a network similar to this hackish pseudocode:

inputSize = 4
[ * , * , *, * ] # inputs (ngram size)
[ * , * , *, * , *, *, * ] # hidden layer 1
[ * , * , *, * , * ] # hidden layer 2
[ * , * , *, * ] # outputs (4 classes)

Conclusion

With these tweaks, my network now has slightly reduced recall across the board but exhibits consistently improved generalization. Performance on unseen data is now consistently at greater than 85% accuracy.

Personal Safety, GPS, and Machine Learning: Are You Running from Danger?

Josiah Bryan — Fri, 20 Sep 2019 15:15:29 +0000

Imagine that you're getting a text every minute from your best friend, and all it has in that text is their current speed. Then you have to write back to them what you think they're doing - are they walking, running, driving, or sitting still?

In my app, I went from "Hey, I've got some GPS points being streamed to my server" to "real-time machine learning classification triggering push notifications" and it took me less than a day of coding. Here's how I did it.

Walk Safe

That's exactly the scenario I'm addressing in an app I'm making. I get a GPS speed reading from the user, and I want to know if they're walking, running, etc. This app is called "WalkSafe", and making it available for free in the Play Store and App Store. (Not published yet - still in the review stages, hence why I have time to blog while waiting for the reviewers to approve it!)

I decided to create WalkSafe after my sister moved into an apartment with her young son where she felt very unsafe. It was a good move for her, but being a single mom and out at night alone - well, she felt unsafe. My family lived near by, but sometimes she might not be able to whip out her phone and call if something happened. Enter the idea for "WalkSafe."

With WalkSafe, you can set a timer when you're in danger. If the timer goes off before you stop it, a SMS and voice phone call is sent to your emergency contacts with your location and any notes you enter. Of course, if you get to where you're going safely, you just stop the timer and all is well! But if you can't stop it for whatever reason, our cloud servers will monitor your timer and if it goes off, the SOS is sent immediately. That means that even if your phone is destroyed, offline, or no service, the SOS still gets sent.

When you set the timer in WalkSafe, it starts recording your GPS location and streaming it to the server for the duration of the timer. No GPS is stored before or after, only while you're in danger. However, I felt like simply logging the GPS while in danger wasn't enough. I thought there might be some way I can use the GPS to try to tell if the person using the app is in danger (or safe) without their interaction.

Drawing the Line

That's how we arrive at this example at the start - how do we interpret a stream of speeds coming in with no other context? How do we decide if it represents running/driving/walking/etc?

Sure, sitting still is easy. Less than 0.5 m/s? Probably sitting still. What about driving? Over 15 m/s? Yeah, probably driving. But then it get's fuzzy. Where do you draw the line at for walking? Running? How do you tell running from driving based on just speed?

To answer those questions, you can do one of two things (or three, but I'll get back to that.) You can either:

Write a bunch of if/then statements, taking into account the last few speed readings from them, how long they've been at that speed, what they did this time yesterday, etc.
Train a simple neural network to classify data for you while you sit and drink tea.

Obviously, since this post is tagged #machinelearning, I decided to use a neural network.

In my case, I used the excellent brain.js library since I was writing my server in javascript. I've also used brain.js in the bast, and I've found it to be incredibly easy to use and quick to pick up and implement in a project.

All in all, going from "Hey, I've got some GPS points being streamed to my server" to "real-time machine learning classification triggering push notifications" took me less than a day of coding. Here's basically how I did it.

Client-side, I'm using the Cordova project to make the Android/iOS apps, writing my UI in React, and utilizing the excellent @mauron85/cordova-plugin-background-geolocation plugin to stream GPS to my server in the background.

Server-Side Magic

The server is where the magic happens.

Everyone knows that to train a neural network you need labeled data. You put data in, run the training, get a trained set of weights, then use it later. Pretty simple, yes? Well, allow me to walk you though how I did it and the interesting parts along the way.

Gathering Data

I started by just logging a ton of GPS points from my own usage of the app. Over the course of two days, I logged GPS points when I was walking, running, driving, walking to my car and driving, running up to my car and driving, driving, parking, then walking, and many other scenarios. I kept a notebook with timestamps of when I did each action as well.

Labeling Data

Later, I dumped the timestamps and speeds to a CSV file and applied a simple naïve pre-labeling of the speeds. (E.g. 0m/s=STILL, <2m/s=WALKING, <10m/s=RUNNING, >10m/s=DRIVING) Then I opened each of the CSV files and compared the timestamps to my notebook, making sure the naïve labels were correct. Changed a lot of DRIVING>RUNNING or RUNNING>DRIVING when I was driving slow, stuff like that. When I was done, I had a set of ~5,000 speed measurements in CSV files, all hand-labeled with activity labels from a simple set of STILL, WALKING, RUNNING, or DRIVING.

Formatting Data: N-Grams

Now I had a set of speed measurements in sequence, looking something like:

[ 0, 1.2, 0.78, 1.9, 2.1, 1.8, 2.8, 3.3, 3.6, 4.1, 3.3, 4.9, 5.7 ]

Can you see anything interesting in that? (Assume they are meters per second) If you look carefully, you'll notice an uptick where they start to trend above 2 m/s for a while - right there is where I started to run. Before that, I was walking.

In order to capture sequentiality in my data, I decided to train my network with a set of points representing the previous X values, with the final value being the "current" point we are classifying. This is similar in concept to n-grams in language modeling, where they break up a sequence of text into a set of finite item sets. Ex. given "abcd" and an n-gram size of two, we could generate "ab", "bc", "cd".

Therefore, I wrote a simple makeNgramsTrainingNN routine that took the raw stream of speeds and packaged them into sets of speed readings. It was a lot like taking a sliding window of a fixed size and running it over my data set, one item at a time, and recording each set of data inside the window as a new "n-gram". So my makeNgramsTrainingNN routine would take an array of speed objects (speed and label), and return a new array that looked like this:

[
  { input: { speed0: 0, speed1: 1.2, speed3: 0.78 }, output: { WALKING: 1 } }, 
  { input: { speed0: 1.2, speed1: 0.78, speed3: 1.9 }, output { WALKING: 1 } },
  { input: { speed0: 0.78, speed1: 1.9, speed3: 2.1 }, output { WALKING: 1 } }
]

The label is always the label from my hand-edited data set for the last speed value in the n-gram.

Training the Neural Network

Then, I had to decide how I wanted to train my network - and what type of network to use. After much trial and error, I found that brain.CrossValidate worked amazingly well to reduce error rates.

Once I had all my n-grams in a nice big ngrams array, all I had to do to train the network was this:

const trainingOptions = {
    iterations: 35000,
    learningRate: 0.2,
    hiddenLayers: [ngramSize+2],
    log: details => console.log(details),
};

// Use CrossValidation because it seems to give better accuracy
const crossValidate = new brain.CrossValidate(brain.NeuralNetwork, trainingOptions);

// Found it doesn't do us any good to specify kfolds manually
const stats = crossValidate.train(ngrams, trainingOptions);

// Convert the CV to a nerual network for output (below)
const net = crossValidate.toNeuralNetwork();

Once I had the network trained, I saved it to a json file so I could use it in real time to classify GPS:

// Stringify the nerual network 
const json = JSON.stringify(net.toJSON());
const outFile = 'gps-speed-classifier.net.json';
fs.writeFileSync(outFile, json);

It was pure trial and error to discover that iterations of 35000 was a good number, and to discover that adding a hidden layer sized at my ngramSize + 2 was a good number. All just testing and re-testing and seeing what error rates came out.

For what it's worth, I'm using an ngramSize of 6 - which means my neural network sees 6 speed readings at once to make it's classification decision. I've configured the GPS plugin client-side to try to send me GPS readings every 1000ms, so an ngram size of 6 means approx 6 seconds of data is used in training and classification. It's important to note that I must use the same ngram size when using the trained network in production.

Proving to Myself it Worked

To test the error rates, first I bucketed all my training ngrams by class and tested the recall rates on each of the classes. I considered the training a success when I received >95% recall rate for every class.

The final test I did on every trained network was to take a single "session" of data and run it through as if it was being streamed live, and compare the predicted labels with the hand-labeled data. Once I hit over 90% accuracy on that, I was happy.

Getting from "hand labeling data sets" to finally having a trained network that I was happy with took roughly 6 hours or so of testing and trial and error.

Integrating the Trained Network into the App

Integrating it into the app was a very quick process by comparison - maybe two hours, if that. I created a "simple" class I call GpsActivityClassifier that loads the trained network weights from gps-speed-classifier.net.json. This class is responsible for the classification and updating of the user's "motionState"

The app's API into the GpsActivityClassifier is deceptively simple:

const result = await GpsActivityClassifier.updateUserMotionState(gpsLogEntry);

The gpsLogEntry is our internal database record for the current GPS entry. Really the only thing the classifier needs from the log entry is the speed, the current timer, and the user that we're classifying.

Internally, it is rather simple, but the code looks a bit more complex, so I'll break it down here. Internally, updateUserMotionState looks something like this:

Take the timestamp of the given gpsLogEntry and load the previous ngramSize entries for the current timer
Convert that list of X entries (which looks like [{speed:0.1,...},{speed:0.5,...}, {speed:1.23,...}, ...]) into a single ngram object that looks like {speed0:0.1, speed1:0.5, speed2:1.23, ...}. The conversion code looks like:

const ngram = {};
Array.from(speedValues)
    .slice(0, TRAINED_NGRAM_SIZE)
    .forEach((value, idx) => ngram[`speed${idx}`] = value);

After making the ngram, it uses the preloaded brain.js NeuralNetwork object (with weights already loaded from disk) to run the ngram like this:

const rawClassification = this.net.run(ngram);
const classification = maxClass(rawClassification);

The utility maxClass(...) just takes the raw output of the final layer of the network and returns the predicted class label that has the highest probability.

Pressure to Change

At this point, we have a predicted label (predictedState) for the gpsLogEntry. But here's where we do that "third thing" we hinted at earlier in this blog.

Instead of just applying the predictedState directly to the user and calling it that user's current motionState, we apply a little bit of hard logic to the state.

We don't just want the user's motionState to oscillate wildly if the classification changes quickly from one point to the other, so I built in a simple "pressure" mechanism whereby the prediction must stay stable for at least CLASSIFICATIONS_NEEDED_TO_CHANGE counts. Through trial and error, I found 5 to be a good number.

That means that for a given gpsLogEntry, the classifier may return RUNNING. Only after it returns RUNNING for five continuous gps readings do we then update the user's motionState. Should the classifier go to a different classification before it hits 5 times, the counter starts over. (For example, if on the 3rd point the classifier returns DRIVING, we reset the counter and wait for 5 points until we actually set the user's motionState to DRIVING.)

Change is Good (or Bad)

Once the counter to change motionStates is actually met, we update the user record in the database with the new motionState and return to the caller of our GpsActivityClassifier.updateUserMotionState method an object that looks like { changed: "DRIVING", confidence: 0.98, previousState: "RUNNING" }. I consider this an "event", since we only get a return value of { changed: truthy } if the user's motionState ACTUALLY changed. All other times, if classification stayed the same or was "about to change", the object would look like {changed: false, ...}.

So what do we do with a changed event when it occurs?

In the case of WalkSafe, what we do with this event is we run a bit of "business logic" when the change happens. We take the stateFrom (previousState) and the stateTo (changed), build up a simple transition map (txMap) that defines valid/useful transitions, and then react accordingly.

For kicks and grins, here's what our txMap looks like in WalkSafe:

const { WALK, RUN, DRIVE, STILL } = GpsActivityClassifier.CLASSIFIER_STATES,
    OK_30   = 'OK_30',
    OK_60   = 'OK_60',
    SAFE_60 = 'SAFE_60',
    SAFE_5  = 'SAFE_5',
    NOOP    = 'NOOP',
    txMap   = {
        [ WALK + RUN  ]: OK_30,
        [STILL + RUN  ]: OK_30,
        [DRIVE + RUN  ]: OK_60,
        [STILL + DRIVE]: SAFE_60,
        [ WALK + DRIVE]: SAFE_60,
        [  RUN + DRIVE]: SAFE_60,
        [  RUN + WALK ]: SAFE_5,
        [  RUN + STILL]: NOOP,
        [ WALK + STILL]: NOOP,
        [DRIVE + STILL]: NOOP,
        [STILL + WALK ]: NOOP,
        [DRIVE + WALK ]: NOOP,
    };

Then we just query the txMap when the user's motionState changes with the from and the to state, and react accordingly. For illustrations sake, here's what that looks like as well:

const txTest = stateFrom + stateTo,
    txAction = txMap[txTest];

if(!txAction) {
    // Should never encounter, but if we find a tx we don't have defined,
    // we throw which should be caught by Sentry and dashboarded/emailed
    throw new Error(`Undefined transition from ${stateFrom} to state ${stateTo})`);
}

switch(txAction) {
    case OK_30:
    case OK_60: {
        const time = txAction === OK_60 ? 60 : 30;
        return await this._txAreYouInDanger({ time, stateTo, stateFrom, ...props });
    }
    case SAFE_60:
    case SAFE_5: {
        const time = txAction === SAFE_60 ? 60 : 60 * 5;
        return await this._txAreYouSafe({ time, stateTo, stateFrom, ...props });
    }
    default: 
        // NOOP;
        break;
}

Won't go into detail on the _txAreYouSafe or _txAreYouInDanger functions, but they basically add to (if safe) or set (if in danger) the remaining time in the running timer, and then send a push notification via Firebase to the user's device.

To tie a bow on it though, here's what it looks like to send the push notification shown in the screenshot at the top of this article:

// Triggered possible danger scenario, so reduce time remaining
// to only `time` seconds...
await timer.setSecondsRemaining(time);

// Alert the user to this change ...
user.alert({
    // Channel is Android-specific and MUST EXIST OR 
    // NO NOTIFICATION DELIVERED on Androids. 
    // See list in client/src/utils/NativePushPlugin of valid channels.
    channel: "sos",
    title: "Are you running??",
    body:  `
        If you're not okay, KEEP RUNNING! We'll send an SOS in 
        less than a minute unless you stop the timer or add more time. 
        Don't stop unless it's safe to do so!
    `,

    // onClick is base64-encoded and sent via Firebase 
    // as the action URL for this push notification
    onClick: {
        // This event key is "special":
        // When the user clicks on the notification,
        // our app will emit this event on the ServerStore object...
        // Any other properties in this onClick handler are passed as
        // a data object to the event. This is emitted in PushNotifyService.
        // Obviously, the event does nothing unless some other part of the
        // app is listening for it.
        event:  'gps.areYouInDanger',
        // Extra args for the event:
        timerId: timer.id,
        stateTo, 
        stateFrom,
    },
});

Walk Safely but Run if Needed, We've Got You

The combination all of this effects an additional safeguard for people using WalkSafe. If they set a danger timer, but start running in the middle of the timer, the server will recognize this state change, reduce the time left on the timer so it will send an SOS right away if they are in fact running from danger.

And that's how we tie Personal Safety, GPS, and Machine Learning together to improve the real-world safety of people who use a simple personal safety SOS timer!

Beta Testers Wanted

If you want to test out this app, send me a message. Or if you're interested in working with me on the app, I'd be open to talking! And if you're interested in hiring me for consulting work - drop me a line as well! You can reach me at josiahbryan@gmail.com. Cheers and crackers!