<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: James Briggs</title>
    <description>The latest articles on DEV Community by James Briggs (@jamescalam).</description>
    <link>https://dev.to/jamescalam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F300902%2F6d7c3c32-1ed3-4d1f-9f9d-d2b55291a2ac.jpg</url>
      <title>DEV Community: James Briggs</title>
      <link>https://dev.to/jamescalam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jamescalam"/>
    <language>en</language>
    <item>
      <title>OpenAI's Whisper is Insane!</title>
      <dc:creator>James Briggs</dc:creator>
      <pubDate>Fri, 28 Oct 2022 07:40:09 +0000</pubDate>
      <link>https://dev.to/jamescalam/openais-whisper-is-insane-5b7a</link>
      <guid>https://dev.to/jamescalam/openais-whisper-is-insane-5b7a</guid>
      <description>&lt;p&gt;OpenAI, the company behind GPT-3 and DALL-E 2, has done it again. This time they've built a state-of-the-art transcription and translation model, and it's &lt;em&gt;open source&lt;/em&gt;!&lt;/p&gt;

&lt;p&gt;Check out &lt;a href="https://www.pinecone.io/learn/openai-whisper/"&gt;this article explaining &lt;em&gt;why&lt;/em&gt; OpenAI's Whisper&lt;/a&gt; is so important, how it works, and how to apply it to a real-world use case (making YT search better).&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>news</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Learn (a lot)</title>
      <dc:creator>James Briggs</dc:creator>
      <pubDate>Wed, 12 Oct 2022 12:35:04 +0000</pubDate>
      <link>https://dev.to/jamescalam/how-to-learn-a-lot-36e0</link>
      <guid>https://dev.to/jamescalam/how-to-learn-a-lot-36e0</guid>
      <description>&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/QGdbs2S2YHk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The modern world rewards those who can learn and adapt quickly, both financially and in personal fulfilment. Learning new skills keeps life interesting and fresh, but it isn't easy to do.&lt;/p&gt;

&lt;p&gt;Continuous learning can fry your brain very quickly. Yet, there's nothing better than expanding the scope of your world in the way that learning can do.&lt;/p&gt;

&lt;p&gt;How do we learn (a lot) optimally without hitting roadblocks like burnout?&lt;/p&gt;

</description>
      <category>leadership</category>
      <category>tutorial</category>
      <category>devjournal</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Multi-modal (image+text) ML</title>
      <dc:creator>James Briggs</dc:creator>
      <pubDate>Thu, 11 Aug 2022 15:08:00 +0000</pubDate>
      <link>https://dev.to/jamescalam/multi-modal-imagetext-ml-5d1e</link>
      <guid>https://dev.to/jamescalam/multi-modal-imagetext-ml-5d1e</guid>
      <description>&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/989aKUVBfbk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>deeplearning</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>nlp</category>
    </item>
    <item>
      <title>OpenAI Question-Answering Demo</title>
      <dc:creator>James Briggs</dc:creator>
      <pubDate>Thu, 07 Jul 2022 15:14:30 +0000</pubDate>
      <link>https://dev.to/jamescalam/openai-question-answering-demo-2299</link>
      <guid>https://dev.to/jamescalam/openai-question-answering-demo-2299</guid>
      <description>&lt;p&gt;Demo app showing off the cutting edge of language AI, and how to build it.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/coaaSxys5so"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Python 3.10 is out now!</title>
      <dc:creator>James Briggs</dc:creator>
      <pubDate>Tue, 05 Oct 2021 10:43:46 +0000</pubDate>
      <link>https://dev.to/jamescalam/python-3-10-is-out-now-j3g</link>
      <guid>https://dev.to/jamescalam/python-3-10-is-out-now-j3g</guid>
      <description>&lt;p&gt;The latest release of Python is out (as of 4th October 2021); it includes several fascinating new features, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Structural Pattern Matching&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parenthesized Context Managers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;More&lt;/em&gt; typing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New and improved error messages&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out &lt;a href="https://towardsdatascience.com/whats-new-in-python-3-10-a757c6c69342"&gt;this article covering all of Python 3.10's new features&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Or this video from the article:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/5-A435hIYio"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>programming</category>
      <category>python</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>BERT WordPiece Tokenizer Explained</title>
      <dc:creator>James Briggs</dc:creator>
      <pubDate>Tue, 14 Sep 2021 14:16:57 +0000</pubDate>
      <link>https://dev.to/jamescalam/bert-wordpiece-tokenizer-explained-1i3o</link>
      <guid>https://dev.to/jamescalam/bert-wordpiece-tokenizer-explained-1i3o</guid>
      <description>&lt;p&gt;Building a transformer model from scratch can often be the only option for more specialized use cases. Although BERT and other transformer models have been pre-trained for many languages and domains, they do not cover everything.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/cR4qMSIvX28"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Often, these less common use cases stand to gain the most from having someone come along and build a specific transformer model. It could be for an uncommon language or a less tech-savvy domain.&lt;/p&gt;

&lt;p&gt;BERT is the most popular transformer for a wide range of language-based machine learning — from sentiment analysis to question answering. BERT has enabled a diverse range of innovation across many borders and industries.&lt;/p&gt;

&lt;p&gt;The first step for many in designing a new BERT model is the tokenizer. In this article, we’ll look at the WordPiece tokenizer used by BERT — and see how we can build our own from scratch.&lt;/p&gt;
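
&lt;p&gt;To give a flavour of what the article builds (a toy sketch with a hypothetical vocabulary, not the article's actual code), WordPiece encodes each word greedily, longest match first, with &lt;code&gt;##&lt;/code&gt; marking continuation pieces:&lt;/p&gt;

```python
# Toy greedy longest-match-first WordPiece encoding with a hypothetical
# vocabulary; "##" marks a piece that continues a previous one.
vocab = {"hug", "##ging", "##s", "face", "[UNK]"}

def wordpiece(word: str) -> list[str]:
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # find the longest vocabulary entry matching from `start`
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        if end == start:  # no vocabulary entry matched: whole word is unknown
            return ["[UNK]"]
        start = end
    return pieces

print(wordpiece("huggings"))  # ['hug', '##ging', '##s']
```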

&lt;p&gt;&lt;a href="https://towardsdatascience.com/how-to-build-a-wordpiece-tokenizer-for-bert-f505d97dddbb"&gt;Full walkthrough&lt;/a&gt; &lt;em&gt;or &lt;a href="https://towardsdatascience.com/how-to-build-a-wordpiece-tokenizer-for-bert-f505d97dddbb?sk=eea06e01c9faecd939e10589e9de1291"&gt;free link&lt;/a&gt; if you don't have Medium!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why are there so many tokenization methods in Transformers?</title>
      <dc:creator>James Briggs</dc:creator>
      <pubDate>Sat, 31 Jul 2021 13:14:32 +0000</pubDate>
      <link>https://dev.to/jamescalam/why-are-there-so-many-tokenization-methods-in-transformers-1n0l</link>
      <guid>https://dev.to/jamescalam/why-are-there-so-many-tokenization-methods-in-transformers-1n0l</guid>
      <description>&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/bWLvGGJLzF8"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;HuggingFace's transformers library is the de facto standard for NLP - used by practitioners worldwide, it's powerful, flexible, and easy to use. It achieves this through a fairly large (and complex) codebase, which has prompted the question:&lt;/p&gt;

&lt;p&gt;"Why are there so many tokenization methods in HuggingFace transformers?"&lt;/p&gt;

&lt;p&gt;Tokenization is the process of encoding a string of text into transformer-readable token ID integers. In this video we cover five different methods for this - do these all produce the same output, or is there a difference between them?&lt;/p&gt;
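
&lt;p&gt;In its simplest form (a toy sketch with a made-up vocabulary - real tokenizers learn subword vocabularies of tens of thousands of entries), tokenization is a lookup from tokens to integer IDs:&lt;/p&gt;

```python
# Toy tokenizer: map whitespace-split tokens to integer IDs via a vocabulary.
# The conventions mimic BERT's ([CLS]/[SEP] wrappers, [UNK] for unknowns),
# but the vocabulary and IDs here are hypothetical.
vocab = {"[CLS]": 101, "[SEP]": 102, "[UNK]": 100, "hello": 7592, "world": 2088}

def tokenize(text: str) -> list[int]:
    tokens = ["[CLS]"] + text.lower().split() + ["[SEP]"]
    return [vocab.get(t, vocab["[UNK]"]) for t in tokens]

print(tokenize("hello world"))  # [101, 7592, 2088, 102]
```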

&lt;p&gt;📙 Check out the &lt;a href="https://towardsdatascience.com/why-are-there-so-many-tokenization-methods-for-transformers-a340e493b3a8"&gt;Medium article&lt;/a&gt; or if you don't have Medium membership here's a &lt;a href="https://towardsdatascience.com/why-are-there-so-many-tokenization-methods-for-transformers-a340e493b3a8?sk=4a7e8c88d331aef9103e153b5b799ff5"&gt;free access link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also made an NLP with Transformers course, here's &lt;a href="https://www.udemy.com/course/nlp-with-transformers/?couponCode=MEDIUM2"&gt;70% off&lt;/a&gt; if you're interested!&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Angular App Setup With Material - Stoic Q&amp;A #5</title>
      <dc:creator>James Briggs</dc:creator>
      <pubDate>Tue, 20 Jul 2021 14:42:27 +0000</pubDate>
      <link>https://dev.to/jamescalam/angular-app-setup-with-material-stoic-q-a-5-4cgo</link>
      <guid>https://dev.to/jamescalam/angular-app-setup-with-material-stoic-q-a-5-4cgo</guid>
      <description>&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/ee71R4Cqb5o"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Video #5 in a Stoic Q&amp;amp;A ML app build I'm working through - here I'm setting up the Angular project with Google's Angular Material design.&lt;/p&gt;

</description>
      <category>deeplearning</category>
      <category>machinelearning</category>
      <category>webdev</category>
      <category>angular</category>
    </item>
    <item>
      <title>Building a MLM Training Input Pipeline</title>
      <dc:creator>James Briggs</dc:creator>
      <pubDate>Mon, 05 Jul 2021 16:50:03 +0000</pubDate>
      <link>https://dev.to/jamescalam/building-a-mlm-training-input-pipeline-3d3o</link>
      <guid>https://dev.to/jamescalam/building-a-mlm-training-input-pipeline-3d3o</guid>
      <description>&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/heTYbpr9mD8"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The input pipeline of our training process is the most complex part of the entire transformer build. It consists of taking our raw OSCAR training data, transforming it, and preparing it for Masked-Language Modeling (MLM). Finally, we load our data into a DataLoader, ready for training!&lt;/p&gt;
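
&lt;p&gt;The core masking step can be sketched roughly like this (a simplified illustration with toy token IDs, not the video's exact code):&lt;/p&gt;

```python
import random

# Simplified MLM masking: replace ~15% of token IDs with the [MASK] ID,
# keeping the original IDs as training labels. Real pipelines also avoid
# masking special tokens and batch everything into tensors.
MASK_ID = 103  # BERT's [MASK] token ID

def mask_tokens(input_ids, prob=0.15, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    labels = list(input_ids)   # originals are the targets the model learns
    masked = [MASK_ID if rng.random() < prob else tok for tok in input_ids]
    return masked, labels
```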

</description>
      <category>python</category>
      <category>deeplearning</category>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Similarity Search: Measuring Meaning From Jaccard to Bert</title>
      <dc:creator>James Briggs</dc:creator>
      <pubDate>Wed, 30 Jun 2021 09:45:23 +0000</pubDate>
      <link>https://dev.to/jamescalam/similarity-search-measuring-meaning-from-jaccard-to-bert-5am7</link>
      <guid>https://dev.to/jamescalam/similarity-search-measuring-meaning-from-jaccard-to-bert-5am7</guid>
      <description>&lt;p&gt;&lt;a href="https://www.pinecone.io/learn/semantic-search/"&gt;Semantic Search: Measuring Meaning From Jaccard to Bert&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I put together this article covering six of the most popular similarity search techniques: three &lt;em&gt;traditional&lt;/em&gt; and three &lt;em&gt;vector-based&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Jaccard Similarity&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;w-shingling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Levenshtein distance&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and &lt;strong&gt;Vector-based&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;TF-IDF&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;BM25&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sentence-BERT (SBERT)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
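
&lt;p&gt;As a taste of the simplest of these (a minimal sketch, not the article's code), Jaccard similarity is just the intersection of the two sentences' word sets over their union:&lt;/p&gt;

```python
# Jaccard similarity: |A ∩ B| / |A ∪ B| over the sentences' word sets.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

print(jaccard("the cat sat", "the cat stood"))  # 0.5
```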

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>deeplearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>70% Discount on the Transformers NLP Course</title>
      <dc:creator>James Briggs</dc:creator>
      <pubDate>Thu, 17 Jun 2021 14:55:08 +0000</pubDate>
      <link>https://dev.to/jamescalam/70-discount-on-the-transformers-nlp-course-1cec</link>
      <guid>https://dev.to/jamescalam/70-discount-on-the-transformers-nlp-course-1cec</guid>
      <description>&lt;p&gt;&lt;a href="https://www.udemy.com/course/nlp-with-transformers/?couponCode=MEDIUM"&gt;Discount here!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Transformer models are the de facto standard in modern NLP. They have proven themselves as the most expressive, powerful models for language by a large margin, beating all major language-based benchmarks time and time again.&lt;/p&gt;

&lt;p&gt;In this course, we cover everything you need to know to get started building cutting-edge NLP applications using transformer models like Google AI's BERT, or Facebook AI's DPR.&lt;/p&gt;

&lt;p&gt;We cover several key NLP frameworks including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HuggingFace's Transformers&lt;/li&gt;
&lt;li&gt;TensorFlow 2&lt;/li&gt;
&lt;li&gt;PyTorch&lt;/li&gt;
&lt;li&gt;spaCy&lt;/li&gt;
&lt;li&gt;NLTK&lt;/li&gt;
&lt;li&gt;Flair&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And learn how to apply transformers to some of the most popular NLP use-cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Language classification/sentiment analysis&lt;/li&gt;
&lt;li&gt;Named entity recognition (NER)&lt;/li&gt;
&lt;li&gt;Question and Answering&lt;/li&gt;
&lt;li&gt;Similarity/comparative learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Throughout each of these use-cases we work through a variety of examples to understand what transformers can do, how they work, and why they matter. Alongside these sections we also work through two full-size NLP projects: one for sentiment analysis of financial Reddit data, and another covering a fully-fledged open-domain question-answering application.&lt;/p&gt;

&lt;p&gt;All of this is supported by several other sections that encourage us to learn how to better design, implement, and measure the performance of our models, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;History of NLP and where transformers come from&lt;/li&gt;
&lt;li&gt;Common preprocessing techniques for NLP&lt;/li&gt;
&lt;li&gt;The theory behind transformers&lt;/li&gt;
&lt;li&gt;How to fine-tune transformers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We cover all this and more, I look forward to seeing you in the course!&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>What's New in Python 3.10?</title>
      <dc:creator>James Briggs</dc:creator>
      <pubDate>Tue, 08 Jun 2021 16:47:10 +0000</pubDate>
      <link>https://dev.to/jamescalam/what-s-new-in-python-3-10-4hh5</link>
      <guid>https://dev.to/jamescalam/what-s-new-in-python-3-10-4hh5</guid>
      <description>&lt;p&gt;The Python 3.10 release has several new features like structural pattern matching, a new typing Union operator, and parenthesized context managers!&lt;/p&gt;

&lt;p&gt;Python 3.10 development has stabilized and we can finally test out all of the new features that will be included in the final release.&lt;/p&gt;

&lt;p&gt;We'll cover some of the most interesting additions to Python - structural pattern matching, parenthesized context managers, more typing, and the new and improved error messages.&lt;/p&gt;
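
&lt;p&gt;For example (a minimal sketch, assuming a Python 3.10+ interpreter), parenthesized context managers let a single &lt;code&gt;with&lt;/code&gt; statement span multiple lines cleanly:&lt;/p&gt;

```python
# Python 3.10+: multiple context managers grouped in parentheses -
# a SyntaxError on most earlier versions.
with (
    open("a.txt", "w") as fa,
    open("b.txt", "w") as fb,
):
    fa.write("hello")
    fb.write("world")
```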


&lt;div class="ltag__link"&gt;
  &lt;a href="https://towardsdatascience.com/whats-new-in-python-3-10-a757c6c69342" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TALLs9uo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/56/56/1%2AEaHS1Q73oTDUDxrkNKGJ8A.jpeg" alt="James Briggs"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://towardsdatascience.com/whats-new-in-python-3-10-a757c6c69342" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;What's New in Python 3.10? | Towards Data Science&lt;/h2&gt;
      &lt;h3&gt;James Briggs ・ &lt;time&gt;Jun 8, 2021&lt;/time&gt; ・ 
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ze5yh_2q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/medium_icon-90d5232a5da2369849f285fa499c8005e750a788fdbf34f5844d5f2201aae736.svg" alt="Medium Logo"&gt;
        towardsdatascience.com
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


&lt;p&gt;&lt;a href="https://towardsdatascience.com/whats-new-in-python-3-10-a757c6c69342?sk=648ae12c1025a83affba4eecec0d46c6"&gt;Free link&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
