High Signal AI

The Evolution of Tokenization - Byte Pair Encoding in NLP

Introduction to Byte Pair Encoding

Harshit Tyagi
Oct 03, 2021

NLP may have been a little late to the AI epiphany, but it is now doing wonders, with organisations like Google and OpenAI releasing state-of-the-art (SOTA) language models such as BERT and GPT-2/3, respectively.

GitHub Copilot and OpenAI Codex are among the popular applications currently in the news. As someone with very limited exposure to NLP, I decided to…
