A Brief History Of Technology Behind ChatGPT

Large language models (LLMs) have gained huge popularity recently thanks to text-generation models such as ChatGPT and image-generation models such as Midjourney and Stable Diffusion. These models are the result of decades of research in natural language processing and neural networks.


Rule-based systems

Let’s start in the 1960s, when computers were gaining popularity across industries and in automation. Language models first appeared in computer science as rule-based systems, which formed the basis of early natural language processing (NLP). These systems relied on handcrafted grammatical rules and dictionaries to parse text, understand it, and make decisions based on it; linguists and computer scientists collaborated to build them. ELIZA (1966) and SHRDLU (1970) are well-known examples.
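To get a feel for how such systems worked, here is a minimal ELIZA-style responder: handcrafted patterns map input text to canned transformations. This is only an illustrative sketch; the real ELIZA used a much richer script of ranked keywords and reassembly rules, and the patterns below are invented for this example.

```python
import re

# Handcrafted rules: a regex pattern paired with a response template.
RULES = [
    (re.compile(r"i need (.*)", re.I), "Why do you need {}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {}?"),
]

def respond(text):
    # Try each rule in order; fall back to a generic prompt.
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return "Please tell me more."

print(respond("I am feeling stuck"))  # "How long have you been feeling stuck?"
```

The brittleness is easy to see: any phrasing the rule authors did not anticipate falls through to the generic fallback, which is exactly what motivated the statistical methods that followed.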

Statistical methods

In the 1980s, statistical methods became mainstream as computational power grew enough to handle them. Scientists used probabilities derived from large text corpora to make predictions about language. These new models, such as Hidden Markov Models (HMMs) and n-gram models, were more flexible and easier to scale than hand-written rule-based systems.
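The core idea of an n-gram model can be sketched in a few lines: count how often word pairs occur in a corpus, then predict the next word from those counts. This toy bigram model uses an invented one-sentence corpus; real systems used large corpora plus smoothing to handle unseen pairs.

```python
from collections import Counter, defaultdict

# Toy corpus (invented for illustration).
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigrams: how often each word follows each other word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(word):
    # Return the most frequent follower of `word` and its probability.
    followers = bigram_counts[word]
    total = sum(followers.values())
    best, count = followers.most_common(1)[0]
    return best, count / total

print(predict_next("the"))  # ('cat', 0.5): "cat" follows "the" 2 of 4 times
```

Unlike a rule-based system, nothing here is hand-written per sentence; the model's behavior comes entirely from the data it counted.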

Machine Learning

In the 1990s, machine learning gained popularity, and new language models emerged based on techniques such as decision trees and support vector machines (SVMs). These models learned patterns from large datasets and then applied that learned knowledge to prediction and analysis.

Neural Networks

In the 2000s, neural networks grew popular as computational power increased enough to train them on large datasets. Language models initially used simple feed-forward networks; later, new architectures emerged to handle sequential data more effectively. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks could capture sequential dependencies fairly accurately, but they were slow to train on large datasets because they process tokens one at a time.


In the 2010s, there was a breakthrough in language models. The Transformer architecture, introduced by Google in 2017, marked a significant step toward the explosion of language models, specifically large language models. The Transformer’s self-attention mechanism lets the model weigh the significance of different words in a text without sequential processing, which makes it significantly faster than RNNs and LSTMs to train on large text. After this breakthrough, Google created BERT (Bidirectional Encoder Representations from Transformers), which could understand the context and meaning of a word based on all the words in the sentence, not just the ones before it. Hence the name bidirectional.
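The self-attention step described above can be sketched with plain NumPy. This is a minimal, single-head version of the scaled dot-product attention from “Attention Is All You Need”, with invented toy sizes; it shows why no sequential processing is needed: every token’s query is compared with every token’s key in one matrix product.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project the token embeddings into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Pairwise relevance of every token to every other, scaled.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax over each row turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of all value vectors.
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))  # 4 tokens, 8-dim embeddings (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one context-aware vector per token
```

Because the score matrix is computed for all token pairs at once, the whole sequence can be processed in parallel on modern hardware, which is the speed advantage over RNNs and LSTMs mentioned above.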

In the early 2020s, there was an influx of even larger language models, each trying to outperform the last. The Transformer architecture enabled language models to train on extremely massive datasets. One such example is the GPT family, which uses the Transformer for more general tasks and is capable of understanding and generating human-like text.

Now we have to wait and see which breakthrough the field of language models will bring next.

