LLM Embeddings – A Practical Introduction in Python

    In this post, we'll briefly learn what LLM embeddings are, how they work, and how to generate and use them in Python. The tutorial covers:

  1. What are Embeddings?
  2. How LLMs Generate Embeddings
  3. Types of Embeddings
  4. Generating Embeddings with Sentence Transformers
  5. Generating Embeddings with OpenAI API
  6. Measuring Semantic Similarity
  7. Visualizing Embeddings with t-SNE
  8. Conclusion
  9. Source Code Listing

     Let's get started. 
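As a taste of what the post builds toward: embeddings map text to dense vectors whose geometry reflects meaning, and similarity between texts becomes a simple vector computation. Here is a minimal sketch of the cosine-similarity measure from step 6, using hand-made toy vectors in place of real model output:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors:
    # 1.0 = same direction (similar meaning), near 0 = unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 4-dimensional "embeddings" (real models use hundreds of dimensions).
cat    = np.array([0.9, 0.1, 0.0, 0.2])
kitten = np.array([0.8, 0.2, 0.1, 0.3])
car    = np.array([0.1, 0.9, 0.8, 0.0])

print(cosine_similarity(cat, kitten))  # high: related meanings
print(cosine_similarity(cat, car))     # low: unrelated meanings
```

With real embeddings from Sentence Transformers or the OpenAI API, the same function ranks texts by semantic closeness.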

Self-Attention Mechanism – A Practical Introduction in Python

    In this post, we'll briefly learn what the self-attention mechanism is, how it works, and how to implement it from scratch in Python. The tutorial covers:
  1. What is Self-Attention?
  2. How Does Self-Attention Work?
  3. Query, Key, and Value Explained
  4. Self-Attention Step by Step
  5. Implementing Self-Attention in Python
  6. Self-Attention with PyTorch
  7. Conclusion
  8. Source Code Listing

Let's get started.
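For a quick preview of the from-scratch implementation, a single self-attention head is only a few lines of NumPy. This is a generic scaled dot-product attention sketch, with random toy matrices standing in for learned projection weights:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv       # project inputs to query/key/value
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # scaled dot-product scores
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ V                     # weighted mixture of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                           # one updated 8-dim vector per token
```

The full post walks through each of these steps (query, key, value, scores, weights) individually and then repeats the exercise in PyTorch.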

What is an LLM? A Practical Introduction in Python

     In this post, we'll briefly learn what a Large Language Model (LLM) is, how it works, and how to run your first LLM in Python with just a few lines of code. The tutorial covers:

  • What is an LLM?
  • How does an LLM work?
  • Types of LLM architectures
  • Popular LLMs
  • Running your first LLM in Python
  • Source code listing

Tokenization in LLMs – SentencePiece and Byte-level BPE (part-2)

     In the previous tutorial, we explored LLM tokenization and learned how to use BPE and WordPiece tokenization with the tokenizers library. In the second part of the tutorial, we will learn how to use SentencePiece and Byte-level BPE methods. 

    The tutorial will cover:

  1. Introduction to SentencePiece
  2. Implementing SentencePiece Tokenization
  3. Introduction to Byte-level BPE 
  4. Implementing Byte-level BPE Tokenization
  5. Conclusion

     Let's get started.
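To preview why byte-level BPE needs no "unknown" token: it starts from the raw UTF-8 bytes of the text rather than characters, so every string in any language decomposes into a base vocabulary of exactly 256 symbols. A plain-Python sketch of that first step (no tokenizer library involved):

```python
def to_byte_tokens(text):
    # Byte-level tokenizers operate on UTF-8 bytes, not characters,
    # so the base vocabulary is exactly 256 symbols and no input
    # is ever out-of-vocabulary.
    return list(text.encode("utf-8"))

print(to_byte_tokens("hi"))     # [104, 105]
print(to_byte_tokens("héllo"))  # 'é' becomes two bytes: 195, 169
```

BPE merges are then learned on top of these byte sequences, exactly as in part 1, but with bytes instead of characters as the starting symbols.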

Tokenization in LLMs – BPE and WordPiece (part-1)

     Tokenization plays a key role in large language models—it turns raw text into a format that the models can actually understand and work with. 

    When building RAG (Retrieval-Augmented Generation) systems or fine-tuning large language models, it is important to understand tokenization techniques. Input data must be tokenized before being fed into the model. Since tokenization can vary between models, it’s essential to use the same tokenization method that was used during the model’s original training.

    In this tutorial, we'll go through tokenization and its practical applications in LLM tasks. The tutorial will cover:

  1. Introduction to Tokenization
  2. Tokenization in LLMs
  3. Byte Pair Encoding (BPE)
  4. WordPiece
  5. Key Differences Between BPE and WordPiece  
  6. Conclusion

     Let's get started.

Building RAG-Based QA System with LlamaIndex

     In this tutorial, we will implement a RAG (Retrieval-Augmented Generation) chatbot using LlamaIndex, Hugging Face Transformers, and the Flan-T5 model. We use sample industrial equipment documentation as our knowledge base and allow an LLM (Flan-T5) to generate responses using retrieved external data. We also add relevance filtering for accuracy control. The tutorial covers:

  1. Introduction to RAG
  2. Why LlamaIndex?
  3. Setup and custom data preparation
  4. Creating a vector store index
  5. Load a pre-trained LLM (Flan-T5)
  6. Retrieval with relevance check
  7. Enhanced QA method
  8. Execution
  9. Conclusion
  10. Full code listing
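The relevance-check idea in steps 6–7 can be previewed without LlamaIndex at all: score each retrieved chunk against the question by cosine similarity and pass only chunks above a threshold to the LLM, so irrelevant context doesn't invite hallucinated answers. A library-free sketch with toy vectors (the document texts, embedding values, and the 0.5 threshold are illustrative, not the tutorial's actual values):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_relevant(query_vec, docs, threshold=0.5):
    # Score every chunk against the query and keep only those that
    # clear the relevance threshold, best matches first.
    scored = [(cosine(query_vec, vec), text) for text, vec in docs]
    return [text for score, text in sorted(scored, reverse=True)
            if score >= threshold]

# Toy 3-dim "embeddings" standing in for a real vector store index.
docs = [
    ("Pump P-100 max pressure is 15 bar.", np.array([0.9, 0.1, 0.0])),
    ("Cafeteria opens at 8 am.",           np.array([0.0, 0.2, 0.9])),
]
query = np.array([0.8, 0.2, 0.1])  # e.g. "What is the pump's max pressure?"
context = retrieve_relevant(query, docs)
print(context)                     # only the pump chunk survives the filter
```

In the full tutorial, LlamaIndex's vector store index produces the embeddings and retrieval scores, and the filtered context is injected into the Flan-T5 prompt.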