LLM Embeddings – A Practical Introduction in Python

    In this post, we'll briefly learn what LLM embeddings are, how they work, and how to generate and use them in Python. The tutorial covers:

  1. What are Embeddings?
  2. How LLMs Generate Embeddings
  3. Types of Embeddings
  4. Generating Embeddings with Sentence Transformers
  5. Generating Embeddings with OpenAI API
  6. Measuring Semantic Similarity
  7. Visualizing Embeddings with t-SNE
  8. Conclusion
  9. Source Code Listing

     Let's get started. 
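As a taste of what the post builds toward: embeddings map text to dense vectors whose geometry reflects meaning, and similarity between texts becomes a simple vector computation. Here is a minimal sketch of the cosine-similarity measure from step 6, using hand-made toy vectors in place of real model output:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors:
    # 1.0 = same direction (similar meaning), near 0 = unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 4-dimensional "embeddings" (real models use hundreds of dimensions).
cat    = np.array([0.9, 0.1, 0.0, 0.2])
kitten = np.array([0.8, 0.2, 0.1, 0.3])
car    = np.array([0.1, 0.9, 0.8, 0.0])

print(cosine_similarity(cat, kitten))  # high: related meanings
print(cosine_similarity(cat, car))     # low: unrelated meanings
```

With real embeddings from Sentence Transformers or the OpenAI API, the same function ranks texts by semantic closeness.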

Self-Attention Mechanism – A Practical Introduction in Python

    In this post, we'll briefly learn what the self-attention mechanism is, how it works, and how to implement it from scratch in Python. The tutorial covers:
  1. What is Self-Attention?
  2. How Does Self-Attention Work?
  3. Query, Key, and Value Explained
  4. Self-Attention Step by Step
  5. Implementing Self-Attention in Python
  6. Self-Attention with PyTorch
  7. Conclusion
  8. Source Code Listing

Let's get started.
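For a quick preview of the from-scratch implementation, a single self-attention head is only a few lines of NumPy. This is a generic scaled dot-product attention sketch, with random toy matrices standing in for learned projection weights:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv       # project inputs to query/key/value
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # scaled dot-product scores
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ V                     # weighted mixture of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                           # one updated 8-dim vector per token
```

The full post walks through each of these steps (query, key, value, scores, weights) individually and then repeats the exercise in PyTorch.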

What is an LLM? A Practical Introduction in Python

     In this post, we'll briefly learn what a Large Language Model (LLM) is, how it works, and how to run your first LLM in Python with just a few lines of code. The tutorial covers:

  • What is an LLM?
  • How does an LLM work?
  • Types of LLM architectures
  • Popular LLMs
  • Running your first LLM in Python
  • Source code listing

Tokenization in LLMs – SentencePiece and Byte-level BPE (part-2)

     In the previous tutorial, we explored LLM tokenization and learned how to use BPE and WordPiece tokenization with the tokenizers library. In the second part of the tutorial, we will learn how to use SentencePiece and Byte-level BPE methods. 

    The tutorial will cover:

  1. Introduction to SentencePiece
  2. Implementing SentencePiece Tokenization
  3. Introduction to Byte-level BPE 
  4. Implementing Byte-level BPE Tokenization
  5. Conclusion

     Let's get started.
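To preview why byte-level BPE needs no "unknown" token: it starts from the raw UTF-8 bytes of the text rather than characters, so every string in any language decomposes into a base vocabulary of exactly 256 symbols. A plain-Python sketch of that first step (no tokenizer library involved):

```python
def to_byte_tokens(text):
    # Byte-level tokenizers operate on UTF-8 bytes, not characters,
    # so the base vocabulary is exactly 256 symbols and no input
    # is ever out-of-vocabulary.
    return list(text.encode("utf-8"))

print(to_byte_tokens("hi"))     # [104, 105]
print(to_byte_tokens("héllo"))  # 'é' becomes two bytes: 195, 169
```

BPE merges are then learned on top of these byte sequences, exactly as in part 1, but with bytes instead of characters as the starting symbols.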

Tokenization in LLMs – BPE and WordPiece (part-1)

     Tokenization plays a key role in large language models—it turns raw text into a format that the models can actually understand and work with. 

    When building RAG (Retrieval-Augmented Generation) systems or fine-tuning large language models, it is important to understand tokenization techniques. Input data must be tokenized before being fed into the model. Since tokenization can vary between models, it’s essential to use the same tokenization method that was used during the model’s original training.

    In this tutorial, we'll go through tokenization and its practical applications in LLM tasks. The tutorial will cover:

  1. Introduction to Tokenization
  2. Tokenization in LLMs
  3. Byte Pair Encoding (BPE)
  4. WordPiece
  5. Key Differences Between BPE and WordPiece  
  6. Conclusion

     Let's get started.

Building RAG-Based QA System with LlamaIndex

     In this tutorial, we will implement a RAG (Retrieval-Augmented Generation) chatbot using LlamaIndex, Hugging Face Transformers, and the Flan-T5 model. We use sample industrial equipment documentation as our knowledge base and allow an LLM (Flan-T5) to generate responses using retrieved external data. We also add relevance filtering for accuracy control. The tutorial covers:

  1. Introduction to RAG
  2. Why LlamaIndex?
  3. Setup and custom data preparation
  4. Creating a vector store index
  5. Load a pre-trained LLM (Flan-T5)
  6. Retrieval with relevance check
  7. Enhanced QA method
  8. Execution
  9. Conclusion
  10. Full code listing
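The relevance-check idea in steps 6–7 can be previewed without LlamaIndex at all: score each retrieved chunk against the question by cosine similarity and pass only chunks above a threshold to the LLM, so irrelevant context doesn't invite hallucinated answers. A library-free sketch with toy vectors (the document texts, embedding values, and the 0.5 threshold are illustrative, not the tutorial's actual values):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_relevant(query_vec, docs, threshold=0.5):
    # Score every chunk against the query and keep only those that
    # clear the relevance threshold, best matches first.
    scored = [(cosine(query_vec, vec), text) for text, vec in docs]
    return [text for score, text in sorted(scored, reverse=True)
            if score >= threshold]

# Toy 3-dim "embeddings" standing in for a real vector store index.
docs = [
    ("Pump P-100 max pressure is 15 bar.", np.array([0.9, 0.1, 0.0])),
    ("Cafeteria opens at 8 am.",           np.array([0.0, 0.2, 0.9])),
]
query = np.array([0.8, 0.2, 0.1])  # e.g. "What is the pump's max pressure?"
context = retrieve_relevant(query, docs)
print(context)                     # only the pump chunk survives the filter
```

In the full tutorial, LlamaIndex's vector store index produces the embeddings and retrieval scores, and the filtered context is injected into the Flan-T5 prompt.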