🧠 Core Concepts
Fundamental concepts including agent architecture, memory systems, reasoning patterns, and core AI agent principles essential for building autonomous systems.
What is an AI Agent?

An AI agent is a software system that can perceive its environment, reason about what it observes, and act autonomously to achieve its goals.

Key Characteristics:

Autonomy: Operates without constant human intervention

Reactivity: Responds to environmental changes

Proactivity: Takes initiative to achieve goals

Social Ability: Interacts with other agents or humans

Agent Types:

Simple Reflex: reacts to inputs via predefined rules
Model-Based: maintains an internal model of the world
Goal-Based: pursues specific objectives
Utility-Based: maximizes a utility function
Learning Agents: improve from experience
Agent Architecture

Core Components:

Perception: Sensors/APIs to gather environmental data

Reasoning: Decision-making logic (LLM, rule-based, ML)

Memory: Short-term (context) and long-term (vector DB)

Planning: Strategy formulation for goal achievement

Action: Tools/actuators to interact with environment

Agent Loop:
1. Perceive → Observe environment state
2. Think → Process information & plan
3. Act → Execute actions via tools
4. Learn → Update memory & strategy
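
The loop above can be sketched in plain Python. Everything here is illustrative: `perceive`, `think`, and `act` stand in for real sensors/APIs, an LLM call, and tool execution.

```python
def perceive():
    # Stand-in for sensors/APIs; returns current environment state
    return {"temperature_f": 65}

def think(goal, observation, memory):
    # Stand-in for an LLM or rule-based planner
    done = observation["temperature_f"] is not None
    return {"action": "report", "done": done}

def act(plan):
    # Stand-in for tool execution
    return f"executed {plan['action']}"

def run_agent(goal, max_steps=10):
    """Minimal perceive -> think -> act -> learn loop (sketch only)."""
    memory = []
    for _ in range(max_steps):
        observation = perceive()                    # 1. Perceive
        plan = think(goal, observation, memory)     # 2. Think
        result = act(plan)                          # 3. Act
        memory.append((observation, plan, result))  # 4. Learn
        if plan["done"]:
            break
    return memory

history = run_agent("report the weather")
```

Real frameworks wrap this same loop with retries, tool schemas, and persistent memory, but the control flow is the same.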
Memory Systems

Memory Types:

Short-term: Current conversation context (LLM context window)

Long-term: Persistent knowledge (Vector databases)

Episodic: Past interactions (Traditional DB)

Semantic: Domain knowledge (Knowledge graphs)
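
A toy illustration of how short-term and long-term memory differ: a `deque` mimics a rolling context window, and the dict stands in for a persistent store such as a vector DB.

```python
from collections import deque

class AgentMemory:
    """Toy short- vs long-term memory (not a real vector DB)."""
    def __init__(self, context_limit=4):
        # Oldest messages roll off, like an LLM context window
        self.short_term = deque(maxlen=context_limit)
        # Persistent key->fact store, standing in for a vector database
        self.long_term = {}

    def remember(self, message):
        self.short_term.append(message)

    def persist(self, key, fact):
        self.long_term[key] = fact

mem = AgentMemory(context_limit=2)
for msg in ["hi", "how are you", "what's the weather"]:
    mem.remember(msg)
mem.persist("user_name", "Alice")
```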

Vector DBs:

Pinecone: managed, scalable
Weaviate: GraphQL, hybrid search
Chroma: lightweight, dev-friendly
Qdrant: high performance, Rust
Milvus: large-scale, production
ReAct Pattern

Reasoning + Acting - Interleave reasoning and action steps

Thought: I need to find the current weather
Action: search("weather in San Francisco")
Observation: Temperature is 65°F, partly cloudy

Thought: Now I have the weather information
Action: respond("It's 65°F and partly cloudy")
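
A minimal ReAct loop might look like the sketch below. The `fake_llm` simply replays the trace above, and the toy parser assumes actions are written as `tool("arg")`:

```python
import re

def parse_action(step):
    # Toy parser for 'Action: tool("arg")' lines in a model step
    m = re.search(r'Action: (\w+)\("([^"]*)"\)', step)
    return m.group(1), m.group(2)

def react_agent(question, llm, tools, max_turns=5):
    """Minimal ReAct loop; `llm` is a stand-in for a real model call."""
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = llm(transcript)           # model emits Thought + Action
        transcript += step
        tool, arg = parse_action(step)
        if tool == "respond":
            return arg                   # final answer
        observation = tools[tool](arg)   # run the tool
        transcript += f"Observation: {observation}\n"
    return "gave up"

# Scripted "LLM" that replays the trace above
def fake_llm(transcript):
    if "Observation" not in transcript:
        return 'Thought: I need the weather\nAction: search("weather in San Francisco")\n'
    return 'Thought: I have the weather\nAction: respond("It\'s 65°F and partly cloudy")\n'

tools = {"search": lambda q: "Temperature is 65°F, partly cloudy"}
answer = react_agent("What's the weather in SF?", fake_llm, tools)
```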

Benefits:

Interpretable: Decision-making process is transparent

Dynamic: Task decomposition happens on the fly

Resilient: Error recovery through reasoning

Effective: Better handling of multi-step tasks

🛠️ Frameworks & Tools
Popular frameworks and libraries for building AI agents, including their key features and use cases.
LangChain

Framework for building LLM-powered applications

# Classic LangChain import paths (pre-0.1); newer releases
# move the OpenAI LLM into the separate langchain-openai package
from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

# Define a simple LLM (replace with your API key)
llm = OpenAI(temperature=0, openai_api_key="your-openai-api-key")

# Load built-in tools or define custom ones
tools = load_tools(["llm-math"], llm=llm)

# Initialize the agent
agent = initialize_agent(
    tools, llm, agent="zero-shot-react-description"
)

# Run the agent
result = agent.run("What's 25% of 842?")
print(result)

Key Features:

Agent Types: Pre-built agent types (ReAct, Plan-and-Execute)

Tool Calling: Tool/function calling abstractions

Memory: Memory management (conversation, vector)

Chains: Chain composition for complex workflows

Integrations: 200+ integrations (LLMs, vector DBs, APIs)

LangGraph

Build stateful, multi-actor agents as graphs (by LangChain team)

from langgraph.graph import StateGraph

# AgentState is a user-defined state schema (e.g. a TypedDict);
# research_node and write_node are functions that take and return that state
workflow = StateGraph(AgentState)
workflow.add_node("research", research_node)
workflow.add_node("write", write_node)
workflow.add_edge("research", "write")
workflow.set_entry_point("research")

app = workflow.compile()

Use Cases:

Multi-Agent: Multi-agent systems with coordination

Workflows: Complex, branching workflows

Human-in-Loop: Human-in-the-loop patterns

Cyclic: Cyclic agent behaviors

Stateful: Persistent conversation state

AutoGen

Microsoft's framework for multi-agent conversation systems

from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent("assistant")
user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config={"work_dir": "coding"}
)

user_proxy.initiate_chat(
    assistant,
    message="Plot a chart of stock prices"
)

Key Strengths:

Code Execution: Automatic code execution in sandboxes

Collaboration: Multi-agent debate/collaboration

Patterns: Built-in conversation patterns

Feedback: Teaching/feedback mechanisms

CrewAI

Role-based agent collaboration framework

from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Find latest AI trends",
    backstory="Expert researcher..."
)

writer = Agent(
    role="Writer",
    goal="Write engaging articles",
    backstory="Professional writer..."
)

crew = Crew(agents=[researcher, writer])
crew.kickoff()

Features:

Role-Playing: Agent personas with roles and backstories

Task Execution: Sequential & parallel task execution

Orchestration: Process orchestration (sequential, hierarchical)

Delegation: Built-in delegation patterns

LlamaIndex

Data framework for LLM applications with RAG focus

# Import path for llama-index < 0.10; newer releases use llama_index.core
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("Summarize the document")

Best For:

RAG: Retrieval-Augmented Generation

Document Q&A: Document question-answering systems

Knowledge Base: Knowledge base integration

Connectors: Data connector ecosystem (100+)

Semantic Kernel

Microsoft's SDK for integrating LLMs into applications

import semantic_kernel as sk
# Connector import path from earlier semantic-kernel releases (the API has since changed)
from semantic_kernel.connectors.ai.open_ai import OpenAITextCompletion

kernel = sk.Kernel()
kernel.add_text_completion_service("gpt", OpenAITextCompletion())

skill = kernel.import_semantic_skill_from_directory("./skills")
result = await kernel.run_async(skill["Summarize"])  # must run inside an async function

Features:

.NET Integration: Enterprise-grade .NET integration

Skills/Plugins: Skills/plugins architecture

Planner: Planner for automatic orchestration

Memory: Memory connectors

🎯 Design Patterns
Common architectural patterns and techniques for building effective AI agents, including tool calling, reasoning strategies, and collaboration patterns.
Tool/Function Calling

Enable agents to use external tools and APIs

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=messages,
    tools=tools
)
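
When the model returns a tool call, the application receives the function name and JSON-encoded arguments and must run the function itself. A minimal dispatch sketch (the local `get_weather` implementation is hypothetical):

```python
import json

# Hypothetical local implementation backing the get_weather schema above
def get_weather(location):
    return f"65°F and partly cloudy in {location}"

TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(name, arguments_json):
    """Route a model-issued tool call to a local function and return its result."""
    args = json.loads(arguments_json)   # model sends arguments as a JSON string
    return TOOL_REGISTRY[name](**args)

# The model would return e.g. name="get_weather",
# arguments='{"location": "San Francisco"}'
result = dispatch_tool_call("get_weather", '{"location": "San Francisco"}')
```

The result is then appended to the conversation as a tool message so the model can produce its final answer.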

Best Practices:

Naming: Clear, descriptive function names

Parameters: Detailed parameter descriptions

Error Handling: Error handling and validation

Rate Limiting: Rate limiting and retries

Logging: Logging for debugging

Chain of Thought (CoT)

Improve reasoning by making agents think step-by-step

Encourages detailed, logical problem solving

Prompt: "Let's solve this step by step:

1) First, identify what we know
2) Then, determine what we need to find
3) Break down the problem into smaller steps
4) Solve each step
5) Combine the results

Problem: If a store has a 25% off sale..."

Variants:

Zero-shot CoT: "Let's think step by step"

Few-shot CoT: Provide example reasoning

Self-Consistency: Sample multiple paths, vote

Tree of Thoughts: Explore multiple reasoning paths

RAG (Retrieval-Augmented Generation)

Enhance agents with external knowledge retrieval

Feeds context to LLM & reduces hallucinations

1. User Query → Embed query into vector
2. Vector Search → Find relevant documents
3. Context Assembly → Combine top-k results
4. LLM Generation → Generate with context
5. Response → Return grounded answer
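
The five steps can be sketched end to end. The "embedding" here is just word overlap, a deliberate simplification; real systems use model embeddings and a vector index:

```python
def embed(text):
    # Toy "embedding": a bag of lowercase words (real systems use model embeddings)
    return set(text.lower().split())

def retrieve(query, documents, k=2):
    # Vector-search stand-in: rank documents by word overlap with the query
    scored = sorted(documents, key=lambda d: len(embed(query) & embed(d)), reverse=True)
    return scored[:k]

def rag_answer(query, documents, llm):
    context = "\n".join(retrieve(query, documents))       # steps 1-3
    prompt = f"Context:\n{context}\n\nQuestion: {query}"  # step 4 input
    return llm(prompt)                                    # steps 4-5

docs = [
    "The Eiffel Tower is in Paris.",
    "Python was created by Guido van Rossum.",
    "Paris is the capital of France.",
]
top = retrieve("Who created Python?", docs, k=1)
```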

Implementation Tips:

Chunking: Chunk documents (500-1000 tokens is a common range)

Embeddings: Use semantic embeddings (OpenAI, Cohere, local)

Reranking: Implement reranking for better results

Filtering: Add metadata filtering (date, source, type)

Monitoring: Monitor retrieval quality metrics

Reflection & Self-Critique

Agent evaluates and improves its own outputs

Loop:
1. Generate initial response
2. Critique: "What are weaknesses in this response?"
3. Refine: "Improve based on critique"
4. Repeat until quality threshold met

Example:
- Generate code
- Check for bugs/inefficiencies
- Refactor and improve
- Validate against requirements
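
The loop can be sketched generically; `generate`, `critique`, and `refine` stand in for three separate LLM prompts, with `critique` returning `None` once the quality threshold is met:

```python
def reflect_and_refine(task, generate, critique, refine, max_rounds=3):
    """Generate -> critique -> refine loop (sketch)."""
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:            # quality threshold met
            return draft
        draft = refine(draft, feedback)
    return draft

# Toy callables: "critique" flags drafts that lack a docstring
generate = lambda task: "def add(a, b): return a + b"
critique = lambda code: None if '"""' in code else "missing docstring"
refine = lambda code, fb: 'def add(a, b):\n    """Add two numbers."""\n    return a + b'

final = reflect_and_refine("write add()", generate, critique, refine)
```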

Techniques:

Self-Refinement: Iterative improvement

Constitutional AI: Critique against principles

Debate: Multiple agents critique each other

Plan-and-Execute

Separate planning from execution for complex tasks

Phase 1 - Planning:
- Analyze goal
- Break into subtasks
- Order dependencies
- Create execution plan

Phase 2 - Execution:
- Execute each subtask
- Monitor progress
- Handle failures
- Adapt plan if needed
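
The two phases can be sketched as follows; in a real agent, `plan` would be an LLM call producing subtasks and `execute` would invoke tools:

```python
def plan(goal):
    # Phase 1 stand-in: a real agent would ask an LLM for this list
    return ["research topic", "draft outline", "write article"]

def execute(subtask):
    # Phase 2 stand-in: run a tool or a focused LLM call per subtask
    return f"done: {subtask}"

def plan_and_execute(goal):
    subtasks = plan(goal)              # plan once
    results = []
    for t in subtasks:                 # execute each subtask in order
        results.append(execute(t))     # (monitoring/replanning hooks would go here)
    return results

results = plan_and_execute("publish an article")
```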

Benefits:

Multi-Step: Better handling of multi-step tasks

Tracking: Clear progress tracking

Debugging: Easier debugging and recovery

Efficiency: Reduced token usage (plan once, execute many)

Multi-Agent Collaboration

Multiple specialized agents work together

Patterns:

Debate: Agents argue different perspectives

Delegation: Manager assigns tasks to specialists

Cooperation: Agents work on shared goal

Competition: Best solution wins

Example Roles:

Researcher: gathers information
Analyst: processes data
Writer: creates content
Critic: evaluates quality
Executor: implements actions
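
The delegation pattern with these roles reduces to a hand-off chain; each function below stands in for a specialist agent's LLM call:

```python
def researcher(topic):
    return f"notes on {topic}"        # gathers information

def analyst(notes):
    return f"insights from {notes}"   # processes data

def writer(insights):
    return f"article: {insights}"     # creates content

def critic(article):
    # Evaluates quality; a real critic would be another LLM call
    return "approve" if article.startswith("article:") else "revise"

def run_pipeline(topic):
    """Delegation sketch: a fixed hand-off chain of specialist roles."""
    article = writer(analyst(researcher(topic)))
    verdict = critic(article)
    return article, verdict

article, verdict = run_pipeline("AI agents")
```

Frameworks like CrewAI and AutoGen generalize this chain into dynamic routing, debate, and retries.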
🤖 LLM Providers
Leading language model providers and platforms, including commercial APIs and open-source alternatives with their key features and use cases.
OpenAI

Models:

GPT-4 Turbo: Best reasoning, function calling

GPT-4o: Multimodal, fast, cost-effective

GPT-3.5 Turbo: Fast, affordable

from openai import OpenAI

client = OpenAI(api_key="...")
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    tools=[...]  # Function calling
)

Best For:

Reasoning: Complex reasoning tasks

Tool Calling: Function/tool calling

Structured Output: JSON mode for structured output

Anthropic Claude

Models:

Claude 3.5 Sonnet: Best overall, 200K context

Claude 3 Opus: Most capable

Claude 3 Haiku: Fast, affordable

import anthropic

client = anthropic.Anthropic(api_key="...")
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

Strengths:

Context: Long context (200K tokens)

Quality: Excellent at analysis and writing

Safety: Strong safety features

Open Source Models

Popular Models:

Llama 3: Meta's latest (8B, 70B, 405B)

Mistral: Efficient European models

Mixtral: Mixture of Experts (8x7B, 8x22B)

Gemma: Google's lightweight models

Phi-3: Microsoft's small models

Deployment Options:

Ollama: local model runner
LM Studio: GUI for local models
vLLM: high-performance serving
TGI: text generation inference
llama.cpp: C++ inference
Specialized Providers

Providers:

Cohere: Embeddings, RAG, multilingual

Together AI: Open model hosting, inference

Replicate: Easy model deployment

Hugging Face: Model hub, inference API

Anyscale: Ray-based scaling

⚙️ Essential Tools
Critical infrastructure and tools for AI agent development, including vector databases, embeddings, monitoring solutions, and evaluation frameworks.
Vector Databases

Databases:

Pinecone: Managed, scalable (Cloud)

Weaviate: GraphQL, hybrid search (Cloud/Self)

Chroma: Lightweight, dev-friendly (Self)

Qdrant: High performance, Rust (Cloud/Self)

Milvus: Large-scale, production (Cloud/Self)

pgvector: PostgreSQL extension (Self)

Embeddings

Popular Models:

text-embedding-3-large: 3072 dims (OpenAI)

embed-english-v3.0: 1024 dims (Cohere)

all-MiniLM-L6-v2: 384 dims (HuggingFace)

bge-large-en-v1.5: 1024 dims (BAAI)

Use Cases:

Search: Semantic search in knowledge bases

Similarity: Document similarity

Classification: Clustering and classification
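
Similarity between embeddings is usually measured with cosine similarity; a pure-Python sketch using toy 3-dimensional vectors (real models return the dimensions listed above):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: related concepts point in similar directions
v_cat = [0.9, 0.1, 0.0]
v_kitten = [0.8, 0.2, 0.0]
v_car = [0.0, 0.1, 0.9]
sim_close = cosine_similarity(v_cat, v_kitten)
sim_far = cosine_similarity(v_cat, v_car)
```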

Observability & Monitoring

LLM Observability:

LangSmith: LangChain's debugging platform

Weights & Biases: Experiment tracking

Arize AI: LLM monitoring & evaluation

Helicone: LLM logging & analytics

Traceloop: OpenTelemetry for LLMs

Key Metrics:

Latency: p50, p95, p99 performance

Costs: Token usage & costs

Errors: Error rates & types

Satisfaction: User satisfaction scores

Tool Success: Tool calling success rates

Evaluation & Testing

Frameworks:

LangChain Eval: Built-in evaluators

PromptFoo: Test prompts, compare models

RAGAS: RAG system evaluation

Deepeval: Unit testing for LLMs

TruLens: Feedback & evaluation

# Example evaluation metrics
- Correctness: Does it answer correctly?
- Relevance: Is response on-topic?
- Faithfulness: Grounded in context?
- Coherence: Logical and consistent?
- Helpfulness: Useful to user?
✅ Best Practices
Essential guidelines and strategies for building robust, cost-effective, and secure AI agents, covering prompt engineering, error handling, optimization, and safety.
Prompt Engineering

Core Principles:

Be Specific: Clear instructions and context

Use Examples: Few-shot learning works well

Set Constraints: Format, length, style requirements

Chain Prompts: Break complex tasks into steps

Iterate: Test and refine prompts

Good Prompt Template:

Role: You are an expert [domain] assistant
Context: [Relevant background information]
Task: [Specific task description]
Format: [Output format requirements]
Examples: [1-3 example inputs/outputs]
Constraints: [Any limitations or requirements]
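
The template can be assembled programmatically; a simple sketch (field names are illustrative):

```python
def build_prompt(role, context, task, fmt, examples=None, constraints=None):
    """Assemble a prompt from the template fields above."""
    parts = [
        f"Role: You are an expert {role} assistant",
        f"Context: {context}",
        f"Task: {task}",
        f"Format: {fmt}",
    ]
    if examples:                       # optional few-shot section
        parts.append("Examples:\n" + "\n".join(examples))
    if constraints:
        parts.append(f"Constraints: {constraints}")
    return "\n".join(parts)

prompt = build_prompt(
    role="SQL",
    context="PostgreSQL 15 database of orders",
    task="Write a query for monthly revenue",
    fmt="A single SQL statement",
    constraints="No subqueries",
)
```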
Error Handling

Strategies:

Retry Logic: Exponential backoff for API failures

Fallbacks: Backup models or degraded modes

Validation: Check tool outputs before using

Timeouts: Prevent hanging operations

Circuit Breakers: Stop cascading failures

# Sketch: RateLimitError, InvalidToolOutput, exponential_backoff,
# and log_metrics are illustrative placeholder names
result = None
try:
    result = agent.run(query)
except RateLimitError:
    time.sleep(exponential_backoff())
    result = agent.run(query)  # one retry after backing off
except InvalidToolOutput:
    result = agent.run_with_fallback()
finally:
    log_metrics(result)  # result stays None if every attempt failed
Cost Optimization

Techniques:

Model Selection: Use appropriate model for task

Caching: Cache similar queries and embeddings

Context Management: Trim unnecessary context

Batch Processing: Group similar requests

Streaming: Start processing before full response
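
Exact-match caching is the simplest form; a sketch using `functools.lru_cache` (real deployments often add semantic caching, matching prompts by embedding similarity):

```python
from functools import lru_cache

CALLS = {"count": 0}  # tracks how often the "API" is actually hit

@lru_cache(maxsize=1024)
def cached_completion(prompt):
    # Stand-in for an expensive LLM call; repeated prompts skip it entirely
    CALLS["count"] += 1
    return f"answer to: {prompt}"

cached_completion("What is RAG?")
cached_completion("What is RAG?")   # served from cache, no second call
```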

Model Recommendations:

Simple Q&A: GPT-3.5, Claude Haiku

Complex Reasoning: GPT-4, Claude Opus

Long Context: Claude (200K)

Code Generation: GPT-4, Claude Sonnet

Security & Safety

Considerations:

Input Validation: Sanitize user inputs

Output Filtering: Check for harmful content

Sandboxing: Isolate code execution

API Key Management: Secure credential storage

Rate Limiting: Prevent abuse

Audit Logging: Track all agent actions

# Security checklist
✓ Never expose API keys in code
✓ Use environment variables
✓ Implement content filters
✓ Validate all tool outputs
✓ Set execution timeouts
✓ Log security events
✓ Regular security audits
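
Reading credentials from the environment is the first item on the checklist; a minimal sketch (the demo variable is set inline only for illustration, never do this with a real key):

```python
import os

def load_api_key(var="OPENAI_API_KEY"):
    """Fetch a credential from the environment; fail fast if it's missing."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting the agent")
    return key

# Demo with a throwaway variable (never commit real keys)
os.environ["DEMO_API_KEY"] = "sk-demo-123"
key = load_api_key("DEMO_API_KEY")
```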
💼 Common Use Cases
Real-world applications and practical implementations of AI agents across different domains, including customer support, personal assistance, research, and code generation.
Customer Support Agents

Capabilities:

FAQs: Answer FAQs using knowledge base (RAG)

Tickets: Ticket creation and routing

Status: Order status lookup via APIs

Sentiment: Sentiment analysis and escalation

Multi-language: Multi-language support

Tools Needed:

Vector DB: document storage
CRM API: customer data
Ticketing System: support tickets
Translation API: language support
Personal Assistants

Capabilities:

Calendar & Email: Calendar and email management

Tasks: Task planning and tracking

Information: Information retrieval

Scheduling: Appointment scheduling

Reminders: Reminders and notifications

Integrations:

Google Calendar: scheduling
Gmail: email
Notion: notes
Slack: messaging
Todoist: tasks
Research & Analysis

Capabilities:

Search: Web search and information gathering

Analysis: Document analysis and summarization

Extraction: Data extraction and structuring

Intelligence: Competitive intelligence

Reports: Report generation

Architecture:

Researcher: Gather information

Analyst: Process and analyze

Writer: Create reports

Critic: Review and refine

Code Generation & Review

Capabilities:

Generation: Generate code from requirements

Review: Code review and bug detection

Refactoring: Refactoring suggestions

Testing: Test generation

Documentation: Documentation creation

Tools:

GitHub API: version control
Code Execution: sandboxed runtime
Linters: code quality
Test Frameworks: testing