Our inboxes contain dozens (if not hundreds) of newsletters we subscribed to during moments of curiosity, but we seldom read most of them. Manually unsubscribing is tedious: open each email, scroll to the bottom, click unsubscribe, confirm … repeat 50+ times.

This post covers a personal project: an AI agent built on the ReAct pattern that analyses the newsletters I have subscribed to and recommends which ones to unsubscribe from, based on my reading behaviour.

TL;DR: This project demonstrates how to build production-ready AI agents where 80% of the work is systems engineering. Key lessons: ReAct patterns handle real-world variability better than linear approaches; function schemas are your API contract with the LLM; and robust error handling distinguishes prototypes from production systems.

The Problem & The Solution

My inbox had accumulated thousands of unread emails - the bulk of which were newsletters I had subscribed to but never read. The manual solution (open → scroll → click unsubscribe → confirm × 98) would have consumed at least a few hours.

This is precisely the type of repetitive task that an AI agent should handle.

But here’s the interesting challenge: this isn’t just about automation. It’s about intelligent orchestration: analysing my behaviour, making recommendations with uncertainty quantification, and respecting user agency in the final decision loop.

This blog post documents the experience of developing such an AI agent. It is written for engineers and technical leaders focused on building real-world AI systems, not demos.

80% Engineering, 20% AI

Developing the AI was the easy part

Systems engineering (80% of development time):

  • OAuth token refresh
  • Error handling & recovery
  • Gmail API quirks
  • Adaptive rate limiting

AI/LLM Work (20% of development time):

  • Prompt engineering and getting function calling working with OpenAI: ~1 hour
  • ReAct loop implementation: ~1 hour

This is not unique to my project.

💡 Production AI systems are fundamentally systems engineering problems.

The LLM handles intelligent orchestration; you handle everything else. If you are building production AI agents, expect to spend most of your time on:

  • Authentication flows and credential management
  • API quota management and rate limiting
  • Network error recovery and retry logic
  • State management and caching
  • Cloud deployment quirks
  • Observability and debugging

Let’s dig into how each of these manifested in this project.

Solution Design Goals

📌 The AI Agent should analyse and recommend, not just automate the execution

  1. Intelligence over automation: The AI agent should analyse and recommend, not blindly execute
  2. Composability: Each function should be independently testable and reusable
  3. Transparency: Users should understand why the agent makes specific recommendations
  4. Safety: No destructive actions without explicit user confirmation

Why ReAct?

Three common agent patterns illustrate the trade-offs involved:

1. Chain-of-Thought (CoT)

  • Pattern: Linear reasoning within a single model invocation (scan → analyse → respond)

  • Limitation: Reasoning is completed before any external actions are executed

  • Why: The model produces a full reasoning trace upfront, without observing actual tool outputs or system state, so if an external call fails or returns unexpected data, the reasoning must be restarted

  • Example: “I will scan 90 days, analyse all results, then extract links” - drawback: the decision was made before discovering that the scan returned an unexpectedly large number of newsletters

  • Result: Effective for static problems, but fragile when interacting with real-world APIs or variable data

2. Prompt Chaining

  • Pattern: A sequence of prompts where each step consumes the output of the previous one
  • Limitation: Lacks a unified reasoning loop that continuously re-evaluates earlier decisions
  • Why: While the context can be passed forward and retries can be added, adaptation is handled by external orchestration code (e.g., retry logic, conditional branching) rather than by the model’s continuous reasoning; the model does not explicitly reason about why earlier choices were made or revise them based on new observations
  • Example: Prompt 1 scans → Prompt 2 analyses results → Prompt 3 extracts links - limitation: each step reasons locally rather than revisiting earlier decisions
  • Result: More flexible than single-pass CoT, but prone to losing reasoning continuity under dynamic conditions

3. ReAct (Reasoning + Acting)

  • Pattern: Iterative loop with shared context (Think → Act → Observe → Think …)
  • Advantage: The model reasons directly over tool outputs while maintaining awareness of goals and prior actions
  • Why: Each reasoning step incorporates original intent, action history, and observed results, allowing the agent to revise its strategy based on what actually happened
  • Example: “I will scan 90 days” → [scan] → “Found 98 newsletters. Too many. I will analyse the top N by frequency first” → [analyse] → adapt based on observed data
  • Result: Handles unexpected states (rate limits, errors, data anomalies) through informed, observation-driven adaptation

Side by side:

| Pattern | Approach | Takeaway | Result |
| --- | --- | --- | --- |
| Chain-of-Thought (CoT) | Linear reasoning within a single model invocation (scan → analyse → respond) | Reasoning is completed before any external actions are executed | Effective for static problems, but fragile when interacting with real-world APIs or variable data ❌ |
| Prompt Chaining | A sequence of prompts where each step consumes the output of the previous one | Lacks a unified reasoning loop that continuously re-evaluates earlier decisions | More flexible than single-pass CoT, but prone to losing reasoning continuity under dynamic conditions ❌ |
| ReAct | Iterative loop with shared context (Think → Act → Observe → Think …) | The model reasons directly over tool outputs while maintaining awareness of goals and prior actions | Handles unexpected states (rate limits, errors, data anomalies) through informed, observation-driven adaptation ✅ |

The Newsletter Decluttering AI Agent needs to deal with:

  • unpredictable rate limits,
  • intermittent network failures,
  • variable user data volumes

💡 A ReAct-based design allows the agent to observe real conditions as they occur, reason about their implications, and adjust its behavior dynamically rather than following a predetermined plan that assumes ideal execution.

For example:

  • Scenario: API rate limit triggered during newsletter scan
  • ReAct response: Agent observes 429 error, reasons “I’ll pause and batch remaining requests”, adapts its strategy mid-execution
  • Non-ReAct: A fixed execution path fails or relies on external retry logic or manual restart

This makes the system more robust and better aligned with production-grade agent design.

Note: In practice, these patterns are often combined. For example, CoT-style prompting can be used within a ReAct loop to improve reasoning quality. The key architectural distinction is whether the system supports observation-driven adaptation.

An example of ReAct adaptation:

User: "Clean up my newsletters"
AI: [Reasons] I should scan first
AI: [Acts] scan_newsletters(days_back=90)
API: [Returns] {98 newsletters found}
AI: [Observes] Found 98 → analyzing all at once might hit rate limits
AI: [Adapts] I'll analyse just the top N by frequency first
AI: [Acts] `analyze_engagement(newsletter_ids=[top 50 by frequency])`
API: [Returns] {engagement data}
AI: [Observes] 34 of these have <10% open rate
AI: [Acts] `extract_unsubscribe_links(sender_emails=[those 34])`
API: [Returns] {unsubscribe URLs}
AI: [Synthesizes] Here are my recommendations...

👉 Takeaway: The agent is not following a predetermined script: it decided to limit the batch size only after seeing the 98 newsletters. A linear pipeline cannot do this.

What is Function Calling?

Function calling enables LLMs to interact with external tools by generating structured JSON requests which are executed by the application code.

Instead of the LLM just producing text, it can output: {"function": "scan_newsletters", "args": {"days_back": 90}}, which the application executes and returns results to the LLM for further reasoning.

This transforms the LLM from a text generator into an orchestration engine that can:

  • Decide which tools to use based on context
  • Generate properly-formatted API calls
  • Interpret tool results and adapt next actions
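Concretely, the application side of this reduces to a dispatch table mapping function names to callables. A minimal sketch (the registry and the stub function are hypothetical, not this project's code):

```python
import json

# Hypothetical stub standing in for a real Gmail-backed tool
def scan_newsletters(days_back: int = 90) -> dict:
    return {"newsletters_found": 98, "days_back": days_back}

# Registry of tools the LLM is allowed to invoke
AVAILABLE_FUNCTIONS = {"scan_newsletters": scan_newsletters}

def dispatch(llm_output: str) -> dict:
    """Parse the LLM's structured request and execute the matching function."""
    request = json.loads(llm_output)
    func = AVAILABLE_FUNCTIONS.get(request["function"])
    if func is None:
        raise ValueError(f"Unknown function: {request['function']}")
    return func(**request.get("args", {}))

print(dispatch('{"function": "scan_newsletters", "args": {"days_back": 90}}'))
# {'newsletters_found': 98, 'days_back': 90}
```

Keeping the registry explicit (rather than resolving names dynamically) also doubles as a safety boundary: the model can only ever trigger functions you registered.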

Modern LLM APIs (OpenAI, Claude, Google Gemini, Amazon Bedrock, etc.) provide native function calling support through tool/function schemas that constrain the LLM’s outputs to valid tool invocations.

(links last accessed: 20 Dec 2025)

GenAI Agent Framework

The initial implementation of the Newsletter Decluttering AI Agent uses OpenAI’s native SDK for simplicity.

Future versions will migrate to Pydantic AI; key benefits include type safety, multi-provider support, and stronger validation.

How the Solution Works

┌────────────────────────────────────────────────────────────────┐
│                         USER REQUEST                           │
│           "Clean up my newsletter subscriptions"               │
└────────────────────────────┬───────────────────────────────────┘
                             ▼
┌────────────────────────────────────────────────────────────────┐
│                    REACT AGENT LOOP                            │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Initial Reasoning                                        │  │
│  │ AI: "I need to scan emails first to identify newsletters"│  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ ACTION: scan_newsletters(days_back=90)                   │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Gmail API                                                │  │
│  │ ├─ Fetch messages with List-Unsubscribe headers          │  │
│  │ ├─ Apply adaptive rate limiting between calls            │  │
│  │ └─ Retry on 429/500/503 errors                           │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ OBSERVATION: Found XX unique newsletters                 │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Adaptive Reasoning                                       │  │
│  │ AI: "XX is too many - I will analyse top YY by frequency │  │
│  │      to avoid rate limits"                               │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ ACTION: analyse_engagement(                              │  │
│  │   newsletter_ids=[top YY senders by frequency]           │  │
│  │ )                                                        │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Gmail API                                                │  │
│  │ ├─ Check read/unread status for each sender              │  │
│  │ ├─ Calculate open rates                                  │  │
│  │ └─ Adaptive rate limiting adjusts to API responses       │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ OBSERVATION: ZZ newsletters have <10% open rate          │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Final Action                                             │  │
│  │ AI: "These ZZ are low-engagement. Extract unsubscribe    │  │
│  │      links for user review"                              │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ ACTION: extract_unsubscribe_links(                       │  │
│  │   sender_emails=[those ZZ low-engagement senders]        │  │
│  │ )                                                        │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Gmail API                                                │  │
│  │ ├─ Parse List-Unsubscribe headers                        │  │
│  │ ├─ Extract HTTP and mailto links                         │  │
│  │ └─ Handle missing headers gracefully                     │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ OBSERVATION: Extracted ZZ unsubscribe URLs               │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ SYNTHESIS: Generate Final Recommendations                │  │
│  │ AI: "Here are ZZ newsletters to consider unsubscribing   │  │
│  │      from, ranked by priority..."                        │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────┬───────────────────────────────────┘
                             ▼
┌────────────────────────────────────────────────────────────────┐
│                    RESULTS TO USER                             │
└────────────────────────────────────────────────────────────────┘
  ✅ XX newsletters identified
  ✅ YY analysed (top frequency)
  ⚠️ ZZ flagged as low-engagement (<10% open rate)
  🔗 ZZ unsubscribe links extracted
  💰 Cost: ~$0.01 (GPT-4o-mini)

Implementation Challenges

This section takes a deeper dive into a few of the challenges: OAuth token expiry, API rate limits, overly flexible function schemas proving unreliable, and the semantics of Gmail labels.

OAuth Token Management

The Gmail API uses OAuth 2.0, with a subtle complication: access tokens expire after 60 minutes, while refresh tokens last much longer but can themselves expire or be revoked.

What seems like a simple authentication problem becomes a state machine with multiple failure modes. Here is what production systems must handle (whether learned through experience, mentorship, or reading RFC 6749):

Naïve approach (breaks after 60 minutes):

creds = get_credentials()
service = build('gmail', 'v1', credentials=creds)

Slightly better approach (will break after a few months):

# Handles token expiry but not refresh failures
if creds.expired and creds.refresh_token:
    creds.refresh(Request())

Needs to handle:

  • Network failures during refresh - Should retry, not fall back to re-auth
  • Expired refresh tokens - Need new OAuth flow
  • Revoked permissions - Need new OAuth flow
  • Corrupted token file - Need graceful degradation

Improved approach: Handling edge cases

# (Note that a few lines have been skipped for the sake of brevity)

class GmailAuthenticator:
    def authenticate(self) -> Credentials:
        creds = None

        # Load saved credentials
        if os.path.exists(self.token_file):
            try:
                # Read the previously saved token from a pickle file
                ...
            except (pickle.UnpicklingError, EOFError):
                # Corrupted file - start fresh (re-authenticate)
                creds = None
        
        # Handle invalid/expired credentials
        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                # Try to refresh with retry logic
                refresh_succeeded = False
                
                for attempt in range(2):
                    try:
                        creds.refresh(Request())
                        refresh_succeeded = True
                        break
                    
                    except RefreshError:
                        # Refresh token expired/revoked - NOT retryable
                        # Need to start a new OAuth flow
                        creds = self._run_oauth_flow()
                        refresh_succeeded = True
                        break
                    
                    except Exception as e:
                        # Distinguish network vs other errors
                        error_msg = str(e).lower()
                        is_network = any(term in error_msg 
                            for term in [
                              "network",
                              "timeout",
                              "connection",
                              "unreachable"])
                        
                        if is_network and attempt == 0:
                            # Network error - retry once
                            time.sleep(2)
                            continue
                        else:
                            # Not network or retry failed - re-auth
                            creds = self._run_oauth_flow()
                            refresh_succeeded = True
                            break
                
                if not refresh_succeeded:
                    creds = self._run_oauth_flow()
            else:
                # No valid credentials - start OAuth
                creds = self._run_oauth_flow()
            
            # Save credentials (non-fatal if fails)
            self._save_credentials()
        
        return creds

Things To Take Note Of:

  • RefreshError vs network errors require different handling:

    • RefreshError = refresh token expired → re-auth required
    • Network errors = transient → retry with backoff
    • Other errors = unknown → log and re-auth to be safe
  • Saving credentials should be non-fatal:

    def _save_credentials(self):
        try:
            with open(self.token_file, 'wb') as token:
                pickle.dump(self.creds, token)
        except Exception as e:
            # Log but do not crash - OAuth succeeded
            # Credentials save failed - will re-auth next time
            ...
    
  • OAuth callback needs dynamic port allocation:

    # DO NOT hardcode port 8080 - might be in use
    flow.run_local_server(port=0)  # OS picks available port
    

Rate Limiting

The Gmail API, like most APIs, has quotas.

Most developers hit rate limits on their first try because the naïve approach does not account for quota mechanics.

Evolution of rate-limiting approaches (from code reviews and production incidents I have encountered):

The iterations (below) are representative of common patterns I have seen teams discover. The final implementation shows the production-grade approach, but understanding why simpler approaches fail helps teams avoid these pitfalls.

❌ First iteration - Naïve (causes a rate limit error within the first few minutes):

for msg_id in message_ids:
    msg = service.users().messages().get(userId="me", id=msg_id).execute()

Problem: Crashes after a few calls with HttpError 429: Rate Limit Exceeded.

⚠️ Second iteration - Fixed sleep (introduce a few seconds sleep between API calls):

# Better, but wastes time with conservative sleep
for msg_id in message_ids:
    time.sleep(0.1)  # 100ms
    msg = service.users().messages().get(userId='me', id=msg_id).execute()

Problem: Overly conservative. It underuses the quota while prolonging execution time.

❌ Third iteration - Parallelisation without rate limiting (fast, but fails fast):

# Tries to be fast but crashes
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(fetch_message, msg_id) 
               for msg_id in message_ids]
    results = [f.result() for f in futures]

Problem: HttpError 429: Rate Limit Exceeded errors are hit much more quickly - the quota is exhausted within the first few seconds.

✅ Fourth iteration - Adaptive rate-limited parallelisation:

from functools import wraps
import threading
import time

def rate_limited(min_interval: float = 0.02):
    def decorator(func):
        last_call = [0.0]        # Mutable to store state across calls
        lock = threading.Lock()  # The decorated call is shared across worker threads

        @wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                elapsed = time.time() - last_call[0]
                if elapsed < min_interval:
                    time.sleep(min_interval - elapsed)
                last_call[0] = time.time()
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Apply decorator to Gmail API calls
@rate_limited(min_interval=0.02)  # 20ms between calls
def fetch_message_metadata(service, msg_id):
    return service.users().messages().get(
        userId='me',
        id=msg_id,
        format='metadata'
    ).execute()

# Use with parallelisation (controlled)
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(fetch_message_metadata, service, msg_id)
               for msg_id in message_ids]
    results = [f.result() for f in futures]

Additional complexity: Retry logic for transient failures

Even with rate limiting, network errors happen:

import logging
import time
from googleapiclient.errors import HttpError

logger = logging.getLogger(__name__)

def retry_with_backoff(func, max_attempts=3, *args, **kwargs):
    for attempt in range(max_attempts):
        try:
            return func(*args, **kwargs)
        except HttpError as e:
            if e.resp.status in [429, 500, 503]:  # Retryable
                if attempt < max_attempts - 1:
                    wait_time = 2 ** attempt  # 1s, 2s, 4s
                    logger.warning(
                        f"API error {e.resp.status}, "
                        f"retrying in {wait_time}s "
                        f"(attempt {attempt + 1}/{max_attempts})"
                    )
                    time.sleep(wait_time)
                else:
                    logger.error("Max retries reached")
                    raise
            elif e.resp.status == 404:
                # Message deleted - not an error
                return None
            else:
                # Non-retryable (401, 403, etc.)
                logger.error(f"Non-retryable error: {e}")
                raise

Takeaway point: Distinguish error types

  • 429 (rate limit) → Retry with exponential backoff
  • 500/503 (server error) → Retry with exponential backoff
  • 404 (not found) → Return None, do not retry
  • 401 (auth) → Do not retry, re-authenticate
  • 403 (forbidden) → Do not retry, check permissions
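This decision table can be captured in one small helper; the function name is my own, shown only to make the mapping explicit:

```python
def classify_http_error(status: int) -> str:
    """Map an HTTP status from the Gmail API to a handling strategy."""
    if status in (429, 500, 503):
        return "retry"              # transient: retry with exponential backoff
    if status == 404:
        return "skip"               # message deleted: return None, no retry
    if status == 401:
        return "reauth"             # credentials invalid: re-run the OAuth flow
    if status == 403:
        return "check_permissions"  # forbidden: inspect scopes and quota
    return "raise"                  # anything else: surface the error

print(classify_http_error(429))  # retry
```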

Side by side:

| Approach | Description | Takeaway |
| --- | --- | --- |
| Naïve | Call the API in a loop | Breaks within the first few minutes ❌ |
| Fixed sleep | Introduce a sleep between API calls | Better, but wastes time with a conservative sleep ⚠️ |
| Parallelisation without rate limiting | Tries to be fast by using parallelisation | HttpError 429: Rate Limit Exceeded errors encountered much more quickly ❌ |
| Adaptive rate-limited parallelisation | Parallelisation with an adaptive rate limit | Achieves a balance between speed and errors ✅ |

Function Schema Design for AI Reliability

Initial tool definition (too flexible):

{
    "name": "analyze_newsletters",
    "description": "analyse newsletters",
    "input_schema": {
        "type": "object",
        "properties": {
            "options": {"type": "object"}
        }
    }
}

Problem: I once observed the LLM hallucinating properties inside options, leading to runtime errors.

Improved schema (constrained):

{
    "name": "analyze_engagement",
    "description": "Calculate open rates for specific newsletter senders. Returns engagement metrics including open_rate (percentage), total_received, read_count, and recommendation (keep/unsubscribe).",
    "parameters": {
        "type": "object",
        "properties": {
            "newsletter_ids": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of sender email addresses (e.g., ['[email protected]'])",
                "minItems": 1,
                "maxItems": 50  // Prevent overwhelming API
            },
            "threshold_days": {
                "type": "integer",
                "description": "Only analyse newsletters received within this many days",
                "default": 90,
                "minimum": 1,
                "maximum": 365
            }
        },
        "required": ["newsletter_ids"]
    }
}
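One way to catch such hallucinations early (an illustration, not part of the project) is to validate the LLM-produced arguments against the schema before executing the tool, e.g. with the jsonschema library:

```python
from jsonschema import ValidationError, validate

# Parameter schema mirroring the constrained tool definition above
ANALYZE_ENGAGEMENT_PARAMS = {
    "type": "object",
    "properties": {
        "newsletter_ids": {
            "type": "array",
            "items": {"type": "string"},
            "minItems": 1,
            "maxItems": 50,
        },
        "threshold_days": {"type": "integer", "minimum": 1, "maximum": 365},
    },
    "required": ["newsletter_ids"],
    "additionalProperties": False,  # reject hallucinated extra keys
}

def check_tool_args(args: dict) -> bool:
    """Return True only if the LLM-produced arguments satisfy the schema."""
    try:
        validate(instance=args, schema=ANALYZE_ENGAGEMENT_PARAMS)
        return True
    except ValidationError:
        return False

print(check_tool_args({"newsletter_ids": ["[email protected]"]}))        # True
print(check_tool_args({"newsletter_ids": [], "options": {"x": 1}}))  # False
```

Rejecting invalid arguments with a descriptive error, then feeding that error back to the model, usually lets the agent self-correct on the next iteration.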

Handling Gmail’s Label Semantics

Gmail does not have “read/unread” as a boolean. It uses labels:

  • UNREAD label present = unread
  • UNREAD label absent = read

But there is a subtlety: archived emails may not carry the INBOX label, yet still be unread.

Incorrect logic:

# This misses archived unread emails
is_read = 'INBOX' in labels and 'UNREAD' not in labels

Correct logic:

# Check only UNREAD label, independent of INBOX
is_read = 'UNREAD' not in msg_data.get('labelIds', [])

This matters for engagement analysis - we want to know whether the email was opened, regardless of where it is filed.
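As a sketch of how this feeds the engagement analysis (the helper name and sample data are mine; labelIds follows the Gmail API message resource):

```python
def compute_open_rate(messages: list[dict]) -> float:
    """Percentage of messages opened, judged solely by the UNREAD label."""
    if not messages:
        return 0.0
    read = sum(1 for m in messages if "UNREAD" not in m.get("labelIds", []))
    return 100.0 * read / len(messages)

sample = [
    {"labelIds": ["INBOX"]},            # read, still in the inbox
    {"labelIds": ["UNREAD"]},           # unread, archived - must still count
    {"labelIds": ["INBOX", "UNREAD"]},  # unread, in the inbox
    {"labelIds": []},                   # read, archived
]
print(compute_open_rate(sample))  # 50.0
```

Note the second message would be misclassified as "read" by the INBOX-dependent logic shown above.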

AI Agent Implementation

scan_newsletters tool

Tool used by the agent to scan Gmail for newsletter subscriptions by detecting emails with List-Unsubscribe headers (RFC 2369).

// (Note that a few lines have been skipped for the sake of brevity)
{
    "name": "scan_newsletters",
    "input_schema": {
        "type": "object",
        "properties": {
            "days_back": {
                "type": "integer",
                "description": "Number of days to look back (1-365)",
                "default": 90,
                "minimum": 1,
                "maximum": 365
            }
        }
    }
}

🔗 GitHub: tool-definitions-openai.json (lines 4-19)

analyze_engagement tool

Tool used by the agent to calculate open rates for specific newsletter senders. Tool returns:

  • open_rate: Percentage of emails opened (0-100)
  • total_received: Number of emails from sender
  • read_count: Number of emails user opened
  • recommendation: keep if open_rate > 30%, else unsubscribe

// (Note that a few lines have been skipped for the sake of brevity)
{
    "name": "analyze_engagement",
    "input_schema": {
        "type": "object",
        "properties": {
            "newsletter_ids": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Sender email addresses from scan_newsletters",
                "minItems": 1,
                "maxItems": 50
            }
        },
        "required": ["newsletter_ids"]
    }
}

🔗 GitHub: tool-definitions-openai.json (lines 23-37)

extract_unsubscribe_links tool

Tool used by the agent to extract unsubscribe URLs from List-Unsubscribe headers. Tool returns: URLs and email addresses for unsubscribe actions.

// (Note that a few lines have been skipped for the sake of brevity)
{
    "name": "extract_unsubscribe_links",
    "input_schema": {
        "type": "object",
        "properties": {
            "sender_emails": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Newsletter senders to extract links for"
            }
        },
        "required": ["sender_emails"]
    }
}

🔗 GitHub: tool-definitions-openai.json (lines 38-55)

Agent Loop

The agent loop runs the Newsletter Decluttering AI Agent with OpenAI GPT-4o-mini, implementing the ReAct pattern with function calling.

# (Note that a few lines have been skipped for the sake of brevity)
def run_newsletter_agent():
    # Initialize Gmail service
    ...

    # Initialize conversation
    prompt_data = load_prompts()["newsletter_declutter_agent"]
    ...

    # Agent loop
    iteration, max_iterations = 0, 10
    while iteration < max_iterations:
        iteration += 1

        try:
            # Call OpenAI with function calling enabled
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                tools=tools,
                tool_choice="auto",
                temperature=0.1
            )

            response_message = response.choices[0].message
            tool_calls = response_message.tool_calls
            finish_reason = response.choices[0].finish_reason

            # Add assistant's response to message history
            messages.append(response_message)

            # Check if the model wants to call functions
            if tool_calls:
                # Model requesting one or more tool calls
                for tool_call in tool_calls:

                    # Get the function to call
                    function_name = tool_call.function.name
                    function_args = json.loads(tool_call.function.arguments)
                    function_to_call = available_functions.get(function_name)

                    if function_to_call:
                        # Execute the function
                        function_response = function_to_call(service, **function_args)
                        # Log summary of results
                        ...
                        # Add function response to messages
                        ...

            elif finish_reason == "stop":
                # Model has finished - extract final response
                final_text = response_message.content
                # Log the analysis (final_text), then exit
                ...

            else:
                # Unexpected finish reason
                ...

        except Exception as e:
            ...

    return messages

🔗 GitHub: newsletter_declutter_openai.py (run_newsletter_agent() function)

👉 Takeaway: The loop continues until the AI agent stops requesting tools. This allows multi-step reasoning.
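For completeness, the elided "add function response to messages" step follows the Chat Completions convention: each tool result is appended as a role "tool" message tied back to its tool_call_id, so the model can reason over it on the next iteration. A minimal sketch (the helper name is mine):

```python
import json

def tool_result_message(tool_call_id: str, result: dict) -> dict:
    """Package a tool's output as a Chat Completions 'tool' message."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,   # must match the id the model emitted
        "content": json.dumps(result),  # content must be a string
    }

print(tool_result_message("call_abc123", {"newsletters_found": 98})["role"])  # tool
```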

Takeaway Points

Systems Matter More Than Just The AI

Production AI systems are “systems”. The AI is one component. Master the plumbing.

As discussed earlier, the majority of the development effort went into systems engineering, with only ~20% spent on AI prompt and function schema design. This ratio is typical of production AI systems, not an anomaly.

Function Schemas Are Your Contract

Time spent refining schemas saves a LOT of time debugging agent behaviour.

The clearer your function schemas, the more reliable your agent. Treat them like API contracts:

  • Explicit types
  • Examples in descriptions
  • Min/max constraints
  • Default values

What This Taught Me About Leading AI Teams

  • Expect failures, do not assume them away: Resilience is not an enhancement; it is a baseline requirement.

  • Model choice is rarely the bottleneck: The difference between a demo and a deployed system is almost always engineering rigour, not access to a newer model.

  • Schemas, contracts, and guardrails scale better than heroics: Clear function schemas and safety boundaries reduce cognitive load, improve debuggability, and allow teams to move faster with confidence.

  • User trust is a product requirement: Intelligent recommendation with explicit confirmation loops is essential for adoption.

Future Enhancements

  • Auto-unsubscribe: Use Playwright to click unsubscribe links
  • Other providers & models: Use Pydantic AI’s OpenAIChatModel to have the flexibility of being able to use other language models, including SLMs running locally using Ollama
  • Automated note taking: Have an agent read the newsletter, follow through linked material, automatically generate notes, verify, then add to knowledge base

Conclusion

  • In almost all systems, the AI is a powerful component, but it’s still just a component
  • Effective systems engineering is critical to a successful (i.e. well adopted) application
  • Full implementation: 🔗 GitHub: newsletter-declutter-agent (code, setup, results)