<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>LangChain GPT-5 Integration on RockB</title><link>https://baeseokjae.github.io/tags/langchain-gpt-5-integration/</link><description>Recent content in LangChain GPT-5 Integration on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 10 Apr 2026 04:40:00 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/langchain-gpt-5-integration/index.xml" rel="self" type="application/rss+xml"/><item><title>How to Build an AI-Powered Chatbot with GPT-5 and RAG in 2026</title><link>https://baeseokjae.github.io/posts/ai-powered-chatbot-gpt5-rag-2026/</link><pubDate>Fri, 10 Apr 2026 04:40:00 +0000</pubDate><guid>https://baeseokjae.github.io/posts/ai-powered-chatbot-gpt5-rag-2026/</guid><description>Learn how to build an AI-powered chatbot using GPT-5 and RAG in 2026 — with step-by-step code, vector databases, LangChain integration, and deployment options.</description><content:encoded><![CDATA[<p>Building an AI-powered chatbot with GPT-5 and RAG (Retrieval-Augmented Generation) in 2026 means combining one of the most capable language models available with a retrieval pipeline that pulls real-time, domain-specific knowledge — dramatically reducing hallucinations and making your chatbot genuinely useful in production. This guide walks you through the full process, from architecture to deployment.</p>
<h2 id="why-build-an-ai-chatbot-with-gpt-5-and-rag-in-2026">Why Build an AI Chatbot with GPT-5 and RAG in 2026?</h2>
<p>The chatbot landscape has fundamentally changed in 2026. Basic keyword matching and scripted flows are no longer competitive. According to a Gartner prediction cited by Botpress, by 2027 chatbots will become the primary customer service channel for roughly 25% of organizations. What drives that shift is the combination of powerful LLMs and retrieval architectures that make responses accurate, grounded, and explainable.</p>
<p>GPT-5 alone is impressive — but without grounding in your specific knowledge base, it hallucinates, gives outdated answers, and cannot reference proprietary data. RAG solves this: it retrieves relevant documents at query time and feeds them into GPT-5&rsquo;s context window before generating a response. The result is a chatbot that actually knows your business.</p>
<p>A 2025 study by Pinecone found that RAG reduces hallucination rates by 40–60% compared to standalone LLMs in enterprise chatbot deployments. That number alone justifies the architecture — particularly for customer-facing applications where accuracy matters.</p>
<h2 id="whats-new-in-gpt-5-that-makes-chatbots-better">What&rsquo;s New in GPT-5 That Makes Chatbots Better?</h2>
<p>GPT-5, the flagship model on OpenAI&rsquo;s 2026 roadmap, brings several capabilities that directly improve chatbot quality:</p>
<ul>
<li><strong>1 million token context window</strong> — allows ingestion of entire policy documents, codebases, or conversation histories in a single call</li>
<li><strong>Native multimodal reasoning</strong> — handles images, audio, and structured data alongside text, enabling richer user interactions</li>
<li><strong>Improved tool-calling</strong> — more reliable function execution, crucial for agentic chatbots that need to query APIs or databases</li>
<li><strong>Lower latency at scale</strong> — faster inference makes real-time conversational UX viable at production traffic</li>
</ul>
<p>These improvements reduce the amount of engineering required to build reliable chatbots and make the RAG pipeline more efficient — the larger context window means fewer chunking trade-offs.</p>
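<p>A quick back-of-envelope calculation makes the point concrete. The sketch below estimates how many 1,000-token chunks fit alongside instructions, history, and the answer; the reserved token budgets are illustrative assumptions, not measured values:</p>

```python
# Rough context-budget arithmetic for a RAG prompt.
# All reserved amounts below are illustrative assumptions.
CONTEXT_WINDOW = 1_000_000  # token window cited for GPT-5 above
SYSTEM_PROMPT = 2_000       # reserved for system instructions
HISTORY = 8_000             # reserved for conversation history
OUTPUT = 4_000              # reserved for the generated answer
CHUNK_SIZE = 1_000          # tokens per retrieved chunk

budget = CONTEXT_WINDOW - SYSTEM_PROMPT - HISTORY - OUTPUT
max_chunks = budget // CHUNK_SIZE
print(f"Room for up to {max_chunks} retrieved chunks")  # → Room for up to 986 retrieved chunks
```

<p>With a 128K-class model the same arithmetic leaves room for roughly a hundred chunks, which is why retrieval quality and chunk sizing were much tighter constraints before.</p>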
<h2 id="understanding-the-rag-architecture">Understanding the RAG Architecture</h2>
<h3 id="what-is-retrieval-augmented-generation">What Is Retrieval-Augmented Generation?</h3>
<p>RAG is a two-stage architecture:</p>
<ol>
<li><strong>Retrieval</strong> — at query time, the user&rsquo;s message is converted to a vector embedding and used to search a vector database for semantically similar documents</li>
<li><strong>Generation</strong> — the retrieved documents are injected as context into the LLM prompt, which then generates a response grounded in that knowledge</li>
</ol>
<p>This approach keeps the LLM&rsquo;s weights frozen. You don&rsquo;t need to fine-tune GPT-5 every time your knowledge base changes — you just update the vector index.</p>
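<p>The two stages can be sketched end to end with toy vectors. This is a deliberately simplified illustration: the three-dimensional &quot;embeddings&quot; and sample documents are invented, and a real pipeline substitutes a learned embedding model plus a vector database for the hand-rolled cosine search.</p>

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# Stage 1: retrieval. Toy 3-dim vectors stand in for real embeddings.
docs = {
    "Refunds are processed within 5 business days.": [0.9, 0.1, 0.0],
    "The API rate limit is 100 requests per minute.": [0.1, 0.9, 0.1],
}
query = "How long do refunds take?"
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of the query
best_doc = max(docs, key=lambda d: cosine(query_vec, docs[d]))

# Stage 2: generation. The retrieved text is injected into the prompt;
# only the prompt assembly is shown here, not the model call.
prompt = (
    f"Answer using only this context:\n{best_doc}\n\n"
    f"Question: {query}"
)
print(best_doc)
```

<p>Everything downstream of this toy, including the LangChain retrieval chain later in this guide, is an industrial-strength version of these two steps.</p>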
<h3 id="rag-vs-fine-tuning-vs-plain-prompting">RAG vs. Fine-Tuning vs. Plain Prompting</h3>
<table>
  <thead>
      <tr>
          <th>Approach</th>
          <th>Best For</th>
          <th>Cost</th>
          <th>Freshness</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Plain prompting</td>
          <td>Simple Q&amp;A with static knowledge</td>
          <td>Low</td>
          <td>Static</td>
      </tr>
      <tr>
          <td>Fine-tuning</td>
          <td>Domain-specific tone and format</td>
          <td>High</td>
          <td>Requires retraining</td>
      </tr>
      <tr>
          <td>RAG</td>
          <td>Dynamic knowledge base, accuracy-critical</td>
          <td>Medium</td>
          <td>Real-time updates</td>
      </tr>
      <tr>
          <td>RAG + Fine-tuning</td>
          <td>Enterprise with strict style requirements</td>
          <td>High</td>
          <td>Real-time</td>
      </tr>
  </tbody>
</table>
<p>For most 2026 chatbot use cases, RAG without fine-tuning is the right default.</p>
<h2 id="prerequisites-and-tools">Prerequisites and Tools</h2>
<p>Before building, you need to pick your stack. Here are the main decisions:</p>
<h3 id="gpt-5-api-access">GPT-5 API Access</h3>
<p>OpenAI&rsquo;s GPT-5 is accessed via the standard Chat Completions API. If you&rsquo;re cost-sensitive or need self-hosting, alternatives include:</p>
<ul>
<li><strong>Claude 4 (Anthropic)</strong> — strong reasoning, 200K context</li>
<li><strong>Gemini 2.0 Ultra (Google)</strong> — multimodal, competitive pricing</li>
<li><strong>Mistral Large 3</strong> — open-weights, self-hostable</li>
<li><strong>LLaMA 4 (Meta)</strong> — fully open-source, zero API cost if self-hosted</li>
</ul>
<p>For this tutorial we use GPT-5 via OpenAI API, but the architecture works with any provider.</p>
<h3 id="vector-database-comparison">Vector Database Comparison</h3>
<table>
  <thead>
      <tr>
          <th>Database</th>
          <th>Type</th>
          <th>Best For</th>
          <th>Pricing</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Pinecone</td>
          <td>Managed cloud</td>
          <td>Production, scalability, low latency</td>
          <td>From ~$70/month</td>
      </tr>
      <tr>
          <td>Weaviate</td>
          <td>Self-hosted or cloud</td>
          <td>Hybrid search, graph retrieval</td>
          <td>Open source / cloud</td>
      </tr>
      <tr>
          <td>FAISS</td>
          <td>Local library</td>
          <td>Research, prototyping</td>
          <td>Free</td>
      </tr>
      <tr>
          <td>Chroma</td>
          <td>Local or self-hosted</td>
          <td>Fast local development</td>
          <td>Free</td>
      </tr>
      <tr>
          <td>Qdrant</td>
          <td>Self-hosted or cloud</td>
          <td>High-performance production</td>
          <td>Open source / cloud</td>
      </tr>
  </tbody>
</table>
<p>The vector database market is expected to reach $4.2 billion by 2026, driven largely by RAG adoption (MarketsandMarkets 2025). For production, Pinecone or Weaviate are the default choices. For local development, FAISS or Chroma are faster to set up.</p>
<h3 id="development-framework-comparison">Development Framework Comparison</h3>
<table>
  <thead>
      <tr>
          <th>Framework</th>
          <th>Interface</th>
          <th>Best For</th>
          <th>Pricing</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>LangChain</td>
          <td>Python / JavaScript</td>
          <td>Complex agentic workflows, 500+ integrations</td>
          <td>Open source</td>
      </tr>
      <tr>
          <td>LlamaIndex</td>
          <td>Python</td>
          <td>Data-centric RAG, heavy retrieval needs</td>
          <td>Open source</td>
      </tr>
      <tr>
          <td>Haystack</td>
          <td>Python</td>
          <td>Enterprise document pipelines</td>
          <td>Open source</td>
      </tr>
  </tbody>
</table>
<p>LangChain grew to over 80,000 GitHub stars and 500+ integrations by early 2026 (GitHub analytics), making it the most widely adopted option. LlamaIndex has a narrower focus but more sophisticated indexing for document-heavy applications.</p>
<h2 id="step-by-step-tutorial-building-your-gpt-5-rag-chatbot">Step-by-Step Tutorial: Building Your GPT-5 RAG Chatbot</h2>
<p>This tutorial builds a customer support chatbot that answers questions from a product documentation knowledge base.</p>
<h3 id="step-1-define-your-use-case-and-scope">Step 1: Define Your Use Case and Scope</h3>
<p>Before writing code, answer these questions:</p>
<ul>
<li><strong>What domain?</strong> Customer support, internal knowledge base, code assistance, sales?</li>
<li><strong>What data?</strong> PDFs, web pages, databases, APIs, structured tables?</li>
<li><strong>Who uses it?</strong> Public users, internal teams, developers?</li>
<li><strong>What&rsquo;s the latency tolerance?</strong> Real-time (&lt;500ms) or async?</li>
</ul>
<p>For this tutorial: a B2B SaaS company&rsquo;s support bot ingesting product documentation and FAQs.</p>
<h3 id="step-2-set-up-your-development-environment">Step 2: Set Up Your Development Environment</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># Create a virtual environment</span>
</span></span><span style="display:flex;"><span>python -m venv chatbot-env
</span></span><span style="display:flex;"><span>source chatbot-env/bin/activate  <span style="color:#75715e"># Windows: chatbot-env\Scripts\activate</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Install dependencies</span>
</span></span><span style="display:flex;"><span>pip install langchain langchain-openai langchain-pinecone pinecone-client <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    python-dotenv tiktoken pypdf streamlit
</span></span></code></pre></div><p>Create a <code>.env</code> file:</p>



<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>OPENAI_API_KEY<span style="color:#f92672">=</span>your-openai-key
</span></span><span style="display:flex;"><span>PINECONE_API_KEY<span style="color:#f92672">=</span>your-pinecone-key
</span></span><span style="display:flex;"><span>PINECONE_ENVIRONMENT<span style="color:#f92672">=</span>your-pinecone-env
</span></span><span style="display:flex;"><span>PINECONE_INDEX_NAME<span style="color:#f92672">=</span>chatbot-knowledge-base
</span></span></code></pre></div>
<h3 id="step-3-load-and-chunk-your-knowledge-base">Step 3: Load and Chunk Your Knowledge Base</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain_community.document_loaders <span style="color:#f92672">import</span> PyPDFDirectoryLoader
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain.text_splitter <span style="color:#f92672">import</span> RecursiveCharacterTextSplitter
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Load documents</span>
</span></span><span style="display:flex;"><span>loader <span style="color:#f92672">=</span> PyPDFDirectoryLoader(<span style="color:#e6db74">&#34;./docs/&#34;</span>)
</span></span><span style="display:flex;"><span>raw_docs <span style="color:#f92672">=</span> loader<span style="color:#f92672">.</span>load()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Chunk into smaller segments for retrieval</span>
</span></span><span style="display:flex;"><span>text_splitter <span style="color:#f92672">=</span> RecursiveCharacterTextSplitter(
</span></span><span style="display:flex;"><span>    chunk_size<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span>,
</span></span><span style="display:flex;"><span>    chunk_overlap<span style="color:#f92672">=</span><span style="color:#ae81ff">200</span>,
</span></span><span style="display:flex;"><span>    separators<span style="color:#f92672">=</span>[<span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n\n</span><span style="color:#e6db74">&#34;</span>, <span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">&#34;</span>, <span style="color:#e6db74">&#34; &#34;</span>, <span style="color:#e6db74">&#34;&#34;</span>]
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>chunks <span style="color:#f92672">=</span> text_splitter<span style="color:#f92672">.</span>split_documents(raw_docs)
</span></span><span style="display:flex;"><span>print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Created </span><span style="color:#e6db74">{</span>len(chunks)<span style="color:#e6db74">}</span><span style="color:#e6db74"> chunks from </span><span style="color:#e6db74">{</span>len(raw_docs)<span style="color:#e6db74">}</span><span style="color:#e6db74"> documents&#34;</span>)
</span></span></code></pre></div><p><strong>Chunking strategy matters.</strong> Too small: retrieval misses context. Too large: eats your context window and increases cost. 800–1200 tokens per chunk is a reliable starting point for most documentation.</p>
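<p>The overlap behavior is easy to see with a plain-Python sliding window. This is a simplified stand-in for <code>RecursiveCharacterTextSplitter</code> (which additionally prefers paragraph and sentence boundaries) and measures characters rather than tokens:</p>

```python
def sliding_chunks(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size windows; adjacent windows share `overlap` characters."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 2500  # stand-in for a loaded document
chunks = sliding_chunks(doc)
print([len(c) for c in chunks])  # → [1000, 1000, 900, 100]
```

<p>Note the short tail chunk: production splitters typically avoid such fragments by splitting at natural boundaries, which is exactly what the recursive splitter&rsquo;s separator list is for.</p>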
<h3 id="step-4-build-and-populate-the-vector-index">Step 4: Build and Populate the Vector Index</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> os
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> dotenv <span style="color:#f92672">import</span> load_dotenv
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain_openai <span style="color:#f92672">import</span> OpenAIEmbeddings
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain_pinecone <span style="color:#f92672">import</span> PineconeVectorStore
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> pinecone <span style="color:#f92672">import</span> Pinecone, ServerlessSpec
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Load API keys from .env and initialize Pinecone</span>
</span></span><span style="display:flex;"><span>load_dotenv()
</span></span><span style="display:flex;"><span>pc <span style="color:#f92672">=</span> Pinecone(api_key<span style="color:#f92672">=</span>os<span style="color:#f92672">.</span>getenv(<span style="color:#e6db74">&#34;PINECONE_API_KEY&#34;</span>))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Create index if it doesn&#39;t exist</span>
</span></span><span style="display:flex;"><span>index_name <span style="color:#f92672">=</span> os<span style="color:#f92672">.</span>getenv(<span style="color:#e6db74">&#34;PINECONE_INDEX_NAME&#34;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> index_name <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> pc<span style="color:#f92672">.</span>list_indexes()<span style="color:#f92672">.</span>names():
</span></span><span style="display:flex;"><span>    pc<span style="color:#f92672">.</span>create_index(
</span></span><span style="display:flex;"><span>        name<span style="color:#f92672">=</span>index_name,
</span></span><span style="display:flex;"><span>        dimension<span style="color:#f92672">=</span><span style="color:#ae81ff">1536</span>,  <span style="color:#75715e"># text-embedding-3-small dimension</span>
</span></span><span style="display:flex;"><span>        metric<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;cosine&#34;</span>,
</span></span><span style="display:flex;"><span>        spec<span style="color:#f92672">=</span>ServerlessSpec(cloud<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;aws&#34;</span>, region<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;us-east-1&#34;</span>)
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Create embeddings and upload to Pinecone</span>
</span></span><span style="display:flex;"><span>embeddings <span style="color:#f92672">=</span> OpenAIEmbeddings(model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;text-embedding-3-small&#34;</span>)
</span></span><span style="display:flex;"><span>vectorstore <span style="color:#f92672">=</span> PineconeVectorStore<span style="color:#f92672">.</span>from_documents(
</span></span><span style="display:flex;"><span>    documents<span style="color:#f92672">=</span>chunks,
</span></span><span style="display:flex;"><span>    embedding<span style="color:#f92672">=</span>embeddings,
</span></span><span style="display:flex;"><span>    index_name<span style="color:#f92672">=</span>index_name
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>print(<span style="color:#e6db74">&#34;Knowledge base indexed successfully.&#34;</span>)
</span></span></code></pre></div><p>You only run this indexing step once (or when your documents change). The vector store persists in Pinecone.</p>
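<p>One way to implement &quot;re-index only when documents change&quot; is to derive each chunk&rsquo;s ID from a hash of its content, so unchanged chunks keep their IDs and can be skipped. The sketch below is a hypothetical helper built on the standard library, not a LangChain or Pinecone feature:</p>

```python
import hashlib

def chunk_id(text: str) -> str:
    """Stable ID derived from chunk content; unchanged text keeps its ID."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

# IDs already present in the vector index (from the last indexing run)
old_ids = {chunk_id(c) for c in ["Plan A costs $10.", "Plan B costs $20."]}

# Current chunks after the docs were edited: Plan A's price changed
new_chunks = ["Plan A costs $12.", "Plan B costs $20."]

# Only chunks whose hash is new need to be embedded and upserted
to_upsert = [c for c in new_chunks if chunk_id(c) not in old_ids]
print(to_upsert)  # → ['Plan A costs $12.']
```

<p>Passing these content hashes as explicit vector IDs on upsert also makes re-indexing idempotent: uploading the same chunk twice overwrites rather than duplicates.</p>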
<h3 id="step-5-implement-the-rag-retrieval-chain">Step 5: Implement the RAG Retrieval Chain</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> os
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> dotenv <span style="color:#f92672">import</span> load_dotenv
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain_openai <span style="color:#f92672">import</span> ChatOpenAI, OpenAIEmbeddings
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain_pinecone <span style="color:#f92672">import</span> PineconeVectorStore
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain.chains <span style="color:#f92672">import</span> ConversationalRetrievalChain
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain.memory <span style="color:#f92672">import</span> ConversationBufferWindowMemory
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain.prompts <span style="color:#f92672">import</span> PromptTemplate
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>load_dotenv()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Initialize GPT-5</span>
</span></span><span style="display:flex;"><span>llm <span style="color:#f92672">=</span> ChatOpenAI(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-5&#34;</span>,
</span></span><span style="display:flex;"><span>    temperature<span style="color:#f92672">=</span><span style="color:#ae81ff">0.1</span>,  <span style="color:#75715e"># Low temperature for factual accuracy</span>
</span></span><span style="display:flex;"><span>    streaming<span style="color:#f92672">=</span><span style="color:#66d9ef">True</span>,
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Load existing vectorstore (no need to re-index)</span>
</span></span><span style="display:flex;"><span>vectorstore <span style="color:#f92672">=</span> PineconeVectorStore(
</span></span><span style="display:flex;"><span>    index_name<span style="color:#f92672">=</span>os<span style="color:#f92672">.</span>getenv(<span style="color:#e6db74">&#34;PINECONE_INDEX_NAME&#34;</span>),
</span></span><span style="display:flex;"><span>    embedding<span style="color:#f92672">=</span>OpenAIEmbeddings(model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;text-embedding-3-small&#34;</span>)
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Configure retriever</span>
</span></span><span style="display:flex;"><span>retriever <span style="color:#f92672">=</span> vectorstore<span style="color:#f92672">.</span>as_retriever(
</span></span><span style="display:flex;"><span>    search_type<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;similarity&#34;</span>,
</span></span><span style="display:flex;"><span>    search_kwargs<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;k&#34;</span>: <span style="color:#ae81ff">5</span>}  <span style="color:#75715e"># Retrieve top 5 relevant chunks</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Conversation memory (last 10 turns)</span>
</span></span><span style="display:flex;"><span>memory <span style="color:#f92672">=</span> ConversationBufferWindowMemory(
</span></span><span style="display:flex;"><span>    memory_key<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;chat_history&#34;</span>,
</span></span><span style="display:flex;"><span>    return_messages<span style="color:#f92672">=</span><span style="color:#66d9ef">True</span>,
</span></span><span style="display:flex;"><span>    output_key<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;answer&#34;</span>,
</span></span><span style="display:flex;"><span>    k<span style="color:#f92672">=</span><span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Custom system prompt</span>
</span></span><span style="display:flex;"><span>custom_prompt <span style="color:#f92672">=</span> PromptTemplate(
</span></span><span style="display:flex;"><span>    input_variables<span style="color:#f92672">=</span>[<span style="color:#e6db74">&#34;context&#34;</span>, <span style="color:#e6db74">&#34;question&#34;</span>, <span style="color:#e6db74">&#34;chat_history&#34;</span>],
</span></span><span style="display:flex;"><span>    template<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;&#34;&#34;You are a helpful customer support assistant for our SaaS product.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Answer questions using only the provided context. If you cannot find the answer
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">in the context, say so clearly — do not make up information.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Context: </span><span style="color:#e6db74">{context}</span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Chat History: </span><span style="color:#e6db74">{chat_history}</span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Question: </span><span style="color:#e6db74">{question}</span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Answer:&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Build the chain</span>
</span></span><span style="display:flex;"><span>rag_chain <span style="color:#f92672">=</span> ConversationalRetrievalChain<span style="color:#f92672">.</span>from_llm(
</span></span><span style="display:flex;"><span>    llm<span style="color:#f92672">=</span>llm,
</span></span><span style="display:flex;"><span>    retriever<span style="color:#f92672">=</span>retriever,
</span></span><span style="display:flex;"><span>    memory<span style="color:#f92672">=</span>memory,
</span></span><span style="display:flex;"><span>    combine_docs_chain_kwargs<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;prompt&#34;</span>: custom_prompt},
</span></span><span style="display:flex;"><span>    return_source_documents<span style="color:#f92672">=</span><span style="color:#66d9ef">True</span>,
</span></span><span style="display:flex;"><span>    verbose<span style="color:#f92672">=</span><span style="color:#66d9ef">False</span>
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><h3 id="step-6-add-conversation-memory-and-context-management">Step 6: Add Conversation Memory and Context Management</h3>
<p>GPT-5&rsquo;s 1M token context window lets you keep much longer conversation histories than GPT-4 — but you still need to manage memory deliberately to control costs.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain.memory <span style="color:#f92672">import</span> ConversationSummaryBufferMemory
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># For long conversations: summarize older turns, keep recent ones verbatim</span>
</span></span><span style="display:flex;"><span>summary_memory <span style="color:#f92672">=</span> ConversationSummaryBufferMemory(
</span></span><span style="display:flex;"><span>    llm<span style="color:#f92672">=</span>llm,
</span></span><span style="display:flex;"><span>    max_token_limit<span style="color:#f92672">=</span><span style="color:#ae81ff">4000</span>,  <span style="color:#75715e"># Keep last 4K tokens verbatim, summarize the rest</span>
</span></span><span style="display:flex;"><span>    memory_key<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;chat_history&#34;</span>,
</span></span><span style="display:flex;"><span>    return_messages<span style="color:#f92672">=</span><span style="color:#66d9ef">True</span>
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><p>For multi-session persistence, store conversation history in a database (Redis, PostgreSQL) and reload it per user session.</p>
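<p>The persistence layer can be as small as a history store keyed by session ID. A sketch with an in-memory dict backend (swap the dict for a Redis client or a PostgreSQL table in production; the class and method names here are illustrative, not a LangChain API):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"># Per-session chat history keyed by session_id, serialized as JSON so the
# same code works against any string key-value backend.
import json

class SessionHistoryStore:
    def __init__(self, backend=None):
        # backend only needs dict-style get/set of strings
        self._backend = backend if backend is not None else {}

    def load(self, session_id):
        raw = self._backend.get(session_id)
        return json.loads(raw) if raw else []

    def append(self, session_id, role, content):
        history = self.load(session_id)
        history.append({"role": role, "content": content})
        self._backend[session_id] = json.dumps(history)
</code></pre></div><p>On each request, load the user&rsquo;s history, replay it into the chain&rsquo;s memory, and append the new turn once the response comes back.</p>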
<h3 id="step-7-build-the-api-and-ui-layer">Step 7: Build the API and UI Layer</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># app.py — Streamlit interface</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> streamlit <span style="color:#66d9ef">as</span> st
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> dotenv <span style="color:#f92672">import</span> load_dotenv
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> os
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>load_dotenv()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>st<span style="color:#f92672">.</span>set_page_config(page_title<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;Support Bot&#34;</span>, page_icon<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;🤖&#34;</span>, layout<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;centered&#34;</span>)
</span></span><span style="display:flex;"><span>st<span style="color:#f92672">.</span>title(<span style="color:#e6db74">&#34;Product Support Assistant&#34;</span>)
</span></span><span style="display:flex;"><span>st<span style="color:#f92672">.</span>caption(<span style="color:#e6db74">&#34;Powered by GPT-5 + RAG&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Initialize chat history</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;messages&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> st<span style="color:#f92672">.</span>session_state:
</span></span><span style="display:flex;"><span>    st<span style="color:#f92672">.</span>session_state<span style="color:#f92672">.</span>messages <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;chain&#34;</span> <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> st<span style="color:#f92672">.</span>session_state:
</span></span><span style="display:flex;"><span>    st<span style="color:#f92672">.</span>session_state<span style="color:#f92672">.</span>chain <span style="color:#f92672">=</span> rag_chain  <span style="color:#75715e"># from previous setup</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Display chat history</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">for</span> message <span style="color:#f92672">in</span> st<span style="color:#f92672">.</span>session_state<span style="color:#f92672">.</span>messages:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">with</span> st<span style="color:#f92672">.</span>chat_message(message[<span style="color:#e6db74">&#34;role&#34;</span>]):
</span></span><span style="display:flex;"><span>        st<span style="color:#f92672">.</span>markdown(message[<span style="color:#e6db74">&#34;content&#34;</span>])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Chat input</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> prompt <span style="color:#f92672">:=</span> st<span style="color:#f92672">.</span>chat_input(<span style="color:#e6db74">&#34;Ask a question about our product...&#34;</span>):
</span></span><span style="display:flex;"><span>    st<span style="color:#f92672">.</span>session_state<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>append({<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: prompt})
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">with</span> st<span style="color:#f92672">.</span>chat_message(<span style="color:#e6db74">&#34;user&#34;</span>):
</span></span><span style="display:flex;"><span>        st<span style="color:#f92672">.</span>markdown(prompt)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">with</span> st<span style="color:#f92672">.</span>chat_message(<span style="color:#e6db74">&#34;assistant&#34;</span>):
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">with</span> st<span style="color:#f92672">.</span>spinner(<span style="color:#e6db74">&#34;Searching knowledge base...&#34;</span>):
</span></span><span style="display:flex;"><span>            response <span style="color:#f92672">=</span> st<span style="color:#f92672">.</span>session_state<span style="color:#f92672">.</span>chain({<span style="color:#e6db74">&#34;question&#34;</span>: prompt})
</span></span><span style="display:flex;"><span>            answer <span style="color:#f92672">=</span> response[<span style="color:#e6db74">&#34;answer&#34;</span>]
</span></span><span style="display:flex;"><span>            sources <span style="color:#f92672">=</span> response<span style="color:#f92672">.</span>get(<span style="color:#e6db74">&#34;source_documents&#34;</span>, [])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        st<span style="color:#f92672">.</span>markdown(answer)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># Show sources (optional, builds user trust)</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> sources:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">with</span> st<span style="color:#f92672">.</span>expander(<span style="color:#e6db74">&#34;Sources&#34;</span>):
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">for</span> doc <span style="color:#f92672">in</span> sources[:<span style="color:#ae81ff">3</span>]:
</span></span><span style="display:flex;"><span>                    st<span style="color:#f92672">.</span>caption(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;📄 </span><span style="color:#e6db74">{</span>doc<span style="color:#f92672">.</span>metadata<span style="color:#f92672">.</span>get(<span style="color:#e6db74">&#39;source&#39;</span>, <span style="color:#e6db74">&#39;Unknown&#39;</span>)<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    st<span style="color:#f92672">.</span>session_state<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>append({<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;assistant&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: answer})
</span></span></code></pre></div><p>Run it locally:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>streamlit run app.py
</span></span></code></pre></div><h3 id="step-8-test-and-evaluate">Step 8: Test and Evaluate</h3>
<p>Before deploying, systematically test:</p>
<ul>
<li><strong>Retrieval quality</strong> — are the right chunks being retrieved for representative questions?</li>
<li><strong>Answer accuracy</strong> — compare responses to known ground truth</li>
<li><strong>Edge cases</strong> — out-of-scope questions, adversarial prompts, language variations</li>
<li><strong>Latency</strong> — measure p50 and p95 response times under simulated load</li>
</ul>
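<p>Latency percentiles are straightforward to collect with a small harness. A sketch, where <code>ask</code> is a stand-in for whatever invokes your chain:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"># Time repeated calls and report p50/p95 from the sorted samples.
import statistics
import time

def measure_latency(ask, questions, runs=3):
    samples = []
    for _ in range(runs):
        for q in questions:
            start = time.perf_counter()
            ask(q)
            samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "p50": statistics.median(samples),
        "p95": samples[int(0.95 * (len(samples) - 1))],
    }
</code></pre></div>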
<p>A useful evaluation framework:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Simple evaluation script</span>
</span></span><span style="display:flex;"><span>test_cases <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    {<span style="color:#e6db74">&#34;question&#34;</span>: <span style="color:#e6db74">&#34;How do I reset my password?&#34;</span>, <span style="color:#e6db74">&#34;expected_topic&#34;</span>: <span style="color:#e6db74">&#34;authentication&#34;</span>},
</span></span><span style="display:flex;"><span>    {<span style="color:#e6db74">&#34;question&#34;</span>: <span style="color:#e6db74">&#34;What&#39;s your refund policy?&#34;</span>, <span style="color:#e6db74">&#34;expected_topic&#34;</span>: <span style="color:#e6db74">&#34;billing&#34;</span>},
</span></span><span style="display:flex;"><span>    {<span style="color:#e6db74">&#34;question&#34;</span>: <span style="color:#e6db74">&#34;How do I integrate with Slack?&#34;</span>, <span style="color:#e6db74">&#34;expected_topic&#34;</span>: <span style="color:#e6db74">&#34;integrations&#34;</span>},
</span></span><span style="display:flex;"><span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">for</span> <span style="color:#66d9ef">case</span> <span style="color:#f92672">in</span> test_cases:
</span></span><span style="display:flex;"><span>    response <span style="color:#f92672">=</span> rag_chain({<span style="color:#e6db74">&#34;question&#34;</span>: <span style="color:#66d9ef">case</span>[<span style="color:#e6db74">&#34;question&#34;</span>]})
</span></span><span style="display:flex;"><span>    print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Q: </span><span style="color:#e6db74">{</span>case[<span style="color:#e6db74">&#39;question&#39;</span>]<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;A: </span><span style="color:#e6db74">{</span>response[<span style="color:#e6db74">&#39;answer&#39;</span>][:<span style="color:#ae81ff">200</span>]<span style="color:#e6db74">}</span><span style="color:#e6db74">...&#34;</span>)
</span></span><span style="display:flex;"><span>    print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Sources: </span><span style="color:#e6db74">{</span>[d<span style="color:#f92672">.</span>metadata<span style="color:#f92672">.</span>get(<span style="color:#e6db74">&#39;source&#39;</span>) <span style="color:#66d9ef">for</span> d <span style="color:#f92672">in</span> response[<span style="color:#e6db74">&#39;source_documents&#39;</span>]]<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    print(<span style="color:#e6db74">&#34;---&#34;</span>)
</span></span></code></pre></div><h2 id="how-do-you-deploy-your-chatbot-to-production">How Do You Deploy Your Chatbot to Production?</h2>
<h3 id="cloud-deployment-options">Cloud Deployment Options</h3>
<table>
  <thead>
      <tr>
          <th>Platform</th>
          <th>Use Case</th>
          <th>Pros</th>
          <th>Cons</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Vercel</td>
          <td>Frontend + serverless functions</td>
          <td>Fast deploys, free tier</td>
          <td>Limited runtime for heavy tasks</td>
      </tr>
      <tr>
          <td>AWS Lambda</td>
          <td>Serverless API</td>
          <td>Scales to zero, pay-per-use</td>
          <td>Cold starts, 15min timeout</td>
      </tr>
      <tr>
          <td>Google Cloud Run</td>
          <td>Containerized apps</td>
          <td>Auto-scaling, generous free tier</td>
          <td>More setup required</td>
      </tr>
      <tr>
          <td>Fly.io</td>
          <td>Always-on containers</td>
          <td>Low latency, global edge</td>
          <td>Paid from launch</td>
      </tr>
      <tr>
          <td>Railway</td>
          <td>Full-stack apps</td>
          <td>Simple deploys, PostgreSQL included</td>
          <td>Limited scale</td>
      </tr>
  </tbody>
</table>
<h3 id="docker-containerization">Docker Containerization</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-dockerfile" data-lang="dockerfile"><span style="display:flex;"><span><span style="color:#75715e"># Dockerfile</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">FROM</span><span style="color:#e6db74"> python:3.11-slim</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">WORKDIR</span><span style="color:#e6db74"> /app</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">COPY</span> requirements.txt .
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">RUN</span> pip install --no-cache-dir -r requirements.txt
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">COPY</span> . .
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">EXPOSE</span><span style="color:#e6db74"> 8501</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">CMD</span> [<span style="color:#e6db74">&#34;streamlit&#34;</span>, <span style="color:#e6db74">&#34;run&#34;</span>, <span style="color:#e6db74">&#34;app.py&#34;</span>, <span style="color:#e6db74">&#34;--server.port=8501&#34;</span>, <span style="color:#e6db74">&#34;--server.address=0.0.0.0&#34;</span>]
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># Build and run</span>
</span></span><span style="display:flex;"><span>docker build -t chatbot-gpt5 .
</span></span><span style="display:flex;"><span>docker run -p 8501:8501 --env-file .env chatbot-gpt5
</span></span></code></pre></div><h3 id="fastapi-for-production-apis">FastAPI for Production APIs</h3>
<p>For a production REST API instead of a Streamlit prototype:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> fastapi <span style="color:#f92672">import</span> FastAPI, HTTPException
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> pydantic <span style="color:#f92672">import</span> BaseModel
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>app <span style="color:#f92672">=</span> FastAPI()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">ChatRequest</span>(BaseModel):
</span></span><span style="display:flex;"><span>    message: str
</span></span><span style="display:flex;"><span>    session_id: str  <span style="color:#75715e"># key per-user state with this; rag_chain above shares one memory object across requests</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">ChatResponse</span>(BaseModel):
</span></span><span style="display:flex;"><span>    answer: str
</span></span><span style="display:flex;"><span>    sources: list[str]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@app.post</span>(<span style="color:#e6db74">&#34;/chat&#34;</span>, response_model<span style="color:#f92672">=</span>ChatResponse)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">chat</span>(request: ChatRequest):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">try</span>:
</span></span><span style="display:flex;"><span>        response <span style="color:#f92672">=</span> rag_chain({<span style="color:#e6db74">&#34;question&#34;</span>: request<span style="color:#f92672">.</span>message})
</span></span><span style="display:flex;"><span>        sources <span style="color:#f92672">=</span> [d<span style="color:#f92672">.</span>metadata<span style="color:#f92672">.</span>get(<span style="color:#e6db74">&#34;source&#34;</span>, <span style="color:#e6db74">&#34;&#34;</span>) <span style="color:#66d9ef">for</span> d <span style="color:#f92672">in</span> response<span style="color:#f92672">.</span>get(<span style="color:#e6db74">&#34;source_documents&#34;</span>, [])]
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> ChatResponse(answer<span style="color:#f92672">=</span>response[<span style="color:#e6db74">&#34;answer&#34;</span>], sources<span style="color:#f92672">=</span>sources)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">except</span> <span style="color:#a6e22e">Exception</span> <span style="color:#66d9ef">as</span> e:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">raise</span> HTTPException(status_code<span style="color:#f92672">=</span><span style="color:#ae81ff">500</span>, detail<span style="color:#f92672">=</span>str(e))
</span></span></code></pre></div><h2 id="advanced-agentic-chatbots-with-tool-integration">Advanced: Agentic Chatbots with Tool Integration</h2>
<p>Standard RAG answers questions from static documents. Agentic chatbots go further — they can browse the web, query live databases, send emails, or call APIs. GPT-5&rsquo;s improved tool-calling makes this significantly more reliable than previous models.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain.agents <span style="color:#f92672">import</span> AgentExecutor, create_openai_tools_agent
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain.tools <span style="color:#f92672">import</span> tool
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain <span style="color:#f92672">import</span> hub
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Define custom tools</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@tool</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">search_crm</span>(customer_email: str) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&#34;&#34;Look up customer account status and subscription tier from CRM.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Connect to your CRM API here</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Customer </span><span style="color:#e6db74">{</span>customer_email<span style="color:#e6db74">}</span><span style="color:#e6db74">: Pro plan, active since 2025-03&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@tool</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">create_support_ticket</span>(subject: str, description: str) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&#34;&#34;Create a support ticket in the ticketing system.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Connect to Zendesk, Linear, etc.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Ticket created: #</span><span style="color:#e6db74">{</span>hash(subject) <span style="color:#f92672">%</span> <span style="color:#ae81ff">100000</span><span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>tools <span style="color:#f92672">=</span> [search_crm, create_support_ticket]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Create agent with tools</span>
</span></span><span style="display:flex;"><span>prompt <span style="color:#f92672">=</span> hub<span style="color:#f92672">.</span>pull(<span style="color:#e6db74">&#34;hwchase17/openai-tools-agent&#34;</span>)
</span></span><span style="display:flex;"><span>agent <span style="color:#f92672">=</span> create_openai_tools_agent(llm, tools, prompt)
</span></span><span style="display:flex;"><span>agent_executor <span style="color:#f92672">=</span> AgentExecutor(agent<span style="color:#f92672">=</span>agent, tools<span style="color:#f92672">=</span>tools, verbose<span style="color:#f92672">=</span><span style="color:#66d9ef">True</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Agent can now look up customer data and create tickets autonomously</span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> agent_executor<span style="color:#f92672">.</span>invoke({
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;input&#34;</span>: <span style="color:#e6db74">&#34;My billing seems wrong for account user@example.com, can you check and escalate?&#34;</span>
</span></span><span style="display:flex;"><span>})
</span></span></code></pre></div><h2 id="cost-analysis-and-optimization">Cost Analysis and Optimization</h2>
<p>GPT-5 API pricing varies by usage tier. Here&rsquo;s a realistic cost model for a B2B support chatbot at 10,000 conversations/month:</p>
<table>
  <thead>
      <tr>
          <th>Component</th>
          <th>Estimated Cost</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>GPT-5 API (input + output tokens)</td>
          <td>$80–$200/month</td>
      </tr>
      <tr>
          <td>Pinecone (managed vector DB)</td>
          <td>$70/month</td>
      </tr>
      <tr>
          <td>Embedding API (OpenAI)</td>
          <td>$5–$15/month</td>
      </tr>
      <tr>
          <td>Hosting (Cloud Run or Railway)</td>
          <td>$20–$50/month</td>
      </tr>
      <tr>
          <td><strong>Total</strong></td>
          <td><strong>$175–$335/month</strong></td>
      </tr>
  </tbody>
</table>
<h3 id="cost-reduction-strategies">Cost Reduction Strategies</h3>
<ol>
<li><strong>Cache frequent queries</strong> — use Redis to cache responses for identical or near-identical questions</li>
<li><strong>Reduce chunk retrieval</strong> — tune <code>k</code> in the retriever (fewer chunks = fewer tokens)</li>
<li><strong>Use smaller models for triage</strong> — route simple questions to GPT-4o-mini before escalating to GPT-5</li>
<li><strong>Batch embeddings</strong> — re-embed documents in bulk during off-peak hours</li>
<li><strong>Compress conversation history</strong> — use <code>ConversationSummaryBufferMemory</code> to summarize older turns</li>
</ol>
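<p>Strategy 1 can be sketched in a few lines. This is a minimal illustration, not production code: it uses an in-process dict where a real deployment would use Redis (e.g. redis-py&rsquo;s <code>get</code>/<code>setex</code> with a TTL) so the cache is shared across workers, and the <code>QueryCache</code> class name is hypothetical. Normalizing the query before hashing lets trivially different phrasings of the same question hit the cache.</p>

```python
import hashlib


class QueryCache:
    """Minimal response cache keyed on a normalized query string.

    Stand-in for a Redis-backed cache: swap the dict for redis-py
    calls (setex with a TTL) in production so entries are shared
    across workers and expire automatically.
    """

    def __init__(self):
        self._store = {}

    def _key(self, query: str) -> str:
        # Lowercase and collapse whitespace so near-identical
        # phrasings map to the same cache entry.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get(self, query: str):
        return self._store.get(self._key(query))

    def set(self, query: str, response: str):
        self._store[self._key(query)] = response


cache = QueryCache()
cache.set("What is your refund policy?", "Refunds within 30 days.")
# Same question, different casing and spacing, still hits the cache.
print(cache.get("  what is YOUR refund policy? "))
```

<p>In the chatbot loop, check the cache before calling the retriever and GPT-5, and only cache answers to questions that don&rsquo;t depend on per-user context.</p>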
<h2 id="no-code-platforms-vs-custom-development">No-Code Platforms vs. Custom Development</h2>
<p>Not every team needs to write code. Here&rsquo;s the honest trade-off:</p>
<table>
  <thead>
      <tr>
          <th>Criteria</th>
          <th>No-Code Platforms</th>
          <th>Custom Development</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Time to first chatbot</td>
          <td>Hours</td>
          <td>Days to weeks</td>
      </tr>
      <tr>
          <td>Technical skill required</td>
          <td>None</td>
          <td>Python + APIs</td>
      </tr>
      <tr>
          <td>Customization</td>
          <td>Limited</td>
          <td>Full control</td>
      </tr>
      <tr>
          <td>Integration flexibility</td>
          <td>Pre-built connectors only</td>
          <td>Any API</td>
      </tr>
      <tr>
          <td>Scalability</td>
          <td>Platform limits</td>
          <td>Unlimited</td>
      </tr>
      <tr>
          <td>Cost</td>
          <td>$49–$500+/month</td>
          <td>Variable (API costs)</td>
      </tr>
      <tr>
          <td>Data ownership</td>
          <td>Vendor-controlled</td>
          <td>Full ownership</td>
      </tr>
  </tbody>
</table>
<p><strong>No-code platforms to consider:</strong></p>
<ul>
<li><strong>CustomGPT.ai</strong> ($49/month) — upload documents, get a working chatbot in minutes, GPT-5 powered</li>
<li><strong>Botpress</strong> (Community edition free) — visual flow builder, open-source core, strong for complex conversation flows</li>
<li><strong>CalStudio</strong> (Freemium) — GPT-5 chatbot builder focused on rapid deployment and monetization</li>
</ul>
<p>A 2026 CalStudio user survey found that no-code platforms reduced development time from weeks to hours for 70% of surveyed businesses. If you need a working prototype in a day and customization isn&rsquo;t critical, no-code wins on speed.</p>
<p>For production systems that need full data control, custom integrations, or enterprise-grade reliability, custom development with LangChain + GPT-5 + Pinecone is the better long-term investment.</p>
<h2 id="future-trends-ai-chatbots-beyond-2026">Future Trends: AI Chatbots Beyond 2026</h2>
<p>The chatbot category is moving fast. Here&rsquo;s what to watch:</p>
<p><strong>Multi-agent systems</strong> — single chatbots give way to coordinated agent networks. A customer service &ldquo;chatbot&rdquo; becomes a team: a triage agent, a knowledge retrieval agent, a CRM lookup agent, and a human-escalation agent — all orchestrated automatically.</p>
<p><strong>Multimodal inputs</strong> — GPT-5&rsquo;s native multimodal reasoning means users can share screenshots, voice messages, and images, not just text. Support bots that can &ldquo;see&rdquo; error screenshots will resolve issues dramatically faster.</p>
<p><strong>Real-time knowledge</strong> — web browsing tools and live database connections reduce reliance on pre-indexed knowledge bases. The boundary between RAG and live search is blurring.</p>
<p><strong>Voice-native chatbots</strong> — OpenAI&rsquo;s real-time audio APIs and dedicated voice models make low-latency voice chatbots viable for call center automation and mobile applications.</p>
<p><strong>Edge deployment</strong> — smaller, distilled models running on-device (phones, browsers via WASM) enable offline-capable chatbots with zero API latency.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Building a GPT-5 RAG chatbot in 2026 is both more accessible and more powerful than it was a year ago. The core stack — OpenAI API + LangChain + Pinecone — is battle-tested and well-documented. GPT-5&rsquo;s larger context window and improved tool-calling address most of the reliability issues that plagued earlier deployments.</p>
<p>Start with the step-by-step code in this guide. Get a working RAG pipeline running locally first, then optimize retrieval quality before worrying about deployment infrastructure. The biggest chatbot failures in production come from poor retrieval, not poor generation — invest your time there.</p>
<p>If you&rsquo;re not ready to write code, CustomGPT.ai or Botpress can have you running in hours. If you need enterprise reliability, full data ownership, and custom integrations, build with LangChain and deploy on Cloud Run or AWS Lambda.</p>
<p>The organizations that ship useful, grounded chatbots now — rather than waiting for a perfect solution — will have a significant advantage as the technology matures through 2026 and beyond.</p>
<hr>
<h2 id="frequently-asked-questions">Frequently Asked Questions</h2>
<h3 id="what-is-rag-and-why-do-i-need-it-for-a-gpt-5-chatbot">What is RAG and why do I need it for a GPT-5 chatbot?</h3>
<p>RAG (Retrieval-Augmented Generation) lets your chatbot answer questions based on your specific documents, FAQs, or databases — not just GPT-5&rsquo;s training data. Without RAG, GPT-5 cannot access your proprietary knowledge and will hallucinate answers or give generic responses. RAG reduces hallucination rates by 40–60% compared to standalone LLMs (Pinecone, 2025), making it essential for any chatbot that needs to be accurate about your specific domain.</p>
<h3 id="do-i-need-to-fine-tune-gpt-5-to-build-a-custom-chatbot">Do I need to fine-tune GPT-5 to build a custom chatbot?</h3>
<p>No. For most chatbot use cases, RAG outperforms fine-tuning at a fraction of the cost and complexity. Fine-tuning is better suited to changing the model&rsquo;s tone, format, or reasoning style — not for adding new knowledge. Use RAG when you want the chatbot to answer from a specific, updatable knowledge base. Use fine-tuning only when RAG alone cannot achieve the response style you need.</p>
<h3 id="which-vector-database-should-i-use-for-a-gpt-5-rag-chatbot">Which vector database should I use for a GPT-5 RAG chatbot?</h3>
<p>For local development and prototyping, use FAISS or Chroma — both are free and require no account setup. For production, Pinecone is the most widely used managed option with excellent latency and scalability (starting at around $70/month). Weaviate is a strong alternative if you need hybrid keyword + semantic search or prefer self-hosting. Choose based on your scale requirements and whether you want a managed service or control over your infrastructure.</p>
<h3 id="how-much-does-it-cost-to-run-a-gpt-5-chatbot">How much does it cost to run a GPT-5 chatbot?</h3>
<p>A realistic production chatbot at 10,000 conversations per month costs approximately $175–$335/month including GPT-5 API costs, vector database hosting, and infrastructure. The biggest variable is GPT-5 API usage — optimize by caching common queries, routing simple questions to cheaper models like GPT-4o-mini, and compressing conversation history. No-code platforms like CustomGPT.ai start at $49/month but have usage limits that may become expensive at scale.</p>
<h3 id="can-i-use-a-different-llm-instead-of-gpt-5-for-this-tutorial">Can I use a different LLM instead of GPT-5 for this tutorial?</h3>
<p>Yes. The LangChain-based architecture in this tutorial works with any supported LLM. Replace <code>ChatOpenAI(model=&quot;gpt-5&quot;)</code> with the appropriate LangChain wrapper for your provider: <code>ChatAnthropic</code> for Claude 4, <code>ChatGoogleGenerativeAI</code> for Gemini, or <code>ChatOllama</code> for a local open-source model. Each provider has different pricing, context window sizes, and tool-calling capabilities — the RAG pipeline and vector database components remain the same regardless of which LLM you choose.</p>
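<p>One way to keep the provider swappable is a small factory that imports the wrapper lazily, so only the package for the provider you actually use needs to be installed. This is an illustrative sketch — the <code>make_llm</code> helper and the <code>PROVIDER_WRAPPERS</code> table are our own names, while the wrapper classes themselves (<code>ChatOpenAI</code>, <code>ChatAnthropic</code>, <code>ChatGoogleGenerativeAI</code>, <code>ChatOllama</code>) come from their respective LangChain integration packages:</p>

```python
import importlib

# Map a provider name to (integration package, chat-model class).
# These are the LangChain integration packages as of this writing.
PROVIDER_WRAPPERS = {
    "openai": ("langchain_openai", "ChatOpenAI"),
    "anthropic": ("langchain_anthropic", "ChatAnthropic"),
    "google": ("langchain_google_genai", "ChatGoogleGenerativeAI"),
    "ollama": ("langchain_ollama", "ChatOllama"),
}


def make_llm(provider: str, model: str, **kwargs):
    """Instantiate a LangChain chat model for the given provider.

    Imports lazily, so only the one integration package you use
    needs to be installed (pip install langchain-anthropic, etc.).
    """
    module_name, class_name = PROVIDER_WRAPPERS[provider]
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(model=model, **kwargs)


# llm = make_llm("anthropic", "claude-sonnet-4")
# llm = make_llm("ollama", "llama3")  # local, no API key needed
```

<p>The rest of the pipeline — retriever, vector store, prompt — takes the returned <code>llm</code> unchanged, which is exactly the portability the LangChain abstraction is designed to provide.</p>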
]]></content:encoded></item></channel></rss>