
# Training Your MerchantAI Agent with a Knowledge Base

> Build a knowledge base from your website, uploaded files, Q&A pairs, and Shopify catalogue so your agent always answers from approved sources.

Your agent's answers are only as good as the content behind them. MerchantAI lets you build a knowledge base from four source types (your live website, uploaded documents, hand-written Q\&A pairs, and your Shopify catalogue) so that every answer the agent gives is grounded in content you have approved. The more complete and current your knowledge base is, the more questions your agent can resolve without escalating to a human.

## What the knowledge base contains

MerchantAI can index content from the following source types:

* **Website crawl** — MerchantAI crawls the pages on your domain and indexes their text content for retrieval.
* **Uploaded files** — PDF and DOCX files you upload directly from the dashboard, such as product manuals, internal policies, or FAQ documents.
* **Q\&A pairs** — Individual question-and-answer entries you write manually, useful for queries that need precise, controlled wording.
* **Shopify catalogue** — When your Shopify store is connected, MerchantAI automatically indexes your products, variants, collections, store policies, and blog posts, and keeps them in sync via webhooks whenever your catalogue changes.

## Adding website sources

You can point MerchantAI at any URL and it will crawl the pages it finds there, extracting text content and making it available for retrieval.

<Steps>
  <Step title="Enter your website URL">
    In the dashboard, navigate to **Knowledge → Website Sources** and paste the URL of the page or site section you want to index. You can add multiple URLs.
  </Step>

  <Step title="MerchantAI crawls your pages">
    MerchantAI follows links from the starting URL and indexes the text content it finds. Crawl time depends on the size of your site.
  </Step>

  <Step title="Review indexed content">
    Once the crawl completes, a list of indexed pages appears in the dashboard. Check that the right pages have been captured and that the content looks correct.
  </Step>

  <Step title="Approve sources for use">
    Set individual pages, or the entire crawl, to **Approved**. Only approved sources are used when generating answers. You can remove or exclude pages at any time.
  </Step>
</Steps>
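To make the crawl step more concrete, here is a minimal Python sketch of a breadth-first crawl that follows links from a starting URL and stays on the same host. It is an illustration only: MerchantAI's actual crawler is not documented here, and the `crawl` function, the `fetch` callable, the `max_pages` cap, and the in-memory `SITE` are all hypothetical names introduced for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl restricted to the starting URL's host.

    `fetch` is a hypothetical callable: URL -> HTML string, or None if missing.
    Returns the URLs visited, in discovery order.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Tiny in-memory "site" so the sketch runs without network access.
SITE = {
    "https://shop.example/help": (
        '<a href="/help/returns">Returns</a> '
        '<a href="https://other.example/">External</a>'
    ),
    "https://shop.example/help/returns": '<a href="/help#top">Back</a>',
}

indexed = crawl("https://shop.example/help", SITE.get)
print(indexed)
```

In this sketch the external link is skipped and the fragment link resolves to a page that has already been seen, so only the two on-site pages end up in the list. This mirrors why it is worth reviewing the indexed page list after a crawl: what gets captured depends on which links are reachable from the starting URL.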

## Uploading files

If you have important content that lives in documents rather than on your website — such as detailed FAQs, product manuals, or internal policy guides — you can upload PDF and DOCX files directly from the dashboard. Navigate to **Knowledge → Files**, drag in your documents, and MerchantAI will parse and index them automatically. Uploaded files are treated the same as website sources: they must be approved before the agent can draw on them.

<CardGroup cols={2}>
  <Card title="PDF files" icon="file-pdf">
    Upload policy documents, product guides, and FAQ sheets in PDF format. Tables and structured text are extracted and indexed.
  </Card>

  <Card title="DOCX files" icon="file-word">
    Upload Word documents such as internal knowledge articles or manuals. Headings and body text are preserved during indexing.
  </Card>
</CardGroup>
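If you keep your documents in a local folder, a quick pre-upload check can show which files are in a supported format and how large each one is. The sketch below is a local helper only and does not call any MerchantAI API; the function name `split_uploadable` is hypothetical.

```python
from pathlib import Path

# Formats named in this guide as uploadable.
SUPPORTED_SUFFIXES = {".pdf", ".docx"}


def split_uploadable(folder):
    """Return (uploadable, skipped) for the files directly inside `folder`.

    `uploadable` is a list of (file name, size in bytes) tuples;
    `skipped` is a list of file names with unsupported extensions.
    """
    uploadable, skipped = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # ignore sub-folders
        if path.suffix.lower() in SUPPORTED_SUFFIXES:
            uploadable.append((path.name, path.stat().st_size))
        else:
            skipped.append(path.name)
    return uploadable, skipped
```

Calling `split_uploadable("docs-to-upload")` on a folder of your own gives you a list to drag into the dashboard and a list of files that would need converting first.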

## Creating Q\&A pairs

For questions that come up frequently and require a specific, carefully worded answer, Q\&A pairs give you the most control. Navigate to **Knowledge → Q\&A Pairs** and add an entry for each question. When a visitor's query closely matches a stored question, the agent prefers that Q\&A pair over a retrieved passage. This makes Q\&A pairs the most reliable way to keep answers consistent on sensitive topics such as returns, shipping timelines, or pricing.
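The preference for Q\&A pairs over retrieved passages can be pictured roughly as follows. This is a sketch under stated assumptions, not MerchantAI's matching logic: the string-similarity measure (`difflib.SequenceMatcher`), the `0.85` threshold, and the `answer` and `retrieve` names are all hypothetical, and the stored answer is placeholder text.

```python
from difflib import SequenceMatcher

# Placeholder Q&A entry for illustration only.
QA_PAIRS = {
    "What is your return policy?": "Placeholder: approved returns answer.",
}


def normalise(text):
    """Lower-case, collapse whitespace, and strip trailing punctuation."""
    return " ".join(text.lower().split()).rstrip("?!. ")


def answer(query, qa_pairs, retrieve, threshold=0.85):
    """Prefer a closely matching Q&A pair; otherwise fall back to retrieval.

    `retrieve` is a hypothetical callable standing in for passage retrieval.
    Returns a (source, text) tuple.
    """
    best_question, best_score = None, 0.0
    for question in qa_pairs:
        score = SequenceMatcher(None, normalise(query), normalise(question)).ratio()
        if score > best_score:
            best_question, best_score = question, score
    if best_question is not None and best_score >= threshold:
        return ("qa_pair", qa_pairs[best_question])
    return ("retrieved", retrieve(query))


def fake_retrieve(query):
    # Stand-in for retrieval over crawled pages and uploaded files.
    return "Passage retrieved for: " + query


print(answer("what is your return policy", QA_PAIRS, fake_retrieve))
print(answer("Do you ship to Canada?", QA_PAIRS, fake_retrieve))
```

The practical takeaway is the same whatever the real matching method is: phrase each stored question the way visitors actually ask it, because a query that is not a close match falls through to retrieved content.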

## Shopify catalogue

When you connect your Shopify store to MerchantAI, your catalogue is indexed automatically. Shopify content needs no manual uploads and no crawl configuration.

<Accordion title="What gets indexed from Shopify">
  * **Products** — titles, descriptions, tags, and metadata for every active product
  * **Variants** — size, colour, material, and other variant attributes
  * **Collections** — collection names, descriptions, and their product associations
  * **Policies** — your store's shipping, refund, and privacy policies
  * **Blog posts** — published blog content, useful for answering questions about guides, how-tos, and announcements
</Accordion>

Shopify content is kept in sync automatically. When you update a product description, change a policy, or publish a new blog post, MerchantAI receives a webhook notification and re-syncs the affected content. You do not need to trigger a manual re-index for Shopify changes.
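For readers curious about what webhook-driven syncing involves, the sketch below shows the general shape of a receiver: verify the signature, read the topic, and re-index only the affected resource. Shopify signs each webhook with a base64-encoded HMAC-SHA256 of the raw request body, sent in the `X-Shopify-Hmac-Sha256` header, with the event name in `X-Shopify-Topic`. Everything else here is an assumption for illustration: the `handle_webhook` function, the `reindex` callback, and the set of handled resources are hypothetical, and this page does not describe how MerchantAI processes webhooks internally.

```python
import base64
import hashlib
import hmac
import json


def verify_shopify_hmac(raw_body, header_hmac, secret):
    """Check the base64 HMAC-SHA256 signature against the raw request body."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    return hmac.compare_digest(expected, header_hmac.encode("utf-8"))


def handle_webhook(raw_body, headers, secret, reindex):
    """Re-index only the resource named in the webhook.

    `reindex` is a hypothetical callable: (resource type, resource id) -> None.
    `headers` is a plain dict here; a real framework would treat header
    names case-insensitively.
    """
    signature = headers.get("X-Shopify-Hmac-Sha256", "")
    if not verify_shopify_hmac(raw_body, signature, secret):
        return "rejected"
    resource = headers.get("X-Shopify-Topic", "").split("/")[0]
    if resource not in {"products", "collections"}:  # illustrative subset
        return "ignored"
    payload = json.loads(raw_body)
    reindex(resource, payload["id"])
    return "reindexed"


# Simulate a signed "products/update" delivery.
secret = "test-secret"
body = json.dumps({"id": 42, "title": "Linen shirt"}).encode("utf-8")
signature = base64.b64encode(
    hmac.new(secret.encode("utf-8"), body, hashlib.sha256).digest()
).decode("ascii")
headers = {"X-Shopify-Hmac-Sha256": signature, "X-Shopify-Topic": "products/update"}

calls = []
status = handle_webhook(body, headers, secret, lambda kind, rid: calls.append((kind, rid)))
print(status, calls)
```

The point of the design is that a single product edit triggers work for that product alone, which is why no manual re-index is needed after routine catalogue changes.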

<Note>
  Before you go live, always review your indexed content in the dashboard and confirm that all active sources are set to **Approved**. Content that has been crawled or uploaded but not approved will not be used by the agent, even if it appears in the source list.
</Note>

## Knowledge base limits by plan

The total size of your knowledge base, across all source types, is capped by your plan. Uploaded files, crawled content, and Q\&A pairs all count toward this limit.

| Plan       | Knowledge Base Size |
| ---------- | ------------------- |
| Free       | 1 MB                |
| Starter    | 15 MB               |
| Pro        | 30 MB               |
| Business   | 60 MB               |
| Enterprise | Custom              |

If you are approaching your limit, consider removing outdated sources, consolidating duplicate content, or upgrading your plan. Enterprise customers can discuss custom limits with the MerchantAI team.
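The arithmetic behind the table is simple enough to sketch. The example below adds up per-source sizes and compares the total with a plan's cap. Two assumptions are worth flagging: it treats 1 MB as 1,000,000 bytes (this page does not say whether the limit is decimal or binary), and the source sizes are made-up numbers. The function names are hypothetical, and the dashboard remains the authoritative view of your usage.

```python
# Limits from the table above; Enterprise is custom and therefore omitted.
PLAN_LIMITS_MB = {"Free": 1, "Starter": 15, "Pro": 30, "Business": 60}
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def usage_report(plan, source_sizes_bytes):
    """Summarise total knowledge base size against the plan's cap."""
    limit = PLAN_LIMITS_MB[plan] * BYTES_PER_MB
    used = sum(source_sizes_bytes.values())
    return {
        "used_bytes": used,
        "limit_bytes": limit,
        "percent_used": round(100 * used / limit, 1),
        "over_limit": used > limit,
    }


def smallest_plan_that_fits(total_bytes):
    """Return the smallest listed plan whose cap covers `total_bytes`."""
    for plan, limit_mb in sorted(PLAN_LIMITS_MB.items(), key=lambda item: item[1]):
        if total_bytes <= limit_mb * BYTES_PER_MB:
            return plan
    return "Enterprise"


# Made-up sizes for illustration.
sources = {
    "website crawl": 9_500_000,
    "uploaded files": 4_200_000,
    "Q&A pairs": 300_000,
}

report = usage_report("Starter", sources)
print(report)
print(smallest_plan_that_fits(report["used_bytes"]))
```

With these example numbers the Starter plan is about 93% full, which is the kind of signal that suggests pruning outdated sources before adding more.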
