Distinguish reliable from unreliable URLs and documents before indexing, so that you don't spend time on content that cannot be ingested, indexed, or used effectively in the knowledge base. Poor-quality sources can also degrade the performance of Retrieval-Augmented Generation (RAG).

What Makes a Good Source?

Good sources are text-rich, accessible, well-formatted, and free of broken links or access restrictions.
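As a rough illustration of what "text-rich" can mean in practice, the sketch below counts the visible words in a page's HTML and compares the total to a threshold. It is a minimal heuristic, not how the knowledge base evaluates sources: the names `VisibleTextCounter` and `looks_text_rich` and the `min_words=200` cutoff are assumptions chosen for the example.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts words in text nodes, ignoring script, style, and noscript content."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped tag
        self.word_count = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.word_count += len(data.split())


def looks_text_rich(html, min_words=200):
    """Return True if the page has at least min_words visible words.

    The threshold is an arbitrary assumption; tune it for your own content.
    """
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.word_count >= min_words
```

A long article page would pass this check, while a page that is mostly script code wrapped around an empty map container would not. The check only looks at the HTML it is given, so it says nothing about broken links or access restrictions.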

What Makes a Bad Source?

Unreliable sources include URLs whose content consists mainly of videos, maps, calendars, or tabular data, especially when that data is pulled dynamically from a backend system. Pages like these usually expose little static text that can be extracted and indexed.

For example, here are some sources to avoid:

Interactive Maps

Databases

Dynamic Tables

Dynamic Calendars
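
The source types listed above often leave traces in a page's markup, such as embedded frames, video players, drawing canvases, or tables. The sketch below counts a few such tags in an HTML string as a quick warning sign. The tag list in `WARNING_TAGS` and the function name `warning_signs` are assumptions made for this example, and the result is only a hint: content loaded by JavaScript after the page opens will not appear in the static HTML, so an empty result does not prove that a source is good.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical list of tags that often signal map, video, or tabular content.
WARNING_TAGS = {"iframe", "video", "canvas", "table"}


class WarningTagCounter(HTMLParser):
    """Counts opening tags that appear in WARNING_TAGS."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in WARNING_TAGS:
            self.counts[tag] += 1


def warning_signs(html):
    """Return a dict mapping each warning tag found to its number of occurrences."""
    parser = WarningTagCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)
```

A page that embeds a map in a frame next to a data table would report both tags, while a plain article paragraph would return an empty dict. A single table on an otherwise text-heavy page is not necessarily a problem, so treat the counts as a prompt for manual review.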