It's important to distinguish reliable from unreliable URLs and documents before adding them, so that time isn't spent on content that cannot be ingested, indexed, or used effectively in the knowledge base. Poor-quality sources can also degrade the performance of Retrieval-Augmented Generation (RAG).
What Makes a Good Source?
Good sources are text-rich, accessible, well-formatted, and free of broken links or access restrictions.
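As a rough illustration of the "text-rich" criterion, the sketch below uses Python's standard html.parser module to estimate how much visible text a page contains and how many video, iframe, table, or canvas elements it has. The function name looks_text_rich and the thresholds min_chars and max_flagged are assumptions made for this example, not settings of any particular product, and would need tuning against real content.

```python
from html.parser import HTMLParser


class _SourceStats(HTMLParser):
    """Collects visible text length and counts tags that suggest non-text content."""

    SKIPPED = {"script", "style", "noscript"}          # content not shown as text
    FLAGGED = {"video", "iframe", "table", "canvas"}   # hypothetical warning signs

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.flagged = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.FLAGGED:
            self.flagged += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count only text that sits outside script/style/noscript elements.
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def looks_text_rich(html, min_chars=500, max_flagged=2):
    """Heuristic check: enough visible text and few embedded or tabular elements.

    Both thresholds are illustrative assumptions.
    """
    stats = _SourceStats()
    stats.feed(html)
    stats.close()
    return stats.text_chars >= min_chars and stats.flagged <= max_flagged
```

A check like this only inspects the HTML it is given. It says nothing about broken links or access restrictions, and a page whose content is filled in by client-side scripts will simply appear to have very little text.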
What Makes a Bad Source?
Unreliable sources include URLs that consist mainly of videos, maps, calendars, or tabular data, especially when the data is pulled dynamically from a backend system.
For example, here are some sources to avoid: