How to select the right sources to train your AI

Learn the impact of good and bad training data on AI with our guidelines for optimal results.



Artificial Intelligence (AI) has become an integral part of many industries and is changing the way we interact with technology. However, the effectiveness of an AI system depends heavily on the quality of the data used to train it. As the saying goes, "garbage in, garbage out." In this blog post, we explain why training data matters and provide guidelines for choosing good sources so that your AI gives accurate and useful answers.

To help your AI receive effective and comprehensive training, we support several source types: website content, PDFs, and FAQs.

Data sources you can use to train your AI

Website Content

Website content is an ideal resource for training AI because it tends to be clear, concise, and carefully written. The detailed descriptions and explanations found on websites provide valuable information for training machine learning models.

Knowledge bases that contain help articles are another valuable resource. Like website content, they are well written and thought out. They can also be updated regularly, so the AI always has access to the latest and most accurate training material.

Use Headings to Structure Your Content

Using headings properly (H-tags in HTML, such as H1, H2, and H3) is crucial for creating structured, organized content. Headings segment your content into logical, easy-to-follow sections. This makes the content more readable and also improves the AI's ability to identify and extract the most relevant information. In other words, headings help optimize your content for human readers, search engines, and the AI alike. It is therefore worth understanding the different heading levels and when to use each one, so that your content stays well structured and easy to navigate.
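To illustrate why headings matter, here is a minimal sketch of how a page could be split into sections at each heading. It is not how our AI processes pages internally; the class and function names (`HeadingSplitter`, `split_by_headings`) are hypothetical, and the sketch uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingSplitter(HTMLParser):
    """Collects [heading, text fragments] sections from an HTML page.

    Hypothetical helper for illustration only. Text that appears
    before the first heading is ignored in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.sections = []        # each item: [heading, [text fragments]]
        self._in_heading = False
        self._heading_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._in_heading:
            self._in_heading = False
            # Collapse internal whitespace in the heading text.
            heading = " ".join("".join(self._heading_parts).split())
            self.sections.append([heading, []])

    def handle_data(self, data):
        if self._in_heading:
            self._heading_parts.append(data)
        elif self.sections and data.strip():
            # Attach body text to the most recent heading.
            self.sections[-1][1].append(data.strip())


def split_by_headings(html):
    """Return a list of (heading, body_text) tuples."""
    parser = HeadingSplitter()
    parser.feed(html)
    parser.close()
    return [(heading, " ".join(parts)) for heading, parts in parser.sections]


page = "<h1>Billing</h1><p>We accept cards.</p><h2>Refunds</h2><p>Refunds take 5 days.</p>"
for heading, body in split_by_headings(page):
    print(f"{heading}: {body}")
```

A page without headings would produce no sections at all in this sketch, which mirrors the point above: without headings there is nothing to anchor each block of text to a topic.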

When using our AI, you can expand your training set by manually adding individual pages or by letting the AI crawl your entire website through the sitemap.xml file. Our include/exclude feature gives you control over which pages are crawled, so you can tailor the training data to your needs. Choosing pages deliberately helps ensure that the AI is trained on the most relevant and accurate data, which improves the quality of its answers over time.
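The idea behind include/exclude rules can be sketched as follows. This is an illustration under assumptions, not our actual crawler: the function names, the wildcard pattern syntax, and the example URLs are all hypothetical. The only fixed element is the standard sitemap XML namespace.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Extract every <loc> URL from the text of a sitemap.xml file."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def select_urls(urls, include=("*",), exclude=()):
    """Keep URLs matching any include pattern and no exclude pattern.

    Patterns use shell-style wildcards (an assumption for this sketch),
    where "*" matches any run of characters, including "/".
    """
    selected = []
    for url in urls:
        if not any(fnmatchcase(url, pattern) for pattern in include):
            continue
        if any(fnmatchcase(url, pattern) for pattern in exclude):
            continue
        selected.append(url)
    return selected


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/setup</loc></url>
  <url><loc>https://example.com/docs/legacy/v1</loc></url>
  <url><loc>https://example.com/blog/news</loc></url>
</urlset>"""

pages = select_urls(
    urls_from_sitemap(sitemap),
    include=["https://example.com/docs/*"],
    exclude=["*/legacy/*"],
)
print(pages)
```

In this example only the current documentation page is kept: the blog post does not match the include pattern, and the legacy page is removed by the exclude pattern.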



PDFs

PDFs can be a valuable source of knowledge, similar to website content. Like web pages, PDFs should have appropriate headings and well-structured content for effective AI training. This means that the headings should be descriptive and accurately reflect the content that follows.

Moreover, it is important to ensure that the content is organized in a logical and coherent manner. This can be achieved through the use of subheadings, bullet points, and other formatting techniques to break up large chunks of text into more digestible pieces.

Keep in mind that images within PDFs are not indexed, and tables with extensive data might not yield good results, because the AI works best with written content that carries contextual meaning.

FAQ Entries

Our system lets you add FAQ entries directly to the training set by hand. You can add, update, and delete these entries in real time. Adding a list of your frequently asked questions is an excellent way to teach the AI to give specific answers to common queries.

The exact phrasing of the questions in the training data is not crucial, as the AI seeks to understand the context and meaning of the question rather than the specific words used.

FAQs can also be utilized to provide additional information or instructions, enhancing the accuracy and relevance of AI-generated responses.


FAQs have one more useful property: you can add temporary FAQs to the training set to override specific responses for a limited time.

By adding temporary FAQs, you gain the ability to address transient situations effectively. For example, if a particular feature is currently experiencing issues and users are frequently asking about it, you can create a temporary FAQ entry explaining the situation. The AI will then take this information into account and provide appropriate responses until the issue is resolved.
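One way to think about temporary FAQs is as entries that carry an optional expiry time. The sketch below is a hypothetical data model, not our product's API: the `FaqEntry` class, the `expires_at` field, and the `active_entries` function are assumptions made for illustration. In practice you could just as well delete the entry by hand once the issue is resolved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class FaqEntry:
    """Hypothetical FAQ record; expires_at=None means permanent."""
    question: str
    answer: str
    expires_at: Optional[datetime] = None


def active_entries(entries, now=None):
    """Return the entries that should still be part of the training set."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [e for e in entries if e.expires_at is None or e.expires_at > now]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
entries = [
    FaqEntry("How do I reset my password?", "Use the link on the login page."),
    FaqEntry(
        "Why is export not working?",
        "Export is temporarily unavailable. We are working on a fix.",
        expires_at=now + timedelta(days=2),   # temporary override
    ),
    FaqEntry(
        "Is the site down?",
        "We had an outage earlier this week.",
        expires_at=now - timedelta(days=1),   # already expired
    ),
]

for entry in active_entries(entries, now=now):
    print(entry.question)
```

Here the permanent entry and the still-valid temporary entry remain active, while the expired one is dropped.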

Avoid User-Generated Content

User-generated content, such as questions and answers submitted by users to knowledge bases or help desks, may seem like valuable training material, but it comes with significant drawbacks.

  • User questions may contain irrelevant context, leading to inaccurate AI responses. For example, if a user asks about using their own Mastercard for credit card payments and the AI is trained on this data, it might incorrectly assume that Mastercard is the company's preferred credit card.
  • User-generated questions may remain online for extended periods, leading to outdated information.
  • Bug reports might not accurately represent the actual behavior of the system, potentially resulting in misleading AI responses.


Conclusion

The quality of training data significantly impacts the performance of AI systems. By carefully selecting and curating sources such as website content, PDFs, and well-structured FAQs, you can ensure that your AI is trained with accurate and relevant information.

Avoiding user-generated content helps prevent misinformation and outdated data from influencing AI responses.

By following these guidelines, you can get the most out of your AI and improve the experience for your customers.
