Data Sources

1. Overview

The main purpose of a data source in SupSearch is to extract articles — small, searchable units of knowledge — from your connected information sources.
These articles can represent help topics, troubleshooting steps, policies, or sections of larger documents.

From the Knowledge Hub, you can:

  • Add and configure data sources

  • Extract and review articles from connected systems

  • Monitor synchronization or scraping progress

  • Connect extracted articles to Search Engines for indexing

2. Adding a New Data Source

  1. Go to Knowledge Hub → Data Sources.

  2. Click Add Data Source in the top-right corner.

  3. Select the type of data source (Puzzel, Web Scraper, Upload, or third-party).

  4. Fill in the required fields.

  5. Click Create Data Source to save.

Once created, the data source begins extracting articles when synced or scraped.
However, these articles only become searchable after you add the data source to a Search Engine.

3. Core Data Source Types

Most SupSearch environments use these three primary data sources:

  • Puzzel Knowledge Base (recommended)

  • Web Scraper (website extraction)

  • Upload (document-based extraction)


3.1 Puzzel Knowledge Base (Recommended)

The Puzzel connector integrates directly with your Puzzel Knowledge Base, automatically extracting published articles.
Each article becomes searchable after it’s linked to a Search Engine.

Required Fields

  • Name – A descriptive name (e.g. Puzzel Help KB)

  • Language – Primary content language

  • Customer Key – Provided by your Puzzel administrator

  • Username / Password – API credentials

  • Use v2 Knowledge Base – Toggle for the latest KB API version

Steps

  1. Enter credentials and details.

  2. (Optional) Enable Use v2 Knowledge Base.

  3. Click Test Connection.

  4. Click Create Data Source to extract articles.
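
If you want a mental model of what Test Connection checks before extraction starts, here is a minimal sketch. Everything in it — the host, path, header, and response handling — is a hypothetical placeholder, not the actual Puzzel KB API; your Puzzel administrator supplies the real details, and the in-app Test Connection button is the supported check.

```python
import requests

# Hypothetical sketch of a "Test Connection" check. The host, path,
# and header below are placeholders, NOT the real Puzzel KB API.
BASE_URL = "https://kb.example-puzzel-host.com"  # placeholder host

def test_connection(customer_key: str, username: str, password: str) -> bool:
    """Return True if the knowledge-base API accepts the credentials."""
    resp = requests.get(
        f"{BASE_URL}/api/ping",                    # hypothetical path
        headers={"X-Customer-Key": customer_key},  # hypothetical header
        auth=(username, password),                 # basic-auth assumption
        timeout=10,
    )
    return resp.status_code == 200

print(test_connection("CUSTOMER_KEY", "api_user", "api_password"))
```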


3.2 Web Scraper

The Web Scraper automatically extracts articles from public or internal web pages.
It’s ideal for external help centers, FAQ pages, and documentation portals.

Required Fields

  • Name – e.g. Broadband Support Website

  • Description – Optional details

  • Domain – The base URL or sitemap to scrape

💡 Tip: Limit the scope to specific subdomains or paths for faster results.
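In practice, limiting the scope means the crawler only visits URLs under a given host and path prefix. The sketch below is not SupSearch code; the host and prefix are assumed names, and it simply shows how such a filter prunes a crawl list.

```python
from urllib.parse import urlparse

# Illustrative scope rule: keep only URLs under one host and path
# prefix, the way a narrowed Web Scraper domain setting would.
ALLOWED_HOST = "help.example.com"   # assumed subdomain
ALLOWED_PREFIX = "/broadband/"      # assumed path restriction

def in_scope(url: str) -> bool:
    parts = urlparse(url)
    return parts.hostname == ALLOWED_HOST and parts.path.startswith(ALLOWED_PREFIX)

candidates = [
    "https://help.example.com/broadband/router-setup",
    "https://help.example.com/tv/channel-list",
    "https://www.example.com/about",
]
print([u for u in candidates if in_scope(u)])
# ['https://help.example.com/broadband/router-setup']
```
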
Steps

  1. Enter the domain and details.
    Fill in the name, description, and base domain for your website.
    You can also specify subdomains or paths to narrow the scope.

  2. Click Create Data Source.
    This saves your configuration and prepares the source for scraping.

  3. Click Scrape to start extraction.
    SupSearch uses a Large Language Model (LLM) to analyze your site and decide which pages should be scraped.
    The LLM automatically identifies relevant help articles, FAQs, or documentation pages.

  4. (Optional) Adjust the inclusion prompt.
    There is a default prompt that guides how the LLM decides which pages to include.
    The default works well in most cases, but you can edit it under Advanced Settings for finer control.

  5. Review the site tree.
    After crawling, a site tree is displayed showing all pages and subpages included in the scrape.
    You can override the LLM’s inclusion/exclusion decisions by clicking Include or Exclude next to each page (the sketch after these steps models this override).

  6. Confirm and complete the scrape.
    Once your selections are finalized, the web scrape completes and the extracted articles are ready for use.
     
  7. Once the data source is added to a Search Engine, your website content is searchable through SupSearch.
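
To make the review in steps 4–5 concrete, here is a minimal sketch of the decision model they describe: the LLM proposes an include/exclude decision per page, and a manual Include/Exclude click overrides it. The class and field names are assumptions for illustration, not SupSearch internals.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of one page in the site tree: the LLM proposes
# a decision, and a manual Include/Exclude click overrides it.
@dataclass
class PageNode:
    url: str
    llm_included: bool                       # decision from the inclusion prompt
    manual_override: Optional[bool] = None   # set by Include/Exclude buttons

    @property
    def included(self) -> bool:
        # A manual choice always wins over the LLM's proposal.
        if self.manual_override is not None:
            return self.manual_override
        return self.llm_included

tree = [
    PageNode("https://help.example.com/faq", llm_included=True),
    PageNode("https://help.example.com/careers", llm_included=True,
             manual_override=False),  # reviewer excluded it
]
print([p.url for p in tree if p.included])
```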

Status Indicators

  • 🟠 Scraping – Extraction in progress

  • 🟢 Synced – Articles extracted and ready

  • No Data – No content found


3.3 Upload

The Upload data source lets you extract articles from local documents such as manuals, PDFs, or internal files.
SupSearch automatically splits documents into article-sized sections.

📸 [Screenshot – Upload Data Source view]

Supported File Types

  • PDF (.pdf)

  • Word (.docx)

  • CSV (.csv)

  • JSON (.json)
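
The exact CSV/JSON schema the uploader expects is not documented in this section, so the record layout below is an assumption for illustration only: one article-shaped object per entry, written to a JSON file you could then select in the upload dialog. Check the Upload Articles modal for the actual expected columns and keys.

```python
import json

# Hypothetical article records -- these field names are an assumption,
# not SupSearch's documented upload schema.
articles = [
    {
        "title": "Resetting your router",
        "body": "Unplug the router, wait 30 seconds, then plug it back in...",
        "url": "https://help.example.com/router-reset",  # used as the source link
    },
]

with open("articles.json", "w", encoding="utf-8") as f:
    json.dump(articles, f, ensure_ascii=False, indent=2)
```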


Creating an Upload Data Source

  1. Go to Knowledge Hub → Data Sources.

  2. Click Add Data Source → Upload.

  3. Enter:

    • Name – e.g. Product Manuals

    • Language – Primary language

    • Description – Optional

  4. Click Create Data Source.

Uploading and Converting Files

  1. Click Upload Articles.

  2. Choose the file type.
    Select PDF/DOCX, CSV, or JSON.

  3. Choose the upload method.

    • URL: Enter the web address where the document is stored online; this is used for source links.

    • File: Click + Choose and select a file from your computer.

  4. Upload the document.

    • If using URL, paste the link and confirm.

    • If using File, select your file and click Upload.
      📸 Screenshot placeholder: “File chooser / URL input”

  5. Open the “Convert Files to Articles” modal.
    After the upload finishes, the Convert Files to Articles modal opens automatically.

  6. Add the Source URL (required).
    Enter a URL where the document can be found online. This is used for source reference in search results.

  7. Set how the document should be split into articles (the sketch after these steps illustrates both rules).

    • Header Font Sizes: Choose which heading levels to use for splitting (e.g., H2, H3).

    • Text Exclusion Conditions (optional): Define rules to remove unwanted sections (e.g., footers, disclaimers).

  8. Click Convert.
    SupSearch converts the document into articles using your selected rules.

  9. Review the results.
    After conversion, articles appear under the Articles tab for review and any edits (rename, merge, delete).
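
To illustrate what step 7's rules mean, here is a hedged sketch using the PyMuPDF library: any text span at or above an assumed heading size starts a new article-sized section, and text matching an assumed exclusion pattern is dropped. This mimics the idea only; it is not SupSearch's converter, and the 14 pt threshold and footer regex are arbitrary assumptions.

```python
import re
import fitz  # PyMuPDF

# Illustrative only: split a PDF into sections wherever a span's font
# size crosses an assumed "header" threshold (the Header Font Sizes
# rule), and drop text matching exclusion patterns (the Text
# Exclusion Conditions rule). Not SupSearch's actual converter.
HEADER_SIZE = 14.0                                      # assumed heading size
EXCLUDE = re.compile(r"(?i)^(page \d+|confidential)")   # assumed footer rules

def split_by_headers(path: str) -> list[dict]:
    doc = fitz.open(path)
    sections, current = [], None
    for page in doc:
        for block in page.get_text("dict")["blocks"]:
            for line in block.get("lines", []):
                for span in line["spans"]:
                    text = span["text"].strip()
                    if not text or EXCLUDE.match(text):
                        continue  # skip empty or excluded boilerplate
                    if span["size"] >= HEADER_SIZE:
                        # A heading starts a new article-sized section.
                        current = {"title": text, "body": ""}
                        sections.append(current)
                    elif current is not None:
                        current["body"] += text + " "
    return sections

for s in split_by_headers("manual.pdf"):
    print(s["title"], "-", len(s["body"]), "chars")
```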

4. Third-Party Integrations

SupSearch can also extract articles from external systems through API-based connectors.
These work similarly but require authentication details for each platform.

  • Freshdesk – Extracts articles from Freshdesk Help Center. Required fields: Instance Name, API Password.

  • OneNote – Extracts notes from Microsoft OneNote. Required fields: Tenant ID, Client ID, Client Secret.

  • Right Answers – Connects to the RightAnswers API. Required fields: Endpoint, Company Code, Username, Password.

  • ServiceNow – Extracts ServiceNow KB content. Required fields: Instance Name, Username, Password, API Token.

  • SharePoint – Extracts internal documentation. Required fields: Tenant ID, Client ID, Client Secret.

  • TopDesk – Extracts IT/Service Desk knowledge. Required fields: Endpoint, Username, Password.

  • Zendesk – Extracts Zendesk Help Center articles. Required fields: Endpoint, API Token.

💡 Note: These integrations must also be added to a Search Engine before their articles become searchable.
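
Under the hood, these connectors page through each platform's REST API with the credentials you supply. As one concrete example, the sketch below uses Zendesk's public Help Center API; SupSearch performs the equivalent call for you, so this only shows what the Endpoint and API Token fields are used for. The subdomain, email, and token values are placeholders.

```python
import requests

# What an API-based connector does under the hood, sketched against
# Zendesk's public Help Center API. SupSearch handles this for you.
SUBDOMAIN = "yourcompany"      # placeholder Zendesk subdomain
EMAIL = "agent@example.com"    # placeholder account email
API_TOKEN = "your-api-token"   # from Zendesk admin settings

resp = requests.get(
    f"https://{SUBDOMAIN}.zendesk.com/api/v2/help_center/articles.json",
    auth=(f"{EMAIL}/token", API_TOKEN),  # Zendesk's API-token auth scheme
    timeout=30,
)
resp.raise_for_status()
for article in resp.json()["articles"]:
    print(article["id"], article["title"])
```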


5. Monitoring and Maintenance

The Data Sources list shows extraction progress and readiness for each source.

  • 🟠 Scraping / Syncing – Article extraction in progress

  • 🟢 Synced – Extraction complete and ready for indexing

  • No Data – No accessible content found

You can re-sync, edit, or delete data sources at any time.
