Connecting an AWS Knowledge Base to Confluence

This article describes how to populate an AWS Knowledge Base with the contents of a Confluence site.

Updated over 2 weeks ago

Overview

This document covers the process of configuring AWS Knowledge Bases to collect and index information from a Confluence site. The setup process involves creating authentication credentials in Confluence, configuring an AWS Knowledge Base, and then connecting it to SurePath AI as a data source. Please note that none of the screens or setup steps are SurePath AI specific until the final section.

High-level steps

The integration process requires admins to complete several key tasks. First, admins need to create a Confluence API token for authentication. Next, they will configure the AWS Knowledge Base with the Confluence credentials. After the initial sync completes, admins can configure content filters to control which spaces and pages are indexed. Finally, admins will add data contexts in SurePath AI using the new connector.

Confluence setup

Prerequisites

Admins will need the following to complete the Confluence configuration:

  • Access to a Confluence account with permissions to generate API tokens

  • The email address associated with the Confluence account

  • The base Confluence URL for the organization

Output checklist

Admins should collect the following information during this process:

  • Confluence email address (username)

  • Confluence API token (password)

  • Base Confluence URL

Create a Confluence API token

Confluence uses API tokens for authentication with external services like AWS Knowledge Bases. Admins need to create an API token that will be stored in AWS Secrets Manager.

  • Navigate to Atlassian Account Settings

  • Select Create API token

  • Provide a descriptive label for the token (e.g., "AWS Knowledge Base Integration")

  • Select Create

  • Copy the API token immediately, as it will only be displayed once

  • Store the token securely along with the associated email address

The API token does not require any specific scopes to be configured. The token inherits the permissions of the user account that created it, so admins should ensure the user account has appropriate read access to the Confluence spaces that need to be indexed.
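Admins who want to confirm the token before moving on to AWS can test it against the Confluence REST API. The sketch below is an illustration only, not part of the official setup. It builds the Basic authentication header from the email address and token and prepares a request to the space listing endpoint (assumed here to be /rest/api/space under the base URL). The email address, token, and URL are placeholder values.

```python
import base64
import urllib.request


def basic_auth_header(email: str, api_token: str) -> str:
    """Return the HTTP Basic auth header value for an email/API-token pair."""
    raw = f"{email}:{api_token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_space_request(base_url: str, email: str, api_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that lists spaces visible to the account."""
    url = base_url.rstrip("/") + "/rest/api/space?limit=5"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": basic_auth_header(email, api_token),
            "Accept": "application/json",
        },
    )


# Placeholder values for illustration only.
request = build_space_request(
    "https://yourcompany.atlassian.net/wiki", "admin@example.com", "example-token"
)

# To send the request, call urllib.request.urlopen(request). Valid credentials
# should produce a JSON list of spaces; invalid ones typically raise
# urllib.error.HTTPError with status 401.
```

If the returned list is missing a space that needs to be indexed, the account behind the token likely lacks read access to that space.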

AWS setup

Prerequisites

Admins will need the following values to complete the AWS configuration:

  • Confluence email address (username)

  • Confluence API token (password)

  • Base Confluence URL

  • Admin rights to AWS with access to Secrets Manager and Bedrock

Output checklist

Admins should collect the following information during this process:

  • AWS Secrets Manager ARN for the Confluence credentials

  • AWS Knowledge Base ID

Create an AWS secret

Admins need to store the Confluence credentials in AWS Secrets Manager before configuring the Knowledge Base. The Knowledge Base will reference this secret for authentication.

  • Log in to AWS and navigate to Secrets Manager > Store a new secret

  • Select Other type of secret

  • Enter the two (2) key-value pairs into the UI. The text and case must match exactly:

    • username (value is the Confluence email address)

    • password (value is the Confluence API token created earlier)

  • Select Next

  • Enter a Secret name (spaces are not allowed)

  • Continue selecting Next through the remaining screens, then select Store to save the secret

  • After creation, open the secret details and copy the full ARN (Amazon Resource Name) for use in the next section
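The same secret can also be created from a script instead of the console. The following is a minimal sketch under stated assumptions: the secret name confluence-kb-credentials and the credential values are hypothetical placeholders, and the actual call to AWS is left as a comment because it requires boto3 and configured AWS credentials.

```python
import json


def build_secret_request(secret_name: str, email: str, api_token: str) -> dict:
    """Build keyword arguments for the Secrets Manager create_secret call."""
    if " " in secret_name:
        raise ValueError("Secret names cannot contain spaces")
    # The key names must be exactly "username" and "password" (lowercase).
    secret_value = {"username": email, "password": api_token}
    return {"Name": secret_name, "SecretString": json.dumps(secret_value)}


# Placeholder values; "confluence-kb-credentials" is a hypothetical secret name.
kwargs = build_secret_request(
    "confluence-kb-credentials", "admin@example.com", "example-token"
)

# With boto3 installed and AWS credentials configured, the secret could be stored with:
#   import boto3
#   response = boto3.client("secretsmanager").create_secret(**kwargs)
#   secret_arn = response["ARN"]  # the full ARN needed in the next section
```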

Set up a new AWS Knowledge Base

Admins can now create the Knowledge Base that will connect to Confluence and index the content.

Configure the data source

The data source configuration defines which Confluence content will be indexed and how authentication will work.

  • Enter a data source Name or accept the default

  • Enter the Confluence host URL

    • This should be the base URL of the Confluence site, with no space-specific or page-specific path after it

    • Example: https://yourcompany.atlassian.net/wiki

  • Leave advanced options at their default settings

  • In the Authentication section, select Basic authentication

  • Enter the AWS Secrets Manager secret ARN that was created in the previous section

    • Paste the full ARN that was copied from Secrets Manager

  • For Chunking and parsing, select Amazon Bedrock default parser

  • For Chunking strategy, select Default chunking

  • Configure Metadata and filtering as needed (see the Content filtering section below for detailed guidance)

  • Select Next
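For admins who script their AWS setup, the console choices above map roughly onto the data source configuration accepted by the Bedrock Agent API. The sketch below is an assumption-laden illustration: the field names reflect the author's understanding of the create_data_source request shape and should be checked against the current API reference, and the ARN shown is a made-up placeholder.

```python
import re

# Loose sanity check for a Secrets Manager ARN; not a full validator.
SECRET_ARN_PATTERN = re.compile(
    r"^arn:aws[a-z-]*:secretsmanager:[a-z0-9-]+:\d{12}:secret:.+$"
)


def build_confluence_source(host_url: str, secret_arn: str) -> dict:
    """Build a Confluence data source configuration using Basic authentication."""
    if not SECRET_ARN_PATTERN.match(secret_arn):
        raise ValueError("Expected the full secret ARN, not just the secret name")
    return {
        "type": "CONFLUENCE",
        "confluenceConfiguration": {
            "sourceConfiguration": {
                "hostUrl": host_url.rstrip("/"),
                "hostType": "SAAS",  # assumed value for Confluence Cloud
                "authType": "BASIC",
                "credentialsSecretArn": secret_arn,
            }
        },
    }


# Placeholder ARN for illustration only.
config = build_confluence_source(
    "https://yourcompany.atlassian.net/wiki",
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:confluence-kb-credentials-AbCdEf",
)

# This dictionary could then be passed as dataSourceConfiguration to
# boto3.client("bedrock-agent").create_data_source(...), together with the
# Knowledge Base ID and a data source name.
```

The ARN check exists because pasting only the secret name, rather than the full ARN, is an easy mistake to make in this step.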

Configure embeddings and vector store

The embeddings model converts the text content into vectors that enable semantic search capabilities.

  • For the Embeddings model, admins can choose between:

    • Cohere Embed English V3 - $0.0001 per 1000 tokens (recommended as the industry standard)

    • Amazon Titan Text Embeddings V2 - $0.00002 per 1000 tokens (lower cost option)

  • For Vector Store, select Quick create new vector store

    • Type: Amazon OpenSearch Serverless

    • Important: Set the OpenSearch capacity limits (OCU) to a value higher than 10. SurePath AI recommends values between 20 and 50 for optimal performance. While customers are only charged for the capacity they actually use, setting this limit too high could lead to significant charges if the Knowledge Base experiences high query volumes.

  • Select Next

  • Review all settings and select Create Knowledge Base
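To compare the two embeddings models, admins can estimate the one-time indexing cost from the per-token prices listed above. The sketch below is simple arithmetic; the corpus size of 5 million tokens is an assumed figure for illustration, and the estimate covers embedding only, not OpenSearch capacity or query costs.

```python
def embedding_cost_usd(token_count: int, price_per_1000_tokens: float) -> float:
    """Estimate the cost in USD of embedding token_count tokens."""
    return token_count / 1000 * price_per_1000_tokens


# Prices per 1000 tokens, as listed in this article.
PRICES = {
    "Cohere Embed English V3": 0.0001,
    "Amazon Titan Text Embeddings V2": 0.00002,
}

corpus_tokens = 5_000_000  # assumed size of the Confluence content

for model, price in PRICES.items():
    print(f"{model}: ${embedding_cost_usd(corpus_tokens, price):.2f}")
# Cohere Embed English V3: $0.50
# Amazon Titan Text Embeddings V2: $0.10
```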

The Knowledge Base will be created and the initial sync will begin automatically. Depending on the amount of content in Confluence, the sync process may take several minutes to several hours.
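Later syncs can also be started from code rather than the console. The sketch below wraps the Bedrock Agent start_ingestion_job call in a small helper. It assumes a boto3 "bedrock-agent" client is passed in, and that the data source ID (which is not part of the output checklist above) has been looked up in the Knowledge Base details; both IDs in the usage comment are placeholders.

```python
from typing import Tuple


def start_sync(bedrock_agent_client, knowledge_base_id: str, data_source_id: str) -> Tuple[str, str]:
    """Start an ingestion (sync) job and return its job ID and initial status."""
    response = bedrock_agent_client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    job = response["ingestionJob"]
    return job["ingestionJobId"], job["status"]


# Example usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   client = boto3.client("bedrock-agent")
#   job_id, status = start_sync(client, "KB_ID_PLACEHOLDER", "DATA_SOURCE_ID_PLACEHOLDER")
```

Passing the client in as a parameter keeps the helper easy to test and lets the same function run from a scheduled job.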

Important: The Knowledge Base will not sync automatically after the initial creation. Admins must either trigger manual syncs or configure an automated refresh schedule. See the automatic KB refreshes documentation for setup instructions.

Content filtering

AWS Knowledge Bases support inclusion and exclusion filters to control which Confluence content gets indexed. Filters can target specific spaces or individual pages, allowing admins to create precise indexing rules.

Understanding filter behavior

The crawler behavior depends on how inclusion and exclusion filters are configured. When no inclusion filters are defined, the crawler will index the entire Confluence site except for any content specified in exclusion filters. When inclusion filters are defined, the crawler will only index the content explicitly listed in those filters, and exclusion filters can further refine what gets indexed within the included content.

Exclusion filters always take precedence over inclusion filters. If a space is included but specific pages within that space are excluded, those pages will not be indexed. Similarly, if no space inclusion filters are defined, space exclusion filters can block specific spaces from being indexed.
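The rules above can be expressed as a small decision function. The sketch below is a simplified model of the behavior described in this section, using exact string matching; it is not the crawler's actual implementation, and the function names are illustrative.

```python
from typing import AbstractSet


def passes(name: str, inclusions: AbstractSet[str], exclusions: AbstractSet[str]) -> bool:
    """Apply one level of filtering (spaces or pages) to a single name."""
    if name in exclusions:
        return False  # exclusion always wins
    if inclusions:
        return name in inclusions  # only explicitly included items pass
    return True  # no inclusion filters: everything not excluded passes


def page_is_indexed(
    space_key: str,
    title: str,
    space_inclusions: AbstractSet[str] = frozenset(),
    space_exclusions: AbstractSet[str] = frozenset(),
    page_inclusions: AbstractSet[str] = frozenset(),
    page_exclusions: AbstractSet[str] = frozenset(),
) -> bool:
    """A page is indexed only if both its space and its title pass their filters."""
    return passes(space_key, space_inclusions, space_exclusions) and passes(
        title, page_inclusions, page_exclusions
    )


# The ENG space is included, but one page inside it is excluded.
print(page_is_indexed("ENG", "Roadmap", {"ENG"}, page_exclusions={"Q4 Planning Document"}))
# True
print(page_is_indexed("ENG", "Q4 Planning Document", {"ENG"}, page_exclusions={"Q4 Planning Document"}))
# False
```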

Space filters

Space filters use the Confluence Space Key rather than the space name or URL path. The Space Key is a system identifier that may differ from what appears in the URL.

To find the Space Key for a space, admins should navigate to the space settings page at https://{team-name}.atlassian.net/wiki/spaces/{space-name}/settings/details and locate the System Key value displayed on that page. This System Key is what should be entered in the space filter.

For example, if the Space Key is "ENG", admins would enter "ENG" in the space filter even if the URL shows a different value like "/spaces/engineering/".
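When many spaces are involved, the keys can also be read from the Confluence REST API instead of visiting each settings page. The sketch below assumes the space listing endpoint returns a JSON object with a "results" list whose entries carry "key" and "name" fields; the sample response is invented for illustration.

```python
import json


def space_keys_by_name(response_body: str) -> dict:
    """Map each space name to its Space Key from a space listing response."""
    data = json.loads(response_body)
    return {space["name"]: space["key"] for space in data.get("results", [])}


# Invented sample of a space listing response.
sample = (
    '{"results": ['
    '{"key": "ENG", "name": "Engineering"}, '
    '{"key": "HR", "name": "People Ops"}'
    "]}"
)

print(space_keys_by_name(sample))
# {'Engineering': 'ENG', 'People Ops': 'HR'}
```

The values in this mapping (for example "ENG") are what belong in the space filter, not the display names.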

Page filters

Page filters use the page title exactly as it appears at the top of the Confluence page. Unlike space filters, page filters do not use any hidden identifiers or system keys. Admins should copy the page title directly from the page and enter it into the filter.

Page titles are case-sensitive and must match exactly, including any special characters or spacing. If a page title is "Q4 Planning Document", the filter must specify that exact string.

Common filtering scenarios

Admins can implement several common filtering patterns depending on their indexing requirements:

Index specific spaces only: Define space inclusion filters for each desired space. Only content within those spaces will be indexed.

Index everything except specific spaces: Leave space inclusion filters empty, and define space exclusion filters for the spaces that should not be indexed.

Index a space but exclude specific pages: Define a space inclusion filter for the space, then add page exclusion filters for the specific page titles that should be skipped.

Index specific pages across the entire site: Leave space inclusion filters empty (to allow all spaces), and define page inclusion filters for the specific page titles to index. Note that this approach requires the page titles to be unique across spaces.
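For admins who manage the data source through the API, these scenarios translate into filter entries in the crawler configuration. The sketch below builds the "index a space but exclude specific pages" scenario. The structure and the object type names "Space" and "Page" reflect the author's understanding of the Bedrock Agent API and are assumptions to verify against the API reference; the helper names are hypothetical.

```python
from typing import Sequence


def pattern_filter(
    object_type: str, include: Sequence[str] = (), exclude: Sequence[str] = ()
) -> dict:
    """Build one filter entry for a Confluence object type."""
    entry = {"objectType": object_type}
    if include:
        entry["inclusionFilters"] = list(include)
    if exclude:
        entry["exclusionFilters"] = list(exclude)
    return entry


def crawler_configuration(*filters: dict) -> dict:
    """Wrap filter entries in the assumed crawlerConfiguration structure."""
    return {
        "filterConfiguration": {
            "type": "PATTERN",
            "patternObjectFilter": {"filters": list(filters)},
        }
    }


# Index the ENG space, but skip one page by its exact title.
crawler = crawler_configuration(
    pattern_filter("Space", include=["ENG"]),
    pattern_filter("Page", exclude=["Q4 Planning Document"]),
)
```

The other scenarios follow the same pattern: omit the include list to allow everything at that level, and add exclude entries for the spaces or page titles to skip.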

SurePath AI setup

Create a new connector

Before creating the data source in SurePath AI, admins need to ensure an AWS connector is configured. If an AWS connector already exists for private models or S3 buckets, that same connector can be used for Knowledge Bases.

For instructions on creating an AWS connector, see:

Create the data source

Once the AWS connector is in place, admins can create a data source that references the Knowledge Base. For detailed instructions on creating and configuring data sources in SurePath AI, including how to control user access, see:
