
HowTo - Data Source - Refresh data in an AWS Knowledge Base

This article will help you refresh the data in a Knowledge Base

Updated over 4 months ago

Overview

An AWS Knowledge Base does not refresh information from its data sources automatically. If you want the information to refresh without manually clicking the button each time, you will need some form of automation. This article uses a Lambda function triggered by an EventBridge schedule.

High Level Steps

Prerequisites

You will need the following values to complete your configuration.

  • Admin rights to AWS

  • Knowledge Base ID (for the KB you want to refresh)

  • Data Source ID (for the data source of the KB you want to refresh)
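If you do not have the two IDs to hand, they can also be looked up by name. The following is a minimal sketch, assuming a boto3 `bedrock-agent` client with permission to list knowledge bases and data sources; the function name `find_ids` and the example names are hypothetical, and only the first page of results is inspected.

```python
def find_ids(client, kb_name, data_source_name):
    """Return (knowledge_base_id, data_source_id) for the given display names.

    `client` is a boto3 'bedrock-agent' client. Only the first page of each
    listing is checked, which is an assumption that the account has few KBs.
    """
    kbs = client.list_knowledge_bases()["knowledgeBaseSummaries"]
    kb_id = next((kb["knowledgeBaseId"] for kb in kbs if kb["name"] == kb_name), None)
    if kb_id is None:
        raise LookupError(f"No knowledge base named {kb_name!r}")

    sources = client.list_data_sources(knowledgeBaseId=kb_id)["dataSourceSummaries"]
    ds_id = next(
        (ds["dataSourceId"] for ds in sources if ds["name"] == data_source_name), None
    )
    if ds_id is None:
        raise LookupError(f"No data source named {data_source_name!r} in {kb_id}")
    return kb_id, ds_id


# Usage (requires AWS credentials; the names are placeholders):
# import boto3
# print(find_ids(boto3.client("bedrock-agent"), "my-kb", "my-data-source"))
```

Both IDs are also shown in the Bedrock console on the Knowledge Base and Data Source detail pages.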

AWS Lambda Function

Create the Lambda

This function takes its Knowledge Base ID and Data Source ID as input values, so a single function can refresh multiple KBs and Data Sources. You will pass in the appropriate values when you create the EventBridge schedule later.

  • Enter a Function name

  • Change the Runtime to Python 3.xx

  • Select Create function

  • Replace the existing code in lambda_function.py with the following

import json
import boto3
import logging

# Set up logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize AWS client for Amazon Bedrock Knowledge Bases
bedrock_agent = boto3.client('bedrock-agent')

def lambda_handler(event, context):
    logger.info(f"Received event: {json.dumps(event)}")

    try:
        # Extract KB ID and data source ID from the event
        # These are passed in the EventBridge schedule payload
        kb_id = event.get('kb_id')
        data_source_id = event.get('data_source_id')

        # Validate the required parameters
        if not kb_id or not data_source_id:
            error_msg = "Missing required parameters. Both 'kb_id' and 'data_source_id' are required."
            logger.error(error_msg)
            return {
                'statusCode': 400,
                'body': json.dumps({'error': error_msg})
            }

        # Start the data source refresh
        response = bedrock_agent.start_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id
        )

        logger.info(f"Started ingestion job: {response}")

        # The job details are nested under the 'ingestionJob' key of the response
        ingestion_job = response.get('ingestionJob', {})

        return {
            'statusCode': 200,
            'body': json.dumps({
                'message': 'Data source refresh started successfully',
                'ingestionJobId': ingestion_job.get('ingestionJobId'),
                'knowledgeBaseId': kb_id,
                'dataSourceId': data_source_id
            })
        }

    except Exception as e:
        logger.error(f"Error refreshing data source: {str(e)}")
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }
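Once the function is deployed and its permissions are in place, it can be invoked by hand to confirm it behaves as expected before any schedule exists. This is a rough sketch, assuming a boto3 `lambda` client and a caller allowed to invoke the function; `invoke_refresh` and the function name in the usage comment are hypothetical. The event it sends uses the same two keys the handler reads.

```python
import json


def invoke_refresh(lambda_client, function_name, kb_id, data_source_id):
    """Invoke the refresh Lambda synchronously and return its decoded result."""
    event = {"kb_id": kb_id, "data_source_id": data_source_id}
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the handler's return value
        Payload=json.dumps(event).encode("utf-8"),
    )
    # The 'Payload' entry is a stream holding the handler's JSON return value.
    return json.loads(response["Payload"].read())


# Usage (requires AWS credentials; all three values are placeholders):
# import boto3
# print(invoke_refresh(boto3.client("lambda"), "my-kb-refresh", "YOUR_KB_ID", "YOUR_DS_ID"))
```

A `statusCode` of 200 in the result means the ingestion job was started, not that it has finished.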



Update the Permissions

  • Select Configuration > Permissions

  • Open the Role name by clicking it

  • Select Add permissions > Create inline policy

  • Select JSON

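  • Paste the following into the Policy editor (replacing everything else) to allow the Lambda to start ingestion jobs that refresh the data source. Replace the Region (us-east-1) and the 12-digit account ID in the Resource ARN with your own.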

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:StartIngestionJob"],
      "Resource": ["arn:aws:bedrock:us-east-1:905418109363:knowledge-base/*"]
    }
  ]
}

  • Scroll down and click Next

  • Give it a name and click Create policy

  • The Role should be updated and shown on the screen
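Because the Resource ARN in the policy must match your own Region and account, it can help to generate the policy text rather than edit it by hand. A minimal sketch, assuming the same single-statement policy as above; `build_ingestion_policy` is a hypothetical helper and `111122223333` is a placeholder account ID.

```python
import json


def build_ingestion_policy(region, account_id, kb_id="*"):
    """Build the inline policy that lets the Lambda start ingestion jobs.

    Pass a specific kb_id to limit the policy to one Knowledge Base; the
    default "*" matches every Knowledge Base in the account and Region.
    """
    resource = f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/{kb_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["bedrock:StartIngestionJob"],
                "Resource": [resource],
            }
        ],
    }


# Print policy JSON that can be pasted into the Policy editor.
print(json.dumps(build_ingestion_policy("us-east-1", "111122223333"), indent=2))
```

Scoping the policy to a single Knowledge Base ID is tighter than the wildcard, at the cost of updating the policy whenever another KB is added to the schedule.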

Create an AWS EventBridge

An EventBridge schedule lets you both set up a recurring time to run the Lambda and send it the values for the KB and Data Source. You can create multiple EventBridge schedules to refresh all of your KBs' data sources.
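For reference, the schedule built in the console steps of this section can also be sketched with the boto3 `scheduler` client. Several things here are assumptions rather than values taken from the article: the cron expression is inferred from the description of a daily 1:10 AM Pacific run with "?" in Day of the week, the role ARN stands for an execution role that allows EventBridge Scheduler to invoke the Lambda (the console can create one for you), and `build_schedule_request` is a hypothetical helper.

```python
import json


def build_schedule_request(name, lambda_arn, role_arn, kb_id, data_source_id):
    """Build keyword arguments for the scheduler client's create_schedule call."""
    return {
        "Name": name,
        # Fields: minutes hours day-of-month month day-of-week year
        "ScheduleExpression": "cron(10 1 * * ? *)",  # assumed: 1:10 AM every day
        "ScheduleExpressionTimezone": "America/Los_Angeles",
        "FlexibleTimeWindow": {"Mode": "FLEXIBLE", "MaximumWindowInMinutes": 5},
        "Target": {
            "Arn": lambda_arn,
            "RoleArn": role_arn,
            # This JSON string becomes the event the Lambda handler receives.
            "Input": json.dumps({"kb_id": kb_id, "data_source_id": data_source_id}),
        },
    }


# Usage (requires AWS credentials; every argument is a placeholder):
# import boto3
# request = build_schedule_request("kb-refresh", LAMBDA_ARN, ROLE_ARN, "YOUR_KB_ID", "YOUR_DS_ID")
# boto3.client("scheduler").create_schedule(**request)
```

One schedule per Knowledge Base and Data Source pair keeps each refresh independent, which matches the article's suggestion to create multiple schedules.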

  • Create a name and description for the schedule

  • Select Recurring schedule

  • Enter information into the cron fields. The example schedule runs every morning at 1:10 AM Pacific time, every day of the year.

    • Please note the "?" in Day of the week

    • Please don't set up a frequency under 24 hours without considering how long it might take to ingest and refresh the information.

  • Scroll down

  • Use the Next 10 trigger dates to verify that the schedule matches the refresh times you wanted.

  • Adjust the cron times if needed

  • Select an option in the Flexible time window

    • I have 5 minutes selected

  • Click Next when ready

  • Select AWS Lambda Invoke

  • Select the Lambda function in the dropdown that you made in the previous steps

  • Scroll down to the Payload field and enter a JSON object with the two keys the Lambda function reads: kb_id and data_source_id.

    • Be sure to set the two values to your Knowledge Base ID and your Data Source ID.

  • Scroll down and click Next

  • On the next screen, click Next

  • On the final screen validate the Schedule details and Target details

  • When ready click Create schedule

  • Be sure to check the Knowledge Base and the Data Source after the next expected run to confirm that the refresh completed successfully.

  • The time can be adjusted so that it's triggered a few minutes from when you are doing this work so you don't need to wait until the next day.

    • By adjusting the time you can be sure that timing and permissions are all as expected.
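The check after a scheduled run can also be done programmatically by looking at the most recent ingestion job for the data source. This is a minimal sketch, assuming a boto3 `bedrock-agent` client whose caller is allowed to list ingestion jobs (a permission the Lambda policy above does not grant); `latest_ingestion_job` is a hypothetical helper.

```python
def latest_ingestion_job(client, kb_id, data_source_id):
    """Return the summary of the most recently started ingestion job, or None."""
    response = client.list_ingestion_jobs(
        knowledgeBaseId=kb_id,
        dataSourceId=data_source_id,
        sortBy={"attribute": "STARTED_AT", "order": "DESCENDING"},
        maxResults=1,
    )
    jobs = response.get("ingestionJobSummaries", [])
    return jobs[0] if jobs else None


# Usage (requires AWS credentials; the IDs are placeholders):
# import boto3
# job = latest_ingestion_job(boto3.client("bedrock-agent"), "YOUR_KB_ID", "YOUR_DS_ID")
# if job:
#     print(job["ingestionJobId"], job["status"], job["startedAt"])
```

A status of COMPLETE indicates the refresh finished; FAILED, or a start time that does not match the schedule, suggests revisiting the permissions or the cron settings.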
