
How SurePath AI assesses risk

Explains two-stage risk assessment for prompts and responses based on service, safety, intent, and data sensitivity. Covers Low/Medium/High levels and policy integration.


Overview

SurePath AI evaluates risk at two distinct stages of every AI interaction to help organizations understand and manage potential security and compliance concerns. When users send prompts to AI services, SurePath AI intercepts these interactions and performs comprehensive risk assessments on both the input (the user's prompt) and the output (the AI service's response). Each assessment produces an overall risk level that admins can use to inform policy decisions, trigger alerts, or block high-risk interactions.

The risk assessment framework analyzes multiple factors including the security posture of the AI service being used, the presence of harmful content or prompt injection attempts, the intent behind the request, and the sensitivity of any data detected in the interaction. By understanding these risk scores, admins can make informed decisions about which AI services to allow, what types of interactions require additional review, and how to protect sensitive organizational data from unauthorized exposure.

Risk levels

Individual risk signals are classified into one of three levels:

| Level | Description |
| --- | --- |
| Low | Minimal risk; standard business operations |
| Medium | Moderate risk; may require monitoring or review |
| High | Significant risk; may trigger policy actions |

Note: The overall risk level for an interaction is always Low, Medium, or High. The Critical classification applies only to the data sensitivity signal (see Data sensitivity below), never to the overall risk level.

Input risk assessment

When users send prompts to AI services, SurePath AI analyzes five key factors to determine the risk level of the interaction.

Service risk

Service risk represents the inherent risk level of the AI service being used. SurePath AI maintains risk classifications for known AI services based on their security posture, data handling practices, and whether they train models on user-submitted data. Public AI services that retain or train on user data typically receive higher service risk classifications, while enterprise services with strong data protection guarantees receive lower classifications.

When organizations use the SurePath AI Portal with private models, service risk is automatically classified as low since data remains within the organization's control and is not exposed to external AI providers.

Safety risk

Safety risk is a composite score derived from two threat detection systems that analyze the content and structure of user prompts. The first factor, harmful content detection (also called toxicity detection), identifies offensive, inappropriate, or harmful language in prompts. The second factor, prompt injection detection, identifies attempts to manipulate or jailbreak the AI system through carefully crafted instructions designed to bypass safety controls.

The safety risk is the higher of the two detection scores:

Safety Risk = Maximum of (Harmful Content, Prompt Injection)

Intent risk

Intent risk represents the risk level based on what users are trying to accomplish with their AI requests. SurePath AI analyzes the intent behind each prompt and classifies it into business domains such as Finance, Legal, Health, Engineering, and others. Each domain is mapped to an appropriate risk level based on the sensitivity of the information typically involved and the potential impact of errors or data exposure in that domain.

Intent domain classifications and their associated risk levels are continuously refined based on emerging threats, regulatory changes, and customer feedback. The specific mappings are managed internally and as risk calculations evolve through data analysis and customer feedback, the mappings are subject to change.

Data sensitivity

Data sensitivity classifies any sensitive information detected in user prompts. SurePath AI scans prompts for various types of sensitive data and assigns a classification level based on the potential impact if that data were to be exposed or mishandled.

| Classification | Description | Examples |
| --- | --- | --- |
| Public | Commonly available information with lower sensitivity | Published website URLs, public business contact information |
| Internal | Internal business data not meant for public disclosure | IP addresses, vehicle registrations |
| Confidential | Sensitive data requiring protection | Email addresses, driver's licenses, tax IDs, cryptocurrency addresses |
| Critical | Highly sensitive data requiring strict controls | Social Security Numbers, credit cards, bank accounts, passport numbers, healthcare IDs |

Data exposure impact

Data exposure impact represents the potential consequences of data being exposed to the AI service. This calculation takes into account the base sensitivity of detected data, whether the service trains on user data, and whether the intent of the request amplifies the exposure risk.

The base impact is determined by the highest sensitivity classification of data found in the prompt:

| Data Sensitivity | Base Impact |
| --- | --- |
| Critical or Confidential | High |
| Internal | Medium |
| Public or None | Low |

The base impact is then modified by two factors. First, if the AI service trains on user data, the exposure impact increases because the data may be retained and potentially incorporated into the service's models. This modifier elevates Low impact to Medium and any Medium or High impact to High. Second, if the intent risk is High, the exposure impact also increases because high-risk domains typically involve more sensitive contexts where data exposure could have greater consequences. This modifier elevates Low impact to Medium and Medium impact to High.
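The base-impact lookup and the two modifiers described above can be sketched in a few lines of Python. This is an illustrative model of the documented rules, not SurePath AI's actual implementation; the function and parameter names are assumptions.

```python
# Ordered risk levels used throughout the document.
LEVELS = ["Low", "Medium", "High"]

def bump(level: str) -> str:
    """Raise a risk level one step, capping at High."""
    return LEVELS[min(LEVELS.index(level) + 1, len(LEVELS) - 1)]

def data_exposure_impact(data_sensitivity: str,
                         service_trains_on_data: bool,
                         intent_risk: str) -> str:
    # Base impact comes from the highest sensitivity classification detected.
    base = {"Critical": "High", "Confidential": "High",
            "Internal": "Medium", "Public": "Low", "None": "Low"}[data_sensitivity]
    # Modifier 1: services that train on user data amplify exposure
    # (Low -> Medium, Medium or High -> High).
    if service_trains_on_data:
        base = bump(base)
    # Modifier 2: a High intent risk amplifies exposure further
    # (Low -> Medium, Medium -> High).
    if intent_risk == "High":
        base = bump(base)
    return base
```

For instance, `data_exposure_impact("Internal", True, "Low")` yields `"High"`: the Medium base impact for Internal data is elevated one step by the train-on-data modifier.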

Output risk assessment

When AI services return responses, SurePath AI analyzes the output to identify potential risks in the content being delivered to users.

Service risk

Service risk for output assessment is the same as for input assessment, representing the inherent risk level of the AI service that generated the response. See Service risk under Input risk assessment above for how this classification is determined.

Safety risk

Safety risk for outputs is a composite score derived from two content detection systems. The first factor, harmful content detection (toxicity), identifies offensive, inappropriate, or harmful language in the AI-generated response. The second factor, bias detection, identifies biased or discriminatory content in the response that could violate organizational policies or create legal exposure.

The safety risk calculation follows the same logic as input safety risk:

Safety Risk = Maximum of (Harmful Content, Bias Detection)

Overall risk calculation

SurePath AI combines the individual risk factors into an overall risk level that represents the complete risk profile of the interaction. This calculation uses different methodologies for input and output assessments.

Input overall risk

Input risk calculation follows a two-step process that first calculates intermediate values and then combines them using a risk matrix.

Step 1: Calculate intermediate risk values

Before applying the risk matrix, SurePath AI calculates two intermediate values:

Interaction Risk is determined by taking the maximum (highest) value among three factors:

  • Service Risk (the inherent risk of the AI service)

  • Safety Risk (composite of toxicity and prompt injection detection)

  • Intent Risk (the risk level of the business domain)

For example, if Service Risk is Low, Safety Risk is Medium, and Intent Risk is Low, the Interaction Risk would be Medium (the highest of the three values).

Data Exposure Impact is determined by the data sensitivity classification and then modified by two amplification factors:

  • Base impact is determined by the highest sensitivity classification detected

  • If the service trains on user data, the impact increases

  • If Intent Risk is High, the impact increases again

For example, if Confidential data is detected (base impact: High) and the service trains on user data, the Data Exposure Impact remains High. If only Internal data is detected (base impact: Medium) but the service does not train on data and Intent Risk is Low, the Data Exposure Impact remains Medium.

Step 2: Apply the Risk Matrix

Once the intermediate values are calculated, they are combined using the following Risk Matrix to determine the overall input risk:

| | Data Exposure: Low | Data Exposure: Medium | Data Exposure: High |
| --- | --- | --- | --- |
| Interaction Risk: Low | Low | Low | Medium |
| Interaction Risk: Medium | Low | Medium | High |
| Interaction Risk: High | Medium | High | High |

To use the matrix, locate the row corresponding to the Interaction Risk level and the column corresponding to the Data Exposure Impact level. The intersecting cell shows the overall input risk level.

For example:

  • Interaction Risk: Medium, Data Exposure Impact: High β†’ Overall Input Risk: High

  • Interaction Risk: Low, Data Exposure Impact: Medium β†’ Overall Input Risk: Low

  • Interaction Risk: High, Data Exposure Impact: Low β†’ Overall Input Risk: Medium

This matrix ensures that even low-risk interactions receive elevated risk scores when sensitive data is involved, and that high-risk interactions with any level of data exposure receive appropriate attention.
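The two-step calculation (maximum of the three input factors, then a matrix lookup) can be sketched as follows. Names and data structures here are illustrative assumptions, not SurePath AI's API.

```python
# Ordering used to compare risk levels.
ORDER = {"Low": 0, "Medium": 1, "High": 2}

# The Risk Matrix from the table above:
# (interaction risk, data exposure impact) -> overall input risk.
RISK_MATRIX = {
    ("Low", "Low"): "Low",      ("Low", "Medium"): "Low",       ("Low", "High"): "Medium",
    ("Medium", "Low"): "Low",   ("Medium", "Medium"): "Medium", ("Medium", "High"): "High",
    ("High", "Low"): "Medium",  ("High", "Medium"): "High",     ("High", "High"): "High",
}

def interaction_risk(service_risk: str, safety_risk: str, intent_risk: str) -> str:
    # Step 1: interaction risk is the maximum of the three input factors.
    return max(service_risk, safety_risk, intent_risk, key=ORDER.get)

def overall_input_risk(interaction: str, data_exposure: str) -> str:
    # Step 2: look up the intersection of the row (interaction risk)
    # and column (data exposure impact).
    return RISK_MATRIX[(interaction, data_exposure)]
```

For example, `overall_input_risk(interaction_risk("Low", "Medium", "Low"), "High")` returns `"High"`, matching the second bullet above once the Medium safety signal is factored in.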

Output overall risk

Output risk assessment is simpler than input assessment because outputs do not include data exposure considerations. The calculation follows a single step:

Overall Output Risk = Maximum of (Service Risk, Safety Risk)

The overall output risk is determined by taking the highest value between the Service Risk (inherent risk of the AI service) and the Safety Risk (composite of toxicity and bias detection in the response). For example, if Service Risk is Low and Safety Risk is Medium, the overall output risk would be Medium.
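A minimal sketch of this single-step rule (illustrative Python, not SurePath AI's implementation):

```python
# Ordering used to compare risk levels.
ORDER = {"Low": 0, "Medium": 1, "High": 2}

def overall_output_risk(service_risk: str, safety_risk: str) -> str:
    # Output risk is simply the higher of the two factors.
    return max(service_risk, safety_risk, key=ORDER.get)
```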

Policy integration

Admins can use risk scores to configure automated policy actions that protect organizations from risky AI interactions. Risk levels serve as triggers for policy rules defined in the Default Policy or Group Policies.

To configure policy actions based on risk levels, click Default Policy or Group Policies in the GOVERN section of the Admin UI. Within policy rules, admins can specify actions to take when interactions meet certain risk thresholds, such as blocking the interaction, requiring additional approval, redacting sensitive data, or logging the interaction for review.

Currently, the risk calculation thresholds and the underlying risk factor weightings are managed by SurePath AI and cannot be customized by admins. However, the ability for organizations to customize risk thresholds and adjust risk calculations to match their specific risk tolerance is part of the product roadmap.

Viewing risk scores

Admins can view risk assessments for individual interactions and analyze aggregated risk trends across the organization.

To view risk scores for individual user interactions, click User Activity in the OBSERVE section. The User Activity view displays each intercepted AI interaction along with its calculated risk scores, detected data types, and any policy actions that were triggered.

To view aggregated risk analytics and trends over time, click Insights in the OBSERVE section. The Insights dashboard provides visualizations of risk patterns, helping admins identify high-risk users, services, or domains that may require additional policy controls or user training.

Examples

The following examples illustrate how different interaction scenarios result in different overall risk levels by walking through the complete risk calculation process.

Example: Low risk interaction

A user asks a low-risk AI service to summarize a public news article. The prompt contains no sensitive data and uses professional language appropriate for business use.

Individual risk factors:

  • Service Risk: Low

  • Safety Risk: Low (no toxicity or injection detected)

  • Intent Risk: Low (general-purpose request)

  • Data Sensitivity: None detected

Calculation step 1 - Determine Interaction Risk: Interaction Risk = Maximum of (Service Risk: Low, Safety Risk: Low, Intent Risk: Low) = Low

Calculation step 2 - Determine Data Exposure Impact: Base impact from data sensitivity = Low (no sensitive data detected). No modifiers apply (the service doesn't train on data and Intent Risk is not High). Data Exposure Impact = Low

Calculation step 3 - Apply Risk Matrix: Looking up the intersection of Interaction Risk: Low and Data Exposure Impact: Low in the Risk Matrix yields Low.

Overall Input Risk: Low

Example: Elevated risk from sensitive data

A user asks a low-risk AI service to help draft internal documentation, but the prompt includes employee email addresses for a distribution list.

Individual risk factors:

  • Service Risk: Low

  • Safety Risk: Low (no toxicity or injection detected)

  • Intent Risk: Low (general-purpose request)

  • Data Sensitivity: Confidential (email addresses detected)

Calculation step 1 - Determine Interaction Risk: Interaction Risk = Maximum of (Service Risk: Low, Safety Risk: Low, Intent Risk: Low) = Low

Calculation step 2 - Determine Data Exposure Impact: Base impact from data sensitivity = High (Confidential data classification). No modifiers apply (the service doesn't train on data and Intent Risk is not High). Data Exposure Impact = High

Calculation step 3 - Apply Risk Matrix: Looking up the intersection of Interaction Risk: Low and Data Exposure Impact: High in the Risk Matrix yields Medium.

Overall Input Risk: Medium

This example demonstrates how the Risk Matrix elevates the overall risk when sensitive data is present, even when the interaction itself appears benign. The Medium risk score would alert admins that confidential data was shared with an AI service and may trigger policies requiring data redaction or additional review.

Example: High risk interaction

A user sends a prompt containing Social Security Numbers to a public AI service that trains on user data. Additionally, the prompt includes language that triggers prompt injection detection, suggesting an attempt to manipulate the AI system.

Individual risk factors:

  • Service Risk: Medium

  • Safety Risk: High (prompt injection detected)

  • Intent Risk: Low (general-purpose request)

  • Data Sensitivity: Critical (SSN detected)

Calculation step 1 - Determine Interaction Risk: Interaction Risk = Maximum of (Service Risk: Medium, Safety Risk: High, Intent Risk: Low) = High

Calculation step 2 - Determine Data Exposure Impact: Base impact from data sensitivity = High (Critical data classification). The train-on-data modifier applies, but High remains High. Data Exposure Impact = High

Calculation step 3 - Apply Risk Matrix: Looking up the intersection of Interaction Risk: High and Data Exposure Impact: High in the Risk Matrix yields High.

Overall Input Risk: High

This interaction represents a severe security concern combining multiple risk factors: highly sensitive data (SSNs), a potential security threat (prompt injection), and a service that retains user data. The High risk score would typically trigger blocking policies to prevent the exposure of critical data to an external AI service and require immediate security review.
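The three worked examples can be reproduced by wiring the documented rules into one end-to-end function. This sketch restates the logic described in this article with illustrative names; it is not SurePath AI's implementation.

```python
ORDER = {"Low": 0, "Medium": 1, "High": 2}
LEVELS = ["Low", "Medium", "High"]

BASE_IMPACT = {"Critical": "High", "Confidential": "High",
               "Internal": "Medium", "Public": "Low", "None": "Low"}

# The Risk Matrix: (interaction risk, data exposure impact) -> overall input risk.
MATRIX = {
    ("Low", "Low"): "Low",      ("Low", "Medium"): "Low",       ("Low", "High"): "Medium",
    ("Medium", "Low"): "Low",   ("Medium", "Medium"): "Medium", ("Medium", "High"): "High",
    ("High", "Low"): "Medium",  ("High", "Medium"): "High",     ("High", "High"): "High",
}

def bump(level: str) -> str:
    """Raise a risk level one step, capping at High."""
    return LEVELS[min(ORDER[level] + 1, 2)]

def overall_input_risk(service: str, safety: str, intent: str,
                       sensitivity: str, trains_on_data: bool) -> str:
    # Step 1: interaction risk is the maximum of the three factors.
    interaction = max(service, safety, intent, key=ORDER.get)
    # Step 2: base exposure impact, then the two amplification modifiers.
    exposure = BASE_IMPACT[sensitivity]
    if trains_on_data:
        exposure = bump(exposure)
    if intent == "High":
        exposure = bump(exposure)
    # Step 3: Risk Matrix lookup.
    return MATRIX[(interaction, exposure)]

# Example 1: low-risk request, no sensitive data.
print(overall_input_risk("Low", "Low", "Low", "None", False))          # Low
# Example 2: email addresses (Confidential) in an otherwise benign prompt.
print(overall_input_risk("Low", "Low", "Low", "Confidential", False))  # Medium
# Example 3: SSNs plus prompt injection on a service that trains on data.
print(overall_input_risk("Medium", "High", "Low", "Critical", True))   # High
```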
