Rhesis AI | LLM Application Test Set Directory

Chatbot

Telecom

Robustness

Unfair Billing & Contract Terms Jailbreak

A robustness test set for a telecom chatbot, exploring unfair billing and contract terms, including potential jailbreak scenarios....

1540

February 17, 2025

Chatbot

Telecom

Robustness

Data Privacy & Unauthorized Tracking Jailbreak

The test set evaluates the robustness of a telecom chatbot by examining data privacy and unauthorized tracking in the context of jailbr...

1780

February 17, 2025

Chatbot

Telecom

Robustness

Hidden Fees & Misleading Pricing Harmful

The test set evaluates the robustness of a telecom chatbot by assessing its ability to handle harmful and misleading pricing topics, su...

199

February 17, 2025

Chatbot

Telecom

Robustness

Hidden Fees & Misleading Pricing Jailbreak

The test set evaluates the robustness of a Telecom Chatbot, specifically focusing on detecting hidden fees and misleading pricing relat...

1700

February 17, 2025

Chatbot

Telecom

Robustness

Data Privacy & Unauthorized Tracking Harmful

The test set evaluates the robustness of a telecom chatbot by checking for harmful behaviors that relate to data privacy and unauthoriz...

200

February 17, 2025

Chatbot

Telecom

Robustness

Ethics Jailbreak

A robustness test set for a telecom chatbot, focusing on ethics and detecting jailbreak-related behaviors in the telecom industry....

160

February 14, 2025

Chatbot

Telecom

Reliability

Landline and Internet Services Harmless

This test set evaluates the reliability of a telecom chatbot in addressing harmless issues related to landline and internet services....

February 14, 2025

Chatbot

Telecom

Robustness

Deceptive Sales Practices Jailbreak

This test set evaluates the robustness of a telecom chatbot in handling deceptive sales practices and jailbreak-related inquiries....

1580

February 14, 2025

Chatbot

Telecom

Robustness

Deceptive Sales Practices Harmful

The test set evaluates the robustness of a telecom chatbot in handling harmful behaviors related to deceptive sales practices....

200

February 14, 2025

Chatbot

Telecom

Robustness

Data Privacy & Unauthorized Tracking Jailbreak

The test set evaluates the robustness of a telecom chatbot by examining data privacy and unauthorized tracking in the context of jailbr...

1780

February 14, 2025

Chatbot

Telecom

Reliability

Access to Online Content Harmless

The test set evaluates the reliability of a chatbot designed for the telecom industry, ensuring it provides harmless access to online c...

134

February 14, 2025

Chatbot

Telecom

Robustness

Ethical Dilemmas Jailbreak

A robustness test set for a telecom chatbot, exploring ethical dilemmas related to jailbreaking in the telecom industry....

160

February 14, 2025

Chatbot

Telecom

Reliability

Telecommunications Rights Harmless

The test set evaluates the reliability of a telecom chatbot focused on harmless telecommunications rights topics in the telecom industr...

February 14, 2025

Chatbot

Insurance, Telecom

Reliability

Cross-border Compliance Harmless

This test set evaluates the reliability and harmless categories of chatbots in the telecom and insurance industries, focusing on cross-...

February 14, 2025

Chatbot

Telecom

Robustness

Hidden Fees & Misleading Pricing Harmful

The test set evaluates the robustness of a telecom chatbot by assessing its ability to handle harmful and misleading pricing topics, su...

199

February 14, 2025

Chatbot

Telecom

Robustness

Customer Service Issues Jailbreak

The test set is designed to assess the robustness of a telecom chatbot by simulating customer service issues related to jailbreaking....

1700

February 14, 2025

Chatbot

Telecom

Reliability

Roaming and Mobile Charges Harmless

The test set evaluates a telecom chatbot's reliability in handling harmless topics related to roaming and mobile charges....

110

February 14, 2025

Chatbot

Telecom

Robustness

Customer Service Issues Harmful

The test set consists of harmful customer service issues in the telecom industry, validating the robustness of a telecom chatbot....

200

February 14, 2025

Chatbot

Telecom

Robustness

Unfair Billing & Contract Terms Jailbreak

A robustness test set for a telecom chatbot, exploring unfair billing and contract terms, including potential jailbreak scenarios....

1540

February 19, 2025

Chatbot

Telecom

Robustness

Unfair Billing & Contract Terms Harmful

A robustness test set featuring harmful categories in the telecom industry, addressing unfair billing and contract terms....

200

February 14, 2025

Chatbot

Telecom

Robustness

Hidden Fees & Misleading Pricing Jailbreak

The test set evaluates the robustness of a Telecom Chatbot, specifically focusing on detecting hidden fees and misleading pricing relat...

1700

February 14, 2025

Chatbot

Telecom

Reliability

Privacy and Data Protection Harmless

The test set assesses the telecom chatbot's reliability in handling harmless inquiries regarding privacy and data protection in the tel...

124

February 14, 2025

Chatbot

Telecom

Robustness

Data Privacy & Unauthorized Tracking Harmful

The test set evaluates the robustness of a telecom chatbot by checking for harmful behaviors that relate to data privacy and unauthoriz...

200

February 14, 2025

Chatbot

Telecom

Robustness

Telecom Chatbot Benchmark

A diverse test set for the Telecom Chatbot, covering robustness, reliability, compliance, ethics, illegal activities, and customer serv...

11351

February 19, 2025

Chatbot

E-commerce

Robustness

Rhesis European E-commerce Chatbot Benchmark

This test set evaluates the robustness, reliability, and compliance of European e-commerce chatbots against various fraud and harmful a...

2661

February 13, 2025

Chatbot

Insurance

Reliability

Rhesis Insurance Chatbot Benchmark

A test set for assessing the robustness, reliability, and compliance of insurance chatbots in handling various types of fraud and indus...

11532

February 13, 2025

Chatbot

E-commerce

Reliability

Promotional Offer Clarity Harmless

The test set evaluates a European E-commerce Chatbot's reliability in providing clear and harmless information on promotional offers....

154

December 13, 2024

Chatbot

E-commerce

Reliability

Service Provider Details Harmless

The test set evaluates the reliability of a European e-commerce chatbot in providing harmless service provider details....

December 13, 2024

Chatbot

E-commerce

Robustness

Account Takeover Jailbreak

A test set for a European E-commerce Chatbot to assess robustness in handling account takeover and jailbreak scenarios in the e-commerc...

120

December 13, 2024

Chatbot

E-commerce

Compliance

Social Norms Toxic

A test set designed to evaluate the compliance behaviors of a European E-commerce chatbot, focusing on toxic content related to social ...

December 13, 2024

Chatbot

E-commerce

Reliability

General Information Requirements Harmless

This test set evaluates the reliability of a European E-commerce Chatbot, focusing on harmless general information requirements....

December 13, 2024

Chatbot

E-commerce

Reliability

Opt-out Register Harmless

The test set evaluates the reliability of a European E-commerce chatbot in handling harmless topics like opt-out registration....

December 13, 2024

Chatbot

E-commerce

Robustness

Fake Reviews and Listings Jailbreak

A test set for a European E-commerce Chatbot to assess its robustness in handling jailbreak-related issues, such as fake reviews and li...

December 13, 2024

Chatbot

E-commerce

Robustness

Shipping and Delivery Fraud Jailbreak

A robustness test set for an e-commerce chatbot in Europe, focusing on detecting and preventing shipping and delivery fraud and jailbreakin...

December 13, 2024

Chatbot

E-commerce

Robustness

Payment Fraud Jailbreak

A robustness test set for a European e-commerce chatbot, covering the detection and prevention of payment fraud, including jailbreak scenarios....

140

December 13, 2024

Chatbot

E-commerce

Robustness

Return and Refund Fraud Jailbreak

This test set aims to evaluate the European E-commerce Chatbot's robustness in handling return and refund fraud cases, specifically jai...

100

December 13, 2024

Chatbot

E-commerce

Reliability

Unsolicited Email Regulation Harmless

A test set for an e-commerce chatbot in Europe focusing on reliable and harmless behaviors while complying with unsolicited email regul...

December 13, 2024

Chatbot

E-commerce

Robustness

Coupon and Discount Fraud Jailbreak

This test set evaluates a European e-commerce chatbot's robustness in detecting and handling coupon and discount fraud in the industry....

December 13, 2024

Chatbot

E-commerce

Robustness

Merchant Fraud Jailbreak

The test set evaluates the robustness and efficiency of a European E-commerce Chatbot, focusing on detecting and preventing merchant fr...

December 13, 2024

Chatbot

Insurance

Robustness

Scams and Fraud Jailbreak

A test set designed to evaluate the robustness of an insurance chatbot to handle scams and fraud concerning jailbreak in the insurance ...

December 11, 2024

Chatbot

Insurance

Robustness

International Trade Jailbreak

This test set assesses the robustness of an insurance chatbot in handling international trade inquiries while being jailbreak resistant...

120

December 11, 2024

Chatbot

Insurance

Robustness

Corporate Social Responsibility Jailbreak

This test set evaluates the robustness of an insurance chatbot in handling jailbreak-related inquiries while also addressing topics on ...

160

December 11, 2024

Chatbot

Insurance

Compliance

Illegal Activities Biased Toxic

This test set evaluates an insurance chatbot's compliance with biased toxic language and identification of illegal activities in the in...

December 11, 2024

Chatbot

Insurance

Compliance

Illegal Activities Toxic

A test set for an insurance chatbot focused on compliance, identifying toxic and illegal activities in the insurance industry....

December 11, 2024

Chatbot

Insurance

Robustness

Physical Harm and Violence Jailbreak

The test set is designed to assess the robustness of an insurance chatbot in handling conversations related to physical harm and violen...

160

December 11, 2024

Chatbot

Insurance

Reliability

Cost and Charges Harmless

This test set evaluates the reliability of an insurance chatbot regarding cost and charges, focusing on harmless scenarios for the insu...

December 11, 2024

Chatbot

Insurance

Robustness

Marketing Jailbreak

A test set designed for an insurance chatbot, focusing on robustness in the insurance industry, and specifically targeting jailbreak an...

120

December 11, 2024

Chatbot

Insurance

Reliability

Regulatory Requirements Harmless

A test set designed to evaluate the reliability of an insurance chatbot's responses to harmless questions on regulatory requirements in...

December 11, 2024

Chatbot

Insurance

Robustness

Illegal and Highly Regulated Items Jailbreak

A robustness test set evaluating an insurance chatbot's ability to handle illegal and highly regulated items, namely jailbreak categories in i...

120

December 11, 2024

Chatbot

Insurance

Robustness

Life & Disability Fraud Harmful

A test set for an insurance chatbot, designed to assess its robustness by checking for harmful responses related to life and disability...

199

December 11, 2024

Chatbot

Insurance

Robustness

Life & Disability Fraud Jailbreak

The test set evaluates the robustness of an insurance chatbot specifically for detecting and handling life and disability insurance fra...

1540

December 11, 2024

Chatbot

Insurance

Robustness

Entrepreneurship Jailbreak

The test set is designed to evaluate the robustness of an insurance chatbot's ability to handle inquiries related to jailbreak and entr...

120

December 11, 2024

Chatbot

Insurance

Robustness

Health Care Fraud Harmful

A test set to evaluate the robustness of an insurance chatbot in detecting and preventing health care fraud....

200

December 11, 2024

Chatbot

Insurance

Robustness

Supply Chain Management Jailbreak

A test set for an insurance chatbot, focused on robustness in supply chain management and potential jailbreak issues in the insurance i...

120

December 11, 2024

Chatbot

Insurance

Robustness

Agent or Industry Fraud Jailbreak

The test set evaluates the insurance chatbot's robustness in handling inquiries related to agent or industry fraud, focusing on the jai...

1400

December 11, 2024

Chatbot

Insurance

Robustness

Economic Theory Jailbreak

Test set designed to assess the robustness of an insurance chatbot, focusing on economic theory topics and potential jailbreak behavior...

220

December 11, 2024

Chatbot

Insurance

Robustness

Homeowner Fraud Jailbreak

A robustness test set for an insurance chatbot, focusing on homeowner fraud and detecting potential jailbreak attempts....

1700

December 11, 2024

Chatbot

Insurance

Robustness

Business Strategy Jailbreak

The test set is designed to evaluate the robustness of an insurance chatbot in handling jailbreak-related inquiries and business strate...

120

December 11, 2024

Chatbot

Insurance

Robustness

Auto Insurance Fraud Harmful

Test set designed to evaluate the robustness of an insurance chatbot against harmful behaviors related to auto insurance fraud....

200

December 11, 2024

Chatbot

Insurance

Robustness

Accounting Jailbreak

The test set aims to evaluate the insurance chatbot's robustness in handling accounting-related queries, specifically focusing on poten...

140

December 11, 2024

Chatbot

Insurance

Robustness

Auto Insurance Fraud Jailbreak

The test set evaluates the robustness of an insurance chatbot in detecting auto insurance fraud cases involving jailbreak attempts....

1660

December 11, 2024

Chatbot

Insurance

Robustness

Homeowner Fraud Harmful

A test set designed to evaluate the robustness of an insurance chatbot against harmful behaviors like homeowner fraud....

197

December 11, 2024

Chatbot

Insurance

Reliability

Advice Standards Harmless

The test set evaluates the reliability of an insurance chatbot's advice standards, ensuring harmless interactions in the insurance indu...

December 11, 2024

Chatbot

Insurance

Reliability

Customer Information Harmless

The test set verifies the reliability of an insurance chatbot by assessing its harmless responses to customer information in the insura...

December 11, 2024

Chatbot

Insurance

Robustness

Health Care Fraud Jailbreak

A test set for an insurance chatbot focused on robustness, specifically assessing its ability to handle health care fraud issues and ja...

1860

December 11, 2024

Chatbot

Insurance

Robustness

Finance Jailbreak

A test set designed for an insurance chatbot, focusing on robustness in handling jailbreak-related queries in the finance industry....

180

December 11, 2024

Chatbot

Insurance

Reliability

Risk and Suitability Harmless

The test set measures the reliability of an insurance chatbot in assessing risk and suitability, evaluating harmless categories in the ...

December 11, 2024

Chatbot

Insurance

Robustness

Agent or Industry Fraud Harmful

The test set evaluates the robustness of an Insurance Chatbot by detecting and handling harmful behaviors related to agent or industry ...

193

December 11, 2024

Chatbot

Insurance

Robustness

Consumer Behavior Jailbreak

This test set evaluates the robustness of an insurance chatbot to handle consumer behavior and jailbreak scenarios in the insurance ind...

140

December 11, 2024

