DIRECTORY

Test sets for Gen AI applications

Explore our continuously growing directory of test sets, designed to serve as a reference and source of inspiration for testing strategies. These sets highlight key aspects and dimensions to consider when validating Gen AI applications across diverse scenarios. If there's something you need that isn't covered, feel free to reach out to us.
Chatbot
Telecom
Robustness

Unfair Billing & Contract Terms Jailbreak

A robustness test set for a telecom chatbot, exploring unfair billing and contract terms, including potential jailbreak scenarios....
1540
February 17, 2025
Chatbot
Telecom
Robustness

Data Privacy & Unauthorized Tracking Jailbreak

The test set evaluates the robustness of a telecom chatbot by examining data privacy and unauthorized tracking in the context of jailbr...
1780
February 17, 2025
Chatbot
Telecom
Robustness

Hidden Fees & Misleading Pricing Harmful

The test set evaluates the robustness of a telecom chatbot by assessing its ability to handle harmful and misleading pricing topics, su...
199
February 17, 2025
Chatbot
Telecom
Robustness

Hidden Fees & Misleading Pricing Jailbreak

The test set evaluates the robustness of a telecom chatbot, specifically focusing on detecting hidden fees and misleading pricing relat...
1700
February 17, 2025
Chatbot
Telecom
Robustness

Data Privacy & Unauthorized Tracking Harmful

The test set evaluates the robustness of a telecom chatbot by checking for harmful behaviors that relate to data privacy and unauthoriz...
200
February 17, 2025
Chatbot
Telecom
Robustness

Ethics Jailbreak

A robustness test set for a telecom chatbot, focusing on ethics and detecting jailbreak-related behaviors in the telecom industry....
160
February 14, 2025
Chatbot
Telecom
Reliability

Landline and Internet Services Harmless

This test set evaluates the reliability of a telecom chatbot in addressing harmless issues related to landline and internet services....
99
February 14, 2025
Chatbot
Telecom
Robustness

Deceptive Sales Practices Jailbreak

This test set evaluates the robustness of a telecom chatbot in handling deceptive sales practices and jailbreak-related inquiries....
1580
February 14, 2025
Chatbot
Telecom
Robustness

Deceptive Sales Practices Harmful

The test set evaluates the robustness of a telecom chatbot in handling harmful behaviors related to deceptive sales practices....
200
February 14, 2025
Chatbot
Telecom
Robustness

Data Privacy & Unauthorized Tracking Jailbreak

The test set evaluates the robustness of a telecom chatbot by examining data privacy and unauthorized tracking in the context of jailbr...
1780
February 14, 2025
Chatbot
Telecom
Reliability

Access to Online Content Harmless

The test set evaluates the reliability of a chatbot designed for the telecom industry, ensuring it provides harmless access to online c...
134
February 14, 2025
Chatbot
Telecom
Robustness

Ethical Dilemmas Jailbreak

A robustness test set for a telecom chatbot, exploring ethical dilemmas related to jailbreaking in the telecom industry....
160
February 14, 2025
Chatbot
Telecom
Reliability

Telecommunications Rights Harmless

The test set evaluates the reliability of a telecom chatbot focused on harmless telecommunications rights topics in the telecom industr...
71
February 14, 2025
Chatbot
Insurance, Telecom
Reliability

Cross-border Compliance Harmless

This test set evaluates the reliability of chatbots in the telecom and insurance industries on harmless topics, focusing on cross-...
58
February 14, 2025
Chatbot
Telecom
Robustness

Hidden Fees & Misleading Pricing Harmful

The test set evaluates the robustness of a telecom chatbot by assessing its ability to handle harmful and misleading pricing topics, su...
199
February 14, 2025
Chatbot
Telecom
Robustness

Customer Service Issues Jailbreak

The test set is designed to assess the robustness of a telecom chatbot by simulating customer service issues related to jailbreaking....
1700
February 14, 2025
Chatbot
Telecom
Reliability

Roaming and Mobile Charges Harmless

The test set evaluates a telecom chatbot's reliability in handling harmless topics related to roaming and mobile charges....
110
February 14, 2025
Chatbot
Telecom
Robustness

Customer Service Issues Harmful

The test set consists of harmful customer service issues in the telecom industry, validating the robustness of a telecom chatbot....
200
February 14, 2025
Chatbot
Telecom
Robustness

Unfair Billing & Contract Terms Jailbreak

A robustness test set for a telecom chatbot, exploring unfair billing and contract terms, including potential jailbreak scenarios....
1540
February 19, 2025
Chatbot
Telecom
Robustness

Unfair Billing & Contract Terms Harmful

A robustness test set featuring harmful categories in the telecom industry, addressing unfair billing and contract terms....
200
February 14, 2025
Chatbot
Telecom
Robustness

Hidden Fees & Misleading Pricing Jailbreak

The test set evaluates the robustness of a telecom chatbot, specifically focusing on detecting hidden fees and misleading pricing relat...
1700
February 14, 2025
Chatbot
Telecom
Reliability

Privacy and Data Protection Harmless

The test set assesses the telecom chatbot's reliability in handling harmless inquiries regarding privacy and data protection in the tel...
124
February 14, 2025
Chatbot
Telecom
Robustness

Data Privacy & Unauthorized Tracking Harmful

The test set evaluates the robustness of a telecom chatbot by checking for harmful behaviors that relate to data privacy and unauthoriz...
200
February 14, 2025
Chatbot
Telecom
Robustness

Telecom Chatbot Benchmark

A diverse test set for the Telecom Chatbot, covering robustness, reliability, compliance, ethics, illegal activities, and customer serv...
11351
February 19, 2025
Chatbot
E-commerce
Robustness

Rhesis European E-commerce Chatbot Benchmark

This test set evaluates the robustness, reliability, and compliance of European e-commerce chatbots against various fraud and harmful a...
2661
February 13, 2025
Chatbot
Insurance
Reliability

Rhesis Insurance Chatbot Benchmark

A test set for assessing the robustness, reliability, and compliance of insurance chatbots in handling various types of fraud and indus...
11532
February 13, 2025
Chatbot
E-commerce
Reliability

Promotional Offer Clarity Harmless

The test set evaluates a European E-commerce Chatbot's reliability in providing clear and harmless information on promotional offers....
154
December 13, 2024
Chatbot
E-commerce
Reliability

Service Provider Details Harmless

The test set evaluates the reliability of a European e-commerce chatbot in providing harmless service provider details....
84
December 13, 2024
Chatbot
E-commerce
Robustness

Account Takeover Jailbreak

A test set for a European E-commerce Chatbot to assess robustness in handling account takeover and jailbreak scenarios in the e-commerc...
120
December 13, 2024
Chatbot
E-commerce
Compliance

Social Norms Toxic

A test set designed to evaluate the compliance behaviors of a European E-commerce chatbot, focusing on toxic content related to social ...
41
December 13, 2024
Chatbot
E-commerce
Reliability

General Information Requirements Harmless

This test set evaluates the reliability of a European E-commerce Chatbot, focusing on harmless general information requirements....
81
December 13, 2024
Chatbot
E-commerce
Reliability

Opt-out Register Harmless

The test set evaluates the reliability of a European E-commerce chatbot in handling harmless topics like opt-out registration....
49
December 13, 2024
Chatbot
E-commerce
Robustness

Fake Reviews and Listings Jailbreak

A test set for a European E-commerce Chatbot to assess its robustness in handling jailbreak-related issues, such as fake reviews and li...
80
December 13, 2024
Chatbot
E-commerce
Robustness

Shipping and Delivery Fraud Jailbreak

A robustness test set for a European e-commerce chatbot, focusing on detecting and preventing shipping and delivery fraud and jailbreakin...
80
December 13, 2024
Chatbot
E-commerce
Robustness

Payment Fraud Jailbreak

A robustness test set for a European e-commerce chatbot, assessing how it detects and prevents payment fraud, including jailbreak scenarios....
140
December 13, 2024
Chatbot
E-commerce
Robustness

Return and Refund Fraud Jailbreak

This test set aims to evaluate the European E-commerce Chatbot's robustness in handling return and refund fraud cases, specifically jai...
100
December 13, 2024
Chatbot
E-commerce
Reliability

Unsolicited Email Regulation Harmless

A test set for an e-commerce chatbot in Europe focusing on reliable and harmless behaviors while complying with unsolicited email regul...
51
December 13, 2024
Chatbot
E-commerce
Robustness

Coupon and Discount Fraud Jailbreak

This test set evaluates a European e-commerce chatbot's robustness in detecting and handling coupon and discount fraud in the industry....
80
December 13, 2024
Chatbot
E-commerce
Robustness

Merchant Fraud Jailbreak

The test set evaluates the robustness and efficiency of a European E-commerce Chatbot, focusing on detecting and preventing merchant fr...
80
December 13, 2024
Chatbot
Insurance
Robustness

Scams and Fraud Jailbreak

A test set designed to evaluate the robustness of an insurance chatbot to handle scams and fraud concerning jailbreak in the insurance ...
80
December 11, 2024
Chatbot
Insurance
Robustness

International Trade Jailbreak

This test set assesses the robustness of an insurance chatbot in handling international trade inquiries while being jailbreak resistant...
120
December 11, 2024
Chatbot
Insurance
Robustness

Corporate Social Responsibility Jailbreak

This test set evaluates the robustness of an insurance chatbot in handling jailbreak-related inquiries while also addressing topics on ...
160
December 11, 2024
Chatbot
Insurance
Compliance

Illegal Activities Biased Toxic

This test set evaluates an insurance chatbot's compliance with biased toxic language and identification of illegal activities in the in...
50
December 11, 2024
Chatbot
Insurance
Compliance

Illegal Activities Toxic

A test set for an insurance chatbot focused on compliance, identifying toxic and illegal activities in the insurance industry....
50
December 11, 2024
Chatbot
Insurance
Robustness

Physical Harm and Violence Jailbreak

The test set is designed to assess the robustness of an insurance chatbot in handling conversations related to physical harm and violen...
160
December 11, 2024
Chatbot
Insurance
Reliability

Cost and Charges Harmless

This test set evaluates the reliability of an insurance chatbot regarding cost and charges, focusing on harmless scenarios for the insu...
72
December 11, 2024
Chatbot
Insurance
Robustness

Marketing Jailbreak

A test set designed for an insurance chatbot, focusing on robustness in the insurance industry, and specifically targeting jailbreak an...
120
December 11, 2024
Chatbot
Insurance
Reliability

Regulatory Requirements Harmless

A test set designed to evaluate the reliability of an insurance chatbot's responses to harmless questions on regulatory requirements in...
56
December 11, 2024
Chatbot
Insurance
Robustness

Illegal and Highly Regulated Items Jailbreak

A robustness test set for an insurance chatbot, evaluating its ability to handle illegal and highly regulated items, namely jailbreak categories in i...
120
December 11, 2024
Chatbot
Insurance
Robustness

Life & Disability Fraud Harmful

A test set for an insurance chatbot, designed to assess its robustness by checking for harmful responses related to life and disability...
199
December 11, 2024
Chatbot
Insurance
Robustness

Life & Disability Fraud Jailbreak

The test set evaluates the robustness of an insurance chatbot specifically for detecting and handling life and disability insurance fra...
1540
December 11, 2024
Chatbot
Insurance
Robustness

Entrepreneurship Jailbreak

The test set is designed to evaluate the robustness of an insurance chatbot's ability to handle inquiries related to jailbreak and entr...
120
December 11, 2024
Chatbot
Insurance
Robustness

Health Care Fraud Harmful

A test set to evaluate the robustness of an insurance chatbot in detecting and preventing health care fraud....
200
December 11, 2024
Chatbot
Insurance
Robustness

Supply Chain Management Jailbreak

A test set for an insurance chatbot, focused on robustness in supply chain management and potential jailbreak issues in the insurance i...
120
December 11, 2024
Chatbot
Insurance
Robustness

Agent or Industry Fraud Jailbreak

The test set evaluates the insurance chatbot's robustness in handling inquiries related to agent or industry fraud, focusing on the jai...
1400
December 11, 2024
Chatbot
Insurance
Robustness

Economic Theory Jailbreak

Test set designed to assess the robustness of an insurance chatbot, focusing on economic theory topics and potential jailbreak behavior...
220
December 11, 2024
Chatbot
Insurance
Robustness

Homeowner Fraud Jailbreak

A robustness test set for an insurance chatbot, focusing on homeowner fraud and detecting potential jailbreak attempts....
1700
December 11, 2024
Chatbot
Insurance
Robustness

Business Strategy Jailbreak

The test set is designed to evaluate the robustness of an insurance chatbot in handling jailbreak-related inquiries and business strate...
120
December 11, 2024
Chatbot
Insurance
Robustness

Auto Insurance Fraud Harmful

Test set designed to evaluate the robustness of an insurance chatbot against harmful behaviors related to auto insurance fraud....
200
December 11, 2024
Chatbot
Insurance
Robustness

Accounting Jailbreak

The test set aims to evaluate the insurance chatbot's robustness in handling accounting-related queries, specifically focusing on poten...
140
December 11, 2024
Chatbot
Insurance
Robustness

Auto Insurance Fraud Jailbreak

The test set evaluates the robustness of an insurance chatbot in detecting auto insurance fraud cases involving jailbreak attempts....
1660
December 11, 2024
Chatbot
Insurance
Robustness

Homeowner Fraud Harmful

A test set designed to evaluate the robustness of an insurance chatbot against harmful behaviors like homeowner fraud....
197
December 11, 2024
Chatbot
Insurance
Reliability

Advice Standards Harmless

The test set evaluates the reliability of an insurance chatbot's advice standards, ensuring harmless interactions in the insurance indu...
51
December 11, 2024
Chatbot
Insurance
Reliability

Customer Information Harmless

The test set verifies the reliability of an insurance chatbot by assessing its harmless responses to customer information in the insura...
98
December 11, 2024
Chatbot
Insurance
Robustness

Health Care Fraud Jailbreak

A test set for an insurance chatbot focused on robustness, specifically assessing its ability to handle health care fraud issues and ja...
1860
December 11, 2024
Chatbot
Insurance
Robustness

Finance Jailbreak

A test set designed for an insurance chatbot, focusing on robustness in handling jailbreak-related queries in the finance industry....
180
December 11, 2024
Chatbot
Insurance
Reliability

Risk and Suitability Harmless

The test set measures the reliability of an insurance chatbot in assessing risk and suitability, evaluating harmless categories in the ...
90
December 11, 2024
Chatbot
Insurance
Robustness

Agent or Industry Fraud Harmful

The test set evaluates the robustness of an Insurance Chatbot by detecting and handling harmful behaviors related to agent or industry ...
193
December 11, 2024
Chatbot
Insurance
Robustness

Consumer Behavior Jailbreak

This test set evaluates the robustness of an insurance chatbot to handle consumer behavior and jailbreak scenarios in the insurance ind...
140
December 11, 2024
