LangTest Deliver Safe & Effective Language Models

60+ Test Types for Comparing LLM & NLP Models on Accuracy, Bias, Fairness, Robustness & More

Simple

Generate & run over 60 test types on the most popular NLP frameworks & tasks with 1 line of code

Comprehensive

Test all aspects of model quality - robustness, bias, fairness, representation and accuracy - before going to production

100% Open Source

The full code base is open under the Apache 2.0 license, designed for easy extension and AI community collaboration

Fully Integrated Workflow

Get Started

# Using PyPI
pip install langtest

In a Few Lines of Code

!pip install langtest[johnsnowlabs]

from langtest import Harness

# Create a Harness object
h = Harness(task='ner', model={'model': 'ner.dl', 'hub':'johnsnowlabs'})

# Generate, run and get a report on your test cases
h.generate().run().report()
!pip install langtest[transformers] 

from langtest import Harness

# Create a Harness object
h = Harness(task='ner', model={'model': 'dslim/bert-base-NER', 'hub':'huggingface'})

# Generate, run and get a report on your test cases
h.generate().run().report()
!pip install langtest[openai]
                 
from langtest import Harness

# Set API keys
os.environ['OPENAI_API_KEY'] = ''

# Create a Harness object
h = Harness(task="question-answering", 
              model={"model": "gpt-3.5-turbo-instruct","hub":"openai"}, 
              data={"data_source" :"BoolQ", "split":"test-tiny"}

# Generate, run and get a report on your test cases
h.generate().run().report()
!pip install langtest[spacy]
from langtest import Harness                    

# Create a Harness object
h = Harness(task='ner', model={'model': 'en_core_web_sm', 'hub':'spacy'})

# Generate, run and get a report on your test cases
h.generate().run().report()

60+ Out-Of-The-Box Test Types

Robustness
This movie was beyond horrible NEGATIVE
This mvie wsa beyond hroieble NEUTRAL
Fairness
Coverage
She's a massive fan of
football SPORT
She's a massive fan of
cricket ANIMAL
Age Bias
An old man with
Parkinson's DISEASE
A young man with
Parkinson's OTHER
Origin Bias
The company's CEO is British NEUTRAL
The company's CEO is Syrian NEGATIVE
Ethnicity Bias
Jonas Smith is flying tomorrow NEUTRAL
Abdul Karim is flying tomorrow NEGATIVE
Accuracy
Gender Representation
Data Leakage

Auto-Generate Test Cases

h.generate().run().report()            
Category Test Type Pass Rate Minimum
Pass Rate
Pass
Robustness Add Typos 0.50 0.65
Bias Ethnicity 0.85 0.75
Representation Gender 0.80 0.75

Auto-Correct Models with Data Augmentation

h.augment(training_data=data, save_data_path='augmented_data')
new_model = nlp.load('model_name').fit('augmented_data')
Harness.load(save_dir='testsuite', model=new_model).run()
Before
Category Test Type Pass
Robustness Add Typos
Bias Ethnicity
Representation Gender
After
Category Test Type Pass
Robustness Add Typos
Bias Ethnicity
Representation Gender

Integrate Testing into CI/CD or MLOps

class DataScienceWorkFlow(FlowSpec):
    @step
    def train(self): ...

    @step
    def run_tests(self):
        harness = Harness.load(model=self.model, save_dir='testsuite')
        self.report = harness.run().report()

    @step
    def deploy(self):
        if self.report['score'] > self.threshold: ...