Cisco Research

    PolygraphLLM

    Open-Source Hallucination Detection & Factuality Evaluation Toolkit for LLMs

    What is PolygraphLLM?

    PolygraphLLM is an open-source toolkit designed to detect hallucinations and evaluate the factuality of outputs from Large Language Models (LLMs). Hallucinations are plausible-sounding but incorrect or fabricated statements generated by LLMs, posing risks in high-stakes applications such as healthcare, law, and finance. This toolkit offers state-of-the-art methods to benchmark, visualize, and improve the reliability of LLM-generated content.

    Why Hallucination Detection & Factuality Evaluation Matter

    Financial and Reputational Liability

    Air Canada faced legal penalties when its chatbot hallucinated a refund policy. Financial services risk regulatory fines from incorrect LLM analyses. The 2024 LM-Polygraph benchmark found hallucination rates of 3–10% in critical domains—equivalent to thousands of high-risk errors monthly at enterprise scale.

    Regulatory Compliance Imperatives

    With regulations like the EU AI Act and SEC guidance, organizations must demonstrate hallucination mitigation. PolygraphLLM helps meet these mandates through robust detection and evaluation capabilities.

    Erosion of Public Trust

    Survey data shows over 60% of consumers distrust AI due to hallucination risks. Incidents like the 2023 Bing chatbot missteps have tangible reputational impact, increasing user churn and undermining adoption.

    Operational Efficiency Demands

    Verifying outputs manually is costly, estimated at $27.50 per hour. PolygraphLLM helps reduce these overheads, enabling scalable and trusted LLM deployment.

    How Does PolygraphLLM Work?

    • Implements both black-box and white-box hallucination detection methods.
    • Supports diverse NLP tasks: QA, summarization, translation, and more.
    • Provides a unified Python API and interactive visualization dashboard.
    • Includes Semantic Nearest Neighbor Entropy (SNNE) and Weighted SNNE (WSNNE) for robust semantic uncertainty quantification.
    pip install polygraphllm

    from polygraph import SemanticValidator

    # prompt: the input text; generations: a list of sampled model responses
    validator = SemanticValidator(model='gpt-4')
    results = validator.detect_hallucinations(prompt, generations)
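A core intuition behind black-box detection is that it needs only sampled outputs, not model internals: ask the same question several times and measure how much the answers agree. The sketch below illustrates that idea only; `agreement_score` is a hypothetical helper, not part of the toolkit's API.

```python
# Illustrative sketch of black-box self-consistency checking (assumed
# helper, not the PolygraphLLM API): low agreement across sampled
# answers is a signal of possible hallucination.
from collections import Counter

def agreement_score(answers):
    """Fraction of sampled answers matching the most common one."""
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)

samples = ["Paris", "Paris", "paris", "Lyon"]
print(agreement_score(samples))  # 0.75
```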

    Related Paper

    Beyond Semantic Entropy: Boosting LLM Uncertainty Quantification with Pairwise Semantic Similarity
    Dang Nguyen (UCLA), Ali Payani (Cisco Systems), Baharan Mirzasoleiman (UCLA)

    This paper introduces SNNE, a method that generalizes semantic entropy by using pairwise semantic similarity between generations. It significantly improves hallucination detection in LLMs and can be extended to white-box models.
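As a rough intuition (not the paper's exact formulation): a generation that is semantically similar to the other generations contributes little uncertainty, while an outlier contributes a lot. A minimal sketch of that idea, assuming a precomputed pairwise similarity matrix with entries in (0, 1]:

```python
# Hedged illustration of nearest-neighbor-style semantic uncertainty.
# The real SNNE method is defined in the paper; sim values here are
# assumed inputs (e.g. from an NLI or embedding similarity model).
import math

def snne_sketch(similarity):
    """Average negative log of each generation's mean similarity
    to all generations (an uncertainty-like score)."""
    n = len(similarity)
    return sum(-math.log(sum(row) / n) for row in similarity) / n

# Mutually consistent generations -> low uncertainty
consistent = [[1.0, 0.9, 0.9], [0.9, 1.0, 0.9], [0.9, 0.9, 1.0]]
# Divergent generations -> high uncertainty
divergent = [[1.0, 0.1, 0.1], [0.1, 1.0, 0.1], [0.1, 0.1, 1.0]]
assert snne_sketch(consistent) < snne_sketch(divergent)
```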


    Learn More & Get Involved

    • GitHub: https://github.com/cisco-open/polygraphLLM
    • Research Paper: Beyond Semantic Entropy on arXiv