Does AI still hallucinate or is it improving?
    Posted on April 19th, 2025 in Exam Details (QP Included)

    AI Models’ Hallucinations

    • AI models often make up convincing but inaccurate responses when faced with questions outside their training data.

    • Google’s ‘AI Overviews’ feature in May 2024 provided bizarre answers, including suggesting adding non-toxic glue to pizza sauce and recommending drinking urine to pass kidney stones.

    ChatGPT’s Hallucinations

    • A 2023 study found that 55% of ChatGPT v3.5’s references were fabricated, while the corresponding figure for ChatGPT-4 was 18%.

    • This makes AI models unreliable and limits their applications.

    Defining Reliability

    • Consistency and factuality are two criteria used to evaluate the reliability of an AI model; a simple consistency check is sketched after this list.

    • When an AI model hallucinates, it compromises factuality by generating an incorrect response and presenting it as correct.
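
    To make “consistency” concrete, the sketch below asks a model the same question several times and measures how often the answers agree. The ask_model() function here is a hypothetical stand-in for a real chatbot API call, used only for illustration.

```python
# Minimal sketch of a consistency check: ask the same question several
# times and measure how often the answers agree. ask_model() is a
# hypothetical stand-in for a real chatbot API call.
from collections import Counter

def ask_model(question: str) -> str:
    """Hypothetical wrapper around a chat model; replace with a real API call."""
    return "Paris"  # placeholder answer for illustration

def consistency_score(question: str, trials: int = 5) -> float:
    """Fraction of runs that return the most common (normalised) answer."""
    answers = [ask_model(question).strip().lower() for _ in range(trials)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / trials

print(consistency_score("What is the capital of France?"))  # 1.0 with the stub above
```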

    Case Study: DALL-E’s Hallucination

    • OpenAI’s DALL-E, an image-generation model, was asked to generate a picture of a room with no elephants, yet the images it produced still contained elephants.

    • This inaccurate but confident response indicates that the model fails to handle negation, a concept that is poorly represented in the data used to train generative AI models.

    AI Model Development

    • AI models are developed in two phases: the training phase and the testing phase (a toy sketch of both phases follows this list).

    • In the training phase, the model learns to associate a set of features with the word “elephant.”

    • In the testing phase, the model is provided with inputs not part of its training dataset.

    • AI models don’t understand language the way humans do, leading to factually incorrect outputs.
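
    The sketch below is a toy illustration of the two phases, assuming a made-up “elephant vs. not elephant” classification task with synthetic feature vectors; none of this data comes from a real model, and real systems learn from far richer inputs.

```python
# Toy illustration of the training and testing phases described above,
# using scikit-learn and synthetic feature vectors (all data is invented
# purely for illustration).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend each image is summarised by 4 numeric features
# (e.g. size, greyness, trunk-likeness, ear area).
elephants = rng.normal(loc=[5.0, 0.8, 0.9, 0.7], scale=0.1, size=(100, 4))
others    = rng.normal(loc=[1.0, 0.3, 0.1, 0.2], scale=0.1, size=(100, 4))

X = np.vstack([elephants, others])
y = np.array(["elephant"] * 100 + ["not elephant"] * 100)

# Training phase: the model learns to associate features with the label.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Testing phase: the model is evaluated on inputs it has never seen.
print("held-out accuracy:", model.score(X_test, y_test))
```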

    AI Model Reliability and Hallucinations

    • AI development and use are growing rapidly, but their reliability is questioned due to hallucinations and benchmarks that can be gamed.

    • AI developers often report model performance using benchmarks that are not foolproof and can be manipulated.

    • For example, even a model that scores well on OpenAI’s HumanEval coding benchmark might see its performance drop in real-world applications.

    • Despite this, the frequency of hallucination for common queries is decreasing in popular AI models, as newer versions are trained on more data.

    • However, more training data alone will not bring popular AI models like ChatGPT to a point where they never hallucinate.

    • Shifting how AI models are built and trained could help reduce hallucinations.

    • Techniques such as developing models for specialized tasks, retrieval-augmented generation (RAG), and curriculum learning could help reduce hallucinations (a minimal RAG sketch follows this list).

    • However, none of these techniques guarantees that hallucinations will be completely eliminated.

    • Systems that can verify AI-generated outputs, including human oversight, will remain necessary.
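
    For illustration, here is a minimal sketch of retrieval-augmented generation: the model’s prompt is grounded in a retrieved document rather than relying only on what it memorised during training. The document store, the query, and the generate() stub are assumptions made for this example, not a real system.

```python
# Minimal retrieval-augmented generation (RAG) sketch. The document store,
# the query, and the generate() stub are illustrative assumptions; a real
# system would use a vector database and an actual language model API.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Kidney stones are usually treated with hydration, pain relief and, "
    "if needed, medical procedures such as lithotripsy.",
    "Pizza sauce is typically made from tomatoes, olive oil, garlic and herbs.",
]

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document most similar to the query (TF-IDF + cosine similarity)."""
    vec = TfidfVectorizer().fit(docs + [query])
    doc_vecs = vec.transform(docs)
    query_vec = vec.transform([query])
    scores = cosine_similarity(query_vec, doc_vecs)[0]
    return docs[scores.argmax()]

def generate(prompt: str) -> str:
    """Placeholder for a call to a language model (hypothetical)."""
    return f"[model answer grounded in a prompt of {len(prompt)} characters]"

query = "How are kidney stones treated?"
context = retrieve(query, documents)

# Grounding the prompt in retrieved text gives the model something factual
# to answer from, instead of relying only on its training data.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(generate(prompt))
```

    Because the answer is constrained to the retrieved context, the model has less room to invent facts, though retrieval errors or poor documents can still lead to wrong answers, which is why verification and human oversight remain necessary.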
