How to reduce hallucinations and improve the accuracy of LLMs?

Introduction

Suppose we have written a blog post or a piece of code. Creative work is rarely perfect on the first attempt, so we iterate several times before arriving at a finished piece. A critical part of that iteration is feedback, which can come from various sources: people, self-reflection, tools, and so on.

Like humans, LLMs also need feedback to improve their performance. We can use another LLM to review, reflect on, and provide feedback on the work of the first LLM. This can noticeably improve the quality of the output and helps us get the most out of LLMs. In agentic workflows this is termed eval. Feedback is one of the ways to perform eval.

Architecture

Suppose we want to write an essay on the topic “Impact of AI on Software Development.” First, we ask the Essay Writer model to create the first draft of the essay.

Prompt:

Write an essay on "Impact of AI on Software Development" for Software Developers.

Response:

"title":"The Impact of AI on Software Development",
"content":"Artificial Intelligence (AI) has profoundly transformed the landscape of software development, introducing unprecedented advancements and challenges...

To improve on this draft, we ask the Essay Reviewer model to give constructive feedback on the essay.

Prompt:

You are a critical reviewer. 
Review the following essay and provide feedback: {draft_essay}
For your information, target audience is young programmers.

Response:

The essay titled "The Impact of AI on Software Development" is well-structured and provides a comprehensive overview of how AI is transforming the software development field, from automation and enhanced decision-making to collaboration and quality assurance. It also thoughtfully addresses potential challenges such as ethics and workforce impact.

To improve the essay's accessibility, especially for young programmers, it could benefit from incorporating more engaging examples and real-world use cases of AI tools commonly used in software development. Simplifying technical terms like "predictive analytics" and "AI governance" with clear explanations will help ensure better understanding.

Adding a brief call to action or practical advice aimed at young programmers—encouraging them to explore AI tools, understand AI fundamentals, and be mindful of ethical issues—would make the essay more engaging and actionable.

Overall, the essay successfully meets the goal but can be enhanced with more personalization and practical guidance tailored to its audience.

Finally, we pass the review back to the Essay Writer model to get the final version of the essay.

Prompt:

You are a creative writer. Use the following review to improve the essay.

Draft of essay: {draft_essay}
Review: {review}

Response:

"title": "The Impact of AI on Software Development",
"content": "Artificial Intelligence (AI) has profoundly transformed the landscape of software development, ushering in a new era of innovation, efficiency, and creativity. The integration of AI technologies into software development processes has accelerated coding, enhanced testing, and improved deployment, fundamentally changing how developers design and build software...

To compare the quality of the essay with and without review, examine both versions.

As seen from the comparison, the essay after review is more engaging and actionable for young programmers, with additional examples and practical advice tailored to the target audience.
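
The three steps above (draft, review, revise) can be sketched in a few lines of Python. This is only an illustration of the flow, not the reference implementation: the function name write_review_improve, the LLM callable type, and the stub models are assumptions made for this sketch. The prompts are the ones shown above, except that the audience is passed in as a parameter for both the draft and the review, which is a simplification.

```python
from typing import Callable

# Assumption: an "LLM" is any function that takes a prompt and returns text.
LLM = Callable[[str], str]

# Prompt templates taken from the walkthrough above.
DRAFT_PROMPT = 'Write an essay on "{topic}" for {audience}.'
REVIEW_PROMPT = (
    "You are a critical reviewer.\n"
    "Review the following essay and provide feedback: {draft_essay}\n"
    "For your information, target audience is {audience}."
)
IMPROVE_PROMPT = (
    "You are a creative writer. Use the following review to improve the essay.\n\n"
    "Draft of essay: {draft_essay}\n"
    "Review: {review}"
)


def write_review_improve(topic: str, audience: str, writer: LLM, reviewer: LLM) -> dict:
    """Run one draft -> review -> revise round and return all three texts."""
    draft = writer(DRAFT_PROMPT.format(topic=topic, audience=audience))
    review = reviewer(REVIEW_PROMPT.format(draft_essay=draft, audience=audience))
    final = writer(IMPROVE_PROMPT.format(draft_essay=draft, review=review))
    return {"draft": draft, "review": review, "final": final}


# Hypothetical stub models so the sketch runs offline; replace with real API calls.
def stub_writer(prompt: str) -> str:
    return "REVISED ESSAY" if "Review:" in prompt else "FIRST DRAFT"


def stub_reviewer(prompt: str) -> str:
    return "Add real-world examples and a call to action."


if __name__ == "__main__":
    result = write_review_improve(
        "Impact of AI on Software Development",
        "young programmers",
        stub_writer,
        stub_reviewer,
    )
    print(result["final"])
```

Keeping the writer and the reviewer as separate callables makes it easy to point them at different models, which is what the implementation below does.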

Implementation

A reference implementation of the above architecture can be found here.

The implementation uses the Embabel framework. You can find more details about Embabel in my previous blog posts here. We have implemented two agents: Essay Writer and Essay Reviewer. Essay Writer writes the essay for the given topic and target audience. Essay Reviewer reviews the essay and provides constructive feedback to improve it. Essay Writer uses the gpt-4.1-mini model, and Essay Reviewer uses the larger gpt-4.1 model, chosen for its stronger performance on evaluation and reasoning tasks.

The Essay Writer agent has two actions: writeEssay and improveEssay. The writeEssay action writes the initial draft of the essay for the given topic. The improveEssay action improves the essay based on the feedback provided by Essay Reviewer.

The Essay Reviewer agent has one action: reviewEssay. It reviews the essay and provides constructive feedback to improve it.
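
To make the division of responsibilities concrete, here is a minimal Python sketch of the two agents and their three actions. It does not use Embabel's API; the classes, the Complete callable type, and fake_complete are hypothetical names introduced only for illustration. The model names are the ones mentioned above.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a completion function with the shape (model_name, prompt) -> reply.
Complete = Callable[[str, str], str]


@dataclass
class EssayWriter:
    complete: Complete
    model: str = "gpt-4.1-mini"

    def write_essay(self, topic: str, audience: str) -> str:
        """Mirrors the writeEssay action: produce the first draft."""
        prompt = f'Write an essay on "{topic}" for {audience}.'
        return self.complete(self.model, prompt)

    def improve_essay(self, draft: str, review: str) -> str:
        """Mirrors the improveEssay action: revise the draft using the review."""
        prompt = (
            "You are a creative writer. Use the following review to improve the essay.\n\n"
            f"Draft of essay: {draft}\n"
            f"Review: {review}"
        )
        return self.complete(self.model, prompt)


@dataclass
class EssayReviewer:
    complete: Complete
    model: str = "gpt-4.1"

    def review_essay(self, draft: str, audience: str) -> str:
        """Mirrors the reviewEssay action: return constructive feedback."""
        prompt = (
            "You are a critical reviewer.\n"
            f"Review the following essay and provide feedback: {draft}\n"
            f"For your information, target audience is {audience}."
        )
        return self.complete(self.model, prompt)


if __name__ == "__main__":
    calls = []  # records which model handled each action

    def fake_complete(model: str, prompt: str) -> str:
        # Hypothetical stand-in for a real model call.
        calls.append(model)
        return f"[{model}] reply"

    writer = EssayWriter(fake_complete)
    reviewer = EssayReviewer(fake_complete)
    draft = writer.write_essay("Impact of AI on Software Development", "young programmers")
    review = reviewer.review_essay(draft, "young programmers")
    final = writer.improve_essay(draft, review)
    print(calls)  # ['gpt-4.1-mini', 'gpt-4.1', 'gpt-4.1-mini']
```

Injecting the completion function keeps the agents testable without network access, and lets each agent carry its own model setting.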

Conclusion

We have seen that, like humans, LLMs also need feedback to improve their performance. By using another LLM for feedback, we can noticeably improve the quality of the output and get the most out of LLMs. Besides feedback, there are various other ways to perform eval, such as using tools, rubrics, LLM as Judge, etc. However, the central idea is to use another LLM to evaluate the work of the first LLM and provide feedback for improvement.
