GenAI Test Automation Concept

Monday, 25 November 2024

From Concept to Execution

Innovation is a core part of the Prolifics Value Proposition to our customers.  As part of our ongoing efforts to leverage generative AI in software testing, our Innovation Centre team set out to build a platform that could automate the generation of manual test cases directly from a web page and subsequently automate the execution of those cases.

Our research is aimed at helping our team work with our clients more efficiently by taking advantage of this new technology, reducing the time and resources required to create and run tests manually.

Our team explored different ways of capturing functionality – we used APIs to access OpenAI LLMs, via both GPT-4 and GPT-4 Vision, to provide flexibility in how information was captured about the application under test.

The initial target source was straightforward page-source HTML – we opted for this as a simple PoC, with JavaScript / TypeScript in our sights for future iterations to support the more common Angular and React implementations.

The team utilised Llama Index as the framework on which to develop the PoC, on a Python codebase.  Initially we were able to design test cases for a simple online percentage calculator, targeting a basic web form.  The page source was provided as input, along with a prompt instructing the model to identify a minimum number of test cases and iterate until all test cases had been found and de-duplicated.  We identified a number of challenges early on, which we'll cover below.
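By way of illustration, the kind of prompt used looked broadly like the sketch below (simplified wording, with placeholders for the inputs rather than the exact prompt from the PoC):

prompt_template = """
You are a software test analyst. From the HTML page source below, identify every
distinct piece of functionality and write manual test cases for it. Each test case
needs a title, preconditions, steps and an expected result. Produce at least
{min_cases} test cases, keep iterating until no new functionality remains, and
remove any duplicates before responding.

{html_source}
"""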

Solution

We implemented a layered approach using AI-driven tools for different stages of the test process, this time focusing on a more complex (and more realistic) web application: the internal Prolifics PPM tool, used for project management and executive reporting.

The solution comprised three main components: chunking and consolidating the source to identify test cases, de-duplication of tests, and test automation / execution using Selenium.  These are dealt with in detail below.

Step 1: Chunking and Consolidating HTML Code

The first hurdle was managing HTML code for larger web pages. Initial attempts to feed entire pages to our LLM exceeded token limits, which capped the volume of data that could be processed at once.  We experimented with using a vector database for temporary storage, but this approach was ultimately unsuccessful as it returned partial data fragments, leading to incomplete and inaccurate test cases.

For manageable processing within token limits, we began by dividing large HTML sources into overlapping segments. By segmenting page code in this way, we could work within the limitations of our LLM, while maintaining sufficient context for test case generation. The overlapping design ensured that any functionality split between segments could be fully captured, with adjacent segments providing additional context.
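As a rough illustration, the segmentation logic can be as simple as the sketch below; the chunk and overlap sizes here are illustrative assumptions rather than the values we settled on:

def chunk_html(html: str, chunk_size: int = 8000, overlap: int = 1000) -> list[str]:
    """Split page source into overlapping segments that fit within the model's token limit."""
    chunks = []
    start = 0
    while start < len(html):
        end = min(start + chunk_size, len(html))
        chunks.append(html[start:end])
        if end == len(html):
            break
        start = end - overlap  # step back so adjacent segments share context
    return chunks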

Using Llama Index allowed us to store and retrieve the HTML segments efficiently. This process involved storing each chunk as an entry, then querying the LLM to generate relevant test cases from each segment individually.
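In outline, each stored segment was then queried in turn. The sketch below approximates that flow using the OpenAI client directly rather than the Llama Index plumbing; the model name and prompt wording are illustrative assumptions:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def test_cases_for_segment(segment: str) -> str:
    """Ask the model for manual test cases covering the functionality in one HTML segment."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # illustrative model choice
        messages=[
            {"role": "system", "content": "You are a software test analyst."},
            {"role": "user", "content": "Write manual test cases for the functionality in this HTML segment:\n\n" + segment},
        ],
    )
    return response.choices[0].message.content

# One call per stored segment, e.g. the overlapping chunks produced by the segmentation sketch above
raw_cases = [test_cases_for_segment(seg) for seg in segments]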

Step 2: De-Duplication

In early implementations, we encountered high levels of duplication. The AI model generated redundant test cases across overlapping segments of HTML code, which created inefficiencies in the testing process.

While generating manual test cases from each segment, we implemented a chat memory buffer that tracked previously generated cases. The AI was instructed to deduplicate by comparing each new test case with those already stored, ensuring only unique cases were retained.
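A minimal sketch of that idea, keeping accepted cases in a running memory and asking the model to return only new ones, might look like this (again using the OpenAI client directly; prompt wording and model choice are illustrative):

from openai import OpenAI

client = OpenAI()

def generate_unique_cases(segments: list[str]) -> list[str]:
    """Generate test cases per segment, asking the model to skip anything already recorded."""
    memory: list[str] = []  # running buffer of accepted test cases
    for segment in segments:
        already_seen = "\n".join(memory) if memory else "None yet."
        prompt = (
            "Existing test cases:\n" + already_seen + "\n\n"
            "Write manual test cases for the HTML segment below, but only return "
            "cases that are not already covered by the existing ones.\n\n" + segment
        )
        response = client.chat.completions.create(
            model="gpt-4-turbo",  # illustrative model choice
            messages=[{"role": "user", "content": prompt}],
        )
        new_cases = response.choices[0].message.content.strip()
        if new_cases:
            memory.append(new_cases)
    return memory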

An example of one of the manual test cases identified by the LLM is shown below:

Step 3: Automating Test Case Execution with Selenium

As we moved into automating the identified tests, generating reliable Selenium code was crucial. This introduced challenges with locators, where precise identification of HTML elements was needed to avoid errors in script execution.  It also highlighted the advantage of combining page source code with an image of the relevant web page, providing maximum context to our model.
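As a sketch of what that combination looks like in practice, a screenshot can be base64-encoded and sent alongside the page source in a single multimodal request (the model name and prompt wording here are assumptions for illustration):

import base64
from openai import OpenAI

client = OpenAI()

def cases_from_screenshot_and_html(image_path: str, html: str) -> str:
    """Send a page screenshot plus its HTML so the model has both visual and structural context."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Identify test cases for this page. HTML source:\n\n" + html},
                {"type": "image_url", "image_url": {"url": "data:image/png;base64," + image_b64}},
            ],
        }],
    )
    return response.choices[0].message.content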

To ensure the accuracy of identified page objects, the team embedded locators taken directly from the HTML code, allowing precise interaction with page elements and input fields.
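For illustration only, a generated script with locators lifted straight from the page source might look something like this (the URL, element IDs and expected value are hypothetical, not taken from the real application):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://example.com/percentage-calculator")  # hypothetical page under test
    # Locators embedded directly from the HTML source rather than guessed by the model
    driver.find_element(By.ID, "value").send_keys("50")
    driver.find_element(By.ID, "percentage").send_keys("20")
    driver.find_element(By.ID, "calculate").click()
    result = driver.find_element(By.ID, "result").text
    assert result == "10", "Expected 10, got " + result
finally:
    driver.quit()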

An example of an automated test generated by the LLM is shown below:

Next Steps

We’re already taking the learnings from this PoC and applying them to enhance our test generation framework.  We’re adding a UI, the ability to upload images and web page source code, and an autonomous crawler that identifies all elements, objects and metadata for each relevant page within a business process – allowing tests to be designed and executed across multiple screens.

We’re also planning to incorporate automated handling of authentication and to add API testing, allowing complete end-to-end testing of applications that include both API and UI components.

In conclusion, by harnessing the power of GenAI, it was possible to build an automation solution that not only saves time but also scales intelligently with application demands. This journey is a testament to AI’s growing role in software testing, where adaptable, self-sustaining solutions are now within reach and getting better all the time.

This is far from the only GenAI solution Prolifics have developed – we are actively working with our customers on GenAI technology across a range of use cases and industries.  Please get in touch if you would like a demo of our AI testing capabilities, or advice on how to test GenAI applications, which is the subject of a previous blog.

We’ll post a follow-up to showcase the improvements when they are ready.

 

Get in touch with us to see how our tailored testing solutions can help you build fast, reliable GenAI applications.

Jonathan Binks - Head of Delivery
Prolifics Testing UK
