Hey, engineer here who worked on this.
I understand your concerns. The examples we provided are indeed trivial, but they are just the starting point. Our goal is to leverage LLMs to generate mutants that closely resemble real-world bugs with better context. While traditional mutation tools are excellent, we believe LLMs can bring an additional layer of sophistication and versatility.
As you rightly pointed out, standard mutation tools often lack the context to prioritize issues effectively. We’re currently working on using LLMs to analyze the output of survived mutants to provide better guidance on which issues should be addressed first. This way, an off-by-one error that could potentially cause significant problems is highlighted more prominently during the code review PR process.
As someone who has used mutation testing, I’ve always wondered about the sheer amount of useless mutants being generated. Going through all these mutants manually to improve test cases is quite cumbersome. If we can reduce the number of mutants generated, produce higher quality mutants, and analyze them automatically to highlight weaknesses in the tests during PRs, wouldn’t that be cool? We’re aiming to achieve just that.
Moreover, this approach can theoretically work for any programming language or testing framework, making it a versatile solution across different development environments.
We’re also developing a QA system to more accurately define and identify “higher quality mutants,” as discussed in the research paper here. Our aim is to enhance the overall mutation testing process, making it more efficient and insightful.
Hey, all in all, we want mutation testing to be adopted and widely spread. We really do appreciate the feedback. I hope you try it out as you sound like you know a thing or two about mutation testing.
Thanks again for your perspective.
True