Great write-up! I have found similar gains myself, and have also faced some of the shortcomings.
Having the LLM write tests can be a time saver, but I sometimes find that while the descriptions it writes are for tests that would be impactful (`it('does thing', …)`), the test steps themselves don't actually test the thing being described. So I've started adding an instruction telling the agent to make sure the steps within each test case actually test what the description claims.
Agreed. Tests are a place where Claude doesn't always shine, I think partly because it doesn't have enough context to know what a useful test would be without more direction, and partly because it's too eager to get tests working. I've had it write tests that did nothing and just asserted `true`! So it's a good point: you really have to pay careful attention to the tests it writes, lest you get ones that test nothing.
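To make that concrete, here's a rough sketch of the failure mode. Everything in it is made up for illustration: `applyDiscount`, the deliberately broken variant, and the tiny `passes`/`expect` helpers (stand-ins so it runs without Jest or Mocha). Both tests would carry the same description, "rejects a discount above 100%", but only one of them can ever fail.

```typescript
type Discount = (price: number, percent: number) => number;

// Hypothetical function under test (an assumption for this example).
const applyDiscount: Discount = (price, percent) => {
  if (percent < 0 || percent > 100) {
    throw new RangeError("percent must be between 0 and 100");
  }
  return price * (1 - percent / 100);
};

// Deliberately broken variant: the range check is missing.
const brokenApplyDiscount: Discount = (price, percent) =>
  price * (1 - percent / 100);

// Minimal stand-in for a test runner: true if the body does not throw.
function passes(body: () => void): boolean {
  try {
    body();
    return true;
  } catch (_err) {
    return false;
  }
}

// Minimal stand-in for an assertion helper.
function expect(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

// "rejects a discount above 100%" (vacuous version):
// the body never calls the function, so it passes no matter what.
function vacuousTest(_fn: Discount): boolean {
  return passes(() => {
    expect(true, "always passes");
  });
}

// "rejects a discount above 100%" (meaningful version):
// the steps exercise exactly what the description claims.
function meaningfulTest(fn: Discount): boolean {
  return passes(() => {
    let threw = false;
    try {
      fn(50, 150);
    } catch (err) {
      threw = err instanceof RangeError;
    }
    expect(threw, "expected a RangeError for percent > 100");
  });
}

console.log("vacuous vs. broken code:", vacuousTest(brokenApplyDiscount));
console.log("meaningful vs. broken code:", meaningfulTest(brokenApplyDiscount));
```

The cheap check I'd suggest when reviewing generated tests is the same one the sketch does: break the code on purpose and see whether the test goes red. If it stays green, it isn't testing what its description says.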