This note will be updated periodically.
Quite a few of my colleagues are using Dify to build chatbots and tech-support agents to help with traditional Q&A and customer-service work. I have been doing some AI security tests on these bots and systems, and I found a couple of problems that may interest you if you are doing similar work.
No.1 Chatbot Memory Settings.
When testing these bots, much as I would in a web-application red-teaming engagement, I found a configuration option in the Dify workflow that is quite critical and directly affects the tests you carry out.

It is the memory setting for the LLM. The setting can be switched on or off, with a further window-size sub-setting. How does this affect testing? When memory is switched on without a window size, the chatbot carries the entire session's prompt history to the LLM used in the workflow. This can severely skew your results, especially when you are running unit tests, even though memory itself is essential to a chatbot as a way of feeding contextual data to the underlying LLM.
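The effect can be sketched in a few lines of Python. This is an illustration of the behaviour, not Dify's actual API: the function name and the `window_size=None` convention (standing in for "memory on, no window set") are hypothetical.

```python
# Illustrative sketch: how a chat memory window changes what is
# actually forwarded to the LLM each turn. Names are hypothetical,
# not Dify's real API.

def build_llm_input(history, new_message, window_size=None):
    """Return the message list sent to the LLM for this turn.

    window_size=None models "memory on, no window": the entire
    session history rides along with the new message.
    """
    if window_size is None:
        carried = list(history)           # everything so far
    else:
        carried = history[-window_size:]  # only the last N messages
    return carried + [new_message]

history = [f"prompt {i}" for i in range(10)]

# Memory on, no window: all 10 earlier prompts are carried.
full = build_llm_input(history, "unit-test prompt")
print(len(full))  # 11

# Window of 2: only the 2 most recent prompts are carried.
windowed = build_llm_input(history, "unit-test prompt", window_size=2)
print(len(windowed))  # 3
```

With no window, every jailbreak or role prompt from earlier in the session silently rides along with each new test prompt, which is exactly what corrupts unit tests.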
For example, when I was testing a DeepSeek model, an earlier prompt in the history had asked the model to act as a GPT model. All the subsequent role-based tests were affected, because DeepSeek kept mimicking GPT.
Next time you test a chatbot, be sure to set a proper value for the memory window. Since I was doing blackbox testing, I only found this problem through debugging.
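A simple defence on the tester's side is to start every test case in a fresh session, so no earlier role prompt leaks into the next test. Here is a minimal sketch under that assumption; `FakeChatbot` is a toy stand-in for a memory-enabled bot, not a real Dify client (with the real Dify chat API you would achieve the same thing by starting a new conversation per test case).

```python
# Sketch of per-test session isolation. FakeChatbot is a toy
# stand-in whose answers are contaminated by carried history,
# mimicking a memory-on, no-window chatbot.

class FakeChatbot:
    def __init__(self):
        self.history = []

    def ask(self, prompt):
        self.history.append(prompt)
        # If any earlier prompt in this session set a role,
        # it still shapes the current answer.
        if any("act as GPT" in p for p in self.history):
            return "answering as GPT"
        return "answering as myself"

# Shared session: an earlier role prompt contaminates later tests.
shared = FakeChatbot()
shared.ask("act as GPT from now on")
print(shared.ask("what model are you?"))  # answering as GPT

# Fresh session per test case: each test starts clean.
fresh = FakeChatbot()
print(fresh.ask("what model are you?"))   # answering as myself
```

In a real blackbox engagement, the equivalent of `fresh = FakeChatbot()` is opening a brand-new conversation for each unit test instead of reusing one chat window.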