While debugging Papersy in chat mode, I discovered why function calling wasn't working: the result.messages that LangChain returns. The broken last message always has content = ''. Checking its additional_kwargs.reasoning_content, I saw something like:
I can see from the retrieved content that [...]. Let me search for more specific mentions.
<tool_call>
<function=retrieve>
<parameter=query>
FID in this paper
</parameter>
</function>
</tool_call>
Note that LangChain ignores additional_kwargs.
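As a stopgap while the model keeps leaking the call into the reasoning text, the tool call can be recovered by hand. This is a minimal sketch, not LangChain's API: the dict shape mirrors the broken message above, and extract_leaked_tool_call is a hypothetical helper that scrapes the XML-style markup out of reasoning_content with a regex.

```python
import re

# Hypothetical shape of the broken message: content is empty and the
# tool call leaked into additional_kwargs.reasoning_content as XML.
broken_message = {
    "content": "",
    "additional_kwargs": {
        "reasoning_content": (
            "I can see from the retrieved content that [...]. "
            "Let me search for more specific mentions.\n"
            "<tool_call>\n<function=retrieve>\n<parameter=query>\n"
            "FID in this paper\n</parameter>\n</function>\n</tool_call>"
        )
    },
}

def extract_leaked_tool_call(msg):
    """Pull the first <function=...> call out of reasoning_content, if any."""
    reasoning = msg.get("additional_kwargs", {}).get("reasoning_content", "")
    m = re.search(
        r"<function=(\w+)>.*?<parameter=(\w+)>\s*(.*?)\s*</parameter>",
        reasoning,
        re.DOTALL,
    )
    if not m:
        return None
    return {"name": m.group(1), "args": {m.group(2): m.group(3)}}

print(extract_leaked_tool_call(broken_message))
# → {'name': 'retrieve', 'args': {'query': 'FID in this paper'}}
```

This only handles a single call with a single parameter, which matches what I was seeing; a model that emits JSON-style calls instead would need a different parser.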
The function calling was happening inside the <think> block: sometimes as XML, sometimes as JSON, sometimes not at all. I suspected llama.cpp wasn't supporting my model, Nemotron 3 Nano, very well, so I swapped it out for Qwen3.5 instead. Preliminary tests showed it called functions normally, even multiple times (because I had forgotten to switch on the embedding model for vector search during the Qwen run).
I will test again with the Qwen model and the embedding model switched on this time. Hopefully that resolves it.