Implementing Conversational Memory in Large Language Models
Large language models inherently lack memory, as they process each input independently without retaining context from previous interactions. In practical applications, memory is simulated by appending the entire conversation history to each new prompt, allowing the model to generate contextually relevant responses. This approach increases token usage and processing time with longer conversations, making efficient memory management essential.
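To make the replay idea concrete, here is a minimal, framework-agnostic sketch. The ask_model function is a hypothetical stand-in for any chat-completion call, not a real API:

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a real chat-completion call."""
    return "(model reply)"

history = []  # grows by two messages per exchange

def chat(user_input: str) -> str:
    # Replay the entire transcript plus the new input on every turn.
    history.append(("human", user_input))
    prompt = "\n".join(f"{role}: {text}" for role, text in history)
    reply = ask_model(prompt)
    history.append(("ai", reply))
    return reply

Every call to chat sends the full transcript again, which is why prompts grow with conversation length.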
Using ChatPromptTemplate for Message History
Define a chat model and construct a message template with role assignments:
from langchain_community.chat_models import QianfanChatEndpoint
from langchain_core.prompts import ChatPromptTemplate
import os

# Qianfan credentials are read from environment variables.
os.environ["QIANFAN_AK"] = "your_ak"
os.environ["QIANFAN_SK"] = "your_sk"

chat_model = QianfanChatEndpoint(
    model="ERNIE-3.5-8K",
    temperature=0.2,
    timeout=30
)

# Each tuple assigns a role (system, human, or ai) to a message;
# {assistant_name} and {query} are filled in at format time.
template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant named {assistant_name}."),
    ("human", "Hello, how are you?"),
    ("ai", "I'm doing well, thank you!"),
    ("human", "{query}")
])

messages = template.format_messages(assistant_name="Alex", query="What's your name?")
response = chat_model.invoke(messages)
print(response.content)
This template structures conversation roles (system, human, and AI) so the model can distinguish between speakers. Common template types, demonstrated in the sketch after this list, include:
- SystemMessagePromptTemplate for system instructions.
- HumanMessagePromptTemplate for user inputs.
- AIMessagePromptTemplate for AI responses.
- ChatMessagePromptTemplate for custom roles.
- MessagesPlaceholder for dynamic history insertion.
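As a quick illustration, the tuple shorthand used above can be rewritten with the explicit template classes; the two forms produce the same messages:

from langchain_core.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    AIMessagePromptTemplate,
)

# Equivalent to the tuple-based template above, spelled out class by class.
explicit_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template("You are a helpful assistant named {assistant_name}."),
    HumanMessagePromptTemplate.from_template("Hello, how are you?"),
    AIMessagePromptTemplate.from_template("I'm doing well, thank you!"),
    HumanMessagePromptTemplate.from_template("{query}")
])

messages = explicit_template.format_messages(assistant_name="Alex", query="What's your name?")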
Dynamic Memory with MessagesPlaceholder
For flexible memory handling, use MessagesPlaceholder to inject conversation history dynamically:
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import MessagesPlaceholder

# The placeholder reserves a slot where a list of messages is injected at invoke time.
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="conversation_log")
])

pipeline = prompt_template | chat_model

output = pipeline.invoke({
    "conversation_log": [
        HumanMessage(content="My name is Luna."),
        AIMessage(content="Hello Luna! Nice to meet you."),
        HumanMessage(content="What did I just say?")
    ]
})
print(output.content)
The MessagesPlaceholder acts as a container for conversation history, decoupling memory storage from prompt construction. This enables operations like editing or truncating history, such as reverting to earlier points in a dialogue.
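Because the injected history is an ordinary Python list, reverting is just slicing. A minimal sketch with illustrative conversation content:

conversation = [
    HumanMessage(content="My name is Luna."),
    AIMessage(content="Hello Luna! Nice to meet you."),
    HumanMessage(content="Actually, call me Stella."),
    AIMessage(content="Understood, Stella!")
]

# Revert the last exchange (one human and one AI message) before the next turn.
reverted = conversation[:-2]

output = pipeline.invoke({
    "conversation_log": reverted + [HumanMessage(content="What is my name?")]
})
print(output.content)  # should answer "Luna", since the correction was reverted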
Manual Memory Management
Build a custom history list to store and retrieve past interactions. The pipeline from the previous section expects a conversation_log variable, so first define a prompt whose placeholder is named chat_history and that ends with an {input} slot for the new query:

from langchain_core.messages import AIMessage, HumanMessage

memory_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}")
])
memory_pipeline = memory_prompt | chat_model

history = []

first_query = "Where does pumpkin soup originate?"
first_reply = memory_pipeline.invoke({"input": first_query, "chat_history": history})
print(first_reply.content)

# Record the exchange so the next turn can see it.
history.extend([HumanMessage(content=first_query), AIMessage(content=first_reply.content)])

second_query = "How do you cook it?"
second_reply = memory_pipeline.invoke({"input": second_query, "chat_history": history})
print(second_reply.content)
By extending the history list after each exchange, the model retains context. The second query about pumpkin soup preparation receives a relevant response, demonstrating effective memory retention.
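This append-then-invoke pattern can be wrapped in a small helper so each turn records itself. The following is an illustrative sketch, not a LangChain API:

def chat_turn(pipeline, history, user_input):
    """Run one exchange and record it in the shared history list."""
    reply = pipeline.invoke({"input": user_input, "chat_history": history})
    history.extend([
        HumanMessage(content=user_input),
        AIMessage(content=reply.content)
    ])
    return reply.content

history = []
print(chat_turn(memory_pipeline, history, "Where does pumpkin soup originate?"))
print(chat_turn(memory_pipeline, history, "How do you cook it?"))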
Memory implementation relies on embedding prior dialogues into prompts, balancing context relevance with computational efficiency. Techniques like MessagesPlaceholder and manual history lists provide adaptable solutions for conversational AI systems.
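As one way to strike that balance, recent langchain-core releases ship a trim_messages utility that bounds how much history reaches the prompt; the sketch below assumes a version that provides it and reuses the history and memory_pipeline from above:

from langchain_core.messages import trim_messages

# Keep only the most recent messages; with token_counter=len, max_tokens
# counts messages rather than true tokens. Swap in a real tokenizer or a
# chat model instance for genuine token budgets.
trimmed = trim_messages(
    history,
    max_tokens=4,
    token_counter=len,
    strategy="last",
    start_on="human"
)

reply = memory_pipeline.invoke({"input": "Summarize our chat.", "chat_history": trimmed})
print(reply.content)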