What is RAG
RAG (Retrieval-Augmented Generation) is a technique that extends the capability of a language model by giving it access to external information at the moment of generating a response. Instead of relying exclusively on what was learned during training, the system queries a database before responding and injects the retrieved context alongside the user’s query. The result: the model can respond with specific, up-to-date or private information that was never part of its training data.
The best-known applications are virtual support assistants that know each customer's history, chatbots that access catalogs updated in real time, or tutors that remember which concepts they have already explained. All of them share the same base structure: retrieve relevant context before responding.
How RAG works: embeddings, vectors and retrieval
The process has three steps. First, each piece of information to be stored — a paragraph, a conversation, a document — is converted into a numerical vector using an embeddings model. That vector captures the semantic meaning of the text: phrases with similar meaning produce nearby vectors in multidimensional space.
Second, when a query arrives, the system generates the vector for that query using the same model. Third, it searches the vector database for which stored vectors are closest to the query vector (cosine similarity, typically). The retrieved fragments are injected into the LLM’s context along with the original question. The model responds with access to that additional context.
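The three steps can be condensed into a few lines. What follows is an illustrative Python sketch (IA-Ville's own logic is written in C#); the `embed` function is a toy word-hash stand-in for a real embeddings model, so it only captures word overlap rather than true semantics, but the store/vectorize/rank-by-proximity flow is the same.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real embeddings model: hash words into a tiny vector.
    # A real model would place synonyms close together; this toy does not.
    vec = [0.0] * 8
    for word in text.lower().split():
        vec[hash(word) % 8] += 1.0
    return vec

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Step 1: every fragment is stored as (text, vector).
store = [(text, embed(text)) for text in [
    "Ticket 101: customer reported a billing error in March",
    "Ticket 102: password reset requested",
]]

# Steps 2-3: vectorize the query, rank stored vectors by proximity,
# and inject the top fragments into the LLM's prompt.
def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine_similarity(q, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

context = retrieve("customer reported a billing error")
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: what happened in March?"
```

Swapping the toy `embed` for a real embeddings model and the in-memory list for a vector database (Qdrant, Weaviate) gives the production shape of the pipeline without changing the flow.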
The vector database does not store plain text. It stores numerical representations of meaning and searches them by geometric proximity. That differentiates it from conventional text search: it finds what is semantically related even if it shares no exact words with the query.
A customer service assistant that knows each user’s ticket history is the most direct example of RAG in production. Each ticket is a vectorized fragment. When the user types, the system retrieves the most relevant tickets and the LLM responds with that available context.
RAG outside the assistant: the 2023 experiment
In 2023, a research team published a paper on generative agents simulating a virtual community. The experiment placed 25 LLM-controlled characters in an environment similar to The Sims and let them interact autonomously. The agents formed bonds, organized social events, propagated information among themselves, and reacted to what they knew about their neighbors, all without a predefined script.
From the paper's abstract: "Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language."
What made the behavior coherent was an early form of memory: agents stored summaries of their experiences and retrieved them when relevant. The parallel with RAG was direct. In 2023, however, vector databases were not as accessible, nor were RAG pipelines as mature, as they are today. The current ecosystem — Qdrant, Weaviate, cheap embedding models via API, local LLMs capable of running on consumer hardware — opens that same type of experiment to any developer with modest resources.
IA-Ville starts from that question: if you build real RAG inside a video game and let characters use it to remember, relate, and make decisions, what kind of behavior emerges?
IA-Ville: a town of ten NPCs with real memory
IA-Ville is a social simulator in pixel art built on Godot 4 with C# logic. The game places ten autonomous characters in a town where the physical environment is static but the social world is dynamic. The NPCs have vector memory, configurable personality, relationships that evolve with each interaction, and an external LLM that generates their dialogue and behavior in real time.
The player does not control any character. They interact with the system: talk to NPCs, plant rumors, observe how relationships change as a consequence of their actions and of the characters’ own interactions. The goal is not to win, but to orchestrate social circumstances.
The architecture connects five central systems: the LLM (which generates each response’s text), vector memory (RAG, which provides past context), the relationship graph (which defines how each pair of characters gets along), the rumor system (which propagates information organically), and the simulation scheduler (which decides when and with whom NPCs interact autonomously).
Personality as configuration: the values that define reaction
Each NPC is defined by a JSON file that describes their identity, their history, their current purpose, and a set of scalar values between 0 and 1: happiness, peace of mind, hatred, revenge, charisma, curiosity, sociability, honesty, distrust, among others.
Those values are not decorative. The system uses them to calculate behavior probabilities. An NPC with high curiosity is more likely to go seek the player on their own initiative. One with high sociability generates conversations with other NPCs more frequently. One with low honesty is more likely to share rumors, including those they know are false or unverified.
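Read literally, each scalar is a probability knob. A minimal Python sketch of that idea (illustrative only: the trait names match the article, but the thresholds and function names are assumptions, and the game's actual logic is C#):

```python
import random

# Scalar traits in [0, 1], read as probabilities of autonomous actions.
personality = {
    "curiosity": 0.8,
    "sociability": 0.4,
    "honesty": 0.2,
}

def decides_to(trait: str, rng: random.Random) -> bool:
    # An NPC with curiosity 0.8 seeks out the player roughly 80% of the
    # times the scheduler gives it the chance.
    return rng.random() < personality[trait]

def shares_unverified_rumor(rng: random.Random) -> bool:
    # Low honesty raises the chance of passing along an unverified rumor.
    return rng.random() < (1.0 - personality["honesty"])

rng = random.Random(42)
if decides_to("curiosity", rng):
    print("NPC approaches the player on their own initiative")
```

The point of the sketch is that "personality" costs nothing at inference time: it is a handful of floats compared against a random draw before the LLM is ever called.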
The values are mutable during the game. A confrontational conversation can reduce a character’s peace of mind. An act of help can raise their happiness. The initial personality is the starting point; the accumulation of experiences modifies it. Without this layer, all NPCs would react the same way to the same stimulus. Value-based configuration is what makes planting a rumor about a distrustful and vengeful character have a radically different effect than planting it about one with high honesty and low accumulated tension.
The vector memory that makes behavior coherent
Without memory, every conversation starts from zero. The NPC that received an insult yesterday responds today as if nothing happened. The player who introduced themselves by name does not exist in the context of the next sentence. The resulting behavior is superficially convincing but incoherent over time.
To understand how vector memory works, it helps to think about conventional keyword search on a web page. You type an exact word and the search returns matches: results that contain that term literally. If a single word changes, the results change. The search is rigid because it operates on exact text.
A vector database works differently. It does not search for exact words but for similarity of meaning. You can ask about “fight with the player” and retrieve a memory tagged as “tense argument in the plaza” even though it shares no words with the query. It searches by proximity in a space of meanings, not by literal match.
That difference matters because it closely resembles how human memory works. Humans do not store transcriptions. They abstract the world through the five senses: what they see, hear, smell, taste, and touch enters the brain and is compressed into representations. A memory is not a recording; it is a synthesis loaded with emotion, context, and relevance. When something in the present triggers a memory, it does so not by exact word match but by pattern similarity: a smell, a tone of voice, a situation that resembles something experienced.
The construction of IA-Ville’s memory system follows that same logic because the game’s objective is to replicate that mechanism, not simulate it superficially. Each significant interaction — a conversation, a received rumor, an observed event — is converted into an embedding vector and stored in a per-NPC store. Before generating each response, the system searches that store for the most relevant memories to the current message, combining cosine similarity, keyword matching, semantic tag matching, and importance assigned to each memory. The retrieved memories are injected into the LLM’s prompt along with personality, current emotional state, and known rumors.
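A minimal sketch of that blended scoring, in Python for illustration (the game's code is C#; the field names and the specific weights below are assumptions, not IA-Ville's real values):

```python
import math
from dataclasses import dataclass, field

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class Memory:
    text: str
    embedding: list                      # vector from the embeddings model
    tags: set = field(default_factory=set)
    importance: float = 0.5              # assigned at storage time, in [0, 1]

def score(memory: Memory, query_embedding: list,
          query_words: set, query_tags: set) -> float:
    semantic = cosine(query_embedding, memory.embedding)
    keywords = (len(query_words & set(memory.text.lower().split()))
                / max(len(query_words), 1))
    tag_match = len(query_tags & memory.tags) / max(len(query_tags), 1)
    # Blend the four signals; the weights are illustrative, not the game's.
    return (0.5 * semantic + 0.2 * keywords
            + 0.15 * tag_match + 0.15 * memory.importance)
```

Ranking the store by this score and taking the top few memories is what gets injected into the prompt before each response.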
Each NPC maintains up to 100 memories. When the limit is exceeded, the least important and oldest ones are discarded. The most significant and most recent events are the ones that weigh most in present behavior. Just like in people.
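The pruning rule can be sketched the same way: once the cap is exceeded, keep the memories with the highest combined importance-and-recency score. The blend below is an assumption for illustration; the game's real metric may weigh things differently.

```python
MEMORY_CAP = 100

def prune(memories: list[dict], now: float) -> list[dict]:
    # Each memory: {"text": ..., "importance": 0..1, "created_at": timestamp}
    if len(memories) <= MEMORY_CAP:
        return memories
    def keep_score(m: dict) -> float:
        age = now - m["created_at"]
        recency = 1.0 / (1.0 + age)          # newer -> closer to 1
        # Importance dominates; recency breaks ties among the mundane.
        return 0.7 * m["importance"] + 0.3 * recency
    return sorted(memories, key=keep_score, reverse=True)[:MEMORY_CAP]
```

The effect is the one described above: significant and recent events survive, while old, low-importance ones are forgotten first.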
Relationships, rumors and autonomous socialization
The relationship graph stores the connection between each pair of entities across five independent dimensions: trust, affinity, respect, familiarity, and tension. It is not a single number of “good or bad relationship.” An NPC can have high affinity and low trust with another, which produces a recognizable type of bond: they get along with someone they do not fully believe. These dimensions change with each interaction based on the content of the exchange.
Rumors travel through the system organically. When two NPCs meet, there is a probability that one shares information about a third party. Each time a rumor changes hands it loses 20% of its credibility. The receiving NPC stores it in their memory and can use it in future conversations. A negative rumor about the player can deteriorate their relationship with characters who have never seen them directly.
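The decay rule is easy to make concrete. This sketch assumes multiplicative decay (losing 20% of the remaining credibility per hop) and illustrative field names:

```python
def pass_rumor(rumor: dict) -> dict:
    # Each hand-off multiplies credibility by 0.8: a 20% loss per hop.
    return {
        "text": rumor["text"],
        "credibility": rumor["credibility"] * 0.8,
        "hops": rumor["hops"] + 1,
    }

rumor = {"text": "the player broke the fountain", "credibility": 1.0, "hops": 0}
for _ in range(3):
    rumor = pass_rumor(rumor)
# After three hops, credibility is 0.8 ** 3 = 0.512: still circulating,
# but closer to assumption than to fact.
```

This is what turns "what arrived as fact" into "assumption" a few neighbors later, without any explicit script.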
Autonomous socialization is managed by a scheduler that periodically evaluates, for each free NPC, the probability of seeking another one for conversation. The topic of that conversation is generated by the LLM based on the relationship between the two NPCs, the memories they share, circulating rumors, and the time of day within the game. The two characters generate a dialogue of up to six turns while the player can observe from outside or approach to join.
The game does not include a language model. It is designed to connect to any provider compatible with the OpenAI API: a local model running on LM Studio or a remote provider like OpenRouter. This keeps inference logic under the user’s control and allows running the game without API costs if sufficient hardware is available.
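Because the OpenAI chat-completions format is the de facto standard, only the base URL changes between a local LM Studio server and a remote provider like OpenRouter. The sketch below builds the request without sending it; the system prompt content and the `<model-id>` / `<api-key>` placeholders are illustrative.

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str,
                       system: str, user: str) -> dict:
    # Shape of an OpenAI-compatible /chat/completions call.
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [
                {"role": "system", "content": system},  # NPC identity, memories, rumors
                {"role": "user", "content": user},      # what the player just said
            ],
        }),
    }

# Local inference via LM Studio's default endpoint, no API cost:
local = build_chat_request("http://localhost:1234/v1", "lm-studio",
                           "<model-id>", "You are Marta...", "Good morning!")
# Remote provider, identical request shape:
remote = build_chat_request("https://openrouter.ai/api/v1", "<api-key>",
                            "<model-id>", "You are Marta...", "Good morning!")
```

Because the request shape never changes, the game's inference layer only needs one code path and one configuration field for the URL.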
The system’s challenges
Building such a system has concrete frictions. Not all of them are architectural; several are about how the LLM interprets what it is given.
Getting the AI to understand its role and its tools. The language model does not know that a map exists, that there is an A* pathfinding system that moves the character, that it has a vector database with its memories, or that its conversations feed a relationship graph. All of that must be declared to it.
There is a protocol called MCP (Model Context Protocol), developed by Anthropic, that formalizes exactly this problem in other contexts: how to tell an LLM what tools it has available, what it can invoke, and under what conditions. An MCP server exposes capabilities to the model — read files, execute searches, query APIs — and the model learns to use them because the protocol defines their shape, parameters, and purpose in a structured way.
IA-Ville does not use MCP directly, but the problem is the same. The system prompt has to function as that contract: describing to the NPC what tools constitute their existence in the game. Not as a technical manual, but as part of their identity. A* pathfinding is not “a navigation algorithm”; it is the reason the character can go find someone when they decide to. The vector database is not “an embeddings store”; it is their memory. The dialogue tree is not “a state system”; it is the structure within which they have conversations. If the prompt does not translate those systems into terms the model can associate with natural behavior, the LLM generates responses that ignore the environment or contradict what the systems already executed.
Personality sheets: how many emotions are too many. It is tempting to give a character thirty emotions and feelings. The problem is that with thirty values in the prompt, the LLM loses coherence: it does not know which to prioritize, mixes contradictory signals, and produces responses that sound generic instead of specific. Practice shows that fewer values with more weight produce more consistent characters. It is not necessary to model every possible emotion, only those that can actually play a role in the game's social dynamics: those that generate conflict, those that generate attraction, those that predispose a character to hold grudges or to forgive. The rest is noise the model does not know how to resolve coherently.
The relationship graph and coexistence scores. Abstracting the social world into numbers has a cost: you must decide what level of detail to inject into the prompt without the context becoming unmanageable. If all dimensions of all relationships of all neighbors are included, the prompt grows disproportionately and token spending skyrockets. The solution is filtering: the character only receives information about their most relevant relationships to the current context, not a complete dump of the graph. Defining that filter is a design decision that directly affects how much the model understands about its social situation and how much each response costs.
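A sketch of that filter, with the five dimensions from the relationship graph. The relevance heuristic below (familiarity plus tension, plus a bonus for characters present in the scene) is an assumption for illustration, not the game's actual formula:

```python
relationships = {
    "Marta":  {"trust": 0.2, "affinity": 0.7, "respect": 0.5,
               "familiarity": 0.9, "tension": 0.1},
    "Hugo":   {"trust": 0.8, "affinity": 0.3, "respect": 0.6,
               "familiarity": 0.4, "tension": 0.7},
    "Player": {"trust": 0.5, "affinity": 0.5, "respect": 0.5,
               "familiarity": 0.2, "tension": 0.0},
}

def relevant_relationships(present: set, k: int = 2) -> list:
    # Only the k most relevant relationships get serialized into the
    # prompt, keeping token cost bounded regardless of graph size.
    def relevance(item) -> float:
        name, dims = item
        bonus = 1.0 if name in present else 0.0
        return bonus + dims["familiarity"] + dims["tension"]
    return sorted(relationships.items(), key=relevance, reverse=True)[:k]

top = relevant_relationships(present={"Player"})
```

Changing `k` is the direct trade-off knob: more social awareness per response versus more tokens spent on every call.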
Bugs the LLM generates. When the model gets confused, the errors are not code errors but interpretation errors. An NPC can respond as if it were the player, mix another character’s name with the speaker’s, ignore a memory that was in the context, or contradict its personality for no apparent reason. These bugs are hard to reproduce because they depend on the exact prompt, the accumulated history, and the model being used. They are not fixed in code; they are fixed by adjusting how the prompt is built, in what order information arrives, and what instructions are added to delimit the character’s role.
The memory tree: what to preserve and what to discard. With a limit of 100 memories per NPC, the system needs to decide which to keep when that cap is exceeded. The current metric combines assigned importance and age, but defining importance is not trivial. A low emotional intensity memory can be critical for the character to maintain coherence in a long-term relationship. A very intense but one-off memory can lose relevance quickly. Finding that balance is ongoing work: there is no universal formula because it depends on what type of narrative emerges in each playthrough.
How the AI understands its own personality. The hardest of the challenges above. A character with high distrust should respond with suspicion even when the player is friendly. One with high curiosity should ask questions, explore topics, go off the main thread. But the LLM, if not instructed precisely, tends to behave in a cordial and cooperative manner by default, which is exactly the opposite of what a complex character requires. Making personality express itself consistently in dialogue, without sounding forced or generated, is the system's finest prompting problem.
What the system models of human behavior
The combination of configurable personality, vector memory, multidimensional relationship graph, and rumor propagation with degradation replicates the basic mechanisms by which information and perception operate in real communities. Memory conditions reaction: what a character experienced yesterday affects how they respond today. Relationships modulate tone: you do not speak the same way with someone you trust as with an acquaintance with accumulated tension. Information degrades as it passes from mouth to mouth: what arrived as fact can become assumption. Personality determines the probability of action: not everyone reacts the same way to the same stimulus.
This opens possible experiments within the game. Configure an NPC with high revenge and plant the conditions for them to perceive a betrayal. Observe whether negative information about the player propagates organically without further intervention. Measure how many positive interactions are needed to reverse a deteriorated relationship. These are small experiments about social dynamics in a controlled environment, not about real people, but the underlying model is informed by the same principles studied by social psychology.
The game does not pretend to be a scientific simulation. It is an experience: emergent behavior is the content. There is a component of achievements and orchestration goals to incentivize exploration, but the core is observing what each action produces in a system with real memory and relationships.
Project status
IA-Ville is in active development. The systems described in this article are already operational: vector memory, the relationship graph, rumor propagation, autonomous socialization between NPCs, and characters seeking the player on their own initiative. The prototype works end to end.
The game will be released for download soon. The first public version is not a finished product but a playable lab: an environment where you can observe or manipulate the social dynamics of ten characters with real memory and personality. Development will continue guided by social psychology and group behavior concepts, adjusting system parameters as work progresses.
The field that makes this project possible continues to grow. What in 2023 was an academic experiment with expensive infrastructure is today replicable with a small team, local models, and open-source vector databases. IA-Ville is one more experiment in that direction, with the difference that the results are playable.