
GraphRAG method: How it works, advantages and use cases
With the rapid evolution of artificial intelligence technologies, companies are constantly looking to improve the accuracy and relevance of the responses provided by chatbots and answer engines. New approaches are emerging to overcome the limitations of traditional RAG solutions. One of these innovations, developed by Microsoft’s research teams, is GraphRAG.
This method combines retrieval-augmented generation (RAG) with graph-based indexing to answer questions on private text corpora. Unlike conventional RAG methods, which focus on retrieving specific text segments, GraphRAG builds a text index in graph form, enabling both global and query-specific summarisation.
GraphRAG: understanding how it works
Text extraction and chunking
GraphRAG’s first step is to extract source documents, such as articles and reports, and divide them into manageable segments of text, known as “text chunks”.
- For example, a research report of 3,000 tokens can be divided into six chunks of up to 600 tokens each, with an overlap of 100 tokens between consecutive chunks to ensure contextual continuity.
- For instance, a document on artificial intelligence initiatives from Microsoft could be divided into sections covering different aspects such as partnerships, internal projects and official statements.
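The chunking step can be sketched as a sliding window over the token sequence. This is a minimal illustration, not GraphRAG's own implementation: the function name `chunk_tokens` and its parameters are assumptions, and placeholder strings stand in for real model tokens.

```python
def chunk_tokens(tokens, chunk_size=600, overlap=100):
    """Split a token list into windows that share `overlap` tokens with their neighbour."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap          # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break                        # the last window already reaches the end
    return chunks


# Placeholder tokens; a real pipeline would use the tokenizer of its language model.
tokens = [f"tok{i}" for i in range(3000)]
chunks = chunk_tokens(tokens)
print(len(chunks), [len(c) for c in chunks])
```

With these settings the window advances 500 tokens at a time, so a 3,000-token input yields six windows: five of 600 tokens and a final one of 500.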
Element instances
After chunking the text, each chunk is analysed by a large language model (LLM) to extract the fundamental elements of the text, which include entities, relationships and covariates.
- For example, in a text excerpt on Microsoft and artificial intelligence, entities such as “Microsoft”, “Satya Nadella” and “OpenAI” would be extracted. Relationships could include “Satya Nadella is the CEO of Microsoft” and “Microsoft collaborates with OpenAI”.
Covariates, such as “Announcement made on 15 June 2024”, provide additional contextual information. All these elements are formatted into tuples, such as (“Satya Nadella”, “CEO of”, “Microsoft”, “since 2014”), creating a rich, interconnected database.
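One way to picture these element instances is as small typed tuples. The sketch below is an assumption made for illustration: the class names, field names and the `kind` labels are hypothetical and do not reflect GraphRAG's internal schema.

```python
from typing import NamedTuple, Optional


class Entity(NamedTuple):
    name: str
    kind: str                          # illustrative label, e.g. "organisation"


class Relationship(NamedTuple):
    source: str
    relation: str
    target: str
    qualifier: Optional[str] = None    # extra detail such as "since 2014"


class Covariate(NamedTuple):
    subject: str                       # the entity the claim is attached to
    claim: str


entities = [
    Entity("Microsoft", "organisation"),
    Entity("Satya Nadella", "person"),
    Entity("OpenAI", "organisation"),
]
relationships = [
    Relationship("Satya Nadella", "CEO of", "Microsoft", "since 2014"),
    Relationship("Microsoft", "collaborates with", "OpenAI"),
]
# Attaching the announcement to "Microsoft" is an assumption made for this example.
covariates = [Covariate("Microsoft", "Announcement made on 15 June 2024")]

# A relationship instance unpacks to the plain tuple format used in the text.
print(tuple(relationships[0]))
```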
Element summary
Once the entities, relationships and covariates have been extracted and formatted as tuples, the method generates summaries for each element. The language model creates concise, informative summaries that capture the main features and interactions of the detected elements.
- For example, for the tuple (“Satya Nadella”, “CEO of”, “Microsoft”, “since 2014”), the summary might be: “Satya Nadella has been Microsoft’s CEO since 2014, playing a key role in the company’s artificial intelligence initiatives.”
For a relationship like “Microsoft collaborates with OpenAI”, the summary might be: “Microsoft and OpenAI are collaborating on the development of advanced artificial intelligence technologies, aiming to integrate these solutions into Microsoft products.”
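The summarisation step can be sketched as two stages: gather every description extracted for the same element, then hand each group to a summariser. The sketch assumes that one element may be described in several chunks, and it replaces the language model with a placeholder that simply joins the descriptions; all function names are hypothetical.

```python
from collections import defaultdict


def group_descriptions(instances):
    """Collect the descriptions extracted for each element key, dropping exact repeats."""
    grouped = defaultdict(list)
    for key, description in instances:
        if description not in grouped[key]:
            grouped[key].append(description)
    return dict(grouped)


def summarise_elements(grouped, summarise):
    """Apply a summariser (a language model call in practice) to each element."""
    return {key: summarise(key, descriptions) for key, descriptions in grouped.items()}


def join_summariser(key, descriptions):
    # Placeholder only: a real pipeline would prompt a language model here.
    return " ".join(descriptions)


nadella = ("Satya Nadella", "CEO of", "Microsoft")
partnership = ("Microsoft", "collaborates with", "OpenAI")
instances = [
    (nadella, "Satya Nadella has been Microsoft's CEO since 2014."),
    (partnership, "Microsoft and OpenAI are collaborating on AI technologies."),
    (nadella, "He plays a key role in the company's AI initiatives."),
    (nadella, "Satya Nadella has been Microsoft's CEO since 2014."),  # repeat from another chunk
]

summaries = summarise_elements(group_descriptions(instances), join_summariser)
for key, summary in summaries.items():
    print(key, "->", summary)
```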
Graph communities
After summarising the entities, relationships and covariates, GraphRAG constructs a knowledge graph where the nodes represent the entities and the edges represent the relationships between them.
- For example, a node for “Microsoft” will be connected to a node for “Satya Nadella” by an edge indicating that he is the company’s CEO. Similarly, an edge can link “Microsoft” to “OpenAI” to indicate collaboration. Community detection algorithms such as the Leiden algorithm partition this graph into communities of closely related nodes.
- For example, a community could group together all entities linked to artificial intelligence, including companies such as Microsoft and OpenAI, as well as concepts such as “machine learning” and “neural networks”. This creates well-defined, hierarchical partitions, facilitating analysis and summarisation.
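The graph construction can be sketched with plain adjacency sets. To stay dependency-free, the sketch below does not implement the Leiden algorithm: it uses connected components as a crude stand-in for community detection, and it computes modularity, the partition-quality score that Leiden-style algorithms try to increase. The second group of entities and all edge labels are illustrative assumptions.

```python
from collections import defaultdict


def build_graph(relationships):
    """Undirected adjacency sets: nodes are entities, edges are relationships."""
    graph = defaultdict(set)
    for source, _relation, target in relationships:
        graph[source].add(target)
        graph[target].add(source)
    return dict(graph)


def connected_components(graph):
    """Stand-in for community detection: group nodes that can reach one another."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(members)
    return communities


def modularity(graph, communities):
    """Modularity Q of a partition of an unweighted, undirected graph."""
    two_m = sum(len(neighbours) for neighbours in graph.values())  # twice the edge count
    q = 0.0
    for members in communities:
        degree_sum = sum(len(graph[node]) for node in members)
        # Every internal edge is counted from both endpoints, i.e. twice.
        internal = sum(1 for node in members for nb in graph[node] if nb in members)
        q += internal / two_m - (degree_sum / two_m) ** 2
    return q


relationships = [
    ("Satya Nadella", "CEO of", "Microsoft"),
    ("Microsoft", "collaborates with", "OpenAI"),
    # Illustrative second cluster with assumed edge labels.
    ("Dr Smith", "leads", "Clinical Trials"),
    ("Dr Smith", "works at", "University of Medicine"),
    ("Clinical Trials", "study", "Cancer Treatment"),
]
graph = build_graph(relationships)
communities = connected_components(graph)
print(communities)
print(round(modularity(graph, communities), 2))
```

On a real corpus the graph is usually one large connected mass, which is why a modularity-based method such as Leiden is needed to find denser sub-groups and nest them hierarchically.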
Community summaries
For each community detected, summaries are generated, encapsulating information on all entities and relationships within each community. The language model uses domain-specific prompts to ensure complete and relevant data coverage.
- For example, a technology community grouping together “Microsoft”, “Satya Nadella”, “OpenAI”, and “Artificial Intelligence” might have a summary generated as follows: “This community deals with Microsoft’s artificial intelligence initiatives, led by Satya Nadella, in collaboration with OpenAI. Key topics include the development of advanced solutions and the impact on the company’s future products.” In the healthcare field, a community could group entities such as “Cancer Treatment”, “Clinical Trials”, “Dr Smith”, and “University of Medicine”, with a summary stating: “This community focuses on clinical trials for cancer treatment, led by Dr Smith at the University of Medicine.”
Community responses
When GraphRAG receives a question, it uses summaries of the relevant communities to generate partial responses. Each community provides a response based on its own internal elements and relationships, ensuring that all relevant perspectives are taken into account.
- For example, for a question about AI initiatives at Microsoft, relevant communities are identified, and partial responses are generated from their summaries. If the question is “What are the main developments in artificial intelligence at Microsoft?”, partial responses might include: “Microsoft has launched several artificial intelligence initiatives, led by Satya Nadella, and is working in collaboration with OpenAI.” and “Developments include advanced artificial intelligence solutions, resulting from collaboration between Microsoft and OpenAI.”
Overall response
After generating partial responses, GraphRAG combines them into an overall response via a “map-reduce” summarisation process. In the “map” phase, summaries of each relevant community are used to generate context-specific partial responses. Then, in the “reduce” phase, these partial responses are combined to form a coherent overall response.
- For example, partial responses about Microsoft’s AI initiatives are synthesised to provide a complete, integrated response: “Microsoft, under the leadership of Satya Nadella, has launched several artificial intelligence initiatives, in partnership with OpenAI. These developments include advanced AI solutions that will play a crucial role in Microsoft’s future products.” This process ensures that the final response is exhaustive, covering all the important aspects of the question asked, while being clear and easy to understand.
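The map and reduce phases can be sketched as two small functions. Everything here is a simplified assumption: the relevance score, the cut-off `max_partials` and the keyword-overlap function that stands in for the language model are hypothetical, and a real pipeline would pass the merged text to a language model for the final wording.

```python
STOPWORDS = {"what", "are", "the", "in", "at", "is", "and", "for", "by", "on", "a", "of"}


def words_of(text):
    """Lower-cased words with surrounding punctuation removed."""
    return {w.strip("?.,").lower() for w in text.split()}


def keyword_answer(question, summary):
    # Placeholder for a language model call: score = shared non-stopword terms.
    score = len((words_of(question) - STOPWORDS) & words_of(summary))
    return summary, score


def map_phase(question, community_summaries, answer_fn):
    """Ask each community summary for a partial answer and a relevance score."""
    partials = []
    for community_id, summary in community_summaries.items():
        answer, score = answer_fn(question, summary)
        if score > 0:                    # skip communities with nothing to contribute
            partials.append((score, community_id, answer))
    return partials


def reduce_phase(partials, max_partials=5):
    """Rank partial answers by score and merge the best ones into a single context."""
    ranked = sorted(partials, key=lambda p: p[0], reverse=True)[:max_partials]
    return "\n".join(f"[community {cid}] {answer}" for _score, cid, answer in ranked)


community_summaries = {
    "ai": "Microsoft has launched several artificial intelligence initiatives, "
          "led by Satya Nadella, and is working in collaboration with OpenAI.",
    "health": "This community focuses on clinical trials for cancer treatment, "
              "led by Dr Smith at the University of Medicine.",
    "products": "Developments include advanced artificial intelligence solutions, "
                "resulting from collaboration between Microsoft and OpenAI.",
}
question = "What are the main developments in artificial intelligence at Microsoft?"

partials = map_phase(question, community_summaries, keyword_answer)
print(reduce_phase(partials))
```

The healthcare community shares no meaningful terms with the question, so it is dropped in the map phase and never reaches the final context.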
RAG vs. GraphRAG comparison example
To demonstrate GraphRAG’s effectiveness, Microsoft used the VIINA dataset (Violent Incident Information From News Articles), made up of thousands of Russian and Ukrainian press articles from June 2023, translated into English.
In the first comparison example, Microsoft teams asked the following query:
What is Novorossiya?
Basic RAG | GraphRAG |
The term “Novorossiya” is not mentioned in the provided context. However, to provide some information, Novorossiya, or New Russia, is a historical term referring to a region in the south of Ukraine that was conquered by the Russian Empire in the 18th century. In recent years, the term has been revived by Russian-backed separatists in Eastern Ukraine to refer to the territories they claim, but it is not officially recognized as a separate entity by international bodies. | Novorossiya, also known as New Russia, is a term with historical and contemporary significance. Historically, it referred to a region in the southern part of the Russian Empire that is now part of Ukraine. In the contemporary context, it has been used by pro-Russian separatists to refer to certain areas of Ukraine [Data: Entities (6494, 912, 6493)]. The term Novorossiya has been used in the context of a political movement in Ukraine, particularly by Russian nationalists. This movement has been associated with efforts to destabilize Ukraine and create a so-called “Novorossiya” [Data: Entities (912); Relationships (15211, 15206)]. The term has also been used to denote a separatist state in Ukraine, known as the Federal Republic of Novorossiya (FRN) [Data: Entities (7690)]. |
Basic RAG
Notes that the term is not mentioned in the retrieved context, then falls back on a generic explanation: “Novorossiya” is a historical term for a region of Ukraine conquered by the Russian Empire in the 18th century, revived recently by pro-Russian separatists.
GraphRAG
Offers a more detailed response, including the historical and modern context of the term, and mentions its political implications and associations with destabilising activities in Ukraine.
What has Novorossiya done?
In the second example, the query requires connecting the dots:
Basic RAG | GraphRAG |
The text does not provide specific information on what Novorossiya has done. | Novorossiya, a political movement in Ukraine, has been involved in a series of destructive activities, particularly targeting various entities in Ukraine [Entities (6494, 912)]. The movement has been linked to plans to destroy properties of several Ukrainian entities, including Rosen, the Odessa Canning Factory, the Odessa Regional Radio Television Transmission Center, and the National Television Company of Ukraine [Relationships (15207, 15208, 15209, 15210)]. In addition to these, Novorossiya has also been associated with plans to destroy properties of Roshen, the Odessa Conservatory Factory, the Odesa Regional Radio Television Transmission Center, and the National Telecommunications Company of Ukraine [Relationships (8329, 8330, 8331, 8332)]. Furthermore, Novorossiya has been implicated in plans to blow up ATMs of PrivatBank, the largest state bank in Ukraine [Relationships (8335, 8335)]. The Office of the General Prosecutor in Ukraine has reported on the creation of Novorossiya, indicating the government’s awareness and potential concern over the activities of this movement [Relationships (8264, 8264)]. The destructive activities of Novorossiya have had significant impacts on the targeted entities and have likely contributed to the ongoing tensions and conflicts in the region. In summary, Novorossiya has been involved in a series of destructive activities targeting various entities in Ukraine. These activities have had significant impacts on the targeted entities and have likely contributed to the ongoing tensions and conflicts in the region. The Ukrainian government, through the Office of the General Prosecutor, has acknowledged the existence and activities of Novorossiya, indicating a level of concern over the movement’s actions. |
Basic RAG
Fails to provide a precise response because no specific text segment deals with Novorossiya’s actions.
GraphRAG
Provides a detailed response describing Novorossiya’s destructive activities in Ukraine, including plans to destroy property and targeted attacks, while citing the entities and relationships in the graph that support each claim.
What are the top 5 themes in the data?
In the last example, the entire dataset is queried:
Basic RAG
Returns a list of generic themes with little relevance to the conflict in Ukraine, because the vector search retrieved irrelevant text.
GraphRAG
Precisely identifies the five main themes related to the conflict and situation in Ukraine, including military activities, political entities, environmental concerns, community analyses, and humanitarian concerns. Each theme is accompanied by relevant details and references to source reports.
Microsoft explains that the traditional RAG method is limited to searching for the most similar text segments, which often leads to misleading responses due to superficial matches. In contrast, GraphRAG builds a knowledge graph that connects the dots across the entire dataset, providing more accurate and contextual responses by considering the entire corpus and the relationships between entities.
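The retrieval step of a traditional RAG pipeline can be sketched as a top-k similarity search. The sketch below is a toy: word counts stand in for the neural embeddings a real system would use, and the three chunks are invented for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lower-cased word counts with basic punctuation removed."""
    for mark in "?.,":
        text = text.replace(mark, " ")
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


chunks = [
    "Novorossiya is a historical term for a region in the south of Ukraine.",
    "The prosecutor reported plans to destroy property in Odessa.",
    "PrivatBank ATMs were named as targets.",
]

for query in ("What is Novorossiya?", "What are the top 5 themes in the data?"):
    print(query)
    for score, chunk in top_k(query, chunks):
        print(f"  {score:.2f}  {chunk}")
```

The first query shares a distinctive term with one chunk, so retrieval works. The second query overlaps with the chunks only through function words such as “the” and “in”, so its ranking reflects superficial matches rather than any view of the corpus as a whole.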
What are the advantages and limitations of the GraphRAG method?
What are the advantages of using the GraphRAG method?
Improved accuracy
GraphRAG improves response accuracy by building a knowledge graph that represents the relationships between entities in the data corpus. This makes it possible to provide detailed, contextual responses based on a global understanding of the information, unlike the traditional RAG method, which is limited to superficial textual matches.
Global synthesis capability
By using hierarchical summaries of the communities detected in the graph, GraphRAG can effectively answer complex questions that require an overview. This includes questions that require a synthesis of the main themes or connections between different concepts within the dataset, something the traditional RAG method cannot reliably do.
Data sourcing
GraphRAG ensures the transparency and verifiability of responses by providing information on the origin of the data used. Each statement is supported by links to the source documents, enabling users to easily verify the accuracy of the responses generated by the model.
What are the limitations of the GraphRAG method?
Complexity of implementation
Building and maintaining a knowledge graph requires considerable computational and technical resources. This includes the use of advanced language models to extract and summarise entities and relationships, as well as the application of sophisticated algorithms to detect communities in the graph.
Scalability
Although GraphRAG is designed to handle large amounts of data, scalability can become a challenge as dataset size increases. Managing very large graphs may require additional optimisations to maintain high performance.
Language model dependency
GraphRAG relies heavily on the capabilities of language models to accurately extract and summarise information. Any limitations or biases in these models can affect the quality of the knowledge graphs and, consequently, the responses generated.
GraphRAG represents a major advance in the field of artificial intelligence, particularly for chatbots and answer engines. By combining retrieval-augmented generation approaches with graph-based indexing, GraphRAG overcomes the limitations of traditional RAG methods. It provides a global understanding and synthesis of data, offering more precise, contextualised and verifiable responses.
Although its implementation can be complex and demanding in terms of resources, the advantages it brings in terms of quality of responses and ability to deal with complex issues fully justify its adoption. GraphRAG opens up new perspectives for the use of unstructured data, making interactions with AI systems more reliable and relevant.