The graphdb container runs Neo4j, which stores code structure as nodes and relationships.

Why a Graph Database?

Code structure is inherently a graph:
  • Classes depend on other classes
  • Methods call other methods
  • Files import other files
  • Namespaces contain types
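Graph databases treat relationships as first-class citizens: each relationship is stored directly and can be queried and traversed, rather than being reconstructed through joins.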

Key Concepts

Concept   Description
Schema    Node and edge structure
Queries   Common Cypher query patterns

Relational vs Graph

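Consider finding all classes that depend on UserService, directly or transitively. In SQL this takes one self-join per level of depth or a recursive common table expression, and it gets complex fast. A Cypher variable-length pattern handles arbitrary depth in a single query: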
MATCH (n)-[:DEPENDS_ON*]->(target {name: 'UserService'})
RETURN DISTINCT n.name
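
To make concrete what the variable-length pattern computes, here is a minimal in-memory sketch in C#: a breadth-first walk backwards along DEPENDS_ON edges. The DependencyGraph type and its methods are hypothetical and not part of CodeGraph, and this is not how Neo4j evaluates the query internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for DEPENDS_ON edges; not part of CodeGraph.
public sealed class DependencyGraph
{
    // Reverse adjacency: target name -> names that depend on it directly.
    private readonly Dictionary<string, List<string>> _dependents = new();

    public void AddDependsOn(string from, string to)
    {
        if (!_dependents.TryGetValue(to, out var list))
        {
            list = new List<string>();
            _dependents[to] = list;
        }
        list.Add(from);
    }

    // Breadth-first walk over incoming edges. The visited set plays the role
    // of DISTINCT and keeps dependency cycles from looping forever.
    public SortedSet<string> TransitiveDependents(string target)
    {
        var visited = new HashSet<string>();
        var queue = new Queue<string>();
        queue.Enqueue(target);
        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            if (!_dependents.TryGetValue(current, out var direct)) continue;
            foreach (var name in direct)
            {
                if (visited.Add(name)) queue.Enqueue(name);
            }
        }
        return new SortedSet<string>(visited, StringComparer.Ordinal);
    }
}
```

With the made-up edges OrderController -> UserService, CheckoutPage -> OrderController and UserService -> Logger, calling TransitiveDependents("UserService") yields CheckoutPage and OrderController. Logger is excluded because the walk only follows edges that point towards the target.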

Current Implementation

Database         Status
Neo4j            Supported (default)
Memgraph         Planned
Amazon Neptune   Possible
ArangoDB         Possible
The IGraphStorage port abstracts the database.

Database-Agnostic Design

The core doesn’t know which database it’s using:
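using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public interface IGraphStorage
{
    // Read access to the stored graph.
    Task<IEnumerable<GraphNode>> GetNodesAsync(CancellationToken ct);
    Task<IEnumerable<GraphEdge>> GetEdgesAsync(CancellationToken ct);
    // Write access; the parameter lists are elided here.
    Task SaveAnalysisResultAsync(...);
    Task UpdateNodeAttributesAsync(...);
}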
Benefits:
  • Run CodeGraph with different databases
  • Tests can use an in-memory implementation
  • Future databases are easy to add
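
As a rough illustration of the in-memory option, the sketch below implements only the two read methods whose signatures appear in full above. Everything else is an assumption: the GraphNode and GraphEdge record shapes, the IGraphReader interface name, and the Seed helper, which stands in for the write methods whose parameters are not shown.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Assumed shapes: the source does not show GraphNode or GraphEdge.
public sealed record GraphNode(string Id, string Name);
public sealed record GraphEdge(string FromId, string ToId, string Type);

// Hypothetical read-only slice of the port, limited to the two methods
// whose signatures are fully visible in the source.
public interface IGraphReader
{
    Task<IEnumerable<GraphNode>> GetNodesAsync(CancellationToken ct);
    Task<IEnumerable<GraphEdge>> GetEdgesAsync(CancellationToken ct);
}

// Hypothetical test double backed by plain lists.
public sealed class InMemoryGraphStorage : IGraphReader
{
    private readonly List<GraphNode> _nodes = new();
    private readonly List<GraphEdge> _edges = new();

    // Stand-in for the elided write methods; used to load test data.
    public void Seed(IEnumerable<GraphNode> nodes, IEnumerable<GraphEdge> edges)
    {
        _nodes.AddRange(nodes);
        _edges.AddRange(edges);
    }

    public Task<IEnumerable<GraphNode>> GetNodesAsync(CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        // Return a copy so callers cannot mutate the stored list.
        return Task.FromResult<IEnumerable<GraphNode>>(_nodes.ToList());
    }

    public Task<IEnumerable<GraphEdge>> GetEdgesAsync(CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        return Task.FromResult<IEnumerable<GraphEdge>>(_edges.ToList());
    }
}
```

A test can seed this double with a handful of nodes and edges and hand it to code that depends only on the interface, so no database container has to be running.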

Performance

Graph databases excel at:
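  • Traversals: With index-free adjacency, as in Neo4j's native storage, following a relationship costs roughly constant time regardless of total graph size; expanding a node still costs time proportional to its number of relationships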
  • Pattern matching: Finding structural patterns
  • Variable-depth queries: “Find all transitive dependencies”
They may be slower for:
  • Aggregations: Counts or sums over every node require a full graph scan
  • Simple lookups: A graph engine is overkill for basic key-based reads
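
The difference between a cheap hop and an expensive whole-graph aggregation can be seen in a toy adjacency index. This is only an illustration of the idea under simplifying assumptions; the AdjacencyIndex type is hypothetical and does not reflect Neo4j's actual storage format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: a toy adjacency index, not Neo4j's storage format.
public sealed class AdjacencyIndex
{
    // Node name -> names of the nodes it points to.
    private readonly Dictionary<string, List<string>> _outgoing = new();

    public void AddEdge(string from, string to)
    {
        if (!_outgoing.TryGetValue(from, out var targets))
        {
            targets = new List<string>();
            _outgoing[from] = targets;
        }
        targets.Add(to);
    }

    // One hop: a single hash lookup (constant time on average), independent
    // of how many other nodes exist in the graph.
    public IReadOnlyList<string> Neighbors(string node)
    {
        if (_outgoing.TryGetValue(node, out var targets)) return targets;
        return Array.Empty<string>();
    }

    // Aggregation: has to visit every adjacency list in the graph.
    public int TotalEdgeCount() => _outgoing.Values.Sum(list => list.Count);
}
```

Neighbors touches only the data attached to one node, while TotalEdgeCount grows with the size of the whole graph, which is the same trade-off the two lists above describe.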