CodeGraph stores code structure in a graph database. This makes it straightforward to run queries that are difficult or verbose to express in a traditional relational database.

Why a Graph Database?

Code structure is inherently a graph:
  • Classes depend on other classes
  • Methods call other methods
  • Files import other files
  • Namespaces contain types
These are relationships, and graph databases treat relationships as first-class citizens.

Relational vs Graph

Find all classes that depend on UserService:
-- Multiple joins, gets complex fast
SELECT DISTINCT c1.name
FROM classes c1
JOIN dependencies d ON c1.id = d.source_id
JOIN classes c2 ON d.target_id = c2.id
WHERE c2.name = 'UserService'
UNION
SELECT DISTINCT c1.name
FROM classes c1
JOIN dependencies d1 ON c1.id = d1.source_id
JOIN classes c2 ON d1.target_id = c2.id
JOIN dependencies d2 ON c2.id = d2.source_id
JOIN classes c3 ON d2.target_id = c3.id
WHERE c3.name = 'UserService'
-- ... continues for each level of depth
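The equivalent graph query handles traversal to arbitrary depth in a single MATCH clause:
MATCH (c {type: 'Class'})-[:DEPENDS_ON*]->(n {name: 'UserService'})
RETURN DISTINCT c.name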

Data Model

Nodes

Code entities are stored as nodes with properties:
{
  "id": "MyApp.Services.UserService",
  "name": "UserService",
  "type": "Class",
  "fullName": "MyApp.Services.UserService",
  "filePath": "/src/Services/UserService.cs",
  "namespace": "MyApp.Services",
  "visibility": "Public",
  "isAbstract": false,
  "isStatic": false
}

Node Types

| Type | Description |
| --- | --- |
| Namespace | Logical grouping |
| Class | Class definition |
| Interface | Interface definition |
| Struct | Struct definition |
| Enum | Enum definition |
| Method | Method definition |

Edges (Relationships)

Connections between nodes:
| Type | Description | Example |
| --- | --- | --- |
| CONTAINS | Namespace contains type | MyApp → UserService |
| DEPENDS_ON | Uses/references another node | OrderService → UserService |
| INHERITS | Class inheritance | AdminUser → User |
| IMPLEMENTS | Interface implementation | UserService → IUserService |
| CALLS | Method invocation | CreateOrder() → GetUser() |
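One way to picture this model is as a list of typed (source, relationship, target) triples, indexed by source node and relationship type so that one hop is a direct lookup. The Python sketch below reuses the example names from the table; build_adjacency and neighbors are hypothetical helpers, and this is not how Neo4j or CodeGraph actually lay out data.

```python
from collections import defaultdict

# Typed edges as (source, relationship type, target), using the
# example names from the table above (illustrative only).
EDGES = [
    ("MyApp", "CONTAINS", "UserService"),
    ("OrderService", "DEPENDS_ON", "UserService"),
    ("AdminUser", "INHERITS", "User"),
    ("UserService", "IMPLEMENTS", "IUserService"),
    ("CreateOrder()", "CALLS", "GetUser()"),
]


def build_adjacency(edges):
    """Index outgoing edges by (source, relationship type) for direct lookup."""
    out = defaultdict(list)
    for source, rel, target in edges:
        out[(source, rel)].append(target)
    return out


def neighbors(adjacency, node, rel):
    """Targets reachable from node over exactly one edge of type rel."""
    # .get() avoids inserting empty entries into the defaultdict.
    return adjacency.get((node, rel), [])


adj = build_adjacency(EDGES)
print(neighbors(adj, "UserService", "IMPLEMENTS"))  # ['IUserService']
print(neighbors(adj, "UserService", "DEPENDS_ON"))  # []
```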

Common Query Patterns

Find Dependencies

// Direct dependencies
MATCH (n {name: 'UserService'})-[:DEPENDS_ON]->(dep)
RETURN dep.name, dep.type

// Transitive dependencies (any depth)
MATCH (n {name: 'UserService'})-[:DEPENDS_ON*]->(dep)
RETURN DISTINCT dep.name, dep.type
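To see what the two patterns compute, here is a minimal Python sketch over a hypothetical in-memory dependency map (the class names are invented). A single hop corresponds to -[:DEPENDS_ON]->, and the breadth-first walk corresponds to -[:DEPENDS_ON*]-> combined with DISTINCT.

```python
from collections import deque

# Hypothetical dependency map: class name -> names it DEPENDS_ON.
DEPENDS_ON = {
    "UserService": ["UserRepository", "Logger"],
    "UserRepository": ["Database", "Logger"],
    "Logger": [],
    "Database": [],
}


def direct_dependencies(graph, start):
    """Like (n)-[:DEPENDS_ON]->(dep): one hop only."""
    return list(graph.get(start, []))


def transitive_dependencies(graph, start):
    """Like (n)-[:DEPENDS_ON*]->(dep) with DISTINCT: breadth-first search
    over any number of hops, recording each node once."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


print(direct_dependencies(DEPENDS_ON, "UserService"))
print(sorted(transitive_dependencies(DEPENDS_ON, "UserService")))
```

The seen set plays the role of DISTINCT and also keeps the walk finite if the dependency graph contains cycles.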

Find Dependents

// What depends on this class?
MATCH (dependent)-[:DEPENDS_ON*]->(n {name: 'UserService'})
RETURN DISTINCT dependent.name, dependent.type
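Finding dependents is the same traversal with every edge reversed. In Cypher that is just a flipped arrow; in the hypothetical in-memory Python sketch below, it means building a reverse index before walking it.

```python
from collections import defaultdict, deque

# Hypothetical dependency map: class name -> names it DEPENDS_ON.
DEPENDS_ON = {
    "OrderController": ["OrderService"],
    "OrderService": ["UserService"],
    "AdminController": ["UserService"],
    "UserService": [],
}


def reverse(graph):
    """Flip every DEPENDS_ON edge so we can walk from a node to its dependents."""
    incoming = defaultdict(list)
    for source, targets in graph.items():
        for target in targets:
            incoming[target].append(source)
    return incoming


def transitive_dependents(graph, target):
    """Everything that depends on target, directly or indirectly."""
    incoming = reverse(graph)
    seen = set()
    queue = deque(incoming.get(target, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(incoming.get(node, []))
    return seen


print(sorted(transitive_dependents(DEPENDS_ON, "UserService")))
# ['AdminController', 'OrderController', 'OrderService']
```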

Shortest Path

// How are two classes connected?
MATCH path = shortestPath(
  (a {name: 'OrderController'})-[*]-(b {name: 'Database'})
)
RETURN path
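The shortestPath pattern above uses [*] with no arrow, so it follows relationships of any type in either direction. A breadth-first search over an undirected edge list answers the same kind of question; the Python sketch below uses invented class names and returns one shortest path when several exist.

```python
from collections import deque

# Hypothetical edges of any relationship type, treated as undirected
# to match the -[*]- pattern.
EDGES = [
    ("OrderController", "OrderService"),
    ("OrderService", "UserService"),
    ("UserService", "UserRepository"),
    ("UserRepository", "Database"),
    ("OrderService", "OrderRepository"),
    ("OrderRepository", "Database"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search; returns one shortest path as a node list, or None."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(shortest_path(EDGES, "OrderController", "Database"))
# ['OrderController', 'OrderService', 'OrderRepository', 'Database']
```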

Circular Dependencies

// Find cycles in the dependency graph
MATCH path = (n)-[:DEPENDS_ON*2..10]->(n)
RETURN path
LIMIT 10
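Cycle detection can also be done with a depth-first search that tracks which nodes are on the current path: reaching one of them again closes a cycle. This Python sketch (hypothetical graph and helper names) returns the first cycle it finds.

```python
# Hypothetical dependency map with one cycle: A -> B -> C -> A.
DEPENDS_ON = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "D"],
    "D": [],
}

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_cycle(graph):
    """Depth-first search; returns one cycle as a node list (first node
    repeated at the end), or None if the graph is acyclic."""
    color = {node: WHITE for node in graph}
    stack = []  # nodes on the current DFS path

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # Back edge: dep is already on the path, so the cycle runs
                # from dep's position in the stack around to dep again.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(find_cycle(DEPENDS_ON))  # ['A', 'B', 'C', 'A']
```

Unlike the *2..10 bound in the Cypher query, this search also reports self-dependencies and cycles longer than ten hops. The Cypher query, for its part, returns each cycle once per node on it, which is one reason the LIMIT is useful. For very large graphs, an iterative version avoids Python's recursion limit.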

Most Connected Nodes

// Potential god classes
MATCH (n {type: 'Class'})-[:DEPENDS_ON]->(dep)
RETURN n.name, count(dep) as dependencies
ORDER BY dependencies DESC
LIMIT 20
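The aggregation is an out-degree count. Given DEPENDS_ON edges as (source, target) pairs, Python's collections.Counter produces the same ranking; the edge list and the most_dependencies helper below are invented for illustration.

```python
from collections import Counter

# Hypothetical DEPENDS_ON edges between classes, as (source, target).
EDGES = [
    ("OrderService", "UserService"),
    ("OrderService", "PaymentService"),
    ("OrderService", "InventoryService"),
    ("UserService", "UserRepository"),
    ("PaymentService", "PaymentGateway"),
]


def most_dependencies(edges, limit=20):
    """Count outgoing DEPENDS_ON edges per source, highest first
    (mirrors count(dep) ... ORDER BY dependencies DESC LIMIT 20)."""
    return Counter(source for source, _ in edges).most_common(limit)


print(most_dependencies(EDGES, limit=1))  # [('OrderService', 3)]
```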

Orphan Classes

// Potentially dead code
MATCH (n {type: 'Class'})
WHERE NOT ()-[:DEPENDS_ON]->(n)
RETURN n.name, n.filePath
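The orphan check is a set difference: every class minus the classes that appear as a DEPENDS_ON target. A minimal Python sketch with hypothetical class names and file paths:

```python
# Hypothetical classes (name -> file path) and DEPENDS_ON edges (source, target).
CLASSES = {
    "Program": "/src/Program.cs",
    "OrderService": "/src/Services/OrderService.cs",
    "UserService": "/src/Services/UserService.cs",
    "LegacyExporter": "/src/Legacy/LegacyExporter.cs",
}
EDGES = [
    ("Program", "OrderService"),
    ("OrderService", "UserService"),
]


def orphan_classes(classes, edges):
    """Classes that nothing DEPENDS_ON (no incoming edge), sorted by name."""
    referenced = {target for _, target in edges}
    return sorted(
        (name, path) for name, path in classes.items() if name not in referenced
    )


print(orphan_classes(CLASSES, EDGES))
# [('LegacyExporter', '/src/Legacy/LegacyExporter.cs'), ('Program', '/src/Program.cs')]
```

Entry points such as Program show up as orphans too, since nothing depends on them; that is why the query only flags potentially dead code.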

Current Implementation

CodeGraph currently uses Neo4j as its graph database, but the architecture allows for other implementations:
| Database | Status | Notes |
| --- | --- | --- |
| Neo4j | Supported | Current default, mature graph database |
| Memgraph | Planned | Neo4j-compatible, in-memory performance |
| Amazon Neptune | Possible | Cloud-native option |
| ArangoDB | Possible | Multi-model database |
The IGraphStorage port abstracts the database, making it possible to swap implementations without changing core logic.

Database-Agnostic Design

CodeGraph’s core doesn’t know which database it’s using. It only knows about the IGraphStorage interface:
public interface IGraphStorage
{
    Task<IEnumerable<GraphNode>> GetNodesAsync(CancellationToken ct);
    Task<IEnumerable<GraphEdge>> GetEdgesAsync(CancellationToken ct);
    Task SaveAnalysisResultAsync(...);
    Task UpdateNodeAttributesAsync(...);
}
This means:
  • You can run CodeGraph with different databases
  • Tests can use an in-memory implementation
  • Future database options are easy to add
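To illustrate the in-memory option, the Python sketch below mirrors the read side of IGraphStorage. GraphNode, GraphEdge and their fields are simplified, assumed stand-ins for the C# types; the CancellationToken parameters are omitted; and save is a hypothetical stand-in for SaveAnalysisResultAsync, whose real parameters are not shown in the interface.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class GraphNode:  # assumed, simplified stand-in for the C# GraphNode
    id: str
    name: str
    type: str


@dataclass(frozen=True)
class GraphEdge:  # assumed, simplified stand-in for the C# GraphEdge
    source_id: str
    target_id: str
    type: str


class GraphStorage(Protocol):
    """Hypothetical Python analogue of IGraphStorage's read methods."""
    async def get_nodes(self) -> list[GraphNode]: ...
    async def get_edges(self) -> list[GraphEdge]: ...


class InMemoryGraphStorage:
    """Test double that keeps nodes and edges in lists instead of a database."""

    def __init__(self) -> None:
        self._nodes: list[GraphNode] = []
        self._edges: list[GraphEdge] = []

    async def save(self, nodes, edges) -> None:
        # Hypothetical stand-in for SaveAnalysisResultAsync(...).
        self._nodes = list(nodes)
        self._edges = list(edges)

    async def get_nodes(self) -> list[GraphNode]:
        return list(self._nodes)

    async def get_edges(self) -> list[GraphEdge]:
        return list(self._edges)


async def count_classes(storage: GraphStorage) -> int:
    """Core-style logic that depends only on the storage protocol."""
    return sum(1 for n in await storage.get_nodes() if n.type == "Class")


async def main() -> None:
    storage = InMemoryGraphStorage()
    await storage.save(
        [GraphNode("MyApp.Services.UserService", "UserService", "Class"),
         GraphNode("MyApp.Services", "MyApp.Services", "Namespace")],
        [GraphEdge("MyApp.Services", "MyApp.Services.UserService", "CONTAINS")],
    )
    print(await count_classes(storage))  # 1


asyncio.run(main())
```

Because count_classes only sees the GraphStorage protocol, replacing the in-memory store with a database-backed one would not require changing it.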

Performance Considerations

Graph databases excel at:
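  • Traversals: Following a relationship is O(1) per hop, because each node references its relationships directly instead of requiring a join or index lookup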
  • Pattern matching: Finding structural patterns in the graph
  • Variable-depth queries: “Find all transitive dependencies”
They may be slower for:
  • Aggregations over all data: Full graph scans
  • Simple key-value lookups: Overkill for basic queries
For code visualization, the traversal benefits far outweigh the costs.

What’s Next?