Why a Graph Database?
Code structure is inherently a graph:
- Classes depend on other classes
- Methods call other methods
- Files import other files
- Namespaces contain types
Relational vs Graph
Find all classes that depend on UserService:
- SQL (Relational)
- Graph Query
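The contrast between the two approaches can be sketched in Python. The relational side uses an in-memory SQLite database with hypothetical `classes` and `dependencies` tables; the graph side follows incoming edges in a plain adjacency map. The table names, column names, and every class other than UserService are assumptions for illustration, not CodeGraph's actual schema or queries.

```python
import sqlite3

# Hypothetical sample data: (dependent, dependency) pairs.
EDGES = [
    ("OrderService", "UserService"),
    ("AuthController", "UserService"),
    ("OrderService", "PaymentGateway"),
]


def dependents_sql(target: str) -> list[str]:
    """Relational approach: join the edge table to the class table twice."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute("CREATE TABLE dependencies (source_id INTEGER, target_id INTEGER)")
    names = sorted({name for edge in EDGES for name in edge})
    conn.executemany("INSERT INTO classes (name) VALUES (?)", [(n,) for n in names])
    # Resolve each (source name, target name) pair to row ids while inserting.
    conn.executemany(
        "INSERT INTO dependencies (source_id, target_id) "
        "SELECT s.id, t.id FROM classes s, classes t "
        "WHERE s.name = ? AND t.name = ?",
        EDGES,
    )
    rows = conn.execute(
        "SELECT src.name FROM dependencies d "
        "JOIN classes src ON src.id = d.source_id "
        "JOIN classes tgt ON tgt.id = d.target_id "
        "WHERE tgt.name = ? ORDER BY src.name",
        (target,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def dependents_graph(target: str) -> list[str]:
    """Graph approach: look up the target node and follow its incoming edges."""
    incoming: dict[str, set[str]] = {}
    for source, dest in EDGES:
        incoming.setdefault(dest, set()).add(source)
    return sorted(incoming.get(target, set()))
```

Both functions return the same answer for this data. The relational version needs two joins for a single hop, and each additional hop adds more joins, while the graph version simply follows edges.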
Data Model
Nodes
Code entities are stored as nodes with properties.
Node Types
| Type | Description |
|---|---|
| Namespace | Logical grouping |
| Class | Class definition |
| Interface | Interface definition |
| Struct | Struct definition |
| Enum | Enum definition |
| Method | Method definition |
Edges (Relationships)
Connections between nodes:
| Type | Description | Example |
|---|---|---|
| CONTAINS | Namespace contains type | MyApp → UserService |
| DEPENDS_ON | Uses/references another node | OrderService → UserService |
| INHERITS | Class inheritance | AdminUser → User |
| IMPLEMENTS | Interface implementation | UserService → IUserService |
| CALLS | Method invocation | CreateOrder() → GetUser() |
Common Query Patterns
Find Dependencies
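A minimal in-memory sketch of this pattern, assuming a hypothetical set of DEPENDS_ON edges (the class names are illustrative, and this is not the query CodeGraph itself runs). It returns either the direct dependencies of a class or everything reachable from it.

```python
from collections import deque

# Hypothetical DEPENDS_ON edges: class -> classes it depends on.
DEPENDS_ON = {
    "OrderController": {"OrderService"},
    "OrderService": {"UserService", "PaymentGateway"},
    "UserService": {"UserRepository"},
}


def direct_dependencies(name: str) -> list[str]:
    """One hop: the classes that `name` references directly."""
    return sorted(DEPENDS_ON.get(name, set()))


def transitive_dependencies(name: str) -> list[str]:
    """Any number of hops: breadth-first walk over DEPENDS_ON edges."""
    seen: set[str] = set()
    queue = deque([name])
    while queue:
        current = queue.popleft()
        for dep in DEPENDS_ON.get(current, set()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)
```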
Find Dependents
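This is the reverse of finding dependencies: instead of following DEPENDS_ON edges outward, follow them backward. The sketch below assumes a hypothetical edge list and builds a reverse index in memory; the class names are illustrative.

```python
# Hypothetical DEPENDS_ON edges as (dependent, dependency) pairs.
DEPENDS_ON = [
    ("OrderService", "UserService"),
    ("AuthController", "UserService"),
    ("OrderController", "OrderService"),
]


def build_reverse_index(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each class to the set of classes that depend on it."""
    index: dict[str, set[str]] = {}
    for source, target in edges:
        index.setdefault(target, set()).add(source)
    return index


def dependents(name: str, edges: list[tuple[str, str]] = DEPENDS_ON) -> list[str]:
    """Return the classes that directly depend on `name`."""
    return sorted(build_reverse_index(edges).get(name, set()))
```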
Shortest Path
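One way to illustrate a shortest-path lookup is a breadth-first search that records how each node was reached. The sketch below assumes hypothetical DEPENDS_ON edges and treats every edge as having the same cost; it is an in-memory illustration, not the database's own path algorithm.

```python
from collections import deque
from typing import Optional

# Hypothetical DEPENDS_ON edges: class -> classes it depends on.
DEPENDS_ON = {
    "OrderController": ["OrderService", "Logger"],
    "OrderService": ["UserService"],
    "UserService": ["UserRepository"],
    "Logger": [],
}


def shortest_path(start: str, goal: str) -> Optional[list[str]]:
    """Return the shortest chain of dependencies from start to goal, or None."""
    if start == goal:
        return [start]
    parents: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in DEPENDS_ON.get(current, []):
            if nxt in parents:
                continue  # already reached by a path that is at least as short
            parents[nxt] = current
            if nxt == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None
```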
Circular Dependencies
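A circular dependency is a path of DEPENDS_ON edges that returns to its starting node. A minimal sketch of detecting one, assuming hypothetical edges and using a depth-first search that tracks which nodes are on the current path:

```python
from typing import Optional

# Hypothetical DEPENDS_ON edges containing one cycle.
DEPENDS_ON = {
    "OrderService": ["UserService"],
    "UserService": ["NotificationService"],
    "NotificationService": ["OrderService"],
    "Logger": [],
}


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle as a list of nodes (first == last), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: dict[str, int] = {}
    stack: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GRAY
        stack.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is already on the current path, so we closed a loop.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

The recursion keeps the sketch short; for very deep dependency chains an explicit stack would avoid Python's recursion limit.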
Most Connected Nodes
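"Most connected" is taken here to mean the highest total number of incoming plus outgoing relationships, which is an assumption about the intended metric. A minimal sketch over hypothetical edges:

```python
from collections import Counter

# Hypothetical relationships as (source, target) pairs.
EDGES = [
    ("OrderService", "UserService"),
    ("AuthController", "UserService"),
    ("OrderService", "PaymentGateway"),
    ("UserService", "UserRepository"),
]


def most_connected(edges: list[tuple[str, str]], limit: int = 3) -> list[tuple[str, int]]:
    """Rank nodes by degree (incoming + outgoing), ties broken by name."""
    degree: Counter[str] = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```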
Orphan Classes
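An orphan class is assumed here to be a class with no relationships at all, neither incoming nor outgoing; the precise definition CodeGraph uses may differ. A minimal sketch with hypothetical class names:

```python
# Hypothetical class nodes and relationships between them.
CLASSES = {"OrderService", "UserService", "LegacyExporter"}
EDGES = [("OrderService", "UserService")]


def orphan_classes(classes: set[str], edges: list[tuple[str, str]]) -> list[str]:
    """Return classes that appear in no edge, as either source or target."""
    connected = {name for edge in edges for name in edge}
    return sorted(classes - connected)
```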
Current Implementation
CodeGraph currently uses Neo4j as its graph database, but the architecture allows for other implementations:
| Database | Status | Notes |
|---|---|---|
| Neo4j | Supported | Current default, mature graph database |
| Memgraph | Planned | Neo4j-compatible, in-memory performance |
| Amazon Neptune | Possible | Cloud-native option |
| ArangoDB | Possible | Multi-model database |
The IGraphStorage port abstracts the database, so implementations can be swapped without changing core logic.
Database-Agnostic Design
CodeGraph’s core doesn’t know which database it’s using. It only knows about the IGraphStorage interface, which means:
- You can run CodeGraph with different databases
- Tests can use an in-memory implementation
- Future database options are easy to add
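The port idea can be sketched in Python as follows. The `GraphStorage` protocol below is a hypothetical analogue of IGraphStorage: its method names and signatures are assumptions chosen for illustration, not the real interface. The in-memory class shows the kind of implementation a test could use in place of a real database.

```python
from typing import Protocol


class GraphStorage(Protocol):
    """Hypothetical analogue of the IGraphStorage port (names are assumed)."""

    def add_node(self, node_id: str, node_type: str) -> None: ...
    def add_edge(self, source: str, target: str, edge_type: str) -> None: ...
    def neighbors(self, node_id: str, edge_type: str) -> list[str]: ...


class InMemoryGraphStorage:
    """Test-friendly implementation that keeps everything in Python collections."""

    def __init__(self) -> None:
        self._nodes: dict[str, str] = {}
        self._edges: set[tuple[str, str, str]] = set()

    def add_node(self, node_id: str, node_type: str) -> None:
        self._nodes[node_id] = node_type

    def add_edge(self, source: str, target: str, edge_type: str) -> None:
        self._edges.add((source, target, edge_type))

    def neighbors(self, node_id: str, edge_type: str) -> list[str]:
        return sorted(
            target
            for source, target, kind in self._edges
            if source == node_id and kind == edge_type
        )


def dependencies_of(storage: GraphStorage, class_name: str) -> list[str]:
    """Core logic: written against the port, not a concrete database."""
    return storage.neighbors(class_name, "DEPENDS_ON")
```

Because `dependencies_of` only relies on the protocol, a Neo4j-backed class exposing the same three methods could be passed in without changing it.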
Performance Considerations
Graph databases excel at:
- Traversals: following a relationship costs roughly O(1) per hop in a native graph store, independent of total graph size
- Pattern matching: finding structural patterns in the graph
- Variable-depth queries: “Find all transitive dependencies”
They are less suited to:
- Aggregations over all data: these require full graph scans
- Simple key-value lookups: a graph database is overkill for basic queries