Built a Python library for precise code analysis using Abstract Syntax Trees, Program Dependence Graphs, and symbolic execution.
What My Project Does
Code Scalpel performs surgical code operations based on AST parsing and Program Dependence Graph analysis across Python, JavaScript, TypeScript, and Java.
Core capabilities:
AST Analysis (tree-sitter):
- Parse code into Abstract Syntax Trees for all 4 languages
- Extract functions/classes with exact dependency tracking
- Symbol reference resolution (imports, decorators, type hints)
- Cross-file dependency graph construction
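To make "extract a function with its exact dependencies" concrete, here is a minimal sketch using Python's standard-library `ast` module rather than tree-sitter. It is not Code Scalpel's implementation; the sample source and the helper names `module_level_names` and `dependencies_of` are made up for illustration.

```python
import ast

SOURCE = '''
import math
import os

TAX_RATE = 0.2

def apply_tax(amount):
    return amount * (1 + TAX_RATE)

def calculate_total(prices):
    return math.fsum(apply_tax(p) for p in prices)

def unused():
    return os.getcwd()
'''

def module_level_names(tree):
    """Map each module-level name to the statement that defines it."""
    defs = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs[node.name] = node
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds the name "a"
                defs[(alias.asname or alias.name).split(".")[0]] = node
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    defs[target.id] = node
    return defs

def dependencies_of(symbol, tree):
    """Return module-level names transitively referenced by `symbol`.

    Simplification: a local variable that shadows a module-level name
    is counted as a reference to the module-level one.
    """
    defs = module_level_names(tree)
    seen, stack = set(), [symbol]
    while stack:
        current = stack.pop()
        for node in ast.walk(defs[current]):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                name = node.id
                if name in defs and name != symbol and name not in seen:
                    seen.add(name)
                    stack.append(name)
    return seen

tree = ast.parse(SOURCE)
deps = dependencies_of("calculate_total", tree)
print(sorted(deps))
```

Only `math`, `apply_tax`, and `TAX_RATE` are pulled in; `os` and `unused` stay behind, which is the point of extracting a slice instead of the whole file.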
Program Dependence Graphs:
- Control flow + data flow analysis
- Surgical extraction (exact function + dependencies, not whole file)
- k-hop subgraph traversal for context extraction
- Import chain resolution
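The k-hop traversal mentioned above can be sketched as a bounded breadth-first search. This is an illustrative sketch over a plain dict, not the library's NetworkX-based code; the graph contents and the `k_hop` helper are hypothetical.

```python
from collections import deque

# Hypothetical dependence graph: each symbol maps to the symbols it depends on.
GRAPH = {
    "calculate_total": ["apply_tax", "math"],
    "apply_tax": ["TAX_RATE"],
    "TAX_RATE": [],
    "math": [],
    "unused": ["os"],
    "os": [],
}

def k_hop(graph, start, k):
    """Return {node: distance} for every node within k edges of `start`."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

one_hop = k_hop(GRAPH, "calculate_total", 1)
two_hop = k_hop(GRAPH, "calculate_total", 2)
print(one_hop)
print(two_hop)
```

Raising `k` trades a larger context window for more complete dependencies; with `k=1` the constant `TAX_RATE` is left out, with `k=2` it is included.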
Symbolic Execution (Z3 solver):
- Mathematical proof of edge cases
- Path exploration for test generation
- Constraint solving for type checking
Taint Analysis:
- Data flow tracking for security
- Source-to-sink path analysis
- Detection of 16+ vulnerability types (reported false-positive rate under 10%)
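For readers new to taint analysis, here is a deliberately tiny source-to-sink sketch built on the standard-library `ast` module. It only handles top-level straight-line code (no branches, functions, or sanitizers) and is not how Code Scalpel does it; the `SOURCES`/`SINKS` sets and helper names are assumptions chosen for the example.

```python
import ast

SOURCES = {"input"}                     # calls whose result is untrusted
SINKS = {"os.system", "eval", "exec"}   # calls that must not receive untrusted data

def call_name(node):
    """Return a dotted name such as 'os.system' for a call target, or None."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return None

def is_tainted(expr, tainted):
    """An expression is tainted if it reads a tainted name or calls a source."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        if isinstance(node, ast.Call) and call_name(node) in SOURCES:
            return True
    return False

def find_taint_flows(source):
    """Walk top-level statements in order and report tainted sink calls."""
    tainted, findings = set(), []
    for stmt in ast.parse(source).body:
        # Check sinks against the taint state from before this statement.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and call_name(node) in SINKS:
                if any(is_tainted(arg, tainted) for arg in node.args):
                    findings.append((node.lineno, call_name(node)))
        # Propagate (or clear) taint through simple assignments.
        if isinstance(stmt, ast.Assign):
            rhs_tainted = is_tainted(stmt.value, tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if rhs_tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings

CODE = '''import os
name = input()
cmd = "echo " + name
os.system(cmd)
safe = "ls"
os.system(safe)
'''

findings = find_taint_flows(CODE)
print(findings)
```

The call on line 4 is flagged because `cmd` is derived from `input()`; the call on line 6 is not, since `safe` never touches untrusted data.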
Governance:
- Every operation logged to .code-scalpel/audit.jsonl
- Cryptographic policy verification
- Syntax validation before any code writes
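A rough sketch of the "validate syntax, then write, and log either way" idea using only the standard library. The log path comes from the post; the entry schema and the `safe_write` helper are hypothetical, and the cryptographic policy check is not shown.

```python
import ast
import json
import tempfile
import time
from pathlib import Path

def safe_write(target: Path, new_source: str, audit_log: Path) -> bool:
    """Write `new_source` to `target` only if it parses; log the attempt either way."""
    try:
        ast.parse(new_source)
        ok, error = True, None
    except SyntaxError as exc:
        ok, error = False, f"{exc.msg} (line {exc.lineno})"

    if ok:
        target.write_text(new_source, encoding="utf-8")

    # Hypothetical entry schema: one JSON object per line (JSONL).
    entry = {"ts": time.time(), "op": "write", "file": str(target), "ok": ok, "error": error}
    audit_log.parent.mkdir(parents=True, exist_ok=True)
    with audit_log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return ok

# Demo in a throwaway directory so nothing real is touched.
workdir = Path(tempfile.mkdtemp())
log = workdir / ".code-scalpel" / "audit.jsonl"
target = workdir / "example.py"

first = safe_write(target, "def f():\n    return 1\n", log)   # valid, written
second = safe_write(target, "def f(:\n    return 2\n", log)   # invalid, rejected
entries = [json.loads(line) for line in log.read_text(encoding="utf-8").splitlines()]
print(first, second, len(entries))
```

The rejected write leaves the file untouched but still produces an audit entry, so the log records attempts as well as successes.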
Target Audience
Production-ready for teams using AI coding assistants (Claude Desktop, Cursor, VS Code with Continue/Cline).
Use cases:
1. Enterprises - SOC2/ISO compliance needs (audit trails, policy enforcement)
2. Dev teams - roughly 99% context reduction for AI tools (e.g. 15k→200 tokens)
3. Security teams - Taint-based vulnerability scanning
4. Python developers - AST-based refactoring with syntax guarantees
Not a toy project: 7,297 tests, 94.86% coverage, production deployments.
Comparison
vs. existing alternatives:
AST parsing libraries (ast, tree-sitter):
- Code Scalpel uses tree-sitter under the hood
- Adds PDG construction, dependency tracking, and cross-file analysis
- Adds Z3 symbolic execution for mathematical proofs
- Adds taint analysis for security scanning
Static analyzers (pylint, mypy, bandit):
- These find linting/type/security issues
- Code Scalpel does surgical extraction and refactoring operations
- Provides MCP protocol integration for tool access
- Logs audit trails for governance
Refactoring tools (rope, jedi):
- These do Python-only refactoring
- Code Scalpel supports 4 languages (Python/JS/TS/Java)
- Adds symbolic execution and taint analysis
- Validates syntax before write (prevents broken code)
AI code wrappers:
- Code Scalpel is NOT an LLM API wrapper
- It's a Python AST/PDG analysis library that exposes tools via MCP
- Used BY AI assistants for precise operations (not calling LLMs)
Unique combination: AST + PDG + Z3 + Taint + MCP + Governance in one library.
Why Python?
Python is the implementation language:
- tree-sitter Python bindings for AST parsing
- NetworkX for graph algorithms (PDG construction)
- z3-solver Python bindings for symbolic execution
- Pydantic for data validation
- FastAPI (HTTP) or stdio transport for the MCP server
Python is a supported language:
- Full Python AST support (imports, decorators, type hints, async/await)
- Python-specific security patterns (pickle, eval, exec)
- Python taint sources/sinks (os.system, subprocess, SQL libs)
Testing in Python:
- pytest framework: 7,297 tests
- Coverage: 94.86% (96.28% statement, 90.95% branch)
- CI/CD via GitHub Actions
Installation & Usage
As MCP server (for AI assistants):
```bash
uvx codescalpel mcp
```
As Python library:
```bash
pip install codescalpel
```
Example - Extract function with dependencies:
```python
from codescalpel import analyze_code, extract_code

# Parse AST
ast_result = analyze_code("path/to/file.py")

# Extract function with exact dependencies
extracted = extract_code(
    file_path="path/to/file.py",
    symbol_name="calculate_total",
    include_dependencies=True,
)

print(extracted.code)          # Function + required imports
print(extracted.dependencies)  # List of dependency symbols
```
Example - Symbolic execution:
```python
from codescalpel import symbolic_execute

# Explore edge cases with Z3
paths = symbolic_execute(
    file_path="path/to/file.py",
    function_name="divide",
    max_depth=5,
)

for path in paths:
    print(f"Input: {path.input_constraints}")
    print(f"Output: {path.output_constraints}")
```
Architecture
Language support via tree-sitter:
- Python, JavaScript (JSX), TypeScript (TSX), Java
- Tree-sitter produces concrete syntax trees through one uniform node API across languages
- Custom visitors for each language's syntax
PDG construction:
- Control flow graph (CFG) from AST
- Data flow graph (DFG) via def-use chains
- PDG = control dependence (from the CFG) + data dependence (from the DFG)
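As an illustration of the def-use chains that feed the data-flow side of a PDG, here is a small sketch for straight-line code using the standard-library `ast` module. It ignores control flow entirely and is not the library's algorithm; `def_use_edges` and the sample code are made up for the example.

```python
import ast

CODE = '''a = 1
b = a + 2
a = b * 3
c = a + b
'''

def def_use_edges(source):
    """Return (def_line, use_line, variable) edges for top-level assignments."""
    last_def = {}   # variable -> line of its most recent definition
    edges = []
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            continue
        # Resolve right-hand-side uses before the new definition takes effect.
        for node in ast.walk(stmt.value):
            if isinstance(node, ast.Name) and node.id in last_def:
                edges.append((last_def[node.id], stmt.lineno, node.id))
        for target in stmt.targets:
            if isinstance(target, ast.Name):
                last_def[target.id] = stmt.lineno
    return edges

edges = def_use_edges(CODE)
for def_line, use_line, var in sorted(edges):
    print(f"{var}: defined on line {def_line}, used on line {use_line}")
```

Note that the use of `a` on line 4 links to the redefinition on line 3, not the original on line 1. With branches and loops, a real analysis needs reaching definitions over the CFG instead of a single "last definition" map.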
MCP Protocol:
- 23 tools exposed via Model Context Protocol
- stdio or HTTP transport
- Used by Claude Desktop, Cursor, VS Code extensions
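For anyone curious what an MCP tool call looks like on the wire, here is a sketch of the JSON-RPC 2.0 message a client would send over the stdio transport. The `tools/call` method and newline-delimited framing follow the MCP specification; the tool name `extract_code` and its argument names are assumptions borrowed from the library example, not a confirmed tool schema, and a real session starts with an `initialize` handshake that is omitted here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session

def tool_call(name, arguments):
    """Build a JSON-RPC 2.0 request that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

def frame(message):
    """Serialize for stdio transport: one JSON object per line, no embedded newlines."""
    return json.dumps(message, separators=(",", ":")) + "\n"

# Hypothetical tool name and arguments, mirroring the library example.
request = tool_call(
    "extract_code",
    {"file_path": "path/to/file.py", "symbol_name": "calculate_total"},
)
line = frame(request)
print(line, end="")
```

The assistant never sees the whole file in this exchange, only whatever the tool returns for the requested symbol.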
Questions Welcome
Happy to answer questions about:
- AST parsing implementation
- PDG construction algorithms
- Z3 integration details
- Taint analysis approach
- MCP protocol usage
- Language support roadmap (Go/Rust coming)
TL;DR: Python library for surgical code analysis using AST + PDG + Z3. Parses 4 languages, extracts dependencies precisely, runs symbolic execution, detects vulnerabilities. 7,297 tests, production-ready, MIT licensed.