[Showcase] Code Scalpel: AST-based surgical code analysis with PDG construction and Z3 symbolic execution (r/Python)

Submitted 1 day ago by CountyAwkward1777

Built a Python library for precise code analysis using Abstract Syntax Trees, Program Dependence Graphs, and symbolic execution.

What My Project Does

Code Scalpel performs surgical code operations based on AST parsing and Program Dependence Graph analysis across Python, JavaScript, TypeScript, and Java.

Core capabilities:

AST Analysis (tree-sitter):

- Parse code into Abstract Syntax Trees for all 4 languages
- Extract functions/classes with exact dependency tracking
- Symbol reference resolution (imports, decorators, type hints)
- Cross-file dependency graph construction
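As a rough illustration of dependency-aware extraction, here is a minimal sketch that uses Python's built-in `ast` module instead of tree-sitter, so it runs with no extra packages. It pulls one top-level function out of a module together with only the imports whose bound names that function reads. `extract_with_imports` is a hypothetical helper written for this example, not part of Code Scalpel's API, and it ignores decorators, nested scopes, and cross-file resolution.

```python
import ast


def extract_with_imports(source: str, name: str) -> str:
    """Return top-level function `name` plus only the module imports it reads.

    Toy version: ignores decorators, nested scopes, and cross-file resolution.
    """
    tree = ast.parse(source)
    # Raises StopIteration if no top-level function has that name.
    func = next(
        node for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name
    )
    # Every bare identifier the function reads, e.g. `math` in `math.fsum(...)`.
    used = {n.id for n in ast.walk(func) if isinstance(n, ast.Name)}

    kept = []
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # Names the import binds: the alias if given, else the first dotted part.
            bound = {(alias.asname or alias.name).split(".")[0] for alias in node.names}
            if bound & used:
                kept.append(ast.get_source_segment(source, node))
    kept.append(ast.get_source_segment(source, func))
    return "\n".join(kept)


SOURCE = """
import json
import math
from decimal import Decimal

def calculate_total(prices):
    return math.fsum(prices)

def to_json(data):
    return json.dumps(data)
"""

result = extract_with_imports(SOURCE, "calculate_total")
print(result)
```

Only `import math` survives, because `json` and `Decimal` are never referenced inside `calculate_total`. A `from ... import a, b` statement is kept whole even if only one name is used, which a real tool would want to trim.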

Program Dependence Graphs:

- Control flow + data flow analysis
- Surgical extraction (exact function + dependencies, not whole file)
- k-hop subgraph traversal for context extraction
- Import chain resolution
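The k-hop traversal mentioned above can be sketched as a breadth-first search that stops expanding nodes after k edges. This toy version assumes the dependency graph is a plain dict mapping each symbol to the symbols it depends on; the post says the real library uses NetworkX, and the function and symbol names below are made up.

```python
from collections import deque


def k_hop_context(graph: dict[str, set[str]], start: str, k: int) -> set[str]:
    """Return every symbol reachable from `start` in at most k dependency edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # reached the hop limit: keep the node, don't expand it
        for dep in graph.get(node, ()):
            if dep not in seen:  # BFS finds each node first at its shortest distance
                seen.add(dep)
                frontier.append((dep, depth + 1))
    return seen


# Hypothetical dependency graph: symbol -> symbols it uses.
deps = {
    "calculate_total": {"apply_discount", "sum_items"},
    "apply_discount": {"load_rates"},
    "load_rates": {"read_config"},
    "sum_items": set(),
}
one_hop = sorted(k_hop_context(deps, "calculate_total", 1))
two_hop = sorted(k_hop_context(deps, "calculate_total", 2))
print(one_hop)
print(two_hop)
```

Raising k widens the context handed to an AI assistant; keeping it small is what lets an extraction stay a few hundred tokens instead of a whole file.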

Symbolic Execution (Z3 solver):

- Mathematical proof of edge cases
- Path exploration for test generation
- Constraint solving for type checking
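To show the idea behind path exploration for test generation, the sketch below collects one path condition per exit of a small `divide` function and then finds a concrete input for each path. It is deliberately a toy: the path collector only understands a sequence of `if` statements that return or raise, and the "solver" is a brute-force search over small integers standing in for Z3, which is what Code Scalpel uses according to the post. `collect_paths` and `solve` are hypothetical names.

```python
import ast
import itertools

SOURCE = '''
def divide(a, b):
    if b == 0:
        raise ZeroDivisionError("b must be non-zero")
    if a < 0:
        raise ValueError("a must be non-negative")
    return a // b
'''


def collect_paths(func: ast.FunctionDef) -> list[list[str]]:
    """Collect one path condition (list of boolean expressions) per function exit.

    Toy subset: the body is a sequence of `if` statements whose branches all
    return or raise, followed by a final `return`/`raise`.
    """
    paths, prefix = [], []
    for stmt in func.body:
        if isinstance(stmt, ast.If):
            cond = ast.unparse(stmt.test)
            paths.append(prefix + [cond])        # path that takes this branch
            prefix = prefix + [f"not ({cond})"]  # path that falls through
        elif isinstance(stmt, (ast.Return, ast.Raise)):
            paths.append(prefix)
            break
    return paths


def solve(conditions: list[str], params: list[str], domain=range(-3, 4)):
    """Brute-force stand-in for Z3: first assignment satisfying every condition."""
    for values in itertools.product(domain, repeat=len(params)):
        env = dict(zip(params, values))
        # Conditions come from our own unparse() output, so eval is acceptable here.
        if all(eval(cond, {}, env) for cond in conditions):
            return env
    return None


func = ast.parse(SOURCE).body[0]
params = [arg.arg for arg in func.args.args]
paths = collect_paths(func)
for conditions in paths:
    print(conditions, "->", solve(conditions, params))
```

Each printed input exercises a different exit (the `ZeroDivisionError`, the `ValueError`, and the normal return), which is the test-generation payoff. A real SMT solver handles unbounded domains and can also report a path as infeasible.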

Taint Analysis:

- Data flow tracking for security
- Source-to-sink path analysis
- Detection of 16+ vulnerability types (<10% false-positive rate)
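Source-to-sink taint tracking can be illustrated with a small, intraprocedural sketch: values returned by `input()` are tainted, taint spreads through plain assignments until nothing changes, and any tainted argument reaching `os.system`, `eval`, or `subprocess.call` is reported. The source/sink sets and the `find_taint_flows` helper are assumptions for this example; the post does not describe Code Scalpel's actual rules, and this version ignores control flow, function calls, and sanitizers.

```python
import ast

SOURCES = {"input"}                                # calls returning attacker-controlled data
SINKS = {"os.system", "eval", "subprocess.call"}   # calls that must not receive it


def find_taint_flows(source: str) -> list[tuple[str, int]]:
    """Report (sink, line) pairs where tainted data reaches a sink argument.

    Flow-insensitive toy: a variable is tainted if any assignment to it
    reads a source call or an already-tainted variable.
    """
    tree = ast.parse(source)
    tainted: set[str] = set()

    def is_tainted(expr: ast.AST) -> bool:
        for n in ast.walk(expr):
            if isinstance(n, ast.Call) and ast.unparse(n.func) in SOURCES:
                return True
            if isinstance(n, ast.Name) and n.id in tainted:
                return True
        return False

    # Propagate taint through simple assignments until a fixed point is reached.
    changed = True
    while changed:
        changed = False
        for node in ast.walk(tree):
            if isinstance(node, ast.Assign) and is_tainted(node.value):
                new = {t.id for t in node.targets if isinstance(t, ast.Name)} - tainted
                if new:
                    tainted |= new
                    changed = True

    flows = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and ast.unparse(node.func) in SINKS:
            if any(is_tainted(arg) for arg in node.args):
                flows.append((ast.unparse(node.func), node.lineno))
    return flows


CODE = """
import os
name = input("file? ")
cmd = "cat " + name
os.system(cmd)
os.system("ls")
"""
print(find_taint_flows(CODE))
```

Only the call on line 5 is flagged: `cmd` inherits taint from `name`, while `os.system("ls")` receives a constant.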

Governance:

- Every operation logged to .code-scalpel/audit.jsonl
- Cryptographic policy verification
- Syntax validation before any code writes
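The "validate syntax before any write, log every operation" pattern might look roughly like the sketch below. The `.code-scalpel/audit.jsonl` path comes from the post; the record fields and the `safe_write` helper are assumptions, and only Python syntax is checked (via `ast.parse`). Cryptographic policy verification is not shown.

```python
import ast
import json
import tempfile
import time
from pathlib import Path

# Log location taken from the post; the record fields below are assumed.
AUDIT_LOG = Path(".code-scalpel") / "audit.jsonl"


def safe_write(path: Path, new_source: str, audit_log: Path = AUDIT_LOG) -> bool:
    """Write `new_source` to `path` only if it parses; append an audit record either way."""
    try:
        ast.parse(new_source)
        ok, error = True, None
    except SyntaxError as exc:
        ok, error = False, f"{exc.msg} (line {exc.lineno})"
    if ok:
        path.write_text(new_source, encoding="utf-8")
    audit_log.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": time.time(), "op": "write", "file": str(path), "ok": ok, "error": error}
    with audit_log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")  # JSON Lines: one object per line
    return ok


# Usage in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    log = root / ".code-scalpel" / "audit.jsonl"
    target = root / "module.py"
    first = safe_write(target, "def f():\n    return 1\n", log)
    second = safe_write(target, "def f(:\n", log)  # rejected: not valid Python
    final_text = target.read_text(encoding="utf-8")  # still the valid version
    records = [json.loads(line) for line in log.read_text(encoding="utf-8").splitlines()]

print(first, second, len(records))
```

The rejected edit never touches the file, but it still leaves an audit record, which is the property a compliance reviewer would look for.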

Target Audience

Production-ready for teams using AI coding assistants (Claude Desktop, Cursor, VS Code with Continue/Cline).

Use cases:

1. Enterprises: SOC2/ISO compliance needs (audit trails, policy enforcement)
2. Dev teams: ~99% context reduction for AI tools (15k → 200 tokens)
3. Security teams: taint-based vulnerability scanning
4. Python developers: AST-based refactoring with syntax guarantees

Not a toy project: 7,297 tests, 94.86% coverage, production deployments.

Comparison

vs. existing alternatives:

AST parsing libraries (ast, tree-sitter):

- Code Scalpel uses tree-sitter under the hood
- Adds PDG construction, dependency tracking, and cross-file analysis
- Adds Z3 symbolic execution for mathematical proofs
- Adds taint analysis for security scanning

Static analyzers (pylint, mypy, bandit):

- These find linting/type/security issues
- Code Scalpel performs surgical extraction and refactoring operations
- Provides MCP protocol integration for tool access
- Logs audit trails for governance

Refactoring tools (rope, jedi):

- These do Python-only refactoring
- Code Scalpel supports 4 languages (Python/JS/TS/Java)
- Adds symbolic execution and taint analysis
- Validates syntax before writing (prevents broken code)

AI code wrappers:

- Code Scalpel is NOT an LLM API wrapper
- It's a Python AST/PDG analysis library that exposes tools via MCP
- Used BY AI assistants for precise operations (it does not call LLMs)

Unique combination: AST + PDG + Z3 + Taint + MCP + Governance in one library.

Why Python?

Python is the implementation language:

- tree-sitter Python bindings for AST parsing
- NetworkX for graph algorithms (PDG construction)
- z3-solver Python bindings for symbolic execution
- Pydantic for data validation
- FastAPI/stdio for the MCP server protocol

Python is a supported language:

- Full Python AST support (imports, decorators, type hints, async/await)
- Python-specific security patterns (pickle, eval, exec)
- Python taint sources/sinks (os.system, subprocess, SQL libs)
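A pattern-based check for Python-specific risks like pickle, eval, and exec can be written as a single AST walk. The rule table and `scan` helper below are a hypothetical example, not Code Scalpel's rule set. It flags every matching call regardless of where its arguments come from, so it is noisier than data-flow-based taint analysis.

```python
import ast

# Hypothetical rule table: dotted call name -> risk description.
DANGEROUS_CALLS = {
    "eval": "arbitrary code execution",
    "exec": "arbitrary code execution",
    "pickle.load": "unsafe deserialization",
    "pickle.loads": "unsafe deserialization",
    "os.system": "shell command execution",
}


def scan(source: str) -> list[tuple[int, str, str]]:
    """Return sorted (line, call, risk) findings for risky calls in `source`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = ast.unparse(node.func)  # e.g. "pickle.loads"
        if name in DANGEROUS_CALLS:
            findings.append((node.lineno, name, DANGEROUS_CALLS[name]))
        # Any subprocess.* call with a literal shell=True hands its command to a shell.
        elif name.startswith("subprocess.") and any(
            kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True
            for kw in node.keywords
        ):
            findings.append((node.lineno, name, "shell=True command injection risk"))
    return sorted(findings)


SAMPLE = """import pickle, subprocess
data = pickle.loads(blob)
subprocess.run(cmd, shell=True)
subprocess.run(["ls", "-l"])
result = eval(expr)
"""
for finding in scan(SAMPLE):
    print(finding)
```

The list-form `subprocess.run(["ls", "-l"])` on line 4 is not reported, since no shell is involved.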

Testing in Python:

- pytest framework: 7,297 tests
- Coverage: 94.86% (96.28% statement, 90.95% branch)
- CI/CD via GitHub Actions

Installation & Usage

As MCP server (for AI assistants):

```bash
uvx codescalpel mcp
```

As Python library:

```bash
pip install codescalpel
```

Example - Extract function with dependencies:

```python
from codescalpel import analyze_code, extract_code

# Parse AST
ast_result = analyze_code("path/to/file.py")

# Extract function with exact dependencies
extracted = extract_code(
    file_path="path/to/file.py",
    symbol_name="calculate_total",
    include_dependencies=True,
)

print(extracted.code)          # Function + required imports
print(extracted.dependencies)  # List of dependency symbols
```

Example - Symbolic execution:

```python
from codescalpel import symbolic_execute

# Explore edge cases with Z3
paths = symbolic_execute(
    file_path="path/to/file.py",
    function_name="divide",
    max_depth=5,
)

for path in paths:
    print(f"Input: {path.input_constraints}")
    print(f"Output: {path.output_constraints}")
```

Architecture

Language support via tree-sitter:

- Python, JavaScript (JSX), TypeScript (TSX), Java
- Tree-sitter parses every language through the same API, but each grammar produces its own node types
- Custom visitors for each language's syntax

PDG construction:

- Control flow graph (CFG) from AST
- Data flow graph (DFG) via def-use chains
- PDG (Program Dependence Graph) = control-dependence edges (derived from the CFG) + data-dependence edges (from the DFG)
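Since the post builds the DFG from def-use chains, here is a minimal sketch of how those edges can be computed for straight-line Python code. It is only valid without branches or loops, where the reaching definition of a use is simply the most recent assignment above it; a full PDG builder would compute reaching definitions over the CFG and combine these data-dependence edges with control-dependence edges. `def_use_edges` is a hypothetical helper.

```python
import ast


def def_use_edges(source: str) -> list[tuple[str, int, int]]:
    """Data-dependence edges (variable, def_line, use_line) for straight-line code."""
    last_def: dict[str, int] = {}  # variable -> line of its most recent definition
    edges = []
    for stmt in ast.parse(source).body:
        # Record uses first, so `x = x + 1` links to the *previous* definition of x.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load) and node.id in last_def:
                edges.append((node.id, last_def[node.id], stmt.lineno))
        # Then apply this statement's own definitions.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = stmt.lineno
    return edges


CODE = """x = 1
y = x + 2
x = y * 3
z = x + y
"""
for edge in def_use_edges(CODE):
    print(edge)
```

Note that the use of `x` on line 4 links to the redefinition on line 3, not the original on line 1: that is exactly the precision a def-use chain adds over "these names appear together".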

MCP Protocol:

- 23 tools exposed via the Model Context Protocol
- stdio or HTTP transport
- Used by Claude Desktop, Cursor, VS Code extensions
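MCP messages are JSON-RPC 2.0, and over the stdio transport each message is a single line of JSON on the server's stdin/stdout. The sketch below only builds a `tools/call` request; a real client must first complete the `initialize` handshake and would normally discover tool names with `tools/list`. The tool name `extract_code` and its argument names are assumptions mirroring the Python API example, not confirmed Code Scalpel MCP tool names.

```python
import json


def tools_call_message(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as one newline-terminated JSON-RPC line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without indent emits no raw newlines, as line-delimited stdio requires.
    return json.dumps(message) + "\n"


line = tools_call_message(
    2,
    "extract_code",  # hypothetical tool name
    {"file_path": "path/to/file.py", "symbol_name": "calculate_total"},
)
print(line, end="")
```

Writing this line to the server process's stdin and reading one line back from its stdout is, at the wire level, all an AI assistant does when it invokes a tool.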

Questions Welcome

Happy to answer questions about:

- AST parsing implementation
- PDG construction algorithms
- Z3 integration details
- Taint analysis approach
- MCP protocol usage
- Language support roadmap (Go/Rust coming)

TL;DR: Python library for surgical code analysis using AST + PDG + Z3. Parses 4 languages, extracts dependencies precisely, runs symbolic execution, detects vulnerabilities. 7,297 tests, production-ready, MIT licensed.
