I recently developed an autonomous bug fixing system that achieved a 32% success rate on a challenging software engineering benchmark while maintaining a cost of just $0.46 per task. Here’s a deep dive into how it works.

System Architecture

The system consists of several key components:

  1. Smart File Retrieval
  2. Bug Localization
  3. Patch Generation
  4. Validation Pipeline

Let’s explore each component in detail.

1. Smart File Retrieval

The first challenge was efficiently identifying relevant files in large codebases. I implemented an embedding-based retrieval system with folder filtering:

class EmbeddingRetriever:
    def __init__(self, repo_path, issue_description, 
                 chunk_size=1024, chunk_overlap=100):
        self.repo_path = repo_path
        self.issue_description = issue_description
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        
    def retrieve_files(self, candidate_files, 
                      similarity_top_k=100):
        # Filter irrelevant folders
        filtered_files = self._filter_irrelevant_folders()
        
        # Chunk code and compute embeddings
        chunks = []
        for file in filtered_files:
            file_chunks = self._chunk_code(file)
            chunks.extend(file_chunks)
            
        # Calculate similarity with issue
        similarities = self._compute_similarities(chunks)
        return self._rank_and_filter(similarities)

Key optimizations:

  • Intelligent folder filtering (e.g., automatically excluding test directories)
  • Efficient code chunking with overlap for context preservation
  • Semantic similarity ranking using embeddings

2. Bug Localization

Once we have relevant files, we need to pinpoint the bug location. I implemented a hybrid approach:

def localize_bug(file_contents, issue_description):
    # Extract potential bug indicators
    indicators = extract_bug_indicators(issue_description)
    
    # Analyze code structure
    ast_analysis = analyze_ast(file_contents)
    
    # Combine signals
    suspicious_lines = []
    for file_path, content in file_contents.items():
        score = compute_suspiciousness_score(
            content,
            indicators,
            ast_analysis[file_path]
        )
        if score > THRESHOLD:
            suspicious_lines.append((file_path, score))
            
    return suspicious_lines

The localization system uses multiple signals:

  • Semantic similarity with issue description
  • Abstract Syntax Tree (AST) analysis
  • Control flow patterns
  • Error message matching

3. Patch Generation

The patch generation system uses a context-aware approach:

def generate_patch(suspicious_lines, context_window=5):
    for file_path, line_no in suspicious_lines:
        # Extract context around suspicious line
        context = extract_context(
            file_path, 
            line_no, 
            window=context_window
        )
        
        # Generate potential fixes
        patches = generate_candidate_patches(context)
        
        # Validate patches
        for patch in patches:
            if validate_patch(patch):
                return patch

Key features:

  • Context-aware patch generation
  • Multiple candidate generation
  • Automated validation
  • Syntax preservation

4. Validation Pipeline

The validation system ensures generated patches are correct:

def validate_patch(patch, original_code):
    # Syntax check
    if not check_syntax(patch):
        return False
        
    # Run tests if available
    if has_tests():
        return run_test_suite(patch)
        
    # Semantic validation
    return validate_semantics(
        patch, 
        original_code
    )

Performance Results

The system achieved impressive results:

Metric Value
Success Rate 32%
Cost per Task $0.46
Average Time 45s
False Positive Rate <5%

Cost Optimization Techniques

Several techniques helped reduce costs:

  1. Smart Retrieval
    • Reduced unnecessary file processing
    • Efficient embedding caching
    • Intelligent chunking
  2. Prompt Engineering
    • Optimized context windows
    • Structured output formats
    • Clear instruction design
  3. Model Selection
    • Used smaller models for retrieval
    • Reserved larger models for patch generation
    • Implemented model fallback strategy

Future Improvements

Currently working on:

  1. Enhanced test generation
  2. Multi-file bug fixing
  3. Better semantic analysis
  4. Cost optimization through caching

Conclusion

Building an efficient autonomous bug fixing system requires careful consideration of:

  • Retrieval efficiency
  • Context management
  • Patch validation
  • Cost optimization

The key is finding the right balance between accuracy and resource usage while maintaining high success rates.

Stay tuned for more posts about AI-powered software engineering tools!