Revised GSoC Proposal for ABI Compliance Checker

Introduction

This proposal outlines a plan to develop and deploy an automated ABI Compliance Checker for PostgreSQL. The system will monitor commits to the PostgreSQL repository, analyze binary compatibility, and alert developers when a change on a stable branch would break the ABI in a minor release. This supports PostgreSQL's Server API and ABI Stability Guidance, which aims to keep the ABI compatible across minor releases of the same major version.

The incident in PostgreSQL 17.1, where an unintended ABI change (a change in the size of the ResultRelInfo struct) was shipped and then repaired in 17.2, shows that manual code review alone does not reliably catch such changes. The proposed system will help prevent similar incidents by detecting potential ABI compatibility issues early and notifying committers.

Biographical Information

About Me

Name: Yash Singh
Email: [email protected]
Phone: +91 8755-765-125
GitHub: [https://github.com/yashs33244/]
LinkedIn: [https://www.linkedin.com/in/yash-singh-2757aa1b4/]
University: Indian Institute of Information Technology Una, Himachal Pradesh
Program: Bachelor of Technology (2022-2026)
Timezone: IST (UTC+5:30)
Working Hours: 10 AM - 7 PM IST (flexible to overlap with mentor timezones)
Preferred Communication: Slack, Email, GitHub

Technical Background

I have experience in:

  • C/C++ programming and understanding of binary compatibility issues
  • Linux system administration and automation
  • CI/CD pipeline development
  • Web development (HTML, CSS, JavaScript)
  • PostgreSQL database usage and internals

Project Objectives and Expected Results

Primary Objectives

  1. Create an automated ABI compliance checking system that runs on a flexible schedule (cron-based)
  2. Generate structured ABI analysis data in machine-readable formats (JSON/XML)
  3. Implement an alert system for ABI compatibility violations
  4. Integrate the system with PostgreSQL's existing CI infrastructure
  5. Provide a foundation for future report visualization

Expected Results

  1. A reliable, low-maintenance system that automatically checks ABI compatibility
  2. Structured data output in standard formats for further processing
  3. Notification system to alert committers about potential ABI violations
  4. Documentation for the system's operation and maintenance
  5. Comprehensive test suite to ensure the system's reliability

Why PostgreSQL?

PostgreSQL is a cornerstone of open-source database technology known for its reliability, feature robustness, and extensibility. The extension ecosystem surrounding PostgreSQL is particularly vital, making ABI stability between minor releases crucial for the community. By working on this project, I hope to:

  1. Contribute to the PostgreSQL ecosystem's stability and reliability
  2. Help extension developers by ensuring API/ABI consistency
  3. Learn more about PostgreSQL's internal architecture and development workflows
  4. Become an active member of the PostgreSQL community

Deliverables and Implementation Plan

System Architecture

The system will consist of the following major components:

flowchart TB
    subgraph "PostgreSQL Repository"
        PGRepo["Git Repository"]
    end
    subgraph "ABI Checker Infrastructure"
        Scheduler["Cron Job Scheduler"]
        BuildSystem["Build System"]
        ABIAnalyzer["ABI Analysis Tool"]
        DataGenerator["Structured Data Generator"]
        DB[(Report Database)]
        AlertSystem["Alert System"]
    end
    subgraph "Data Access Layer"
        API["API for Data Access"]
        DataTransformer["Data Transformation Tools"]
    end
    PGRepo --> Scheduler
    Scheduler --> BuildSystem
    BuildSystem --> ABIAnalyzer
    ABIAnalyzer --> DataGenerator
    DataGenerator --> DB
    DB --> API
    API --> DataTransformer
    ABIAnalyzer -- "ABI Changes Detected" --> AlertSystem

Technical Implementation Details

1. ABI Extraction and Analysis

I plan to evaluate several ABI checking tools (abi-dumper, libabigail, ABI Compliance Checker, and Pkg-ABIDiff) during the initial phase of the project. This evaluation will consider:

  • Accuracy in detecting ABI-breaking changes
  • Performance and resource requirements
  • Output format and flexibility
  • Integration capabilities with CI systems
  • Community support and maintenance

The workflow will be:

sequenceDiagram
    participant Scheduler as Cron Scheduler
    participant Builder as Build System
    participant ABI as ABI Analyzer
    participant DB as Database
    participant Alert as Alert System
    Scheduler->>Builder: Trigger build at scheduled time
    Builder->>Builder: Checkout commit
    Builder->>Builder: Build PostgreSQL
    Builder->>ABI: Extract ABI from libraries and headers
    alt First build of this branch
        ABI->>DB: Store baseline ABI
    else Subsequent build
        ABI->>DB: Retrieve previous ABI
        ABI->>ABI: Compare with current ABI
        alt ABI changes detected
            ABI->>DB: Store structured data (JSON/XML)
            ABI->>Alert: Send notification
        else No ABI changes
            ABI->>DB: Store "no change" record
        end
    end
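
As an illustration of the extract-and-compare step, here is a minimal sketch that assumes libabigail ends up being the selected tool (the evaluation has not happened yet, so this is only one possible outcome). It wraps the `abidw` and `abidiff` command-line programs and decodes the bit flags of the `abidiff` exit status as described in the libabigail manual. The function names are hypothetical, and the binaries must be built with debug information for `abidw` to produce useful output.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Bit flags of the abidiff exit status (from the libabigail manual).
ABIDIFF_ERROR = 1
ABIDIFF_USAGE_ERROR = 2
ABIDIFF_ABI_CHANGE = 4
ABIDIFF_ABI_INCOMPATIBLE_CHANGE = 8


def interpret_abidiff_status(status: int) -> str:
    """Map an abidiff exit status to a coarse verdict."""
    if status & (ABIDIFF_ERROR | ABIDIFF_USAGE_ERROR):
        return "error"
    if status & ABIDIFF_ABI_INCOMPATIBLE_CHANGE:
        return "incompatible"
    if status & ABIDIFF_ABI_CHANGE:
        return "changed"
    return "unchanged"


def dump_abi(binary: Path, out_file: Path) -> None:
    """Write an ABI XML dump of `binary` using abidw."""
    subprocess.run(["abidw", "--out-file", str(out_file), str(binary)], check=True)


def compare_abi(baseline: Path, current: Path) -> tuple[str, str]:
    """Compare two ABI dumps and return (verdict, textual report)."""
    proc = subprocess.run(
        ["abidiff", str(baseline), str(current)],
        capture_output=True,
        text=True,
    )
    return interpret_abidiff_status(proc.returncode), proc.stdout
```

A verdict of "changed" means abidiff saw a difference it does not consider definitely incompatible; such cases would still need the classification step described later in this proposal.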

2. Build System Integration

The build system will:

  • Execute on a configurable schedule via cron
  • Build PostgreSQL for relevant branches
  • Extract ABI information from shared libraries and static components
  • Enable on-demand execution for specific commits
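
The build step could be sketched roughly as follows. This is an assumption-laden outline, not the final design: it uses the autoconf build (`./configure` and `make`) with `--enable-debug` so that DWARF-based ABI tools have debug information to read, and the function names, paths, and job count are placeholders. Separating the plan from its execution makes it easy to trigger the same plan from cron or on demand for a specific commit.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_plan(source_dir: Path, commit: str, prefix: Path, jobs: int = 4) -> list[list[str]]:
    """Return the commands needed to build one commit (nothing is executed here)."""
    return [
        # Detached checkout so that no branch pointer is moved.
        ["git", "-C", str(source_dir), "checkout", "--detach", commit],
        # Debug information is required by DWARF-based ABI tools.
        ["./configure", f"--prefix={prefix}", "--enable-debug"],
        ["make", f"-j{jobs}"],
        ["make", "install"],
    ]


def run_plan(plan: list[list[str]], source_dir: Path) -> None:
    """Execute each command inside the source tree, stopping at the first failure."""
    for argv in plan:
        subprocess.run(argv, cwd=source_dir, check=True)
```

For an on-demand run, a caller would build the plan for the requested commit hash and pass it to `run_plan`; the scheduled job would do the same for the tip of each monitored branch.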

3. Data Storage and Access

The system will:

  • Store ABI data in structured formats (JSON/XML)
  • Provide an API for accessing the data
  • Enable transformation of data for different use cases
  • Support future integration with visualization tools
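
A minimal sketch of the storage layer is shown below. It uses SQLite from the Python standard library purely as a stand-in, since the actual database has not been chosen; the table name, column names, and function names are hypothetical. The status values mirror the ones used in the workflow diagrams (baseline, stable, changed).

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS abi_reports (
    branch      TEXT NOT NULL,
    commit_hash TEXT NOT NULL,
    status      TEXT NOT NULL,   -- 'baseline', 'stable', or 'changed'
    report_json TEXT NOT NULL,
    PRIMARY KEY (branch, commit_hash)
)
"""


def open_store(path=":memory:"):
    """Open (or create) the report store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_report(conn, branch, commit_hash, status, report):
    """Insert or overwrite the report for one (branch, commit) pair."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO abi_reports VALUES (?, ?, ?, ?)",
            (branch, commit_hash, status, json.dumps(report)),
        )


def load_report(conn, branch, commit_hash):
    """Return the stored report as a dict, or None if there is none."""
    row = conn.execute(
        "SELECT report_json FROM abi_reports WHERE branch = ? AND commit_hash = ?",
        (branch, commit_hash),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Keeping the report as an opaque JSON document means the schema does not need to change when the report format evolves; an API layer can expose `load_report` results directly.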

4. Alert System

The alert system will:

  • Detect ABI changes between minor versions
  • Determine if changes violate compatibility guidelines
  • Send email notifications to committers
  • Generate severity ratings for detected changes

sequenceDiagram
    participant Developer as Developer
    participant Scheduler as Cron Scheduler
    participant Builder as Build Process
    participant ABI as ABI Analyzer
    participant DB as Database
    participant Email as Email Notification
    Scheduler->>Builder: Trigger build process
    Builder->>Builder: Checkout code
    Builder->>Builder: Configure PostgreSQL
    Builder->>Builder: Compile PostgreSQL
    Builder->>Builder: Run tests
    Builder->>ABI: Extract ABI information
    alt First build on branch
        ABI->>DB: Store baseline ABI
        DB->>DB: Update status (baseline)
    else Subsequent build
        ABI->>DB: Fetch previous ABI
        ABI->>ABI: Compare with current
        alt ABI change detected
            ABI->>DB: Store structured data
            DB->>DB: Update status (changed)
            alt Breaking change in minor version
                ABI->>Email: Send alert to committers
            end
        else No ABI change
            ABI->>DB: Record "no change"
            DB->>DB: Update status (stable)
        end
    end
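
The email notification step might look roughly like the sketch below, which uses only the Python standard library. The subject format, the addresses, and the SMTP host are assumptions made for illustration; the real recipients and mail relay would be agreed with the PostgreSQL infrastructure team.

```python
import smtplib
from email.message import EmailMessage


def build_alert(branch, commit_hash, changes, sender, recipients):
    """Compose a plain-text alert listing the breaking changes for one commit."""
    msg = EmailMessage()
    msg["Subject"] = (
        f"[ABI] {len(changes)} breaking change(s) on {branch} at {commit_hash[:8]}"
    )
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    lines = [
        # Library entries use "symbol", header entries use "name".
        f"- {c['type']}: {c.get('symbol') or c.get('name', '?')}"
        for c in changes
    ]
    msg.set_content("Potentially breaking ABI changes:\n" + "\n".join(lines))
    return msg


def send_alert(msg, host="localhost", port=25):
    """Hand the message to an SMTP relay (host and port are placeholders)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Composing and sending are kept separate so that the message content can be unit tested without a mail server.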

ABI Change Detection Algorithm

flowchart TD
    Start([Start ABI Check]) --> Build[Build PostgreSQL]
    Build --> Extract[Extract ABI from libraries and headers]
    Extract --> Query{Is this first build?}
    Query -->|Yes| SaveBaseline[Save as baseline]
    SaveBaseline --> End([End])
    Query -->|No| GetPrevious[Get previous ABI]
    GetPrevious --> Compare[Compare ABIs]
    Compare --> ChangeDetected{Changes detected?}
    ChangeDetected -->|No| RecordNoChange[Record no change]
    RecordNoChange --> End
    ChangeDetected -->|Yes| AnalyzeChanges[Analyze change types]
    AnalyzeChanges --> ClassifyChanges[Classify changes]
    ClassifyChanges --> IsMajorVersion{Is major version?}
    IsMajorVersion -->|Yes| RecordChanges[Record changes]
    RecordChanges --> End
    IsMajorVersion -->|No| IsBreaking{Is breaking change?}
    IsBreaking -->|No| RecordChanges
    IsBreaking -->|Yes| AlertDevelopers[Alert developers]
    AlertDevelopers --> RecordChanges
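
The decision logic in the flowchart can be sketched as a small pure function. One assumption is made explicit here: the "Is major version?" check is approximated by the branch name, treating anything that does not look like a stable release branch (for example `master`) as major-version development where ABI changes are expected. The regular expression and all names are illustrative.

```python
import re
from dataclasses import dataclass

# Assumption: stable release branches look like REL_17_STABLE or REL9_6_STABLE.
STABLE_BRANCH = re.compile(r"^REL_?\d+(_\d+)?_STABLE$")


@dataclass(frozen=True)
class Decision:
    status: str  # 'baseline', 'stable', or 'changed'
    alert: bool  # whether committers should be notified


def decide(has_baseline: bool, changes: list, branch: str) -> Decision:
    """Apply the flowchart: baseline, no change, or change with optional alert."""
    if not has_baseline:
        return Decision("baseline", False)
    if not changes:
        return Decision("stable", False)
    is_stable_branch = STABLE_BRANCH.match(branch) is not None
    breaking = any(c.get("is_breaking") for c in changes)
    # Changes are always recorded; alerts fire only for breaking
    # changes on a stable (minor-release) branch.
    return Decision("changed", is_stable_branch and breaking)
```

Because the function has no side effects, each path through the flowchart can be covered by a unit test without building PostgreSQL.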

Code Affected

The project primarily involves creating new code for the ABI checking system, but will need to integrate with:

  1. PostgreSQL build farm infrastructure
  2. PostgreSQL website infrastructure
  3. PostgreSQL continuous integration systems

No direct changes to PostgreSQL core code are anticipated, though I may need to add appropriate build targets or configuration options to facilitate ABI checking.

Development Approach

I plan to develop the system in stages:

  1. Evaluate and select appropriate ABI checking tools
  2. Develop a local prototype that can build PostgreSQL and extract ABI information
  3. Implement comparison logic to detect changes and generate structured data
  4. Create the data access layer for further processing
  5. Implement the notification system
  6. Work with PostgreSQL infrastructure team to deploy the system

Related Pre-Proposal Work

To prepare for this proposal, I've:

  1. Reviewed the existing PostgreSQL build farm code
  2. Tested various ABI analysis tools (abi-dumper, libabigail, etc.)
  3. Created prototype reports to understand the data format
  4. Analyzed the ABI change that occurred in PostgreSQL 17.1
  5. Researched methods for analyzing both shared libraries and statically linked components

Schedule of Deliverables

April 20 — May 20 (Pre-Selection)

  • Further research on ABI checking tools and techniques
  • Explore PostgreSQL build infrastructure
  • Set up development environment
  • Create initial prototype for local testing

May 20 — June 12 (Community Bonding Period)

  • Engage with the PostgreSQL community
  • Refine project requirements with mentors
  • Create detailed technical specification
  • Develop test cases for ABI checking
  • Evaluate and select ABI analysis tools

June 13 — July 25 (Phase I)

  • Week 1-2: Implement core ABI extraction and build system
  • Week 3-4: Develop ABI comparison logic and structured data generation
  • Week 5-6: Create data access layer for future integration

Phase I Deliverable: Working prototype that can build PostgreSQL, extract ABI information, compare versions, and generate structured data reports.

July 25 — September 12 (Phase II)

  • Week 1-2: Implement header file analysis for statically linked components
  • Week 3-4: Implement email notification system
  • Week 5-6: Develop integration with PostgreSQL infrastructure
  • Week 7: Documentation and testing

Phase II Deliverable: Complete system ready for deployment, including the data access layer, header file analysis, notification system, and documentation.

September 12 — November 21 (For Extended Timelines)

  • System deployment and integration
  • Performance optimizations
  • Additional features based on community feedback
  • Long-term maintenance planning

Post GSoC Plans

After the project is completed, I plan to:

  1. Maintain the ABI Compliance Checker system
  2. Improve it based on community feedback
  3. Explore additional automated checks that could benefit PostgreSQL
  4. Continue contributing to the PostgreSQL project in other areas

Technical Challenges and Solutions

Challenge 1: Accurately detecting ABI-breaking changes

Many ABI changes are subtle and can be difficult to detect automatically. The solution will involve:

  • Using multiple detection strategies (symbol versioning, structure layout, etc.)
  • Developing heuristics to identify likely compatibility issues
  • Creating a comprehensive test suite with known ABI changes

Challenge 2: Scalability and performance

The system needs to handle frequent commits without overloading resources. To achieve this, I plan to:

  • Implement incremental ABI analysis where possible
  • Optimize build process for ABI extraction
  • Use efficient storage for historical ABI data
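
One inexpensive form of incremental analysis is to fingerprint the extracted ABI dump and skip the full comparison when the fingerprint matches the previous one. The sketch below assumes the dump is generated deterministically and omits build paths and source line numbers; otherwise unrelated edits would change the hash. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_comparison(abi_dump: Path, last_fingerprint) -> bool:
    """True when there is no previous fingerprint or the dump content differs."""
    return last_fingerprint is None or fingerprint(abi_dump) != last_fingerprint
```

Storing only the fingerprint for "no change" commits would also keep the historical data small, since most commits on a stable branch are not expected to alter the ABI.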

Challenge 3: Integration with existing infrastructure

The PostgreSQL build farm has established practices and infrastructure that the new system must fit into. To achieve this, I plan to:

  • Work closely with infrastructure team
  • Design system to be minimally invasive
  • Follow PostgreSQL community standards and practices

Further Technical Implementation Details

ABI Analysis Process

The ABI analysis process will target the following components:

  1. Core libraries:

    • libpq (client library)
    • libpgport (portability library)
    • libpgcommon (common utility functions)
    • Server modules when built as shared objects
  2. Header files:

    • postgres.h and related headers
    • lib/stringinfo.h and other API headers
    • Public extension API headers
  3. Analysis metrics:

    • Exported symbols (added, removed, changed)
    • Structure sizes and layouts
    • Function signatures and parameter types
    • Global variables and constants
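
To show why structure sizes and layouts are listed as a metric, the following sketch uses Python's `ctypes` to model two versions of a made-up struct and report what changed. `RecordV1` and `RecordV2` are hypothetical and do not correspond to any PostgreSQL type; a real checker would obtain this information from debug data rather than hand-written definitions.

```python
import ctypes


def field_layout(struct_type) -> dict:
    """Return {field name: (offset, size)} for a ctypes Structure."""
    return {
        name: (getattr(struct_type, name).offset, getattr(struct_type, name).size)
        for name, *_ in struct_type._fields_
    }


def layout_changes(old_type, new_type) -> list:
    """List differences in total size and per-field placement."""
    old, new = field_layout(old_type), field_layout(new_type)
    changes = []
    if ctypes.sizeof(old_type) != ctypes.sizeof(new_type):
        changes.append("size changed")
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            changes.append(f"removed: {name}")
        elif name not in old:
            changes.append(f"added: {name}")
        elif old[name] != new[name]:
            changes.append(f"moved or resized: {name}")
    return changes


class RecordV1(ctypes.Structure):
    _fields_ = [("id", ctypes.c_int32), ("flags", ctypes.c_int32)]


class RecordV2(ctypes.Structure):
    # A field inserted in the middle shifts every field that follows it.
    _fields_ = [
        ("id", ctypes.c_int32),
        ("owner", ctypes.c_int64),
        ("flags", ctypes.c_int32),
    ]


if __name__ == "__main__":
    for change in layout_changes(RecordV1, RecordV2):
        print(change)
```

An extension compiled against the first version would read `flags` at its old offset, which is the kind of silent breakage the checker is meant to catch.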

Data Format for ABI Reports

The structured data will follow a format along these lines (all values shown are illustrative placeholders):

{
  "branch": "REL_17_STABLE",
  "commit": {
    "hash": "abcd1234",
    "message": "Fix memory leak in function X",
    "author": "Developer Name",
    "timestamp": "2025-06-01T10:30:00Z"
  },
  "libraries": [
    {
      "name": "libpq",
      "abi_changes": [
        {
          "type": "FUNCTION_SIGNATURE_CHANGED",
          "symbol": "PQconnectdb",
          "old_signature": "...",
          "new_signature": "...",
          "is_breaking": true
        }
      ]
    }
  ],
  "headers": [
    {
      "name": "lib/stringinfo.h",
      "api_changes": [
        {
          "type": "STRUCT_SIZE_CHANGED",
          "name": "StringInfoData",
          "old_size": 16,
          "new_size": 24,
          "is_breaking": true
        }
      ]
    }
  ]
}
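
A consumer of this format, such as the alert system or a future visualization tool, could pull out the breaking changes with a few lines of code. The sketch below assumes the field names shown in the example report; the abbreviated sample data and the symbol `PQexampleNew` are invented for illustration.

```python
import json


def breaking_changes(report: dict) -> list:
    """Return (container, change type, symbol or type name) for breaking changes."""
    found = []
    for lib in report.get("libraries", []):
        for change in lib.get("abi_changes", []):
            if change.get("is_breaking"):
                found.append((lib["name"], change["type"], change["symbol"]))
    for header in report.get("headers", []):
        for change in header.get("api_changes", []):
            if change.get("is_breaking"):
                found.append((header["name"], change["type"], change["name"]))
    return found


# Abbreviated sample in the same shape as the report format.
SAMPLE = json.loads("""
{
  "branch": "REL_17_STABLE",
  "libraries": [
    {
      "name": "libpq",
      "abi_changes": [
        {"type": "FUNCTION_SIGNATURE_CHANGED", "symbol": "PQconnectdb", "is_breaking": true},
        {"type": "FUNCTION_ADDED", "symbol": "PQexampleNew", "is_breaking": false}
      ]
    }
  ],
  "headers": []
}
""")

if __name__ == "__main__":
    for container, kind, name in breaking_changes(SAMPLE):
        print(f"{container}: {kind} {name}")
```

Only entries flagged with `is_breaking` are returned, so additive changes such as a new exported function are recorded in the report but do not trigger an alert.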

Tool Selection Criteria

The evaluation of ABI checking tools will focus on:

  1. Accuracy: How well the tool detects real breaking changes without false positives
  2. Coverage: Ability to analyze both shared libraries and header files
  3. Performance: Speed and resource requirements
  4. Integration: Ease of integration with automation systems
  5. Maintenance: Active development and community support

Collaboration Plan

I will work closely with the PostgreSQL community throughout the project:

  1. Regular communication with mentors via email and scheduled calls
  2. Weekly progress updates on a dedicated mailing list or forum thread
  3. Code reviews through GitHub or the preferred PostgreSQL contribution mechanism
  4. Detailed documentation to facilitate community understanding and maintenance

Conclusion

The proposed ABI Compliance Checker will significantly improve PostgreSQL's ability to maintain ABI compatibility between minor releases. This will benefit extension developers, package maintainers, and ultimately all PostgreSQL users by providing greater stability and reliability. I am excited about the opportunity to work on this project and to contribute to the PostgreSQL community.

By focusing first on generating reliable, structured data about ABI changes, the system will provide a solid foundation that can be extended with visualization and reporting tools in the future. The separation between data generation and presentation ensures maximum flexibility and maintainability.