CircuitSense
A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process

Northeastern University, Brookhaven National Laboratory
CircuitSense Overview

Abstract

Engineering design requires translating visual representations into mathematical models across hierarchical levels. While Multi-modal Large Language Models (MLLMs) excel at natural image tasks, their ability to extract equations from technical diagrams remains untested. We present CircuitSense, a benchmark of 8,006+ problems that evaluates circuit understanding across three tasks (Perception, Analysis, and Design), with emphasis on deriving symbolic equations from visual inputs. We also propose a hierarchical synthetic pipeline that generates schematics and block diagrams with guaranteed ground-truth equations. Evaluating six state-of-the-art MLLMs reveals a critical gap: models achieve over 85% accuracy on component recognition but fall below 19% on equation derivation. This performance collapse exposes a fundamental barrier to applying MLLMs in engineering design. The correlation between equation-derivation capability and design-task performance confirms that mathematical understanding, not pattern recognition, defines engineering competence.
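
To make the lowest hierarchy level concrete, the sketch below derives a symbolic ground-truth transfer function for a first-order RC low-pass filter with SymPy. This is a minimal illustration of what symbolic ground truth looks like, assuming SymPy and this particular circuit; it is not the benchmark's actual generation pipeline.

```python
# Minimal sketch (not the CircuitSense pipeline): a symbolic ground-truth
# equation for a first-order RC low-pass filter, derived with SymPy.
import sympy as sp

R, C, s = sp.symbols("R C s", positive=True)

# Laplace-domain voltage divider: Z_C = 1/(s*C), so H(s) = V_out/V_in = Z_C / (R + Z_C).
Z_C = 1 / (s * C)
H = sp.simplify(Z_C / (R + Z_C))
print(H)  # 1/(C*R*s + 1)

# The single pole (cutoff) follows from the denominator root: s = -1/(R*C).
pole = sp.solve(sp.denom(H), s)[0]
print(pole)  # -1/(C*R)
```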

Model Performance Across Tasks

Performance comparison of six MLLMs across Perception, Analysis, and Design tasks. The chart reveals strong perception capabilities but catastrophic failure in mathematical analysis tasks.

Key Findings

Perception Task Results

Model                Component Detection (%)   Connection Identification (%)   Function Classification (%)
GPT-4o               100                       94                              95
Gemini-2.5-Pro       100                       100                             95
Claude-Sonnet-4      100                       88                              86
InternVL3-72B        95                        76                              12
Qwen2.5-VL           95                        68                              20
GLM-4.5V             100                       78                              26
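
As a rough guide to what the three Perception metrics measure, the helpers below score component detection, connection identification, and function classification against ground truth. The input format (component lists, undirected connection pairs, per-block function labels) and the function names are assumptions for illustration, not the benchmark's actual evaluation code.

```python
# Hypothetical scoring helpers for the three Perception metrics.
# Assumes non-empty ground truth; the data schema is an illustrative guess.
from typing import Dict, List, Set, Tuple

def component_detection(pred: List[str], gold: List[str]) -> float:
    """Share of ground-truth components (e.g. 'R1', 'C2') that the model listed."""
    return len(set(pred) & set(gold)) / len(set(gold))

def connection_identification(pred: Set[Tuple[str, str]],
                              gold: Set[Tuple[str, str]]) -> float:
    """Share of ground-truth terminal-to-terminal connections recovered (order-insensitive)."""
    normalize = lambda edges: {frozenset(e) for e in edges}
    return len(normalize(pred) & normalize(gold)) / len(normalize(gold))

def function_classification(pred: Dict[str, str], gold: Dict[str, str]) -> float:
    """Share of sub-circuits whose predicted functional label matches the reference."""
    return sum(pred.get(k, "").lower() == v.lower() for k, v in gold.items()) / len(gold)

print(component_detection(["R1", "C1"], ["R1", "C1", "L1"]))  # ~0.67
```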

Design Task Results

Model                Schematic-level (%)   Block-level (%)   Hierarchical Design (%)
GPT-4o               10.52                 36.36             18.92
Gemini-2.5-Pro       36.38                 67.27             51.35
Claude-Sonnet-4      17.54                 51.83             29.83
InternVL3-72B        7.01                  52.73             29.73
Qwen2.5-VL           8.76                  30.91             18.92
GLM-4.5V             15.79                 50.91             32.35

Curated Analysis Task Results

Results on curated Analysis-task problems in multiple-choice and open-ended formats.

Model                     Level 0 (Resistor)   Level 1 (RLC)   Level 2 (Small Signal)   Level 3 (Transistor)   Level 4 (Block)   Overall Accuracy

Multiple Choice Format (%)
GPT-4o                    39.80                49.58           32.88                    48.80                  39.58             45.07
Claude-Sonnet-4           66.72                71.22           61.64                    72.01                  66.67             69.67
Gemini-2.5-Pro            74.04                87.39           78.08                    81.72                  89.58             80.71
InternVL3-78B             23.16                20.59           13.70                    13.11                  14.58             18.06
Qwen2.5-VL-72B-Instruct   29.53                41.60           30.14                    35.94                  29.17             34.90
GLM-4.5V                  24.63                29.20           9.59                     17.28                  31.25             22.42

Open-Ended Format (%)
GPT-4o                    29.59                29.83           19.18                    13.96                  17.81             22.84
Claude-Sonnet-4           35.56                50.21           12.33                    27.04                  33.33             34.76
Gemini-2.5-Pro            76.98                84.87           73.97                    55.85                  72.92             70.32
InternVL3-78B             20.79                19.54           6.85                     14.47                  10.42             17.26
Qwen2.5-VL-72B-Instruct   28.73                31.30           16.44                    13.71                  22.92             22.85
GLM-4.5V                  34.44                39.71           13.70                    19.50                  25.00             28.83

Synthetic Problems Performance

Performance comparison on our hierarchical synthetic problems with symbolic equation ground truth.

Model                     Level 0 (Resistor)   Level 1 (RLC)   Level 2 (Small Signal)   Level 4 (Block)   Level 5 (System)   Overall
GPT-4o                    1.50                 3.33            5.80                     7.33              9.65               4.98
Claude-Sonnet-4           2.83                 5.16            5.80                     11.64             7.89               6.29
Gemini-2.5-Pro            3.49                 11.67           38.00                    12.33             35.96              19.06
InternVL3-78B             1.50                 3.67            6.68                     3.72              0.44               3.50
Qwen2.5-VL-72B-Instruct   0.83                 4.17            6.03                     6.64              10.09              4.96
GLM-4.5V                  0.33                 7.33            4.00                     4.50              5.70               4.09
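
Because the synthetic problems carry symbolic ground-truth equations, a natural grading approach is algebraic equivalence checking: parse the model's derived expression and the reference, then test whether their difference simplifies to zero. The sketch below assumes SymPy and a string-based answer format; it is not the paper's exact grader.

```python
# Sketch of symbolic-equivalence grading against ground-truth equations
# (an assumption of how such scoring could work, not the paper's grader).
import sympy as sp

def symbolically_equivalent(pred_expr: str, gold_expr: str) -> bool:
    """Parse both expressions and test whether their difference simplifies to zero."""
    try:
        pred, gold = sp.sympify(pred_expr), sp.sympify(gold_expr)
    except (sp.SympifyError, SyntaxError):
        return False  # unparseable model output scores zero
    return sp.simplify(pred - gold) == 0

# Two algebraically identical forms of the RC transfer function are accepted:
print(symbolically_equivalent("1/(1 + R*C*s)", "(1/(R*C))/(s + 1/(R*C))"))  # True
```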

Benchmark Examples

BibTeX

@misc{akbari2025circuitsensehierarchicalcircuitbenchmark,
      title={CircuitSense: A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process}, 
      author={Arman Akbari and Jian Gao and Yifei Zou and Mei Yang and Jinru Duan and Dmitrii Torbunov and Yanzhi Wang and Yihui Ren and Xuan Zhang},
      year={2025},
      eprint={2509.22339},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.22339}, 
}