Conventional cameras generate large volumes of data that can be challenging to process in resource-constrained applications. A camera's output data stream typically scales with the number of pixels in the image, yet much of this captured data is redundant for many downstream computer vision algorithms. We propose a novel camera design, which we call SuperCam, that adaptively processes captured data by performing superpixel segmentation on the fly. We show that SuperCam outperforms current state-of-the-art superpixel algorithms in memory-constrained settings. We also evaluate how well SuperCam performs when the compressed data is used for downstream computer vision tasks. Our results demonstrate that the proposed design provides superior output for image segmentation, object detection, and monocular depth estimation when the available memory on the camera is limited. We posit that superpixel segmentation will play a crucial role as more computer vision inference models are deployed on edge devices, and that SuperCam would allow computer vision engineers to design more efficient systems for these applications.
We compare memory-restricted SNIC (we artificially reduce the image resolution so that the algorithm fits within each memory budget) against SuperCam images produced with the same amount of memory. Below are results for three memory budgets: 68 KB, 205 KB, and 615 KB.
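For reference, the resolution reduction described above can be sketched as follows. This is a hypothetical helper, not the exact procedure used in the paper: it assumes a raw 3-bytes-per-pixel RGB buffer and an illustrative 640x480 source, and picks the largest aspect-preserving resolution whose buffer fits the budget.

```python
import math

def fit_resolution(width, height, budget_bytes, bytes_per_pixel=3):
    """Return the largest (w, h) with the original aspect ratio whose
    raw image buffer fits within budget_bytes.

    Assumes an uncompressed buffer of bytes_per_pixel per pixel
    (3 for 8-bit RGB); the paper's actual memory model may differ.
    """
    scale = math.sqrt(budget_bytes / (width * height * bytes_per_pixel))
    scale = min(scale, 1.0)  # never upsample past the native resolution
    return max(1, int(width * scale)), max(1, int(height * scale))

# The three memory budgets used in the comparison above:
for budget_kb in (68, 205, 615):
    w, h = fit_resolution(640, 480, budget_kb * 1024)
    print(f"{budget_kb} KB -> {w}x{h}")
```

Scaling both dimensions by the square root of the byte ratio keeps the aspect ratio while shrinking the pixel count linearly with the budget.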
Figure: Interactive side-by-side comparisons of memory-restricted SNIC and SuperCam outputs at the 68 KB, 205 KB, and 615 KB budgets
We show the performance of SuperCam on three downstream tasks: image segmentation, object detection, and monocular depth estimation.
Comparison of the performance of Segment Anything Model v2 on the BSD500 dataset
Figure: Interactive side-by-side comparisons of segmentation results on SNIC and SuperCam inputs
Comparison of the performance of the YOLOv12 model on the COCO dataset
Figure: Interactive side-by-side comparisons of detection results on SNIC and SuperCam inputs
Comparison of the performance of the Depth Anything v2 model on the DIODE dataset
Figure: Interactive side-by-side comparisons of depth estimation results on SNIC and SuperCam inputs
We show hardware emulation results of our proposed SuperCam design on three downstream computer vision tasks: image segmentation with Segment Anything Model v2, object detection with the YOLOv12 model, and monocular depth estimation with the Depth Anything v2 model.
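To feed a superpixelated capture to an off-the-shelf downstream model, the compressed representation must be rendered back into a dense image. The page does not specify how this is done; below is a minimal sketch of one common rendering, filling each superpixel with its mean color, under the assumption that a per-pixel superpixel label map is available.

```python
import numpy as np

def superpixel_reconstruction(image, labels):
    """Render a superpixel segmentation as a dense image by filling
    each superpixel with its mean color.

    image:  (H, W, 3) array of pixel values
    labels: (H, W) integer array of superpixel ids

    The result has the same shape and dtype as `image` and can be
    passed to a downstream model (detector, depth estimator, etc.).
    """
    out = np.empty_like(image, dtype=np.float64)
    for sp in np.unique(labels):
        mask = labels == sp                      # pixels in this superpixel
        out[mask] = image[mask].mean(axis=0)     # fill with the mean color
    return out.astype(image.dtype)
```

Storing only one mean color plus a compact boundary encoding per superpixel is what makes the representation far smaller than the raw pixel stream.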
Figure: Hardware Emulation results of SuperCam
@inproceedings{cvpr2026supercam,
title={Computer Vision with a Superpixelation Camera},
author={Mahalingam, Sasidharan and Brown, Rachel and Ingle, Atul},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026}
}