Meta, the owner of Facebook, has released an artificial intelligence model that can pick out individual objects within an image, along with what it describes as the world’s largest dataset of image annotations. The company hopes to democratize a critical part of computer vision.
The capacity to recognize which pixels in an image correspond to a given object is called “segmentation.” The company announced the release of its new Segment Anything Model, dubbed “SAM,” and a Segment Anything 1-Billion mask dataset, dubbed “SA-1B,” to the research community. The goal is to stimulate further research into foundation models for computer vision.
Segmentation is a core task in computer vision that allows AI models to distinguish objects in a given image. It is used in a variety of applications, from scientific image analysis to photo editing. However, most AI researchers are unable to build an accurate segmentation model for a specific use case, because doing so requires highly specialized work by technical experts as well as access to extremely powerful AI training infrastructure and large volumes of annotated, domain-specific data.
As part of the new initiative, Meta has made both its general Segment Anything Model (SAM) and the Segment Anything 1-Billion mask dataset (SA-1B) available. Both are intended to lower that barrier, enable a wide range of applications, and promote further research into foundation models for computer vision.
The SAM model has learned a general notion of what objects are. It can create “masks” for any object in any image or video, including objects and image types it has never seen before. Masking means identifying an object and separating it from the rest of the scene, for example based on variations in contrast around its boundaries. According to Meta’s researchers, SAM is general enough to cover a wide range of use cases and can be used right away on new image domains, whether cell microscopy, underwater photos, or something else, with no additional training.
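To make the idea of a mask concrete, here is a minimal sketch in Python with NumPy. It does not use SAM itself. The tiny image and the hand-drawn mask are made-up values for illustration, and the variable names are assumptions rather than anything from Meta’s release.

```python
import numpy as np

# Hypothetical 4x4 grayscale image (values invented for illustration):
# a bright 2x2 object on a dark background.
image = np.array([
    [10,  10,  10, 10],
    [10, 200, 210, 10],
    [10, 205, 220, 10],
    [10,  10,  10, 10],
], dtype=np.uint8)

# A segmentation mask has the same height and width as the image.
# True marks pixels that belong to the object, False marks background.
mask = np.zeros(image.shape, dtype=bool)
mask[1:3, 1:3] = True

object_pixels = image[mask]        # 1-D array of the object's pixel values
cutout = np.where(mask, image, 0)  # copy of the image with background zeroed
pixel_count = int(mask.sum())      # object area in pixels

print(pixel_count)
print(cutout)
```

A segmentation model’s job is to produce an array like `mask` automatically; once it exists, cutting out or measuring the object is simple array indexing.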
There have traditionally been two approaches to segmentation: interactive segmentation, in which a person guides the model by iteratively refining a mask, and automatic segmentation, in which the model segments objects on its own after being trained on hundreds or thousands of annotated examples.
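The interactive style can be sketched with a toy example in which each user click adds to the mask. The code below uses simple region growing (flood fill by brightness) as a stand-in; this is not how SAM works internally, and the function name `grow_mask_from_click`, the `tolerance` parameter, and the sample image are all hypothetical.

```python
from collections import deque

import numpy as np


def grow_mask_from_click(image, click, tolerance=20):
    """Toy point-prompted segmentation (NOT SAM): flood-fill outward from
    the clicked pixel, keeping neighbors whose brightness is within
    `tolerance` of the clicked pixel's brightness."""
    height, width = image.shape
    seed_value = int(image[click])
    mask = np.zeros((height, width), dtype=bool)
    mask[click] = True
    queue = deque([click])
    while queue:
        row, col = queue.popleft()
        # Visit the four direct neighbors (up, down, left, right).
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = row + d_row, col + d_col
            inside = 0 <= r < height and 0 <= c < width
            if inside and not mask[r, c]:
                if abs(int(image[r, c]) - seed_value) <= tolerance:
                    mask[r, c] = True
                    queue.append((r, c))
    return mask


# Hypothetical image: one object with a bright part (200) and a dimmer
# part (120) on a dark background (10).
image = np.array([
    [10,  10,  10,  10, 10],
    [10, 200, 200, 120, 10],
    [10, 200, 200, 120, 10],
    [10,  10,  10,  10, 10],
], dtype=np.uint8)

# The first click captures only the bright part of the object.
first_mask = grow_mask_from_click(image, (1, 1))

# The user sees that the dim part is missing and clicks on it;
# the refined mask is the union of both regions.
refined_mask = first_mask | grow_mask_from_click(image, (1, 3))

print(int(first_mask.sum()), int(refined_mask.sum()))
```

The point of the sketch is the workflow, not the algorithm: the person supplies a prompt, inspects the result, and supplies another prompt until the mask is right.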
The SA-1B dataset, used to train SAM, contains around 1.1 billion segmentation masks collected from 11 million licensed and privacy-preserving images, which Meta says is 400 times more masks than any existing segmentation dataset. Meta attributes SAM’s ability to generalize to new types of images and objects beyond what it was trained on to the size of the dataset. As a result, AI practitioners should no longer need to gather their own segmentation data and fine-tune a model for each individual use case.
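A quick back-of-the-envelope calculation in Python puts those figures in perspective. The inputs are the rounded numbers quoted above, and the size of the next-largest dataset is only what the “400 times” claim implies, not an independently verified count.

```python
# Rounded figures quoted in the article.
total_masks = 1_100_000_000   # about 1.1 billion masks
total_images = 11_000_000     # 11 million images
size_ratio = 400              # "400 times more masks"

# Average number of masks annotated on each image.
masks_per_image = total_masks / total_images

# Mask count of the next-largest dataset, as implied by the ratio.
implied_previous_largest = total_masks / size_ratio

print(masks_per_image)           # 100.0
print(implied_previous_largest)  # 2750000.0
```

In other words, each image in SA-1B carries roughly 100 masks on average.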
SAM could be useful in any domain that requires finding and segmenting objects in images. It might, for example, be used as part of larger AI systems that seek to understand the world, such as a model that can understand both the visual and text content of a webpage.
According to CEO Mark Zuckerberg, incorporating generative AI “creative aids” into Meta’s apps is a priority this year. Internally, Meta already uses technology similar to SAM for tasks such as tagging photos, moderating prohibited content, and choosing which posts to recommend to Facebook and Instagram users. According to the company, releasing SAM will broaden access to that type of technology.
Both the SAM model and the dataset will be made available for non-commercial use. Users who upload their own images to an accompanying prototype must also agree to use it solely for research purposes.