What is computer vision (or machine vision)?

The process of recognizing objects and understanding the world through images collected from digital cameras is often referred to as “computer vision” or “machine vision”. It is one of the most complex and challenging areas of Artificial Intelligence (AI), partly due to the complexity of the many scenes captured from the real world.

The field turns the scene viewed by the camera into a digital form using a combination of geometry, statistics, optics, machine learning and, sometimes, controlled lighting. Many algorithms deliberately focus on a very narrow goal, such as identifying and reading license plates.

Key Areas of Computer Vision

AI scientists often focus on specific targets, and these narrow challenges have evolved into important subdisciplines. This focus frequently leads to better performance because the algorithms have more clearly defined tasks. The general goal of machine vision may be insurmountable, but it is possible to answer narrow questions, such as reading every license plate that passes a toll booth.

Some important areas are:

  • Facial recognition: Identifying people using the ratios of the distances between facial features in images can help organize collections of photos and videos. In some cases, it can provide identification accurate enough to be used for security.
  • Object identification: Finding the boundaries between objects helps segment images, guide inventory counts and drive automation. Some algorithms are strong enough to accurately identify objects, animals or plants, a capability that forms the basis for applications in industrial plants, farms and other settings.
  • Structured identification: When the setting is predictable and easily simplified, as often happens on an assembly line or in an industrial plant, the algorithms can be more accurate. Computer vision algorithms offer a good way to ensure quality control and improve safety, especially for repetitive tasks.
  • Structured lighting: Some algorithms use specific patterns of light, often generated by lasers, to simplify the task and produce more precise answers than are possible in a scene lit by many, often unpredictable, diffuse sources.
  • Statistical analysis: In some cases, statistics about the scene can help track people or objects. For example, a person can be identified by tracking the speed and length of their stride.
  • Color analysis: Careful analysis of the colors in an image can answer questions. For example, a person’s heart rate can be measured by tracking the slight reddening of the skin with each beat (see the sketch after this list). Many species of birds can be identified by the distribution of their colors. Some algorithms rely on sensors that detect light frequencies outside the range of human vision.
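
To make the color-analysis idea concrete, here is a minimal sketch in Python that estimates a pulse from the average red-channel brightness of a face video. It is a toy illustration rather than a production technique: it assumes the frames arrive as a NumPy array of RGB values and that the face fills the frame.

```python
import numpy as np

def estimate_heart_rate(frames, fps):
    """Estimate pulse (beats per minute) from the mean red-channel
    brightness of a face video -- the color-analysis idea above.

    frames: array of shape (num_frames, height, width, 3), RGB order.
    fps: capture rate in frames per second.
    """
    # Average the red channel over each frame to get a 1-D signal
    # that rises and falls slightly with each heartbeat.
    red = frames[..., 0].mean(axis=(1, 2))
    red = red - red.mean()  # remove the constant skin-tone baseline

    # Find the strongest frequency within a plausible pulse band.
    spectrum = np.abs(np.fft.rfft(red))
    freqs = np.fft.rfftfreq(len(red), d=1.0 / fps)
    band = (freqs > 0.7) & (freqs < 4.0)   # roughly 42-240 beats/minute
    peak = freqs[band][np.argmax(spectrum[band])]
    return peak * 60.0  # Hz -> beats per minute

# Synthetic demo: a 1.2 Hz (72 bpm) flicker on a fake 10-second clip.
fps, seconds = 30, 10
t = np.arange(fps * seconds) / fps
frames = np.full((fps * seconds, 8, 8, 3), 128.0)
frames[..., 0] += 2.0 * np.sin(2 * np.pi * 1.2 * t)[:, None, None]
print(round(estimate_heart_rate(frames, fps)))  # prints 72
```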

The best applications for computer vision

While teaching a computer to see the world remains a large challenge, the field is well enough understood to deploy some narrow applications. They may not give complete answers, but they are good enough to be useful, achieving a level of reliability that satisfies users.

  • Facial recognition: Many websites and software packages for organizing photos provide some way to sort images by the people in them, making it possible to find all the images containing a particular face. The algorithms are accurate enough for this task because users do not demand perfect accuracy and the cost of a misclassified photo is low. The algorithms are also finding some application in law enforcement and security, but many worry that they are not accurate enough to support criminal proceedings.
  • 3D object reconstruction: It is common practice for manufacturers, game designers and artists to scan objects to create three-dimensional models. When the lighting is controlled, often using a laser, the results are precise enough to faithfully reproduce many simple objects. Some models are fed into a 3D printer, sometimes after some editing, to effectively create a three-dimensional reproduction. Without controlled lighting, the results vary widely.
  • Mapping and modeling: Some companies use images from planes, drones and automobiles to create accurate models of roads, buildings and other parts of the world. The accuracy depends on the camera sensor and the lighting captured that day. Digital maps are already precise enough for travel planning and are constantly being refined, but complex scenes often still require human editing. Building models are often accurate enough for construction and remodeling work; roofers, for example, often bid on jobs based on measurements from the automated digital model.
  • Autonomous vehicles: It is now routine for a car to follow a lane and maintain a safe following distance (a classical version of lane finding is sketched after this list). Gathering enough detail to accurately track all the objects moving through a street scene under unpredictable lighting, however, leads many teams to add structured lighting, which is more expensive, bulkier and more elaborate.
  • Automated retail: Shop owners and mall operators commonly use machine vision algorithms to track shopping patterns. Some are experimenting with automatically charging customers who pick up an item and do not put it back. Robots with mounted scanners also track inventory and check for damage.
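
To show how far classical techniques go for the lane-following case, here is a sketch using OpenCV's Canny edge detector and probabilistic Hough transform. It is a textbook baseline, not any vendor's actual stack, and the input file `road.jpg` is a hypothetical dashcam frame.

```python
import cv2
import numpy as np

def detect_lane_lines(frame):
    """Classical lane finding: edge detection plus a Hough transform.
    frame: BGR image as returned by cv2.imread / cv2.VideoCapture.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    # Keep only a trapezoid ahead of the car; edges elsewhere in the
    # frame are irrelevant to the lane and only add noise.
    h, w = edges.shape
    mask = np.zeros_like(edges)
    roi = np.array([[(0, h), (w // 2 - 50, h // 2),
                     (w // 2 + 50, h // 2), (w, h)]], dtype=np.int32)
    cv2.fillPoly(mask, roi, 255)
    edges = cv2.bitwise_and(edges, mask)

    # The probabilistic Hough transform returns segments (x1, y1, x2, y2).
    return cv2.HoughLinesP(edges, rho=2, theta=np.pi / 180, threshold=50,
                           minLineLength=40, maxLineGap=20)

frame = cv2.imread("road.jpg")   # hypothetical dashcam frame
lines = detect_lane_lines(frame)
if lines is not None:            # HoughLinesP returns None if nothing is found
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 3)
cv2.imwrite("lanes.jpg", frame)
```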


How established players are facing computer vision

The big technology companies all offer products with some machine vision algorithms, but these mostly focus on narrow, widely applicable tasks, such as sorting photo collections or moderating social media posts. Some, like Microsoft, maintain large research staffs exploring new topics.

Google, Microsoft and Apple, for example, offer photo services that store and organize their users’ photos. Using facial recognition to sort a collection is a valuable feature that makes it easy to find particular photos.

Some of these features are sold directly to other companies as APIs. Microsoft, for example, also offers a database of celebrity facial features that news organizations can use to organize images collected over the years. People looking for their “celebrity twin” can also find the closest match in the collection.

Some of these tools provide more detailed results. Microsoft’s API, for example, offers a “describe image” feature that searches multiple databases for identifiable details in an image, such as the presence of a major landmark. The algorithm also returns a confidence score measuring how accurate each description is likely to be.
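
A hedged sketch of calling such a feature follows, written against what we understand to be the v3.2 REST endpoint of Microsoft’s Computer Vision service; the endpoint, key and image URL are placeholders you would substitute.

```python
import requests

# Placeholders: substitute your own Azure resource endpoint and key.
ENDPOINT = "https://YOUR_RESOURCE.cognitiveservices.azure.com"
KEY = "YOUR_SUBSCRIPTION_KEY"

response = requests.post(
    f"{ENDPOINT}/vision/v3.2/describe",
    headers={"Ocp-Apim-Subscription-Key": KEY,
             "Content-Type": "application/json"},
    json={"url": "https://example.com/landmark.jpg"},  # hypothetical image
)
response.raise_for_status()

# Each generated caption carries the confidence score mentioned above.
for caption in response.json()["description"]["captions"]:
    print(f'{caption["text"]}  (confidence: {caption["confidence"]:.2f})')
```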

Google’s cloud platform gives users the option to either train their own models or rely on a large collection of pre-trained models. There is also a prebuilt system focused on delivering visual product search for companies organizing their listings.
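
For a sense of how the pre-trained route looks in practice, here is a minimal sketch using the `google-cloud-vision` client library’s generic label detection; the image URI is a placeholder, and authentication is assumed to be configured through the usual `GOOGLE_APPLICATION_CREDENTIALS` environment variable.

```python
from google.cloud import vision

# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service account key.
client = vision.ImageAnnotatorClient()

# Hypothetical image URI; a public URL or a gs:// path both work here.
image = vision.Image(
    source=vision.ImageSource(image_uri="https://example.com/shelf.jpg")
)

# One of the pre-trained models: generic label detection.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(f"{label.description}: {label.score:.2f}")
```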

AWS’s Rekognition service focuses on classifying images using facial metrics and trained object models. It also offers celebrity tagging and content moderation options for social media applications. One prebuilt application is designed to enforce workplace safety rules by watching video footage to make sure every visible employee wears personal protective equipment (PPE); a sketch of that check follows.
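
Here is a minimal sketch of the PPE check through `boto3`, assuming AWS credentials are already configured; the file name and the required equipment types are illustrative choices.

```python
import boto3

rekognition = boto3.client("rekognition")  # assumes credentials are configured

with open("worksite.jpg", "rb") as f:      # hypothetical site photo
    result = rekognition.detect_protective_equipment(
        Image={"Bytes": f.read()},
        SummarizationAttributes={
            "MinConfidence": 80,
            # Illustrative policy: everyone needs a mask and a hard hat.
            "RequiredEquipmentTypes": ["FACE_COVER", "HEAD_COVER"],
        },
    )

# The summary lists which detected persons are missing the required gear.
summary = result["Summary"]
print("Compliant persons:", len(summary["PersonsWithRequiredEquipment"]))
print("Non-compliant persons:", len(summary["PersonsWithoutRequiredEquipment"]))
```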

Major computing companies are also deeply involved in the race for autonomous travel, a challenge that relies on many AI algorithms, but especially on machine vision. Google and Apple, for example, are widely reported to be developing cars that use multiple cameras to plan routes and avoid obstacles, relying on a combination of traditional cameras and sensors that use laser-based structured lighting.

Machine Vision Startup Scene

Many machine vision startups are focusing on applying the technology to autonomous vehicles. Startups such as Waymo, Pony AI, Wayve, AEye, Cruise Automation and Argo AI are among the heavily funded companies building software and sensor systems that will allow cars and other platforms to navigate the streets on their own.

Some manufacturers are using the algorithms to enhance their product lines by guiding robotic assembly or checking parts for errors. Saccade Vision, for example, creates three-dimensional scans of products to spot defects. Veo Robotics has created a visual system that monitors “workcells” to watch for dangerous interactions between humans and robotic equipment.

Tracking humans as they move through the world is a big opportunity, whether for safety, security or compliance reasons. VergeSense, for example, is developing a “workplace analytics” service that aims to optimize how companies use shared offices and hot desks. Kairos builds privacy-conscious facial recognition tools that help companies get to know their customers and enhance the experience with options such as smarter kiosks. AiCure recognizes patients by their faces, dispenses the correct medications and watches to make sure they take them. Trueface scans customers and employees to detect elevated temperatures and enforce mask requirements.

Other machine vision companies focus on narrower tasks. Remini, for example, offers an “AI Photo Enhancer” as an online service that adds detail to images to increase their apparent resolution.

What machine vision can’t do

The gap between human and machine capability is perhaps greater in machine vision than in other areas, such as voice recognition. The algorithms succeed when they are asked to identify objects that are largely immutable. People’s faces, for example, are largely fixed, and the ratios of the distances between the main features, such as the corners of the eyes and the nose, rarely vary much. So image recognition algorithms specialize in searching large collections of photos for faces with similar ratios. A simplified sketch of that approach follows.
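
Here is a simplified sketch of that ratio idea in Python. Real systems use learned embeddings rather than hand-picked landmarks, and the five landmark names and the matching tolerance below are assumptions made for illustration.

```python
import numpy as np

# Illustrative landmark names; a real detector would supply its own set.
LANDMARKS = ("left_eye", "right_eye", "nose_tip", "mouth_left", "mouth_right")

def ratio_signature(landmarks):
    """Turn facial landmarks into a scale-invariant vector of
    pairwise-distance ratios, as described above.

    landmarks: dict mapping each name in LANDMARKS to an (x, y) point.
    """
    pts = np.array([landmarks[k] for k in LANDMARKS], dtype=float)
    # All pairwise distances between the landmark points.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    dists = d[np.triu_indices(len(pts), k=1)]
    # Dividing by one reference distance (the inter-eye gap, the first
    # pair) makes the signature independent of the face's size in the photo.
    return dists / dists[0]

def same_person(a, b, tolerance=0.05):
    """Crude match: signatures agree within a small relative tolerance."""
    return np.allclose(ratio_signature(a), ratio_signature(b), rtol=tolerance)
```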

But even basic concepts, such as understanding what makes a chair a chair, confound algorithms with their variation. There are thousands of different kinds of things people can sit on, and perhaps millions of individual examples. Some companies build databases that look for specific replicas of known items, but it is often difficult for machines to correctly classify items they have not seen before.

A particular challenge comes from the quality of the sensors. The human eye can operate across a wide range of light levels, but digital cameras have difficulty matching its performance when light is low. On the other hand, some sensors can detect colors outside the range of the rods and cones in the human eye. An active area of research applies this extra capability to let machine vision algorithms detect objects that are literally invisible to the human eye.

