There is nothing like a good benchmark to help advance a field of computer vision.
That’s why a research team from the Allen Institute for AI, also known as AI2, recently worked with the University of Illinois at Urbana-Champaign to develop a new, unified benchmark called GRIT (General Robust Image Task). Their goal is to help AI developers create the next generation of computer vision models that can be applied to a range of common tasks, particularly complex challenges.
“We discuss, like weekly, the need to create more general computer vision systems that are capable of solving a variety of tasks and can generalize in ways that existing systems cannot,” said Derek Hoiem, professor of computer science at the University of Illinois at Urbana-Champaign. “We realized that one of the challenges was that there was no good way to evaluate a system’s general vision capabilities. All current benchmarks are set up to evaluate systems that have been specially trained for that benchmark.”
What is required for general computer vision models
According to Tanmay Gupta, who joined AI2 as a research scientist after receiving his Ph.D. from the University of Illinois at Urbana-Champaign, other efforts have been made to create multitask models that can do more than one thing, but a general-purpose model requires more than the ability to perform three or four different tasks.
“Often you don’t know ahead of time all the tasks the system will need to do in the future,” he said. “We wanted to design the architecture of the model so that anyone from a different background could give it natural language instructions.”
For example, he explained, someone might say “describe the image” or “find the brown dog,” and the system could execute that instruction and return either a bounding box (a rectangle around the dog being referred to) or a caption such as “There is a brown dog playing on a green field.” “The challenge was to develop a system that could carry out instructions, including instructions it had never seen before, and do so across a wide range of tasks, including segmentation, bounding boxes, captioning, and question answering,” he said.
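The single-entry-point interface described above can be sketched in code. This is a minimal illustrative toy, not GRIT’s or AI2’s actual API: the function name `run_instruction`, the result types, and the hard-coded outputs are all hypothetical, standing in for a model where one natural-language instruction selects between output kinds such as boxes and captions.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical result types: a general-purpose model returns different
# output kinds (boxes, captions, ...) through one shared interface.
@dataclass
class BoundingBox:
    x: int       # left edge, pixels
    y: int       # top edge, pixels
    width: int
    height: int

@dataclass
class Caption:
    text: str

VisionOutput = Union[List[BoundingBox], Caption]

def run_instruction(image, instruction: str) -> VisionOutput:
    """Toy dispatcher standing in for a general-purpose vision model:
    the same entry point handles localization and captioning, chosen
    by the wording of the instruction rather than by separate APIs."""
    instruction = instruction.lower()
    if instruction.startswith("find"):
        # A real model would localize the referenced object in the image;
        # a fixed box here purely illustrates the output shape.
        return [BoundingBox(x=40, y=80, width=120, height=90)]
    if instruction.startswith("describe"):
        return Caption(text="There is a brown dog playing on a green field.")
    raise ValueError(f"Unrecognized instruction: {instruction!r}")

boxes = run_instruction(image=None, instruction="find the brown dog")
caption = run_instruction(image=None, instruction="describe the image")
```

The design point is that callers never pick a task-specific endpoint; the instruction itself determines what kind of answer comes back, which is what makes evaluating such systems across many task types (as GRIT aims to) meaningful.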
Gupta added that the GRIT benchmark is a way to evaluate these capabilities, measuring how robust a system is to image distortions and how general it is across different data sources. “Does it solve the problem for not just one or two or ten or twenty different concepts, but for thousands of concepts?” he said.
Benchmarks have served as a driver for computer vision research
Hoiem said benchmarks have been a major driver of computer vision research since the field’s inception. “When a new benchmark is created, if it is well suited to evaluating the kind of research that people are interested in, it really facilitates that research by making it much easier to compare progress and evaluate innovations without having to re-implement algorithms, which saves a lot of time,” he said.
Computer vision and AI have made great strides in the last decade, he added. “You can see that in smartphones, home assistants, and vehicle safety systems, with AI in use in ways that were not the case ten years ago,” he said. “We would go to computer vision conferences and people would ask, ‘What’s new?’ and we’d say, ‘It still doesn’t work,’ but now things are starting to work.”
However, the downside is that existing computer vision systems are typically designed and trained to perform only specific tasks. “For example, you can create a system that can put boxes around vehicles, people, and bicycles for a driving application, but if you then want it to put boxes around motorcycles too, you have to change the code and the architecture and retrain it,” he said.
The GRIT researchers wanted to figure out how to create systems that are more like people, in the sense that they can learn to perform many different kinds of tasks. “We don’t have to change our bodies to learn how to do new things,” he said. “We want that kind of generality in AI, where you don’t have to change the architecture, but the system can do many different things.”
The benchmark will advance the computer vision field
The vast computer vision research community, which publishes thousands of papers each year, has seen a growing amount of work on making vision systems more general, he said, and a shared benchmark allows different groups to report numbers on the same tasks and compare results.
The researchers say they hope to build a workshop around the GRIT benchmark, which they will present at the 2022 Conference on Computer Vision and Pattern Recognition on June 19-20. “Hopefully, it will encourage people to submit their methods, their new models, and evaluate them on this benchmark,” Gupta said. “We hope to see significant work in this direction within the next year and a lot of improvement in performance from where we are today.”
Given the growth of the computer vision community, there are many researchers and industry practitioners who want to advance the field, Hoiem said.
“They are always looking for new benchmarks and new problems to work on,” he said. “A good benchmark can shift much of the field’s focus, so this is a great opportunity to help the field move forward, to take on that challenge and build in this new direction.”