AI models are becoming better at answering questions, but they’re not perfect

Late last year, the Allen Institute for AI, the research institute founded by Microsoft co-founder Paul Allen, quietly unveiled a large AI language model called Macaw. Unlike other language models that have recently caught people’s attention (see OpenAI’s GPT-3), Macaw is fairly limited in what it can do: it only answers and generates questions. But the researchers behind Macaw claim that even though it is an order of magnitude smaller, it can outperform GPT-3 on a set of questions.

Answering questions may not be the most compelling application of AI. But question-answering techniques are becoming increasingly valuable in the enterprise. Rising customer call and email volumes during the pandemic encouraged businesses to turn to automated chat assistants; according to Statista, the size of the chatbot market will exceed $1.25 billion by 2025. But chatbots and other conversational AI techniques remain fairly rigid, bound by the questions they were trained on.

Today, the Allen Institute released an interactive demo for exploring Macaw as a complement to the GitHub repository containing Macaw’s code. The lab believes that the model’s performance and “practical” size, about 16 times smaller than GPT-3, illustrate how large language models are becoming “commoditized” into something more widely accessible and deployable.

Answering questions

Built on UnifiedQA, the Allen Institute’s previous attempt at a simple question-answering system, Macaw was fine-tuned on datasets containing thousands of yes/no questions, stories designed to test reading comprehension, explanations for questions, and school science and English exam questions. The largest version of the model, which is the version in the demo and the one that is open-sourced, has 11 billion parameters, significantly fewer than GPT-3’s 175 billion parameters.

Given a question, Macaw can generate an answer and an explanation. Given an answer, the model can generate a question (optionally a multiple-choice question) and an explanation. Finally, given an explanation, Macaw can generate a question and an answer.
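These three modes amount to choosing which "slots" (question, answer, explanation) are supplied as input and which are requested as output. A minimal sketch of that idea follows, assuming a slot-based text format along the lines of the one described in the Macaw GitHub repository. The helper names `build_prompt` and `parse_output` and the exact delimiter syntax are illustrative assumptions, not the project's API, and no model is called here.

```python
# Hypothetical helpers illustrating slot-based prompting; not Macaw's actual API.
# Assumed format: requested output slots are listed bare ("$answer$"),
# supplied input slots carry a value ("$question$ = ..."), joined by " ; ".

def build_prompt(inputs, outputs):
    """Build a prompt string from supplied slots and requested slots."""
    requested = [f"${name}$" for name in outputs]
    supplied = [f"${name}$ = {value}" for name, value in inputs.items()]
    return " ; ".join(requested + supplied)


def parse_output(text):
    """Parse a '$slot$ = value ; $slot$ = value' string into a dict."""
    slots = {}
    for part in text.split(" ; "):
        name, sep, value = part.partition(" = ")
        if sep:  # skip fragments that are not 'slot = value'
            slots[name.strip().strip("$")] = value.strip()
    return slots


# Question in, answer and explanation out.
q_prompt = build_prompt(
    {"question": "If a bird didn't have wings, how would it be affected?"},
    ["answer", "explanation"],
)

# Answer in, question and explanation out.
a_prompt = build_prompt(
    {"answer": "It would be unable to fly."},
    ["question", "explanation"],
)

# Illustrative model output (made up for this sketch), parsed back into slots.
parsed = parse_output(
    "$answer$ = It would be unable to fly. ; $explanation$ = Birds use wings to fly."
)
```

Keeping every mode in one text format is what lets a single sequence-to-sequence model cover all the input and output combinations; only the prompt changes.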

“Macaw was built by training Google’s T5 transformer model on roughly 300,000 questions and answers, gathered from several existing datasets that the natural language community has created over the years,” the Allen Institute’s Peter Clark and Oyvind Tafjord, who were involved in Macaw’s development, told VentureBeat via email. “The Macaw models were trained on a Google Cloud TPU (v3-8). The training leverages the pretraining already done by Google in their T5 model, thus avoiding a significant expense (both cost and environmental) in building Macaw. From T5, the additional fine-tuning we did for the largest model took 30 hours of TPU time.”

Above: Examples of Macaw’s capabilities.

Image Credit: Allen Institute

In machine learning, parameters are the part of the model that is learned from historical training data. Generally speaking, in the language domain, the correlation between the number of parameters and sophistication has held up remarkably well. But Macaw punches above its weight. When tested on 300 questions that Allen Institute researchers designed specifically to “break” it, Macaw outperformed not only GPT-3 but also the recent Jurassic-1 Jumbo model from AI21 Labs, which is even larger than GPT-3.

According to the researchers, Macaw shows some ability to reason about novel hypothetical situations, which lets it answer questions like “How would you make a house conduct electricity?” with “Paint it with a metallic paint.” The model also hints at an awareness of the role of objects in different situations and appears to know what an implication is, for example answering the question “If a bird didn’t have wings, how would it be affected?” with “It would be unable to fly.”

But the model has limitations. In general, Macaw is tripped up by questions built on false premises, such as “How old was Mark Zuckerberg when he founded Google?” It often makes mistakes in answering questions that require commonsense reasoning, such as “What happens if I drop a glass on a bed of feathers?” (Macaw answers “The glass breaks”). In addition, the model generates overly brief answers, breaks down when questions are rephrased, and repeats answers to certain questions.

The researchers also note that Macaw, like other large language models, is not free from bias and toxicity, which it can pick up from the datasets used to train it. Clark added: “Macaw is being released without any usage restrictions. Being an open-ended generation model means that there are no guarantees about the output (in terms of bias, inappropriate language, etc.), so we expect its initial use to be for research purposes (e.g., to study what current models are capable of).”


Macaw will not solve the other outstanding challenges in language model design, bias among them. Also, the model still needs fairly powerful hardware to run; the researchers recommend 48GB of total GPU memory. (Two of Nvidia’s 3090 GPUs, each with 24GB of memory, cost $3,000 or more, not counting the other components needed to use them.) But according to the Allen Institute, Macaw shows that capable language models are becoming more accessible than ever before. GPT-3 is not open source, but if it were, it would cost at least $87,000 per year to run it on a single Amazon Web Services instance.
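As a rough sanity check on these figures, the arithmetic can be sketched as follows. The assumption that each parameter is stored as a 32-bit (4-byte) float is an illustrative one, not something the researchers state, and the estimate covers model weights only, ignoring activations and other runtime overhead.

```python
# Back-of-the-envelope sizing; bytes-per-parameter is an assumption (32-bit floats).
MACAW_PARAMS = 11e9      # largest Macaw model, per the article
GPT3_PARAMS = 175e9      # GPT-3, per the article
BYTES_PER_PARAM = 4      # assumed fp32 storage; weights only


def weights_gb(n_params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate memory for model weights, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


size_ratio = GPT3_PARAMS / MACAW_PARAMS   # parameter-count ratio
macaw_gb = weights_gb(MACAW_PARAMS)       # weights-only footprint
gpu_total_gb = 2 * 24                     # two 24GB cards
fits = macaw_gb <= gpu_total_gb

print(f"GPT-3 is about {size_ratio:.0f}x larger than Macaw")
print(f"Macaw weights: about {macaw_gb:.0f} GB of {gpu_total_gb} GB available")
```

Under that assumption the weights alone come to roughly 44GB, which is consistent with the 48GB recommendation leaving only a small margin.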

Macaw joins other open source, multi-task models that have been released over the past several years, including EleutherAI’s GPT-Neo and BigScience’s T0. DeepMind recently unveiled a model with 7 billion parameters, RETRO, which it claims can beat others 25 times its size by drawing on a large database of text. Already, these models have found new applications and spawned startups. Macaw, and other question-answering systems like it, could be poised to do the same.

