A preprint study published this week by coauthors at Google Research describes Entities as Experts (EAE), a new type of machine learning model that can access memories of entities (e.g., people, places, organizations, dates, times, and figures) mentioned in a piece of sample text. They claim it outperforms two state-of-the-art models with far less data while capturing more factual knowledge, and that it is more modular and interpretable than the Transformer architecture on which it’s based.
If peer review bears out the researchers’ claims about EAE, it could solve a longstanding natural language processing challenge: acquiring the knowledge needed to answer questions about the world without injecting entity-specific knowledge. In enterprise environments, EAE could form the foundation for chatbots that ingest corpora of domain-specific information and respond to questions about the corpora with information that’s most likely to be relevant.
As with all deep neural networks, EAE contains neurons (mathematical functions) arranged in layers that transmit signals from input data and adjust the strength (weights) of each connection. That’s how it extracts features and learns to make predictions. Because EAE is based on the Transformer architecture, it also has attention: every output element is connected to every input element, and the weightings between them are calculated dynamically.
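To make the idea of dynamically calculated weightings concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the standard Transformer building block. It is an illustration rather than EAE's actual code; the function names, matrix sizes, and random inputs are all assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative only)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # One score per (output position, input position) pair.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # The weights depend on the input itself, so they change per example.
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Toy input: 4 tokens with 8-dimensional representations (assumed sizes).
rng = np.random.default_rng(0)
tokens, d_model = 4, 8
x = rng.normal(size=(tokens, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` sums to 1 and says how much one output position draws on every input position.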
Uniquely, EAE also contains entity memory layers that enable it to “understand” and respond to questions about text in a highly data-efficient way. The model learns knowledge directly from text, along with other model parameters (i.e., configuration variables estimated from data and required by the model when making predictions), and associates memories with specific entities or with data types like titles and numeric expressions.
As the coauthors explain: “[For example,] a traditional Transformer would need to build an internal representation of Charles Darwin from the words ‘Charles’ and ‘Darwin,’ both of which can also be used in reference to very different entities, such as the Charles River or Darwin City. Conversely, EAE can access a dedicated representation of ‘Charles Darwin,’ which is a memory of all of the contexts in which this entity has previously been mentioned. This representation can also be accessed for other mentions of Darwin, such as ‘Charles Robert Darwin’ or ‘the father of natural selection.’ Having retrieved and re-integrated this memory, it is much easier for EAE to relate the question to the answer.”
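The retrieve-and-reintegrate step the coauthors describe can be sketched roughly as follows. This is a simplified illustration and not the paper's implementation: the function `entity_memory_lookup`, the projection matrices `w_in` and `w_out`, the choice to build the mention vector from its first and last tokens, and the choice to add the retrieved memory to the mention's first token are all assumptions made for the example.

```python
import numpy as np

def entity_memory_lookup(hidden, span, entity_emb, w_in, w_out):
    """Retrieve an entity memory for one mention and add it back to the text.

    hidden:     (tokens, d_model) token representations
    span:       (start, end) token indices of the mention, inclusive
    entity_emb: (n_entities, d_ent) one learned vector per entity
    w_in:       (2 * d_model, d_ent) projects the mention into entity space
    w_out:      (d_ent, d_model) projects the memory back to token space
    """
    start, end = span
    # Assumption: represent the mention by its first and last token.
    mention = np.concatenate([hidden[start], hidden[end]]) @ w_in
    # Score every entity against the mention, then normalize.
    scores = entity_emb @ mention
    scores -= scores.max()
    probs = np.exp(scores) / np.exp(scores).sum()
    # The retrieved memory is a probability-weighted mix of entity vectors.
    memory = probs @ entity_emb
    # Assumption: re-integrate by adding the memory at the mention's start.
    updated = hidden.copy()
    updated[start] = updated[start] + memory @ w_out
    return updated, probs

# Toy sizes (assumed): 6 tokens, 5 entities.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 8))
entity_emb = rng.normal(size=(5, 4))
w_in = rng.normal(size=(16, 4))
w_out = rng.normal(size=(4, 8))
updated, probs = entity_memory_lookup(hidden, (1, 2), entity_emb, w_in, w_out)
```

In this sketch, different surface forms such as "Charles Darwin" and "the father of natural selection" would share a memory whenever their mention vectors score highest against the same row of `entity_emb`.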
To evaluate EAE, the researchers trained the model on Wikipedia articles, using the articles’ hyperlinks and the Google Cloud Natural Language API to identify entity mentions, for a total of 32 million contexts paired with over 17 million entity mentions. They kept only the top 1 million most frequent entities and reserved 0.4% of them for development and testing purposes. Then they pretrained the model from scratch for 1 million steps and fine-tuned the pretrained EAE over the course of 50,000 training steps on TriviaQA, a reading comprehension task in which questions are paired with documents. (Only 77% of TriviaQA training examples were kept; those whose answers weren’t entities were discarded.)
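Restricting the data to the most frequent entities might look something like the sketch below. The helper `keep_top_entities`, the `(context, entity)` pair format, and the decision to drop mentions of rarer entities outright are assumptions for illustration; the article does not say how the researchers handled mentions outside the top 1 million.

```python
from collections import Counter

def keep_top_entities(mentions, top_n):
    """Keep only mentions whose entity is among the top_n most frequent.

    mentions: list of (context, entity) pairs (hypothetical format)
    """
    counts = Counter(entity for _, entity in mentions)
    vocab = {entity for entity, _ in counts.most_common(top_n)}
    # Assumption: mentions of rarer entities are simply dropped.
    kept = [(ctx, ent) for ctx, ent in mentions if ent in vocab]
    return kept, vocab

# Toy data standing in for Wikipedia contexts.
mentions = [
    ("c1", "Darwin"), ("c2", "Darwin"),
    ("c3", "Boston"), ("c4", "Boston"), ("c5", "Boston"),
    ("c6", "Rare"),
]
kept, vocab = keep_top_entities(mentions, top_n=2)
```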
The team reports that on several “cloze” tests, where the model had to recover the words in a blanked-out mention by correctly associating the mention with its surrounding sentence context, EAE used only a small proportion of its parameters at inference time (roughly the top 100 entities for each mention in a given question). A baseline Transformer model used nearly 30 times as many parameters. Preliminary evidence also suggests that EAE had more factual knowledge than a baseline BERT model.
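One way to read "the top 100 entities for each mention" is as a sparse lookup in which only the highest-scoring entity vectors contribute to the retrieved memory. The sketch below illustrates that reading under stated assumptions: the function `top_k_entity_memory`, the table size, and the softmax over only the selected entities are hypothetical and not taken from the paper.

```python
import numpy as np

def top_k_entity_memory(mention, entity_emb, k=100):
    """Build a memory vector from only the k best-matching entities.

    mention:    (d_ent,) mention representation
    entity_emb: (n_entities, d_ent) one learned vector per entity
    """
    scores = entity_emb @ mention
    k = min(k, scores.shape[0])
    # Indices of the k highest scores (unordered), without a full sort.
    top = np.argpartition(scores, -k)[-k:]
    # Softmax restricted to the selected entities.
    top_scores = scores[top] - scores[top].max()
    weights = np.exp(top_scores)
    weights /= weights.sum()
    return weights @ entity_emb[top], top

# Toy entity table (assumed sizes): 1,000 entities, 16 dimensions.
rng = np.random.default_rng(0)
entity_emb = rng.normal(size=(1000, 16))
mention = rng.normal(size=16)
memory, top = top_k_entity_memory(mention, entity_emb, k=100)
```

This sketch still scores every entity before selecting the top 100. A real system would likely need an approximate nearest-neighbor search to avoid that cost, which is beyond what the article describes.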
“[Our] analysis shows that the correct identification and reintegration of entity representations is essential for EAE’s performance,” wrote the coauthors. “Training EAE to focus on entities is better than a similar-sized network with an unconstrained memory store.”