Difference between information extraction and relation extraction in NLP

Author Topic: Difference between information extraction and relation extraction in NLP  (Read 2019 times)

Offline Nazia Nishat

  • Full Member
  • ***
  • Posts: 132
  • Test
    • View Profile
Information Extraction: In information retrieval, we care about each document individually, and our intention is to see what it contains. Because we want to assimilate documents quickly, we remove stop words and apply stemming and synonym expansion, which leaves us with the important features of each document. If we then perform text classification, we train a model on these features after careful feature selection, and the output is a class label for the document. From that label we can learn useful things: whether the document is spam, whether it relates to a product, a service experience, or a complaint, whether the writer is an advocate of a brand or a customer likely to purchase, and so on.
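As a toy illustration of that pipeline, a bag-of-words feature extractor with stop-word removal can be sketched in a few lines of Python (the stop-word list and example sentence here are invented for the sketch; real systems use much larger lists and add stemming):

```python
import re
from collections import Counter

# A hypothetical, minimal stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "in", "on", "and", "to", "of"}

def extract_features(document: str) -> Counter:
    """Tokenize, lowercase, drop stop words, and count the remaining terms."""
    tokens = re.findall(r"[a-z']+", document.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

# The remaining content words become the document's features.
features = extract_features("The service was great and the delivery was fast")
```

A classifier would then be trained on feature vectors like this one, rather than on the raw text.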

Relation Extraction: Relation extraction is about finding the relation between two documents or two entities. The simplest form is computing the similarity between two documents, and clustering can also be viewed as a kind of relation extraction; however, similarity and clustering do not actually extract a relation on which we can take action. For that, we need methods that identify the specific relation between entities. Relation extraction is mostly performed for entities such as people, cities, zip codes, movies, restaurants, dealerships, etc.
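The simplest document-to-document relation mentioned above, similarity, can be sketched as cosine similarity over bag-of-words vectors (a minimal illustration in plain Python, not a production implementation):

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercase and count word occurrences."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over the shared vocabulary, normalized by vector lengths.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc1 = bag_of_words("Barack Obama married Michelle Obama")
doc2 = bag_of_words("Michelle Obama is married to Barack Obama")
score = cosine_similarity(doc1, doc2)
```

A high score tells us the documents are related, but, as noted above, it does not tell us *what* the relation is.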
 
See details at: https://www.quora.com/What-is-the-difference-between-information-extraction-and-relation-extraction-in-NLP

Offline Nazia Nishat

Relation extraction:
Relation extraction plays an important role in extracting structured information from unstructured sources such as raw text. One may want to find interactions between drugs to build a medical database, understand the scenes in images, or extract relationships among people to build an easily searchable knowledge base.

For example, let's assume we are interested in marriage relationships. We want to automatically figure out that "Michelle Obama" is the wife of "Barack Obama" from a corpus of raw text snippets such as "Barack Obama married Michelle Obama in...". A naive approach would be to search news articles for indicative phrases, like "married" or "XXX's spouse". This would yield some results, but human language is inherently ambiguous, and one cannot possibly come up with all phrases that indicate a marriage relationship. A natural next step would be to use machine learning techniques to extract the relations. If we have some labeled training data, such as examples of pairs of people that are in a marriage relationship, we could train a machine learning classifier to automatically learn the patterns for us. This sounds like a great idea, but there are several challenges:

  • How do we disambiguate between words that refer to the same entity? For example, a sentence may refer to "Barack Obama" as "Barack" or "The president".
  • How do we get training data for our machine learning model?
  • How do we deal with conflicting or uncertain data?
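The naive phrase-search approach described earlier can be sketched with a couple of hand-written patterns (the patterns and example sentence are illustrative; as the paragraph notes, no fixed pattern list can cover all the ways language expresses marriage):

```python
import re

# Capitalized word sequences stand in for person names in this sketch.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# A naive, far-from-exhaustive pattern list for marriage relations.
MARRIAGE_PATTERNS = [
    re.compile(rf"(?P<a>{NAME}) married (?P<b>{NAME})"),
    re.compile(rf"(?P<a>{NAME})'s spouse,? (?P<b>{NAME})"),
]

def extract_marriages(text: str):
    """Return (person, person) pairs matched by any marriage pattern."""
    pairs = []
    for pattern in MARRIAGE_PATTERNS:
        for m in pattern.finditer(text):
            pairs.append((m.group("a"), m.group("b")))
    return pairs

pairs = extract_marriages("Barack Obama married Michelle Obama in 1992.")
```

This yields some results, but it misses every phrasing not in the pattern list, which is exactly what motivates the machine learning approach.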

Entity linking:

Before starting to extract relations, it is a good idea to determine which words refer to the same "object" in the real world. These objects are called entities. For example, "Barack", "Obama" or "the president" may refer to the entity "Barack Obama". Let's say we extract relations about one of the words above. It would be helpful to combine them as being information about the same person. Figuring out which words, or mentions, refer to the same entity is a process called entity linking. There are various techniques to perform entity linking, ranging from simple string matching to more sophisticated machine learning approaches. In some domains we have a database of all known entities to link against, such as a dictionary of all countries. In other domains, we need to be open to discovering new entities.
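A toy version of the simple string-matching end of that spectrum might look like the following (the alias dictionary is a made-up example; real systems learn or curate such mappings at much larger scale):

```python
# A minimal entity linker based on alias lookup against a small
# dictionary of known entities.
KNOWN_ENTITIES = {
    "Barack Obama": {"barack obama", "barack", "obama", "the president"},
}

def link_mention(mention: str):
    """Return the canonical entity for a mention, or None if unknown."""
    normalized = mention.strip().lower()
    for entity, aliases in KNOWN_ENTITIES.items():
        if normalized in aliases:
            return entity
    return None
```

With this in place, relations extracted about "Barack" and "the president" can be combined as information about the single entity "Barack Obama".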

Dealing with uncertainty:

Given enough training data, we can use machine learning algorithms to extract entities and relations we care about. There is one problem left: human language is inherently noisy. Words and phrases can be ambiguous, sentences are often ungrammatical, and spelling mistakes are frequent. Our training data may have errors in it as well, and we may have made mistakes in the entity linking step. This is where many machine learning approaches break down: they treat training or input data as "correct" and make predictions using this assumption.

DeepDive makes good use of uncertainty to improve predictions during the probabilistic inference step. For example, DeepDive may figure out that a certain mention of "Barack" is only 60% likely to actually refer to "Barack Obama", and use this fact to discount the impact of that mention on the final result for the entity "Barack Obama". DeepDive can also make use of domain knowledge and allow users to encode rules such as "If Barack is married to Michelle, then Michelle is married to Barack" to improve the predictions.
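A toy sketch of how per-mention uncertainty can discount evidence, in the spirit of the example above (the data and probabilities are invented; this is not DeepDive's actual inference machinery, which performs joint probabilistic inference):

```python
# Each mention carries a probability that it truly links to the entity.
# The low-confidence "Barack" mention counts for less than a full vote.
mentions = [
    {"surface": "Barack Obama", "p_link": 0.95, "supports_married": True},
    {"surface": "Barack",       "p_link": 0.60, "supports_married": True},
]

# Weighted evidence instead of a raw mention count.
weighted_support = sum(m["p_link"] for m in mentions if m["supports_married"])
raw_count = sum(1 for m in mentions if m["supports_married"])
```

Treating each mention's linking probability as a weight, rather than assuming the input is correct, is what lets uncertain evidence be discounted instead of discarded or over-trusted.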

Source: http://deepdive.stanford.edu/relation_extraction

Offline Nazia Nishat

What does DeepDive do?
DeepDive is a trained system that uses machine learning to cope with various forms of noise and imprecision. It is designed to make it easy for users to train the system, both through low-level feedback via the Mindtagger interface and through rich, structured domain knowledge expressed as rules. DeepDive aims to enable experts who do not have machine learning expertise. One of DeepDive's key technical innovations is the ability to solve statistical inference problems at massive scale.

DeepDive asks the developer to think about features—not algorithms. In contrast, other machine learning systems require the developer to think about which clustering algorithm, which classification algorithm, etc. to use. In DeepDive's joint-inference-based approach, the user only specifies the necessary signals or features.

DeepDive systems can achieve high quality: PaleoDeepDive extracts complex knowledge in scientific domains with higher quality than human volunteers, and DeepDive has achieved winning performance in entity relation extraction competitions.

Further Reading: http://deepdive.stanford.edu/

Offline iftekhar.swe

Re: Difference between information extraction and relation extraction in NLP
« Reply #3 on: September 06, 2018, 03:27:25 PM »
DeepDive asks the developer to think about features—not algorithms. In contrast, other machine learning systems require the developer to think about which clustering algorithm, which classification algorithm, etc. to use. In DeepDive's joint-inference-based approach, the user only specifies the necessary signals or features.

amazing technique
_________________________
MD. IFTEKHAR ALAM EFAT
Sr. Lecturer
Department of Software Engineering, FSIT
Daffodil International University

Offline s.arman

good one