Training Spacy matcher for Location extraction
If you want to extract location from a sentence, then below solution will help you to do so. As you know NER(Named Entity Recognition) works well if you are dealing with some Internationl location, But if your task is to extract local location from a sentence then NER wouldn’t work or you have to train NER for the local locations as well. But if you are having a limited number of locations and you want to extract it from the sentence then give a try to Spacy Matcher.
First you have to train it with all the availble location then it will do the extraction magic for you.
# Load required modules
from spacy.matcher import Matcher
from spacy.attrs import IS_PUNCT, LOWER
import spacy
nlp = spacy.load('en')
matcher = Matcher(nlp.vocab)
There is a specific pattern to train Sapcy Matcher-
E.g pattern = {‘HelloWorld’: [{‘LOWER’: ‘hello’}, {‘LOWER’: ‘world’}]}
def skillPattern(skill):
pattern = []
for b in skill.split():
pattern.append({'LOWER':b})
return pattern
def buildPatterns(skills):
pattern = []
for skill in skills:
pattern.append(skillPattern(skill))
return list(zip(skills, pattern))
def on_match(matcher, doc, id, matches):
return matches
def buildMatcher(patterns):
for pattern in patterns:
matcher.add(pattern[0], on_match, pattern[1])
return matcher
def cityMatcher(matcher, text):
skills = []
doc = nlp(unicode(text.lower()))
matches = matcher(doc)
for b in matches:
match_id, start, end = b
print doc[start : end]
cities = [ u'delhi',
u'bengaluru',
u'kanpur',
u'noida',
u'ghaziabad',
u'chennai',
u'hydrabad',
u'luckhnow',
u'saharanpur',
u'dehradun',
u'bombay']
patterns = buildPatterns(cities)
city_matcher = buildMatcher(patterns)
### Size of dictionary
len(city_matcher)
11
cityMatcher(city_matcher, "I am from Saharanpur, i live in bengaluru..")
saharanpur
bengaluru
Written on May 21, 2018