NLP: Text Summarization and Keyword Extraction on Property Rental Listings — Part 1 | by Daniel Kristiyanto

While summaries are useful, keywords serve a different purpose: they capture the most important aspects that potential renters might be looking for. To extract keywords, we can use NLP techniques such as Named Entity Recognition (NER). This process goes beyond simply identifying frequent words. We can extract the most important information by considering factors like word co-occurrence and relevance to the rental-listing domain. This information can be a single word, such as 'luxury' (adjective) or 'Ginza' (location), or a phrase, such as 'quiet environment' (noun phrase) or 'close to Shinjuku' (proximity).

Evaluating NER: spaCy's built-in NER performs well, but certain entity types may require more training data for optimal accuracy. (NER: Named Entity Recognition; GPE: Geo-Political Entity)
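One way to find out which entity types need more training data is to score the pipeline's predictions against a small hand-labeled sample, label by label. The sketch below is a minimal illustration, assuming entities are stored as (start_char, end_char, label) tuples and that only an exact span-and-label match counts as correct. The function name and the sample spans are hypothetical, not real spaCy output; spaCy's own `spacy evaluate` command also reports per-type NER scores.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def per_label_scores(gold: Iterable[Span], pred: Iterable[Span]) -> Dict[str, Dict[str, float]]:
    """Exact-match precision and recall for each entity label (hypothetical helper)."""
    gold_set, pred_set = set(gold), set(pred)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    # A predicted span is a true positive only if span AND label match the gold data.
    for span in pred_set:
        counts[span[2]]["tp" if span in gold_set else "fp"] += 1
    # Gold spans the model missed (or labeled differently) are false negatives.
    for span in gold_set - pred_set:
        counts[span[2]]["fn"] += 1
    scores = {}
    for label, c in counts.items():
        predicted = c["tp"] + c["fp"]
        actual = c["tp"] + c["fn"]
        scores[label] = {
            "precision": c["tp"] / predicted if predicted else 0.0,
            "recall": c["tp"] / actual if actual else 0.0,
        }
    return scores


# Hypothetical sample for "Quiet room in Shinjuku Kabukicho, Tokyo."
gold = [(14, 32, "LOC"), (34, 39, "GPE")]
pred = [(14, 32, "PERSON"), (34, 39, "GPE")]  # the neighborhood mislabeled as a person
for label, s in sorted(per_label_scores(gold, pred).items()):
    print(label, s)
```

A label with low recall (LOC in this made-up sample) is a candidate for additional annotated examples.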

3a. Level: Easy — Regex

The 'find' function in string operations, along with regular expressions, can do the job of finding keywords. However, this approach requires an exhaustive list of words and patterns, which is often not practical. If an exhaustive list of keywords to look for is available (like stock exchange ticker symbols for finance-related projects), regex might be the best way to do it.
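A minimal sketch of the regex approach, assuming a small hand-written keyword list (the list, the function name, and the sample listing are hypothetical). It compiles every keyword into one case-insensitive alternation, puts longer phrases first so a multi-word keyword wins over a shorter one it contains, and uses word boundaries so 'Ginza' is not matched inside a longer word.

```python
import re
from typing import List

# Hypothetical keyword list; a real one would have to be exhaustive,
# which is exactly the weakness of this approach.
KEYWORDS = ["luxury", "Ginza", "Shinjuku", "quiet environment", "pet friendly"]

# Longest keywords first; \b stops partial-word matches (e.g. "Ginza" in "Ginzan").
_pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def extract_keywords(text: str) -> List[str]:
    """Return every keyword occurrence in `text`, in order of appearance."""
    return [m.group(0) for m in _pattern.finditer(text)]


listing = "Luxury 2LDK in Ginza, quiet environment, 10 minutes from Shinjuku."
print(extract_keywords(listing))
# ['Luxury', 'Ginza', 'quiet environment', 'Shinjuku']
```

Anything outside the list, such as '2LDK' or '10 minutes from', is silently ignored, which is why this only works well when the vocabulary is closed.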

3b. Level: Intermediate — The Matcher

While regular expressions can be used for simple keyword extraction, the need for extensive lists of rules makes it hard to cover all bases. Fortunately, most NLP tools offer this kind of capability out of the box. For example, the Natural Language Toolkit (NLTK) has named entity chunkers, and spaCy has the Matcher.

The Matcher lets you define patterns based on linguistic features like part-of-speech tags or specific keywords. These patterns can be matched against the rental descriptions to identify relevant keywords and phrases. This approach captures single words (like Tokyo) and meaningful phrases (like beautiful house) that better represent the selling points of a property.

# Token patterns for spaCy's Matcher: each inner list describes one sequence of tokens.
# Noun phrases
noun_phrases_patterns = [
    [{'POS': 'NUM'}, {'POS': 'NOUN'}],             # example: 2 bedrooms
    [{'POS': 'ADJ', 'OP': '*'}, {'POS': 'NOUN'}],  # example: beautiful house
    [{'POS': 'NOUN', 'OP': '+'}],                  # example: house
]

# Geo-political entity (relies on the pipeline's built-in NER labels)
gpe_patterns = [
    [{'ENT_TYPE': 'GPE'}],  # example: Tokyo
]

# Proximity
proximity_patterns = [
    # example: close to airport
    [{'POS': 'ADJ'}, {'POS': 'ADP'}, {'POS': 'NOUN', 'ENT_TYPE': 'FAC', 'OP': '?'}],
    # example: close to Narita
    [{'POS': 'ADJ'}, {'POS': 'ADP'}, {'POS': 'PROPN', 'ENT_TYPE': 'FAC', 'OP': '?'}],
]
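To make the pattern syntax above concrete, here is a toy, pure-Python matcher that walks hand-tagged tokens the way these patterns describe. It is not spaCy's implementation and the helper names are hypothetical: it checks only the 'POS' key (the 'ENT_TYPE' constraints are ignored), matches quantifiers greedily without backtracking, and keeps one match per start position, whereas spaCy's Matcher returns all matches unless configured otherwise. The POS tags are assigned by hand here; in practice they come from a spaCy pipeline.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (text, coarse POS tag), tagged by hand for this sketch
Spec = Dict[str, str]    # e.g. {'POS': 'ADJ', 'OP': '*'}


def _fits(token: Token, spec: Spec) -> bool:
    # Toy simplification: only the 'POS' key is compared.
    return token[1] == spec["POS"]


def _match_at(tokens: List[Token], pattern: List[Spec], start: int) -> int:
    """Greedily match `pattern` starting at `start`; return the end index or -1."""
    i = start
    for spec in pattern:
        op = spec.get("OP", "1")  # no 'OP' means exactly one token
        if op == "1":
            if i < len(tokens) and _fits(tokens[i], spec):
                i += 1
            else:
                return -1
        elif op == "?":  # zero or one
            if i < len(tokens) and _fits(tokens[i], spec):
                i += 1
        else:  # '*' (zero or more) or '+' (one or more)
            count = 0
            while i < len(tokens) and _fits(tokens[i], spec):
                i += 1
                count += 1
            if op == "+" and count == 0:
                return -1
    return i if i > start else -1


def find_phrases(tokens: List[Token], pattern: List[Spec]) -> List[str]:
    """Scan left to right, keep the greedy match at each start, then skip past it."""
    phrases, i = [], 0
    while i < len(tokens):
        end = _match_at(tokens, pattern, i)
        if end == -1:
            i += 1
        else:
            phrases.append(" ".join(text for text, _ in tokens[i:end]))
            i = end
    return phrases


tokens = [("Quiet", "ADJ"), ("apartment", "NOUN"), ("with", "ADP"), ("2", "NUM"),
          ("bedrooms", "NOUN"), ("close", "ADJ"), ("to", "ADP"), ("Shinjuku", "PROPN")]

print(find_phrases(tokens, [{'POS': 'ADJ', 'OP': '*'}, {'POS': 'NOUN'}]))  # ['Quiet apartment', 'bedrooms']
print(find_phrases(tokens, [{'POS': 'NUM'}, {'POS': 'NOUN'}]))             # ['2 bedrooms']
print(find_phrases(tokens, [{'POS': 'ADJ'}, {'POS': 'ADP'}, {'POS': 'PROPN', 'OP': '?'}]))  # ['close to Shinjuku']
```

With spaCy itself, the equivalent is to create `Matcher(nlp.vocab)`, register each list with `matcher.add(...)`, and call `matcher(doc)` on a parsed listing.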

3c. Level: Advanced — Deep Learning-Based Matcher

Even with the Matcher, some phrases will not be captured by rule-based matching because of the context of the words in the sentence. For example, the Matcher might miss a term like 'a stone's throw away from Ueno Park' since it won't match any predefined pattern, or, because entity-based patterns rely on the built-in NER labels, it might treat "Shinjuku Kabukicho" as a person (it's a neighborhood, or LOC).

In such cases, deep-learning-based approaches can be more effective. By training on a large corpus of rental listings annotated with relevant keywords, these models learn the semantic relationships between words. This makes the approach more adaptable to evolving language use, and it can uncover hidden insights.

Using spaCy, performing deep-learning-based NER is straightforward. However, the main building block for this approach is usually the availability of labeled training data, as is also the case for this exercise. Each label pairs a target phrase with an entity name (for example, 'a stone's throw away' is a noun phrase, or, as shown in the picture, Shinjuku Kabukicho is a LOC, not a person) and must be formatted in a specific way. Unlike the rule-based approach, where we classify words as nouns, locations, and so on using built-in functionality, here data exploration or domain experts are needed to discover the target phrases we want to identify.
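As a sketch of what "formatted in a specific way" can look like, the snippet below turns (text, target phrase, label) triples into the character-offset layout `(text, {"entities": [(start, end, label)]})` that many spaCy NER tutorials use; in spaCy v3, a dict like this can be passed to `Example.from_dict` together with a `Doc`. The helper name, the sample sentences, and the label names are hypothetical, and only the first occurrence of each phrase is labeled.

```python
from typing import Dict, List, Tuple

Entity = Tuple[int, int, str]  # (start_char, end_char, label)


def to_training_example(text: str, phrase: str, label: str) -> Tuple[str, Dict[str, List[Entity]]]:
    """Locate `phrase` in `text` and return (text, {"entities": [(start, end, label)]})."""
    start = text.find(phrase)  # first occurrence only
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {text!r}")
    return text, {"entities": [(start, start + len(phrase), label)]}


# Hypothetical annotations for the two cases discussed above.
raw = [
    ("Cozy studio a stone's throw away from Ueno Park.", "a stone's throw away", "NOUN_PHRASE"),
    ("Quiet room in Shinjuku Kabukicho, Tokyo.", "Shinjuku Kabukicho", "LOC"),
]
train_data = [to_training_example(*row) for row in raw]

for text, annotations in train_data:
    start, end, label = annotations["entities"][0]
    print(label, repr(text[start:end]))  # sanity check: offsets slice out the phrase
```

Computing offsets programmatically rather than by hand avoids the off-by-one errors that commonly break NER training data.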

Part 2 of the article will discuss this process of discovering themes or labels from the data for topic modeling using clustering, bootstrapping, and other methods.
