spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python.
If you’re working with a lot of text, you’ll eventually want to know more about it. For example: what is it about? What do the words mean in context? Who is doing what to whom? Which texts are similar to each other?
spaCy can help you answer all of these questions.
Linguistic Features in spaCy
spaCy acts as a one-stop shop for the tasks that come up in most NLP projects: tokenization, lemmatization, part-of-speech (POS) tagging, named entity recognition, dependency parsing, sentence segmentation, word-to-vector transformations, and other text cleaning and normalization methods.

Installation of spaCy
!pip install -U spacy
!pip install -U spacy-lookups-data
!python -m spacy download en_core_web_sm
Once we’ve downloaded and installed a model, we can load it via spacy.load(). spaCy offers a range of pre-trained models; the default small English model is en_core_web_sm.
Calling spacy.load() returns a Language object (conventionally named nlp) that contains all the components and data needed to process text.
import spacy
nlp = spacy.load('en_core_web_sm')
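Once loaded, you can check which processing components the pipeline contains via its pipe_names attribute (the exact list depends on the model and spaCy version):

print(nlp.pipe_names)  # e.g. ['tagger', 'parser', 'ner']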
Tokenization in spaCy
Tokenization is the task of splitting a text into meaningful segments called tokens. The input to the tokenizer is a Unicode text and the output is a Doc object.
A Doc is a sequence of Token objects, so we can iterate over its individual tokens.
doc = nlp('We are learning SpaCy library today')
for token in doc:
    print(token.text)
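Note that the tokenizer does more than split on whitespace: punctuation and contractions become tokens of their own. A small illustrative example (output shown approximately):

# Punctuation and contractions are separated into their own tokens
doc = nlp("Let's learn spaCy, it's fun!")
print([token.text for token in doc])
# e.g. ['Let', "'s", 'learn', 'spaCy', ',', 'it', "'s", 'fun', '!']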

Part-of-speech tagging
Part-of-speech tagging is the process of assigning a POS tag to each token depending on its usage in the sentence.
doc = nlp('We are learning SpaCy library today')
for token in doc:
    print(f'{token.text:{15}} {token.lemma_:{15}} {token.pos_:{10}} {token.is_stop}')
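If a tag abbreviation is unclear, spacy.explain() returns a short human-readable description:

# spacy.explain maps tag/label abbreviations to plain English
print(spacy.explain('PRON'))  # pronoun
print(spacy.explain('AUX'))   # auxiliary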

Dependency Parsing
Dependency parsing is the process of extracting the dependency parse of a sentence to represent its grammatical structure. It defines the relationship between head words and their dependents.
The head of a sentence has no dependency of its own and is called the root of the sentence; it is usually the main verb. Every other word is linked, directly or indirectly, to this head.
doc = nlp('We are learning SpaCy library today')
for chunk in doc.noun_chunks:
    print(f'{chunk.text:{30}} {chunk.root.text:{15}} {chunk.root.dep_}')
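The noun-chunk view above summarizes the parse; to inspect the head-dependent links directly, each token carries a dependency label (dep_) and a reference to its head token. A minimal sketch using the same sentence:

doc = nlp('We are learning SpaCy library today')
for token in doc:
    # The root token is its own head
    print(f'{token.text:{12}} {token.dep_:{10}} {token.head.text}')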

Lemmatization
Closely related to tokenization, lemmatization is the method of reducing a word to its base or root form. This reduced form is called a lemma.
For example, organizes, organized, and organizing are all forms of organize. Here, organize is the lemma.
Lemmatization is useful because it collapses the inflected forms of a word so that they can be analyzed as a single item. It also helps to normalize the text.
doc = nlp('We are learning SpaCy library today')
for token in doc:
    print(token.text, token.lemma_)
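Running the lemmatizer over the organize example above shows all three inflected forms collapsing to the same lemma (exact output can vary slightly between model versions):

# All three inflected forms should map to the lemma 'organize'
doc = nlp('She organizes, organized, and is organizing events.')
for token in doc:
    print(token.text, '->', token.lemma_)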

Sentence Boundary Detection
Sentence segmentation is the process of locating the start and end of sentences in a given text, which lets you divide a text into linguistically meaningful units. spaCy uses the dependency parse to determine sentence boundaries, and the sentences themselves are exposed through the sents property of a Doc.
doc = nlp('First Sentence. Second Sentence. Third Sentence.')
print(list(doc.sents))
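If no parser is available (for example, in a blank pipeline), spaCy also ships a rule-based sentencizer component as an alternative. A minimal sketch, assuming spaCy v3's string-based add_pipe:

import spacy

nlp_blank = spacy.blank('en')      # blank English pipeline, no parser
nlp_blank.add_pipe('sentencizer')  # rule-based sentence boundaries
doc = nlp_blank('First Sentence. Second Sentence. Third Sentence.')
print([sent.text for sent in doc.sents])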

Named Entity Recognition
Named Entity Recognition (NER) is the process of locating named entities in unstructured text and classifying them into pre-defined categories such as person names, organizations, locations, monetary values, percentages, and time expressions.
A common use is populating tags for a set of documents to improve keyword search. Named entities are available as the ents property of a Doc.
doc = nlp('We are learning SpaCy library today')
for ent in doc.ents:
    print(ent.text, ent.label_)
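The sample sentence above contains few entities, so a richer sentence demonstrates the variety of labels better (entity predictions may differ slightly between model versions):

# A sentence with several entity types: organization, location, money
doc = nlp('Apple is looking at buying U.K. startup for $1 billion')
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
# e.g. Apple ORG, U.K. GPE, $1 billion MONEY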

Similarity
Similarity is determined by comparing word vectors or “word embeddings”: multi-dimensional meaning representations of a word.
As the example below shows, the words “dog”, “cat”, and “banana” are all fairly common in English, so they are part of the pipeline’s vocabulary and come with a vector. The made-up word “afskfsd”, on the other hand, is out-of-vocabulary, so its vector representation consists of 300 dimensions of 0. Note that small pipelines such as en_core_web_sm ship without real word vectors; for meaningful similarity comparisons, use a larger model such as en_core_web_md or en_core_web_lg.
tokens = nlp("dog cat banana afskfsd")
for token in tokens:
    print(token.text, token.has_vector, token.vector_norm, token.is_oov)
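To actually compare words, call the similarity() method on tokens (or on Doc and Span objects). A minimal sketch, assuming en_core_web_md has been downloaded, since similarity needs real word vectors:

import spacy

# Requires: python -m spacy download en_core_web_md
nlp_md = spacy.load('en_core_web_md')
tokens = nlp_md('dog cat banana')
for t1 in tokens:
    for t2 in tokens:
        print(t1.text, t2.text, round(t1.similarity(t2), 2))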

Conclusion
In conclusion, spaCy is a modern, reliable NLP framework that has quickly become a standard choice for doing NLP in Python. Its main advantages are speed, accuracy, and extensibility.
We have gained insights into linguistic annotations such as tokenization, lemmatization, part-of-speech (POS) tagging, named entity recognition, dependency parsing, sentence segmentation, and similarity.