Concept Learning: The stepping stone towards Machine Learning with Find-S

Reading Time: 7 minutes

From our previous blog, we came across what awesome stuff a machine can do with machine learning and what all math stuff is required before you take a deep dive into machine learning. Now we all know the prerequisites for machine learning, so let’s start the journey towards machine learning with small but effective steps towards awesomeness.

Most of us always wonder how machines can learn from data, and predict future based on the available information considering facts and scenarios. Today we are living in an era where most of us are working globally on big data technologies having great efficiency and speed. But having a huge amount of data only is not the complete solution and optimal use of data until we are able to find patterns out of it and use those patterns to predict future events and identify our interest specific solutions.


To understand how a machine can learn from past experiences and predict future based on the experience gained, we need to understand the working of a human brain first. Once we find out the pattern of a human brain to solve a problem, we can make our machine to learn in an almost same way. Almost same way because the human brain has no limits and there is a lot to explore. As machine learning is a huge field of study, and there are a lot of possibilities, here we are going to discuss one of the most simple algorithms of machine learning which is called Find-S algorithm.

What is Learning?

There are several definitions available on the internet of learning. One of the simplest definitions is  “The activity or process of gaining knowledge or skill by studying, practicing, being taught, or experiencing something”. Similar to various definitions available of learning, there are various categories of learning methods.

As a human, we learn a lot of things during entire life. Some of them are based on our experience and some of them are based on memorization. On the basis of that we can divide learning methods into five parts:

1. Rote Learning (memorization): Memorizing things without knowing the concept/ logic behind them
2. Passive Learning (instructions): Learning from a teacher/expert.
3. Analogy (experience): Learning new things from our past experience.
4. Inductive Learning (experience): On the basis of past experience formulating a generalized concept.
5. Deductive Learning: Deriving new facts from past facts.

The inductive learning is based on formulating a generalized concept after observing a number of instances of examples of the concept. For an example, if a kid is asked to write an answer of 2*8=?. He/She can either use the rote learning method to memorize the answer, or he/she can use inductive learning using the examples like 2*1=2, 2*2=4… and so on, to formulate a concept to calculate the results. In this way, the kid will be able to solve a similar type of questions using the same concept.

Similarly, we can make our machine to learn from past data and make them intelligent to identify whether an object falls into a specific category of our interest or not.

What is concept learning?

In terms of machine learning, the concept learning can be formulated as “Problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples”-Tom Michell.  

Much of human learning involves acquiring general concepts from past experiences. For an example, humans identify different vehicles among all the vehicles based on some specific set of features defined over a large set of features. This special set of features differentiates the subset of cars in the set of vehicles. This set of features that differentiate cars, can be called a concept.

Similarly, machines can also learn from the concepts to identify whether an object belongs to a specific category or not by processing past/training data to find a hypothesis that best fits the training examples.

Target Concept:


The set of items/objects over which the concept is defined is called the set of instances and denoted by X. The concept or function to be learned is called the target concept and denoted by c. It can be seen as a boolean valued function defined over X and can be represented as:

c: X -> {0, 1}

So, if we have a set of training examples with specific features, of target concept c, the problem faced by the learner to estimate c that can be defined on training data.
H is used to denote the set of all possible hypotheses that the learner may consider regarding the identity of the target concept.
The goal of a learner is to find a hypothesis h which can identify all the objects in X so that:

h(x) = c(x) for all x in X

In this way there are three necessary things for an algorithm which supports concept learning:

1. Training data (Past experiences to train our models)
2. Target Concept (Hypothesis to identify data objects)
3. Actual data objects (For testing the models)

Inductive Learning Hypothesis:

As we discussed earlier, the ultimate goal of concept learning is to identify a hypothesis ‘h’ identical to target concept c over data set X with only available information about c is its value over X. So our algorithm can guaranty that it best fits on training data. In other words Any hypothesis found approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

Considering an example if a person goes to a movie or not based on 4 binary features with 2 values (true or false) possible:

1. Has Money -> <true, false>
2. Has Free Time -> <true, false>
3. It’s a Holiday -> <true, false>
4. Has Pending work -> <true, false>

And training data we have with two data objects as positive samples and one as negative:

x1: <true, true, false, false> : +ve
x2: <true, false, false, true> : +ve
<true, false, false, true> : -ve

Hypothesis Notations:

Each of the data objects represents a concept and hypotheses. Considering a hypothesis <true, true, false, false> is more specific because it can cover only one sample. To make a more generalized concept we can add some notations into this hypothesis. To this task we have following notations:

1.   (Represents a hypothesis which rejects all.)
2. < ? , ? , ? , ? >  (Accepts all)
3. <true, false, ? , ? >  (Accepts some)

The hypothesis ⵁ will reject all the data samples. The hypothesis <? , ? , ? , ? > will accept all the data samples. The ‘?’ notation indicates that the values of this specific feature do not affect the result.

In this way the total number of the possible hypothesis are:
(3 * 3 * 3 * 3) + 1
where 3 because one feature can have either true, false or ‘?’ and one hypothesis for rejects all (ⵁ).

General to the specific ordering of hypothesis:

Many machine learning algorithms rely on the concept general-to-specific ordering of hypothesis.

h1 = < true, true, ?, ? >
h2 = < true, ? , ? , ? >

Any instance classified by h1 will also be classified by h2. So we can say that h2 is more general than h1. Using this concept we can find a general hypothesis that can be defined over entire data set X.

Find-S Algorithm: Finding Maximally Specific Hypothesis:

To find out a single hypothesis defined on X we can use the concept more-general-then partial ordering. One way to do this is start with the most specific hypothesis from H and generalize this hypothesis each time it fails to classify and observed positive training data object as positive.

Step 1. The first step in Find-S algorithm is to start with most specific hypothesis that can be denoted by

h <- <ⵁ, ⵁ, ⵁ, ⵁ>

Step 2. This step involves picking up next training sample and apply step 3 on the sample.

Step 3. The next step involves observing the data sample. If the sample is negative the hypothesis remains unchanged and we pick next training sample by processing step 2 again else we process step 4.

Step 4. If the sample is positive and we find that our initial hypothesis is too specific because it does not cover the current training sample then we need to update our current hypothesis. This can be done by the pairwise conjunction (Logical AND operation) of current hypothesis and training sample.

If next training sample is <true, true, false, false> and current hypothesis is <ⵁ, ⵁ, ⵁ, ⵁ>, then we can directly replace our existing hypothesis with the new one.

If the next positive training sample is<true, true, false, true> and current hypothesis is <true, true, false, false> then we can perform a pairwise conjunctive AND with the current hypothesis and next training sample and find new hypothesis by putting ‘?’ in the place where the result of conjunction is false:

<true, true, false, true> ⴷ <true, true, false, false> = <true, true, false, ?>

Now we can replace our existing hypothesis with the new one:

h <-<true, true, false, ?>

Step 5. This step involves repetition of step 2 till we have more training samples.

Step 6. Once there are no training samples the current hypothesis is the one we were trying to find. We can use the final hypothesis for classifying the real objects.

A concise form of the Find-S algorithm:

Step 1. Start with h = ⵁ
Step 2. Use next input {x, c(x)}
Step 3. If c(x) = 0, go to step 2
Step 4. h <- h ⴷ x (Pairwise AND)
Step 5. If more examples : Go to step 2
Step 6. Stop

Limitations of the Find-S algorithm:

Find-S algorithm for concept learning is one of the most basic algorithms of machine learning with some limitation and disadvantages. Some of them are listed here:

1. No way to determine if the only final hypothesis (found by Find-S) is consistent with data or there are more hypothesis that is consistent with data.

2. Inconsistent sets of training examples can mislead the finds algorithm as it ignores negative data samples, so an algorithm that can detect inconsistency of training data would be better to use.

3. A good concept learning algorithm should be able to backtrack the choice of hypothesis found so that the resulting hypothesis can be improved over time. Unfortunately, Find-S provide no such method.

Many of the limitations can be removed in one most important algorithm of concept learning called Candidate elimination algorithm.

In our next blog, we will be explaining Find-S algorithm with a basic example. To explore Find-S implementation please visit the next part of this blog(Coming Soon).

Reference: Machine Learning-Tom-Michell


Written by 

Girish is a Software Consultant, with experience of more than 3.5 years. He is a scala developer and very passionate about his interest towards Scala Eco-system. He has also done many projects in different languages like Java and He can work in both supervised and unsupervised environment and have a craze for computers whether working or not, he is almost always in front of his laptop's screen. His hobbies include reading books and listening to music. He is self motivated, dedicated and focused towards his work. He believes in developing quality products. He wants to work on different projects and different domains. He is curious to gain knowledge of different domains and try to provide solutions that can utilize resources and improve performance. His personal interests include reading books, video games, cricket and social networking. He has done Masters in Computer Applications from Lal Bahadur Shastri Institute of Management, New Delhi.