Multiple Coloured Object Tracking using Thresholding.

In this post I will show you how to track multiple objects, but first we must learn to track a single object.

Here I have made a small finger-cap out of blue paper and placed it on my finger. My aim was to control the mouse with my hand, hence this arrangement.

I have used EmguCV, a C# wrapper over OpenCV, but OpenCV users can also follow this post because the high-level functions and the theory are exactly the same.

Step 1: The first part is to detect the object by its colour. Here our object is blue.

We need to threshold this colour, but thresholding is difficult in the RGB colour space, mainly because each channel is stored as a grayscale image and a change in intensity changes the colour itself. Hence we use the HSI or HSV space.

HSV: Hue, Saturation and Value.

Hue represents the colour itself: the colour information, irrespective of intensity.

Saturation represents the purity of the colour, i.e. how far it is from white or grey.

Value (V) or Intensity (I) represents the brightness.

The HSV colour space is very similar to the way humans understand colour. No one says that a car is 50% blue, 20% red and 30% green; instead we perceive the hue. So first we need to convert our image to HSV space.

    Image<Hsv, Byte> hsvimage = frame.Convert<Hsv,Byte>();
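Hue's independence from brightness is easy to check with a small Python sketch (using the standard colorsys module purely for illustration, not the EmguCV call): a bright blue and a dark blue map to the same hue, and only the value channel changes.

```python
import colorsys

# Two blues that differ only in brightness (RGB components in [0, 1])
bright_blue = (0.1, 0.2, 1.0)
dark_blue   = (0.05, 0.1, 0.5)   # same colour, half the intensity

h1, s1, v1 = colorsys.rgb_to_hsv(*bright_blue)
h2, s2, v2 = colorsys.rgb_to_hsv(*dark_blue)

# Hue (and saturation) stay the same; only value tracks the brightness
print(round(h1, 3) == round(h2, 3))  # True
print(v1, v2)                        # 1.0 0.5
```

This is exactly why we threshold on hue rather than on raw RGB values.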

Now that we have our image, we can proceed to thresholding. The result of thresholding can be seen in the bottom-left camera feed.

    MCvScalar min = new MCvScalar(104, 150, 150);
    MCvScalar max = new MCvScalar(120, 255, 255);
    CvInvoke.cvInRangeS(hsvimage, min, max, temp);

To threshold the image we need two scalars. Pixels whose values lie between the two scalars are set to 255 and the rest to 0 (black). The first parameter of MCvScalar is the hue, the second the saturation and the third the intensity (value). You need to adjust these based on the colour you are thresholding.
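The behaviour of cvInRangeS can be sketched in a few lines of Python (a hand-rolled illustration, not the EmguCV API): each HSV pixel becomes 255 only if every channel lies within the [min, max] range.

```python
def in_range(pixels, lo, hi):
    """Mimic cvInRangeS on a flat list of (H, S, V) pixels."""
    return [255 if all(lo[c] <= p[c] <= hi[c] for c in range(3)) else 0
            for p in pixels]

lo, hi = (104, 150, 150), (120, 255, 255)   # the blue range used above
pixels = [(110, 200, 200),   # blue            -> 255
          (30, 200, 200),    # wrong hue       -> 0
          (110, 100, 200)]   # too unsaturated -> 0

mask = in_range(pixels, lo, hi)
print(mask)  # [255, 0, 0]
```

The real call does the same test per pixel over the whole image, producing the binary mask we feed to the moments step.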

After thresholding, we have to find the position of the object. This is done using image moments; I will touch upon moments in some other post.

    MCvMoments moment = new MCvMoments();
    CvInvoke.cvMoments(temp, ref moment, 1);
    double moment10 = CvInvoke.cvGetSpatialMoment(ref moment, 1, 0);
    double moment01 = CvInvoke.cvGetSpatialMoment(ref moment, 0, 1);

First we create an object of type MCvMoments, which stores the moments calculated in the next step. The call is cvMoments(&lt;thresholded image&gt;, &lt;reference to the MCvMoments object&gt;, &lt;isBinary flag&gt;).

If the isBinary flag is non-zero, all non-zero pixels are treated as ones.

The moments are then calculated and stored.

Now to calculate the area:

    double area = CvInvoke.cvGetSpatialMoment(ref moment, 0, 0);

From the moments and the area, we get the centre of gravity, i.e. the position of the object.

    posX = (int)(moment10 / area);
    posY = (int)(moment01 / area);

    Image<Bgr, Byte> tracking = new Image<Bgr, byte>(frame.Size);
    CvInvoke.cvCircle(frame, new Point(posX, posY), 4, new MCvScalar(0, 255, 0), 5, LINE_TYPE.EIGHT_CONNECTED, 0);

Thus we get the position by dividing the moments by the area.
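To make the arithmetic concrete, here is a small Python sketch of the same computation on a tiny binary mask (hand-rolled, no OpenCV): m00 is the area, and the centroid is (m10/m00, m01/m00).

```python
def spatial_moment(mask, p, q):
    """m_pq = sum over pixels of x^p * y^q * value (value is 0 or 1)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(mask)
               for x, v in enumerate(row))

# A 3x3 blob of ones centred at (3, 2) in a 6x5 image
mask = [[0, 0, 0, 0, 0, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0]]

area = spatial_moment(mask, 0, 0)          # m00 = 9 (pixel count)
pos_x = spatial_moment(mask, 1, 0) / area  # m10 / m00 = 3.0
pos_y = spatial_moment(mask, 0, 1) / area  # m01 / m00 = 2.0
print(area, pos_x, pos_y)  # 9 3.0 2.0
```

The centroid lands exactly in the middle of the blob, which is what the tracker uses as the object position.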

The end result can be seen in the bottom left of the diagram.

Multiple Object Tracking. Here I made a small app to draw with my fingers. I used a blue cap and a yellow cap, made out of paper. I will paste the code for your reference.

    void ProcessFrame(object sender, EventArgs e)
    {
        frame = _videoStream.RetrieveBgrFrame(); // Retrieve the frame from the camera
        Point new_yellow = TrackYellow(frame);   // Track the yellow colour with a different scalar range
        Point new_blue = TrackBlue(frame);       // Track the blue colour as shown previously

        // Sometimes when the fingers go out of frame, noise can send the tracker circle haywire.
        // Since this is a paint app, I did not want unnecessary lines drawn.
        if (blue.X != 0 && blue.Y != 0 && new_blue.X != 0 && new_blue.Y != 0)
        {
            // Draw a line from the previous position to the current position
            CvInvoke.cvLine(img, blue, new_blue, new MCvScalar(255, 0, 0), 5, LINE_TYPE.EIGHT_CONNECTED, 0);
        }

        // Both colours are tracked in the same frame.
        if (yellow.X != 0 && yellow.Y != 0 && new_yellow.X != 0 && new_yellow.Y != 0)
        {
            CvInvoke.cvLine(img, yellow, new_yellow, new MCvScalar(0, 255, 255), 5, LINE_TYPE.EIGHT_CONNECTED, 0);
        }

        // The new position becomes the old position for the next iteration
        yellow = new_yellow;
        blue = new_blue;
        TrackImagebox1.Image = img;
    }


This is the final result, drawn entirely with hand movements. :)

Sometimes there is background noise; you can filter it out with a condition based on the blob area.
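A minimal sketch of such a filter in Python (the threshold value is an assumed placeholder; tune it for your cap size): reject a detection whose area is too small, returning (0, 0) so the drawing guard in the loop above skips it.

```python
MIN_AREA = 400  # assumed value; tune for your setup

def filtered_position(moment10, moment01, area, min_area=MIN_AREA):
    """Return the centroid, or (0, 0) if the blob is too small to trust."""
    if area < min_area:
        return (0, 0)            # treated as "no detection" by the caller
    return (int(moment10 / area), int(moment01 / area))

print(filtered_position(2700.0, 1800.0, 900.0))  # (3, 2)
print(filtered_position(30.0, 20.0, 10.0))       # (0, 0) -- noise rejected
```

Since m00 is already the area, this costs nothing extra per frame.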

For those who did not understand the moments part, I will be back with a blog post on that. :)

License Plate detection with Connected Component Analysis

If you have read my previous posts, then you probably already know everything that went into developing this.

First I will describe the fine print of the process.

Step 1: Standard smoothing process using the Gaussian technique.

Step 2: Morphological Operations: TOP HAT

In most cases people use the Black Hat operation for connected component analysis, but here we have used TOP HAT simply because we are trying to detect license plates, which are lighter than their surroundings. TOP HAT exaggerates the portions that are LIGHTER than the surroundings, and our logic is that a license plate mostly contains light portions.


As you can see, the output intensity depends on how much lighter a region is than its surroundings. Unfortunately our license plate isn't the lightest region.

Step 3:

The obvious next step is to threshold the image so that the light regions become more prominent for contour extraction.

I have thresholded all pixels above 128 to 255. This value is based on the general features of the image.

Step 4:

Contour extraction. First I want to show the contours normally detected in this image.


These are the blobs (contours) that are present. You can see we have a large collection of insignificant contours that are either too large or too small.

So we would like to ignore blobs that are too large or too small.

First we filter the blobs by area. I have kept the accepted area between 600 and 10000; anything outside this range gets discarded.

Next we filter again on the basis of shape. We check each blob's width/height ratio, and I have kept the accepted range between 2 and 5.4.
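The two-stage filter (area between 600 and 10000, then width/height ratio between 2 and 5.4) can be sketched in Python, with bounding boxes as (x, y, w, h) tuples (a hand illustration, not the CBlobResult API):

```python
def plate_candidates(boxes, area_range=(600, 10000), ratio_range=(2.0, 5.4)):
    """Keep boxes whose area and width/height ratio look like a license plate."""
    keep = []
    for (x, y, w, h) in boxes:
        area = w * h
        ratio = w / h
        if area_range[0] < area < area_range[1] and ratio_range[0] < ratio < ratio_range[1]:
            keep.append((x, y, w, h))
    return keep

boxes = [(10, 10, 120, 40),   # area 4800, ratio 3.0 -> kept
         (0, 0, 400, 300),    # area 120000          -> too large
         (5, 5, 20, 20),      # area 400             -> too small
         (50, 80, 90, 60)]    # area 5400, ratio 1.5 -> wrong shape
print(plate_candidates(boxes))  # [(10, 10, 120, 40)]
```

Both thresholds are empirical; different camera distances will need different ranges.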

So if we filter out the blobs we get



Some other examples.

In Indian scenarios this doesn't always work because vehicles don't obey the plate norms, but the method is good enough, though it does detect some false positives.




Now for the code, which is in OpenCV. I have used the CBlobResult library; it is the best for handling blobs as it supports contour labelling.

Here goes the code.

    IplImage* img1 = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);
    cvConvertImage(img, img1, 0);
    IplImage* img2 = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);
    IplImage* img3 = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);
    cvSetZero(img2);
    cvSetZero(img3);
    IplImage* img_temp = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);
    IplImage* cc_color = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);

    cvSmooth(img1, img1, CV_GAUSSIAN, 3, 3);
    cvMorphologyEx(img1, img1, img2,
                   cvCreateStructuringElementEx(21, 3, 10, 2, CV_SHAPE_RECT, NULL),
                   CV_MOP_TOPHAT, 1);
    display(img1);
    cvThreshold(img1, img1, 128, 255, CV_THRESH_BINARY | CV_THRESH_OTSU);
    cvSaveImage("thres.png", img1, 0);
    display(img1);
    cvSetZero(img2);

    CBlobResult blobs = CBlobResult(img1, NULL, 0);
    blobs.Filter(blobs, B_INCLUDE, CBlobGetArea(), B_INSIDE, 600, 10000);

    cvConvertImage(img, img_temp, 0);
    cvCopy(img_temp, cc_color);

    CBlob* currentBlob;
    for (int i = 0; i < blobs.GetNumBlobs(); i++)
    {
        currentBlob = blobs.GetBlob(i);
        int s_x = currentBlob->GetBoundingBox().x;
        int s_y = currentBlob->GetBoundingBox().y;
        float width = currentBlob->GetBoundingBox().width;
        float height = currentBlob->GetBoundingBox().height;
        float ratio = width / height;
        if (ratio > 2 && ratio < 5.4)
            cvRectangle(img, cvPoint(s_x, s_y), cvPoint(s_x + width, s_y + height),
                        CV_RGB(255, 0, 0), 2, 8, 0);
    }
    display(img);


Taking a cue from my previous post, our original image was 32.jpg.

This is a high-contrast image with the intensity of the letters much lower than the surroundings. In my previous post I showed how to condense the text into groups called blobs. The result was:


Though the image may seem unintelligible, this is what I would describe as a near-perfect result. The next part is binarizing the image with Canny edge detection. A big reason why we prefer Canny edge detection is that the edge field is binarized via the hysteresis thresholding method: first, strong edges are obtained with a high threshold value; then weak edges are included provided they are connected to strong edges. This accentuates connected components even if some part of them is of low intensity. The result after edge detection is as follows:
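That hysteresis step is easy to illustrate in one dimension with plain Python (a toy version of what Canny does internally): pixels above the high threshold become strong edges, and pixels above the low threshold survive only if connected to a strong one.

```python
def hysteresis_1d(strengths, low, high):
    """Return a 0/1 edge map: strong seeds plus connected weak pixels."""
    n = len(strengths)
    edge = [1 if s >= high else 0 for s in strengths]
    changed = True
    while changed:                       # grow strong edges into weak neighbours
        changed = False
        for i in range(n):
            if edge[i] or strengths[i] < low:
                continue
            if (i > 0 and edge[i - 1]) or (i + 1 < n and edge[i + 1]):
                edge[i] = 1
                changed = True
    return edge

#            weak weak STRONG weak  low  weak
strengths = [ 60,  70,  200,   80,  10,  90]
print(hysteresis_1d(strengths, low=50, high=150))
# [1, 1, 1, 1, 0, 0] -- weak pixels kept only when linked to the strong edge
```

Note how the isolated weak pixel at the end is dropped even though it is above the low threshold; that is what keeps the edge map clean.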


Canny gives us this image. Now we need to separate these regions and process them individually, and this is where contours come in handy. cvFindContours, as the name suggests, finds the connected components in an image and stores them in a sequence of structures.

The diagram below shows one form of storage.



The diagram below shows another, hierarchical form of storage, in which a contour that lies inside another contour is stored as a child of the outer contour.


The storage can also be entirely hierarchical, like a tree. Anyway, a simple Google search will give you all the possible forms of storing contours. Contours are stored as a CvSeq, meaning a sequence of curves. Contour analysis thus lets us take each contour individually and analyze it.

So now we just need to extract each contour, find its bounding rectangle and map it onto the original image. Below, the contours are extracted and their bounding rectangles mapped to the original image.
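A bounding rectangle is just the min/max of a contour's point coordinates; here is a simplified Python equivalent of cvBoundingRect (it ignores the one-pixel inclusive-width detail of integer pixel grids):

```python
def bounding_rect(points):
    """Return (x, y, w, h) of the tightest upright rectangle around points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x, y = min(xs), min(ys)
    return (x, y, max(xs) - x, max(ys) - y)

contour = [(4, 7), (10, 3), (6, 12), (9, 9)]
print(bounding_rect(contour))  # (4, 3, 6, 9)
```

This rectangle is what we later draw on the image and use as the ROI for each detected region.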



This is the full image with the text regions marked. When we use a CvRect to bound a contour as a rectangle, some bits and pieces of non-contour regions also creep in. This is normal, as not all contours are rectangular.



The final result is this. The arrow has also been marked, but that can easily be eliminated by the OCR. So now we have successfully isolated the text regions.


The source code is as follows

    IplImage* img = cvLoadImage("C:\\samples\\test\\32.jpg");
    IplImage* img1 = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);
    cvConvertImage(img, img1, 0);
    IplImage* img2 = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);
    IplImage* img3 = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);
    cvSetZero(img2);
    cvSetZero(img3);

    CvMemStorage* mem = cvCreateMemStorage(0);
    CvSeq* contours = 0;
    CvSeq* ptr;

    cvMorphologyEx(img1, img1, img2,
                   cvCreateStructuringElementEx(21, 3, 10, 2, CV_SHAPE_RECT, NULL),
                   CV_MOP_TOPHAT, 1);
    display(img1);
    cvThreshold(img1, img1, 128, 255, CV_THRESH_BINARY);
    cvSaveImage("thres.png", img1, 0);
    display(img1);
    cvSetZero(img2);
    cvSmooth(img1, img1, CV_GAUSSIAN, 3, 3);
    cvSaveImage("smooth-gaussian.png", img1, 0);
    display(img1);
    cvDilate(img1, img1, cvCreateStructuringElementEx(21, 3, 10, 2, CV_SHAPE_RECT, NULL), 2);
    display(img1);
    cvCanny(img1, img1, 500, 900, 3);
    display(img1);
    cvSaveImage("canny.png", img1, 0);
    cvFindContours(img1, mem, &contours, sizeof(CvContour),
                   CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE, cvPoint(0, 0));

    for (ptr = contours; ptr != NULL; ptr = ptr->h_next)
    {
        double reg = fabs(cvContourArea(ptr, CV_WHOLE_SEQ));
        // if (reg > 600 && reg < 10000)  // optional area filter
        CvScalar ext_color = CV_RGB(255, 255, 255);
        cvDrawContours(img3, ptr, ext_color, CV_RGB(0, 0, 0), -1, CV_FILLED, 8, cvPoint(0, 0));
        CvRect rectEst = cvBoundingRect(ptr, 0);
        CvPoint pt1, pt2;
        pt1.x = rectEst.x;
        pt1.y = rectEst.y;
        pt2.x = rectEst.x + rectEst.width;
        pt2.y = rectEst.y + rectEst.height;
        int thickness = 1;
        cvRectangle(img, pt1, pt2, CV_RGB(255, 255, 255), thickness);
        cvRectangle(img3, pt1, pt2, CV_RGB(255, 255, 255), thickness);
        cvSetImageROI(img, rectEst);
        display(img);
        cvResetImageROI(img);
    }
    cvSaveImage("Detection-normal.png", img, 0);
    cvSaveImage("blobs.png", img3, 0);

Morphological Operations for Text Detection in high contrast Images

In my stint with image processing, I have come to believe that the most important part of image processing is not the actual "detection" but the preprocessing that goes before it. In my previous post I talked about how to use cvDilate; cvDilate is specifically used when the connected components are not properly connected.

As opposed to dilation, we have something called erosion, which takes the local minimum instead of the maximum. In this post I will be showing you the effects of morphological operations on an image with text. Our sample image will be an image with high contrast. Sample text with image:

Now our first step is to convert this image to grayscale using cvConvertImage.

Now we apply a morphological operation called Top Hat. Before explaining the Top Hat operation, let me tell you a little about "opening" an image. A morphological open is basically dilation done after erosion. The reason for the erosion is to eliminate noise and speckle in the image; the reason for preferring erosion over blurring is that large significant regions are not affected, only the protrusions get eroded away.

After that, dilation is done to reconnect the components that are very close to each other, giving rise to connected components.

Thus: Morphological Open = Erosion and then Dilation.

Morphological Top Hat Operation:

TopHat(src) = src–open(src)

Thus TOP HAT reveals areas that are lighter than the surroundings, which is exactly what we require in this case. Notice that the colour of the text is lighter than the surroundings.
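The formula is easy to demonstrate on a one-dimensional grayscale signal in plain Python (erosion as a sliding minimum, dilation as a sliding maximum, open = erode then dilate, tophat = src minus open; purely illustrative):

```python
def erode(sig, k=3):
    """Sliding-window minimum (1D erosion)."""
    r = k // 2
    return [min(sig[max(0, i - r): i + r + 1]) for i in range(len(sig))]

def dilate(sig, k=3):
    """Sliding-window maximum (1D dilation)."""
    r = k // 2
    return [max(sig[max(0, i - r): i + r + 1]) for i in range(len(sig))]

def tophat(sig, k=3):
    opened = dilate(erode(sig, k), k)   # open = erosion then dilation
    return [s - o for s, o in zip(sig, opened)]

# A dark background (20) with one narrow bright stroke (200), like thin text
sig = [20, 20, 20, 200, 20, 20, 20]
print(tophat(sig))
# [0, 0, 0, 180, 0, 0, 0] -- the narrow bright stroke is isolated
```

The opening wipes out any bright feature narrower than the kernel, so subtracting it leaves exactly those narrow bright features, which is why Top Hat is such a good fit for thin light text.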

The GrayScale Image


In my previous post, while explaining dilation, I asked you to imagine a disc (kernel) moving over the image and replacing the cells it covers with the local maximum. Now, instead of a disc, I will be using my own kernel.

Making your own kernel is pretty easy using cvCreateStructuringElementEx(). I have used a 21×3 kernel, specifically because I want my algorithm to work on horizontal text like license plates.

After application of the TOPHAT morphological operation:


You can notice slight stretch marks, of a sort. That is because we have used a fairly large rectangular kernel. Otherwise the result is exactly as we wanted. The next step is thresholding.

We will apply binary thresholding, in which any pixel above 128 is replaced with 255 and all those below become 0. Thus we are effectively brightening the brighter parts of the image.

Result after Thresholding.


Now the stretch marks appear more visible because thresholding has reduced them to white "O"-like blobs.

The image is also speckled, so we need to smooth it using cvSmooth.


The last step is obviously cvDilate, as we can see gaps between the connected components. For example, the letter R, which should be one piece, is filled with gaps, so we need to apply dilation. But since we are also thinking about license plates, we already know that the letters will be closely placed and it may be difficult to get each letter as a separate connected component. I will talk about that in my next post.

Anyway, I again use my own kernel, optimized for horizontal, closely placed text.

Final Result.


You can see the individual letters are no longer visible; instead they have been blurred together to form a blob, or a CONTOUR. This is exactly what you want when going for license plates: getting each letter is not only tedious but a waste of time, and localization is best done when you can coalesce the text into a single blob.

The Source:

    img3 = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);
    IplImage* img_temp = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);
    cvSetZero(img_temp);
    cvConvertImage(img, img3, 0);
    display(img3);
    IplImage* cc_color = cvCreateImage(cvGetSize(img3), IPL_DEPTH_8U, 3);
    cvMorphologyEx(img3, img3, img_temp,
                   cvCreateStructuringElementEx(21, 3, 10, 2, CV_SHAPE_RECT, NULL),
                   CV_MOP_TOPHAT, 1);
    display(img3);
    cvThreshold(img3, img3, 128, 255, CV_THRESH_BINARY);
    display(img3);
    cvSmooth(img3, img3, CV_GAUSSIAN, 5, 5);
    display(img3);
    cvDilate(img3, img3, cvCreateStructuringElementEx(21, 3, 10, 2, CV_SHAPE_RECT, NULL), 2);
    display(img3);

I have used a high contrast image so that it resembles a typical License Plate.

The next steps will be covered in my next post. Stay tuned. :)

Dilation and Edge detection for License Plate recognition.

One of the first things one needs to learn in image processing is connected component analysis. Connected components are large discrete regions of similar pixel intensity. There is really no point in describing exactly what "connected component analysis" is, since you already have Wikipedia for that purpose. :-)

OpenCV provides various methods for connected component analysis. The foremost are morphological operations and contour extraction. The basic morphological transformations are called dilation and erosion, and they arise in a wide variety of contexts such as removing noise, isolating individual elements, and joining disparate elements in an image.

With respect to license plate recognition, I have found dilation particularly effective after localization of the license plate. Let me show you an example of using Canny and dilation together.

Dilation involves scanning an image with a kernel (user-defined, or one of the defaults). Think of a 3×3 square sliding across the image: at any moment it covers 9 pixels, the local maximum over those pixels is computed, and the pixel under the kernel's anchor is replaced with that maximum.
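That sliding-maximum description translates directly into Python (a naive 3×3 dilation over a nested-list image, purely for illustration):

```python
def dilate3x3(img):
    """Replace each pixel with the maximum over its 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(img[ny][nx]
                            for ny in range(max(0, y - 1), min(h, y + 2))
                            for nx in range(max(0, x - 1), min(w, x + 2)))
    return out

img = [[0, 0, 0, 0],
       [0, 9, 0, 0],
       [0, 0, 0, 0]]
for row in dilate3x3(img):
    print(row)
# The single bright pixel grows into a 3x3 block of 9s
```

This growing effect is exactly what reconnects nearby strokes of a letter before edge detection.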

With respect to license plates, let's say we have a sample of a partly localized, preprocessed license plate. The original image is given below.


Most people would be tempted to apply Canny edge detection directly for the text detection phase. The result after cvCanny(img, img, 100, 500) is:

canny without dilation

As you can see, there is a lot of unnecessary noise that can hamper your text detection process, and adjusting the thresholds doesn't always work.

So the better approach is to apply dilation on the original image first; this is the result after dilation.

dilation first

Then apply Canny edge detection.

dilation then canny

This image now forms the base for connected component analysis and text detection, as well as license plate localization. The image is much cleaner and is devoid of noise. The code is pretty basic:


    cvDilate(img3, img3, 0, 2);
    cvCanny(img3, img3, 100, 500);
    display(img3);

Cropping and copying an image using cvSetImageROI

Firstly, OpenCV provides many ways to initialize an IplImage. We will look at three such ways:

  • cvCreateImage
  • cvCloneImage
  • cvCopy

Though these may seem very easy, they are actually quite tricky.
So let us come to the usage of cvCreateImage:

    IplImage *img1, *img2;
    img1 = cvCreateImage(cvSize(640, 480), 8, 3);

This is similar to creating a "canvas" in Photoshop.
cvCopy() copies one image to another.
cvCloneImage() clones an image.
Details are given below.
Now let us assume we already have an image img and we wish to crop a region of resolution 640×480.

    IplImage* img = cvLoadImage("your path name");
Step 1:
Set the region of interest with cvSetImageROI(img, cvRect(x, y, 640, 480)). The ROI is the region you will be working on; all processing done after this is applied only to the ROI, and the remaining parts of the image are left as-is.
Step 2: Create a blank canvas to copy into.

    img1 = cvCreateImage(cvSize(640, 480), img->depth, 3);

Use cvCreateImage(cvGetSize(img), img->depth, 1); if you do not know the size of the original image.
Step 3:
Now the obvious step is to apply cvCopy(img, img1); since, as stated above, all further processing is restricted to the ROI, only that region gets copied.
You have successfully cropped the image.
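Conceptually, the ROI-then-copy sequence is just a rectangular slice. In Python terms, with nested lists standing in for an IplImage (illustrative only):

```python
def crop(img, x, y, w, h):
    """Copy the w-by-h region whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]

# A 6x4 "image" whose pixel values encode their own (row, column)
img = [[(r * 10 + c) for c in range(6)] for r in range(4)]
roi = crop(img, x=2, y=1, w=3, h=2)
print(roi)  # [[12, 13, 14], [22, 23, 24]]
```

Setting the ROI tells OpenCV which slice to look at; cvCopy then materialises that slice into the new canvas.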
Sometimes you don't need to "crop" an image; at some later point you may wish to get back the original, unaffected parts of the image.
Let me explain with an example
setting ROI of an image
After some processing we may decide that we need to copy the entire image to another IplImage. One approach would be to remove the ROI, copy, and then re-apply the ROI, but that would mean recalculating the ROI, which is not at all optimal. So we have cvCloneImage().

Contrary to cvCopy(), cvCloneImage() always applies to the whole image: the entire image gets copied along with its ROI. So if we display the clone, only the 640×480 portion will be visible. At first the effect may seem the same as cropping, but if we call

cvResetImageROI(img2), we get back the entire image.

The code is pretty easy; in case you want it, you can contact me.