HIGH CONTRAST TEXT DETECTION WITH CONNECTED COMPONENT ANALYSIS


Taking a cue from my previous post, our original image was 32.jpg.

This is a high-contrast image in which the intensity of the letters is much lower than that of the surroundings. In my previous post, I showed how to condense the text into groups called blobs. The result was:

[Image: dilate]

Though the image may seem unintelligible, this is what I would describe as a near-perfect result. The next part is binarizing the image with Canny edge detection. A big reason why we prefer Canny edge detection is that the edge field is binarized via hysteresis thresholding. First, strong edges are obtained with a high threshold value; then weak edges (those that only pass a lower threshold) are included, provided they are connected to strong edges. This accentuates connected components even if some part of them is of low intensity. The result after edge detection is as follows:

[Image: canny]
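To make the hysteresis idea concrete, here is a minimal sketch in plain C. It is not OpenCV's implementation: cvCanny also computes the gradients and thins the edges before this step. The function name hysteresis_threshold and its parameters are my own, assumed purely for illustration, and the input is taken to be a precomputed gradient-magnitude array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hysteresis thresholding on a gradient-magnitude map (row-major, w*h floats).
 * out[i] becomes 255 for edge pixels and 0 otherwise.
 * Returns 0 on success, -1 if memory could not be allocated.
 * (Hypothetical helper, not part of OpenCV.)
 */
int hysteresis_threshold(const float *mag, int w, int h,
                         float low, float high, unsigned char *out)
{
    assert(mag && out && w > 0 && h > 0 && low <= high);

    int n = w * h;
    int top = 0;
    /* Each pixel is pushed at most once, so n slots are enough. */
    int *stack = (int *)malloc((size_t)n * sizeof *stack);
    if (!stack) return -1;

    memset(out, 0, (size_t)n);

    /* Pass 1: every pixel at or above the high threshold is a strong edge. */
    for (int i = 0; i < n; i++) {
        if (mag[i] >= high) {
            out[i] = 255;
            stack[top++] = i;
        }
    }

    /* Pass 2: grow from strong pixels into 8-connected weak pixels. */
    while (top > 0) {
        int i = stack[--top];
        int x = i % w, y = i / w;
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                int j = ny * w + nx;
                if (out[j] == 0 && mag[j] >= low) {
                    out[j] = 255;
                    stack[top++] = j;
                }
            }
        }
    }

    free(stack);
    return 0;
}
```

A weak pixel that touches a strong one (directly or through a chain of other weak pixels) survives, while an isolated weak pixel is dropped. That is why a faint part of a blob outline is still kept as long as it hangs on to a strong part.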

Canny gives us this edge image. Now we need to separate the individual components in it and process each one. This is where contours come in handy. cvFindContours, as the name suggests, finds the contours of the connected components in an image and stores them in a sequence of structures.

The diagram below shows one form of storing them, a flat list.

[Image: list]

The diagram below shows another, hierarchical form of storing, in which a contour that lies inside another contour is stored as a child of that contour.

[Image: hierarchy]

It can also be entirely hierarchical, like a tree. Anyway, a simple Google search will give you all the possible forms of storing contours. Each contour is stored as a CvSeq, a sequence of the points along the curve, and the sequences are linked to one another through pointers such as h_next (next contour on the same level) and v_next (first contour nested inside). So basically contour analysis lets us take each contour individually and analyze it.
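The two storage forms can be sketched with a tiny stand-in structure. ContourNode below is hypothetical: only its two link fields are named after the h_next and v_next pointers of CvSeq, and the three helper functions are my own, not OpenCV calls.

```c
#include <stddef.h>

/* Hypothetical stand-in for a contour node.
 * h_next: next contour on the same level.
 * v_next: first contour nested inside this one. */
typedef struct ContourNode {
    struct ContourNode *h_next;
    struct ContourNode *v_next;
} ContourNode;

/* Count the contours on one level only (the "flat list" view). */
int count_level(const ContourNode *first)
{
    int n = 0;
    for (const ContourNode *c = first; c != NULL; c = c->h_next)
        n++;
    return n;
}

/* Count every contour: siblings through h_next, children through v_next. */
int count_all(const ContourNode *first)
{
    int n = 0;
    for (const ContourNode *c = first; c != NULL; c = c->h_next)
        n += 1 + count_all(c->v_next);
    return n;
}

/* Deepest nesting level: 0 for an empty list, 1 for a flat list. */
int max_depth(const ContourNode *first)
{
    int best = 0;
    for (const ContourNode *c = first; c != NULL; c = c->h_next) {
        int d = 1 + max_depth(c->v_next);
        if (d > best) best = d;
    }
    return best;
}
```

A loop that follows only h_next, like the one in the listing at the end of this post, sees just the top level. Anything nested inside those contours has to be reached through v_next.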

So now we just need to extract each contour, find its bounding rectangle and map it to the original image. Here are the extracted contours with their bounding rectangles mapped to the original image:

[Image: 12Capture]
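What the bounding-rectangle step boils down to can be sketched without OpenCV. Point and Rect below are hypothetical stand-ins for CvPoint and CvRect, and bounding_rect and clip_rect are illustrative helpers of my own. I am assuming inclusive pixel coordinates, so a single point gives a 1x1 rectangle.

```c
#include <assert.h>

typedef struct { int x, y; } Point;                /* stand-in for CvPoint */
typedef struct { int x, y, width, height; } Rect;  /* stand-in for CvRect  */

/* Smallest upright rectangle that contains all n contour points (n >= 1). */
Rect bounding_rect(const Point *pts, int n)
{
    assert(pts && n > 0);

    int xmin = pts[0].x, xmax = pts[0].x;
    int ymin = pts[0].y, ymax = pts[0].y;
    for (int i = 1; i < n; i++) {
        if (pts[i].x < xmin) xmin = pts[i].x;
        if (pts[i].x > xmax) xmax = pts[i].x;
        if (pts[i].y < ymin) ymin = pts[i].y;
        if (pts[i].y > ymax) ymax = pts[i].y;
    }

    Rect r = { xmin, ymin, xmax - xmin + 1, ymax - ymin + 1 };
    return r;
}

/* Clip a rectangle to an img_w x img_h image before using it as a region
 * of interest. A rectangle that falls entirely outside ends up with
 * width or height 0. */
Rect clip_rect(Rect r, int img_w, int img_h)
{
    int x0 = r.x < 0 ? 0 : r.x;
    int y0 = r.y < 0 ? 0 : r.y;
    int x1 = r.x + r.width  > img_w ? img_w : r.x + r.width;
    int y1 = r.y + r.height > img_h ? img_h : r.y + r.height;

    Rect out = { x0, y0, x1 > x0 ? x1 - x0 : 0, y1 > y0 ? y1 - y0 : 0 };
    return out;
}
```

Since the edge image has the same size as the original and the contours are found with a zero offset, the rectangle coordinates carry over to the original image unchanged. The clipping helper only matters if you pad the rectangle before cropping.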

 

The complete image with the text regions marked is shown below. We can see that when we use a CvRect to bound a contour, some bits and pieces of non-contour regions also creep in. This is normal, as not all contours are shaped like a rectangle.

[Image: blobs]
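How much non-contour area creeps in can be put into a number: the ratio of the contour's area to the area of its bounding rectangle. The sketch below is my own illustration, not something the pipeline in this post computes. Vertex, polygon_area and rect_fill_ratio are hypothetical names, and the area is taken with the shoelace formula.

```c
#include <assert.h>

typedef struct { double x, y; } Vertex;  /* hypothetical polygon vertex */

/* Area of a simple polygon by the shoelace formula (always non-negative). */
double polygon_area(const Vertex *v, int n)
{
    double twice = 0.0;
    for (int i = 0; i < n; i++) {
        int j = (i + 1) % n;  /* next vertex, wrapping around to the first */
        twice += v[i].x * v[j].y - v[j].x * v[i].y;
    }
    if (twice < 0.0) twice = -twice;
    return twice / 2.0;
}

/* Fraction of the bounding rectangle that the contour actually covers:
 * 1.0 for an upright rectangle, lower the more background is included. */
double rect_fill_ratio(const Vertex *v, int n)
{
    assert(v && n >= 3);

    double xmin = v[0].x, xmax = v[0].x;
    double ymin = v[0].y, ymax = v[0].y;
    for (int i = 1; i < n; i++) {
        if (v[i].x < xmin) xmin = v[i].x;
        if (v[i].x > xmax) xmax = v[i].x;
        if (v[i].y < ymin) ymin = v[i].y;
        if (v[i].y > ymax) ymax = v[i].y;
    }

    double box = (xmax - xmin) * (ymax - ymin);
    return box > 0.0 ? polygon_area(v, n) / box : 0.0;
}
```

For example, a right triangle with legs along the axes fills exactly half of its bounding rectangle, so half of what the rectangle crops is background.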

 

The final result is this. The arrow has also been marked, but it can easily be eliminated at the OCR stage. So now we have successfully isolated the portions of text.

[Image: Detection-normal]

The source code is as follows:

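    /* Legacy OpenCV C API. The calls rely on default arguments, so build this as C++. */
    #include <math.h>
    #include <stdio.h>
    #include <opencv/cv.h>
    #include <opencv/highgui.h>

    /* display() is not defined in this post. This minimal version is an assumption:
       show the image in a window and wait for a key press. */
    static void display(IplImage* image)
    {
        cvNamedWindow("display", CV_WINDOW_AUTOSIZE);
        cvShowImage("display", image);
        cvWaitKey(0);
    }

    int main()
    {
        IplImage* img = cvLoadImage("C:\\samples\\test\\32.jpg");
        if (!img) {
            fprintf(stderr, "Could not load the input image\n");
            return 1;
        }

        /* img1: grayscale working image, img2: scratch buffer, img3: blob mask */
        IplImage* img1 = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);
        IplImage* img2 = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);
        IplImage* img3 = cvCreateImage(cvSize(img->width, img->height), img->depth, 1);
        cvConvertImage(img, img1, 0);
        cvSetZero(img2);
        cvSetZero(img3);

        CvMemStorage* mem = cvCreateMemStorage(0);
        CvSeq* contours = 0;
        CvSeq* ptr;

        /* Wide, flat 21x3 kernel, created once and shared by top-hat and dilation. */
        IplConvKernel* kernel = cvCreateStructuringElementEx(21, 3, 10, 2, CV_SHAPE_RECT, NULL);

        /* Steps from the previous post: top-hat, threshold, smooth, dilate. */
        cvMorphologyEx(img1, img1, img2, kernel, CV_MOP_TOPHAT, 1);
        display(img1);

        cvThreshold(img1, img1, 128, 255, CV_THRESH_BINARY);
        cvSaveImage("thres.png", img1, 0);
        display(img1);

        cvSmooth(img1, img1, CV_GAUSSIAN, 3, 3);
        cvSaveImage("smooth-gaussian.png", img1, 0);
        display(img1);

        cvDilate(img1, img1, kernel, 2);
        display(img1);

        /* Edge detection with hysteresis thresholds 500 (low) and 900 (high). */
        cvCanny(img1, img1, 500, 900, 3);
        display(img1);
        cvSaveImage("canny.png", img1, 0);

        /* CV_RETR_CCOMP: outer boundaries on the top level, holes one level below. */
        cvFindContours(img1, mem, &contours, sizeof(CvContour), CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE, cvPoint(0, 0));

        /* Following h_next visits the top-level contours only. */
        for (ptr = contours; ptr != NULL; ptr = ptr->h_next)
        {
            double reg = fabs(cvContourArea(ptr, CV_WHOLE_SEQ));
            (void)reg; /* only needed by the optional area filter below */
            //if (reg > 600 && reg < 10000)
            {
                /* Fill the contour in white on the blob mask. */
                CvScalar ext_color = CV_RGB(255, 255, 255);
                cvDrawContours(img3, ptr, ext_color, CV_RGB(0, 0, 0), -1, CV_FILLED, 8, cvPoint(0, 0));

                /* Bounding rectangle, drawn on both the original image and the mask. */
                CvRect rectEst = cvBoundingRect(ptr, 0);
                CvPoint pt1, pt2;
                pt1.x = rectEst.x;
                pt1.y = rectEst.y;
                pt2.x = rectEst.x + rectEst.width;
                pt2.y = rectEst.y + rectEst.height;
                int thickness = 1;
                cvRectangle(img, pt1, pt2, CV_RGB(255, 255, 255), thickness);
                cvRectangle(img3, pt1, pt2, CV_RGB(255, 255, 255), thickness);

                /* Show just this text region of the original image. */
                cvSetImageROI(img, rectEst);
                display(img);
                cvResetImageROI(img);
            }
        }

        cvSaveImage("Detection-normal.png", img, 0);
        cvSaveImage("blobs.png", img3, 0);

        /* Release everything that was allocated. */
        cvReleaseStructuringElement(&kernel);
        cvReleaseMemStorage(&mem);
        cvReleaseImage(&img3);
        cvReleaseImage(&img2);
        cvReleaseImage(&img1);
        cvReleaseImage(&img);
        return 0;
    }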