Wednesday, December 7, 2016

AP186 Activity 11 Blog Report - Basic Video Processing

(1:07 PM) The last activity is basic video processing. Although my project involved videos, and that gave me some experience, taking the right video at home by myself is still the biggest roadblock I am encountering at the moment. I've taken five videos so far, one last night and four this morning, each under different settings, and none of them gives clear frames of the motion of the ball I drop. The first one is too dark; the second one's frame rate dropped from the 29 FPS of the first video to 12 FPS; and while the third fixed the lighting issue, the frame rate problem remained. The fourth and fifth ones weren't even reporting the correct FPS, since they seemed to hold each frame on screen longer than it should last.

I will try again with high exposure and default FPS settings.


(2:39 PM) I ended up taking two more videos, and I'm settling with the second one. The first one didn't even capture the dropping point within its field of view. Anyway, here is my video, taken with high exposure and recorded at 29 frames per second (which seems to be the maximum my phone can do). I converted it into a lower-quality video so that my internet's upload speed could handle it:


Unfortunately, this activity really calls for a good video camera, since motion blur is still present even with the best settings I could coax out of my phone. I will just have to work with blurry blobs, it seems. Hopefully the ROI manages to capture these blurry blobs, since this was my main issue with the very first video I took: the ROI colors could not be matched with the ball as it falls.

The biggest issue with all of my videos is that my camera's frame rate is low relative to the motion of the ball, so when the ball becomes a blur as it falls, it loses its chromaticity. That's why the ROI method doesn't work on it. I tried the grayscale method of image segmentation on the very first video I took for this activity, and the shadows near the floor had the same grayscale values as the blur of the falling ball. I hope that isn't the case for this seventh video, which I am currently using.


(3:03 PM) As I expected, the ROI method doesn't work. Even with high exposure, the chromaticity of the blur is not the same as the ball's chromaticity when it isn't moving. I will try patching together snapshots of the blur in conjunction with the original ROI into a new, larger ROI, and see what happens.


(3:20 PM) I ended up just assigning the ball, in the frames where it is stationary, a very low probability of belonging to the patched-together ROI. I will just try grayscale image segmentation now.


(3:31 PM) Nope, doesn't quite work either. It looks very similar to my results for the first video I recorded for this activity when grayscale image segmentation is performed.

I think this means that if I use the video I have, I will need to create a customized code for each set of frames, and basically process each set with different parameters just to get the blob for the frames in that set.

I will do that now. Anyway, here are GIF animations of some of my failed attempts:

Fig. 1. GIF animations of failed segmentations. From left to right: first video using color segmentation, first video using grayscale segmentation, seventh video using color segmentation, seventh video using grayscale segmentation.

(6:47 PM) Using nonparametric color segmentation with these parameters:
Frames 1-3: Color, seg_J > 0, ROI from Frame 1
Frames 4-6: Color, seg_J > 0, ROI from Frame 5
Frames 7-11: Color, seg_J > 0.1, ROI from Frame 9
Frames 12-16: Color, seg_J > 0.3, ROI from Frame 13
Frame 17: Color, seg_J > 0, ROI from Frame 17
I was able to get a segmented version of the first trial's frames. (I forgot to mention that the video contains three trials, and the first trial spans 17 frames.) Note that seg_J takes values between 0 and 1. The segmented version looks like:

Fig. 2. Segmented version (right) of the first trial (left) using nonparametric color segmentation.

As can be seen in Fig. 2, once the ball blurs, it becomes difficult to isolate from the background. It looks like it disintegrates into powder while falling, until it hits the ground and looks whole again. For the vertical clumps of powder that represent the blurred ball in the middle frames, the middle of each clump should more or less represent the position of the ball at that point in time, so getting the centroid should still work, as long as the clumps are joined into one blob and the artifacts are removed.
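For reference, here is a minimal sketch of how the nonparametric (histogram backprojection) segmentation of one frame can be written in Scilab with SIVP loaded. The filenames, the 32-bin histogram, and the normalization of seg_J by its maximum are placeholders and assumptions of mine, not necessarily the exact values in my script; seg_J is then thresholded with the values listed above.

// Minimal sketch of nonparametric (backprojection) segmentation of one frame.
// Filenames and variable names are placeholders; SIVP is assumed loaded.
roi   = double(imread("roi_patch.png"));   // patched-together ROI
frame = double(imread("frame_09.png"));    // one video frame

function [r, g] = chromaticity(I)
    // Normalized chromaticity coordinates, flattened into column vectors.
    s = I(:,:,1) + I(:,:,2) + I(:,:,3);
    s(find(s == 0)) = 1e6;                 // avoid division by zero
    r = matrix(I(:,:,1) ./ s, -1, 1);
    g = matrix(I(:,:,2) ./ s, -1, 1);
endfunction

[r_roi, g_roi] = chromaticity(roi);
[r_frm, g_frm] = chromaticity(frame);

// 2D r-g histogram of the ROI (32 x 32 bins, an assumed bin count), as a PDF
nbins = 32;
H = zeros(nbins, nbins);
ri = int(r_roi*(nbins-1)) + 1;
gi = int(g_roi*(nbins-1)) + 1;
for k = 1:size(ri, 1)
    H(ri(k), gi(k)) = H(ri(k), gi(k)) + 1;
end
H = H / sum(H);

// Backprojection: each frame pixel gets the probability of its (r,g) bin
rf = int(r_frm*(nbins-1)) + 1;
gf = int(g_frm*(nbins-1)) + 1;
seg_vec = zeros(rf);
for k = 1:size(rf, 1)
    seg_vec(k) = H(rf(k), gf(k));
end
seg_J = matrix(seg_vec, size(frame, 1), size(frame, 2));
seg_J = seg_J / max(seg_J);               // scale to 0..1 (my own normalization)
segmented = seg_J > 0.1;                  // threshold from the list above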

(7:35 PM) Here is the cleaned version using morphological operations: specifically, closing operators to fill in hollow blobs, and opening operators to clean away artifacts.

Fig. 3. Segmented frames in Fig. 2 cleaned using morphological operations.
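For reference, a minimal sketch of this cleaning step, assuming the IPD toolbox's CreateStructureElement, DilateImage, and ErodeImage (I am recalling their signatures from memory, so treat the calls as assumptions). The structuring element shapes and sizes are placeholders, not the exact ones I used, and segmented is the thresholded image from the sketch above.

// Morphological cleanup sketch (IPD toolbox assumed; sizes are placeholders)
se_close = CreateStructureElement('circle', 4);   // larger element for closing
se_open  = CreateStructureElement('circle', 2);   // smaller element for opening

// Closing (dilate then erode) fills the hollow parts of the ball blob
closed  = ErodeImage(DilateImage(segmented, se_close), se_close);

// Opening (erode then dilate) removes the small isolated artifacts
cleaned = DilateImage(ErodeImage(closed, se_open), se_open);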
Now to get the centroid locations. I will post this first before I place the results here just in case our internet runs out at a bad time later. I'll edit this shortly.
x (px)          y (px)
62.040155       18.397668
62.684818       26.245875
63.222785       41.222785
64.428919       64.059538
64.272          95.650667
64.528274       132.66518
66.392709       179.28832
68.240343       231.59485
71.469045       292.85205
70.615854       359.83293
71.282927       435.5561
73.510921       521.00128
77.623872       608.6349
76.209991       713.00044
82.895717       821.03315
86.647849       938.28916
81.093805       1007.1646
The above are pixel coordinates. The corner of the wall, spanning from pixel 55 to pixel 1020 along the vertical axis, was measured to be 147.5 centimeters tall. This gives a conversion factor of 0.1528 centimeters per pixel, or 0.001528 meters per pixel.
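A minimal sketch of how one centroid is extracted from a cleaned binary frame and converted to meters, using only core Scilab; cleaned is the frame from the sketch above, and the 0.001528 m/px factor is the one just derived.

// Centroid of the blob in one cleaned binary frame, then pixel -> meter conversion
[rows, cols] = find(cleaned);        // row/col indices of all blob pixels
yc = mean(rows);                     // centroid y in pixels (vertical position)
xc = mean(cols);                     // centroid x in pixels

m_per_px = 0.001528;                 // from the 147.5 cm over 965 px measurement
y_m = yc * m_per_px;                 // vertical position in meters
x_m = xc * m_per_px;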


Thank you to http://gifmaker.me/ for allowing me to convert frames into GIFs.

This was a very difficult activity to do in such a short time. But this is also the last activity, and that makes it bittersweet. With regards to AP186, I am happy with how the course went. The activities were challenging, and most of them had rough starts for me. I managed to pull through with lots of help on the earlier activities, while a lot of the later ones I had to (and managed to) do on my own, which made them even more fulfilling. I am very proud of how I handled the project we were tasked with, and in a way I had already overcome the difficulties of video processing there. I'm sad I wasn't able to take a good video back at NIP for this last activity. Still, I hope what I've done is enough.

Self-Evaluation: 8/10

Tuesday, December 6, 2016

AP186 Activity 10 Blog Report - Enhancement by Histogram Manipulation

In this activity, we will be using histogram manipulation to enhance dark-looking images.

Take this image, for example:

Fig. 1. Image to be enhanced in this activity.
The grayscale of the above image was obtained using I = rgb2gray(I_rgb). The result is:

Fig 2. Grayscale equivalent of the image in Fig. 1.
The grayscale histogram of the image is then obtainable using plot(imhist(I)), which gives:
Fig. 3. Grayscale histogram of the image.
The maximum and minimum grayscale values were obtained to be 245 (with 1 pixel having that value) and 0 (with 15 pixels having that value). Meanwhile, the maximum and minimum numbers of pixels having a particular grayscale value between 0 and 245 are 5257 pixels and 1 pixel, respectively. The cumulative distribution function (CDF), obtained using plot(y) where y = cumsum(nhist,1) and nhist = imhist(I), is:
Fig. 4. Cumulative distribution function of the image.
The maximum cumulative number of pixels in Fig. 4 is 76800, and this can be used for the desired CDF's maximum as well.

In order to create the desired CDF, I created an array of integers from 0 to 76800 (the maximum cumulative number of pixels) to act as the x-axis of the desired CDF. Then, to act as the y-axis, I divided each element of the x-axis array by 76800 and multiplied it by 245 (the maximum grayscale value). The desired CDF, plotting the x-axis array against the y-axis array, looks like:
Fig. 5. Desired cumulative distribution function of the image.
The enhanced image is then obtained by plugging the grayscale image matrix I (plus 1, since indices in Scilab start at 1) into the original CDF, and plugging the result of that into the desired CDF. The resulting array is then reshaped into a matrix with the dimensions of the original image, giving the enhanced image. In code form, this is:
I2 = matrix(desired_y(y(I+1))/255,240,320);
where desired_y is the y-axis array of the desired CDF, y is the original CDF, I is the grayscale image matrix, and 240 by 320 is the original image pixel dimensions. The values are divided by 255 to get a range of 0 to 1 in the new image.
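For completeness, here is a sketch of the whole equalization pipeline in one place. The filenames are placeholders of mine, and the explicit lookup table round(245*cdf/Npix) is essentially equivalent to plugging the original CDF into the linear desired CDF described above.

// Sketch of the full histogram equalization pipeline (SIVP assumed loaded).
I_rgb = imread("crowd.jpg");          // placeholder filename
I = rgb2gray(I_rgb);

nhist = imhist(I);                    // counts for gray levels 0..255
cdf   = cumsum(nhist);                // original CDF
Npix  = cdf($);                       // total pixel count (76800 here)

T  = round(245 * cdf / Npix);         // lookup table: old gray level -> new level
I2 = matrix(T(double(I(:)) + 1), size(I, 1), size(I, 2)) / 255;  // backproject and rescale
imwrite(I2, "equalized.png");         // placeholder filename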

The resulting enhanced image, known as a histogram equalized image, looks like:

Fig. 6. Resulting histogram equalized image for the grayscale of the original image.
Upon saving the image in Fig. 6, I immediately read it using imread() to get the histogram of the image using imhist() as well as the CDF using cumsum(). These ended up looking like:
Fig. 7. Histogram of the histogram equalized image in Fig. 6.
The above looks like a more equalized histogram apart from the darker parts, which may be a result of the 0-to-1 fractional values being rounded to integers from 0 to 255 when the image was saved. Looking at the histogram with just 10 bins, the plot looks very equalized. The CDF of the histogram equalized image is:
Fig. 8. Cumulative distribution function of the histogram equalized image in Fig. 6.
The above CDF seems to imitate the desired CDF, which is expected.

For a hopefully better enhanced image, we have to consider that the human eye does not have a linear response. Thus, we can try using a sigmoid function as the desired CDF.

To create a sigmoid function, I used the second answer in this Stack Overflow question as reference:
http://stackoverflow.com/questions/3741063/scilab-x-6-6-y-1-1e-x-why-it-doesnt-work

Since I need a desired CDF that has an x-axis array from 0 to 76800, similar to Fig. 5, and a y-axis array from 0 to 245 (the desired CDF has to return a grayscale value after histogram manipulation enhancement), the correct modification to the sigmoid function from the source above is:
desired_y = -6:(12/76799):6;                            // 76800 sample points from -6 to 6
desired_x = 76800*ones(desired_y)./(1+%e.^-desired_y);  // sigmoid scaled to cumulative counts 0 to 76800
desired_y2 = 245*(desired_y+6)/12;                      // map -6..6 linearly to grayscale 0 to 245
plot(desired_x,desired_y2); //desired CDF
The desired CDF looks like any old sigmoid function but with the x-axis and y-axis swapped:
Fig. 9. Desired cumulative distribution function following a sigmoid.
I think this is the correct form, since the fourth page of the Activity 10 manual gives a sample desired CDF with a regular looking sigmoid, and the x-axis there has a range of 0 to 255. Thus, the sigmoid must have the x-axis and the y-axis flipped if the 0 to 255 range should actually be on the y-axis in our desired CDF.
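Since the sigmoid's x-axis values are no longer integers, the lookup can no longer be done by direct array indexing. One way to express the backprojection (not necessarily the exact lookup I used) is to interpolate onto the (desired_x, desired_y2) curve built above, with y and I as defined earlier:

// Backprojection through the sigmoid desired CDF via interpolation (one possible way)
counts = y(double(I(:)) + 1);                                 // cumulative count for each pixel's gray level
counts = max(min(counts, max(desired_x)), min(desired_x));    // keep values inside the sigmoid's range
new_gray = interp1(desired_x, desired_y2, counts, "linear");  // look up the desired gray level
I3 = matrix(new_gray, size(I, 1), size(I, 2)) / 255;          // reshape and scale to 0..1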

The enhanced image using the sigmoid CDF turned out to be:

Fig. 10. Enhanced image for the grayscale of the original image.
Using this technique for a colored image, we reuse some of the code from Activity 7:
I = double(I);                               // convert to double for arithmetic
R = I(:,:,1); G = I(:,:,2); B = I(:,:,3);    // separate the color channels
Int = R + G + B;                             // per-pixel intensity
We then perform histogram manipulation on the "Int" matrix. I checked the maximum value in the "Int" matrix and obtained 731. The minimum value is 0, as expected (the grayscale version has pure black pixels, after all). Thus, I replaced the 0 to 245 range from the previous code with the 0 to 731 range applicable to the "Int" matrix.
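Concretely, the sigmoid desired CDF from before only needs its y-axis rescaled; the x-axis maximum stays at 76800 since the number of pixels is unchanged:

// Sigmoid desired CDF adapted to the Int matrix: the y-axis now spans 0..731
desired_y = -6:(12/76799):6;
desired_x = 76800*ones(desired_y)./(1+%e.^-desired_y);
desired_y2 = 731*(desired_y+6)/12;     // 731 replaces 245 here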

Since the "Int" matrix is not an 8-bit image, imhist() cannot be used to get the histogram needed for the original CDF. Thus, we replace the nhist = imhist(I) command with the [nhist,nind] = histc(732,Int) command. Then, in order to use cumsum(), nhist is reshaped from 1 row by 732 columns into 732 rows by 1 column.
[nhist,nind] = histc(732,Int,normalization=%f);  // 732-bin histogram of raw counts
nhist = matrix(nhist,732,1);                     // reshape into a column vector
y = cumsum(nhist,1);                             // original CDF of the Int matrix
Anyway, after making those small changes, the code with the sigmoid function works for the "Int" matrix. I named the resulting matrix "Int2". The new RGB channels are then calculated by first getting r, g, and b as follows:
Int(find(Int==0)) = 100000; //avoids dividing by zero
r = R./Int; g = G./Int; b = B./Int;
Then, the new RGB channels are calculated using the "Int2" matrix by multiplying the matrix to the obtained r, g, and b:
R2 = r.*Int2; G2 = g.*Int2; B2 = b.*Int2;
I_enhanced(:,:,1) = R2; I_enhanced(:,:,2) = G2; I_enhanced(:,:,3) = B2;
I then normalized all of the values of the "I_enhanced" matrix, which corresponds to the colored enhanced version of the original image, by dividing them by 255. The resulting image is as follows:

Fig. 11. Enhanced colored image using histogram manipulation.
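For reference, a minimal sketch of that final normalization and saving step; the clipping of any channel values above 255 and the output filename are my own additions (SIVP's imwrite assumed).

// Normalize each channel to the 0..1 range expected by imwrite, then save
I_enhanced(:,:,1) = min(R2, 255) / 255;   // clipping to 255 is my own addition
I_enhanced(:,:,2) = min(G2, 255) / 255;
I_enhanced(:,:,3) = min(B2, 255) / 255;
imwrite(I_enhanced, "enhanced_color.png");  // placeholder filename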
Compared with Fig. 1, Fig. 11 has much better colors on the trees, the background buildings, and even the details of the clothes of some of the spectators. What Fig. 6, Fig. 10, and Fig. 11 have in common is that the enhanced images show weird horizontal scan lines in the very dark parts of the picture. I think this can be attributed to how my cellphone treats those parts when it saves the picture as a relatively low-quality .jpg image.

Although I did this activity in a hurry while still sick, I enjoyed it for how it produced cool resulting images.

Self-Evaluation: 9/10

Monday, December 5, 2016

AP186 Activity 9 Blog Report - Playing Notes by Image Processing

After a week of project making, presenting, and paper writing, as well as four days of being sick, I think I'm ready to proceed with the next activity. There are three activities left, so this would be the third to the last activity.

In this activity, we will be playing notes based on a sheet music image. I tried to think of a good piece of sheet music that wasn't too complicated, and I remembered the song we used to play with our recorders in high school Music class: "Ode to Joy" by Ludwig van Beethoven, or at least its main melody. The song can be summed up by this sheet music from Music-for-Music-Teachers.com:

Fig. 1. Sheet music to be used in this activity.
In the sheet music, there are quarter notes and half notes. The morphological operation I will need should result in blobs that distinguish between the two. But first, I will chop up the entire sheet music image into four parts. Take the first part, here:

Fig. 2. First part of the sheet music.
I plan on placing the entire melody of this first part in a single array. Thus, I initialize an empty array S. I then use the reference given in our manual (http://www.phy.mtu.edu/~suits/notefreqs.html) to get the corresponding frequencies for the notes covered by the entire song.
// note frequencies (Hz) from the reference, multiplied by 2 (one octave up)
C = 261.63*2;
D = 293.66*2;
E = 329.63*2;
F = 349.23*2;
G = 392.00*2;
A = 440.00*2;
B = 493.88*2;
And with the code snippet also from the manual, we use the function that generates sound waves based on the note and the duration.
function n = note(f, t)
    n = sin (2*%pi*f*t);
endfunction;
For the values of t, the sheet music does not specify the tempo or beats per minute of the song. Thus, I will arbitrarily use 240 beats per minute, which corresponds to 0.25 seconds for the quarter note and 0.50 seconds for the half note.
t = soundsec(0.25);   // time samples for a quarter note at 240 bpm
t2 = soundsec(0.50);  // time samples for a half note
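As a quick sanity check (my own addition, not part of the activity proper), the first four notes of the melody can be played by hand with the definitions above, before any image processing is involved:

// Play the opening E E F G of the melody using the note() function above
test_melody = [note(E, t), note(E, t), note(F, t), note(G, t)];
sound(test_melody);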
We now use the image in Fig. 2 to read the sheet music via image processing. After reading the image using imread() and converting it from RGB to grayscale using rgb2gray(), the image is inverted by subtracting each grayscale pixel value from 255. The image is then binarized by setting all grayscale values greater than zero as true and those that are zero as false. The resulting image is Fig. 3 below:

Fig. 3. Result from inversion and binarization of the image in Fig. 2.
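A minimal sketch of these reading, inversion, and binarization steps, with a placeholder filename and SIVP assumed loaded:

// Read, convert to grayscale, invert, and binarize the first part of the sheet
sheet = imread("ode_part1.png");   // placeholder filename
gray  = double(rgb2gray(sheet));
inverted  = 255 - gray;            // invert so the notes and staff lines become bright
binarized = inverted > 0;          // binarize: every nonzero pixel becomes %T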
A rectangular structuring element with 1 pixel width and 2 pixels height, which looks like a short vertical line, is then used to erode the image. This causes the horizontal lines to disappear.

Fig. 4. Result from erosion using the vertical line structuring element.
Another rectangular structuring element, this time with 2 pixels width and 1 pixel height or a short horizontal line, is used to erode the image in Fig. 4. This causes the vertical lines to disappear.

Fig. 5. Result from erosion using the horizontal line structuring element.
Notice that the half note blob looks like it's separated into two curved lines instead of one hollow oval. Also, it would be best to attach the stray parts of the G-clef and F-clef that look like notes to the main G-clef and F-clef blobs. Thus, I used a rectangular structuring element with 4 pixels width and 1 pixel height (a slightly longer horizontal line than the previous structuring element) to dilate the image, joining the half note blob parts into one hollow oval and attaching the stray G-clef and F-clef blobs to the main G-clef and F-clef blobs.

Fig. 6. Result from dilation using the slightly longer horizontal line structuring element.
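Putting the three morphological steps together, here is a sketch assuming the IPD toolbox's CreateStructureElement, ErodeImage, and DilateImage (signatures recalled from memory, so treat the calls as assumptions); binarized is the image from the sketch after Fig. 3.

// Structuring elements for the three steps described above (IPD assumed)
se_v  = CreateStructureElement('custom', [%t; %t]);          // 1 wide x 2 tall
se_h  = CreateStructureElement('custom', [%t, %t]);          // 2 wide x 1 tall
se_h4 = CreateStructureElement('custom', [%t, %t, %t, %t]);  // 4 wide x 1 tall

no_hlines = ErodeImage(binarized, se_v);   // staff (horizontal) lines vanish
no_vlines = ErodeImage(no_hlines, se_h);   // vertical stems and bar lines vanish
joined    = DilateImage(no_vlines, se_h4); // rejoin half-note halves and clef parts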
Now we have a workable image. We use SearchBlobs() to assign a number to each of the blobs in the image, and the area of each blob is checked one by one using size(). I found that pixel areas greater than 40 and less than 45 correspond to the half note blobs, and that pixel areas greater than 45 and less than 60 correspond to the quarter note blobs. Thus, we can filter those blobs out and put them in their own images (a sketch of this filtering is given after Fig. 8):

Fig. 7. Quarter note blobs from the first part of the music sheet.
Fig. 8. Half note blobs from the first part of the music sheet.
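Here is the sketch of the area-based filtering mentioned above, assuming IPD's SearchBlobs; the 40/45/60 pixel-area cutoffs are the ones I found, and joined is the image from Fig. 6.

// Separate note heads by blob area (IPD's SearchBlobs assumed)
BW = SearchBlobs(joined);                        // label each blob with an integer
quarter_img = zeros(size(joined, 1), size(joined, 2));
half_img    = zeros(size(joined, 1), size(joined, 2));
for i = 1:max(BW)
    area = length(find(BW == i));                // pixel area of blob i
    if area > 45 & area < 60 then
        quarter_img(find(BW == i)) = 1;          // quarter-note heads
    elseif area > 40 & area < 45 then
        half_img(find(BW == i)) = 1;             // half-note heads
    end
end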
I then use SearchBlobs() twice: once on the image containing the quarter notes, and once on the image containing the half notes. Using AnalyzeBlobs() to get the centroid of each blob, I obtain the y-axis location (corresponding to which note is played). I then used MS Paint to manually check which y-axis pixel ranges in Fig. 2 correspond to which notes. I found these to be:
// BW1 is the blob-labeled image of the quarter notes; S5 is the AnalyzeBlobs() output.
// A larger centroid y means the note head sits lower on the staff (a lower pitch).
for i = 1:max(BW1)
    if S5(i).Centroid(2) >= 50 & S5(i).Centroid(2) < 53
        L2 = note(C,t);
    elseif S5(i).Centroid(2) >= 47 & S5(i).Centroid(2) < 50
        L2 = note(D,t);
    elseif S5(i).Centroid(2) >= 44 & S5(i).Centroid(2) < 47
        L2 = note(E,t);
    elseif S5(i).Centroid(2) >= 41 & S5(i).Centroid(2) < 44
        L2 = note(F,t);
    elseif S5(i).Centroid(2) >= 38 & S5(i).Centroid(2) < 41
        L2 = note(G,t);
    elseif S5(i).Centroid(2) >= 35 & S5(i).Centroid(2) < 38
        L2 = note(A,t);
    elseif S5(i).Centroid(2) >= 32 & S5(i).Centroid(2) < 35
        L2 = note(B,t);
    else
        L2 = note(E,t); //this would sound incorrect
    end
    S = cat(2,S,L2);
end
Looping over each numbered blob and checking which range its y-axis location falls in, I place the corresponding note in the array L2, and the S array compiles these notes through concatenation. The x-axis location of each blob could also be used to determine the order in which the notes are played, but since the half note always comes last in this particular song, I simply scanned the half note blobs after the quarter note blobs, attaching the half note at the end of the array S through concatenation.

I then used the sound() function to listen to the resulting S array, and the writewav() function to save it into a .wav file.



I made sure the other parts of the sheet music had the same y-axis coordinates for the staff lines. Using the exact same code, changing only the filename to each of the other parts, these were the results:



That ends Activity 9. This was difficult to do since I am still a bit sick. However, I'm glad I was able to do it.

Self-Evaluation: 9/10