
I'm trying to train a machine learning model to detect if an image is blurred or not.

I have 11,798 unblurred images, and I have a script to blur them and then use that to train my model.

However, when I run the exact same training 5 times, the results are wildly inconsistent (as you can see below). It also only reaches 98.67% accuracy at best.

I'm pretty new to machine learning, so maybe I'm doing something really wrong. But coming from a software engineering background and just starting to learn machine learning, I have tons of questions. It's a struggle to know why it's so inconsistent between runs. It's a struggle to know how good is good enough (i.e., when I should deploy the model). It's a struggle to know how to continue to improve the accuracy and make the model better.

Any advice or insight would be greatly appreciated.

View all the code: https://gist.github.com/fishcharlie/68e808c45537d79b4f4d33c26e2391dd
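For anyone who doesn't want to click through, the blurring step is something along these lines (a simplified sketch using Pillow, with placeholder paths and a guessed radius; the gist has my real code):

```python
# Simplified sketch of generating the "blurred" training class.
# Paths and the blur radius are placeholders, not the values from the gist.
from pathlib import Path
from PIL import Image, ImageFilter

SRC_DIR = Path("images/unblurred")   # hypothetical input directory
DST_DIR = Path("images/blurred")     # hypothetical output directory
DST_DIR.mkdir(parents=True, exist_ok=True)

for path in SRC_DIR.glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    # Apply a Gaussian blur to create the positive ("blurred") example.
    blurred = img.filter(ImageFilter.GaussianBlur(radius=3))
    blurred.save(DST_DIR / path.name)
```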

[–] [email protected] 3 points 1 week ago (1 children)

Your training loss is going fine, but your validation loss isn't really moving with it at all. That suggests either there's some flaw that makes the training and validation data differ, or your model is overfitting rather than solving the problem in a general way. From a quick scan of the code, nothing too serious jumps out at me. The convolution size seems a little small (is a 3x3 kernel large enough to recognize blur?), but I haven't thought much about the small-scale impact of a Gaussian blur. Increasing it could help. If it doesn't, I'd look into reducing the number of filters or the dense layer; reducing the available space can force an overfitting network to figure out more general solutions.

Lastly, I bet someone else has either solved the same problem as an exercise or something similar and you could check out their network architecture to see if your solution is in the ballpark of something that works.
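For concreteness, this is the general shape I have in mind (just a sketch with made-up layer sizes and input shape, not your actual code):

```python
# Rough sketch of the kind of small CNN being discussed; the layer sizes
# and input shape here are illustrative, not taken from the gist.
import keras
from keras import layers

model = keras.Sequential([
    layers.Input(shape=(224, 224, 3)),          # input size is a placeholder
    layers.Conv2D(16, 5, activation="relu"),    # larger kernel than 3x3
    layers.MaxPooling2D(),
    layers.Conv2D(32, 5, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),        # smaller dense layer to limit capacity
    layers.Dense(1, activation="sigmoid"),      # binary output: blurred vs. not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```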

[–] [email protected] 1 points 1 week ago (1 children)

Thanks so much for the reply!

The convolution size seems a little small

I changed this to 5 instead of 3, and it's hard to tell if that made much of an improvement. It's still pretty inconsistent between training runs.

If it doesn’t I’d look into reducing the number of filters or the dense layer. Reducing the available space can force an overfitting network to figure out more general solutions

I'll try reducing the dense layer from 128 to 64 next.

Lastly, I bet someone else has either solved the same problem as an exercise or something similar and you could check out their network architecture to see if your solution is in the ballpark of something that works

This is a great idea. I did a quick Google search and nothing stood out at first, but I'll dig deeper.


It's still super weird to me how variable it can be with zero changes. I don't change anything, and on one run it improves consistently for a few epochs, while on the next run it's a lot less accurate to start and declines after the first epoch.

[–] [email protected] 3 points 1 week ago* (last edited 1 week ago) (1 children)

It looks like 5x5 led to an improvement. Validation is moving with training for longer before hitting a wall and turning to overfitting. I'd try bigger to see if that trend continues.

The difference between runs is due to the random initialization of the weights. On some runs you'll just randomly start nearer to a solution that works well, priming the network to reduce loss quickly. Generally you don't want to rely on that and just cherry-pick the one run that looked best in validation. A good solution will almost always reach a roughly similar end result, even if some runs take longer to get there.
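If you want to confirm it's the initialization, you can pin the seeds so successive runs start from the same weights (a sketch assuming a TensorFlow/Keras setup; determinism trades away a bit of speed):

```python
# Pin the random seeds so runs start from the same initial weights and
# shuffle data the same way (assumes a TensorFlow backend for Keras).
import keras
import tensorflow as tf

keras.utils.set_random_seed(42)                  # seeds Python, NumPy, and the backend
tf.config.experimental.enable_op_determinism()   # optional: deterministic GPU ops (slower)
```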

[–] [email protected] 1 points 1 week ago (1 children)

Got it. I'll try with some more values and see what that leads to.

So does that mean my learning rate might be too high and it's overshooting the optimal solution sometimes based on those random weights?

[–] [email protected] 1 points 1 week ago (1 children)

No, it's just a general thing that happens. Regardless of rate, some initializations start nearer to the solution. Other times you'll see a run not really make much progress for a little while before finding its way and then making rapid progress. In the end they'll usually converge at the same point.

[–] [email protected] 1 points 1 week ago (1 children)

So does the fact that they aren't converging near the same point indicate there is a problem with my architecture and model design?

[–] [email protected] 1 points 1 week ago (1 children)

That your validation results aren't generally moving in the same direction as your training results indicates a problem. The training results are all converging, even though some drop at steeper rates. You wouldn't expect validation to match the actual loss value of training, but the shape should look similar.

Some of this might be the y-scaling overemphasizing the noise. Since this is a binary problem (is it blurred or not), anything above 50% means it's doing something, so a 90% success rate isn't terrible. But I would expect validation to end up pretty close to training, as it should be able to solve the problem in a general way. You might also benefit from looking at the class distribution of the bad classifications. If it's all non-blurred images, maybe it's because the images themselves are pretty monochrome or unfocused.
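Something along these lines over the validation set would show where the misses land (a sketch; `model` and `val_ds` are placeholders for whatever you've named yours):

```python
# Build a confusion matrix over the validation set to see which class the
# misclassifications come from. `model` and `val_ds` are placeholder names
# for the trained model and a tf.data validation dataset of (images, labels).
import numpy as np
import tensorflow as tf

y_true, y_pred = [], []
for images, labels in val_ds:
    probs = model.predict(images, verbose=0)
    y_true.append(labels.numpy().ravel())
    y_pred.append((probs.ravel() > 0.5).astype(int))

cm = tf.math.confusion_matrix(np.concatenate(y_true), np.concatenate(y_pred))
print(cm.numpy())  # rows = actual class, columns = predicted class
```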

[–] [email protected] 1 points 1 week ago (1 children)

OK, I changed the Conv2D layer to 10x10. I also changed the dense units to 64. Here is just a single run of that with a confusion matrix.

I don't really see a bias towards non-blurred images.

[–] [email protected] 1 points 1 week ago (1 children)

I'm not sure what the issue is, but I can't see your confusion matrix. There's a video player placeholder, but it doesn't load anything.

Hard to tell whether the larger convolution was a bust or not. It's got the one big spike away from the training loss, but every other epoch is moving downward with training, which looks good. Maybe try a few more runs, and if they commonly have that spike, try to give them longer to see if that loss keeps going downward to meet the training.

For efficiency you can also increase the convolutional stride. If you're doing a 10x10 you can move by more than one pixel each stride. Since you're not really trying to build local structures, going 5 or 10 pixels at a time seems reasonable.

[–] [email protected] 1 points 1 week ago* (last edited 1 week ago) (1 children)

Sorry for the delayed reply. I really appreciate your help so far.

Here is the raw link to the confusion matrix: https://eventfrontier.com/pictrs/image/1a2bc13e-378b-4920-b7f6-e5b337cd8c6f.webm

I changed it to keras.layers.Conv2D(16, 10, strides=(5, 5), activation='relu'). Dense units still at 64.

And in case the confusion matrix still doesn't work, here is a still image from the last run.

EDIT: The wrong image was uploaded originally.

[–] [email protected] 1 points 1 week ago (1 children)

The new runs don't look good. I wouldn't have expected a half-width stride to cause issues though.

Are you sure the confusion matrix is for the validation set? It doesn't match the ~90% accuracy (previous solo run) or the ~70-80% validation accuracy in the new runs.

[–] [email protected] 1 points 2 days ago (1 children)

So someone else suggested reducing the learning rate. I tried that, and at least to me it looks a lot more stable between runs. All the code is my original code (none of the suggestions you made), but I reduced the learning rate to 0.00001 instead of 0.0001.

Not quite sure what that means exactly tho. Or if more adjustments are needed.

As for the confusion matrix: I think the issue is the difference between smoothed values in TensorBoard vs. the actual values. I just ran it again with the previous values to verify, and it does look like it matches up if you look at the actual values instead of the smoothed ones.

[–] [email protected] 1 points 2 days ago* (last edited 2 days ago) (1 children)

Yeah, that's looking a lot better. A too-high learning rate is something I usually expect to show up as an erratic training curve rather than (or I guess in addition to) an erratic validation curve, though.

The learning rate is basically a measure of how big a step you take when updating your weights. If it's large, you'll approach solutions quickly but are likely to overshoot; if it's small, you'll approach slowly and may end up stuck in a local minimum, because the step is too small to leave it and every attempt will be reversed by further training. IIRC, you had a decay schedule to start high and get lower, which tries to get the best of both worlds, but it may have started too high and/or not reduced quickly enough. The "steps" parameter there is counted in batches of images, so you're probably not getting much movement in 6 epochs. It looks like changing the initial rate solved your problem though, so there's not much reason to keep tweaking that. Something to keep in mind for future efforts.
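For future reference, the kind of schedule I mean looks roughly like this (a sketch with made-up numbers, not your actual settings):

```python
# Example of a decaying learning rate; all numbers here are illustrative.
# decay_steps is counted in optimizer steps (batches), not epochs, so a large
# value means the rate barely moves over a short run.
import keras

schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,  # start higher for fast early progress
    decay_steps=1000,            # how many batches before the rate is scaled by decay_rate
    decay_rate=0.5,              # halve the rate over each decay_steps interval
)
optimizer = keras.optimizers.Adam(learning_rate=schedule)
```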

And yeah, I wasn't looking at the unsmoothed data. That is quite a lot of variation.

[–] [email protected] 1 points 22 hours ago (1 children)

Got it. Thanks so much for your help!! Still a lot to learn here.

Coming from a world of building software where things are very binary (it works or it doesn't), it's also really tough to judge what counts as "good enough". There is a point of diminishing returns, and I'm not sure at what point to say it's good enough vs. continuing to learn and improve it.

Really appreciate your help here tho.

[–] [email protected] 1 points 11 hours ago

No problem, happy to help. In a lot of cases, even direct methods couldn't reach 100%. Sometimes the problem definition, combined with just regular noise in your input, means you can have examples with basically the same input but different classes.

In the blur domain, for example, if one of your original "unblurred" images was already blurred (or just out of focus), it might look pretty much indistinguishable from a "blurred" image. Then the only way for the net to "learn" that example is to overfit to some unique value in that image.

A lot of machine learning is just making sure the nets are actually solving your problem rather than figuring out a way to cheat.

[–] [email protected] 3 points 1 week ago (1 children)

I feel like 98+% is pretty good. I'm not an expert, but I know overfitting is something to watch out for.

As for the inconsistency, man, I'm not sure. You'd think it'd be 1-to-1 with the same dataset, wouldn't you? Perhaps the order of the input files isn't the same between runs?

I know that when training you get diminishing returns on optimization and that there are MANY factors that affect performance and accuracy which can be really hard to guess.

I did some ML optimization tutorials a while back; you can iterate through algorithms and net sizes and graph the results to empirically find the optimal combinations for your dataset. Then, when you think you have it locked in, you run your full training set with your dialed-in parameters.

Keep us updated if you figure something out!

[–] [email protected] 1 points 1 week ago

I think what you're referring to with iterating through algorithms and such is called hyperparameter tuning. I think there is a tool called Keras Tuner you can use for this.

However, I'm incredibly skeptical that will work in this situation because of how variable the results are between runs. I run it with the same input, same code, everything, and get wildly different results. So I think for tuning to be effective, the runs need to be fairly consistent.

I could be totally off base here tho. (I haven’t worked with this stuff a ton yet).
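For reference, the Keras Tuner usage I've seen looks roughly like this (a sketch I haven't actually run; the search space, input shape, and dataset names are made up):

```python
# Rough sketch of Keras Tuner usage; the search space, input shape, and
# dataset names (train_ds, val_ds) are placeholders, not my real setup.
import keras
import keras_tuner

def build_model(hp):
    model = keras.Sequential([
        keras.layers.Input(shape=(224, 224, 3)),
        keras.layers.Conv2D(
            filters=hp.Int("filters", 8, 32, step=8),
            kernel_size=hp.Choice("kernel_size", [3, 5, 10]),
            activation="relu",
        ),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(hp.Int("units", 32, 128, step=32), activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(hp.Float("lr", 1e-5, 1e-3, sampling="log")),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = keras_tuner.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
# tuner.search(train_ds, validation_data=val_ds, epochs=10)
```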