WHAT DOES COMPUTER VISION ACTUALLY SEE?
Computer vision is a general term for anything where we use AI to make software understand images and videos. Many of these systems use an AI model known as a CNN (Convolutional Neural Network), which is, as the name suggests, a type of neural network.
Now, how does a CNN actually see?
Neural networks mostly take a 1D input, a flat list of numbers, but as we know images are 2D. So how do we turn a 2D image into a 1D input? And remember, we not only need to feed in the pixel data but also make sure the AI can tell the difference between your cat and yourself, i.e. keep the context.
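To see why context is the tricky part, here is a minimal sketch in plain Python. It assumes a toy 4x4 image unrolled row by row; the name `flat_index` and the width are made up for illustration. It shows that two pixels sitting right on top of each other in the image end up several positions apart in the 1D list.

```python
WIDTH = 4  # assumed width of a toy 4x4 image

def flat_index(row, col, width=WIDTH):
    """Position of pixel (row, col) after unrolling the image row by row."""
    return row * width + col

here = flat_index(1, 1)   # some pixel in the middle of the image
right = flat_index(1, 2)  # its neighbour to the right
below = flat_index(2, 1)  # its neighbour directly underneath

print(right - here)  # 1: still next to each other
print(below - here)  # 4: pushed a whole row apart
```

So a plain unrolling keeps all the pixel values, but the "who is next to whom" information is no longer obvious from position alone.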
There are 2 solutions:
1. Make a neural network that can take a 2D image as input.
This is a very interesting and still-developing idea, where a multi-dimensional neural network is trained and gives a response without any preprocessing.
2. Make the image flat so that it retains its context and fits existing neural networks.
Most CNNs these days use a step called flattening, in which we literally flatten the image: the convolution layers first pick out local patterns, and then the image, or any higher-dimensional array, is stretched out until it is a 1D array, without losing that context. Now feed that sweet 1D image into the NN and done.
Since this is the solution used in industry, flattening is what you will most likely find in practice.
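The flattening step itself is simple enough to sketch in a few lines. This is only an illustration in plain Python on a made-up 3x3 grayscale image; `flatten` and `unflatten` are hypothetical helper names, and real frameworks ship their own flatten operations that work on whole batches of arrays.

```python
def flatten(image):
    """Unroll a 2D grid of pixels into a 1D list, row by row."""
    return [pixel for row in image for pixel in row]

def unflatten(flat, width):
    """Rebuild the 2D grid from the 1D list, given the original width."""
    return [flat[i:i + width] for i in range(0, len(flat), width)]

# A made-up 3x3 image of a plus sign (0 = black, 255 = white).
image = [
    [0, 255, 0],
    [255, 255, 255],
    [0, 255, 0],
]

flat = flatten(image)
print(flat)  # [0, 255, 0, 255, 255, 255, 0, 255, 0]

# Nothing is lost: knowing the width, we can get the image back.
print(unflatten(flat, 3) == image)  # True
```

The round trip at the end is the point: flattening only rearranges the values, it does not throw any of them away.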
Now, to finally answer: what does computer vision actually see?
Imagine a 1D image of your cat where each pixel carries some information about many of the original pixels. That's what computer vision sees.
Hard to imagine? That's because it's something for the AI to understand. We just need to sit back and relax while our AI trains, so that it can identify your cat sleeping in your 10,000-photo library.
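If you do want a rough picture of it, here is a small sketch in plain Python, under some loud assumptions: the 4x4 image and the 2x2 filter are made up by hand (a real CNN learns its filters during training), and `convolve2d` is a hypothetical helper, not a library function. It slides the filter over the image and then flattens the result, so every value in the final 1D list is built from a 2x2 patch of original pixels.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, step of 1) and
    sum the element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Made-up image: dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-picked filter that reacts to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
flat = [value for row in feature_map for value in row]

print(feature_map)  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
print(flat)         # [0, 18, 0, 0, 18, 0, 0, 18, 0]
```

The 18s mark where the edge is. None of the flattened values is an original pixel any more; each one is a small summary of its neighbourhood, which is roughly what the network gets to look at.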