Simple generators
A quick tutorial on writing generators. We want to reproduce the behavior and the output of this simple loop:
>>> for i in range(5):
...     print(i)
...
0
1
2
3
4
Define a function in which each 'yield' hands one value to the loop that consumes the generator, so every 'yield' is one step of that loop:
>>> def gen():
...     for i in range(5):
...         yield i
...
>>> for i in gen():
...     print(i)
...
0
1
2
3
4
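To see what the for loop does behind the scenes, here is a minimal sketch that drives the same generator by hand with the built-in next(). The variable names (g, first, second, rest, exhausted) are only for illustration.

```python
def gen():
    for i in range(5):
        yield i

g = gen()          # calling gen() runs no code yet; it only builds a generator object
first = next(g)    # runs the body up to the first yield
second = next(g)   # resumes right after that yield and runs to the next one
rest = list(g)     # drains whatever is left

# Once the function body finishes, next() raises StopIteration.
# A for loop catches that exception silently and simply ends.
try:
    next(g)
    exhausted = False
except StopIteration:
    exhausted = True
```

So a for loop over a generator is just repeated next() calls until StopIteration shows up.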
Here is where the interesting stuff starts. What if each step has more than one value ready to deliver? That is what 'yield from' is for: it yields every item of an iterable, one after another:
>>> def gen():
...     for i in range(5):
...         yield from (i, 100 - i)
...
>>> for i in gen():
...     print(i)
...
0
100
1
99
2
98
3
97
4
96
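For plain iteration like this, 'yield from' behaves like an inner for loop that yields each item. A small sketch to check that, with two hypothetical function names made up for the comparison:

```python
def gen_yield_from():
    for i in range(5):
        yield from (i, 100 - i)

def gen_nested_loop():
    for i in range(5):
        # Spelled-out version: walk the tuple and yield its items one by one.
        for value in (i, 100 - i):
            yield value

# Both generators should produce the same sequence of values.
same = list(gen_yield_from()) == list(gen_nested_loop())
```

'yield from' can do more than this when the inner iterable is itself a generator (it also forwards send() and return values), but for simple cases the nested loop is a fair mental model.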
The way yield pauses execution inside the generator itself is strange for my brain, so here's one more test: if we have 2 loops one after the other, in what order will the yields run?
>>> def gen():
...     for i in range(5):
...         yield i
...     for i in range(5):
...         yield 100 - i
...
>>> for i in gen():
...     print(i)
...
0
1
2
3
4
100
99
98
97
96
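The generator runs its body top to bottom and only pauses at each yield, which is why the first loop finishes before the second one starts. To make the pausing visible, here is a rough sketch that records who is running at each moment. The log list and its messages are my own additions for illustration.

```python
log = []

def gen():
    log.append("start")
    for i in range(2):
        log.append(f"before yield {i}")
        yield i                      # execution pauses here until the next step
        log.append(f"after yield {i}")
    log.append("end")

for value in gen():
    log.append(f"loop got {value}")

# log now interleaves generator lines and loop lines:
# start, before yield 0, loop got 0, after yield 0,
# before yield 1, loop got 1, after yield 1, end
```

The code after a yield does not run until the consumer asks for the next value.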
Generators in Machine Learning
Note: this is a part of my answer following my investigation of a random question on Stack Overflow. All for the bounty hunt ;)
I found a GitHub repository and a 3-part video tutorial on YouTube that mainly focus on the benefits of using generator functions in Python. The data is based on this Kaggle.
You do not need to write a data generator from scratch. It is not hard, but reinventing the wheel is not productive.
- Keras has the ImageDataGenerator class.
- Plus here is a more generic example for DataGenerator.
- TensorFlow offers very neat pipelines with tf.data.Dataset.
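For a feeling of what such a data generator does under the hood, here is a minimal pure-Python sketch. It is not the Keras or TensorFlow implementation; batch_generator, its parameters, and the fake file names are all hypothetical and only illustrate the idea of yielding one batch at a time instead of loading everything at once.

```python
import random

def batch_generator(samples, labels, batch_size, shuffle=True, seed=None):
    """Yield (batch_samples, batch_labels) forever, one epoch after another."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    while True:                          # endless: the consumer decides when to stop
        if shuffle:
            rng.shuffle(indices)         # new order at the start of each epoch
        for start in range(0, len(indices), batch_size):
            chunk = indices[start:start + batch_size]
            yield [samples[i] for i in chunk], [labels[i] for i in chunk]

# Made-up data: 10 file names with alternating labels.
samples = [f"img_{i}.png" for i in range(10)]
labels = [i % 2 for i in range(10)]

batches = batch_generator(samples, labels, batch_size=4, shuffle=False)
first_x, first_y = next(batches)         # first 4 samples
second_x, second_y = next(batches)       # next 4 samples
third_x, third_y = next(batches)         # remaining 2 samples, end of the epoch
```

In a real pipeline the list comprehension would be replaced by actually loading and preprocessing the images, which is exactly the part the library classes above already handle.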
Nevertheless, to solve the Kaggle task, the model only needs to look at single images, so a simple deep CNN is enough. If you combine 8 random characters (classes) into one image to recognize multiple classes at once, you need R-CNN or YOLO as your model. I only recently discovered YOLO v4 myself, and it is possible to make it work for a specific task really quickly.