Generators - Advanced Python 14
Generators are functions that can be paused and resumed on the fly, returning an object that can be iterated over. Unlike lists, they are lazy: they produce items one at a time and only when asked. This makes them much more memory efficient than lists when dealing with large datasets.
A generator is defined like a normal function but with the yield statement instead of return:
def my_generator():
    yield 1
    yield 2
    yield 3
Execution of a generator function
Calling the function does not execute it. Instead, the function returns a generator object which is used to control execution. Generator objects execute when next() is called. When calling next() the first time, execution begins at the start of the function and continues until the first yield statement, where the value to the right of the statement is returned. Subsequent calls to next() continue from the yield statement (and loop around) until another yield is reached. If no further yield is reached, either because a condition prevents it or because the end of the function is reached, a StopIteration exception is raised:
def countdown(num):
    print('Starting')
    while num > 0:
        yield num
        num -= 1

# this will not print 'Starting'
cd = countdown(3)

# this will print 'Starting' and the first value
print(next(cd))

# will print the next values
print(next(cd))
print(next(cd))

# this will raise a StopIteration
print(next(cd))

Starting
3
2
1
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-1-3941498e0bf0> in <module>
     16
     17 # this will raise a StopIteration
---> 18 print(next(cd))

StopIteration:
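When calling next() by hand, the StopIteration shown above can be handled instead of letting it propagate. The following is a minimal sketch of two common options, assuming a countdown generator like the one above (the 'Starting' print is left out to keep the output short). The helper name drain_manually is hypothetical and only used for illustration.

```python
def countdown(num):
    while num > 0:
        yield num
        num -= 1

# Option 1: catch StopIteration explicitly.
# drain_manually is a hypothetical helper, not part of any library.
def drain_manually(gen):
    values = []
    while True:
        try:
            values.append(next(gen))
        except StopIteration:
            # the generator is exhausted, stop asking for values
            break
    return values

print(drain_manually(countdown(3)))  # [3, 2, 1]

# Option 2: pass a default to next(); it is returned instead of raising
cd = countdown(1)
print(next(cd, None))  # 1
print(next(cd, None))  # None
```

A for loop does the equivalent of option 1 internally, which is why iterating with for never shows the exception.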
# you can iterate over a generator object with a for in loop
cd = countdown(3)
for x in cd:
    print(x)

Starting
3
2
1
# you can use it for functions that take iterables as input
cd = countdown(3)
sum_cd = sum(cd)
print(sum_cd)

cd = countdown(3)
sorted_cd = sorted(cd)
print(sorted_cd)

Starting
6
Starting
[1, 2, 3]
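Note that a generator object can only be consumed once, which is why the examples above create a new countdown(3) before each use. A small sketch of this behavior, again assuming a countdown generator without the 'Starting' print:

```python
def countdown(num):
    while num > 0:
        yield num
        num -= 1

cd = countdown(3)

# the first pass consumes all values
first_pass = list(cd)
# the generator is now exhausted, so a second pass yields nothing
second_pass = list(cd)

print(first_pass)   # [3, 2, 1]
print(second_pass)  # []
```

To iterate again, call the generator function again to get a fresh generator object.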
Big advantage: Generators save memory!
Since the values are generated lazily, i.e. only when needed, it saves a lot of memory, especially when working with large data. Furthermore, we do not need to wait until all the elements have been generated before we start to use them.
# without a generator, the complete sequence has to be stored here in a list
import sys

def firstn(n):
    num, nums = 0, []
    while num < n:
        nums.append(num)
        num += 1
    return nums

sum_of_first_n = sum(firstn(1000000))
print(sum_of_first_n)
print(sys.getsizeof(firstn(1000000)), "bytes")

499999500000
8697464 bytes
# with a generator, no additional sequence is needed to store the numbers
import sys

def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1

sum_of_first_n = sum(firstn(1000000))
print(sum_of_first_n)
print(sys.getsizeof(firstn(1000000)), "bytes")

499999500000
120 bytes
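Because values are produced only on demand, a consumer can start using the first items right away and stop early. As a rough illustration, the sketch below uses a generator that never ends and takes only the first five values with itertools.islice from the standard library. The generator name naturals is an assumption made for this example.

```python
from itertools import islice

# naturals is a hypothetical example generator that never terminates
def naturals():
    num = 0
    while True:
        yield num
        num += 1

# islice stops asking for values after 5 items,
# so the infinite loop is never run to completion
first_five = list(islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```

The same function written with a list would never return, since the whole sequence would have to be built first.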
Another example: Fibonacci numbers
def fibonacci(limit):
    a, b = 0, 1  # first two fibonacci numbers
    while a < limit:
        yield a
        a, b = b, a + b

fib = fibonacci(30)
# generator objects can be converted to a list (only used for printing here)
print(list(fib))

[0, 1, 1, 2, 3, 5, 8, 13, 21]
Just like list comprehensions, generators can be written as expressions with the same syntax, except with parentheses instead of square brackets. Be careful not to mix them up: when all items are consumed anyway, a generator expression is often slower than the equivalent list comprehension, because each item requires an extra call to resume the generator (https://stackoverflow.com/questions/11964130/list-comprehension-vs-generator-expressions-weird-timeit-results/11964478#11964478). The generator expression still uses far less memory.
import sys

# generator expression
mygenerator = (i for i in range(1000) if i % 2 == 0)
print(sys.getsizeof(mygenerator), "bytes")

# list comprehension
mylist = [i for i in range(1000) if i % 2 == 0]
print(sys.getsizeof(mylist), "bytes")

120 bytes
4272 bytes
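A generator expression can also be passed directly to a function that takes an iterable, in which case the extra parentheses can be dropped. The sketch below shows this and also times both variants with timeit so the speed difference can be checked on your own machine. The timings depend on the Python version and hardware, so no expected values are given for them.

```python
import timeit

# generator expression as the only argument: no extra parentheses needed
total_gen = sum(i * i for i in range(10))
# the same computation with a list comprehension
total_list = sum([i * i for i in range(10)])
print(total_gen, total_list)  # 285 285

# rough timing comparison; results vary between machines
gen_time = timeit.timeit("sum(i * i for i in range(1000))", number=1000)
list_time = timeit.timeit("sum([i * i for i in range(1000)])", number=1000)
print(f"generator expression: {gen_time:.4f} s")
print(f"list comprehension:   {list_time:.4f} s")
```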
Concept behind a generator
The following class implements our generator as an iterable object. It has to implement __iter__ and __next__ to make it iterable, keep track of the current state (the current number in this case), and take care of raising StopIteration. It can be used to understand the concept behind generators. However, there is a lot of boilerplate code, and the logic is not as clear as with a simple function using the yield statement.
class firstn:
    def __init__(self, n):
        self.n = n
        self.num = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.num < self.n:
            cur = self.num
            self.num += 1
            return cur
        else:
            raise StopIteration()

firstn_object = firstn(1000000)
print(sum(firstn_object))
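For comparison, a generator object created from a function supports the same protocol as the class above without any of the boilerplate. The following sketch, using the generator version of firstn from earlier, checks that the generator is its own iterator and provides __next__:

```python
def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1

gen = firstn(3)

# like the class version, iter() on a generator returns the object itself
print(iter(gen) is gen)          # True
# and the generator object provides __next__
print(hasattr(gen, "__next__"))  # True
print(list(gen))                 # [0, 1, 2]
```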