Tuesday 30 November 2021

Top 10 Python generator use cases

Generators are useful in a variety of situations where we need to produce a stream of values, rather than a fixed collection of values. Some common use cases of generators in Python include:

1. Processing large files: Generators can be used to process large files in a memory-efficient manner, by reading one line or block at a time and processing it, rather than reading the entire file into memory.

If you have a large file that you need to process line by line, you can use a generator to read the file one line at a time, rather than reading the entire file into memory at once. This can be much more memory-efficient, especially if the file is very large.

def read_file(filename):
    with open(filename, 'r') as f:
        for line in f:
            yield line.strip()


This generator function reads a file line by line and yields each line as a string. You can use it like this:

for line in read_file('large_file.txt'):
    ...  # do something with the line
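Since the text above also mentions reading a block at a time, here is a minimal sketch of that variant (the 4096-character block size is an arbitrary choice):

def read_blocks(filename, block_size=4096):
    # Yield fixed-size blocks of the file instead of individual lines
    with open(filename, 'r') as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield block

for block in read_blocks('large_file.txt'):
    ...  # do something with the block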


2. Stream processing: Generators can be used to process streams of data, such as sensor readings or network packets, in real time.

import random

def stream_data():
    while True:
        yield random.randint(1, 100)

def process_data(data_stream):
    for data_point in data_stream:
        # Process each data point
        print(data_point)

data_stream = stream_data()
process_data(data_stream)


This code generates an infinite stream of random integers between 1 and 100 using the stream_data function, and then processes each data point one by one using the process_data function.
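Note that stream_data() never terminates, so process_data(data_stream) as written runs forever; in practice you might bound the stream, for example with itertools.islice:

import itertools
import random

def stream_data():
    while True:
        yield random.randint(1, 100)

# Take only the first 10 readings from the otherwise infinite stream
for data_point in itertools.islice(stream_data(), 10):
    print(data_point)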

3. Parallel processing: Generators can be used to support parallel processing of data, by splitting a large data set into smaller chunks and processing the chunks independently.

import math

def chunk_data(data, chunk_size):
    for i in range(0, len(data), chunk_size):
        yield data[i:i+chunk_size]

def process_chunk(chunk):
    result = []
    for num in chunk:
        result.append(math.sqrt(num))
    return result

data = [i**2 for i in range(10000)]
chunk_size = 1000

chunk_gen = chunk_data(data, chunk_size)
result_gen = (process_chunk(chunk) for chunk in chunk_gen)

for result in result_gen:
    # Process each chunk result
    print(result)


This code splits a large list of numbers into chunks of 1000 using the chunk_data function and processes each chunk with the process_chunk function. The result_gen generator expression yields the result of each chunk computation lazily, one chunk at a time. On its own this runs sequentially; to actually process chunks in parallel, the same chunk generator can be handed to a worker pool, as sketched below.
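A minimal sketch of that parallel variant, using the standard multiprocessing module (the pool size of 4 is an arbitrary choice, and process_chunk is rewritten as a list comprehension for brevity):

import math
from multiprocessing import Pool

def chunk_data(data, chunk_size):
    for i in range(0, len(data), chunk_size):
        yield data[i:i+chunk_size]

def process_chunk(chunk):
    return [math.sqrt(num) for num in chunk]

if __name__ == '__main__':
    data = [i**2 for i in range(10000)]
    with Pool(processes=4) as pool:
        # imap consumes the chunk generator lazily and yields results in order
        for result in pool.imap(process_chunk, chunk_data(data, 1000)):
            print(result)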

4. Infinite sequences: Generators can be used to generate infinite sequences of values, such as the Fibonacci sequence or prime numbers, without requiring infinite memory.

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


This generator function generates the Fibonacci sequence indefinitely. You can use it like this:

fib = fibonacci()
for i in range(10):
    print(next(fib))


This will print the first 10 numbers in the Fibonacci sequence.
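The same pattern works for the prime numbers mentioned above; here is a minimal sketch of a prime-number generator, using simple trial division for clarity rather than speed:

def primes():
    n = 2
    while True:
        # n is prime if no smaller number up to sqrt(n) divides it
        if all(n % p != 0 for p in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

prime_gen = primes()
for i in range(10):
    print(next(prime_gen))  # 2, 3, 5, 7, 11, 13, 17, 19, 23, 29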

5. Lazy evaluation: You can use generators to implement lazy evaluation, which means that values are only computed when they are needed. This can be useful when working with large datasets, as it allows you to avoid computing values that you don't need.

For example, suppose you have a large list of numbers, and you want to compute the sum of the squares of all the even numbers. You can use a generator to compute the squares of the even numbers on the fly:

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def squares_of_evens(numbers):
    for n in numbers:
        if n % 2 == 0:
            yield n**2

sum_of_squares = sum(squares_of_evens(numbers))
print(sum_of_squares)  # 220
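The same lazy computation can also be written inline as a generator expression, which is evaluated one value at a time just like the generator function:

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# The parenthesized expression is consumed lazily by sum()
sum_of_squares = sum(n**2 for n in numbers if n % 2 == 0)
print(sum_of_squares)  # 220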


6. Data processing pipelines: Generators can be used to create data processing pipelines, where data is passed through a series of processing steps. This can be useful for processing large amounts of data that cannot fit into memory all at once. For example, you can create a generator that reads lines from a file, and then use it to keep only the lines that contain a certain word:

def read_lines(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

def filter_lines(lines, word):
    for line in lines:
        if word in line:
            yield line

lines_gen = read_lines('my_file.txt')
filtered_gen = filter_lines(lines_gen, 'Python')

for line in filtered_gen:
    print(line)



This will read lines from the file "my_file.txt" and print only the lines that contain the word "Python".

7. Caching: Generators can be used to cache expensive computations, by computing and storing intermediate results in a generator and only computing new values when necessary.

def compute_expensive_result(num):
    # Compute expensive result here
    return num ** 2

def get_cached_results(nums):
    cache = {}
    for num in nums:
        if num in cache:
            yield cache[num]
        else:
            result = compute_expensive_result(num)
            cache[num] = result
            yield result

nums = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
result_gen = get_cached_results(nums)

for result in result_gen:
    # Process each result
    print(result)


This code computes expensive results for a list of numbers using the compute_expensive_result function, but caches the results in a dictionary to avoid recomputing the same results multiple times. The get_cached_results generator yields the cached result if one exists, or computes and caches a new result otherwise.
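For a simple per-argument cache like this, the standard library's functools.lru_cache can replace the hand-written dictionary; a minimal sketch:

from functools import lru_cache

@lru_cache(maxsize=None)
def compute_expensive_result(num):
    # Compute expensive result here; repeated calls with the same
    # argument return the cached value instead of recomputing it
    return num ** 2

def get_results(nums):
    for num in nums:
        yield compute_expensive_result(num)

for result in get_results([1, 2, 3, 4, 5, 1, 2, 3, 4, 5]):
    print(result)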

8. Data cleaning and transformation: Generators can be used to clean and transform data in a memory-efficient manner. For example, we can use a generator to read in a large dataset, clean and transform each record one by one, and write out the transformed records to a new file or database.

def read_data(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield line.strip()

def clean_data(data_stream):
    for data_point in data_stream:
        # Clean each data point
        cleaned_data = data_point.replace('\t', ' ').replace('\n', '')
        yield cleaned_data

def transform_data(data_stream):
    for data_point in data_stream:
        # Transform each data point
        transformed_data = data_point.upper()
        yield transformed_data

def write_data(data_stream, file_path):
    with open(file_path, 'w') as f:
        for data_point in data_stream:
            # Write each data point
            f.write(data_point + '\n')

input_file = 'input_data.txt'
output_file = 'output_data.txt'

data_stream = read_data(input_file)
cleaned_data_stream = clean_data(data_stream)
transformed_data_stream = transform_data(cleaned_data_stream)
write_data(transformed_data_stream, output_file)


This code reads a large data file line by line using the read_data function, cleans and transforms each data point using the clean_data and transform_data functions, and then writes the transformed data to a new file using the write_data function.


9. Natural language processing: Generators can be used in natural language processing tasks, such as tokenization, parsing, and text classification. For example, we can use a generator to read in a large corpus of text, tokenize each document one by one, and feed the resulting tokens into a machine learning model for classification.

import nltk

def tokenize_documents(documents):
    for document in documents:
        tokens = nltk.word_tokenize(document)
        yield tokens

documents = ["This is a document.", "This is another document."]

# Tokenize each document
for tokens in tokenize_documents(documents):
    # Process tokens
    process_tokens(tokens)


This code defines a generator function tokenize_documents that takes in a list of documents. The function uses the Natural Language Toolkit (NLTK) to tokenize each document into a list of words and then yields the list of tokens one at a time.

In the main code, a list of documents is defined as strings. The tokenize_documents generator function is called with the list of documents as input, and the resulting generator is used in a for loop to iterate over each list of tokens generated by the generator.

Each list of tokens is then passed to a hypothetical process_tokens function, which can perform any desired processing on the tokens, such as part-of-speech tagging, sentiment analysis, or topic modeling.

This approach allows for efficient processing of large amounts of text data, as only one document's worth of tokens is loaded into memory at a time. It also allows for easy modification of the processing steps, as they are performed one document at a time in a separate function.
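As one illustration, process_tokens (a hypothetical function, not part of NLTK) might tag each token with its part of speech:

import nltk

def process_tokens(tokens):
    # Part-of-speech tagging is one of the steps mentioned above;
    # nltk.pos_tag returns (token, tag) pairs for a list of tokens
    # (requires the 'averaged_perceptron_tagger' NLTK data package)
    tagged = nltk.pos_tag(tokens)
    print(tagged)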


10. Web scraping: Generators can be used to scrape data from websites in a memory-efficient and scalable manner. For example, we can use a generator to fetch web pages one by one, extract the relevant data from each page, and store the data in a database or file.

import requests
from bs4 import BeautifulSoup

def fetch_page(url):
    """Generator that fetches a web page and yields the content"""
    response = requests.get(url)
    if response.status_code == 200:
        yield response.content

def extract_links(content):
    """Generator that extracts links from HTML content"""
    soup = BeautifulSoup(content, 'html.parser')
    for link in soup.find_all('a'):
        href = link.get('href')
        if href is not None:
            yield href

# Example usage: fetch pages and extract links
url = 'https://en.wikipedia.org/wiki/Web_scraping'
page_gen = fetch_page(url)

for page_content in page_gen:
    link_gen = extract_links(page_content)
    for link in link_gen:
        print(link)


Here we define two generators: fetch_page() and extract_links(). The fetch_page() generator fetches a web page using the requests library and yields the content as a byte string. The extract_links() generator takes the HTML content as input, uses the BeautifulSoup library to parse it, and yields any links found in the HTML.

We then use these generators to scrape a Wikipedia page and extract all the links found on the page. We first initialize the fetch_page() generator with the URL of the page to be scraped. We then iterate over the content yielded by the fetch_page() generator, initializing the extract_links() generator with each page's content. Finally, we iterate over the links yielded by the extract_links() generator and print each link to the console.

By using generators in this way, we can process large volumes of web data in a memory-efficient and scalable manner, without having to load the entire web page or list of links into memory at once.
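To scale this up to several pages, the fetch_page() generator can be generalized to accept a list of URLs and yield each page as it is fetched; a sketch (the second URL is just a placeholder example):

import requests

def fetch_pages(urls):
    # Yield the content of each page lazily, one page at a time
    for url in urls:
        response = requests.get(url)
        if response.status_code == 200:
            yield response.content

urls = [
    'https://en.wikipedia.org/wiki/Web_scraping',
    'https://en.wikipedia.org/wiki/Python_(programming_language)',
]
for content in fetch_pages(urls):
    print(len(content))  # e.g. report the size of each fetched page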

Overall, generators are a versatile and powerful tool in Python that can be used in a wide range of applications, from data processing and analysis to scientific computing and simulation.
