Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Fundamentals of String Processing and Word Frequency Analysis with Python

Tech 2

Developing a Word Frequency Counter in Python

Below is a Python script for calculating word occurrences within a given text. The function handles punctuation and case normalization.

import string

def compute_word_frequency(input_text):
    """
    Analyzes input text and returns a dictionary with word frequencies.
    Processes text by removing punctuation and converting to lowercase.
    """
    # Create a translation table to remove punctuation characters
    translator = str.maketrans('', '', string.punctuation)
    # Apply translation and convert text to lowercase
    cleaned_text = input_text.translate(translator).lower()
    
    # Tokenize the cleaned text into individual words
    tokens = cleaned_text.split()
    
    # Dictionary to store final word counts
    frequency_map = {}
    
    # Iterate through tokens and update frequency counts
    for token in tokens:
        if token in frequency_map:
            frequency_map[token] = frequency_map[token] + 1
        else:
            frequency_map[token] = 1
    
    return frequency_map


if __name__ == "__main__":
    # Sample text for demonstration
    sample = """
        Bought this stuffed panda for my daughter's birthday.
        She adores it and carries it everywhere. The material is soft,
        and its appearance is quite charming. However, the size is
        somewhat smaller relative to its cost. Perhaps other products
        offer larger dimensions at a comparable price. Delivery was
        surprisingly prompt, arriving a day ahead of schedule, which
        allowed me some personal enjoyment before handing it over.
    """
    
    result = compute_word_frequency(sample)
    print(result)

Output:

{'bought': 1, 'this': 1, 'stuffed': 1, 'panda': 1, 'for': 1, 'my': 1, 'daughters': 1, 'birthday': 1, 'she': 1, 'adores': 1, 'it': 4, 'and': 2, 'carries': 1, 'everywhere': 1, 'the': 2, 'material': 1, 'is': 3, 'soft': 1, 'its': 2, 'appearance': 1, 'quite': 1, 'charming': 1, 'however': 1, 'size': 1, 'somewhat': 1, 'smaller': 1, 'relative': 1, 'to': 1, 'cost': 1, 'perhaps': 1, 'other': 1, 'products': 1, 'offer': 1, 'larger': 1, 'dimensions': 1, 'at': 1, 'a': 1, 'comparable': 1, 'price': 1, 'delivery': 1, 'was': 1, 'surprisingly': 1, 'prompt': 1, 'arriving': 1, 'day': 1, 'ahead': 1, 'of': 1, 'schedule': 1, 'which': 1, 'allowed': 1, 'me': 1, 'some': 1, 'personal': 1, 'enjoyment': 1, 'before': 1, 'handing': 1, 'over': 1}

Debugging Python Code in VS Code

  1. Open the File: Launch the Python script you intend to debug in the editor.
  2. Set a Breakpoint: Click in the gutter (left margin) of the desired line number. A red circle appears, indicating the program will pause execution at this point. For enstance, set a breakpoint on the line containing the for loop to inspect each iteration.
  3. Initiate Debugging: Press F5 or click the green "Run and Debug" button in the sidebar. Alternatively, use the debug icon in the top-right corner and select "Debug Python File".
  4. Utilize Debugging Controls: Once execution halts at the breakpoint, use the following tools:
    • Variables Panel: Inspect the current state of variables like tokens and frequency_map.
    • Debug Console: Execute ad-hoc Python expressions or debugging commands.
    • Call Stack: View the sequence of functon calls leading to the current point.
    • Step Into (F11): Move into the execution of a called function.
    • Step Over (F10): Execute the current line and move to the next one.
    • Step Out (Shift+F11): Complete execution of the currrent function and return to its caller.
    • Continue (F5): Resume program execution until the next breakpoint is encountered.

While paused, you can monitor how the frequency_map dictionary is constructed step-by-step as the loop processes each word from the tokens list.

Tags: Python

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.