
Python Fundamentals: Variables, Data Types, and Building Word Clouds with jieba

Tech May 9 14

Variables in Python

Variables serve as containers for storing data values. In Python, they are created the moment you assign a value to them. An assignment has three parts: a descriptive name, the assignment operator (=), and a value.

Naming Conventions

Variable names must start with a letter or underscore, never a number
Names can only contain alphanumeric characters and underscores
Reserved keywords cannot be used as variable names
Names should be meaningful and describe the data they hold
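
The rules above can be checked programmatically. The sketch below relies on the standard library's `str.isidentifier()` and `keyword.iskeyword()`; the helper name `is_valid_variable_name` is made up for illustration.

```python
import keyword

def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` is a legal, non-reserved Python identifier."""
    # isidentifier() enforces the character rules; iskeyword() rejects reserved words
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["user_id", "_cache", "2nd_place", "total-cost", "class"]:
    print(candidate, is_valid_variable_name(candidate))
# user_id True
# _cache True
# 2nd_place False
# total-cost False
# class False
```

Note that Python 3 also accepts non-ASCII letters in names, so `isidentifier()` is slightly more permissive than the ASCII-only rule of thumb.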

Comments

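Python supports single-line comments using the # symbol. Triple-quoted strings are commonly used as multi-line comments; strictly speaking they are string literals that the interpreter evaluates and then discards.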

# This is a single-line comment

'''
This is a multi-line comment
spanning multiple lines
'''

"""
Another way to write
multi-line comments
"""

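Triple-quoted text placed as the first statement of a function, class, or module is kept as its docstring, whereas a `#` comment is ignored entirely. A minimal sketch of the difference (the function `area` is a made-up example):

```python
def area(width, height):
    """Return the area of a rectangle."""
    # A real comment: the interpreter ignores this line
    return width * height

print(area.__doc__)   # Return the area of a rectangle.
print(area(3, 4))     # 12
```
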
Data Types

Numeric Types

Integers

Integers are whole numbers without decimal points. They support standard arithmetic operations.

user_id = 1001
phone_number = int("13800138000")  # Converts the string to an integer

# Arithmetic operations
num_a = 15
num_b = 4

print(num_a + num_b)   # Addition: 19
print(num_a - num_b)   # Subtraction: 11
print(num_a * num_b)   # Multiplication: 60
print(num_a / num_b)   # Division: 3.75
print(num_a % num_b)   # Modulus: 3
print(num_a // num_b)  # Floor division: 3
print(num_a ** num_b)  # Exponentiation: 50625
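
Floor division and modulus are two halves of the same operation, which is easiest to see with the built-in `divmod()`. The sketch below also shows how both behave with a negative operand, a common source of surprises.

```python
num_a = 15
num_b = 4

# divmod() returns the floor quotient and the remainder together
quotient, remainder = divmod(num_a, num_b)
print(quotient, remainder)                    # 3 3
print(quotient * num_b + remainder == num_a)  # True

# Floor division rounds toward negative infinity, not toward zero
print(-15 // 4)   # -4
print(-15 % 4)    # 1

# The / operator always returns a float, even for a whole result
print(16 / 4)     # 4.0
```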

Floats

Floating-point numbers represent decimal values.

monthly_salary = 15000.50
height_meters = float(180)  # Converts to 180.0

# Mathematical operations
import math
result = math.sqrt(16)  # 4.0
log_value = math.log(10)  # Natural logarithm
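
Floats are stored in binary, so some decimal values cannot be represented exactly. A short sketch of the consequence and of the usual remedy, comparing with `math.isclose()` instead of `==`:

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True
```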

String Type

Strings are sequences of characters enclosed in single, double, or triple quotes.

first_name = 'John'
last_name = "Doe"
address = '''
123 Main Street,
New York, NY 10001
'''

# String methods
text = "Hello World"
print(text.startswith("Hello"))  # True
print(text.endswith("World"))    # True

# Indexing and slicing
message = "Python Programming"
print(message[0])      # P (first character)
print(message[-1])      # g (last character)
print(message[0:6])     # Python
print(message[::2])     # Pto rgamn (step of 2)
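
A few more slicing forms are worth knowing, along with the fact that strings are immutable. The following sketch reuses the `message` value from the example above.

```python
message = "Python Programming"

print(len(message))    # 18
print(message[7:])     # Programming (from index 7 to the end)
print(message[-11:])   # Programming (the last 11 characters)
print(message[::-1])   # gnimmargorP nohtyP (negative step reverses)

# Strings are immutable: item assignment raises TypeError
try:
    message[0] = "J"
except TypeError as error:
    print("Cannot modify in place:", error)

# Build a new string instead
renamed = "J" + message[1:]
print(renamed)         # Jython Programming
```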

String Operations: join() and split()

# Join - combines list elements into a string
char_list = ['P', 'y', 't', 'h', 'o', 'n']
joined_string = '-'.join(char_list)
print(joined_string)  # P-y-t-h-o-n

# Split - breaks string into list
sentence = "apple,banana,cherry"
fruits = sentence.split(',')
print(fruits)  # ['apple', 'banana', 'cherry']
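
`split()` and `join()` undo each other when used with the same separator. The sketch below shows that round trip plus three details that often trip up beginners: splitting on whitespace, limiting the number of splits, and joining non-string values.

```python
sentence = "apple,banana,cherry"

# Round trip: splitting and re-joining with the same separator
parts = sentence.split(",")
print(",".join(parts) == sentence)    # True

# With no argument, split() splits on any run of whitespace
words = "  one   two\tthree ".split()
print(words)                          # ['one', 'two', 'three']

# maxsplit limits the number of splits
head_and_rest = sentence.split(",", 1)
print(head_and_rest)                  # ['apple', 'banana,cherry']

# join() requires strings, so convert other types first
scores = [90, 85, 77]
score_line = ", ".join(str(score) for score in scores)
print(score_line)                     # 90, 85, 77
```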

List Type

Lists are ordered, mutable collections that can hold multiple data types.

# Creating lists
hobbies = ['reading', 'gaming', 'coding']
empty_list = []
converted_list = list('abc')  # ['a', 'b', 'c']

# Accessing elements
print(hobbies[0])    # reading
print(hobbies[-1])   # coding
print(hobbies[1:3])  # ['gaming', 'coding']

# Modifying lists
hobbies.append('music')
hobbies.remove('gaming')
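
The modifications above change the list in place. A short sketch that prints the result and adds a few related operations (`insert`, `pop`, and a list comprehension):

```python
hobbies = ['reading', 'gaming', 'coding']

hobbies.append('music')      # add to the end
hobbies.remove('gaming')     # remove the first matching value
print(hobbies)               # ['reading', 'coding', 'music']

hobbies.insert(1, 'hiking')  # insert before index 1
last = hobbies.pop()         # remove and return the last element
print(last)                  # music
print(hobbies)               # ['reading', 'hiking', 'coding']

# Lists can mix types, and len() reports the element count
mixed = [1, 'two', 3.0, [4, 5]]
print(len(mixed))            # 4

# A list comprehension builds a new list from an existing one
shouted = [hobby.upper() for hobby in hobbies]
print(shouted)               # ['READING', 'HIKING', 'CODING']
```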

Dictionary Type

Dictionaries store data in key-value pairs, providing fast lookups by key.

# Creating a dictionary
person_info = {
    'name': 'Alice',
    'age': 28,
    'city': 'Beijing',
    'skills': ['Python', 'Java', 'SQL']
}

# Accessing values by key
print(person_info['name'])       # Alice
print(person_info.get('age'))    # 28

# Modifying values
person_info['age'] = 29
person_info['email'] = 'alice@example.com'  # Add new key

# Deleting a key-value pair
del person_info['city']
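
Indexing a missing key with square brackets raises `KeyError`, which is why `get()` is often preferred for optional fields. A sketch of safe lookups, membership tests, and iteration on a smaller version of the dictionary above:

```python
person_info = {'name': 'Alice', 'age': 28, 'city': 'Beijing'}

# get() returns None, or a supplied default, when the key is missing
print(person_info.get('email'))             # None
print(person_info.get('email', 'unknown'))  # unknown

# Membership tests check keys, not values
print('city' in person_info)                # True

# pop() removes a key and returns its value
city = person_info.pop('city')
print(city)                                 # Beijing

# items() yields key-value pairs in insertion order
for key, value in person_info.items():
    print(key, value)
# name Alice
# age 28
```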

Configuring Pip Mirror Sources

When installing Python packages, the default pip source (PyPI) may be slow due to geographic distance. For users in mainland China, configuring a domestic mirror can improve download speeds considerably.

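Common Chinese mirror sources include the following (mirror availability changes over time, so confirm that a URL responds before configuring it):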

Tsinghua: https://pypi.tuna.tsinghua.edu.cn/simple
Aliyun: https://mirrors.aliyun.com/pypi/simple
Douban: https://pypi.douban.com/simple

To permanently configure a mirror, modify the pip configuration file or use the command:

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
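
To use a mirror for a single installation rather than permanently, pip accepts an index URL through its `-i` / `--index-url` option. The sketch below only builds the command as a list of arguments; the helper `build_pip_install_command` is a made-up name, and nothing is installed until the list is passed to something like `subprocess.run`.

```python
import sys

TSINGHUA_MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple"

def build_pip_install_command(package, index_url=TSINGHUA_MIRROR):
    """Return the argument list for a one-off install from a given index."""
    # "python -m pip" runs pip for the interpreter executing this script
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, package]

command = build_pip_install_command("jieba")
print(" ".join(command[1:]))
# -m pip install --index-url https://pypi.tuna.tsinghua.edu.cn/simple jieba
```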

Jieba Library for Chinese Word Segmentation

Jieba is a powerful Chinese text segmentation library. Install it using pip:

pip install jieba

Segmentation Methods

import jieba

text = "Artificial intelligence is changing the world"

# Accurate mode (default) - returns list
seg_result = jieba.lcut(text)
print(seg_result)
# ['Artificial', ' ', 'intelligence', ' ', 'is', ' ', 'changing', ' ', 'the', ' ', 'world']
# (spaces come back as tokens; English text is simply split on them)

# Full mode - scans all possible words
full_seg = jieba.lcut(text, cut_all=True)
print(full_seg)

# Search engine mode - better for search indexing
search_seg = jieba.lcut_for_search(text)
print(search_seg)

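# Adding custom words (intended for Chinese terms missing from the dictionary;
# an entry containing a space, like this one, is never matched because
# jieba splits on whitespace before consulting the dictionary)
jieba.add_word("Artificial intelligence")
custom_seg = jieba.lcut("Artificial intelligence is developing rapidly")
print(custom_seg)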
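
Segmentation output usually needs light cleaning before further use, because jieba returns whitespace and punctuation as tokens of their own. A minimal sketch in plain Python: the token list is hard-coded to mimic what `jieba.lcut` returns for the sentence above, and `clean_tokens` is a made-up helper name.

```python
def clean_tokens(tokens):
    """Drop tokens that contain no letters or digits (spaces, punctuation)."""
    return [token for token in tokens if any(ch.isalnum() for ch in token)]

# Hard-coded stand-in for jieba.lcut("Artificial intelligence is changing the world")
raw_tokens = ['Artificial', ' ', 'intelligence', ' ', 'is', ' ',
              'changing', ' ', 'the', ' ', 'world']

cleaned = clean_tokens(raw_tokens)
print(cleaned)
# ['Artificial', 'intelligence', 'is', 'changing', 'the', 'world']
```

Because `str.isalnum()` is Unicode-aware, the same filter keeps Chinese words and drops Chinese punctuation.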

Generating Word Clouds with wordcloud

The wordcloud library creates visual representations of text data. Install the required packages (imageio is only needed later, to load a mask image):

pip install wordcloud
pip install imageio
pip install pillow

Basic Word Cloud Generation

import jieba
import wordcloud

# Sample text
content = "Python is a powerful programming language used for web development, data analysis, machine learning, and automation. Python has a simple syntax and is easy to learn."

# Segment with jieba (needed for Chinese text, which has no spaces between words;
# for English text like this sample the step is optional)
words = jieba.lcut(content)
text_processed = ' '.join(words)

# Create word cloud
wc = wordcloud.WordCloud(
    width=800,
    height=400,
    background_color='white'
)

wc.generate(text_processed)
wc.to_file('wordcloud_output.png')
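
`generate` counts words internally. When you want control over the counts, for example to drop common words first, one option is to compute frequencies yourself and pass them to `generate_from_frequencies`. This is a sketch under assumptions: the stop-word set and the helper names are made up for illustration, and `wordcloud` is imported inside `render` so the counting part runs without it.

```python
from collections import Counter

STOP_WORDS = {"is", "a", "and", "for", "has", "to", "used"}  # illustrative only

def word_frequencies(tokens, stop_words=STOP_WORDS):
    """Count tokens case-insensitively, skipping stop words and non-words."""
    words = (token.lower() for token in tokens)
    return Counter(w for w in words if w.isalpha() and w not in stop_words)

def render(frequencies, path):
    """Draw a word cloud from a {word: count} mapping and save it to path."""
    import wordcloud  # third-party; imported lazily
    wc = wordcloud.WordCloud(width=800, height=400, background_color='white')
    wc.generate_from_frequencies(frequencies)
    wc.to_file(path)

tokens = "Python is a powerful language and Python is easy to learn".split()
frequencies = word_frequencies(tokens)
print(frequencies["python"])  # 2
print("is" in frequencies)    # False

# render(frequencies, 'wordcloud_from_counts.png')  # uncomment to write the image
```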

Customized Word Cloud with Mask

Using a mask allows you to shape the word cloud into custom forms.

import jieba
import wordcloud
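from imageio.v2 import imread  # v2 API; the top-level imageio.imread is deprecated in recent releases

# Load mask image (pure white areas are left empty; words fill the rest)
mask_image = imread('star_shape.png')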

# Text content
sample_text = "Technology innovation drives progress and creates opportunities for development and growth"

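# Add custom vocabulary (meant for Chinese terms; an entry containing a space,
# like this one, is never matched because jieba splits on whitespace first)
jieba.add_word("Technology innovation")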

# Process text
word_list = jieba.lcut(sample_text)
processed_text = ' '.join(word_list)

# Configure word cloud with mask
cloud = wordcloud.WordCloud(
    font_path='C:/Windows/Fonts/simhei.ttf',  # Chinese font support (Windows path; adjust for your system)
    mask=mask_image,  # output takes the mask's shape; width and height are ignored when a mask is set
    background_color='white',
    max_words=100
)

cloud.generate(processed_text)
cloud.to_file('custom_shaped_wordcloud.png')

Key parameters for WordCloud customization:

font_path: Path to font file for Chinese character support
mask: Image array defining the shape
background_color: Background color (default: black)
max_words: Maximum number of words to display
width, height: Output image dimensions (ignored when a mask is supplied)
colormap: Color scheme for words
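
In a mask, pure white pixels are treated as masked out and every other pixel is available for words. If no image file is at hand, a mask can be generated in code. This is a sketch under assumptions: `circle_mask` is a made-up helper that returns rows of 0/255 values in plain Python, which you would convert with `numpy.array(rows, dtype='uint8')` before passing the result as `mask`.

```python
def circle_mask(size):
    """Return a size x size grid: 0 inside a centred circle, 255 (white) outside."""
    centre = (size - 1) / 2
    radius = size / 2
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = (x - centre) ** 2 + (y - centre) ** 2 <= radius ** 2
            row.append(0 if inside else 255)  # white pixels stay empty
        rows.append(row)
    return rows

mask_rows = circle_mask(9)

# Text preview: '#' marks drawable pixels, '.' marks masked-out pixels
for row in mask_rows:
    print("".join("#" if value == 0 else "." for value in row))
```

A real mask would be much larger than 9 pixels (for example several hundred); the small size here just keeps the preview readable.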

Tags: Python jieba WordCloud

Copyright © fadingcoder.top
