Python Fundamentals: Variables, Data Types, and Building Word Clouds with jieba
Variables in Python
Variables serve as containers for storing data values. In Python, they are created the moment you assign a value to them. A variable consists of three key elements: a descriptive name, an assignment operator, and a value.
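As a minimal sketch of those three elements (the names and values below are illustrative only), each assignment binds a name to a value, and the same name can later be rebound, even to a value of a different type:

```python
# Name on the left, assignment operator (=) in the middle, value on the right
course_name = "Python Fundamentals"
student_count = 30

# Reassignment rebinds the existing name to a new value
student_count = student_count + 1
print(course_name, student_count)

# Python is dynamically typed: the type belongs to the value, not the name
first_type = type(student_count).__name__
student_count = "thirty-one"
second_type = type(student_count).__name__
print(first_type, second_type)
```

Rebinding a name to a different type is legal, but it tends to make code harder to follow, so it is usually avoided in practice.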
Naming Conventions
- Variable names must start with a letter or underscore, never a digit
- Names can only contain alphanumeric characters and underscores
- Reserved keywords cannot be used as variable names
- Names should be meaningful and describe the data they hold
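The first three rules can be checked programmatically. The sketch below uses the standard library's `str.isidentifier()` and `keyword.iskeyword()`. The candidate names and the `is_valid_name` helper are hypothetical, chosen only to exercise each rule:

```python
import keyword

# Hypothetical candidate names chosen to exercise each rule
candidates = ["user_name", "_cache", "2nd_place", "class", "total-price"]

def is_valid_name(name):
    """Return True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

results = {name: is_valid_name(name) for name in candidates}
for name, ok in results.items():
    print(f"{name!r}: {'valid' if ok else 'invalid'}")
```

Whether a name is meaningful (the fourth rule) is a judgment call that no such check can make.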
Comments
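Python supports single-line comments using the # symbol. It has no dedicated multi-line comment syntax, but a triple-quoted string that is not assigned to anything is evaluated and then discarded, so it is commonly used as a multi-line comment.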
# This is a single-line comment
'''
This is a multi-line comment
spanning multiple lines
'''
"""
Another way to write
multi-line comments
"""
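Strictly speaking, the triple-quoted blocks above are string literals rather than comments. One place where the difference shows is a docstring: a string placed as the first statement of a function is stored on the function, whereas a # comment is dropped entirely. A small sketch, using a hypothetical `greet` function:

```python
def greet(name):
    """Return a greeting for the given name."""
    # A real comment: the interpreter discards this line
    return f"Hello, {name}"

# The docstring is kept and can be inspected at runtime
print(greet.__doc__)
print(greet("Ada"))
```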
Data Types
Numeric Types
Integers
Integers are whole numbers without decimal points. They support standard arithmetic operations.
user_id = 1001
phone_number = int("13800138000") # int() converts a numeric string to an integer
# Arithmetic operations
num_a = 15
num_b = 4
print(num_a + num_b) # Addition: 19
print(num_a - num_b) # Subtraction: 11
print(num_a * num_b) # Multiplication: 60
print(num_a / num_b) # Division: 3.75
print(num_a % num_b) # Modulus: 3
print(num_a // num_b) # Floor division: 3
print(num_a ** num_b) # Exponentiation: 50625
Floats
Floating-point numbers represent decimal values.
monthly_salary = 15000.50
height_meters = float(180) # Converts to 180.0
# Mathematical operations
import math
result = math.sqrt(16) # 4.0
log_value = math.log(10) # Natural logarithm
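Floats are stored as binary approximations, so some decimal values cannot be represented exactly. The sketch below shows a common consequence and one way to compare floats safely with `math.isclose` (the variable names are illustrative):

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because 0.1 and 0.2 have no exact binary form
exact_match = (total == 0.3)
print(exact_match)

# math.isclose compares within a small relative tolerance instead
close_match = math.isclose(total, 0.3)
print(close_match)

# Rounding is useful for display, not for exact arithmetic
print(round(total, 2))
```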
String Type
Strings are sequences of characters enclosed in single, double, or triple quotes.
first_name = 'John'
last_name = "Doe"
address = '''
123 Main Street,
New York, NY 10001
'''
# String methods
text = "Hello World"
print(text.startswith("Hello")) # True
print(text.endswith("World")) # True
# Indexing and slicing
message = "Python Programming"
print(message[0]) # P (first character)
print(message[-1]) # g (last character)
print(message[0:6]) # Python
print(message[::2]) # Pto rgamn (step of 2)
String Operations: join() and split()
# Join - combines list elements into a string
char_list = ['P', 'y', 't', 'h', 'o', 'n']
joined_string = '-'.join(char_list)
print(joined_string) # P-y-t-h-o-n
# Split - breaks string into list
sentence = "apple,banana,cherry"
fruits = sentence.split(',')
print(fruits) # ['apple', 'banana', 'cherry']
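Real input often carries stray whitespace around the separators. As a sketch (the sample strings are made up for illustration), `split()` can be combined with `strip()` to clean each piece, and `join()` reverses the operation:

```python
raw_line = "  apple, banana ,cherry  "

# Split on commas, then strip the whitespace around each piece
fruits = [part.strip() for part in raw_line.split(",")]
print(fruits)

# join() is the inverse of split() for the same separator
rejoined = ",".join(fruits)
print(rejoined)

# With no argument, split() breaks on any run of whitespace
words = "Hello   World".split()
print(words)
```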
List Type
Lists are ordered, mutable collections that can hold multiple data types.
# Creating lists
hobbies = ['reading', 'gaming', 'coding']
empty_list = []
converted_list = list('abc') # ['a', 'b', 'c']
# Accessing elements
print(hobbies[0]) # reading
print(hobbies[-1]) # coding
print(hobbies[1:3]) # ['gaming', 'coding']
# Modifying lists
hobbies.append('music')
hobbies.remove('gaming')
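To make the effect of these modifications visible, the sketch below repeats them and prints the list afterwards. It also guards a `remove()` call with a membership test, since `remove()` raises `ValueError` when the item is absent (the extra items 'chess' and 'hiking' are illustrative):

```python
hobbies = ['reading', 'gaming', 'coding']
hobbies.append('music')      # add to the end
hobbies.remove('gaming')     # remove the first matching item
after_changes = list(hobbies)
print(after_changes)

# Check membership first to avoid a ValueError on a missing item
if 'chess' in hobbies:
    hobbies.remove('chess')

hobbies.insert(0, 'hiking')  # insert at a specific position
last_hobby = hobbies.pop()   # remove and return the last item
print(hobbies, last_hobby)
```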
Dictionary Type
Dictionaries store data in key-value pairs, providing fast lookups by key.
# Creating a dictionary
person_info = {
    'name': 'Alice',
    'age': 28,
    'city': 'Beijing',
    'skills': ['Python', 'Java', 'SQL']
}
# Accessing values by key
print(person_info['name']) # Alice
print(person_info.get('age')) # 28
# Modifying values
person_info['age'] = 29
person_info['email'] = 'alice@example.com' # Add new key
# Deleting a key-value pair
del person_info['city']
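Indexing with a missing key raises `KeyError`, while `get()` returns `None` or a supplied default. The sketch below, using a trimmed-down copy of the dictionary and a hypothetical 'phone' key, shows the safer access patterns along with iteration over key-value pairs:

```python
person_info = {'name': 'Alice', 'age': 28, 'city': 'Beijing'}

# get() with a default avoids KeyError for a missing key
phone = person_info.get('phone', 'unknown')
print(phone)

# Membership tests check keys, not values
has_city = 'city' in person_info

# pop() removes a key and returns its value; the default avoids KeyError
removed_city = person_info.pop('city', None)
print(has_city, removed_city)

# items() yields (key, value) pairs in insertion order
for key, value in person_info.items():
    print(f"{key}: {value}")
```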
Configuring Pip Mirror Sources
When installing Python packages, the default index (PyPI) can be slow from some regions, including mainland China. Pointing pip at a mirror hosted closer to you usually improves download speeds.
Common Chinese mirror sources include:
- Tsinghua: https://pypi.tuna.tsinghua.edu.cn/simple
- Aliyun: https://mirrors.aliyun.com/pypi/simple
- Douban: https://pypi.douban.com/simple
To permanently configure a mirror, modify the pip configuration file or use the command:
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
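A mirror can also be used for a single installation without changing the global configuration, through pip's `--index-url` option. The sketch below only builds and prints such a command. The `build_install_command` helper is hypothetical, and the Tsinghua URL is simply the first mirror listed above:

```python
import shlex
import sys

# Assumption: the Tsinghua mirror from the list above; any other mirror works the same way
MIRROR_URL = "https://pypi.tuna.tsinghua.edu.cn/simple"

def build_install_command(package, index_url=MIRROR_URL):
    """Return the argument list for a one-off pip install from a mirror."""
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, package]

command = build_install_command("jieba")

# Print the command (minus the interpreter path) for inspection
print("python " + shlex.join(command[1:]))
```

Passing `command` to `subprocess.run(command, check=True)` would actually perform the installation. Running pip through `sys.executable -m pip` ensures the package lands in the interpreter that is currently in use.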
Jieba Library for Chinese Word Segmentation
Jieba is a powerful Chinese text segmentation library. Install it using pip:
pip install jieba
Segmentation Methods
import jieba
text = "Artificial intelligence is changing the world"
# Accurate mode (default) - returns list
seg_result = jieba.lcut(text)
print(seg_result)
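# jieba targets Chinese text. For an English sample like this one, the spaces
# are expected to come back as separate tokens:
# ['Artificial', ' ', 'intelligence', ' ', 'is', ' ', 'changing', ' ', 'the', ' ', 'world']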
# Full mode - scans all possible words
full_seg = jieba.lcut(text, cut_all=True)
print(full_seg)
# Search engine mode - better for search indexing
search_seg = jieba.lcut_for_search(text)
print(search_seg)
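# Adding custom words
# (add_word is intended for Chinese terms such as "人工智能"; an entry that contains
# a space, like the one below, is unlikely to be kept as a single token)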
jieba.add_word("Artificial intelligence")
custom_seg = jieba.lcut("Artificial intelligence is developing rapidly")
print(custom_seg)
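When jieba is applied to space-delimited text such as the English sample above, the returned list typically includes the whitespace itself as tokens. A minimal sketch of filtering those out follows. The token list is written by hand as an assumption about what `jieba.lcut` returns for that sample, so the snippet runs without jieba installed:

```python
# Assumed output of jieba.lcut for the English sample text (not produced by running jieba here)
tokens = ['Artificial', ' ', 'intelligence', ' ', 'is', ' ',
          'changing', ' ', 'the', ' ', 'world']

# Keep only tokens that contain something other than whitespace
words = [token for token in tokens if token.strip()]
print(words)
```

The same filter is harmless for Chinese text, where the segmented list normally contains no whitespace tokens in the first place.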
Generating Word Clouds with wordcloud
The wordcloud library creates visual representations of text data. Install required packages:
pip install wordcloud
pip install imageio
pip install pillow
Basic Word Cloud Generation
import jieba
import wordcloud
# Sample text
content = "Python is a powerful programming language used for web development, data analysis, machine learning, and automation. Python has a simple syntax and is easy to learn."
# Segment the text with jieba (this step matters for Chinese content; English text is already space-delimited)
words = jieba.lcut(content)
text_processed = ' '.join(words)
# Create word cloud
wc = wordcloud.WordCloud(
    width=800,
    height=400,
    background_color='white'
)
wc.generate(text_processed)
wc.to_file('wordcloud_output.png')
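A word cloud draws frequent words larger, so it can help to inspect the word counts before rendering. The sketch below counts words with the standard library only. The stopword set is a small hypothetical list, and the tokenization is a simple whitespace split rather than the processing wordcloud performs internally:

```python
from collections import Counter

content = ("Python is a powerful programming language used for web development, "
           "data analysis, machine learning, and automation. "
           "Python has a simple syntax and is easy to learn.")

# Hypothetical stopword list; real projects usually use a longer one
STOPWORDS = {"is", "a", "for", "and", "has", "to", "used"}

# Lowercase each word and strip surrounding punctuation
tokens = [word.strip(".,").lower() for word in content.split()]
frequencies = Counter(t for t in tokens if t and t not in STOPWORDS)

top_word, top_count = frequencies.most_common(1)[0]
print(top_word, top_count)
```

A dictionary of counts like `frequencies` can also be passed to `WordCloud.generate_from_frequencies` in place of raw text, which gives direct control over which words appear.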
Customized Word Cloud with Mask
Using a mask allows you to shape the word cloud into custom forms.
import jieba
import wordcloud
from imageio import imread
# Load mask image (pure white areas are masked out; words fill the non-white shape)
mask_image = imread('star_shape.png')
# Text content
sample_text = "Technology innovation drives progress and creates opportunities for development and growth"
# Add custom vocabulary
jieba.add_word("Technology innovation")
# Process text
word_list = jieba.lcut(sample_text)
processed_text = ' '.join(word_list)
# Configure word cloud with mask
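cloud = wordcloud.WordCloud(
    font_path='C:/Windows/Fonts/simhei.ttf', # Chinese font support (Windows path; adjust for your system)
    mask=mask_image,
    background_color='white',
    width=1000, # ignored when a mask is set; the mask's shape is used instead
    height=800, # ignored when a mask is set
    max_words=100
)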
cloud.generate(processed_text)
cloud.to_file('custom_shaped_wordcloud.png')
Key parameters for WordCloud customization:
- font_path: Path to font file for Chinese character support
- mask: Image array defining the shape
- background_color: Background color (default: black)
- max_words: Maximum number of words to display
- width, height: Output image dimensions
- colormap: Color scheme for words