Scraping NetEase Cloud Music Hot Comments to Generate Word Clouds

Data Collection

Building a word cloud requires raw data first. For NetEase Cloud Music, this involves several steps:

Packet analysis to locate the API endpoint
Handling encrypted request parameters
Extracting hot comment content

Packet Analysis

In Chrome DevTools, open the Network tab and load a song page to find the comment API endpoint. The request is a POST whose form body carries two encrypted parameters.

Handling Encrypted Parameters

The NetEase Cloud Music API requires two encrypted fields: params and encSecKey. These values can be extracted from browser requests and reused across different song IDs. For deep technical details on the encryption mechanism, refer to the NetEase Cloud Music API analysis projects available on GitHub.
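Because the captured fields are reused, only the song ID changes from one request to the next. A minimal sketch of that idea, assuming the endpoint pattern used in this article's script; the helper name `build_comment_request` and the placeholder strings are hypothetical, and the real values have to be copied from a captured browser request.

```python
# Hypothetical placeholders: paste the values captured in DevTools here.
CAPTURED_PARAMS = "<params copied from the browser request>"
CAPTURED_ENC_SEC_KEY = "<encSecKey copied from the browser request>"


def build_comment_request(song_id):
    """Return (url, form_data) for one song, reusing the captured fields."""
    url = (
        "http://music.163.com/weapi/v1/resource/comments/"
        f"R_SO_4_{song_id}?csrf_token=test"
    )
    form_data = {
        "params": CAPTURED_PARAMS,
        "encSecKey": CAPTURED_ENC_SEC_KEY,
    }
    return url, form_data


# Two different songs share the same form body; only the URL differs.
url_a, data_a = build_comment_request(439915614)
url_b, data_b = build_comment_request(12345)  # arbitrary ID for illustration
print(url_a)
print(data_a == data_b)
```

Nothing is sent over the network here; the helper only assembles the pieces that a POST call would need.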

Extracting Hot Comments

Once the endpoint is identified, the response returns JSON data containing comment objects. Parse the JSON and extract the content field from each hot comment.

import requests


def fetch_hot_comments(song_id):
    """Return the text of each hot comment for the given song ID."""
    api_url = f'http://music.163.com/weapi/v1/resource/comments/R_SO_4_{song_id}?csrf_token=test'

    # Encrypted form fields copied from a browser request
    post_data = {
        'params': '4hmFbT9ZucQPTM8ly/UA60NYH1tpyzhHOx04qzjEh3hU1597xh7pBOjRILfbjNZHqzzGby5ExblBpOdDLJxOAk4hBVy5/XNwobA+JTFPiumSmVYBRFpizkWHgCGO+OWiuaNPVlmr9m8UI7tJv0+NJoLUy0D6jd+DnIgcVJlIQDmkvfHbQr/i9Sy+SNSt6Ltq',
        'encSecKey': 'a2c2e57baee7ca16598c9d027494f40fbd228f0288d48b304feec0c52497511e191f42dfc3e9040b9bb40a9857fa3f963c6a410b8a2a24eea02e66f3133fcb8dbfcb1d9a5d7ff1680c310a32f05db83ec920e64692a7803b2b5d7f99b14abf33cfa7edc3e57b1379648d25b3e4a9cab62c1b3a68a4d015abedcd1bb7e868b676'
    }

    # Headers that mimic a browser visiting the song page
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Referer': f'http://music.163.com/song?id={song_id}',
        'Host': 'music.163.com',
        'Origin': 'http://music.163.com'
    }

    # A timeout keeps the script from hanging on a stalled connection
    response = requests.post(api_url, headers=headers, data=post_data, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    result = response.json()

    # 'hotComments' may be missing, so fall back to an empty list
    return [entry['content'] for entry in result.get('hotComments', [])]


if __name__ == '__main__':
    song_id = 439915614
    hot_comments = fetch_hot_comments(song_id)
    for comment in hot_comments:
        print(comment)

Running this script outputs the hot comments for the specified song.
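The extraction step can also be tried without any network access. The sketch below runs the same parsing logic against a hand-written payload; the sample JSON is an assumption that contains only the `hotComments` and `content` fields the script relies on (a real response carries many more), and `extract_hot_comments` is a hypothetical helper name.

```python
import json

# Hand-written sample for illustration, not a captured response.
sample_response = """
{
  "hotComments": [
    {"content": "first comment"},
    {"content": "second comment"}
  ]
}
"""


def extract_hot_comments(raw_json):
    """Return the content of each hot comment, skipping entries without one."""
    payload = json.loads(raw_json)
    return [
        entry["content"]
        for entry in payload.get("hotComments", [])
        if "content" in entry
    ]


print(extract_hot_comments(sample_response))
print(extract_hot_comments("{}"))  # no hotComments key yields an empty list
```

Separating parsing from the HTTP call this way makes it easy to check the extraction logic before spending requests on the live endpoint.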

Word Cloud Generation

The wordcloud library provides straightforward word cloud generation capabilities. Install it via pip and consult the official documentation for basic usage patterns.

Chinese text rendering requires specifying a font file that supports Chinese characters. The font_path parameter in the WordCloud constructor handles this:

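from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Assumes the scraping script above is saved as scraper.py
from scraper import fetch_hot_comments

song_id = 439915614
# Join all comments into one string for the generator
text_content = " ".join(fetch_hot_comments(song_id))

cloud = WordCloud(
    random_state=1,  # fixed seed so the layout is reproducible
    font_path=r'C:/Users/Windows/fonts/simkai.ttf'  # font with Chinese glyphs
).generate(text_content)

# Display the rendered image without axes
plt.figure()
plt.imshow(cloud, interpolation='bilinear')
plt.axis('off')
plt.show()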
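A hard-coded font path only works on the machine it was written for. One way to make the script more portable is to probe a few likely locations and use the first file that exists. This is a sketch under assumptions: the candidate paths below are guesses about common install locations and `find_font` is a hypothetical helper, so adjust the list to the fonts actually present on your system.

```python
import os

# Assumed candidate locations; edit to match the fonts installed locally.
CANDIDATE_FONTS = [
    r"C:/Windows/Fonts/simkai.ttf",
    "/System/Library/Fonts/PingFang.ttc",
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
]


def find_font(candidates=CANDIDATE_FONTS):
    """Return the first candidate path that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


font = find_font()
if font is None:
    print("No candidate font found; set font_path manually.")
else:
    print(f"Using font: {font}")
```

The returned path can then be passed as `font_path` to the `WordCloud` constructor in place of the literal string.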

Visual Results

The generated word cloud displays frequently occurring terms from the hot comments, providing an intuitive visualization of what resonates most with listeners.
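To see the numbers behind the picture, term frequencies can be counted directly. The sketch below uses only the standard library and splits on whitespace, which is an assumption that suits space-delimited text; Chinese comments would likely need a word segmenter first, since whitespace splitting leaves whole phrases intact. `term_frequencies` is a hypothetical helper name and the sample comments are made up.

```python
from collections import Counter


def term_frequencies(comments):
    """Count whitespace-separated terms across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(comment.split())
    return counts


# Made-up sample comments for illustration.
sample = ["great song great memories", "great voice"]
freqs = term_frequencies(sample)
print(freqs.most_common(3))
```

A mapping like this can also be handed to `WordCloud.generate_from_frequencies` instead of passing raw text to `generate`, which gives direct control over how terms are counted.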

Potential Enhancements

Custom masks: Generate word clouds shaped like specific images or patterns
Batch processing: Scrape comments from multiple songs by iterating through different song IDs
Service extraction: Wrap the functionality into a REST API for serving word clouds on demand
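The batch-processing idea could look roughly like the following. This is a sketch, not a tested crawler: `collect_comments` is a hypothetical helper that takes the fetch function as an argument, so it can be exercised here with a stand-in (`fake_fetch`) rather than the live endpoint.

```python
def collect_comments(song_ids, fetch):
    """Call fetch(song_id) for each ID and map each ID to its comments."""
    results = {}
    for song_id in song_ids:
        try:
            results[song_id] = fetch(song_id)
        except Exception as exc:
            # One failing song should not abort the whole batch.
            print(f"skipping {song_id}: {exc}")
            results[song_id] = []
    return results


def fake_fetch(song_id):
    """Stand-in for the real scraper; fails for ID 2 to show error handling."""
    if song_id == 2:
        raise ValueError("simulated failure")
    return [f"comment for {song_id}"]


batch = collect_comments([1, 2, 3], fake_fetch)
print(batch)
```

In real use, `fetch_hot_comments` from the scraping script would be passed in place of `fake_fetch`, ideally with a pause between requests to avoid hammering the server.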

Tags: Python Web Scraping WordCloud
