Hadoop Framework Installation and Application Experiments
Experiment Requirements
Applicable Majors: Computer Science and Technology, Software Engineering, Internet of Things Engineering
Learning Objectives: Understand distributed architecture and Linux commands, achieve proficiency in Hadoop installation, HDFS programming, and MapReduce development.
Experiment Project List:
| No. | Project | Hours | Type | Description |
|---|---|---|---|---|
| 1 | Hadoop Installation and Use | 4 | Validation | Install Linux on a virtual machine, set up Hadoop, and learn basic Linux and Hadoop commands. |
| 2 | HDFS Application | 4 | Design | Master Hadoop file commands, compile and run Java programs on Linux, perform file read/write operations, and sort data. |
| 3 | MapReduce Application I—Top 10 Word Count | 4 | Design | Develop, compile, and execute MapReduce programs in Linux using Java. Create a word count program to identify the top 10 most frequent words. |
| 4 | MapReduce Application II—Movie Recommendation | 4 | Design | Implement a movie recommendation system using a three-phase MapReduce solution: count raters per movie, identify users who rated both movies A and B, and compute correlations between movie pairs. |
Experiment Report Guidelines: Reports must include experiment title, objectives, completion date, core design concepts and algorithms, results, and summary. Identical reports are prohibited.
Grading Criteria: Grades consist of attendance (10%), experiment completion (80%), and report quality (10%). Completion is assessed on the four experiment outcomes; reports are evaluated on content, formatting, accuracy, and originality.
Recommended Textbook: Principles and Applications of Big Data Technology (2nd Edition, Lin Ziyu, People's Posts and Telecommunications Press, 2017)
Reference Books:
- Data Algorithms: Hadoop/Spark Big Data Processing Techniques (Mahmoud Parsian, translated by Su Jinguo et al., China Electric Power Press, 2016)
Experiment 1: Hadoop Setup and Configuration
Content
Install Linux on a virtual machine, deploy Hadoop, and explore basic Linux and Hadoop commands.
Objectives
- Comprehend the three operational modes of Hadoop (standalone, pseudo-distributed, and fully distributed).
- Master the installation process for Hadoop pseudo-distributed mode.
- Develop independence in configuring Hadoop pseudo-distributed installations.
- Gain familiarity with the Eclipse development environment.
- Acquire proficiency in installing Hadoop development plugins.
Environment
- Linux Ubuntu 16.04
- JDK 8u162 Linux x64
- Hadoop 3.1.3
- Eclipse 4.7.0 Linux GTK x86_64
Procedure
Refer to detailed tutorials such as those by Professor Lin Ziyu from Xiamen University.
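As a minimal sketch, a pseudo-distributed setup edits two files under /usr/local/hadoop/etc/hadoop; the values below are the common single-node settings, and paths should be adapted to the local install.
core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
After editing, format the NameNode once before the first start:
./bin/hdfs namenode -format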
Results
Navigate to the Hadoop directory and start the distributed file system:
cd /usr/local/hadoop
./sbin/start-dfs.sh
Verify the running processes with jps and access the NameNode web interface at http://localhost:9870.
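If the daemons started correctly, the jps output should resemble the following (process IDs are illustrative):
3600 NameNode
3725 DataNode
3912 SecondaryNameNode
4021 Jps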
Experiment 2: HDFS File Operations and Sorting
Content
Use Hadoop file commands; write, compile, and run Java programs on Linux to perform HDFS file read/write operations and sort data.
Objectives
- Become adept with Hadoop file commands.
- Gain experience in writing, compiling, and running Java applications on Linux.
- Understand the usage of the Eclipse development environment.
- Skillfully install Hadoop development plugins.
Environment
Same as Experiment 1.
Procedure
- Start Hadoop services.
- Upload text files to HDFS:
./bin/hdfs dfs -put /home/user/data/file1.txt /user/hadoop
- Verify the upload by listing the directory contents.
- Launch Eclipse and configure the necessary JAR files from Hadoop's common, common/lib, hdfs, and hdfs/lib directories.
- Develop a Java application, package it into a JAR, and execute it (a command-line sketch follows this list).
- Retrieve and display the output from HDFS.
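As a sketch, the compile-package-run cycle might look like the following, assuming HDFSSort.java (the program listed under Source Code below) sits in /usr/local/hadoop; the paths and file names are illustrative:
cd /usr/local/hadoop
# compile against the Hadoop client libraries
javac -classpath "$(./bin/hadoop classpath)" HDFSSort.java
# package the compiled classes into a JAR
jar cf HDFSSort.jar HDFSSort*.class
# run the program; it reads input1.txt and input2.txt from the HDFS home directory
./bin/hadoop jar HDFSSort.jar HDFSSort
# display the sorted result written back to HDFS
./bin/hdfs dfs -cat sorted_output.txt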
Results
Confirm successful startup, file upload, and execution of the Java application with sorted output.
Source Code
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;
import java.util.*;

// Reads two HDFS text files of integers (one per line), merges and sorts
// them in memory, prints the sorted values, and writes them back to HDFS.
// Note: an in-memory sort is only appropriate for small inputs.
public class HDFSSort {
    public static void main(String[] args) {
        try {
            List<Integer> values = new ArrayList<>();
            // Point the FileSystem client at the local pseudo-distributed NameNode.
            Configuration config = new Configuration();
            config.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem hdfs = FileSystem.get(config);
            // Relative paths resolve to the user's HDFS home directory.
            Path sourceFile1 = new Path("input1.txt");
            Path sourceFile2 = new Path("input2.txt");
            BufferedReader reader1 = new BufferedReader(new InputStreamReader(hdfs.open(sourceFile1)));
            BufferedReader reader2 = new BufferedReader(new InputStreamReader(hdfs.open(sourceFile2)));
            // Collect one integer per line from both input files.
            String line;
            while ((line = reader1.readLine()) != null) {
                values.add(Integer.parseInt(line.trim()));
            }
            while ((line = reader2.readLine()) != null) {
                values.add(Integer.parseInt(line.trim()));
            }
            reader1.close();
            reader2.close();
            Collections.sort(values);
            for (int num : values) {
                System.out.println(num);
            }
            // Write the sorted sequence back to HDFS, one value per line.
            Path resultFile = new Path("sorted_output.txt");
            FSDataOutputStream writer = hdfs.create(resultFile);
            for (int num : values) {
                writer.writeBytes(num + "\n");
            }
            writer.close();
            hdfs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Experiment 3: MapReduce Word Frequency Analysis
Content
Write, compile, and run a MapReduce program in Linux using Java to perform word count and identify the top 10 most frequent words.
Objectives
- Gain proficiency in developing, compiling, and executing MapReduce applications on Linux.
- Familiarize with the Eclipse development environment.
- Successfully install Hadoop development plugins.
Environment
Same as previous experiments.
Procedure
- Start the Hadoop cluster.
- Inspect HDFS directories and remove any residual folders.
- Create input directories and upload a text file for processing.
- In Eclipse, add required JARs from Hadoop's common, common/lib, and mapreduce directories.
- Develop the MapReduce program, package it, and run it with the specified input and output paths (see the sketch after this list).
- Examine the results stored in the output directory.
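A sketch of the HDFS preparation and job submission, assuming the program below is packaged as TopWordCount.jar and a sample text file sits at /home/user/data/sample.txt (both names are illustrative):
cd /usr/local/hadoop
# create the input directory in the HDFS home directory and upload the text
./bin/hdfs dfs -mkdir -p input
./bin/hdfs dfs -put /home/user/data/sample.txt input
# remove any leftover output directory; the job fails if it already exists
./bin/hdfs dfs -rm -r -f output
# submit the job with input and output paths as arguments
./bin/hadoop jar TopWordCount.jar TopWordCount input output
# display the top 10 word-frequency pairs
./bin/hdfs dfs -cat output/part-r-00000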
Results
Verify the execution of the MapReduce job and the generation of the top 10 word frequency list.
Source Code
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
import java.util.*;

public class TopWordCount {

    // Emits (word, 1) for every whitespace-delimited token in the input.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Accumulates per-word totals in memory, then emits only the 10 most
    // frequent words from cleanup() once every key has been reduced.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private Map<String, Integer> frequencyMap = new HashMap<>();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) {
            int total = 0;
            for (IntWritable val : values) {
                total += val.get();
            }
            frequencyMap.put(key.toString(), total);
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Sort words by descending frequency and write the top 10.
            List<Map.Entry<String, Integer>> entries = new ArrayList<>(frequencyMap.entrySet());
            entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
            int limit = Math.min(10, entries.size());
            for (int i = 0; i < limit; i++) {
                Map.Entry<String, Integer> entry = entries.get(i);
                context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "top word count");
        job.setJarByClass(TopWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(SumReducer.class);
        // A single reducer is required so that cleanup() sees the global
        // counts; with multiple reducers each would emit its own top 10.
        job.setNumReduceTasks(1);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Experiment 4: Movie Recommendation with MapReduce
Content
Implement a three-stage MapReduce solution for movie recommendations: count raters per movie, identify common raters for movie pairs, and compute inter-movie correlations.
Objectives
Develop a multi-phase MapReduce pipeline to analyze user ratings and derive movie associations.
Environment
Same as previous experiments.
Procedure
Design and implement a MapReduce job for each stage, chaining them so that the output directory of one stage serves as the input of the next. Test with sample rating datasets. A sketch of the first stage appears after the Results section below.
Results
Produce a set of movie correlations based on shared user ratings.
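The handout provides no reference implementation for this experiment. As a minimal sketch, the first stage (counting raters per movie) might be structured as follows, assuming rating records of the form userID,movieID,rating; the class name MovieRaterCount, the record format, and the paths are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;

// Stage 1: count how many users rated each movie.
public class MovieRaterCount {

    public static class RatingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text movieId = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Assumed record format: userID,movieID,rating
            String[] fields = value.toString().split(",");
            if (fields.length >= 3) {
                movieId.set(fields[1]);
                context.write(movieId, one); // one rater for this movie
            }
        }
    }

    public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int raters = 0;
            for (IntWritable val : values) {
                raters += val.get();
            }
            context.write(key, new IntWritable(raters)); // (movieID, raterCount)
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "movie rater count");
        job.setJarByClass(MovieRaterCount.class);
        job.setMapperClass(RatingMapper.class);
        // The reducer also works as a combiner, since counting is associative.
        job.setCombinerClass(CountReducer.class);
        job.setReducerClass(CountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // ratings input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // Stage 1 output, Stage 2 input
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Under the same assumptions, Stage 2 would group ratings by userID and emit every pair of movies a user rated, and Stage 3 would reduce over those movie pairs to compute a correlation measure from the shared ratings, with each stage reading the previous stage's HDFS output directory.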