
Big Data Tools & Techniques For MSc Case Study Analysis

Table Creation from Data Folder to MySQL

We create the tables from the previously extracted data and load it into MySQL, connecting as root with the password cloudera (supplied through the -pcloudera flag):

mysql -u root -pcloudera < db_setup.sql

mysql -u root -pcloudera < diagnoses.sql

mysql -u root -pcloudera < imaging.sql

mysql -u root -pcloudera < hearing_evaluation.sql
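
For context, a minimal sketch of what db_setup.sql might contain; the actual schema is not reproduced in this excerpt, so the column definitions below are hypothetical placeholders:

-- Hypothetical sketch of db_setup.sql; the real column names and types may differ.
CREATE DATABASE IF NOT EXISTS assignment;
USE assignment;
CREATE TABLE IF NOT EXISTS diagnoses (
    patient_id VARCHAR(20),   -- hypothetical column
    diagnosis  VARCHAR(255),  -- hypothetical column
    diag_date  DATE           -- hypothetical column
);
-- diagnoses.sql, imaging.sql and hearing_evaluation.sql would then populate their respective tables.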

First, we create tab-separated files using the commands below; the -B (batch) flag makes the mysql client emit tab-separated output:

mysql -u root -pcloudera assignment -e "select * from imaging" -B > imaging.tsv

mysql -u root -pcloudera assignment -e "select * from diagnoses" -B > diagnoses.tsv

mysql -u root -pcloudera assignment -e "select * from hearing_evaluation" -B > hearing_evaluation.tsv
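
A quick sanity check confirms the tab-separated layout (head is used here only for inspection):

head -n 3 diagnoses.tsv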

To create the actual CSV files for all three tables (i.e. diagnoses, hearing_evaluation and imaging), we run the following command for each table: it dumps the table and pipes the tab-separated output through sed, which quotes each field and joins the fields with commas.

mysql -u root -pcloudera assignment -e "select * from diagnoses" -B | sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > diagnoses.csv

mysql -u root -pcloudera assignment -e "select * from hearing_evaluation" -B | sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > hearing_evaluation.csv

mysql -u root -pcloudera assignment -e "select * from imaging" -B | sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > imaging.csv

The command above reformats the data as required: the sed script replaces every tab with "," and wraps each line in double quotes, so every row becomes a quoted, comma-separated record. The same filtering is applied to each of the three tables before the CSV files are used.
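
As a quick illustration, here is what the quoting part of the sed script does to a single tab-separated row (the sample values are made up):

printf 'P001\tMRI\t2014-06-01\n' | sed "s/\t/\",\"/g;s/^/\"/;s/$/\"/"
# prints: "P001","MRI","2014-06-01"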

Importing Data to Hadoop

After creating and populating the tables in MySQL, we import the data into Hadoop for analysis. We use the following Sqoop commands to import each table from MySQL into HDFS. Each table is written to its own target directory, because a Sqoop import fails if the target directory already exists:

  • sqoop import --connect jdbc:mysql://localhost:3306/assignment --table diagnoses --username root --password cloudera --target-dir /sqoop_import/diagnoses -m 1
  • sqoop import --connect jdbc:mysql://localhost:3306/assignment --table hearing_evaluation --username root --password cloudera --target-dir /sqoop_import/hearing_evaluation -m 1
  • sqoop import --connect jdbc:mysql://localhost:3306/assignment --table imaging --username root --password cloudera --target-dir /sqoop_import/imaging -m 1
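
Once the imports complete, the results can be verified in HDFS (the paths follow the per-table layout used above; with -m 1, Sqoop writes a single part-m-00000 file per table):

hadoop fs -ls /sqoop_import/diagnoses
hadoop fs -cat /sqoop_import/diagnoses/part-m-00000 | head -n 5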

Analyzing the Data

This step derives the outcomes from the data by analyzing it with Hive. Hive translates queries into MapReduce jobs that run on the Hadoop cluster. Hive is a mechanism for querying structured data in the Hadoop data warehouse: it sits on top of Hadoop to summarize large datasets and makes them simple to search and evaluate. Hive was originally developed by Facebook and is now maintained by the Apache Software Foundation as the open-source project Apache Hive. It is used by many companies; Amazon, for example, offers it in Amazon Elastic MapReduce…
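
As a sketch of this step, an external Hive table can be declared over the Sqoop output and queried; the column names below are hypothetical placeholders matching the sketch schema given earlier, and Sqoop's default comma delimiter is assumed:

CREATE EXTERNAL TABLE diagnoses (
    patient_id STRING,   -- hypothetical column
    diagnosis  STRING,   -- hypothetical column
    diag_date  STRING    -- hypothetical column
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/sqoop_import/diagnoses';

-- Example analysis: count records per diagnosis; Hive runs this as a MapReduce job.
SELECT diagnosis, COUNT(*) AS cnt
FROM diagnoses
GROUP BY diagnosis
ORDER BY cnt DESC;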

 
