Cluster analysis/factor analysis Harvard Case Solution & Analysis


Cluster analysis

Cluster analysis is a group of techniques used to group similar objects. Groups are formed on the basis of the objects' shared characteristics: homogeneous objects are collected into a single group, known as a “cluster”. The objects may be respondents, groups or other entities. When the objects are plotted geometrically, those in the same cluster lie close together, whereas objects in different clusters lie far apart.

Cluster analysis covers both the techniques used to form the groups and the analysis of the resulting groups. Three kinds of methods are used to form clusters: hierarchical clustering, k-means clustering and two-step clustering. Each method suits different purposes and different types of data. In every case, the analysis begins by establishing the number of cases and the variables on which the clusters will be based. Next, the variables are standardized so that each contributes equally to the distance between cases. Lastly, the appropriate method is selected, based on the number of clusters, the size of the clusters and the types of variables used.
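The standardization step can be illustrated with a minimal sketch. It assumes z-score standardization (subtract the mean, divide by the standard deviation), which is one common choice and not necessarily the one used in the case. The function name and the income and age values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation (an assumption)
    if s == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(x - m) / s for x in values]

# Hypothetical variables measured on very different scales.
income = [20000, 35000, 50000, 65000, 80000]
age = [25, 30, 35, 40, 45]

z_income = standardize(income)
z_age = standardize(age)
```

After standardization both variables have mean 0 and standard deviation 1, so neither dominates a distance calculation merely because of its units.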

Hierarchical clustering is known as a straightforward method of forming clusters. It can be either agglomerative or divisive. In agglomerative clustering, each case starts as a cluster by itself, and similar clusters are then successively merged. Once a cluster is formed, it cannot be split; it can only be combined with other similar clusters. In divisive clustering, all of the objects start in one cluster, which is then successively split into different clusters. When using hierarchical clustering, a measure of similarity or dissimilarity between objects must be selected. In addition, the minimum and maximum numbers of clusters and the criterion for merging clusters at each successive step must be determined.
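The agglomerative procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used in the case: it assumes Euclidean distance as the dissimilarity measure, single linkage (closest pair of members) as the merging criterion, and a fixed target number of clusters as the stopping rule. The function names and the points are hypothetical.

```python
from itertools import combinations
from math import dist

def single_linkage(a, b):
    """Distance between two clusters: the closest pair of members."""
    return min(dist(p, q) for p in a for q in b)

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    # Every case starts as a cluster by itself.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge cluster j into cluster i; merged clusters are never split.
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe because i < j
    return clusters

# Hypothetical two-dimensional cases.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8), (9.0, 1.0)]
result = agglomerative(points, 3)
```

Other linkage rules (complete, average, Ward) would change only the `single_linkage` function; the merge loop stays the same.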

K-means clustering does not require all pairwise distances to be computed. It is used when the categories in the data are not defined in advance. The method forms groups on the basis of k, which represents the number of groups to be formed. Each observation belongs to the cluster with the nearest mean. The method starts from an initial set of means, and the cases are classified on the basis of their distances from those centers. Next, each cluster mean is recalculated from the cases assigned to it. The cases are then reclassified on the basis of the recalculated means. This step is repeated until the means change very little between successive iterations. Finally, the means of the clusters are calculated once more and the cases are assigned to their permanent clusters.
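The iteration described above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance, caller-supplied starting means, and a small tolerance `tol` as the convergence test; the function name, parameters and data are hypothetical rather than taken from the case.

```python
from math import dist

def k_means(points, centers, max_iter=100, tol=1e-9):
    """Alternate assignment and mean update until the means stop moving."""
    labels = []
    for _ in range(max_iter):
        # Classify each case by the nearest current mean.
        labels = [
            min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points
        ]
        # Recalculate each mean from the cases assigned to it.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # empty cluster keeps its mean
        # Stop once no mean has moved by more than the tolerance.
        converged = all(dist(a, b) <= tol for a, b in zip(centers, new_centers))
        centers = new_centers
        if converged:
            break
    return labels, centers

# Hypothetical cases forming two visible groups, with k = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Each pass computes only the distances from cases to the k means, which is why the method avoids the full case-by-case distance matrix that hierarchical clustering needs.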

The two-step cluster analysis procedure is designed for large amounts of data and can form clusters from both categorical and continuous variables. This method requires a very large data set. It produces solutions based on mixed variables for different numbers of clusters. The best results are obtained when all variables are independent, continuous variables are normally distributed and categorical variables follow a multinomial distribution. However, even when these conditions are not met, the algorithm is expected to behave reasonably.

Factor analysis

Factor analysis is an exploratory analysis that groups variables with similar characteristics into dimensions. There is no distinction between independent and dependent variables in factor analysis. It is used for many purposes, such as simplifying data or reducing the number of variables. The analysis is also used to describe the correlation and variability among the components or factors, which results in the identification of unobserved, underlying variables. Moreover, latent variables and errors are also identified.

There are many types of factor analysis; however, we have used Principal Component Analysis to identify the extracted factors. Moreover, a scree plot is also used for the analysis. This graphical representation shows which factors or components account for the most variability in the data.
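The values a scree plot displays can be sketched as follows. This is an illustration only, assuming the components are extracted from the correlation matrix and that NumPy is available; the function name is hypothetical and the data are synthetic, not the case data.

```python
import numpy as np

def scree_values(data):
    """Eigenvalues of the correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)  # columns are variables
    eigenvalues = np.linalg.eigvalsh(corr)  # symmetric input, ascending order
    return eigenvalues[::-1]

# Synthetic data: three variables driven by one hidden factor plus noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=200)
data = np.column_stack([factor + 0.3 * rng.normal(size=200) for _ in range(3)])

eigenvalues = scree_values(data)
```

Plotting `eigenvalues` against the component number (1, 2, 3) gives the scree plot. In this synthetic example the first eigenvalue is much larger than the others, which is the pattern expected when a single underlying factor drives the variables.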

Table 1 results

The results of the factor analysis for Table 1 show the correlations, which describe the relationships between all the factors. In addition, they show the variance explained by the initial eigenvalues. The scree plot is the graphical presentation of the relationship between the components and the eigenvalues. These relationships indicate that only one component is extracted, as shown in the component matrix as well as the rotated component matrix.

Table 2 results

The results for Table 2 show that all items have an initial value of 1. The total number of components is shown, with the eigenvalue, percentage of variance and cumulative percentage for each. Moreover, the rotation sums of squared loadings are also calculated.
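The percentage of variance and cumulative percentage follow directly from the eigenvalues: each eigenvalue is divided by the sum of all eigenvalues. A minimal sketch is given below. The function name is hypothetical and the eigenvalues are illustrative only; they are not the values from Table 2.

```python
from itertools import accumulate

def variance_table(eigenvalues):
    """Return (eigenvalue, % of variance, cumulative %) per component."""
    total = sum(eigenvalues)
    percents = [100 * ev / total for ev in eigenvalues]
    return list(zip(eigenvalues, percents, accumulate(percents)))

# Illustrative eigenvalues for five items; NOT taken from Table 2.
table = variance_table([2.2, 1.3, 0.8, 0.4, 0.3])

for eigenvalue, percent, cumulative in table:
    print(f"{eigenvalue:5.2f} {percent:7.2f} {cumulative:7.2f}")
```

The cumulative column always ends at 100 percent, since all components together reproduce the total variance of the items.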

In addition, a graphical presentation of the relationship between the eigenvalues and the components is also provided. This presentation shows that 3 components/factors should be extracted. With the principal component analysis, a rotated component matrix has also been developed...
