Business Applications Lab

Friday, 29 March 2013

IT Business Applications Lab : Session 10

Date 26-03-2013

Assignment 1 :

Create 3 vectors x, y and z with random values, ensuring they are of equal length.
Create a 3D vector by binding the 3 individual vectors.
Create a 3-dimensional plot of the same.

Solution :

Step1:
Creating a dataset with the help of rnorm

Step2:
Sampling 3 vectors of equal length from the above data set

Step3:
Binding the 3 vectors together to create a 3-D vector
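
The three steps above can be sketched as follows. This is a minimal sketch rather than the original lab code: the pool size of 10000, the seed and the names dataset, x, y and z are assumptions. The sample size of 5000 is chosen to match rainbow(5000) in the plot commands, and T is the object name those commands use.

```r
set.seed(42)                      # assumed seed, only for reproducibility

# Step 1: a data set of normally distributed values (pool size is an assumption)
dataset <- rnorm(10000)

# Step 2: three vectors of equal length sampled from the data set
x <- sample(dataset, 5000)
y <- sample(dataset, 5000)
z <- sample(dataset, 5000)

# Step 3: bind the vectors column-wise into a 5000 x 3 matrix
# (T is the name used by the plot commands; it masks the shorthand T for TRUE)
T <- cbind(x, y, z)
dim(T)
```

Naming the matrix T works, but any later code that relies on T meaning TRUE would break, so a different name may be safer outside this exercise.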

3D plotting (plot3d is provided by the rgl package)

Type1:

plot3d(T[,1:3])

Type2: with axis labels and color

plot3d(T[,1:3],xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000))

Type3: with axis labels, color and points type as spheres

plot3d(T[,1:3],xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000), type="s")

Type4: with axis labels, color and points type as lines

plot3d(T[,1:3],xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000), type="l")

Assignment 2:

Create 2 random variables
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a variable z with 5 different categories and cbind it to x and y)
3. Color code and draw the graph
4. Smooth and best fit line for the curve

Solution :

Step1:
Create 2 random variables x,y using rnorm
Add a 3rd variable by sampling data and using the factor as shown -:
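
A minimal sketch of this step, assuming a sample size of 1000 and a fixed seed (neither is stated in the original). The name xyz is hypothetical. A data frame is used instead of cbind so that z stays a factor; cbind on a factor would reduce it to its integer codes.

```r
set.seed(1)                                  # assumed seed
n <- 1000                                    # assumed sample size

x <- rnorm(n)
y <- rnorm(n)

# z: a factor with 5 categories, drawn by sampling with replacement
z <- factor(sample(1:5, n, replace = TRUE), levels = 1:5)

# keep the three variables together; data.frame preserves the factor type of z
xyz <- data.frame(x, y, z)
table(xyz$z)                                 # number of points in each category
```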

Graphs (qplot is provided by the ggplot2 package)

Type1:
qplot between x and y

Command used

qplot(x,y)

Type2:
qplot between x and z

Command used

qplot(x,z)

Type 3:
Semi-transparent qplot between x and z with alpha

Command used

qplot(x,z , alpha=I(4/10))

Type 4:
colored plot

Command used
qplot(x,y , color=z)

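Type 5:
Logarithmic colored plot (log() returns NaN for the negative values that rnorm produces, so those points are dropped with a warning)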

Command used
qplot(log(x),log(y) , color=z)

Type 6:
smooth curve and best fit line using geom

Command used

qplot(x,y,geom=c("path","smooth"))

Command used

qplot(x,y,geom=c("point","smooth"))

Command used

qplot(x,y,geom=c("boxplot","jitter"))

Friday, 22 March 2013

Infographics : Visual Resume building

As a job seeker or a student, it is important to have a resume that stands out from the rest. One of the more visually pleasing options available today is the infographic resume.

An infographic resume enables a job seeker to better visualize his or her career history, education and skills.

For those of us not talented in design, it can also be costly to hire an experienced designer to toil over a career-centric infographic.
Luckily, a number of companies are picking up on this growing trend and building apps to enable the average job seeker to create a beautiful resume.

As a part of this assignment, I searched online for the top sites/applications that help create a good infographic resume. I came across a plethora of such sites, but I chose 4 of them for a detailed study. These are listed below -:

1. Visual.ly/create
2. re.vu
3. kinzaa.com
4. visualize.me

Most of them follow the same process flow. They are equipped with a feature to pick up user data and career information from LinkedIn.

1. visual.ly

create infographics with visual.ly

Pros:
- Lets you choose between 4-5 themes with different gradients.
- Options to tweet, share on FB, Pin and share on other social media sites.
- Provides options to download as PDF or mail to your email ID.
- Ease of data access, no need to edit/enter any data.

Cons -:
- Doesn't let you play around with the structure or format of the resume. Fewer options to customise the graphics.

2. re.vu page

Re.vu allows you to make a dedicated page for your resume, add widgets and arrange them as per your choice.
It also provides a design option wherein you can upload a background image of your choice or choose from the themes provided.

Link to re.vu page

http://re.vu/bjuneja60015

3. Visualize.me

Visualize.me
- allows you to connect to the available data on LinkedIn
- provides the option of inline editing of the profile in case some info needs updating
- provides options to choose theme and styles (color, fonts and background)
- has an upcoming feature called "portfolio" wherein you can highlight some of the notable projects you’ve worked on
- lets you save the changes and also share the page on social media networks

Icon to Visualize.me infographic resume -->

P.S -: the site is in beta and is coming up with hordes of exciting new features.

4. Kinzaa.com

Kinzaa.com is another professional service for infographic resumes.

Features :
- It prompts you to add data about your work profile (education/career interests)
- Highlights your current skills and skill levels
- Also provides a slider-based view of your professional priorities, which are asked for during profile building
- Describes your work history and education interactively, focusing on the major responsibilities
- Describes your personality in a separate section of the resume, which is different and not available on other resume building sites
- Provides an option to print the PDF and share the resume online

Link to Kinzaa.com resume page -: http://kinzaa.com/12BM60015

===============================================================================

Thursday, 14 March 2013

IT Business App Lab- Session #8

Date : 12 Mar, 2013

In this session we learnt about panel data estimation and its various models.

Panel data combine cross-sectional and time-series data: the same units (here, states) are observed over several time periods.
The basic function used for panel data estimation is plm.

The data set we have used in this session is "Produc".

Its description is as follows. It contains the following variables:

- state : the state
- year : the year
- pcap: public capital stock
- hwy: highway and streets
- water: water and sewer facilities
- util: other public buildings and structures
- pc: private capital stock
- gsp: gross state product
- emp: labor input measured by the employment in non-agricultural payrolls
- unemp: state unemployment rate

Download and load the "plm" package.
Use the data set "Produc", a panel data set within the plm package, for panel estimations.

Assignment
Estimate all 3 models (pooling, fixed and random) and decide which model best fits the data set for panel estimation.

Solution :

Step1 : estimating the pooling model

Step2 : estimating the fixed model

Step3 : estimating the random model
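
The three estimations can be sketched as below. The object names pooled, fixed1 and random1 are the ones used by the test commands in this post, but the regression formula is an assumption: the original specification is not shown, so this sketch borrows a commonly used production-function form for the Produc data. The name form is hypothetical.

```r
library(plm)                       # provides plm() and the Produc data set
data("Produc", package = "plm")

# Assumed specification: log output on log capital stocks, log labor and unemployment
form <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp

# Same formula and panel index (state, year) for all three models
pooled  <- plm(form, data = Produc, index = c("state", "year"), model = "pooling")
fixed1  <- plm(form, data = Produc, index = c("state", "year"), model = "within")
random1 <- plm(form, data = Produc, index = c("state", "year"), model = "random")

summary(fixed1)                    # coefficient table for the fixed effects model
```

In plm the fixed effects estimator is requested with model = "within"; it reports no intercept because the state-specific effects absorb it.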

Now to choose the best model that fits the data set "Produc" , we need to run pairwise hypothesis tests among the 3 models and select the best fit in the end.

Test1 :
Between pooling and fixed model

Command used :
pFtest (fixed1 , pooled)

Test details :
H0 (null hypothesis) : the individual (index) and time-based parameters are all zero : Pooling Model
Alternative hypothesis : at least one of the individual and time-based parameters is non-zero : Fixed Model

The test finds significant effects, which supports the alternative hypothesis.
As the p-value is very low, the null hypothesis is rejected.

Hence the fixed model is better than the pooling model.

Test2:

Between pooling and random model

Command used :
plmtest (pooled)

Test details :
H0 (null hypothesis) : the individual (index) and time-based parameters are all zero : Pooling Model
Alternative hypothesis : at least one of the individual and time-based parameters is non-zero : Random Model

The test finds significant effects, which supports the alternative hypothesis.
As the p-value is very low, the null hypothesis is rejected.

Hence the random model is better than the pooling model.

Test3:

Between fixed and random model

Command used :

We use Hausman test -:
phtest(random1 , fixed1)

Test details :
H0 (null hypothesis) : individual effects are not correlated with any regressor : Random Model
Alternative hypothesis : individual effects are correlated with the regressors : Fixed Model

The test suggests that one of the models is inconsistent.
As the p-value is very low, the null hypothesis is rejected.

Hence the fixed model is better than the random model.

Conclusion -:
After the series of tests, we can conclude that the fixed model best fits the "Produc" data set for panel data estimation, i.e. the individual effects are significant and are correlated with the regressor variables.
Hence we would choose the "Fixed" model to estimate the panel data presented by the "Produc" data set.

Wednesday, 13 February 2013

IT Business Applications Lab : Session 6

Date : 12-Feb-2013

Assignment 1 - Download data for the NIFTY index from 1st Jan, 2012 to 31st Jan, 2013. Calculate the log returns and find the historical volatility.

Soln -:

Commands used - :

readData<-read.csv(file.choose() , header=T)
closePrice<-readData[,5] # reading the closing price column
closePrice.ts<-ts(closePrice , frequency=252) # making a time series
varLag<- lag(closePrice.ts , k=-1) # stock price at time (t-1)
logReturns<- log(closePrice.ts) - log(varLag) # log returns: ln(P_t / P_(t-1))

# To calculate historical volatility (annualised with 252 trading days)
histVolatility<-sd(logReturns)*sqrt(252)
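
To make the calculation concrete, here is a small worked sketch on made-up prices. The price vector and the names prices, logRet and histVol are hypothetical. It assumes the usual definition of a log return, ln(P_t / P_(t-1)), and 252 trading days per year.

```r
prices <- c(100, 102, 101, 105, 107)     # hypothetical closing prices

# log return at time t: ln(P_t / P_(t-1)) = ln(P_t) - ln(P_(t-1))
logRet <- diff(log(prices))

# annualised historical volatility, assuming 252 trading days per year
histVol <- sd(logRet) * sqrt(252)

# log returns are additive: their sum equals the log of the total price ratio
all.equal(sum(logRet), log(107 / 100))
```

diff(log(prices)) gives the same values as subtracting the lagged log price from the log price, without needing a time-series object.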

Assignment 2 :

To create an ACF plot for the log returns data calculated previously. Also do an ADF test and interpret the findings.

Soln -:

// to create acf plot

acf(logReturns)

Graphical Interpretation
- the blue dotted lines represent the confidence interval for the hypothesis of zero autocorrelation (95% by default)
- As all the correlation bars (vertical lines) lie inside those two blue dotted lines, apart from lag 0 which is always 1, none of the autocorrelations is significant, so we can suggest that the returns data is "Stationary" in nature. This is the visual inspection method for determining stationarity.
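
The visual check can also be done numerically. The sketch below uses simulated returns as a stand-in for logReturns (the vector r and the other names are hypothetical) and assumes the default 95% bounds that the ACF plot draws, qnorm(0.975) / sqrt(n).

```r
set.seed(1)                               # assumed seed
r <- rnorm(250, mean = 0, sd = 0.01)      # hypothetical stand-in for logReturns

a <- acf(r, plot = FALSE)                 # autocorrelations without drawing the plot

# half-width of the 95% band drawn as blue dotted lines
bound <- qnorm(0.975) / sqrt(length(r))

# count the lags (excluding lag 0, which is always 1) that fall outside the band
outside <- sum(abs(a$acf[-1]) > bound)
outside
```

With a 95% band, roughly one lag in twenty can fall outside the lines by chance alone, so a small count is not evidence of autocorrelation on its own.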

Using the ADF test

Command used (adf.test is provided by the tseries package)
adf.test(logReturns)

Interpretation from ADF test
Null Hypothesis -: The returns data is not stationary
Alternative Hypothesis -: The returns data is stationary

The test gives a p-value of 0.01, which is less than the 0.05 significance level (95% confidence).
Hence the null hypothesis is rejected.

Results -: the given data is stationary in nature

========================================================================

Thursday, 7 February 2013

Session 5 - IT Business Application Lab

Assignment 1 -:

Create a table with the closing prices and their differences, starting at the 10th data point and ending at the 95th data point.

Soln -:

Command Used -:

NSEData<-read.csv(file.choose(),header=T) # read file with data from 1 Jul 2012 to 31 Jan 2013

head(NSEData) # to display first few rows

closeCol<-NSEData$Close # to retrieve "Close" column contents from data into closeCol object

closeCol.ts1<-ts(data=closeCol[10:95],deltat=1/252) # create a time-series object from close data, elements 10 to 95

summary(closeCol.ts1) # showing summary

closeCol.diff = diff(closeCol.ts1) # difference between each value and the preceding one

retVar = closeCol.diff/lag(closeCol.ts1 , k=-1) # calculating returns

retFinal = cbind(closeCol.ts1 , closeCol.diff , retVar) # creating a table for data , difference and return
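
The return calculation relies on how R aligns time series, which a tiny example makes easier to see. The prices and the names p, d, prev and ret below are hypothetical, and the series uses the default frequency of 1 for simplicity rather than deltat=1/252.

```r
p <- ts(c(100, 110, 99))          # hypothetical closing prices, default frequency of 1

d    <- diff(p)                   # P_t - P_(t-1), defined from the 2nd observation on
prev <- stats::lag(p, k = -1)     # series shifted so that time t holds P_(t-1)
ret  <- d / prev                  # ts arithmetic aligns both series on their common times

cbind(p, d, ret)                  # table of data, difference and return (NA where undefined)
```

Here the returns are 10/100 and -11/110. Because the division is aligned by time rather than by position, each difference is divided by the price of the previous period.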

Plotting Graphs

Graph -: plot of data, difference and return

Assignment 2 :

Data from S.no 1 to 700 is provided. Provide predictions for data from S.No 701 to 850.
Use glm estimation and do LOGIT Analysis for the same.

Soln 2:

Command Used -:

fileData<-read.csv(file.choose(),header=T) # reading file

selData<-fileData[1:700 , 1:9] # getting first 700 rows of data

head(selData)

# Identifying the factor and running the Logit regression
selData$ed <- factor(selData$ed) # ed column as factor
selData.est<-glm(default ~ age + ed + employ + address + income, data=selData, family ="binomial")
summary(selData.est)

# predicting the values for data set 701-850
newData<-fileData[701:850,1:8]
newData$ed<-factor(newData$ed)
newData$prob<-predict(selData.est, newdata =newData, type = "response")
head(newData)
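
Since the original csv file is not available here, the same workflow can be tried on simulated data. Everything in this sketch is an assumption made for illustration: the data frame d, its two predictors, the coefficients used to simulate default, and the names train, holdout and fit. Only the 700/150 split and the glm/predict calls mirror the commands above.

```r
set.seed(7)                                   # assumed seed
n <- 850

# hypothetical predictors and a simulated 0/1 default outcome
d <- data.frame(age    = round(runif(n, 20, 60)),
                income = round(rexp(n, rate = 1 / 50)))
d$default <- rbinom(n, 1, plogis(-1 + 0.02 * d$income - 0.03 * d$age))

train   <- d[1:700, ]                         # rows with a known outcome
holdout <- d[701:850, c("age", "income")]     # rows to predict, outcome left out

# logit model: binomial family uses the logit link by default
fit <- glm(default ~ age + income, data = train, family = "binomial")

# type = "response" returns predicted probabilities instead of log-odds
holdout$prob <- predict(fit, newdata = holdout, type = "response")
head(holdout)
```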

=====================================END===================================

Tuesday, 22 January 2013

Business Application Lab : Session 3

Day3 - 22Jan , 2013

Assignment 1
Read the set of data given in the .csv file and fit a linear model for the data set.
Comment on its applicability.

Soln -:

Commands used -:

reg1 <- lm(DependentVariable ~ IndependentVariable) # to calculate regression coefficients

res1<-resid(reg1) # to calculate residuals
resStd<-rstandard(reg1) # to calculate standardized residuals

Plot between Independent variable and residuals

Plot between Independent variable and standard residuals

Q-Q Normal plot

Q-Q normal plot fit with a line
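
The plots listed above can be reproduced with a sketch like the following. The data are simulated (a hypothetical linear relationship with noise) because the original csv file is not shown; reg1, res1 and resStd are the names used in the commands above, while x and y are placeholders.

```r
set.seed(3)                          # assumed seed
x <- 1:50
y <- 2 + 0.5 * x + rnorm(50)         # hypothetical linear data with noise

reg1   <- lm(y ~ x)                  # fit the linear model
res1   <- resid(reg1)                # raw residuals
resStd <- rstandard(reg1)            # standardized residuals

plot(x, res1)                        # independent variable vs residuals
plot(x, resStd)                      # independent variable vs standardized residuals
qqnorm(resStd)                       # Q-Q normal plot
qqline(resStd)                       # Q-Q normal plot fit with a line
```

For data like these the residual plots should show no pattern. A curved (for example parabolic) pattern would suggest that a straight line is the wrong model.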

-Regression applicability- :
As the residual plot is non-linear and shows a parabolic pattern, a linear regression is not appropriate for this data set.

Assignment 1(b)

Data set with Alpha and Pluto

Read data from the csv file and calculate the regression

Plot between Independent variable and residuals

Q-Q Normal Plot

Q-Q Normal Plot fit with a line

-Regression applicability- :
As the points are scattered randomly and lie close to the Q-Q normal plot line, the residuals look approximately normal and a linear fit looks adequate. Hence application of linear regression is possible.

Assignment 2
To justify a NULL Hypothesis for a given data using ANOVA

Soln-:
Commands used -:

var_name.anv<-aov(<var_name>$<Dependent Variable> ~ <var_name>$<Nominal_scale_variable>)
summary(var_name.anv)

As shown , after reading the data from a csv file

The result shows the p-value of the test to be 0.687, which is far greater than the assumed significance level.

Hence the null hypothesis cannot be rejected.
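
A minimal sketch of the same procedure on simulated data, showing how the p-value can be read from the ANOVA table. The data frame d, its columns score and group, and the 0.05 threshold are assumptions for illustration and are not taken from the original file.

```r
set.seed(11)                                        # assumed seed

# hypothetical data: one numeric response and a nominal variable with 3 groups
d <- data.frame(score = rnorm(60, mean = 50, sd = 10),
                group = factor(rep(c("A", "B", "C"), each = 20)))

d.anv <- aov(score ~ group, data = d)               # one-way ANOVA
summary(d.anv)

# p-value of the group effect, taken from the first row of the ANOVA table
pval <- summary(d.anv)[[1]][["Pr(>F)"]][1]

# TRUE would mean rejecting the null hypothesis of equal group means at the 5% level
pval < 0.05
```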

Tuesday, 15 January 2013

Business Application Lab - Session 2

15th Jan,2013
Assignment 1 -:
To bind columns/rows from 2 different matrices into a new matrix

Sol -:
Matrix 1 assignment and generation

> mat1<-c(1:10)
> dim(mat1)<-c(2,5)

Matrix 2 assignment and generation

> mat2<-c(11:16)
> dim(mat2)<-c(2,3)

Taking the 3rd column from matrix 1 and the 2nd column from matrix 2
Binding using the cbind and rbind functions as shown -:
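
A sketch of the binding step, reusing the mat1 and mat2 definitions from above. The result names boundCols and boundRows are hypothetical.

```r
mat1 <- c(1:10)
dim(mat1) <- c(2, 5)               # 2 x 5 matrix, filled column by column

mat2 <- c(11:16)
dim(mat2) <- c(2, 3)               # 2 x 3 matrix

# 3rd column of mat1 is (5, 6); 2nd column of mat2 is (13, 14)
boundCols <- cbind(mat1[, 3], mat2[, 2])   # the two vectors become columns
boundRows <- rbind(mat1[, 3], mat2[, 2])   # the two vectors become rows

boundCols
boundRows
```

rbind of the same two vectors gives the transpose of the cbind result.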

Assignment 2
Multiply 2 matrices

Sol -:
Command to multiply 2 matrices
> multip <- z1 %*% z2
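
The matrices z1 and z2 are not defined in the post, so the sketch below uses two hypothetical ones. For %*% the number of columns of the first matrix must equal the number of rows of the second.

```r
z1 <- matrix(1:6, nrow = 2)        # hypothetical 2 x 3 matrix
z2 <- matrix(1:6, nrow = 3)        # hypothetical 3 x 2 matrix

multip <- z1 %*% z2                # matrix product, result is 2 x 2
multip
```

Note that z1 * z2 would attempt element-wise multiplication instead, which fails here because the dimensions differ.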

Assignment 3-:
To read NSE historical data dated from 1st Dec, 2012 to 31st Dec, 2012 from a .csv file.
To find regression between the High Price and the opening share price and also calculating the residuals.

Soln- :
Command For Regression :
> reg1<-lm(HighPrice ~ OpenPrice , data = NSEData)

NSEData - Object with file historical data
High Price - Dependent variable
Open Price - Independent variable

Residuals

Assignment 4
To generate data for a normal distribution and plot the distribution curve

Soln -:
To generate normally distributed random numbers, the function used is -:

rnorm(N, mean, sd)
where N is the number of observations
mean is the mean (or vector of means)
sd is the standard deviation

As shown below -:

The plot is as shown -:
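
One way to produce such a plot is sketched below. The sample size, seed, mean and standard deviation are assumptions, and the name samples is hypothetical. The histogram shows the generated data and the curve overlays the theoretical normal density.

```r
set.seed(123)                                   # assumed seed
samples <- rnorm(1000, mean = 0, sd = 1)        # 1000 draws from a standard normal

# histogram on the density scale so that it is comparable with dnorm
hist(samples, freq = FALSE, main = "Normal distribution", xlab = "value")

# theoretical density curve drawn on top of the histogram
curve(dnorm(x, mean = 0, sd = 1), add = TRUE)
```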