Analytics for Small and Medium Enterprises

In this article, we look at the tools, methods and opportunities that let Small and Medium Enterprises take advantage of the information they already have, and acquire new information, to compete with bigger players who can afford entire teams of DBAs and analysts.

The Technology

We look at the foundation tools of analytics, from the database to the programming languages and methods that let anyone freely dissect and analyze data without opening the wallet.

1) Database

MySQL Database (Alternative – PostgreSQL)

Probably the most famous and popular database in the world. The free version costs nothing, yet it powers millions of websites on the net. It is also a tale of why there is always a bigger fish in the ocean. With it, there is no excuse for any company, big or small, not to have a database. All you need is a DBA.

MySQL was acquired by Sun Microsystems, which was in turn acquired by Oracle. It is free to download from the MySQL website.

2) One programming language to rule them all

R Programming Language (Alternative – Python)

R has exploded in usage in recent years as the de facto tool to extract, transform and analyze data. Although the language has a quirky syntax, the ease of setting up and analyzing data without learning most of the language has made it the darling of data analysts and data scientists everywhere.

R also has probably the best out-of-the-box graphics output of any programming language, which makes it doubly attractive: there is no need to purchase or install external libraries or packages to display information graphically.

R can be downloaded for free from the R Project website.
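
As a rough illustration of what comes in the box, the sketch below uses only base R and the built-in cars dataset (not part of this article) to draw a histogram and a scatter plot. The pdf(NULL) line is an assumption added so the example can run without a display; drop it in an interactive session to see the charts.

```r
# Sketch only: base R graphics on the built-in 'cars' dataset
pdf(NULL)                        # assumption: no screen available; remove to see the plots

# hist() draws the chart and also returns the bin counts it used
h <- hist(cars$speed, main = "Histogram of Speed", xlab = "Speed (mph)")

# A plain scatter plot needs nothing beyond the two columns and some labels
plot(cars$speed, cars$dist,
     main = "Stopping Distance vs Speed",
     xlab = "Speed (mph)", ylab = "Distance (ft)")

invisible(dev.off())             # close the null device
```

No package had to be installed for either chart, which is the point the paragraph above makes.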

3) Visual Analytic Programs

Rattle (an R package) / RapidMiner (Free Edition) / Tableau (Free Edition) / Orange (Open Source)

These are the programs that let non-programmers extract, transform and analyze data from a GUI. Doing a regression analysis can be as easy as loading a CSV file and clicking a few buttons. The downside, of course, is that you can only do as much as the program allows. But for most analytics projects, these programs offer more than sufficient options.

RapidMiner website :
Tableau website :
Orange website :
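
For comparison, here is roughly what "load the CSV file and click regression" amounts to when scripted in R. This is a minimal sketch, not what any of the GUI tools run internally: the temporary CSV file and the names csv_path, customers and fit are assumptions for illustration, and the five score/age pairs are borrowed from the linear regression tutorial on this site.

```r
# Sketch only: write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")      # hypothetical file location
write.csv(data.frame(age   = c(18, 25, 39, 50, 75),
                     score = c(1950, 1900, 1822, 1700, 1600)),
          csv_path, row.names = FALSE)

customers <- read.csv(csv_path)             # the "load the CSV file" step
fit <- lm(score ~ age, data = customers)    # the "click regression" step

print(coef(fit))                            # intercept and slope of the fitted line
```

The trade-off runs the other way from the GUI tools: a script takes a little more effort up front, but it is not limited to the options a program exposes.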

Regards,
Billy Aung Myint

End-to-End Analytics using Linear Regression – Part 1

#End-to-End Analytics using Linear Regression in R – Part 1

#The aim of the tutorial is to provide a simple example of how to start a
#data analytics process using a Linear Regression model in R, using the lm()
#function.

#Initialize the variables

score <- c(1950,1900,1822,1700,1600)
age <- c(18,25,39,50,75)

#Display and Update the variables

score
score[1] <- 1955
score

#Simple plotting of the variables

hist(score , main = "Histogram of Score")

#Consolidating the variables into a data.frame
risk <- data.frame(score,age)

#Understanding the type of variables

class(risk)
str(risk)

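#Subsetting the data.frame

risk$score > 1900

subset(x = risk , score > 1800)

#risk2 adds a sex column so the next filters can run here; it is rebuilt with debt further down
sex <- c("M","F","M","M","F")
risk2 <- data.frame(risk , sex)
subset(x = risk2 , score > 1800 | sex == "M")
subset(x = risk2 , score > 1800 & sex == "M")
subset(x = risk2 , score > 1800 & sex == "M" , select = score:age )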

# The regression
fit <- lm(score ~ age)   #fit is an assumed name for the model object
summary(fit)

# summary() output from the equivalent call lm(risk); the residuals correspond to the original scores (score[1] = 1950)
# Call:
#   lm(formula = risk)
# Residuals:
#       1       2       3       4       5
#   7.902   2.085  12.452 -40.118  17.679
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 2055.711     27.344   75.18 5.19e-06 ***
# age           -6.312      0.594  -10.63  0.00178 **
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 26.73 on 3 degrees of freedom
# Multiple R-squared:  0.9741,  Adjusted R-squared:  0.9655
# F-statistic: 112.9 on 1 and 3 DF,  p-value: 0.001781

new.age <- data.frame(age = c(80,90))
predict(lm(score ~ age), new.age)
#        1        2
# 1550.762 1487.643

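#add two new variables and do the same again
debt <- rnorm(n = 5 , mean = 50000 , sd = 20000)
sex <- c("M","F","M","M","F")

risk2 <- data.frame(score,age,debt,sex)
fit2 <- lm(risk2)   #fit2 is an assumed name; lm() on a data.frame regresses the first column on the rest

#the new data must now supply every predictor; the debt and sex values here are illustrative only
new.data <- data.frame(age = c(80,90) , debt = c(50000,50000) , sex = c("M","F"))
predict(fit2 , new.data)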

The complete script can be found here.
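
To see where the coefficients come from, the slope and intercept of a one-variable regression can also be computed by hand: the slope is cov(age, score) / var(age), and the intercept follows from the two means. The sketch below is an illustrative check, not part of the original script. It uses the original five scores (with score[1] = 1950), and the names slope, intercept and pred80 are assumptions.

```r
# Sketch only: reproduce lm()'s coefficients with the closed-form least-squares formulas
score <- c(1950, 1900, 1822, 1700, 1600)
age   <- c(18, 25, 39, 50, 75)

slope     <- cov(age, score) / var(age)          # change in score per year of age
intercept <- mean(score) - slope * mean(age)     # the fitted line passes through the means

fit <- lm(score ~ age)

print(c(intercept = intercept, slope = slope))   # by hand
print(coef(fit))                                 # from lm()

# Predicting by hand for age 80 is just the line equation
pred80 <- intercept + slope * 80
print(pred80)
```

The hand-computed values agree with the Estimate column of the summary and with the first predict() result above.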

Regards,
Billy Aung Myint

Volume by Price Charts using R

A neat use of the R quantmod library to build a Volume by Price chart. A lovely chart as well!

The original author and source are credited below the code.



#Original Code

library(quantmod)

#Change the ticker to get chart of any "yahoo" symbol
ticker <- "AAPL"
symbol <- getSymbols(ticker)
stock <- xts(get(symbol))

#remove stock name
names(stock)[names(stock)==paste(symbol,'.Open',sep="")] <- 'Open'
names(stock)[names(stock)==paste(symbol,'.Close',sep="")] <- 'Close'
names(stock)[names(stock)==paste(symbol,'.Volume',sep="")] <- 'Volume'
names(stock)[names(stock)==paste(symbol,'.Adjusted',sep="")] <- 'Adjusted'
names(stock)[names(stock)==paste(symbol,'.High',sep="")] <- 'High'
names(stock)[names(stock)==paste(symbol,'.Low',sep="")] <- 'Low'

#Add Positive and Negative Volumes
stock$posVbP <- Vo(stock[which(Lag(Cl(stock)) <= Cl(stock))])
stock$negVbP <- Vo(stock[which(Lag(Cl(stock)) > Cl(stock))])

#Since NAs got generated, replace NAs with 0.
stock[is.na(stock)] <- 0

#Subset for data since June 2011
myQ <- stock['2011-06::']

#Define function to add positive and negative volumes by prices
pVolBlock <- function(x) sum(myQ$posVbP[myQ$t==x])
nVolBlock <- function(x) sum(myQ$negVbP[myQ$t==x])

funcPriceByVol <- function(x){
  myDiv <- 50                                              #Divisor for stock
  myQHi <- as.integer(ceiling(max(Cl(myQ))/myDiv)*myDiv)   #Identify High of Series
  myQLo <- as.integer(floor(min(Cl(myQ))/myDiv)*myDiv)     #Identify Low of Series

  myBreaks <- as.integer(seq(myQLo, myQHi, by=myDiv))      #Create Breaks of interval divisor

  # Identify and assign price intervals
  myQ$t <<- myBreaks[findInterval(myQ$Close,myBreaks,all.inside=TRUE)]

  myVolsP <- unlist(lapply(myBreaks,pVolBlock))            #Add Positive Volumes to block
  myVolsN <- unlist(lapply(myBreaks,nVolBlock))            #Add Negative Volumes to block
  myVols <- rbind(myVolsP,myVolsN)                         #Bind the Positive and Negative Volumes
  colnames(myVols) <- myBreaks                             #Define Column Names
  myVols                                                   #Return the volume matrix
}

myPBV <- funcPriceByVol()

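#lets Plot the graph now

plot(Cl(myQ),yaxt="n", ylab="",xlab="Time",
main=paste(ticker, "Close:- Volume by Price"),sub="Market Analyzer")
#Overlay the volume bars; par(new=TRUE) and barplot(myPBV, ...) are an assumed reconstruction of the truncated call
par(new=TRUE)
barplot(myPBV,
beside=FALSE,horiz=TRUE, col=c(rgb(0,1,0,alpha=.3),rgb(1,0,0,alpha=.3)),
space=10, width=3,xaxt="n",yaxt="n")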


Original Author and Source here.
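
The core idea in the code above (bucket each day's close into a price interval, then total the up-day and down-day volume per interval) can be tried without quantmod or a data download. The sketch below is an illustration in base R only: the six closing prices and volumes are made up, and the names close, volume, div, bin and vol_by_price are assumptions, not taken from the original code.

```r
# Sketch only: volume-by-price on made-up data, base R, no quantmod
close  <- c(100, 105, 95, 110, 160, 155)   # hypothetical closing prices
volume <- c(10, 20, 30, 40, 50, 60)        # hypothetical daily volumes

div    <- 50                                        # width of each price interval
breaks <- seq(floor(min(close) / div) * div,        # lowest interval boundary
              ceiling(max(close) / div) * div,      # highest interval boundary
              by = div)

# Label each day with the lower boundary of the interval its close falls in
bin <- breaks[findInterval(close, breaks, all.inside = TRUE)]

# Up day: close at or above the previous close; the first day has no previous close
prev <- c(NA, head(close, -1))
up   <- !is.na(prev) & prev <= close
down <- !is.na(prev) & prev >  close

# Total the up-day and down-day volume for every interval
vol_by_price <- rbind(
  up   = sapply(breaks, function(b) sum(volume[up   & bin == b])),
  down = sapply(breaks, function(b) sum(volume[down & bin == b]))
)
colnames(vol_by_price) <- breaks

print(vol_by_price)
```

Passing the resulting matrix to barplot() with horiz = TRUE would give the stacked horizontal bars that the chart overlays on the price line.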

Regards,
Billy Aung Myint

Save / Reset par() – Graphical Parameters in R / RStudio

par(), the graphical parameter settings, can be a life saver or a real pain-in-the-a**, so here are some tricks to save yourself from the frustration.

– If you are using RStudio and want to reset par() to the defaults, just click “Clear All” in the Plots tab at the bottom right. That will reset the settings to the original factory defaults.

– If you know you are going to mess it up and want to save a backup to restore later, here are a few ways.

1) Just type par() and you will get the entire list of settings. Copy and paste them into a file so you can reset them later.

2) But if that is not nerdy enough for you, then try these commands:

#save par settings
par.original <- par(no.readonly=TRUE)

#write to a file in working directory , check with getwd() function
save(par.original, file="R.original.par.RData")

#load what you have saved above
load("R.original.par.RData")

#now reset to whatever that was originally
par(par.original)

Sweet? Good.
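
When the settings only need to change inside one function, a lighter variation is to let par() hand back the old values and restore them with on.exit(). This is a minimal sketch of that idiom, not something from the steps above: plot_pair is a hypothetical function, and pdf(NULL) is an assumption added so it runs without a display.

```r
# Sketch only: change par() inside a function and restore it automatically
pdf(NULL)                                   # assumption: no screen; remove in RStudio

plot_pair <- function() {                   # hypothetical helper
  # Setting parameters through par() returns their previous values
  old <- par(mfrow = c(1, 2), mar = c(2, 2, 1, 1))
  on.exit(par(old))                         # restore on exit, even if a plot call fails
  hist(rnorm(100), main = "")
  plot(1:10)
  invisible(NULL)
}

before <- par("mfrow")                      # layout before the call
plot_pair()
after  <- par("mfrow")                      # layout after the call

print(identical(before, after))
invisible(dev.off())
```

Because the restore is tied to the function exit, there is nothing to save to disk and nothing to remember to reset afterwards.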

Regards,
Billy Aung Myint

Copyright 2012 - 2016 ©