Entries
Text Generation
IN PROGRESS

Text Generation (generative text) is a very interesting field of study. There are a number of packages that help generate lists of words to give the user a better understanding of the technology.

Tracery

The following Node library provides a structure used to generate random words, which are assigned to a lexical structure defined by the user.

```js
centar: {
  animal:    ["wolf", "bear", "tiger", "lion", "snake", "anteater"],
  fruit:     ["banana", "tomato", "cherry", "strawberry", "starfruit"],
  said:      ["purring", "whispering", "saying", "murmuring", "growling"],
  timeofday: ["morning", "evening", "dusk", "dawn", "afternoon", "breakfast", "breakfast"],
  lastSyl:   "a ia ea u y en am is on an o io i el ios ax ox ix ex izz ius ian ean ekang anth".split(" "),
  vipTitle:  ["Dr.", "Professor", "Lord", "Sir", "Captain", "His Majesty"],
  response:  ["#animal# love #fruit##lastSyl#", "#animal# #said# at #timeofday#", "#lastSyl##lastSyl#"],
  "origin":  "<h3>#vipTitle# Watson, let me tell you of a #response#</h3>"
}
```

The following API
November 3, 2017
Gantt Charts in R
In Progress

Using timevis

```r
library(timevis)

data <- data.frame(
  id      = 1:4,
  content = c("Item one", "Item two", "Ranged item", "Item four"),
  start   = c("2016-01-10", "2016-01-11", "2016-01-20", "2016-02-14 15:00:00"),
  end     = c(NA, NA, "2016-02-04", NA)
)

timevis(data)
```

Using DiagrammeR

```r
library(tidyr)
library(dplyr)
library(DiagrammeR)

mermaid("
gantt
  dateFormat  YYYY-MM-DD
  title A Very Nice Gantt Diagram

  section Basic Tasks
  This is completed             :done,         first_1,  2014-01-06, 2014-01-08
  This is active                :active,       first_2,  2014-01-09, 3d
  Do this later                 :              first_3,  after first_2, 5d
  Do this after that            :              first_4,  after first_3, 5d

  section Important Things
  Completed, critical task      :crit, done,   import_1, 2014-01-06, 24h
  Also done, also critical      :crit, done,   import_2, after import_1, 2d
  Doing this important task now :crit, active, import_3, after import_2, 3d
  Next critical task            :crit,         import_4, after import_3, 5d

  section The Extras
  First extras                  :active,       extras_1, after import_4, 3d
  Second helping                :              extras_2, after extras_1, 20h
  More of the extras            :              extras_3, after extras_1, 48h
")
```

Using Plotly

If you wanted to use a more
October 17, 2017
Tips on Feature Engineering
- Fit features to how classifiers work; giving a geometry problem to a tree, oversized dimensions to a kNN, or interval data to an SVM is not a good idea.
- Remove as many nonlinearities as possible; expecting that some classifier will do Fourier analysis internally is rather naive (and even if it can, it will waste a lot of complexity there).
- Make features generic to all objects so that some sampling in the chain won't knock them out.
- Check previous work – often a transformation used for visualisation or for testing similar types of data is already tuned to uncover interesting aspects.
- Avoid unstable, optimizing transformations like PCA, which may lead to overfitting.
- Experiment a lot.
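The "remove nonlinearities" tip can be sketched in R. This is a minimal illustration with simulated data (the variables and the log relationship are invented for the example): a linear model on the raw feature struggles, while the same model on the log-transformed feature fits cleanly.

```r
# Simulated data where y depends on log(x), not x itself
set.seed(42)
x <- runif(200, 1, 100)
y <- 3 * log(x) + rnorm(200, sd = 0.2)

fit_raw <- lm(y ~ x)        # model must absorb the nonlinearity itself
fit_log <- lm(y ~ log(x))   # feature engineered so the relationship is linear

summary(fit_raw)$r.squared
summary(fit_log)$r.squared  # noticeably higher than the raw fit
```

The same idea applies before any learner, not just `lm`: spend the transformation budget in feature engineering rather than asking the model to rediscover it.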
October 8, 2017
Great Statistics Books to Read
Below you will find a number of the best books to learn more about statistics and its philosophy.

- Opinionated Lessons on Statistics
- Introduction to Statistical Learning
- The Elements of Statistical Learning
- Applied Predictive Modeling
- Statistical Inference
- Statistical Rethinking
- Data Analysis Using Regression and Multilevel/Hierarchical Models
- Mostly Harmless Econometrics
- Mastering 'Metrics: The Path from Cause to Effect
- All of Statistics
- Statistics
- Statistics for Experimenters
- Think Bayes
- Computer Age Statistical Inference
- Think Stats
- Machine Learning for Hackers
- Probability and Statistics
- Statistical Evidence: A Likelihood Paradigm
October 6, 2017
AB Testing in R from Scratch
Using Bayesian systems:

- Quantify the probability of all possibilities, thus measuring risk
- Insert institutional knowledge (add knowledge that changes the probability)
- Learn in an online fashion

A/B Testing with Approximate Bayesian Computation

- No mathematics required
- Able to implement from scratch

A/B testing measures and figures out the better design.

Approximate Bayesian Computation:

1. Generate a trial value for the thing we want to know (in this case the conversion fraction of a layout)
2. Simulate our data assuming the trial value
3. If the simulation looks like the real data, keep the trial value; otherwise discard it and try again
4. Keep doing this until we've got lots of trial values that worked

```r
library(progress)
library(ggplot2)
library(reshape2)

# Variables
n_visitors_a <- 100  # number of visitors shown layout A
n_conv_a     <- 4    # number of visitors shown layout A who converted (4%)

n_visitors_b <- 40   # number of visitors shown layout B
n_conv_b     <- 2    # number of visitors shown layout B who converted (5%)
```
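The four steps above can be sketched directly in R. This is a minimal sketch for layout A only, assuming a uniform prior on the conversion fraction (the prior range and acceptance rule are illustrative choices, not necessarily the post's):

```r
set.seed(10)

n_visitors_a <- 100  # visitors shown layout A
n_conv_a     <- 4    # of those, how many converted

# Approximate Bayesian Computation: draw a trial conversion rate from the
# prior, simulate a dataset with it, and keep the trial only when the
# simulation reproduces the observed conversions exactly.
posterior_a <- numeric(0)
while (length(posterior_a) < 5000) {
  p_trial   <- runif(1, 0, 0.1)                  # 1. trial value from a uniform prior
  simulated <- rbinom(1, n_visitors_a, p_trial)  # 2. simulate data assuming the trial value
  if (simulated == n_conv_a) {                   # 3. keep it if it looks like the real data
    posterior_a <- c(posterior_a, p_trial)       # 4. accumulate accepted trial values
  }
}

mean(posterior_a)   # posterior estimate of A's conversion fraction (~0.05)
hist(posterior_a)
```

Repeating the loop for layout B and comparing the two accepted samples (e.g. `mean(posterior_b > posterior_a)`) gives the probability that B beats A, which is the risk measure the bullet points describe.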
September 29, 2017
Tips on Creating Effective and Functional Documentation in R
Just like any skill, there is a learning curve involved in creating effective communication. This involves both the code written and the documentation of its usage. Writing functional code is an intricate thing to accomplish as a newbie: it takes time to know what is efficient and how to communicate it as such. Writing functional documentation is more complicated still, as there is a delicate balance between not regurgitating what the code says and giving users usable pointers on how a particular function was intended to be used. It also keeps you, the coder, in good practice by making you actively think about that balance; thus you help yourself keep the code modular and simple. So here are a few tips on how to write effective documentation.
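As a concrete illustration of that balance, R documentation is commonly written with roxygen2 comments above the function. A minimal sketch (the function and its parameters are invented for this example): the comments state intent, caveats, and usage rather than restating the body line by line.

```r
#' Compute the conversion rate of a marketing layout
#'
#' Intended for summarising raw visit logs before an A/B comparison.
#' It does not filter out repeat visitors, so deduplicate first if needed.
#'
#' @param conversions Number of visitors who converted.
#' @param visitors    Total number of visitors shown the layout.
#' @return The conversion fraction, a number between 0 and 1.
#' @examples
#' conversion_rate(4, 100)  # 0.04
conversion_rate <- function(conversions, visitors) {
  stopifnot(visitors > 0, conversions >= 0, conversions <= visitors)
  conversions / visitors
}
```

Note what the comments do *not* do: there is no "divides conversions by visitors" line, because the code already says that. They document the intended use and the one assumption a caller could get wrong.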
September 22, 2017
Tracking Change Improvements in Retail
In the ever-changing world of retail, one always has to keep one step ahead of the competition and engage with one's customers. One of the best ways:

1. Formulate a test
2. Implement the test
3. Evaluate the results
4. Adjust the test
5. Try again

These are all great ideas, but how do we truly watch as things get better?

```r
library(qcc)
library(xtable)
library(SixSigma)
library(qicharts)
```

Cause-and-Effect Diagrams

```r
cManpower     <- c("Receptionist", "Record. Operator", "Storage operators")
cMaterials    <- c("Supplier", "Transport agency", "Packing")
cMachines     <- c("Compressor type", "Operation conditions", "Machine adjustment")
cMethods      <- c("Reception", "Transport method")
cMeasurements <- c("Recording method", "Measurement appraisal")
cGroups       <- c("Manpower", "Materials", "Machines", "Methods", "Measurements")
cEffect       <- "Too high density"

cause.and.effect(
  cause = list(Manpower     = cManpower,
               Materials    = cMaterials,
               Machines     = cMachines,
               Methods      = cMethods,
               Measurements = cMeasurements),
  effect = cEffect)

ss.ceDiag(
  effect    = cEffect,
  causes.gr = cGroups,
  causes    = list(cManpower, cMaterials, cMachines, cMethods, cMeasurements),
  main      = "Cause-and-effect diagram",
  sub       = "Pellets Density")
```

Check Sheet

```r
data_checkSheet <- rbind(
  data.frame(Group = "Manpower",     Cause = cManpower),
  data.frame(Group = "Machines",     Cause = cMachines),
  data.frame(Group = "Materials",    Cause = cMaterials),
  data.frame(Group = "Methods",      Cause = cMethods),
  data.frame(Group = "Measurements", Cause = cMeasurements)
)
data_checkSheet$A_supplier <- NA
data_checkSheet$B_supplier <- NA
data_checkSheet$C_supplier <- NA
data_checkSheet
```

Control Charts

```r
pdensity <- c(10.6817, 10.6040, 10.5709, 10.7858, 10.7668, 10.8101,
              10.6905, 10.6079, 10.5724, 10.7736, 11.0921, 11.1023,
              11.0934, 10.8530, 10.6774, 10.6712, 10.6935, 10.5669,
              10.8002, 10.7607, 10.5470, 10.5555, 10.5705, 10.7723)

myControlChart <- qcc(data = pdensity, type = "xbar.one")
summary(myControlChart)
```

Histogram

```r
hist(pdensity)

par(bg = "gray95")
hist(pdensity,
     main = "Histogram of pellets density - Sample #25",
     sub  = "Data from ceramic process",
     xlab = expression("Density (g"/"cm"^3*")"),
     col = "steelblue", border = "white",
     lwd = 2, las = 1, bg = "gray")

library(ggplot2)
ggplot(data = data.frame(pdensity), aes(x = pdensity)) +
  geom_histogram(fill = "seagreen", colour = "lightgoldenrodyellow",
                 binwidth = 0.2) +
  labs(title = "Histogram",
       x = expression("Density ("*g/cm^3*")"),
       y = "Frequency")
```

Pareto Chart

```r
data_checkSheet$A_supplier <- c(2, 0, 0, 2, 1, 7, 1, 3, 6, 0, 1, 2, 0)
data_checkSheet$B_supplier <- c(0, 0, 1, 1, 2, 1, 12, 1, 2, 1, 0, 0, 1)
data_checkSheet$C_supplier <- c(0, 1, 0, 6, 0, 2, 2, 4, 3, 0, 1, 0, 2)
data_checkSheet$Total <- data_checkSheet$A_supplier +
  data_checkSheet$B_supplier + data_checkSheet$C_supplier
data_checkSheet

data_pareto <- data_checkSheet[order(data_checkSheet$Total, decreasing = TRUE), ]

par(mar = c(8, 4, 4, 2) + 0.1)
barplot(height    = data_pareto$Total,
        names.arg = data_pareto$Cause,
        las       = 2,
        main      = "Pareto chart for total causes")

data_pareto2 <- data_pareto$Total
names(data_pareto2) <- data_pareto$Cause
pareto.chart(data = data_pareto2, main = "Out-of-control causes")

library(qualityTools)
paretoChart(x = data_pareto2, main = "Out-of-control causes")

spreadvector <- rep(names(data_pareto2), times = data_pareto2)
paretochart(spreadvector)

x <- rep(LETTERS[1:9], c(256, 128, 64, 32, 16, 8, 4, 2, 1))
paretochart(x)
```

Scatterplot

```r
set.seed(1234)
ptemp <- -140 + 15 * pdensity + rnorm(24)
plot(pdensity ~ ptemp,
     col  = "gray40", pch = 20,
     main = "Pellets density vs. temperature",
     xlab = "Temperature (Celsius)",
     ylab = expression("Density ("*g/cm^3*")"))
```

Stratification
September 21, 2017
Rocket Propulsion
```r
library(tidyverse)
```

Accelerating force

$$ F = \dot{m} v_e $$

The thrust of the rocket is expressed in terms of the mass flow rate \(\dot{m}\) and the effective exhaust velocity \(v_e\).

$$ V = v_e \log_e \frac{M_0}{M} $$

- \(M_0\) – mass of the rocket at ignition
- \(M\) – current mass of the rocket
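The rocket equation above is easy to evaluate numerically. A minimal sketch in R (the masses and exhaust velocity below are made-up illustrative numbers, not from any particular vehicle):

```r
# Tsiolkovsky rocket equation: delta-v from the effective exhaust
# velocity and the ratio of ignition mass to current mass.
delta_v <- function(v_e, M0, M) {
  v_e * log(M0 / M)   # log() in R is the natural logarithm, i.e. log_e
}

v_e <- 3000    # effective exhaust velocity, m/s
M0  <- 500000  # mass at ignition, kg
M   <- 150000  # current mass (structure + remaining propellant), kg

delta_v(v_e, M0, M)   # roughly 3612 m/s
```

Note the logarithm: burning more fuel gives diminishing returns, since the extra propellant must itself be accelerated before it is burned.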
June 5, 2017