RSS Feed : Mages’ Blog

  • R in Insurance 2017
    The fifth conference on R in Insurance will be held on 8 June 2017 at ENSAE. ENSAE is the Paris Graduate School for Economics, Statistics and Finance.

    The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in Insurance.

    This one-day conference will focus again on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. Topics covered are:

    • the use of R in a production environment
    • life insurance
    • non-life insurance
    All topics will be discussed within the context of using R as a primary tool for insurance risk management, analysis and modelling.

    Programs of previous editions are available online: 2013, 2014, 2015 and 2016. To learn more about past events, please visit the associated web page.

    Attendance of the whole conference is the equivalent of 6.5 hours of CPD for members of the UK Actuarial Profession.

    Registrations

    Registration on the conference website is open and will be finalised in 2017, when participants pay the registration fee. The registration fees will be:

    • Professional: 250 euros with the conference dinner (150 euros without dinner)
    • Academic: 100 euros with the conference dinner (20 euros without dinner).
    The gala dinner will take place at the Musée d'Orsay.

    Keynote speakers

    We are pleased to announce that our keynote speakers are:

    Venue

    The conference will take place at ENSAE, 3 Avenue Pierre Larousse, 92240 Malakoff close to Paris.

    Committees

    Conference committee:

    You can reach the conference committee via rininsurance17@sciencesconf.org.

    The scientific committee consists of:

    Sponsors

    The organisers gratefully acknowledge the following sponsors:

    Institutional sponsors:

    Read more »
  • Notes from the Kölner R meeting, 14 October 2016
    Last Friday the Cologne R user group came together for two talks and a quiz at Eye/o, the company behind Adblock Plus, in Köln-Ehrenfeld. Eye/o were a great host, offering nibbles and drinks to warm up the event and pizza at the end.

    Cologne R user meeting at Eye/o
    The first talk was given by Jiddu Alexander, a physicist turned freelance data scientist. Jiddu gave an introduction into the tidyverse. He presented the concept of tidy data, and how the tidyverse bundle can be used to manage multiple models. Furthermore, he explained the concept of learning curves for model selection. Jiddu's slides are available from his web site.
    Jiddu Alexander explaining learning curves

    Next up was Nils Glück to share his experience with performance profiling. R code often grows from a small idea for a specific task into a longer and longer script as more ideas and use cases are added. Occasionally, we end up with a long and poorly documented script that 'does the job' but has become slow. Finding the bottlenecks and addressing them is a good short-term remedy. Nils showed us how the Rprof function of the utils package can be used to understand the performance profile of R code. Furthermore, the microbenchmark package, with a function of the same name, can then be used to test new approaches for a code block.

    Nils Glück quoting others who are not bothered about performance
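The workflow Nils described can be sketched as follows (a minimal example; `slow_mean` is a made-up, deliberately naive function, and the microbenchmark package is assumed to be installed):

```r
# Deliberately naive example function to profile:
slow_mean <- function(x) {
  s <- 0
  for (i in seq_along(x)) s <- s + x[i]
  s / length(x)
}
x <- runif(1e5)

# Rprof() from the utils package records where time is spent:
Rprof(tmp <- tempfile())
for (i in 1:100) slow_mean(x)
Rprof(NULL)
head(summaryRprof(tmp)$by.self)   # which calls take the most time?

# microbenchmark() compares a candidate fix against the original:
library(microbenchmark)
microbenchmark(slow_mean(x), mean(x), times = 20L)
```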

    To bridge the time until the pizzas arrived, our host Kirill had prepared a little R quiz: could we guess the output of simple R statements? Well, it is more difficult than you might think. Kirill had a great selection of quirky one-liners, which he had collected over time and borrowed from the fabulous R Inferno book by Pat Burns.

    Next Kölner R meeting

    The next meeting will be scheduled in about three months time. Details will be published on our Meetup site. Thanks again to Eye/o for their support.

    Please get in touch, if you would like to present at the next meeting.
    Read more »
  • Next Kölner R User Meeting: Friday 14 October
    Koeln R
    The 19th Cologne R user group meeting is scheduled for this Friday, 14 October 2016. We have three talks, followed by networking drinks.

    • Introduction to the tidyverse tools - Jiddu Alexander
    • Performance profiling and improvement in R - Nils Glück
    • Batch processing of R-Scripts with Excel - Klaus Jacobi
    Venue: Eyeo GmbH, Lichtstraße 25, 50825 Köln

    For further details visit our KölnRUG Meetup site.

    Notes from past meetings are available here.

    Read more »
  • Notes from 4th Bayesian Mixer Meetup
    Last Tuesday we got together for the 4th Bayesian Mixer Meetup. Product Madness kindly hosted us at their offices in Euston Square. About 50 Bayesians came along; the biggest turnout thus far, including developers of PyMC3 (Peadar Coyle) and Stan (Michael Betancourt).

    The agenda had two feature talks by Dominic Steinitz and Volodymyr Kazantsev and a lightning talk by Jon Sedar.

    Dominic Steinitz: Hamiltonian and Sequential MC samplers to model ecosystems
    Dominic shared with us his experience of using Hamiltonian and Sequential Monte Carlo samplers to model ecosystems.

    Volodymyr Kazantsev: Bayesian Model Averaging
    Finding the 'best' model was Volodymyr's challenge. He tried various R packages (BMA, BMS and BAS) for Bayesian model averaging, with varying degrees of success.

    Jon Sedar: Easier Plate Notation in Python using Daft
    Finally, Jon gave a brief overview on Daft, a nifty Python package for creating graphs, or plate notation.

    Next meeting

    The next Bayesian Mixer Meetup meeting is already scheduled for 21 October. We will be back at Cass Business School, with two talks:

    • Darren Wilkinson: Hierarchical Bayesian Modelling of Growth Curves inc Stochastic Processes
    • Peadar Coyle: Advanced PyMC3
    Read more »
  • Fitting a distribution in Stan from scratch
    Last week the French National Institute of Health and Medical Research (Inserm) organised with the Stan Group a training programme on Bayesian Inference with Stan for Pharmacometrics in Paris.

    Daniel Lee and Michael Betancourt, who ran the course over three days, are not only members of Stan's development team, but also excellent teachers. Both were supported by Eric Novik, who also gave an Introduction to Stan at the Paris Dataiku User Group last week.

    Eric Kramer (Dataiku), Daniel Lee, Eric Novik & Michael Betancourt (Stan Group)

    I have been playing around with Stan on and off for some time, but as Eric pointed out to me, Stan is not that kind of girl (boy?). Indeed, having spent three days working with Stan has revitalised my relationship. Getting down to the basics has been really helpful and I shall remember: Stan does not draw samples from a distribution. Instead, it calculates the joint distribution function (in log space) and evaluates the probability distribution function (in log space).

    Thus, here is a little example of fitting a set of random numbers in R to a Normal distribution with Stan. However, instead of using the built-in functions for the Normal distribution, I define the log probability function by hand, use it in the model block, and even generate a random sample, starting from a uniform distribution. I do, however, use predefined distributions for the priors.

    Why do I want to do this? This will be a template for the day when I have to use a distribution that is not predefined in Stan; the actuar package, for example, has some interesting candidates.
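The core idea can be illustrated in plain R as well: a hand-coded Normal log density (the analogue of what one would write in Stan's functions block) should agree with the built-in dnorm. A minimal sketch:

```r
# Hand-coded log probability density of the Normal distribution,
# mirroring what one would define by hand in a Stan functions block:
normal_lpdf <- function(y, mu, sigma) {
  -log(sigma) - 0.5 * log(2 * pi) - 0.5 * ((y - mu) / sigma)^2
}

y <- c(1.5, 4.0, 6.5)
# Check against R's built-in density on the log scale:
all.equal(normal_lpdf(y, mu = 4, sigma = 2),
          dnorm(y, mean = 4, sd = 2, log = TRUE))
# [1] TRUE
```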

    Testing

    I start off by generating fake data: a sample of 100 random numbers drawn from a Normal distribution with a mean of 4 and a standard deviation of 2. Note that the sample mean of the 100 figures is 4.2 and not 4.
    Histogram of 100 random numbers drawn from N(4,2).
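The data generation step looks like this (the seed here is arbitrary, so the sample mean will not reproduce the 4.2 quoted above exactly):

```r
set.seed(101)  # arbitrary seed for reproducibility
y <- rnorm(100, mean = 4, sd = 2)
mean(y)        # the sample mean will differ slightly from the true mean of 4
hist(y, main = "Histogram of 100 random numbers drawn from N(4,2)")
```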
    I then use the Stan script to fit the data, i.e. to find the parameters \(\mu\) and \(\sigma\), assuming that the data was generated by a Gaussian process.

    Traceplot of 4 chains, including warm-up phase
    Histograms of posterior parameter and predictive samples
    Comparison of the empirical distributions
    The posterior parameter distributions include both \(\mu\) and \(\sigma\) in the 95% credible interval. The distribution of the posterior predictive check (y_ppc) is wider, taking into account the uncertainty of the parameters. The interquartile range and mean of my initial fake data and of the sample from the posterior predictive distribution look very similar. That's good: my model generates data that looks like the original data.

    Bayesian Mixer Meetup

    Btw, tonight we have the 4th Bayesian Mixer Meetup in London.

    Session Info

    R version 3.3.1 (2016-06-21)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.12 (Sierra)

    locale:
    [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] MASS_7.3-45 rstan_2.12.1 StanHeaders_2.12.0 ggplot2_2.1.0

    loaded via a namespace (and not attached):
    [1] Rcpp_0.12.7 codetools_0.2-14 digest_0.6.10 grid_3.3.1
    [5] plyr_1.8.4 gtable_0.2.0 stats4_3.3.1 scales_0.4.0
    [9] labeling_0.3 tools_3.3.1 munsell_0.4.3 inline_0.3.14
    [13] colorspace_1.2-6 gridExtra_2.2.1
    Read more »
  • googleVis 0.6.1 on CRAN
    We released googleVis version 0.6.1 on CRAN last week. The update fixes issues with setting certain options, following the switch from RJSONIO to jsonlite.

    Screen shot of some of the Google Charts
    New to googleVis? The package provides an interface between R and the Google Charts Tools, allowing you to create interactive web charts from R without uploading your data to Google. The charts are displayed by default via the R internal help browser.
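A minimal sketch of the workflow (the data frame and values here are made up; plot() renders the chart via the R internal help browser):

```r
library(googleVis)

# Toy data set for illustration:
df <- data.frame(country = c("US", "GB", "BR"),
                 val1 = c(10, 13, 14),
                 val2 = c(23, 12, 32))

# Build an interactive Google column chart from R:
chart <- gvisColumnChart(df, xvar = "country", yvar = c("val1", "val2"))
plot(chart)  # opens the chart in the browser
```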

    To learn more, see the examples of googleVis charts on CRAN and read the introduction vignette. Read more »
  • Notes from the 4th R in Insurance Conference
    The 4th R in Insurance conference took place at Cass Business School London on 11 July 2016. This one-day conference focused once more on the wide range of applications of R in insurance, actuarial science and beyond. The conference programme covered topics including reserving, pricing, loss modelling, the use of R in a production environment and much more.

    The audience of the conference included both practitioners (c.80%) and academics (c.20%) who are active or interested in the applications of R in Insurance. It was a truly international event with speakers and delegates from Europe, Asia and the Americas. The coffee breaks and conference dinner offered great networking opportunities.

    Mario Wüthrich, ETH Zürich

    In the first plenary session Mario Wüthrich (RiskLab, ETH Zurich) spoke about the (new) challenges in actuarial science. While the fundamentals of analysing data have not changed over the years, the data and technology available have, and with that new challenges have emerged. Yet, as Mario pointed out, insurance is still often concerned with analysing 'little' data, as losses occur rarely. Furthermore, the bigger data sets, often generated by sensors, require careful calibration, monitoring and cleansing. Those new challenges provide opportunities for new research (if data is made available) and for the industry. The R community can provide links between the two. Mario would like to see more and better documentation of R packages, more insurance examples and better handling of big data.

    Thereafter, the programme consisted of a combination of contributed presentations and lightning talks, as well as a panel discussion on how analytics is transforming the insurance business. Adrian Cuc (Verisk), Simon Brickman (Beazley), Roland Schmid (Mirai Solutions) and Markus Gesmann (Vario Partners) discussed the efforts made in bridging between data vendors, consultants and insurers, as well as the challenges of developing collaborative business models that respond to market needs.

    Dan Murphy, Trinostics

    In the closing plenary, Dan Murphy (Trinostics, San Francisco) gave an insight into his experience as an actuary of how to provide persuasive advice to senior management. He uses the three C's: context, confidence and clarity. Context is about articulating the problem in a language senior management can understand. Why does management need to worry about the problem? If you have a solution, then you have to deliver it with conviction, because, most importantly, it has to be actionable. Clarity of your actionable insight ensures that those actions can be delegated by management to the relevant team or employee without you in the room.

    The slides of the conference are available on request.

    Scientific committee and sponsors

    The members of the scientific committee were: Katrien Antonio (KU Leuven, UvA), Christophe Dutang (Université du Maine), Markus Gesmann (Vario Partners), Giorgio Spedicato (UnipolSai) and Andreas Tsanakas (Cass Business School).

    Finally, we are grateful to our sponsors Verisk, Mirai Solutions, Applied AI, RStudio, CYBAEA and Oasis, without whom the event wouldn't be possible.

    R in Insurance 2017

    We are delighted to announce next year’s event already. The conference will travel across the Channel to ENSAE, Paris, 8 June 2017. Further details will be published on www.rininsurance.com. Read more »
  • Notes from the Kölner R meeting, 9 July 2016
    Last Thursday the Cologne R user group came together again. This time, our two speakers arrived from Bavaria, to talk about Spark and R Server.

    Introduction to Apache Spark

    Download slides
    Dubravko Dulic gave an introduction to Apache Spark and why Spark might be of interest to data scientists using R. Spark is designed for cluster computing, i.e. to distribute jobs across several computers. Not all tasks in R can be split easily across several nodes in a cluster, but if you use functions like by in R, then it is most likely doable. The by function in R splits a data set into several subsets, applies a specific function to each subgroup and collects the results at the end. In the world of Hadoop, this is called MapReduce. Spark has an advanced DAG (directed acyclic graph) execution engine that supports cyclic data flow and in-memory computing. Additionally, Spark has a direct API for R, which makes it relatively easy to write applications with Spark.
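The split-apply-combine pattern that by implements can be sketched with the built-in iris data set:

```r
# by() splits a data set into subsets, applies a function to each
# subgroup, and collects the results -- conceptually a MapReduce job:
res <- by(iris$Sepal.Length, iris$Species, mean)
res
# In Spark the same aggregation would be distributed across the
# nodes of a cluster rather than run on a single machine.
```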

    Microsoft R Server

    Download slides
    Since the acquisition of Revolution Analytics in 2015, Microsoft has been busy integrating R into its product offerings. Stefan Cronjaeger gave an overview of how R can be integrated into a production environment. Microsoft R Server aims to solve the problem of doing 'big data' analytics with R, allowing users to carry out in-memory and disk-based data analysis. Additional new tools include ScaleR for big data and parallelised analytics, ConnectR for connecting to various other data sources, and DistributedR for grid computing. Finally, Stefan showed us how Visual Studio can be used as an R development environment, similar to RStudio.

    Next Kölner R meeting

    The next meeting will be scheduled in about three months time. Details will be published on our Meetup site. Thanks again to Microsoft for their support.

    Please get in touch, if you would like to present at the next meeting. Read more »
  • Notes from 3rd and 3.5th Bayesian Mixer Meetup
    Two Bayesian Mixer meet-ups in a row. Can it get any better?

    Our third 'regular' meeting took place at Cass Business School on 24 June. Big thanks to Pietro and Andreas, who supported us from Cass. The next day, Jon Sedar of Applied AI managed to arrange a special summer PyMC3 event.

    3rd Bayesian Mixer meet-up

    First up was Luis Usier, who talked about cross-validation. Luis is a former student of Andrew Gelman, so, of course, his talk touched on Stan and the 'loo' (leave one out) package in R. Luis started with a simple artificial example that aimed to predict the probability of a goalkeeper saving a shot on target. Adding a hierarchical structure to the model and treating the variance as a random variable resulted in a pathological posterior distribution, which makes sampling next to impossible. Instead, fitting different models, with different fixed parameters, then allows the user to compare the models via cross-validation using the 'loo' function. Clever! I need to learn more about this. Luis' slides are available here and the underlying source code on GitHub.

    Luis Usier talking about cross-validation in R and Stan

    We were lucky to have Robert Cowell talking to us, in what was his final week at Cass. Robert has been very much at the forefront of Bayesian development over the last 30 years. He is one of the co-authors of Probabilistic Networks and Expert Systems. Robert gave an insightful talk on probabilistic models for analysing mixed DNA traces. For illustration purposes, he used a crime case where a man was killed in a pub and blood traces were used to help identify the murderer - turning statistics into a thriller.

    Following those two stimulating talks, we had a few networking drinks at the Artillery Arms. But not too many, as the next day continued with another Bayesian event.

    3.5th Meetup: PyMC3 summer special

    We had a rare opportunity to gather together a few of the core contributors of the PyMC3 package for a talks & hack session. PyMC3 is a leading framework for probabilistic programming, written entirely in Python with a Theano backend. It supports the NUTS sampler, variational inference and lots of useful functionality - an alternative to Stan.

    We had two core contributors with us: Chris Fonnesbeck (usually in Nashville, USA) and Thomas Wiecki (online from Düsseldorf, Germany), plus other package contributors.

    Chris Fonnesbeck talking about PyMC3

    On Saturday morning Chris gave an overview of PyMC3, followed by a detailed talk by Thomas on Bayesian Deep Learning. The afternoon was spent hacking away together on different problems. I was new to PyMC3, so I went through the tutorial on Probabilistic Programming using PyMC3, which Chris had given at a workshop in Oslo.

    Many thanks to all who helped to make these events such a success and especially to Chris, Thomas, Luis, Robert, Andreas, Pietro and Jon.

    If you have ideas for a future event, then please get in touch and visit our Meetup page. Read more »
  • Early bird registration for R in Insurance closes 30 May

    Hurry! The early bird registration offer for the 4th R in Insurance conference, 11 July 2016, at Cass Business School closes 30 May.

    This one-day conference will focus once more on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. Topics covered include reserving, pricing, loss modelling, the use of R in a production environment, and more.

    We have a fantastic programme with international speakers and conference dinner at Ironmongers Hall. Keynotes will be given by Mario Wüthrich and Dan Murphy.

    The organisers gratefully acknowledge the sponsorship of Verisk, Mirai Solutions, Applied AI, RStudio, CYBAEA and Oasis, without whom the event wouldn't be possible.

    Read more »
  • R in Insurance 2016 Programme

    We are delighted to announce that the programme for the 4th R in Insurance conference at Cass Business School in London, 11 July 2016, has been finalised.

    Register by the end of May to get the early bird booking fee.

    The organisers gratefully acknowledge the sponsorship of Verisk, Mirai Solutions, Applied AI, RStudio, CYBAEA and Oasis, without whom the event wouldn't be possible.

    Read more »
  • New R package to access World Bank data
    Staying on top of new CRAN packages is quite a challenge nowadays. However, thanks to Dirk's CRANberries service I occasionally spot a new gem, such as wbstats, which appeared on CRAN last week.

    Similarly to the WDI package, wbstats offers an interface to the World Bank database.

    With the functions of wbstats, the World Bank data can be searched and data for several indicators can be requested. Unlike WDI, the data is returned in a 'long' table with one column for all values and a separate column for the indicators. Additionally, the function wb allows me to specify how many of the most recent values (mrv) I am interested in.
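The 'long' versus 'wide' distinction can be sketched with a toy table (the column names and values here are made up for illustration, not the exact wbstats output):

```r
# A 'long' table: one value column, with the indicator as a key column,
# as wbstats returns it:
long <- data.frame(
  country   = c("DE", "DE", "FR", "FR"),
  indicator = c("fertility", "life_exp", "fertility", "life_exp"),
  value     = c(1.5, 81.0, 2.0, 82.5)
)

# Reshape into the 'wide' layout (one column per indicator),
# which is how WDI returns the data:
wide <- reshape(long, idvar = "country", timevar = "indicator",
                direction = "wide")
wide
#   country value.fertility value.life_exp
# 1      DE             1.5           81.0
# 3      FR             2.0           82.5
```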

    Thus, to recreate the famous Gapminder chart by Hans Rosling, showing the correlation between fertility, i.e. the number of children per woman, and life expectancy over time by country and region, I can write (note that a Flash player is required):

    If you'd like to learn more about how to create interactive charts with googleVis, then check out the free tutorial on DataCamp.

    Session Info

    R version 3.2.4 (2016-03-10)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.11.4 (El Capitan)

    locale:
    [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets
    [6] methods base

    other attached packages:
    [1] googleVis_0.5.10 data.table_1.9.6 wbstats_0.1

    loaded via a namespace (and not attached):
    [1] httr_1.1.0 R6_2.1.2 rsconnect_0.4.2.1
    [4] tools_3.2.4 curl_0.9.7 RJSONIO_1.3-0
    [7] jsonlite_0.9.19 chron_2.3-47
    Read more »
  • Notes from 2nd Bayesian Mixer Meetup

    Last Friday the 2nd Bayesian Mixer Meetup (@BayesianMixer) took place at Cass Business School, thanks to Pietro Millossovich and Andreas Tsanakas, who helped to organise the event.
    Bayesian Mixer at Cass

    First up was Davide De March talking about the challenges in biochemistry experimentation, which are often characterised by complex and emerging relations among components.

    The very limited prior knowledge about complex molecular bindings leaves a fertile field for a probabilistic graphical model. In particular, Bayesian networks can help the investigator in the definition of a conditional dependence/independence structure from which a joint multivariate probability distribution is determined. Hence, the use of Bayesian networks can lead to a more efficient way of designing experiments.


    Davide De March: Bayesian Networks to design optimal experiments

    The second act of the night was Mick Cooney, presenting ideas on using growth curves to estimate the ultimate amounts paid in insurance for a cohort of policies.

    The talk showed a model for these curves, discussed the implementation in Stan and how posterior predictive checks can be used to assess the output of the model.

    Mick Cooney: Bayesian Modelling for Loss Curves in Insurance

    Thanks again to everyone who helped to make the event a success, particularly our speakers and Jon Sedar of Applied AI.

    We are planning to run another event in mid-June. Please get in touch via our Meetup site with ideas and talk proposals. Read more »
  • R in Insurance: Abstract submission closes end of March

    Hurry! The abstract submission deadline for the 4th R in Insurance conference in London, 11 July 2016 is approaching soon.

    You have until the 28th of March to submit a one-page abstract for consideration. Both academic and practitioner proposals related to R are encouraged. Please email your abstract of no more than 300 words (in text or pdf format) to rinsuranceconference@gmail.com.

    Invited talks will be given by:
    Details about the registration and abstract submission are given on the dedicated R in Insurance page at Cass Business School, London.

    Attendance of the whole conference is the equivalent of 6.5 hours of CPD for members of the Actuarial Profession.

    For more information about the past events visit www.rininsurance.com.

    Sponsors

    The organisers gratefully acknowledge the sponsorship of Verisk/ISO, Mirai Solutions, RStudio, Applied AI, CYBAEA, and OASIS Loss Modelling Framework.

    Gold Sponsors


    ISO2

    Mirai Logo

    Silver Sponsors


    RStudio

    Applied AI

    Cybaea

    OASIS Loss Modelling Framework
    Read more »
  • Notes from the Kölner R meeting, 26 February 2016
    Last Friday the Cologne R user group came together for the 17th time. This time, we were in for a special treatment, with two talks by psychologists!

    But, there was nothing to fear, we were in safe hands, and for the first time, we met at the new Microsoft office in Cologne.

    Lecture room at Microsoft, Cologne

    First up was Meik Michalke from the University of Düsseldorf presenting the RKWard project. RKWard is a graphical user interface and integrated development environment for statistical analysis with R - fully featured, extendable and available on all platforms. Furthermore, as Meik demonstrated, it is very straightforward to build new plugins for RKWard. These plugins can extend the user interface, which is great if you build tools for people who are less familiar with R, but perhaps more with SPSS. Meik is one of the developers of RKWard, and he uses it to run analyses, develop packages and teach statistics.

    Download slides

    Next up was Paul-Christian Bürkner from the University of Münster, presenting an overview of his brms package. The name is short for Bayesian regression models with Stan. Although the package is still less than one-year-old, it is already quite mature, allowing the user to specify regression models in the usual R formula syntax. brms takes those formula calls, writes out the Stan code, compiles and runs the model, and it also provides methods to plot and predict brms models. Hence, it is a great way to get started with Stan and to build more complex Bayesian models.
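A quick way to see this in action, without compiling or sampling, is brms' make_stancode function, here sketched with the inhaler data set that ships with the package (assumes brms is installed):

```r
library(brms)

# make_stancode() writes out the Stan program implied by an R formula,
# without compiling or running the model. The inhaler data set and the
# variables rating, treat, period and subject ship with brms:
code <- make_stancode(rating ~ treat + period + (1 | subject),
                      data = inhaler, family = cumulative())
cat(code)  # inspect the generated Stan code
```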

    Download slides

    Following the talks, there was still plenty of time for questions and networking. Microsoft provided us with a great venue and enough drinks to keep us going until finally our stomachs asked for food and, dare I say it, Kölsch. As a result some of us ended up in Rheinau, a nice gastropub around the corner.

    Next Kölner R meeting

    The next meeting will be scheduled in about three months time. Details will be published on our Meetup site. Thanks again to Microsoft for their support.

    Please get in touch, if you would like to present at the next meeting. Read more »
  • Next Kölner R User Meeting: Friday, 26 February 2016
    Koeln R
    The 17th Cologne R user group meeting is scheduled for this Friday, 26 February 2016. We have two talks, followed by networking drinks.

    • Introduction to Bayesian Regression Models using Stan with the brms package - Paul-Christian Bürkner (Uni Münster)
    • RKWard: A Graphical User Interface and Integrated Development Environment for Statistical Analysis with R - Meik Michalke (Uni Düsseldorf)
    Venue: Microsoft Deutschland, Holzmarkt 2a, 50676 Köln

    For further details visit our KölnRUG Meetup site. Unfortunately, this event is already fully booked, but please sign up if you would like to come along to future events.

    Notes from past meetings are available here.

    Read more »
  • Bayesian Mixer on Meetup
    We had our first successful Bayesian Mixer Meetup last Friday night at the Artillery Arms!

    We expected about 15 - 20 people to turn up when we booked the function room overlooking Bunhill Cemetery and Bayes' grave. Now, looking at the photos taken during the evening, it seems that our prior belief was pretty good.


    The event started with a talk from my side about some very basic Bayesian models, which I used a while back to get my head around the concepts in an insurance context. My talk "Experience vs Data" was based on presentations I had given last year at LondonR and the Warsaw R user group.

    Jon Sedar followed with a fascinating talk about outlier detection using PyMC3.

    Suppose, you have a bunch of data points, most of them centred, but with some further away. How do you decide if they are outliers, or not?

    This question sounds very relevant to me in the insurance context as well. I have heard stories of underwriters telling me that certain years or events (meaning costly losses) were freaks and should be disregarded; in other words, without those losses the underwriter would have made a huge profit. I am not sure I buy those arguments, as they undermine the fundamental business proposition of insurance: to pay when policyholders experience 'freak' events. But I am getting on my soapbox, which I shouldn't.

    We had a good night, very good discussions and some drinks. As a result Jon and I are committed to organise another event.

    Jon has already set up a Meetup page, so please register online and get in touch with ideas, venues, talks, etc.

    Slides/Files

    Read more »
  • Using SVG graphics in blog posts
    My traditional workflow for embedding R graphics into a blog post has been via PNG files that I upload online. However, when I created a 'simple' graphic with only basic curves and triangles for a recent post, I noticed that the PNG output didn't look as crisp as I expected. So, eventually, I used an SVG (scalable vector graphic) instead.

    Creating an SVG file with R couldn't be easier; e.g. use the svg() function in the same way as png(). Next, make the file available online and embed it into your page. There are many ways to do this; in the example here I placed the file into a public GitHub repository.

    To embed the figure into my page I could use either the traditional <img> tag, or perhaps better the <object> tag. Paul Murrell provides further details on his blog.

    With <object> my code looks like this:
    <object data="https://rawgithub.com/mages/diesunddas/master/Blog/transitionPlot.svg" type="image/svg+xml" width="400"> </object>

    There is a little trick required to display a graphic file hosted on GitHub.

    By default, when I look for the raw URL, GitHub will provide an address starting with https://raw.githubusercontent.com/..., which needs to be replaced with https://rawgithub.com/....
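The substitution is a one-liner in R:

```r
# Replace the default raw GitHub domain with the rawgithub.com proxy,
# which serves the file with the correct SVG content type:
url <- "https://raw.githubusercontent.com/mages/diesunddas/master/Blog/transitionPlot.svg"
fixed <- sub("raw\\.githubusercontent\\.com", "rawgithub.com", url)
fixed
# [1] "https://rawgithub.com/mages/diesunddas/master/Blog/transitionPlot.svg"
```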

    Ok, let's look at the output. As a nice example plot I use a transitionPlot by Max Gordon, something I wanted to do for a long time.

    SVG output

    PNG output


    Conclusions

    The SVG output is nice and crisp! Zoom in and the quality will not change. The PNG graphic, on the other hand, appears a little blurry on my screen and even the colours look washed out. Of course, the PNG output could be improved by fiddling with the parameters. But, after all, it is a raster graphic.

    Yet, I don't think that SVG is always a good answer. The file size of an SVG file can grow quite quickly if there are many points to be plotted. As an example, check the difference in file size for two identical plots with 10,000 points.
    x <- rnorm(10000)
    png()
    plot(x)
    dev.off()
    file.size("Rplot001.png")/1000
    # [1] 118.071
    svg()
    plot(x)
    dev.off()
    file.size("Rplot001.svg")/1000
    # [1] 3099.181
    That's 3.1 MB vs 118 kB, a factor of 26! Even compressed to a .svgz file, the SVG file is still 317 kB.

    Update 10 Feb 2016

    Or, is SVG the answer? Kenton pointed me towards the svglite package.
    library(svglite)
    svglite(file = "Rplot001.svg")
    plot(x)
    dev.off()
    file.size("Rplot001.svg")/1000
    # [1] 973.619
    gz <- function(in_path, out_path = tempfile()) {
      out <- gzfile(out_path, "w")
      writeLines(readLines(in_path), out)
      close(out)
      invisible(out_path)
    }
    file.size(gz("Rplot001.svg", "Rplot001.svgz")) / 1000
    #> [1] 74.11

    R code


    Session Info

    R version 3.2.3 (2015-12-10)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.11.3 (El Capitan)

    locale:
    [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

    attached base packages:
    [1] grid stats graphics grDevices utils datasets
    [7] methods base

    other attached packages:
    [1] RColorBrewer_1.1-2 Gmisc_1.3 htmlTable_1.5
    [4] Rcpp_0.12.3

    loaded via a namespace (and not attached):
    [1] Formula_1.2-1 knitr_1.12.3
    [3] cluster_2.0.3 magrittr_1.5
    [5] splines_3.2.3 munsell_0.4.2
    [7] colorspace_1.2-6 lattice_0.20-33
    [9] stringr_1.0.0 plyr_1.8.3
    [11] tools_3.2.3 nnet_7.3-12
    [13] gtable_0.1.2 latticeExtra_0.6-26
    [15] htmltools_0.3 digest_0.6.9
    [17] forestplot_1.4 survival_2.38-3
    [19] abind_1.4-3 gridExtra_2.0.0
    [21] ggplot2_2.0.0 acepack_1.3-3.3
    [23] rsconnect_0.3.79 rpart_4.1-10
    [25] rmarkdown_0.9.2 stringi_1.0-1
    [27] scales_0.3.0 Hmisc_3.17-1
    [29] XML_3.98-1.3 foreign_0.8-66
    Read more »
  • First Bayesian Mixer Meeting in London
    There is a nice pub between Bunhill Fields and the Royal Statistical Society in London: The Artillery Arms. Clearly, the perfect place to bring people together to talk about Bayesian Statistics. Well, that’s what Jon Sedar (@jonsedar, applied.ai) and I thought.

    Source: http://www.artillery-arms.co.uk/
    Hence, we’d like to organise a Bayesian Mixer Meetup on Friday, 12 February, 19:00. We booked the upstairs function room at the Artillery Arms and if you look outside the window, you can see Thomas Bayes’ grave.

    We intend the group to be small (announcing only on the stan user group, pymc-devs gitter, and here for now) and geared to open discussion of Bayesian inference, tools, techniques and theory. Neither of us is a great expert; we're really just users of the tools, but we'd love to welcome academic discussion as well as real-world examples.

    Jon is more the Python/PyMC guy, while I come from the R/Rstan corner. We will prepare two talks to kick this off. Jon will talk about GLM Robust Regression with Outlier Detection using PyMC3, while I will talk about Experience vs Data with some stories from insurance and actuarial science, sprinkled with RStan examples.

    If you would like to join us, please get in touch via the form below, so that we can keep tabs on numbers, and if this goes all well we shall set up a Meetup site.

    Read more »
  • Flowing triangles
    I have admired the work of the artist Bridget Riley for a long time. She is now in her eighties, but as it seems still very creative and productive. Some of her recent work combines simple triangles in fascinating compositions. The longer I look at them, the more patterns I recognise.

    Yet, the actual painting can be described easily, in the sense of a specification document that would reproduce the pattern precisely. However, seeing the real print, as I had the chance to at the London Art Fair last week, is incomparable to viewing a reproduction on screen.

    Having said that, I could not resist programming a figure that resembles the artwork labelled Bagatelle 2. Well, at least I can say that I learned more about grid [1], grid.path [2] and gridSVG [3] in R.
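A minimal grid sketch of the idea, two flat triangles drawn with grid.path (the coordinates are invented; the actual composition is far richer):

```r
library(grid)

grid.newpage()
## Two small triangles in the spirit of the print; each call draws one
## closed path, filled in a flat grey without an outline
grid.path(x = c(0.3, 0.5, 0.4), y = c(0.2, 0.2, 0.4),
          gp = gpar(fill = "grey20", col = NA))
grid.path(x = c(0.5, 0.7, 0.6), y = c(0.4, 0.4, 0.2),
          gp = gpar(fill = "grey70", col = NA))
```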

    Inspired by Bridget Riley Bagatelle 2

    R Code


    References

    [1] P. Murrell. R Graphics, Second Edition. CRC Press. 2011
    [2] P. Murrell. What's in a Name? The R Journal, 4(2):5–12, December 2012.
    [3] P. Murrell and S. Potter. gridSVG: Export grid graphics as SVG. R package 1.5-0. 2015

    Session Info

    R version 3.2.3 (2015-12-10)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.11.2 (El Capitan)

    locale:
    [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

    attached base packages:
    [1] grid stats graphics grDevices utils datasets
    [7] methods base

    other attached packages:
    [1] gridSVG_1.5-0 data.table_1.9.6

    loaded via a namespace (and not attached):
    [1] tools_3.2.3 RJSONIO_1.3-0 chron_2.3-47 XML_3.98-1.3
    Read more »
  • Formatting table output in R
    Formatting data for output in a table can be a bit of a pain in R. The package formattable by Kun Ren and Kenton Russell provides some intuitive functions to create good looking tables for the R console or HTML quickly. The package home page demonstrates the functions with illustrative examples nicely.

    There are a few points I really like:
    • the functions accounting, currency and percent transform numbers into more human-readable output
    • cells can be highlighted by adding color information
    • contextual icons can be added, e.g. from Glyphicons
    • output can be displayed in RStudio's viewer pane

    The CRAN Task View: Reproducible Research lists other packages as well that help to create tables for web output, such as compareGroups, DT, htmlTable, HTMLUtils, hwriter, Kmisc, knitr, lazyWeave, SortableHTMLTables, texreg and ztable. Yet, if I am not mistaken, most of these packages focus more on generating complex tables with multi-column headers, footnotes, math notation, etc., than on the points I mentioned above.

    Finally, here is a little formattable example from my side:
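The embedded widget isn't reproduced in this feed, but a small sketch along those lines might be (the data frame is invented):

```r
library(formattable)

## Human-readable number formats
percent(0.123)     # formats as a percentage
currency(1234.5)   # formats as a currency value
accounting(-1000)  # accounting style: negatives in parentheses

## Highlighting cells with colour information
df <- data.frame(region = c("North", "South"),
                 growth = c(0.05, 0.12))
formattable(df, list(growth = color_tile("white", "orange")))
```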



    Session Info

    R version 3.2.3 (2015-12-10)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.11.2 (El Capitan)

    locale:
    [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods
    [7] base

    other attached packages:
    [1] formattable_0.1.5

    loaded via a namespace (and not attached):
    [1] shiny_0.12.2.9006 htmlwidgets_0.5.1 R6_2.1.1
    [4] rsconnect_0.3.79 markdown_0.7.7 htmltools_0.3
    [7] tools_3.2.3 yaml_2.1.13 Rcpp_0.12.2
    [10] highr_0.5.1 knitr_1.12 jsonlite_0.9.19
    [13] digest_0.6.9 xtable_1.8-0 httpuv_1.3.3
    [16] mime_0.4
    Read more »
  • R in Insurance: Registration and abstract submission opened
    Following the successful 3rd R in Insurance conference in Amsterdam last year, we return to London this year.

    The registration for the 4th conference on R in Insurance on Monday 11 July 2016 at Cass Business School has opened.

    This one-day conference will focus again on applications in insurance and actuarial science that use R, the lingua franca for statistical computation.

    The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in insurance.

    Invited talks will be given by:
    Details about the registration and abstract submission are given on the dedicated R in Insurance page at Cass Business School, London.

    The submission deadline for abstracts is 28 March 2016. Please email your abstract of no more than 300 words to: rinsuranceconference@gmail.com.

    Attendance of the whole conference is the equivalent of 6.5 hours of CPD for members of the Actuarial Profession.

    For more information about the past events visit www.rininsurance.com.

    Sponsors

    The organisers gratefully acknowledge the sponsorship of Verisk/ISO, Mirai Solutions, RStudio, Applied AI, and CYBAEA.

    Gold Sponsors

    ISO2
    Mirai Logo

    Silver Sponsors

    RStudio
    AAI
    Cybaea Read more »
  • Next Kölner R User Meeting: Friday, 4 December 2015
    Koeln R
    The 16th Cologne R user group meeting is scheduled for this Friday, 4 December 2015, and we have a great line-up with three talks, followed by networking drinks.

    Venue: Startplatz, Im Mediapark 5, 50670 Köln

    Drinks and Networking

    The event will be followed by drinks (Kölsch!) and networking opportunities.

    For further details visit our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from past meetings are available here.

    The organisers, Kirill Pomogajko and Markus Gesmann, gratefully acknowledge the sponsorship of Revolution Analytics, who support the Cologne R user group as part of their Matrix programme.

    Read more »
  • Notes from Warsaw R meetup
    I had the great pleasure of attending the Warsaw R meetup last Thursday. The organisers Olga Mierzwa and Przemyslaw Biecek had put together an event with a focus on R in Insurance (btw, there is a conference with the same name), discussing examples of pricing and reserving in general and life insurance.

    Experience vs. Data

    I kicked off with some observations of the challenges in insurance pricing. Accidents are thankfully rare events; that's why we buy insurance. Hence, there is often not a lot of claims data available for pricing. Combining the information from historical data with experts' domain knowledge can provide a rich basis for the assessment of risk. I presented some examples using Bayesian analysis to understand the probability of an event occurring. Regular readers of my blog will recognise the examples from earlier posts. You can find my slides on GitHub.
    Download slides

    Non-life insurance in R

    Emilia Kalarus from Triple A shared some of her experience of using R in non-life insurance companies. She focused on the challenges in working across teams, with different systems, data sets and mentalities.

    As an example, Emilia talked about the claims reserving process, which in her view should be embedded in the full life cycle of insurance, namely product development, claims, risk and performance management. Following this thought, she presented an idea for claims reserving that models the life of a claim from not incurred and not reported (NINR), to incurred but not reported (IBNR), reported but not settled (RBNS) and finally paid.

    Stochastic mortality modelling

    The final talk was given by Adam Wróbel from the life insurer Nationale Nederlanden, discussing stochastic mortality modelling. Adam's talk on analysing mortality made me realise that life and non-life insurance companies may be much closer to each other than I thought.

    Although life and non-life companies are usually separated for regulatory reasons, they both share the fundamental challenge of predicting future cash flows. An example where the two industries meet is product liability.

    Over the last century, technology has changed our environment fundamentally, more so than ever before. Yet, we still don't know which long-term impact some of the new technologies and products will have on our life expectancy. Some will prolong our lives, others may make us ill.

    A classic example is asbestos, initially regarded as a miracle mineral: impossible to set on fire, abundant, cheap to mine, and easy to process. Not surprisingly, it was widely used until it was linked to cancer. Over the last 35 years, the non-life insurance industry has paid well in excess of a hundred billion dollars in compensation.

    Slides and Code

    The slides and R code of the presentations are hosted on the Warsaw R GitHub page. Read more »
  • Hierarchical Loss Reserving with Stan
    I continue with the growth curve model for loss reserving from last week's post. Today, following the ideas of James Guszcza [2], I will add a hierarchical component to the model by treating the ultimate loss cost of an accident year as a random effect. Initially, I will use the nlme R package, just as James did in his paper, and then move on to Stan/RStan [6], which will allow me to estimate the full distribution of future claims payments.

    Last week's model assumed that cumulative claims payments could be described by a growth curve. I used the Weibull curve and will do so here again, but others should be considered as well, e.g. the log-logistic cumulative distribution function for long-tail business, see [1].
    The growth curve describes the proportion of claims paid up to a given development period compared to the ultimate claims cost at the end of time, hence it is often called the development pattern. Cumulative distribution functions are often considered, as they increase monotonically from 0 to 100%. Multiplying the development pattern by the expected ultimate loss cost then gives the expected cumulative paid-to-date value.

    However, what I'd like to do is the opposite: I know the cumulative claims position to date and wish to estimate the ultimate claims cost instead. If the claims process is fairly stable over the years (say, once a claim has been notified, the payment process is quite similar from year to year and claim to claim), then a growth curve model is not unreasonable. Yet, the number and size of the yearly claims will be random, depending, for example, on whether a windstorm or fire occurs. Hence, a random effect for the ultimate loss cost across accident years sounds very convincing to me.

    Here is James' model as described in [2]:
    \[
    \begin{align}
    CL_{AY, dev} & \sim \mathcal{N}(\mu_{AY, dev}, \sigma^2_{AY, dev}) \\
    \mu_{AY, dev} & = Ult_{AY} \cdot G(dev|\omega, \theta)\\
    \sigma_{AY, dev} & = \sigma \sqrt{\mu_{AY, dev}}\\
    Ult_{AY} & \sim \mathcal{N}(\mu_{ult}, \sigma^2_{ult})\\
    G(dev|\omega, \theta) & = 1 - \exp\left(-\left(\frac{dev}{\theta}\right)^\omega\right)
    \end{align}
    \]
    The cumulative losses \(CL_{AY, dev}\) for a given accident year \(AY\) and development period \(dev\) follow a Normal distribution with parameters \(\mu_{AY, dev}\) and \(\sigma_{AY, dev}\).

    The mean itself is modelled as the product of an accident year specific ultimate loss cost \(Ult_{AY}\) and a development period specific parametric growth curve \(G(dev | \omega, \theta)\). The variance is believed to increase in proportion with the mean. Finally, the ultimate loss cost is modelled with a Normal distribution as well.

    Assuming a Gaussian distribution of losses doesn't sound quite intuitive to me, as losses are often skewed to the right, but I shall continue with this assumption here to make a comparison with [2] possible.

    Using the example data set given in the paper I can reproduce the result in R with nlme:
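The embedded code isn't shown in this feed; as a sketch, assuming a data frame dat with columns cum (cumulative paid losses), dev (development year) and origin (accident year), and purely illustrative starting values, the nlme call might look like:

```r
library(nlme)

## Weibull growth curve with a random ultimate loss per accident year;
## varPower(fixed = 0.5) makes the variance proportional to the mean,
## matching the model equations above. Column names and start values
## are assumptions, not taken from the original post.
fit <- nlme(cum ~ ult * (1 - exp(-(dev / theta)^omega)),
            data    = dat,
            fixed   = ult + omega + theta ~ 1,
            random  = ult ~ 1 | origin,
            weights = varPower(fixed = 0.5),
            start   = c(ult = 5000, omega = 1.3, theta = 2))
summary(fit)
```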

    The fit looks pretty good, with only 5 parameters. See James' paper for a more detailed discussion.

    Let's move this model into Stan. Here is my attempt, which builds on last week's pooled model. With the generated quantities code block I go beyond the scope of the original paper, as I try to estimate the full posterior predictive distribution as well.
    The 'trick' is the line mu[i] <- ult[origin[i]] * weibull_cdf(dev[i], omega, theta); where I have an accident year (here labelled origin) specific ultimate loss.

    The notation ult[origin[i]] illustrates the hierarchical nature in Stan's language nicely.

    Let's run the model:
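As a sketch of this step (the file name and the data list entries are illustrative and would have to match the Stan program's data block):

```r
library(rstan)

## Assemble the data for Stan; 'dat' is the same data frame as before,
## with names mapped to whatever the Stan data block declares.
stanLossData <- list(N = nrow(dat),
                     cum = dat$cum,
                     dev = dat$dev,
                     origin = as.integer(factor(dat$origin)),
                     n_origin = length(unique(dat$origin)))

fit <- stan(file = "hierarchical_growth.stan",
            data = stanLossData,
            iter = 2000, chains = 4)
print(fit, pars = c("ult", "omega", "theta", "sigma"))
```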
    The estimated parameters look very similar to the nlme output above.

    Let's take a look at the parameter traceplot and the densities of the estimated ultimate loss costs by origin year.
    This all looks not too bad. The trace plots don't show any particular patterns, apart from \(\sigma_{ult}\), which shows a little skewness.

    The generated quantities code block in Stan allows me to get the predictive distribution beyond the current data range as well. Here I forecast claims up to development year 12 and plot the predictions, including the 95% credible interval of the posterior predictive distribution, together with the observations.
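Pulling those forecasts out of the fitted object is straightforward with rstan::extract; Y_pred below is a hypothetical name that would have to match the variable declared in the generated quantities block:

```r
## Posterior predictive summaries from the stanfit object 'fit';
## rows are posterior draws, columns are forecast cells
post <- rstan::extract(fit, pars = "Y_pred")$Y_pred
pred_mean <- apply(post, 2, mean)
pred_cred <- apply(post, 2, quantile, probs = c(0.025, 0.975))
```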

    The model seems to work rather well, even with the Gaussian distribution assumptions. Yet, it has still only 5 parameters. Note, this model doesn't need an additional artificial tail factor either.

    Conclusions

    The Bayesian approach sounds a lot more natural to me than many of the classical techniques around the chain-ladder method. Thanks to Stan, I can get the full posterior distributions of both the parameters and the predictive distribution. I find communicating credible intervals much easier than talking about parameter, process and mean squared errors.

    James Guszcza contributed to a follow-up paper with Y. Zhang and V. Dukic [3] that extends the model described in [2]. It deals with skewness in loss data sets and the autoregressive nature of the errors in a cumulative time series.

    Frank Schmid offers a more complex Bayesian analysis of claims reserving in [4], while Jake Morris highlights the similarities between a compartmental model used in drug research and loss reserving [5].

    Finally, Glenn Meyers published a monograph on Stochastic Loss Reserving Using Bayesian MCMC Models earlier this year [7] that is worth taking a look at.

    References

    [1] David R. Clark. LDF Curve-Fitting and Stochastic Reserving: A Maximum Likelihood Approach. Casualty Actuarial Society, 2003. CAS Fall Forum.

    [2] James Guszcza. Hierarchical Growth Curve Models for Loss Reserving, 2008, CAS Fall Forum, pp. 146–173.

    [3] Y. Zhang, V. Dukic, and James Guszcza. A Bayesian non-linear model for forecasting insurance loss payments. 2012. Journal of the Royal Statistical Society: Series A (Statistics in Society), 175: 637–656. doi: 10.1111/j.1467-985X.2011.01002.x

    [4] Frank A. Schmid. Robust Loss Development Using MCMC. Available at SSRN. See also http://lossdev.r-forge.r-project.org/

    [5] Jake Morris. Compartmental reserving in R. 2015. R in Insurance Conference.

    [6] Stan Development Team. Stan: A C++ Library for Probability and Sampling, Version 2.8.0. 2015. http://mc-stan.org/.

    [7] Glenn Meyers. Stochastic Loss Reserving Using Bayesian MCMC Models. Issue 1 of CAS Monograph Series. 2015.

    Session Info

    R version 3.2.2 (2015-08-14)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.11.1 (El Capitan)

    locale:
    [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] ChainLadder_0.2.3 rstan_2.8.0 ggplot2_1.0.1 Rcpp_0.12.1
    [5] lattice_0.20-33

    loaded via a namespace (and not attached):
    [1] nloptr_1.0.4 plyr_1.8.3 tools_3.2.2
    [4] digest_0.6.8 lme4_1.1-10 statmod_1.4.21
    [7] gtable_0.1.2 nlme_3.1-122 mgcv_1.8-8
    [10] Matrix_1.2-2 parallel_3.2.2 biglm_0.9-1
    [13] SparseM_1.7 proto_0.3-10 coda_0.18-1
    [16] gridExtra_2.0.0 stringr_1.0.0 MatrixModels_0.4-1
    [19] lmtest_0.9-34 stats4_3.2.2 grid_3.2.2
    [22] nnet_7.3-11 tweedie_2.2.1 inline_0.3.14
    [25] cplm_0.7-4 minqa_1.2.4 actuar_1.1-10
    [28] reshape2_1.4.1 car_2.1-0 magrittr_1.5
    [31] scales_0.3.0 codetools_0.2-14 MASS_7.3-44
    [34] splines_3.2.2 rsconnect_0.3.79 systemfit_1.1-18
    [37] pbkrtest_0.4-2 colorspace_1.2-6 quantreg_5.19
    [40] labeling_0.3 sandwich_2.3-4 stringi_1.0-1
    [43] munsell_0.4.2 zoo_1.7-12
    Read more »

Copyright Use-R.com 2012 - 2016 ©