R Packages and Utilities

Sweave combines typesetting with LaTeX and data analysis with S into integrated statistical documents. When the document is run through R, all data analysis output (tables, graphs, ...) is created on the fly and inserted into the final LaTeX document. The report can be updated automatically whenever the data or the analysis change, which allows for truly reproducible research.

A general framework for k-centroids clustering: the main function kcca supports arbitrary distance/similarity measures and centroid computations. Further clustering methods include hard competitive learning, neural gas and QT clustering.
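The idea behind a k-centroids framework is that the usual k-means loop has exactly two pluggable parts: the distance used to assign points, and the centroid function used to update cluster centers. The Python sketch below illustrates only that idea and is not the kcca interface; all function names (`k_centroids`, `manhattan`, `median_centroid`, and so on) are hypothetical. Pairing Manhattan distance with the coordinate-wise median gives k-medians, while squared Euclidean distance with the mean gives ordinary k-means.

```python
import random
from typing import Callable, List, Sequence

Point = Sequence[float]


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def median_centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise median: minimises the summed Manhattan distance."""
    def median(values):
        s = sorted(values)
        mid = len(s) // 2
        return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2
    return [median(col) for col in zip(*points)]


def sq_euclidean(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean: minimises the summed squared Euclidean distance."""
    return [sum(col) / len(col) for col in zip(*points)]


def k_centroids(
    data: List[Point],
    k: int,
    dist: Callable[[Point, Point], float],
    centroid: Callable[[List[Point]], List[float]],
    max_iter: int = 100,
    seed: int = 0,
):
    """Alternate assignment (via `dist`) and update (via `centroid`) until stable."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]  # k randomly chosen data points as start
    labels: List[int] = []
    for _ in range(max_iter):
        # assignment step: nearest center under the chosen distance
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in data]
        if new_labels == labels:
            break  # partition unchanged, so the centers computed from it are unchanged too
        labels = new_labels
        # update step: recompute each center with the chosen centroid function
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster ran empty
                centers[j] = centroid(members)
    return labels, centers
```

For example, `k_centroids(data, 2, manhattan, median_centroid)` runs k-medians and `k_centroids(data, 2, sq_euclidean, mean_centroid)` runs k-means on the same data, without touching the loop itself.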

FlexMix implements a general framework for finite mixtures of regression models using the EM algorithm. It provides the E-step and all data handling, while the user can supply the M-step to define new models easily. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.
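The split between a generic E-step and a user-supplied M-step can be sketched in a few lines of Python. This is an illustration of the design under stated assumptions, not FlexMix's R interface: the "driver" `LinearGaussian` (a Gaussian linear regression on one covariate) and the loop `em_mixture` are hypothetical names, and initialisation is left to the caller through an initial matrix of posterior probabilities. The loop never looks inside the model; it only calls the driver's `fit` (weighted M-step) and `log_density`.

```python
import math
from typing import List, Sequence


class LinearGaussian:
    """Hypothetical driver: one component y = a + b*x + N(0, sigma^2)."""

    def __init__(self, a: float, b: float, sigma: float):
        self.a, self.b, self.sigma = a, b, sigma

    @classmethod
    def fit(cls, x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> "LinearGaussian":
        """M-step for one component: weighted least squares plus weighted residual variance."""
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx
        a = my - b * mx
        var = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
        return cls(a, b, math.sqrt(max(var, 1e-8)))  # floor avoids a zero variance

    def log_density(self, xi: float, yi: float) -> float:
        z = (yi - self.a - self.b * xi) / self.sigma
        return -0.5 * z * z - math.log(self.sigma) - 0.5 * math.log(2 * math.pi)


def em_mixture(x, y, init: List[List[float]], driver, n_iter: int = 50):
    """Generic EM loop; only driver.fit and log_density know the model."""
    post = [list(row) for row in init]
    k, n = len(post[0]), len(x)
    for _ in range(n_iter):
        # M-step (delegated): one weighted fit per component, plus mixing weights
        comps = [driver.fit(x, y, [row[j] for row in post]) for j in range(k)]
        prior = [sum(row[j] for row in post) / n for j in range(k)]
        # E-step (generic): posterior probabilities, computed via log-sum-exp for stability
        post = []
        for xi, yi in zip(x, y):
            logs = [math.log(prior[j]) + comps[j].log_density(xi, yi) for j in range(k)]
            top = max(logs)
            e = [math.exp(lv - top) for lv in logs]
            s = sum(e)
            post.append([v / s for v in e])
    return comps, prior, post
```

Defining a new model in this scheme means writing another class with `fit` and `log_density`; `em_mixture` stays unchanged.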

The main function archetypes implements a framework for archetypal analysis that supports arbitrary problem-solving mechanisms for the different conceptual parts of the algorithm.

The main function biclust provides several algorithms to find biclusters in two-dimensional data: Cheng and Church, Spectral, Plaid Model, Xmotifs and Bimax. In addition, the package provides methods for data preprocessing (normalization and discretisation), visualisation, and validation of bicluster solutions.
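As an illustration of what one of these algorithms optimises, the Cheng and Church method scores a candidate bicluster (row set I, column set J) by its mean squared residue, H(I, J) = 1/(|I||J|) Σ (a_ij − a_iJ − a_Ij + a_IJ)², where a_iJ, a_Ij and a_IJ are the row, column and overall means of the submatrix. A bicluster with purely additive row and column effects has H = 0. The Python sketch below computes only this score, not the full greedy row/column deletion algorithm, and `mean_squared_residue` is a hypothetical name rather than part of the biclust package.

```python
from typing import List, Sequence


def mean_squared_residue(matrix: Sequence[Sequence[float]], rows: List[int], cols: List[int]) -> float:
    """Cheng-Church score H(I, J) of the submatrix given by `rows` x `cols`."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    nr, nc = len(rows), len(cols)
    row_means = [sum(r) / nc for r in sub]
    col_means = [sum(sub[i][j] for i in range(nr)) / nr for j in range(nc)]
    overall = sum(row_means) / nr
    # residual = entry minus row mean minus column mean plus overall mean
    return sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(nr)
        for j in range(nc)
    ) / (nr * nc)
```

In the matrix `[[1, 2, 3, 9], [2, 3, 4, 0], [5, 6, 7, 1]]`, the first three columns form an additive bicluster (each row is the first row plus a constant), so their score is zero up to rounding, while adding the fourth column raises it sharply.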

Visualize cluster results and investigate additional properties of clusters using interactive neighbourhood graphs. Clicking on the node representing a cluster displays additional graphics or summary statistics for that cluster. For microarray data, a table with links to genetic databases such as the Gene Ontology can be created for each cluster.

Testing, dating and monitoring of structural change in linear regression relationships. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
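The F test (Chow test) idea can be sketched for a simple regression y = a + b·x with k = 2 parameters: for a candidate break after the first s observations, compare the residual sum of squares of one pooled fit with the sum of the two separate fits, F = ((RSS_pooled − RSS_1 − RSS_2)/k) / ((RSS_1 + RSS_2)/(n − 2k)). Computing F for every admissible break and taking the largest value gives a sup-F style search for a single breakpoint. The Python below is a minimal sketch under those assumptions; the function names and the trimming parameter `h` (minimum segment size) are hypothetical, it returns only the statistic and not a p-value, and in R one would use strucchange's own functions (for example `Fstats()` and `breakpoints()`) instead.

```python
from typing import Sequence, Tuple


def ols_rss(x: Sequence[float], y: Sequence[float]) -> float:
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return syy - sxy * sxy / sxx


def chow_f(x: Sequence[float], y: Sequence[float], split: int, k: int = 2) -> float:
    """Chow F statistic for a break between observations split-1 and split."""
    n = len(x)
    rss_pooled = ols_rss(x, y)
    rss_split = ols_rss(x[:split], y[:split]) + ols_rss(x[split:], y[split:])
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))


def sup_f_break(x: Sequence[float], y: Sequence[float], h: int = 3) -> Tuple[int, float]:
    """Scan all breaks leaving at least h observations per segment; return the best one."""
    candidates = range(h, len(x) - h + 1)
    best = max(candidates, key=lambda s: chow_f(x, y, s))
    return best, chow_f(x, y, best)
```

On data whose intercept jumps by 10 at the eleventh observation (index 10), `sup_f_break` picks split 10, because only there do both segment fits have small residuals.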

A collection of functions written by members of my former working group at the Department of Statistics, TU Wien. It includes functions for latent class analysis, short-time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, ...

A collection of artificial and real-world machine learning benchmark problems, including, e.g., several data sets from the UCI repository.

A collection of tools to deal with statistical models.

Functions for import, export, plotting and other manipulations of bitmapped images.

All R packages described above are available from CRAN; most of them are joint work with various other researchers. Sweave is part of the R base distribution.