by PS group

Adelaide Freitas

Are finite mixture models of geometric distributions far from the true model for inter-nucleotide distances?

DNA sequences are non-numerical sequences over the four-letter alphabet A, C, G and T, which stand for the four nucleotides: Adenine, Cytosine, Guanine, and Thymine. Various transformations of DNA sequences into numerical data have been proposed in order to take advantage of statistical methodologies for quantitative data. Desirable numerical transformations capture mathematical properties that are discriminative and sensitive enough to variations in DNA composition and, at the same time, highlight important structural features of DNA sequences. The inter-nucleotide distances (InD) transform any DNA sequence into a unique numerical sequence of the same length, such that each number represents the distance from a nucleotide to the next occurrence of the same nucleotide. If nucleotides were independently placed along the genome, a finite mixture model of geometric distributions could be fitted to the InD. Based on the composite likelihood, the best theoretical approximation to the InD defined by a finite geometric mixture model is obtained. An experimental study carried out on the complete genomes of several species shows that features of the structural complexity of DNA sequences may be captured by that mixture model fitted to the InD.
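As an illustration of the InD transformation described above, the following is a minimal Python sketch, not the speaker's implementation. It assumes the sequence is treated as circular, so that the last occurrence of each nucleotide wraps around to its first occurrence and the output has the same length as the input; the abstract does not state how the boundary is handled. The function names `inter_nucleotide_distances` and `geometric_mixture_pmf` are hypothetical. The second function evaluates the reference model that arises under independent placement with nucleotide probabilities `probs`: each nucleotide's distance is geometric with its own probability, and the pooled distances form a mixture weighted by those same probabilities.

```python
def inter_nucleotide_distances(seq):
    """Distance from each symbol to the next occurrence of the same symbol.

    Assumption: the sequence is circular, so the last occurrence of a
    symbol wraps around to its first occurrence.
    """
    n = len(seq)
    next_pos = {}          # nearest position to the right, per symbol
    dist = [None] * n
    # Scan right to left so next_pos always holds the next occurrence.
    for i in range(n - 1, -1, -1):
        symbol = seq[i]
        if symbol in next_pos:
            dist[i] = next_pos[symbol] - i
        next_pos[symbol] = i
    # After the scan, next_pos holds the FIRST occurrence of each symbol;
    # use it to close the circle for each symbol's last occurrence.
    for i, d in enumerate(dist):
        if d is None:
            dist[i] = next_pos[seq[i]] + n - i
    return dist


def geometric_mixture_pmf(k, probs):
    """P(distance = k) under independent placement (k = 1, 2, ...).

    probs maps each nucleotide to its occurrence probability; the mixture
    weight of each geometric component is that same probability.
    """
    return sum(p * p * (1.0 - p) ** (k - 1) for p in probs.values())


distances = inter_nucleotide_distances("ACGTAACG")
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(distances, geometric_mixture_pmf(1, uniform))
```

Under the circular convention, the distances belonging to any one nucleotide sum to the sequence length, which gives a quick sanity check on the output.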

Pedro Macedo

Technical efficiency analysis: recent developments with maximum entropy estimation

Technical efficiency analysis is a fundamental tool to measure the performance of production activity. Nowadays, a wide range of methodologies to measure technical efficiency is available, and the choice of a specific approach is always controversial, since different choices lead to different results. Recently, an increasing interest in state-contingent production frontiers has emerged in the literature. This interest is mainly due to the fact that uncertainty in economics is best interpreted within a state-contingent framework. However, this increasing interest has not yet been reflected in an increase of empirical applications, since empirical models with state-contingent production frontiers are usually ill-posed. In particular, these empirical models are affected by collinearity and micronumerosity. This talk discusses new maximum entropy procedures for the estimation of technical efficiency with state-contingent production frontiers under severe empirical conditions. Simulation studies are used to illustrate that maximum entropy estimators are powerful alternatives to the maximum likelihood estimator.

Manuel Scotto

Fighting wildfires with mathematics

Forest fires are a major concern at the European level, where large forest fires are responsible for significant environmental, social and economic impacts. In Portugal, wildfires have become a public calamity and an ecological disaster affecting considerable areas. In particular, the vast majority of the area burned is due to fire events that occurred on a very small number of summer days. This context highlights the need to investigate and characterize extreme fire events.

In this talk, the results of the analysis of daily area burned time series in mainland Portugal from 1980 to 2010 are presented. The analysis combines the Peak-Over-Threshold method and classification techniques to cluster the time series on the basis of either the tail indices or their corresponding predictive distributions for 5- and 15-year return values. The results show that the distributions of the area burned are heavy-tailed for all districts, indicating a considerable density in the tail and, therefore, a non-negligible probability of occurrence of a day with very large area burned. Moreover, clustering based on tail indices identified three distinct groups: north, coastal areas and inner/south Portugal. Finally, clustering based on return values shows that the largest return levels of area burned occur in districts located in the center and south of Portugal.
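To make the Peak-Over-Threshold idea concrete, here is a rough Python sketch under stated assumptions; it is not the analysis presented in the talk. It fits a generalized Pareto distribution to the excesses over a threshold using a simple method-of-moments estimator, which is swapped in for brevity in place of the predictive distributions mentioned above, and it evaluates the standard return-level formula. All function names are hypothetical, and the moment estimator is only meaningful when the shape parameter (tail index) is below 0.5, which very heavy-tailed data may violate.

```python
import math
import statistics


def excesses_over(values, threshold):
    """Amounts by which observations exceed the threshold."""
    return [v - threshold for v in values if v > threshold]


def gpd_moment_fit(excesses):
    """Method-of-moments estimates (xi, sigma) of a generalized Pareto fit.

    Assumption: the variance of the excesses is finite (xi < 0.5).
    """
    mean = statistics.mean(excesses)
    var = statistics.variance(excesses)
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * mean * (ratio + 1.0)
    return xi, sigma


def return_level(threshold, xi, sigma, exceed_rate, obs_per_year, years):
    """Level exceeded on average once every `years` years.

    exceed_rate is the fraction of observations above the threshold.
    """
    expected_exceedances = years * obs_per_year * exceed_rate
    if abs(xi) < 1e-9:  # exponential-tail limit of the formula
        return threshold + sigma * math.log(expected_exceedances)
    return threshold + sigma / xi * (expected_exceedances ** xi - 1.0)
```

With daily data, `obs_per_year` would be 365, and `years` would be set to 5 or 15 to obtain return levels comparable to those discussed in the abstract.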

Manuela Souto

Improving robustness in geostatistics

Geostatistical methods were originally motivated by problems in the Geosciences, but their importance now extends to other sciences such as Epidemiology, Agronomy or Environmental Sciences. In essence, Geostatistics differs from general statistical procedures in that it deals with dependent observations of continuous processes sampled on continuous spatial domains. There are two main goals in Geostatistics: the estimation of the function that describes the dependence structure of the process, which is called the variogram function, and the spatial prediction of new observations, named the kriging process. Conventional methods for both steps are not robust, in the sense that they are very sensitive to departures from theoretical models, including the presence of outlier observations. Thus, the investigation of robust alternatives to traditional procedures is of great interest. Recent research on the topic resulted in the proposal of a new robust estimator of the variogram function. The estimator has good properties and an improved efficiency when compared with other robust proposals. Moreover, it has been possible to establish conditions for the existence and uniqueness of the solution that determines the estimates in specific models. Current research is also focused on the development of a robust kriging predictor. The new predictor uses the robust variogram estimator for weighting observations in a linear prediction process, whose coefficients are also controlled by robust principles.
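The sensitivity of conventional variogram estimation to outliers can be seen in a small Python sketch. It compares two established estimators on regularly spaced one-dimensional data: Matheron's classical estimator and the Cressie-Hawkins robust estimator. Neither is the new estimator proposed in the talk, which the abstract does not specify; the regular one-dimensional grid and the function names are assumptions made for illustration.

```python
def lag_differences(z, lag):
    """Differences between observations `lag` grid steps apart."""
    return [z[i + lag] - z[i] for i in range(len(z) - lag)]


def matheron_variogram(z, lag):
    """Classical estimator: half the mean squared difference."""
    d = lag_differences(z, lag)
    return sum(x * x for x in d) / (2 * len(d))


def cressie_hawkins_variogram(z, lag):
    """Robust estimator based on square-root absolute differences."""
    d = lag_differences(z, lag)
    n = len(d)
    mean_root = sum(abs(x) ** 0.5 for x in d) / n
    # 0.457 + 0.494 / n is the usual bias-correction term.
    return mean_root ** 4 / (2 * (0.457 + 0.494 / n))


clean = [0.0, 1.0, 2.0, 3.0, 4.0]
contaminated = [0.0, 1.0, 2.0, 3.0, 100.0]  # last value is an outlier
print(matheron_variogram(clean, 1), matheron_variogram(contaminated, 1))
print(cressie_hawkins_variogram(clean, 1), cressie_hawkins_variogram(contaminated, 1))
```

A single outlier inflates the classical estimate far more than the robust one, because squaring the differences amplifies extreme values while the square-root transformation dampens them.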