May 20-21, 2011
Guilt by Association: Finding Cosmic Ray Sources
David Ruppert (Cornell University)
Abstract
The Earth is continuously showered by cosmic rays, atomic nuclei moving with velocities close to that of light. In 2008 the most sensitive cosmic ray detector to date, the Pierre Auger Observatory (PAO), began operation. Roughly 70 Ultra High Energy Cosmic Rays (UHECRs) have been detected by PAO. Each is approximately ten million times more energetic than the most energetic particles produced at the Large Hadron Collider.
Astrophysical questions include: what phenomenon accelerates particles to such high energies, which astronomical objects host the accelerators, and what sorts of nuclei are energized? The magnetic deflection of the trajectories of UHECRs makes them potential probes of galactic and intergalactic magnetic fields. The data consist of precise arrival times and estimated energies and directions of the detected UHECRs, measurement uncertainties, and a characterization of the observatory's detection capabilities. We compare models with different source populations, including a "null" model assigning all cosmic rays to unresolved sources. We aim to (1) ascertain which cosmic rays may be associated with specific sources; (2) estimate luminosity function parameters for astrophysical sources; (3) estimate the proportion of detected cosmic rays generated by each population; (4) estimate parameters describing the effects of cosmic magnetic fields; and (5) investigate whether cosmic rays from a single source are scattered independently (which we call a "buckshot" model) or share part of their scattering history (an exchangeable "radiant" model).
We use Bayes factors to compare rival models, and we evaluate several approaches to marginal likelihood computation, including the harmonic mean estimator (which behaves poorly), Chib's method, an enumerative algorithm, and importance sampling.
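As a rough, self-contained illustration of why the harmonic mean estimator is distrusted while importance sampling behaves well, the following sketch (ours, not the authors' code) compares the two on a toy conjugate Gaussian model whose marginal likelihood is available in closed form; all settings and variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Toy conjugate model: y_i ~ N(theta, sigma^2), theta ~ N(mu0, tau0^2)
n, sigma, mu0, tau0 = 50, 1.0, 0.0, 2.0
y = rng.normal(0.5, sigma, size=n)

# Exact log marginal likelihood (a correlated Gaussian after integrating theta)
cov = sigma**2 * np.eye(n) + tau0**2 * np.ones((n, n))
exact = stats.multivariate_normal(mean=np.full(n, mu0), cov=cov).logpdf(y)

# Posterior of theta is available in closed form for this conjugate pair
tau_n2 = 1.0 / (1.0 / tau0**2 + n / sigma**2)
mu_n = tau_n2 * (mu0 / tau0**2 + y.sum() / sigma**2)

S = 20000
def loglik(theta):
    return stats.norm(theta[:, None], sigma).logpdf(y).sum(axis=1)

# Harmonic mean: average of 1 / p(y | theta) over posterior draws
post = rng.normal(mu_n, np.sqrt(tau_n2), size=S)
log_hm = np.log(S) - logsumexp(-loglik(post))

# Importance sampling with an overdispersed Gaussian proposal
prop = stats.norm(mu_n, 2.0 * np.sqrt(tau_n2))
th = prop.rvs(size=S, random_state=rng)
log_w = loglik(th) + stats.norm(mu0, tau0).logpdf(th) - prop.logpdf(th)
log_is = logsumexp(log_w) - np.log(S)

print(f"exact {exact:.3f}  harmonic-mean {log_hm:.3f}  importance {log_is:.3f}")
```

Chib's method and the enumerative algorithm mentioned in the abstract exploit model-specific structure and are not sketched here.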
Efficient Distribution Estimation for Data with Unobserved Sub-population Identifiers
Yanyuan Ma (Texas A&M University)
Abstract
We study efficient nonparametric estimation of the distribution functions of genotype-specific sub-populations from data consisting of mixed samples in which the genotypes are missing. Only the probabilities that each observation belongs to each genotype population are available.
The problem arises in quantitative trait locus (QTL) analysis and kin-cohort studies, where the scientific interest lies in estimating the cumulative distribution function of a trait given a specific genotype. However, the QTL genotypes in a QTL study, or the genotypes of the relatives in a kin-cohort study, are not directly observed. The distribution of the trait outcome is therefore a mixture of several genotype-specific distributions. We characterize the complete class of consistent estimators, which includes members such as one type of nonparametric maximum likelihood estimator (NPMLE) and least squares or weighted least squares estimators.
We identify the efficient estimator in the class that reaches the semiparametric efficiency bound, and we implement it using a simple procedure that remains consistent even if several components of the estimator are misspecified. In addition, close inspection of two commonly used NPMLEs in these problems reveals the surprising result that the NPMLE in one form is highly inefficient, while in the other form it is inconsistent. We conduct simulation studies to illustrate the theoretical results, and we demonstrate the proposed methods on two examples, one from a kin-cohort study and one from a QTL study.
This is joint work with Yuanjia Wang.
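To make the least-squares member of this class concrete, here is a minimal sketch under assumptions of our own choosing (two genotypes, normal traits, known membership probabilities): since E[1{y_i <= t} | pi_i] = sum_k pi_ik F_k(t), each genotype-specific CDF can be read off a pointwise linear regression of the indicators on the probability vector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated mixed sample: each subject's genotype is unobserved, but the
# probability pi_ik of carrying genotype k is known (K = 2 here).
n, K = 500, 2
pi = rng.dirichlet(np.ones(K), size=n)             # known membership probabilities
geno = np.array([rng.choice(K, p=p) for p in pi])  # latent genotypes
means = np.array([0.0, 1.5])                       # genotype-specific trait means
y = rng.normal(means[geno], 1.0)

# Least-squares estimator of the genotype-specific CDFs:
# E[ 1{y_i <= t} | pi_i ] = sum_k pi_ik F_k(t), so for each grid point t
# regress the indicator on the probability vector.
grid = np.linspace(y.min(), y.max(), 200)
Z = (y[:, None] <= grid[None, :]).astype(float)    # n x len(grid) indicators
F_hat, *_ = np.linalg.lstsq(pi, Z, rcond=None)     # K x len(grid), one CDF per row

print(F_hat[:, ::50])  # F_1 and F_2 evaluated on a coarse subgrid
```

The plain least-squares estimate need not be monotone in t; weighting or isotonization, in the spirit of the weighted least squares variants the abstract mentions, would refine it.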
Some Statistical Challenges Arising in Imaging
Peter Kuchment (Texas A&M University)
Abstract
I will address some probabilistic and statistical issues arising in medical and homeland security imaging.
Quantile Model Assessment in the Presence of Measurement Errors
Ying Wei (Columbia University)
Abstract
Wei and Carroll proposed a method for estimating conditional quantiles when covariates are mismeasured. The approach requires that the conditional quantiles of y given x follow a specified model form (linear or semiparametric) at all quantile levels. If this model assumption is false, the resulting quantile estimates are biased; hence it is important to assess model adequacy. To be specific, denote the constructed quantile model by g(τ; x); for linear quantile models, g(τ; x) = x^T β(τ). The research question is then how to assess whether the true conditional quantile Q_y(τ; x) can be correctly represented by the specified model g(τ; x) for all τ ∈ (0, 1). Model assessment in this context has its own unique and more demanding requirements. First, we need to determine whether the constructed models hold at all quantile levels; that is, goodness-of-fit must be evaluated jointly over the entire conditional quantile process. Second, because of the measurement errors, the true value of the covariate x is not observed; consequently, a direct assessment of the conditional quantile of y given x is not possible.
One possible model assessment is to evaluate the conditional quantile of y given the surrogate w implied by g(τ; x), and to compare it to the empirical quantiles of y given w. Such a comparison can be visualized with a QQ plot; formal testing is also possible but may suffer from lack of power.
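The following sketch illustrates this surrogate-based check on a simulated example of our own construction (a linear quantile model with normal measurement error of known variance; the quantile coefficients are stand-ins, not output of Wei and Carroll's estimator): draw x given w, push uniform quantile levels through g(τ; x), and compare the implied and empirical quantiles of y within bins of w.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Simulated data: x is latent, w = x + u is the observed surrogate.
n = 2000
x = rng.normal(0.0, 1.0, n)
w = x + rng.normal(0.0, 0.5, n)              # classical error, variance 0.25
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)

# Postulated linear quantile model g(tau; x) = b0(tau) + b1(tau) * x. For
# this homoscedastic example the coefficients are known in closed form;
# in practice they would come from a fitted quantile model.
def g(tau, xx):
    return 1.0 + norm.ppf(tau) + 2.0 * xx

# Model-implied draws of y given w: draw x | w (a normal posterior here,
# since x ~ N(0, 1) and u ~ N(0, 0.25)), then invert the quantile function
# at uniform levels.
x_draw = rng.normal(w / 1.25, np.sqrt(0.25 / 1.25))
y_model = g(rng.uniform(size=n), x_draw)

# QQ-type comparison of empirical vs model-implied quantiles of y given w,
# within coarse bins of w (a visual check; formal tests would go further).
taus = np.linspace(0.05, 0.95, 19)
for lo, hi in [(-np.inf, -0.5), (-0.5, 0.5), (0.5, np.inf)]:
    idx = (w > lo) & (w <= hi)
    gap = np.abs(np.quantile(y[idx], taus) - np.quantile(y_model[idx], taus))
    print(f"w in ({lo}, {hi}]: max quantile gap {gap.max():.3f}")
```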
Adjusting for Covariate Information in Multilevel Functional Data
Ana-Maria Staicu (North Carolina State University)
Abstract
Most methods developed for multilevel functional data focus solely on modeling the curves and ignore covariates. Accounting for covariate information has been considered recently by Jiang and Wang (Annals of Statistics, 2010) in the context of single-level functional data; however, extending their approach to the multilevel setting is far from trivial. This research proposal is motivated by an application to a brain tractography study in multiple sclerosis (MS), in which profiles of subjects' brain white-matter tracts are observed at multiple visits. Modeling the profiles while accounting for subject-specific covariates, such as age at disease onset, would provide insight into the relation between the age at disease onset and the progression of MS.
Approximation Error Modeling for Nonlinear Tomography
Simon Arridge (University College London)
Abstract
In several medical imaging problems, a PDE is used to model the propagation of probing radiation, whose measurement at specified detectors provides the data; the parameters of the PDE constitute the image sought within an inverse problem framework. The model for propagation is typically based either on a Green's function, which is limited to certain well-defined geometries and usually to homogeneous parameter distributions, together with a series approximation for the inhomogeneities, or on a numerical solution such as finite elements. Both of these models are unreasonable in the sense that they are too simplistic, too computationally intensive, or both. By recognizing that any model is always a limited approximation to real data, we may treat the modeling error as a random variable and attempt to compensate for its influence using Bayesian techniques. In this talk I will show some recent results of this approach for diffuse optical tomography and related problems, which are both non-linear and ill-posed. The model error approximation method leads to computationally fast reconstructions with accuracy comparable to that of more detailed models.
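A minimal sketch of the idea on a toy linear problem (our construction, not the talk's PDE setting): the discrepancy between an accurate and a coarse forward model is sampled under the prior, summarized by Gaussian statistics, and folded into the noise model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy linear inverse problem: accurate model A_fine vs a cheap coarse
# surrogate A_coarse (here, a truncated version of A_fine). The matrices
# are illustrative stand-ins for expensive PDE solvers.
m, d = 40, 60
A_fine = rng.normal(size=(m, d)) / np.sqrt(d)
U, s, Vt = np.linalg.svd(A_fine, full_matrices=False)
A_coarse = (U[:, :10] * s[:10]) @ Vt[:10]        # keep 10 singular components

# Approximation error step: sample the prior, record the model discrepancy
# eps = A_fine x - A_coarse x, and fit Gaussian statistics to it.
prior_sd = 1.0
X = rng.normal(0.0, prior_sd, size=(d, 500))
eps = A_fine @ X - A_coarse @ X
mu_eps = eps.mean(axis=1)
Sig_eps = np.cov(eps)

# Data come from the accurate model plus measurement noise.
x_true = rng.normal(0.0, prior_sd, d)
noise_sd = 0.01
y = A_fine @ x_true + rng.normal(0.0, noise_sd, m)

# MAP estimates under the coarse model, with and without compensation:
# y ~ A_coarse x + eps + e, so the effective noise is N(mu_eps, Sig_e + Sig_eps).
def map_estimate(Sig_noise, shift):
    P = A_coarse.T @ np.linalg.solve(Sig_noise, A_coarse) + np.eye(d) / prior_sd**2
    return np.linalg.solve(P, A_coarse.T @ np.linalg.solve(Sig_noise, y - shift))

naive = map_estimate(noise_sd**2 * np.eye(m), np.zeros(m))
corrected = map_estimate(noise_sd**2 * np.eye(m) + Sig_eps, mu_eps)
for name, est in [("naive", naive), ("error-corrected", corrected)]:
    err = np.linalg.norm(est - x_true) / np.linalg.norm(x_true)
    print(f"{name:16s} relative error {err:.3f}")
```

The same enhanced-noise construction carries over, at least formally, to linearized steps of non-linear problems such as diffuse optical tomography.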
Multiple-Indicator, Multiple-Causes Measurement Error Models
Carmen D. Tekwe (Texas A&M University)
Abstract
Multiple Indicators, Multiple Causes (MIMIC) models are often employed by researchers studying the effects of an unobservable latent variable on a set of outcomes when the causes of the latent variable are observed. There are times, however, when the causes of the latent variable are not observed, because measurements of the causal variable are contaminated by measurement error. In this talk, I discuss (1) an extension of the classical linear MIMIC model that allows both Berkson and classical measurement errors, defining the MIMIC measurement error (MIMIC ME) model; (2) likelihood-based estimation for the MIMIC ME model via the EM algorithm, with a Monte Carlo approximation to the integral in the E-step; and (3) data-driven estimates of the variance of the classical measurement error associated with log(DS02), an estimate of the radiation dose received by atomic bomb survivors at the time of their exposure. The Adult Health Study (AHS) cohort of atomic bomb survivors who were exposed between 500 and 2500 meters from the bomb hypocenters was studied. The MIMIC ME model was applied to study dyslipidemia, a latent construct, and the effect of true radiation dose on its physical manifestations.
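The computational device named in (2) can be sketched on a much simpler latent-variable model (a Gaussian random-intercept model of our own choosing, not the MIMIC ME model): the E-step integral is replaced by an average over draws of the latent variable, and the M-step applies closed-form updates to the Monte Carlo averages.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative random-intercept model: y_ij = beta * x_i + b_i + eps_ij,
# b_i ~ N(0, sb2), eps_ij ~ N(0, s2). The latent b_i stands in for the
# unobserved construct; the point is the Monte Carlo E-step.
n, J = 200, 4
x = rng.normal(size=n)
beta_true, sb2_true, s2_true = 1.5, 0.8, 0.5
b = rng.normal(0.0, np.sqrt(sb2_true), n)
y = beta_true * x[:, None] + b[:, None] + rng.normal(0.0, np.sqrt(s2_true), (n, J))

beta, sb2, s2, S = 0.0, 1.0, 1.0, 200
for it in range(50):
    # Monte Carlo E-step: draw each latent b_i from its current conditional
    # (Gaussian here; in the MIMIC ME setting the integral has no closed form).
    resid = (y - beta * x[:, None]).mean(axis=1)
    v = s2 * sb2 / (s2 + J * sb2)
    m = sb2 * J * resid / (s2 + J * sb2)
    B = rng.normal(m[:, None], np.sqrt(v), (n, S))          # n x S draws

    # M-step: closed-form updates using Monte Carlo averages.
    b_bar = B.mean(axis=1)
    beta = np.sum(x * (y.mean(axis=1) - b_bar)) / np.sum(x**2)
    s2 = np.mean((y[:, :, None] - beta * x[:, None, None] - B[:, None, :])**2)
    sb2 = np.mean(B**2)

print(f"beta {beta:.3f} (true 1.5), sb2 {sb2:.3f} (true 0.8), s2 {s2:.3f} (true 0.5)")
```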
Detection of Low Emission Sources in the Presence of a Large Random Background
Yulia Hristova (University of Minnesota)
Abstract
In this talk we will discuss the feasibility of robust detection of geometrically small, low emission sources against a significantly stronger random background. We present a method for detecting such sources using Compton-type cameras. Numerical examples demonstrate the high sensitivity and specificity of the method.
This work is joint with M. Allmaras, D. Darrow, G. Kanschat and P. Kuchment.
Semiparametric Estimators for Restricted Moment Models with Measurement Error
Tanya Garcia (Texas A&M University)
Abstract
Root-n consistent, asymptotically normal, and locally efficient estimators are constructed for regression with errors in covariates and an unspecified model error distribution. Until now, root-n consistent estimators for this setting were unattainable except in special cases, such as a polynomial relationship between the response and the mismeasured variables. Our method is the first to deliver root-n consistent estimators when the distributions of both the model error and the mismeasured variable are unknown and possibly misspecified. The estimators are based on the semiparametric efficient score, calculated under several possibly incorrect distributional assumptions: a misspecified model error distribution, a misspecified distribution for the error-prone covariates, or both. A simulation study demonstrates that our method is robust and outperforms methods that either ignore measurement error or allow measurement error but require a correctly specified model error distribution. A real data example illustrates the performance of the method.
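For contrast with the baselines mentioned in the simulation study, here is a minimal sketch (ours, not the authors' semiparametric estimator) of the attenuation bias incurred by ignoring measurement error in a linear model, together with a standard regression-calibration fix that, unlike the proposed method, leans on distributional assumptions (normal covariate, known error variance).

```python
import numpy as np

rng = np.random.default_rng(5)

# y = b0 + b1 * x + e, with x observed only through w = x + u.
n, b0, b1 = 2000, 1.0, 2.0
x = rng.normal(0.0, 1.0, n)
w = x + rng.normal(0.0, 0.6, n)       # classical error, variance 0.36 (known here)
y = b0 + b1 * x + rng.normal(0.0, 1.0, n)

def ols(z, resp):
    Z = np.column_stack([np.ones_like(z), z])
    return np.linalg.lstsq(Z, resp, rcond=None)[0]

# Naive fit on w: slope attenuated by lam = var(x) / (var(x) + var(u)).
naive = ols(w, y)

# Regression calibration: regress on E[x | w] = mu_x + lam * (w - mu_x).
su2 = 0.36
sx2 = w.var() - su2                   # estimate var(x) from var(w) - var(u)
lam = sx2 / (sx2 + su2)
calibrated = ols(w.mean() + lam * (w - w.mean()), y)

print(f"true slope {b1},  naive {naive[1]:.3f},  calibrated {calibrated[1]:.3f}")
```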
A Bayesian Approach to Detection of Small Low Emission Sources
Xiaolei Xun (Texas A&M University)
Abstract
The article addresses the problem of detecting the presence and location of a small low emission source inside an object when background noise dominates. The problem arises, for instance, in some homeland security applications. The goal is to reach signal-to-noise ratio (SNR) levels on the order of 0.001. A Bayesian approach to this problem is implemented in 2D. The method allows inference not only about the existence of the source but also about its location. We derive Bayes factors for model selection and estimate the source location, based on Markov chain Monte Carlo simulation. A simulation study shows that, with a sufficiently high total emission level, our method can effectively locate the source.
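A minimal sketch of the Bayesian comparison in a simplified 2D Poisson setting (grid size, rates, and source profile are illustrative choices of ours, and the source amplitude is treated as known): the marginal likelihood of the "background plus source" model averages over a uniform prior on the source pixel, yielding both a Bayes factor and a posterior over location. Here the location prior is enumerated directly rather than sampled by MCMC.

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import logsumexp

rng = np.random.default_rng(6)

# 2D Poisson counts: strong uniform background, plus a faint localized source.
G = 32
bg = 100.0                                   # background mean per pixel
xs, ys, amp = 20, 11, 25.0                   # hidden source location / strength
ii, jj = np.indices((G, G))
bump = lambda i0, j0: amp * np.exp(-((ii - i0)**2 + (jj - j0)**2) / 4.0)
counts = rng.poisson(bg + bump(xs, ys))

# M0: background only. M1: background + source at an unknown pixel, with a
# uniform prior over locations.
ll0 = poisson.logpmf(counts, bg).sum()
ll1 = np.array([[poisson.logpmf(counts, bg + bump(i, j)).sum()
                 for j in range(G)] for i in range(G)])
log_marg1 = logsumexp(ll1) - np.log(G * G)   # average over the location prior

log_bf = log_marg1 - ll0
i_hat, j_hat = np.unravel_index(np.argmax(ll1), ll1.shape)
print(f"log Bayes factor (M1 vs M0): {log_bf:.2f}; "
      f"posterior mode location: ({i_hat}, {j_hat}) vs true ({xs}, {ys})")
```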
Nonparametric Regression from Group Testing Data
Aurore Delaigle (University of Melbourne)
Abstract
To reduce the cost and increase the speed of large screening studies, samples are often pooled in groups. In that case, instead of carrying out a test (say, a blood test) on every individual in the study to determine infection status, one tests only the pooled blood of the individuals in each group. We consider this problem when a covariate is also observed, and interest lies in estimating the conditional probability of contamination. We show how to estimate this conditional probability with a simple nonparametric estimator. We illustrate the procedure on data from the NHANES study.
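One simple way to see how a nonparametric estimator can work with pooled responses (a sketch under our own assumptions, not necessarily the talk's estimator): form groups that are homogeneous in the covariate, smooth the group-level negative/positive outcomes to estimate q(x) = P(group negative | x), and invert q(x) = (1 - p(x))^k to recover the individual-level probability.

```python
import numpy as np

rng = np.random.default_rng(7)

# Individual infection probability we want to recover (unknown in practice).
p = lambda x: 0.05 + 0.15 * x**2

# Simulate individuals, then pool them into groups of size k. Sorting by the
# covariate keeps groups homogeneous, which the back-transformation exploits.
n, k = 4000, 5
x = np.sort(rng.uniform(-1.0, 1.0, n))
infected = rng.uniform(size=n) < p(x)
xg = x.reshape(-1, k).mean(axis=1)                 # group-level covariate
neg = ~infected.reshape(-1, k).any(axis=1)         # pooled test negative?

# Nadaraya-Watson estimate of q(x) = P(group negative | x), then invert
# q(x) = (1 - p(x))^k under within-group homogeneity.
def p_hat(x0, h=0.15):
    wts = np.exp(-0.5 * ((xg - x0) / h)**2)
    q = np.sum(wts * neg) / np.sum(wts)
    return 1.0 - np.clip(q, 1e-12, 1.0)**(1.0 / k)

for x0 in (-0.8, 0.0, 0.8):
    print(f"x = {x0:+.1f}: estimate {p_hat(x0):.3f}, truth {p(x0):.3f}")
```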