May 30-31, 2011
Towards Automated NMR Protein Structure Determination
Xin Gao ( KAUST)
Abstract
Determining the three-dimensional structure of proteins is key to understanding how the human body functions. Nuclear Magnetic Resonance (NMR) is one of the two main methods for protein structure determination. Currently it takes weeks to months of human labor to determine a protein structure after the NMR experiments have been completed. Fully automating this process would significantly speed up structural biology research. In this talk, we will identify the key obstacles in this process, decompose the problem into subproblems, and introduce our work on solving these subproblems through computational methods. Our final system has succeeded in determining high-resolution protein structures from a small set of NMR spectra within a day.
Analyzing Multiple-probe Microarray: Estimation and Application of Gene Expression Indexes
Mehdi Maadooliat (TAMU)
Abstract
Gene expression index estimation is an essential step in analyzing multiple-probe microarray data. Various modeling methods have been proposed in this area. Among them, a popular method proposed in Li and Wong (2001) is based on a multiplicative model, which is similar, on the logarithmic scale, to the additive model discussed in Irizarry et al. (2003). Along this line, Hu et al. (2006) proposed data transformations to improve expression index estimation based on an ad hoc entropy criterion and a naive grid-search approach. In this work, we re-examine this problem using a new profile-likelihood-based transformation estimation approach that is more statistically elegant and computationally efficient. We demonstrate the applicability of the proposed method using a benchmark Affymetrix U95A spike-in experiment. Moreover, we introduce a new multivariate expression index and use an empirical study to show its promise in terms of improving model fit and the power to detect differential expression over the commonly used univariate expression index.
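As an illustration of the modeling starting point, the following is a minimal sketch (not the authors' code) of fitting the Li-Wong multiplicative model d[i, j] ~ theta[i] * phi[j] by alternating least squares; the function name and the simulated toy data are illustrative only.

import numpy as np

def fit_multiplicative_model(d, n_iter=50):
    """Fit the Li-Wong multiplicative model d[i, j] ~ theta[i] * phi[j]
    (i indexes arrays/samples, j indexes probes) by alternating least
    squares.  Illustrative sketch only."""
    phi = np.ones(d.shape[1])
    for _ in range(n_iter):
        theta = d @ phi / (phi @ phi)        # LS update of expression indexes
        phi = theta @ d / (theta @ theta)    # LS update of probe affinities
    scale = np.sqrt(len(phi)) / np.linalg.norm(phi)  # fix the scale indeterminacy
    return theta / scale, phi * scale

# toy example with simulated probe-level intensities
rng = np.random.default_rng(0)
true_theta, true_phi = rng.gamma(2.0, 50.0, size=20), rng.gamma(2.0, 1.0, size=11)
d = np.outer(true_theta, true_phi) + rng.normal(0, 5, size=(20, 11))
theta_hat, phi_hat = fit_multiplicative_model(d)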
From Revealing New Insights into Human Tissue Development to Minimum Curvilinearity
Carlo Vittorio Cannistraci and Timothy Ravasi (KAUST)
Abstract
We will present a data-mining exploration of 32 human tissues characterized by the expression of 1321 transcription factors (TFs). Integrating the expression data with physical TF interactions and performing machine learning (ML) analysis, we detected six expression-weighted interactions, a TF homeobox sub-network, as the best discriminating features, which revealed the presence of the three developmental germ-layer classes of tissue (ectoderm, mesoderm, endoderm) with 82% accuracy. This first investigation was published in Ravasi et al. (Cell 2010, around 30 citations at present) and was selected among the 2010 breakthrough papers in computational biology (Mak, H.C., Nature Biotechnology 29, January 2011). We will then introduce ‘Minimum Curvilinearity’ (MC), a principle which, for small datasets, suggests approximating curvilinear sample distances in the feature space by pairwise distances over their minimum spanning tree (MST), thus avoiding the introduction of any tuning parameter. We will show how, starting only from the expression data, it was possible to provide a two-dimensional visualization of the 32-human-tissue dataset that, evaluated by clustering, offered 84% accuracy. This was achieved by means of two novel unsupervised and parameter-free ML methods based on the MC principle: minimum curvilinear embedding (MCE) for nonlinear dimension reduction, and minimum curvilinear affinity propagation (MCAP) for non-spherical clustering. The MC study was published in Cannistraci et al. (Bioinformatics 2010) and selected for oral presentation at the European Conference on Computational Biology 2010 (ECCB10). We will conclude with our recent results on applying Minimum Curvilinearity to network-topological prediction of new protein-protein interactions (PPIs). In this different biological context as well, MC proved to offer a very efficient framework despite its extreme computational simplicity: the MC-based method substantially outperformed other topological methods for predicting new PPIs and, most interestingly, required significantly less computational time.
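For readers unfamiliar with the MC principle, the following is a minimal sketch (not the authors' MCE implementation) of the idea stated above: approximate curvilinear distances by pairwise distances over the minimum spanning tree, then embed them with classical multidimensional scaling. Function and variable names are illustrative.

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform

def mc_embedding(X, n_components=2):
    """Minimum-Curvilinearity-style embedding (illustrative sketch):
    approximate curvilinear sample distances by pairwise distances over
    the minimum spanning tree, then embed them with classical MDS."""
    D = squareform(pdist(X))                 # Euclidean pairwise distances
    mst = minimum_spanning_tree(D)           # sparse MST (no tuning parameter)
    G = shortest_path(mst, directed=False)   # pairwise distances over the MST
    # classical (Torgerson) MDS on the MST-based distance matrix
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:n_components]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# e.g. Y = mc_embedding(expression_matrix)  # rows = tissues, columns = TF expressions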
Wave Propagation Through Solids Infused with Fluids
K. Rajagopal (TAMU)
Abstract
We study the propagation of waves in homogeneous isotropic and transversely isotropic elastic solids infused with fluids, within the context of the Theory of Mixtures. We show that our theory reduces to that proposed by Biot, when appropriate simplifying assumptions are made. We analyze the propagation of transverse plane waves, longitudinal waves and spherical waves in both isotropic and transversely isotropic elastic solids infused with a fluid.
Seismic Models of Fractured Reservoirs
Richard Gibson (TAMU)
Abstract
Both seismic wave propagation and fluid flow are affected by the concentration and orientation of fracturing in reservoir rock. An important aspect of seismic reservoir characterization is therefore to use reflected seismic waves to measure spatial variations in fracture properties within the reservoir. Changes in subsurface stress systems can also affect these fractures, making both the seismic properties and permeabilities stress-dependent. Time-lapse seismic data therefore have strong potential to provide important constraints on changes in fracture systems that can in turn help to better constrain modeling and inversion of fluid flow in hydrocarbon reservoirs to enhance reservoir management. However, there are many challenges in accurately and realistically including the effects of fractures in seismic modeling, especially when developing field-scale models incorporating geological information as well. I will both describe some of these challenges and present recent research results for methods designed to help in seismic characterization of fractured reservoirs. Specifically, I will describe a new model for pressure-dependent seismic velocities of fractured rock that requires a relatively small number of parameters. This approach has strong potential to simplify reservoir characterization tasks compared to existing solutions that require more complicated rock descriptions. These results can then be used in the field scale models to predict seismic reservoir response or, eventually, as a part of an inversion of seismic data for fracture properties.
Challenges and Opportunities in Seismic Estimation of Permeability in Carbonates
Yuefeng Sun (TAMU)
Wave Propagation in Very Large Domains
V.M. Calo, L. Demkowicz, J. Gopalakrishnan, D. Pardo, I. Muga, and J. Zitelli (KAUST)
Abstract
Numerical methods have long been used to study acoustic and elastic wave propagation in heterogeneous media, since no analytic solutions exist for realistic subsurface models. The usefulness of a numerical method for the wave propagation problem is determined by its stability and grid dispersion, while its effectiveness is determined by the computational cost of forming and solving the algebraic system that results from the discretization. Stability and dispersion effects control the number of degrees of freedom needed to resolve the waves in the material. We recently developed numerical methods that do not suffer from non-physical (numerical) dispersion and are unconditionally stable, independently of the physical parameters and the computational discretization. These methods are designed to attain good accuracy by picking trial spaces for the solution that have good approximation properties, in the norm of interest, with respect to the sought exact solution, while the corresponding weighting space is computed on the fly to guarantee stability. This technology generalizes finite elements and allows for the accurate modeling of large physical domains, orders of magnitude larger than those tractable with alternative state-of-the-art methodologies. That is, when stability and numerical dispersion do not cloud the simulation results, the effects of heterogeneity boundaries and their reflections, as well as high-frequency effects, can be captured in the simulations. This new discretization technology for wave propagation removes the dispersion and stability constraints from the simulation in the frequency domain. In this presentation we describe this new approach, based on a Discontinuous Petrov-Galerkin (DPG) method that computes “optimal test functions”, which by definition guarantee uniform stability and ensure that the corresponding DPG solution is the best solution over a given subspace. Numerical results in two dimensions validate the theoretical predictions.
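A minimal linear-algebra sketch of the optimal-test-function idea, assuming the local trial-to-test matrix B, the test-space Gram matrix G, and the load vector l have already been assembled for one element; the matrix names are illustrative and not taken from the talk.

import numpy as np

def dpg_element_system(B, G, l):
    """Given, for one element,
        B : (n_test, n_trial) discretized bilinear form b(u, v),
        G : (n_test, n_test) Gram matrix of the test-space inner product,
        l : (n_test,) discretized load functional,
    compute the optimal test functions T = G^{-1} B and the resulting
    DPG element stiffness matrix and load vector.  Illustrative sketch."""
    T = np.linalg.solve(G, B)   # each column is an optimal test function
    K = B.T @ T                 # K = B^T G^{-1} B, symmetric positive definite
    f = T.T @ l                 # f = B^T G^{-1} l
    return K, f, T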
ERA: An Efficient Serial and Parallel Suffix Tree Construction for Very Long Strings
Essam Mansour (KAUST)
Abstract
The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree are too large to fit into main memory, most existing construction algorithms become very inefficient. In this talk, I will introduce our research on a disk-based suffix tree construction method, called Elastic Range (ERA), which works efficiently with very long strings that are much larger than the available memory. ERA partitions the tree construction process horizontally and vertically, and minimizes I/Os by dynamically adjusting the horizontal partitions independently for each vertical partition, based on the evolving shape of the tree and the available memory. Where appropriate, ERA also groups vertical partitions together to amortize the I/O cost. We developed a serial version; a parallel version for shared-memory and shared-disk multi-core systems; and a parallel version for shared-nothing architectures. ERA indexes the entire human genome in 21 minutes on an ordinary desktop computer. For comparison, the fastest existing method needs 15 minutes using 1024 CPUs on an IBM BlueGene supercomputer.
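As a rough illustration of the vertical-partitioning idea only (this is not ERA itself; the memory budget and prefix-counting scheme are simplifications), one can group the suffixes by variable-length prefixes so that the sub-tree built for each prefix fits in memory:

from collections import Counter

def vertical_partitions(s, max_suffixes_per_partition):
    """Group the suffixes of s by variable-length prefixes so that the
    sub-tree built for each prefix fits a memory budget (here expressed
    as a maximum number of suffixes).  Illustrative sketch of vertical
    partitioning; ERA additionally partitions each sub-tree horizontally
    and adapts to the evolving tree shape."""
    prefixes = Counter(s[i] for i in range(len(s)))   # start with length-1 prefixes
    final = {}
    while prefixes:
        prefix, count = prefixes.popitem()
        if count <= max_suffixes_per_partition or len(prefix) >= len(s):
            final[prefix] = count
        else:
            # too many suffixes share this prefix: extend it by one character
            starts = [i for i in range(len(s) - len(prefix) + 1)
                      if s[i:i + len(prefix)] == prefix]
            prefixes.update(Counter(s[i:i + len(prefix) + 1] for i in starts))
    return final

# e.g. vertical_partitions("mississippi$", max_suffixes_per_partition=3)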
The STAPL Parallel Container Framework
Lawrence Rauchwerger (TAMU)
Abstract
The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. It includes a collection of distributed data structures called pContainers that are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. In this work, we present the STAPL Parallel Container Framework (PCF) that is designed to facilitate the development of generic parallel containers. We introduce a set of concepts and a methodology for assembling a pContainer from existing sequential or parallel containers, without requiring the programmer to deal with concurrency or data distribution issues. The PCF provides a large number of basic parallel data structures (e.g., pArray, pList, pVector, pMatrix, pGraph, pMap, pSet). The PCF provides a class hierarchy and a composition mechanism that allows users to extend and customize the current container base for improved application expressivity and performance. We evaluate STAPL pContainer performance on a CRAY XT4 massively parallel system and show that pContainer methods, generic pAlgorithms, and different applications provide good scalability on more than 16,000 processors.
The Exascale: Why and How
David Keyes (KAUST)
Abstract
Sustained floating-point computation rates on real applications, as tracked by the ACM Gordon Bell Prize, increased by three orders of magnitude from 1988 (1 Gigaflop/s) to 1998 (1 Teraflop/s), and by another three orders of magnitude to 2008 (1 Petaflop/s). Computer engineering provided only a couple of orders of magnitude of improvement for individual cores over that period; the remaining factor came from concurrency, which is approaching one million-fold. Algorithmic improvements contributed meanwhile to making each flop more valuable scientifically. As the semiconductor industry now slips relative to its own roadmap for silicon-based logic and memory, concurrency, especially on-chip many-core concurrency and GPGPU SIMD-type concurrency, will play an increasing role in the next few orders of magnitude, to arrive at the ambitious target of 1 Exaflop/s, extrapolated for 2018. An important question is whether today’s best algorithms are efficiently hosted on such hardware and how much co-design of algorithms and architecture will be required. From the applications perspective, we illustrate eight reasons why today’s computational scientists have an insatiable appetite for such performance: resolution, fidelity, dimension, artificial boundaries, parameter inversion, optimal control, uncertainty quantification, and the statistics of ensembles. The paths to the exascale summit are debated, but all are narrow and treacherous, constrained by fundamental laws of physics, cost, power consumption, programmability, and reliability. Drawing on recent reports, workshops, vendor projections, and experiences with scientific codes on contemporary platforms, we propose roles for today’s researchers in one of the great global scientific quests of the next decade.
Some Results on PDE in Math Biology
Rana Parshad (KAUST)
Abstract
In this talk we present results on PDE models for the control of invasive species: species that are non-native to an environment and often cause ecological and economic damage. There are currently few practical methods for their control. We will also briefly present some results on diffusive planktonic systems.
Estimation of the Drag Coefficient Using the Ocean’s Response to a Hurricane: An Inverse Problem Approach
Sarah Zedler (TAMU)
Abstract
When wind forces the ocean, a fraction of the momentum is transferred beneath the surface to generate currents and turbulent mixing of the water column. The coefficient of proportionality representing that fraction, traditionally modeled as a function of wind speed, is referred to as the drag coefficient. At low to moderate wind speeds (roughly in the range 11-20 m/s), the relationship between wind speed and the drag coefficient is relatively well established as a monotonically increasing function. However, in very high winds, the nature of the air-sea boundary layer changes, and evidence suggests that the drag coefficient levels off or even decreases with wind speed. Most estimates of the drag coefficient at high wind speeds have been made using atmospheric data sets. In this talk, we describe an inverse problem setup in which sea surface temperature and/or surface currents are assimilated into an ocean-only model, and the drag coefficient is adjusted to achieve the smallest model-minus-data misfit for a small array of ocean measurements.
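A minimal sketch of the inverse-problem setup, with a toy slab mixed-layer model standing in for the full ocean model and the bulk formula tau = rho_air * Cd * |U| * U for the wind stress; all parameter values and function names below are illustrative.

import numpy as np
from scipy.optimize import minimize_scalar

RHO_AIR, RHO_WATER, MIXED_LAYER_DEPTH = 1.2, 1025.0, 30.0  # illustrative values

def slab_ocean_current(cd, wind_speed, dt=600.0):
    """Toy slab mixed-layer model: the surface current accelerates under
    the wind stress tau = rho_air * cd * |U|^2 (wind_speed is a time
    series of |U| in m/s).  Stands in for the full ocean model."""
    u = np.zeros_like(wind_speed)
    for k in range(1, len(wind_speed)):
        tau = RHO_AIR * cd * wind_speed[k - 1] ** 2
        u[k] = u[k - 1] + dt * tau / (RHO_WATER * MIXED_LAYER_DEPTH)
    return u

def estimate_drag_coefficient(wind_speed, observed_current):
    """Adjust cd to minimize the model-minus-data misfit (least squares)."""
    misfit = lambda cd: np.sum((slab_ocean_current(cd, wind_speed) - observed_current) ** 2)
    return minimize_scalar(misfit, bounds=(5e-4, 5e-3), method="bounded").x

# synthetic twin experiment: generate 'observations' with a known cd and recover it
wind = np.full(144, 25.0)                                   # 25 m/s wind for one day
obs = slab_ocean_current(2.0e-3, wind) + np.random.default_rng(1).normal(0, 0.01, 144)
print(estimate_drag_coefficient(wind, obs))                 # approximately 2.0e-3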
Amgad Salama (KAUST)
Abstract
Flow and transport phenomena in porous media are widespread; examples include groundwater contamination, carbon sequestration, petroleum exploration and recovery, material design, and chemical separation processes. However, accurate mathematical and numerical simulation of flow and transport remains challenging. Several reasons may explain the difficulties associated with accurately modeling transport phenomena in porous media. The first is that the governing equations are usually nonlinear, with possibly rough and discontinuous coefficients, and their solutions are often singular and discontinuous. The second is the scale of the problem: flow and transport in porous media are often solved on large-scale domains, which requires the solution to be carried out on massive computers with computational algorithms adapted for parallel computation. The third is the presence of several interplaying phenomena, including phase change, capillarity, partitioning between phases, structural deformations, etc. Furthermore, it is important to recognize the large degree of uncertainty associated with real porous media parameters. At the Computational Transport Phenomena Laboratory (CTPL) at KAUST, we are interested in tackling many of the aforementioned difficulties. As an example, we are currently applying advanced discretization methods (e.g., finite elements, finite volumes, and finite differences) to the governing equations, with emphasis on local mass conservation and compatibility of the numerical schemes. Another important solution step is the design of fast and accurate solvers for the large-scale linear and nonlinear algebraic systems that result from discretization. Solution techniques of interest include multiscale algorithms, mesh adaptation, parallel algorithms and implementation, efficient splitting or decomposition schemes, and others. In this presentation I will highlight the current activities within our group, emphasizing two ongoing research projects: CO2 sequestration in deep geologic formations, and the flow-induced instabilities associated with the dissolution of sequestered CO2 in brine aquifers.
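As one concrete example of the locally mass-conservative discretizations mentioned above, here is a minimal sketch (not CTPL code) of a cell-centered finite-volume (two-point flux) solver for single-phase Darcy flow in 1D with discontinuous permeability; the grid, permeability field, and boundary values are illustrative.

import numpy as np

def tpfa_darcy_1d(perm, dx, p_left, p_right):
    """Cell-centered finite-volume (two-point flux approximation) solver
    for single-phase incompressible Darcy flow, d/dx(-k dp/dx) = 0, on a
    uniform 1D grid.  Harmonic averaging of the neighbouring cell
    permeabilities keeps the interface fluxes, and hence local mass
    conservation, consistent even when k is discontinuous.  Sketch only."""
    n = len(perm)
    # interface transmissibilities (harmonic average of neighbouring cells)
    t_int = 2.0 * perm[:-1] * perm[1:] / (perm[:-1] + perm[1:]) / dx
    t_left = 2.0 * perm[0] / dx           # half-cell transmissibilities at boundaries
    t_right = 2.0 * perm[-1] / dx
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i in range(n - 1):                # interior interface fluxes
        A[i, i] += t_int[i];  A[i + 1, i + 1] += t_int[i]
        A[i, i + 1] -= t_int[i];  A[i + 1, i] -= t_int[i]
    A[0, 0] += t_left;    b[0] += t_left * p_left     # Dirichlet boundary conditions
    A[-1, -1] += t_right; b[-1] += t_right * p_right
    return np.linalg.solve(A, b)

# e.g. pressure across alternating high- and low-permeability layers
k = np.where(np.arange(50) // 10 % 2 == 0, 1.0, 1e-3)
p = tpfa_darcy_1d(k, dx=1.0 / 50, p_left=1.0, p_right=0.0)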
Space-Time Capture-Recapture Models
Nelis Potgieter (TAMU)
Abstract
Capture-recapture experiments are frequently used to estimate the population size of a species. The traditional approach to this estimation does not incorporate spatio-temporal information. As traps are often spread out across a geographical area, the addition of spatio-temporal information can prove valuable. A method for incorporating this information through the use of covariates is investigated and compared to the standard approach, which does not take this information into account. We show that the estimator of population size is often robust against misspecification of the spatial and temporal dependence. However, these dependence parameters are often of intrinsic interest as well. The results of a Monte Carlo simulation study are also reported.
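For reference, the traditional estimator that ignores where and when captures occurred is the Lincoln-Petersen estimator, shown here in its Chapman-corrected form; the numbers in the example are made up.

def lincoln_petersen(n1, n2, m2):
    """Chapman-corrected Lincoln-Petersen estimator of population size N
    from a two-sample capture-recapture experiment: n1 animals marked in
    the first sample, n2 captured in the second, m2 of which were marked.
    This classical estimator uses no spatial or temporal information."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# e.g. 120 marked, 150 recaptured of which 30 were marked -> N is roughly 588
print(lincoln_petersen(120, 150, 30))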
Automatic Traveltime Picking Using Local Time-frequency Maps
Christos Saragiotis (KAUST)
Abstract
The arrival times of distinct and sufficiently concentrated signals can be computed using Fourier transforms. In real seismograms, however, signals are far from distinct. We use local time-frequency maps of the seismograms and of their frequency derivatives to obtain frequency-dependent (instantaneous) traveltimes. A smooth division is utilized to control the resolution of the instantaneous traveltimes and allow a trade-off between resolution and stability. We then average these traveltimes over a data-dependent frequency band. The resulting traveltime attribute is used to isolate different signals in seismic traces. We demonstrate the effectiveness of this automatic picking method by applying it to synthetic and real data.
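A minimal sketch of the smooth-division idea, implemented here as a Tikhonov-regularized element-wise division rather than the shaping regularization typically used in practice; the variable names and toy traces are illustrative.

import numpy as np

def smooth_division(num, den, eps=1.0):
    """Regularized element-wise division: find c minimizing
    ||den * c - num||^2 + eps * ||D c||^2, where D is a first-difference
    operator.  Larger eps gives a smoother, more stable ratio; smaller
    eps gives higher resolution.  Illustrative sketch of the smooth
    division used to stabilize instantaneous attributes."""
    n = len(num)
    D = np.diff(np.eye(n), axis=0)            # (n-1, n) first-difference matrix
    A = np.diag(den ** 2) + eps * D.T @ D
    return np.linalg.solve(A, den * num)

# e.g. instantaneous ratio of two oscillatory traces, stabilized near zero crossings
t = np.linspace(0, 1, 200)
den = np.cos(40 * t)
num = 0.5 * den + 0.05 * np.random.default_rng(0).normal(size=200)
ratio = smooth_division(num, den, eps=0.5)     # close to 0.5 everywhere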
Multiscale and Multisource Phase Encoded Seismic Inversion
Gerard Schuster (KAUST)
Abstract
Full waveform inversion (FWI) of seismic data has the potential to provide unprecedented views of the earth’s velocity and density distributions. The idea is to find the velocity and density models that best predict all of the waveforms in the observed seismic data dobs generated and recorded in seismic experiments. There are two significant problems with this method: lack of robust convergence to the true solution, and the enormous computational demands of generating simulated data dsyn for millions of synthetic seismic traces. To partly overcome the problem of getting stuck in local minima of the data misfit function ||dobs-dsyn||^2, multiscale methods initiate the iterative gradient optimization with low-pass filtered data. These low-pass filtered data have many fewer local minima in the misfit function, so the gradient method tends to converge rather quickly to a reasonably accurate representation of a smoothed earth model. Higher frequencies are then allowed into the input data, and the gradient method tends to converge to higher-wavenumber estimates of the model. This multiscale strategy is continued until acceptable earth models are computed. The second significant problem, the enormous computational cost, is tackled by a multisource phase-encoded inversion method. It is shown that the cost of FWI can be reduced by several orders of magnitude if the seismic data are phase-encoded and blended together to form a much smaller data set to be inverted. Examples of FWI are shown for both synthetic and field data. The following issues remain subjects of research: (1) comprehensive inclusion of the actual physics in the waveform simulations, (2) correct parameterization of the model to achieve a unique and accurate solution, and (3) accurate assessment of solution reliability.
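A minimal sketch of the multisource phase-encoding step, using random polarity codes as a simple encoding; the multiscale strategy described above simply repeats the same gradient iterations band by band on low-pass filtered versions of dobs. Array shapes and names are illustrative.

import numpy as np

def phase_encode_blend(shot_gathers, rng):
    """Blend individual shot gathers into one 'supergather' after random
    polarity encoding (a simple form of phase encoding).  One simulation
    of the blended source then stands in for simulations of every shot;
    crosstalk between shots is suppressed by redrawing the codes at each
    inversion iteration.  Illustrative sketch."""
    codes = rng.choice([-1.0, 1.0], size=len(shot_gathers))
    blended = sum(c * d for c, d in zip(codes, shot_gathers))
    return blended, codes

# toy demonstration: 100 shot gathers collapse into a single blended gather
rng = np.random.default_rng(0)
shots = [rng.normal(size=(64, 32)) for _ in range(100)]
blended, codes = phase_encode_blend(shots, rng)
print(blended.shape)   # (64, 32): one supergather instead of 100 gathers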
The Local Ensemble Transform Kalman Filter: Theory and Applications
Istvan Szunyogh (TAMU)
Abstract
This talk describes one particular ensemble-based data assimilation algorithm, the Local Ensemble Transform Kalman Filter (LETKF). It will be shown that, in addition to estimating the state, the LETKF can efficiently estimate the bias in the observations and in the short-term forecasts that serve as the prior estimate of the state. Particular examples of the application of the LETKF will be shown with implementations on global models of the terrestrial and Martian atmospheres and on a limited-area atmospheric model of the northwest Pacific for the 2004 typhoon season.
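A minimal sketch of the ensemble-transform analysis step that the LETKF applies independently in many overlapping local regions; a diagonal observation-error covariance R is assumed for simplicity, and the array names are illustrative.

import numpy as np

def etkf_analysis(X_b, Y_b, y_obs, r_diag):
    """One ensemble-transform Kalman filter analysis step.
    X_b : (n_state, k) background ensemble
    Y_b : (n_obs, k) background ensemble mapped to observation space, H(X_b)
    y_obs : (n_obs,) observations
    r_diag : (n_obs,) observation-error variances (diagonal R assumed)
    Returns the (n_state, k) analysis ensemble.  In the LETKF this update
    is carried out independently in many overlapping local regions."""
    k = X_b.shape[1]
    x_mean, y_mean = X_b.mean(axis=1), Y_b.mean(axis=1)
    Xp, Yp = X_b - x_mean[:, None], Y_b - y_mean[:, None]
    C = Yp.T / r_diag                                  # (k, n_obs) = Yp^T R^{-1}
    Pa_tilde = np.linalg.inv((k - 1) * np.eye(k) + C @ Yp)
    w_mean = Pa_tilde @ C @ (y_obs - y_mean)           # mean update in ensemble space
    # symmetric square root gives the analysis perturbation weights
    evals, evecs = np.linalg.eigh((k - 1) * Pa_tilde)
    W = evecs @ np.diag(np.sqrt(np.maximum(evals, 0.0))) @ evecs.T
    return x_mean[:, None] + Xp @ (w_mean[:, None] + W)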
Ensemble H∞ Filtering for Data Assimilation into Large Scale Models with Intermittent and Fast Varying Regimes
Ibrahim Hoteit, Xiaodong Luo and Umer Altaf (KAUST)
Abstract
Kalman-based filters are generally inefficient when applied to highly intermittent and fast-varying processes. This contribution addresses these difficulties through H∞ filtering theory. The H∞ filter is by design a robust filter that provides a systematic way to accommodate potential uncertainties, such as errors in specifying the initial conditions, the model error, and the observation error, that often arise in realistic data assimilation applications. To this end, its optimality is based on the minimax rule, in contrast with the traditional Bayes’ rule used in the Kalman filter (KF), to update prior information. In doing so, the H∞ filter produces more robust posterior estimates than the KF, in the sense that the posterior estimation error has a finite growth rate with respect to the uncertainties in assimilation, except for the special case in which it reduces to the KF itself. In this contribution, we propose a new ensemble H∞ filter. The original form of the H∞ filter contains global constraints in time, which may be inconvenient for sequential data assimilation problems. Therefore, we introduce a variant that solves some time-local constraints instead, and hence we refer to our filter as the time-local H∞ filter (TLHF). By analogy to the Ensemble Kalman Filter (EnKF), we also propose the ensemble time-local H∞ filter (EnTLHF). We outline the general form of the EnTLHF and discuss some of its special cases. In particular, we show that the EnTLHF provides a general framework for conducting covariance inflation in EnKF-based methods. We use numerical assimilation experiments with a storm surge model in the Gulf of Mexico to study the relative robustness of the TLHF/EnTLHF in comparison with the corresponding KF/EnKF when dealing with systems with intermittent regimes.
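For context, the covariance inflation referred to above is, in its simplest multiplicative form, just a rescaling of the ensemble perturbations; the sketch below shows this generic EnKF device, not the EnTLHF itself.

import numpy as np

def inflate_ensemble(X, factor):
    """Multiplicative covariance inflation: scale the ensemble
    perturbations about their mean by 'factor' (> 1), which inflates the
    sample covariance by factor**2.  A common robustness device in
    EnKF-based methods; the talk relates such inflation to the EnTLHF."""
    x_mean = X.mean(axis=1, keepdims=True)
    return x_mean + factor * (X - x_mean)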
Panels
Panel A (Christine Ehlig-Economides, Yalchin Efendiev)
Room 108 Mitchell Physics Building
The panel will address how the interplay between seismic data and fluid dynamics reduces uncertainty in reservoir management.
Moderator: Christine Ehlig-Economides, Professor of Petroleum Engineering, Texas A&M University
Panelists:
Craig Calvert – Senior Reservoir Geoscience Consultant, ExxonMobil Upstream Research
Gerard Schuster – Professor of Geophysics, KAUST, Saudi Arabia
Xiao-Hui Wu – Research Specialist, ExxonMobil Upstream Research
Mike King – Professor of Petroleum Engineering, Texas A&M University
——————————————————————————
Panel B
Room 102 Mitchell Physics Building
David Keyes, Dean of Mathematical and Computer Sciences and Engineering at KAUST, will meet with students, post-docs, and faculty to discuss opportunities at KAUST.