

Research & Scientific Expertise
Current research is being undertaken in the following areas: 

Database, Data Mining & Machine Learning (DDML) Research Group


We develop algorithms for machine learning, i.e., for automatically learning models, patterns, and skills from potentially big data. We then use these algorithms to discover new approaches to designing artificially intelligent systems, as well as to find novel computational solutions to biological, medical, and other complex real-world scientific and technological problems.


Geology | Medical data

Current Projects

The Data Mining Group has a number of projects in areas such as interactive pattern mining, subgraph mining algorithms, and graph classification. Details of their projects can be found here.

Fast Reinforcement Learning Using Multiple Models and State Decomposition: This project is a collaboration with Yale University and is funded by a grant from the National Science Foundation (NSF). Intelligent behavior in both natural and man-made systems consists in being repeatedly successful in achieving desired goals in diverse, observably different situations on the basis of past experience. Learning is central to such behavior, since in both cases mechanisms must exist that yield rapid improvement with minimal a priori information. In fact, organizing, coordinating, and executing diverse tasks such as manipulation of effectors, obstacle avoidance, path planning, scene analysis, and tracking, which are common to both classes of systems, all involve learning. The principal objective of this project is to address this problem using two different methods: (a) the use of multiple identification models, and (b) decomposition of high-dimensional state and action spaces. The project elaborates on the different ways in which (a) can be used to improve convergence. In (b), multiple agents with lower-dimensional state spaces are used in place of high-dimensional state and action spaces to overcome "the curse of dimensionality". The research will find application in situations where rapid learning is mandatory. One such area is the control of a fleet of Plug-in Hybrid Electric Vehicles (PHEVs). Given a fleet of vehicles, the objective reduces to a complex optimization problem of orchestrating switching between internal combustion engines and electric engines under a variety of constraints.

WRESTORE (A secure decision support system for coordination of adaptation planning among Food, Energy, and Water actors in the Pacific Northwest): This project is a collaboration with Oregon State University and is funded by a grant from the National Science Foundation (NSF)/U.S. Department of Agriculture (USDA). Given the increasingly strong evidence for emerging climate change and economic trends, coordination of adaptation decisions for managing limited natural resources, such as water and arable land, in the food, energy, and water (FEW) sectors is expected to become increasingly critical. The goal of this project is to establish a novel, intelligent, secure, and human computation-based decision support system that will enable local and regional community actors to coordinate and co-identify robust adaptation decisions for natural resources management in FEW systems when chronic and/or acute physical and socio-economic perturbations occur.

Computational modeling of grievances and political instability through global media: This project is funded by an INSPIRE grant from the National Science Foundation (NSF).  This project focuses on developing point process models for measuring the level of cross-excitation between social media, web content, and conflict and political instability. A specific application involves predicting instability around elections in Nigeria using election-related Twitter posts.
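
As a rough illustration of what "cross-excitation" means in a point process model, the sketch below evaluates the conditional intensity of one event stream (e.g., conflict events) when past events in another stream (e.g., election-related posts) temporarily raise its rate. It assumes a Hawkes-type model with an exponential kernel; the function name, the parameters (mu, alpha_self, alpha_cross, beta), and the event times are all hypothetical and are not taken from the project itself.

```python
import math

def cross_excited_intensity(t, own_events, other_events,
                            mu, alpha_self, alpha_cross, beta):
    """Intensity of one stream at time t under an assumed exponential-kernel Hawkes model.

    mu          : background rate of the stream
    alpha_self  : excitation added by each past event in the same stream
    alpha_cross : excitation added by each past event in the other stream
    beta        : decay rate of the excitation
    """
    def kernel_sum(events):
        # Each strictly earlier event contributes an exponentially decaying bump.
        return sum(math.exp(-beta * (t - s)) for s in events if s < t)

    return mu + alpha_self * kernel_sum(own_events) + alpha_cross * kernel_sum(other_events)

# Made-up event times (in days), for illustration only.
posts = [0.5, 1.0, 1.2]
conflicts = [2.0]
params = dict(mu=0.1, alpha_self=0.3, alpha_cross=0.6, beta=1.0)

# Intensity of the conflict stream at t = 1.5, without and with the posts.
baseline = cross_excited_intensity(1.5, conflicts, [], **params)
excited = cross_excited_intensity(1.5, conflicts, posts, **params)
```

In this toy setting, a larger alpha_cross means that activity in one stream raises the short-term rate of the other more strongly; measuring such a quantity from data is one way to read "the level of cross-excitation".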

Algorithms for crime prediction: Several ongoing projects aim to design algorithms for crime prediction and to test predictive policing software in randomized controlled field trials. This is joint work with the Los Angeles Police Department and collaborators at UCLA.

CAREER: Self-adjusting Models as a New Direction in Machine Learning: This study explores a new class of machine learning algorithms that produce self-adjusting models that can accommodate new classes observed in data in offline as well as online learning scenarios. The project aims to (i) use non-parametric models to dynamically incorporate the changing number of classes; (ii) develop new online and offline inference techniques to accommodate new classes as they emerge; (iii) automatically associate newly discovered classes with higher-level groups of classes in an attempt to identify potentially interesting class formations; and (iv) develop partially-observed tree models containing observed and unobserved nodes, where observed nodes represent existing classes and unobserved nodes are introduced online to fill the gaps in the existing data hierarchy that become evident only with the arrival of new data.

Computational Methods to Explore Big Bioassay Data for Better Compound Prioritization: Bioassay data are an extremely valuable source of experimental Big Data. Rich in content, they are produced in large quantities in the early stages of drug discovery to test the bioactivity of chemical compounds and identify promising drug candidates. However, the power of big bioassay data has not been fully unleashed, particularly for discovering novel knowledge and improving drug development. This is largely because the exploration of a much larger space of bioassays has been hindered by the limited ability to identify and use relations across bioassays. In this project, the PI and her team will develop novel computational methods and tools that can effectively explore a wide range of heterogeneous bioassays, identify experimentally unrevealed relations among them, and use the knowledge derived from them to improve compound prioritization. The research will shed light on how to fully use the existing wealth of Big Data, stimulate knowledge distillation in innovative ways, establish new conceptual hypotheses, and develop corresponding analytical techniques. It aims to solve critical problems in drug discovery through Big Data means and has great potential to improve drug candidate identification through accurate compound prioritization, with far-reaching economic and societal impacts.

Mining Drug-Drug Interaction Induced Adverse Effects from Health Record Databases: Recent advances in large-scale electronic health record database techniques provide exciting new opportunities for the study of drug safety. Drug-drug interactions (DDIs), a major cause of adverse drug events (ADEs), are a serious global health concern and a severe detriment to public health. The scale of DDIs involving three or more drugs (also called high-order DDIs) poses a prohibitive challenge for molecular pharmacology and clinical research, which motivates alternative strategies such as mining health record data. This project aims to develop large-scale computational strategies and effective software tools for mining high-order DDI effects from health record databases, in order to yield novel discoveries in drug safety and ultimately to benefit national health and well-being.

Imaging & Visualization Research Group


We conduct theoretical and applied research in the areas of Computer Vision, Computational Biology and Neuroscience, Medical Image Computing, Machine Learning and Imaging for forensics. Our goal is to develop novel, automated and user-guided computational methods that can provide robustness, accuracy and computational efficiency in the analysis of visual data.

We work toward solutions to existing problems and explore different scientific disciplines where our research can contribute useful interpretation, quantification and modeling.


Medical Images | GIS | Digital Forensic | Transportation

Current Projects

VITAL (Visual Information Translation Analysis & Learning): Our focus is on handling data uncertainties in classification, modeling and prediction from image data. Our hypothesis is that application of mathematical/computational methods can especially help with ambiguities in the data, outliers and incomplete data, and can ultimately help create new hypotheses and directions in different domains. While keeping our core theoretical background in Computer Vision and Pattern Recognition, the application domains of our interest are robotics, (pre-clinical) computational neuroscience and physiology, as well as (clinical) biomedical imaging.

Modeling the structure and dynamics of neuronal circuits at single neuron resolution: The overarching hypothesis of this research is that the brain is a highly adaptive system defined by specific structure and dynamics, as a whole and at the single cell level. Although this is a fundamental hypothesis, it has been difficult to test using live animals, quantitatively through numerical modeling, or even qualitatively through observation. By bringing together leading-edge imaging technologies and computationally intensive image analytics, we initiated the pursuit of what makes the brain function as a whole throughout life and continuously adapt to various changes such as aging, disease, drug treatment, and injury. We aim to explain how synaptic connectivity is established in vivo, an important question in Neuroscience today. In this direction, our immediate goal is the spatiotemporal reconstruction of the larval Drosophila Central Nervous System at single-cell resolution from specialized imagery. The uniqueness of our approach, and the main difference from existing efforts, is the bottom-up reconstruction of neuronal circuits and their dynamics, from single neuron modeling to graph-based annotation of connectivity maps.

Discovering and quantifying protein interaction networks within single neurons: A key to vitalizing knowledge of the proteome is systematic methods that link individual protein interactions to specific cellular outputs. Our team participates in a cross-disciplinary effort towards describing social sensor systems that directly quantify and visualize interactions between proteins within living animals. Our work so far suggests that restricted protein interactions play a decisive role in developmental processes. At a more global level, we aim to map protein interaction networks and associate them with the structural development of neuronal morphologies.

Visual Analytics of Neuroimaging Data: This is a collaborative project with Indiana University's Department of Radiology and Imaging Sciences. We aim to develop techniques and tools for the visual exploration and analysis of human brain image data. Pattern recognition techniques are applied to detect imaging biomarkers for various conditions. Visualization techniques are developed to visualize the brain connectome network's topology, attributes, clusters, markers, genetic associations, and their correlations, within the context of volumetric anatomical features. Visual analytics techniques are also being developed for analysis tasks such as diagnostic biomarker detection. This project is currently funded by NIH-NIBIB and by IUPUI's Imaging Technology Development Program (ITDP).

Health Care Data Visualization: This is a collaboration with researchers in the Regenstrief Institute and the School of Informatics to develop new visualization techniques and an interactive visualization system for large healthcare data sets. Such a system offers a real time and web-based solution for the effective use of large scale electronic health record systems by allowing system level integration of the human's visual capabilities into the overall health data based decision making system. We developed a novel concept space approach to compress large, heterogeneous, and historical patient and public health data into a single, intuitive and comprehensive visualization. New spatiotemporal visualization techniques were developed for large public health datasets that involve geographical and population wide information. This project has been funded by the US Department of Defense (US Army).

Information Visualization Algorithms: We are interested in developing various general purpose information visualization algorithms. Some examples include (1) Gene Terrain, a large scale graph visualization technique based on scattered data interpolation; (2) Spiral Theme Plot, a time-series data visualization technique; and (3) Color Time Curves, a spatiotemporal data visualization technique. These techniques have been applied to various data analytics and visualization applications, such as disease biomarker detection using disease networks and protein-protein interaction networks, healthcare data visualization, city traffic data visualization, and text visualization for online review data and unstructured text data.

3D Facial Image Analysis for FAS diagnosis: This was a collaboration with NIH Collaborative Initiative on Fetal Alcohol Spectrum Disorders (CIFASD). We have developed 3D image analysis techniques for Fetal Alcohol Syndrome diagnosis. The focus is on enhancing our understanding of FASD dysmorphology through the processing and analysis of 3D facial images. We have also developed mouse models for facial and brain phenotypes as a function of the dose and stage of embryonic development of the alcohol exposure. New applications of 3D Micro-video-imaging and Micro-computed tomography (Micro-CT) imaging of facial and underlying bone/cartilage allow high resolution analysis of surface-to-bone/cartilage craniofacial dysmorphology from fetal ages to young adulthood. This project was funded by several NIH grants.

Volume Graphics: This research focused on volume rendering algorithms and volume graphics techniques for interactive volumetric modeling systems. I have developed several algorithms for deformable volume rendering and transfer function design in volume visualization. I have also developed a framework of hardware assisted techniques and voxelization algorithms that would allow 3D modeling operations to be carried out interactively in a volume graphics environment. Our results demonstrate that it is possible to achieve high performance volume graphics and volume modeling with the architecture of existing graphics subsystems without any special hardware design. This project was funded by an NSF grant.

Intelligent Vehicles: This research on autonomous and safe driving develops computer vision algorithms for guiding intelligent vehicles on the road without collision. It uses image, video and sensing technologies to achieve safe driving. This includes a variety of functions, such as detecting road edges, lane marks, pedestrians, and bicyclists, and estimating the time-to-collision with other vehicles, using vehicle-borne cameras. The results predict potential dangers and improve driving safety. We develop many motion-based approaches to achieve these goals effectively, using pattern recognition and data mining on large-scale naturalistic driving video taken in different weather and illumination conditions. We also use a data-driven approach to investigate knowledge of the driving environment and to enhance the AI functions of vehicles, mapping vehicle sensor data and driving video to a high-dimensional feature space and then applying machine learning methods. We are also interested in building a driving interface that makes drivers aware of surrounding traffic, monitoring traffic using network cameras in road infrastructure, large visual surveys of road environments, and driving information sharing via V2V and V2I communication.

Networking & Security Research Group


Our research includes Internet architecture, cyber infrastructure for sciences and engineering, wireless sensor networks (WSNs) and Internet of Things, ad hoc mobile networks, Software Defined Networking (SDN), networking and communication security, and various trust, security and privacy issues in real life applications such as health care, personal genomics, social networking and electronic voting.


Sensor Networks | Network Security | Cryptography

Current Projects

Revocable, Privacy-preserving and User-centric Biometric authentication: User authentication and identity management are the first gate of defense and access protection for cyber systems. The objective of this research is to design innovative secure fusion and key extraction algorithms to address the issues of biometric template security, irrevocability, user privacy, and universal identity in user authentication. We develop a user-centric authentication model and an active authentication system. This model enables users to utilize their biometrics as a universal identity to access different systems, intrinsically shields user biometrics, and provides a principled protection framework against a wide range of attacks. This system is able to facilitate and protect individuals' diffused online activities across cyberspace.

Transparent and Stepwise Verifiable Online Voting: Current assumptions regarding voting in physical booths, hardware and software at polling places, and trusted human supervision in voting schemes, contradict society's trend toward enabling interactions from anywhere, at any time. Some form of remote electronic voting (e-voting) can potentially be a solution in Internet-dominant environments. However, apart from having conflicting requirements such as voter anonymity and vote verifiability, remote e-voting faces some unique problems such as vote-selling and voter-coercion. This work aims to investigate and develop a fully transparent, stepwise verifiable, and assurable remote electronic voting technology with everlasting privacy and resistance to vote-selling and voter-coercion.

Medical Information System Security and Health and Genomic Data Privacy: This work investigates security and privacy issues in personal medical and genomic data, and develops secure and privacy-preserving health information sharing, controlled access, and secure computation technology using advanced cryptographic techniques.

Trusted Collaborating Computing: This work investigates security and privacy issues in multiple user environments including secure group communication, group key management, (hierarchical) access control and controlled data sharing, and privacy-preserving multiple party computation.

Security and Privacy in Social and Mobile Networks: This work utilizes graph structures to represent complex relations among entities involved in social or mobile networks and designs efficient secure and privacy-preserving techniques to protect social/mobile networks from exploitation.

Software Engineering, Distributed and Parallel Computing (SEDPC) Research Group


Our research investigates and exploits the distributed and parallel models of computation to create innovative high-performance, secure, and quality-aware software systems for various real-world applications. We conceive, design, and develop innovative tools, system environments, and concrete prototypes to demonstrate the impact of our research.


Distributed Systems | Software Engineering | Cloud Computing | Knowledge and Data Engineering

Current Projects

eDOTS (Enhanced Distributed Object Tracking System): eDOTS is an active academic research project involving the creation of an opportunistic indoor tracking system with the goal of providing highly accurate tracking estimates in an indoor environment. Indoor tracking is significantly more difficult than its outdoor counterpart due to the nature of indoor environments (lack of wireless signal reception or line of sight) and the low tolerance for error in the positional estimate. This work focuses on discovering and classifying sensors in previously unknown environments and then using optimization techniques to find the optimal subset of sensors to be used for tracking. Other areas covered by this project are multi-sensor data fusion and the use of positioning techniques and algorithms (Wi-Fi, RFID, Bluetooth, Vision, NFC, GPS, and Inertial sensors) for determining an object's location.

TruSSCom (Trustworthy Software Service Selection and Composition): The TruSSCom project is developing a comprehensive framework for the design, selection, and composition of trustworthy distributed systems from existing software services using principles of trust models, subjective logic, multi-level specifications and matching, theory of evidence, and machine learning. Unlike prevalent approaches that focus on one view or a specific application domain, this research is creating generic models for trust of individual software services and their ensembles by considering the internal and external views, associated formalisms, prediction analysis, and their applications to real distributed systems from domains of indoor tracking, cyberbullying detection and vehicle-to-vehicle collaborations.

ExaFSI (An Exascale Fluid-Structure Interaction Solving Framework): This project aims to design new scalable algorithms to enable numerical fluid-structure interaction (FSI) simulations at an unprecedented scale. FSI problems are ubiquitous in a wide variety of science, engineering, medical, and biological domains. The research tackles the challenges of minimizing memory accesses, communication cost, synchronization, I/O cost, and load rebalancing, as well as CPU/GPU optimizations. It also studies building cost performance models, deriving lower bounds, and finding optimal ways to reach those lower bounds.

ParTask (A High-Performance and High-Productivity Task Parallelism Model and Library): This project targets designing a new generic task-based parallel programming library to support different scientific domains such as dense/sparse matrix computations, computational fluid dynamics, big graph processing, and machine learning. The new programming library is an extended C library with a simple API. It is able to achieve both high performance and high productivity due to its simplified interface and efficient task-scheduling runtime system. The project has the potential to combine the HPC ecosystem and the Big Data ecosystem via a common programming model and runtime system while achieving high performance.

DataBroker Computing (Creating a Unified Framework to Integrate Simulation/Modeling with Data Analysis Applications): New approaches and methodology are to be developed to support in-situ active data analysis, in which simulation and analysis are combined in a virtuous circle with significantly increased performance and productivity. The project creates novel ways to unify simulations and data analysis and achieve optimal performance. The research introduces a new data analysis programming API, efficient runtime systems, and new abstractions to optimize data movement, data space management, and resource co-scheduling. It will demonstrate an innovative, high-performance, and easy-to-use big data processing system.

Distributed Simultaneous Localization and Mapping (SLAM): This project explores methods and algorithms for generating a map of an unknown environment while simultaneously localizing an agent in a distributed computing framework. The goal is to incrementally build a map (i.e., 3D geometry) consisting of stable natural features in the environment as multiple mobile agents move through it. Computer vision methods are used both for building the map and for localization. In this framework, mobile agents with cameras (e.g., mobile phones or tablets) can come into the environment, build a local part of the map, and communicate with other agents in order to contribute to the construction of the global map. These mobile agents can exit the environment, leaving their contribution to the map behind, while other mobile agents can enter and use the map for localization or, if they are moving in unmapped areas, contribute their own part of the map.

Automated Detection and Quantification of Liver Biopsy Images in Non-alcoholic Fatty Liver Disease (NAFLD): The goal of this project is to develop image processing and machine learning algorithms that automatically analyze liver biopsy images in order to assess disease stage in non-alcoholic fatty liver disease. The methods use texture- and shape-based features to detect and quantify various features of the liver (e.g., macro- and micro-vesicular steatosis, lobular and portal inflammation, fibrosis) in different stages of the disease. Collaboration with Dr. Samer Gawrieh, IU School of Medicine, Department of Gastroenterology.

Automated Detection and Quantification of Diabetic Retinopathy in Microscopic Images of Retinas: Enumeration of acellular capillaries is used as a marker to assess experimental diabetic retinopathy (DR) and the response to pharmacological treatment. The traditional approach to quantifying acellular capillaries is manual counting, either directly under the microscope or using captured images. The goal of this project is to develop an automated method to improve the quantification of acellular capillaries in rodents by using computer-based image processing algorithms. Collaboration with Dr. Ashay Bhatwadekar, IU School of Medicine, Department of Ophthalmology.

Craniofacial Reconstruction and Recognition from Skulls: Forensic facial approximation is a useful technique for estimating facial morphology of deceased individuals when other forensic methods have failed to achieve identification. Current practice is to manually construct the face on a physical model of the skull. Faces are rebuilt on skulls either by using average tissue depth measurements at a small number of locations on the face, acquired from cadavers and various types of 2D and 3D imaging (e.g., x-rays, ultrasonic echo location, ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), cone beam computed tomography (CBCT)), or by estimating and rebuilding facial musculature. The process relies on the artist's interpretation and experience in rebuilding the face shape. The methodology associated with estimating facial appearance has room for substantial improvement. The goal of this project is to use three-dimensional cone beam computed tomography (3D CBCT) images and machine learning methods based on multivariate statistical modeling and multidimensional shape space to develop a novel, standardized, and accurate method for approximating facial form from unidentified craniofacial remains, eliminating individual artistic interpretation from the process and improving replicability. The final method will also involve interactive tools for modifying facial shape based on such factors as age and body mass index (BMI). Collaboration with Dr. Katherine Kula, IU School of Dentistry, Department of Orthodontics and Oral Facial Genetics.
