
Optimizing drug-like molecules traditionally relies on trial-and-error approaches or brute-force computational searches, making it difficult to explore chemical space efficiently. This project introduces a latent space optimization framework that utilizes transformer-generated molecular embeddings to refine molecular properties and identify optimal drug candidates.
Highlights of this approach:
- Optimization in structured latent space – Instead of optimizing raw molecular descriptors, the model refines properties within a learned chemical space.
- Efficient exploration of molecular candidates – Identifies compounds with desirable pharmacokinetic profiles, balancing bioavailability, metabolic stability, and toxicity.
- Seamless integration with predictive models – Connects molecular design and property prediction, reducing the need for exhaustive screening.
This data-driven approach streamlines molecular optimization, reducing reliance on computationally expensive searches and enabling more efficient discovery of novel drug candidates.
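To make the idea of optimizing inside a latent space concrete, the following is a minimal sketch, not the project's implementation. It assumes a three-dimensional latent vector and a hypothetical `surrogate_score` function that stands in for a trained property predictor; the transformer encoder and the step that decodes a latent vector back into a molecule are omitted. The optimizer is plain gradient ascent with finite-difference gradients, chosen only for illustration.

```python
def surrogate_score(z):
    """Hypothetical stand-in for a trained property predictor.

    Scores a latent vector; higher is better. The target below is an
    assumed optimum used only to give the toy function a known peak.
    """
    target = [0.5, -1.0, 2.0]
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def numerical_gradient(f, z, eps=1e-5):
    """Central-difference estimate of the gradient of f at z."""
    grad = []
    for i in range(len(z)):
        z_hi = list(z)
        z_lo = list(z)
        z_hi[i] += eps
        z_lo[i] -= eps
        grad.append((f(z_hi) - f(z_lo)) / (2 * eps))
    return grad


def optimize_latent(f, z0, step=0.1, n_steps=200):
    """Gradient ascent on f, starting from the latent vector z0."""
    z = list(z0)
    for _ in range(n_steps):
        g = numerical_gradient(f, z)
        z = [zi + step * gi for zi, gi in zip(z, g)]
    return z


z_start = [0.0, 0.0, 0.0]  # e.g. the embedding of a starting molecule
z_best = optimize_latent(surrogate_score, z_start)
```

In a real pipeline, `z_start` would come from the encoder and `z_best` would be passed to a decoder to propose a candidate structure; both steps are outside the scope of this sketch.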


When a cell perceives an external stimulus, multiple biochemical reaction pathways are triggered simultaneously, and the cell responds by changing the activities and expression levels of intracellular molecules. Quantitative characterization of these pathways is therefore critical to understanding the dynamic behavior of cells. With rapid improvements in experimental techniques, it has become easier to measure the dynamics of a reaction pathway. However, processing and interpreting these data remains a challenge. To this end, mathematical modeling has become attractive because it can integrate datasets from diverse sources to verify existing hypotheses. At the same time, model-based optimal experimental design can guide future experiments that test alternative hypotheses, so that the information gained from those experiments is maximized. Currently, our group is developing systematic approaches to integrate diverse and complex datasets through mathematical modeling, sensitivity analysis, and parameter estimation, so that the resulting model can serve as a surrogate for the real cellular process to test hypotheses and to design optimal experiments that validate new ones.
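As a small illustration of parameter estimation and sensitivity analysis, the sketch below fits a single rate constant to time-course data and then evaluates how strongly the model output depends on that constant. Everything in it is an assumption made for illustration: the first-order decay model, the synthetic noise-free data, the search interval, and the use of a golden-section search. Real pathway models involve systems of differential equations and many parameters.

```python
import math


def model(t, k, y0=1.0):
    """Toy pathway model: first-order decay y(t) = y0 * exp(-k * t)."""
    return y0 * math.exp(-k * t)


def sse(k, times, data):
    """Sum of squared errors between model predictions and data."""
    return sum((model(t, k) - d) ** 2 for t, d in zip(times, data))


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal scalar function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


def sensitivity(t, k, eps=1e-6):
    """Local sensitivity dy/dk at time t, by central differences."""
    return (model(t, k + eps) - model(t, k - eps)) / (2 * eps)


# Synthetic, noise-free "measurements" generated with an assumed true rate.
times = [0.0, 0.5, 1.0, 2.0, 4.0]
true_k = 0.7
data = [model(t, true_k) for t in times]

# Parameter estimation: recover k by least squares.
k_hat = golden_section_min(lambda k: sse(k, times, data), 0.01, 5.0)

# Sensitivity analysis: time points with large |dy/dk| carry the most
# information about k, which is the quantity optimal experimental design exploits.
sens = {t: sensitivity(t, k_hat) for t in times}
```

For this toy model the sensitivity is zero at `t = 0` and largest in magnitude near `t = 1/k`, which suggests where an additional measurement would be most informative.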
The entire superstructure of shale gas development can be divided into three sub-processes: 1) hydraulic fracturing; 2) wastewater management; and 3) shale gas processing. Currently, these sub-processes are studied independently, without considering their interactions. However, because they depend on each other, it is important to understand the connections among them. In hydraulic fracturing, creating fractures with optimal propped fracture geometry is important because it leads to maximum shale gas production. However, because hydraulic fracturing requires a huge amount of water, the profit from shale gas extraction is accompanied by environmental concerns, particularly water-related issues. One important concern is that a portion of the injected fracturing fluid flows back to the surface as wastewater containing high concentrations of various contaminants. Thus, developing an environmentally sustainable and economically viable water management plan, alongside optimizing production, is crucial for treating wastewater and supplying sufficient freshwater to drilling sites. Furthermore, since shale gas is a hydrocarbon mixture consisting mainly of methane, it requires additional processing units before its subsequent use. Motivated by these
In hydraulic fracturing, the proppant-filled fracture that remains at the end of pumping strongly influences the fluid conductivity for oil and natural gas. It is therefore important to create optimal propped fracture geometry by designing pumping schedules that increase the recovery of shale hydrocarbons. Currently, the pumping schedule is designed offline and applied to the hydraulic fracturing process in an open-loop manner, which may lead to poor process performance in the presence of large disturbances and plant-model mismatch. Furthermore, the propped fracture geometry depends on the interaction among multiple simultaneously propagating fractures (stress-shadow effects) and on the interaction between propagating hydraulic fractures and pre-existing natural fractures in naturally fractured unconventional reservoirs. Motivated by this, we first focus on developing a high-fidelity process model of hydraulic fracturing to understand these interactions. We then develop a model predictive control framework for designing pumping schedules that achieve optimal propped fracture geometry in unconventional reservoirs, which is directly related to the overall efficiency of the operation.
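The difference between an open-loop schedule and model predictive control can be shown with a deliberately simple sketch. It is not a fracturing model: the scalar dynamics, the coefficients, the input grid, the horizon, and the setpoint are all hypothetical values chosen for illustration. At each step the controller enumerates candidate input sequences over a short horizon, scores them with its internal model, applies only the first input, and then re-measures the state. The "plant" uses a different coefficient from the model to mimic plant-model mismatch.

```python
from itertools import product

# Assumed toy prediction model: x[k+1] = A_MODEL * x[k] + B_MODEL * u[k]
A_MODEL, B_MODEL = 0.9, 0.5
U_GRID = [0.0, 0.25, 0.5, 0.75, 1.0]  # admissible (normalized) inputs
HORIZON = 3                            # prediction horizon in steps
SETPOINT = 2.0                         # desired value of the state


def predict_cost(x, inputs):
    """Sum of squared setpoint errors along the model-predicted trajectory."""
    cost = 0.0
    for u in inputs:
        x = A_MODEL * x + B_MODEL * u
        cost += (x - SETPOINT) ** 2
    return cost


def mpc_step(x):
    """Return the first input of the best input sequence over the horizon."""
    best = min(product(U_GRID, repeat=HORIZON),
               key=lambda seq: predict_cost(x, seq))
    return best[0]


def plant(x, u):
    """'True' process; its coefficient deliberately differs from A_MODEL."""
    return 0.85 * x + 0.5 * u


# Closed-loop simulation: measure, optimize, apply the first input, repeat.
x = 0.0
trajectory = [x]
inputs_applied = []
for _ in range(30):
    u = mpc_step(x)
    x = plant(x, u)
    inputs_applied.append(u)
    trajectory.append(x)
```

Because the controller re-optimizes from the measured state at every step, the mismatch between `plant` and the model is partly compensated by feedback, which an offline schedule cannot do. Exhaustive enumeration is used here only because the problem is tiny; practical formulations rely on numerical optimization solvers.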