I'm a PhD student in the Computer Science Department at UT Austin. I'm interested in learning algorithms, especially in the context of robotics and computer vision.
I recently completed a Student Researcher Program at Google Brain Robotics in Mountain View, CA, lasting about 19 months.
I got my Master's in Computer Science from the University of Florida, working mostly on theoretical computer science and approximation algorithms. I got my Bachelor's degree in Computer Engineering from Sharif University of Technology.
Before starting graduate school, I worked as a software engineer. Examples of my work include: lead software developer for a startup, a light-weight web framework open-sourced in 2002, and library functions contributed to TensorFlow.
My talk on sample-efficient learning of robot table tennis in a virtual reality environment.
High-resolution videos are available on the project website.
I successfully defended my PhD thesis. Thanks to my adviser Risto Miikkulainen, and my supportive committee members, Sergey Levine, Luis Sentis, Scott Niekum, and Aloysius Mok.
Our paper on sample-efficient learning of robot table tennis is on arXiv. Videos
We have published a Google AI blog post covering our recent work on predicting object motion and depth from video.
I completed a Student Researcher Program at Google Brain Robotics, lasting about 19 months.
Our follow-up work on unsupervised depth and ego-motion prediction is accepted to AAAI 2019.
The vid2depth codebase for our upcoming CVPR paper is released in the TensorFlow Models repository.
We have released the bike video dataset. See some sample snippets from the videos on the project website.
vid2depth was featured in Google I/O '18.
Our work on unsupervised learning of scene depth and camera motion just by observing the movement of pixels in raw monocular video (vid2depth) is accepted to CVPR 2018 in Salt Lake City, Utah.
I'm presenting our video prediction paper at IEEE Intelligent Vehicles Symposium in Redondo Beach, CA.
In computer vision, my focus has been on unsupervised and self-supervised learning to extract information from readily-available sources of data. In particular, I have worked on applying deep learning and geometry to estimate scene depth and camera motion just from analyzing the movement of pixels in raw single-view videos.
Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos AAAI, 2019 Vincent Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova This paper presents a refined unsupervised depth and motion prediction model that is capable of predicting depth and motion of dynamic objects in addition to the motion of the camera, all from raw single-view (monocular) video. In addition, if multiple frames are available at inference time, a refinement process produces more accurate depth and motion estimates. |
---|
Future Semantic Segmentation Using 3D Structure ECCV 3D Reconstruction meets Semantics Workshop, 2018 Suhani Vora, Reza Mahjourian, Soeren Pirk, Anelia Angelova Given a stream of monocular video frames sparsely labelled with semantic segmentation maps, this method estimates the 3D structure of the scene and uses that to predict the semantic segmentation of future frames. |
---|
Unsupervised Learning of Depth and Egomotion from Monocular Video Using 3D Geometric Constraints CVPR, 2018 Reza Mahjourian, Martin Wicke, Anelia Angelova This paper applies deep learning and geometry to estimate scene depth and camera motion just from analyzing the movement of pixels in raw single-view videos. The neural network estimates 3D point clouds for each frame and the camera motion between adjacent frames. Transforming the point clouds based on the estimated camera motion and aligning them in 3D provides the supervisory signal for learning both depth and camera motion without ground truth. |
---|
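The geometric idea in the CVPR 2018 summary above can be illustrated with a small, simplified sketch. It is not the vid2depth implementation: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the function names, and the toy data are all assumptions made for illustration, and the sketch assumes point correspondences are known by index, which the paper's 3D alignment does not require. It back-projects a depth map into a point cloud, moves the cloud by an estimated camera motion, and measures how far it is from the next frame's cloud; a lower error means the depth and motion estimates agree better.

```python
def backproject(depth, fx, fy, cx, cy):
    """Turn a depth map (list of rows) into 3D points in the camera frame.

    Assumes a pinhole camera with hypothetical intrinsics fx, fy, cx, cy.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            points.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return points


def transform(points, rotation, translation):
    """Apply the rigid motion p -> R p + t to every point."""
    moved = []
    for p in points:
        moved.append(tuple(
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        ))
    return moved


def mean_alignment_error(cloud_a, cloud_b):
    """Mean Euclidean distance between points paired by index (a simplification)."""
    total = 0.0
    for a, b in zip(cloud_a, cloud_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return total / len(cloud_a)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Toy scene: a flat wall 2 units away, seen in a 2x2 depth map.
cloud_t = backproject([[2.0, 2.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)

# The camera moves 0.5 units forward, so points move 0.5 units closer.
true_motion = (0.0, 0.0, -0.5)
cloud_t1 = transform(cloud_t, IDENTITY, true_motion)

# A correct motion estimate aligns the clouds; a wrong one leaves a residual.
error_correct = mean_alignment_error(transform(cloud_t, IDENTITY, true_motion), cloud_t1)
error_wrong = mean_alignment_error(transform(cloud_t, IDENTITY, (0.0, 0.0, 0.0)), cloud_t1)
```

In a learning setup, a residual like `error_wrong` would act as a training loss in place of ground-truth depth or pose labels.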
Geometry-Based Next Frame Prediction from Monocular Video IEEE Intelligent Vehicles, 2017 Reza Mahjourian, Martin Wicke, Anelia Angelova A recurrent neural network with convolutional LSTM cells is trained to predict depth from a sequence of monocular video frames. The memory in the LSTM cells allows the network to carry information across frames. The depth prediction, along with the camera trajectory, is then used to compute a prediction for the next frame. |
---|
In robotics, my focus has been on developing approaches that are sample-efficient enough that learning algorithms can be used to solve complex robotic tasks. My work has explored active learning for robotics. I have worked on applying hierarchical learning to robotics in setups where the low-level control problems are solved using optimal control and model-free learning is used only for high-level behaviors. Our recent work on learning robot table tennis is such an approach: it trains zero-shot striking skills based on dynamics models learned from observing human games in a virtual reality environment, and applies model-free reinforcement learning judiciously to discover novel game-play strategies.
In studying reinforcement learning, I have worked on understanding the properties of learning algorithms and problem domains that contribute to the success or failure of learning approaches. I have studied the impact of domain properties like ergodicity and stochasticity on reinforcement learning with self-play. I have also worked on meta-learning to discover effective feature sets for reinforcement learning.
Hierarchical Policy Design for Sample-Efficient Learning of Robot Table Tennis Through Self-Play arXiv preprint, 2018 Reza Mahjourian, Risto Miikkulainen, Nevena Lazic, Sergey Levine, Navdeep Jaitly This work studies sample-efficient learning of complex policies in the context of robot table tennis. Human demonstrations in a virtual reality environment are used to train dynamics models for the game objects, which together with an analytic paddle controller allow any robot anatomy to play table tennis without training episodes. Self-play is used to train cooperative and adversarial game-play strategies on top of model-based striking skills trained from human demonstrations. Further experiments demonstrate that more flexible variants of the policy can discover new strikes not demonstrated by humans and achieve higher performance at the expense of lower sample-efficiency. The high sample-efficiency demonstrated in the evaluations shows that the proposed method is suitable for learning directly on physical robots without transfer of models or policies from simulation. |
---|
Task Planning with Guided Policy Search Preprint, 2016 Reza Mahjourian, Risto Miikkulainen Discovering suitable cost functions allows Guided Policy Search (GPS) to solve tasks that require planning for intermediate goals. As the animation in the video shows, direct optimization may lead to local optima. |
---|
Neuroevolutionary Planning for Robotic Control PhD Proposal, 2016 Reza Mahjourian, Risto Miikkulainen In this work, an evolutionary strategy is applied to discover robotic controllers for an object manipulation task. For simple control tasks, controllers with precise behavior are learned. However, when the task is complex enough that it requires strategy and planning, finding solutions becomes hard. This work proposes a new evolutionary method to discover and complete subtasks leading to completion of an original objective. |
---|
Robotic Control Through Neuroevolution BEACON, 2014 Reza Mahjourian, Risto Miikkulainen This work studies the impact of neural network architecture on efficiency of neuroevolution (NEAT) on object manipulation tasks using the Atlas robot. |
---|
An Evolutionary Feature Discovery Method for Reinforcement Learning GECCO submission, 2013 Reza Mahjourian, Peter Stone This work presents a meta-learning approach for generating and evaluating candidate feature sets for reinforcement learning with linear function approximators (Gradient-Descent Sarsa(λ)). |
---|
Studying Impact of Domain Ergodicity and Stochasticity on Reinforcement Learning with Self-Play Preprint, 2011 Reza Mahjourian, Prateek Maheshwari, Risto Miikkulainen This work studies hypotheses on why reinforcement learning worked so well for backgammon in TD-Gammon. Does backgammon have particular properties that make it easier for reinforcement learning and self-play to work? Can these properties be exploited to design better general learning algorithms? Follow-up experiments show domain stochasticity to have a strong impact on reinforcement learning with self-play. |
---|
Optimizing Selection of Training Samples for Robotics Learning Problems Preprint, 2011 Reza Mahjourian, Peter Stone This work uses an ensemble of neural networks and selects training samples by prioritizing the data points where the networks in the ensemble disagree the most about their predictions (highest variance). |
---|
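The selection rule described above, prioritizing points where ensemble members disagree most, can be sketched in a few lines. This is a minimal illustration and not the code from the preprint: the function name, the data layout, and the toy predictions are assumptions, and plain lists of numbers stand in for the outputs of trained neural networks.

```python
from statistics import pvariance


def select_by_disagreement(ensemble_predictions, k):
    """Return indices of the k candidates with the highest prediction variance.

    ensemble_predictions[m][i] is the prediction of ensemble member m
    for candidate data point i (a hypothetical layout for this sketch).
    """
    num_candidates = len(ensemble_predictions[0])
    variances = [
        pvariance([member[i] for member in ensemble_predictions])
        for i in range(num_candidates)
    ]
    # Sort candidate indices from most to least disagreement.
    ranked = sorted(range(num_candidates), key=lambda i: variances[i], reverse=True)
    return ranked[:k]


# Three ensemble members, four candidate points.
predictions = [
    [0.10, 0.50, 0.90, 0.30],
    [0.12, 0.10, 0.88, 0.35],
    [0.11, 0.90, 0.91, 0.25],
]
chosen = select_by_disagreement(predictions, k=2)
print(chosen)  # → [1, 3]
```

Candidate 1 is chosen first because the three members predict 0.50, 0.10, and 0.90 for it, the widest spread among the candidates.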
An Approximation Algorithm for Conflict-Aware Broadcast Scheduling in Wireless Ad Hoc Networks The ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2008 Reza Mahjourian, Feng Chen, Ravi Tiwari, My Thai, Hongqiang Zhai, Yuguang Fang This paper introduces a constant approximation algorithm for minimum-latency conflict-aware broadcast scheduling in wireless networks and proves its correctness. A constant approximation algorithm is a polynomial-time algorithm for an NP-hard problem whose solution is guaranteed to be within a constant factor of the optimal solution. |
---|
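For a minimization problem such as minimum-latency scheduling, the constant-factor guarantee can be written as follows. The symbols are introduced here for illustration and are not taken from the paper: $\mathrm{ALG}(I)$ is the latency of the schedule produced by the algorithm on instance $I$, $\mathrm{OPT}(I)$ is the optimal latency, and $c \ge 1$ is a constant that does not depend on the size of the instance.

```latex
\mathrm{OPT}(I) \;\le\; \mathrm{ALG}(I) \;\le\; c \cdot \mathrm{OPT}(I)
\qquad \text{for every instance } I .
```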
An Architectural Style for Data-Driven Systems International Conference on Software Reuse (ICSR), 2008 Reza Mahjourian This paper describes the design of XPage, a light-weight web application framework, which was published as open-source software in 2002 and deployed in six data management apps by the author. It is designed specifically for data management applications and allows the developer to specify each application page at a very high level by specifying the data sources and attributes that it retrieves or modifies. |
---|
Software Connector Classification and Selection for Data-Intensive Systems International Workshop on Incorporating COTS Software into Software Systems, 2007 Chris A. Mattmann, David Woollard, Nenad Medvidovic, Reza Mahjourian This work explores the role of software connectors in systems specifically designed for distributing large volumes of data. |
---|