Introduction to Large-scale Optimization
1 Course Description
Although I give a quick review of linear and nonlinear optimization, I presume that the students in this class have been previously exposed to the fundamental ideas of mathematical programming. The material on linear optimization covers a fair number of methods used for solving large-scale linear (integer) programming problems. Only two lectures are reserved for nonlinear optimization, so there I concentrate on relatively recent methods, particularly those used in machine learning. The part on large-scale nonlinear optimization can be considered a gentle introduction to another graduate-level course that I teach: "Optimization for Big Data."
The focus of this class is on methods and computations. My objective is to discuss the main points so that a student can get started with a project on her own. Many of our graduate students start their theses with a mathematical model; in fact, the modeling part is where their main contribution lies. They then aim at finding an appropriate method to solve the resulting problems. I hope the structure of this course caters to their needs.
I have to admit that the algorithmic structure of the lectures is preserved at the expense of mathematical depth. Fortunately, the theoretical aspects of many algorithms in large-scale optimization are relatively straightforward to follow if you have previously seen the main tools of optimization theory, such as convexity, duality, polyhedral theory, and convergence results.
I taught this course in Spring 2016 to a class of 15 graduate students. Except for the review lectures (one and six), we reserved one lecture for discussing selected papers, where several students presented the core material in those papers. These papers are listed at the end of each related lecture. To help the students with their presentations, I also prepared a short note.
2 Software
I have prepared my course notes using Jupyter Notebook with the GNU Octave kernel. GNU Octave is free and available for almost all platforms; there is even an online server that you can try without installing anything on your computer. Once GNU Octave is working, you can run all the code snippets in the lecture notes. Aside from the base installation, I have used only the optimization and statistics packages of GNU Octave.
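One way to check your installation is the minimal snippet below. It loads the two extra packages (assuming they have already been installed, e.g., with "pkg install -forge optim") and then solves a tiny linear program with glpk, which ships with core Octave. The problem data are made up purely for illustration.

```octave
% Load the extra packages used in the notes (install once with, e.g.,
% "pkg install -forge optim" and "pkg install -forge statistics").
pkg load optim;
pkg load statistics;

% Smoke test: solve a tiny LP with glpk, which comes with core Octave.
%   minimize  -2*x1 - 3*x2
%   s.t.       x1 +   x2 <= 4
%              x1 + 3*x2 <= 6,   x1, x2 >= 0
c  = [-2; -3];
A  = [1 1; 1 3];
b  = [4; 6];
lb = [0; 0];
ub = [Inf; Inf];
[x_opt, f_min] = glpk (c, A, b, lb, ub, "UU", "CC");
% Expected output: x_opt = [3; 1] and f_min = -9.
```

If this snippet runs without errors, all the code in the lecture notes should work as well.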
3 Course Outline
- Review on Linear Optimization
- Lagrangian Relaxation and Duality (a small sketch follows this outline)
- Cutting Plane Methods
- Dantzig-Wolfe Decomposition and Column Generation
- Benders Decomposition and Delayed Constraint Generation
- Review on Nonlinear Optimization
- Data Sampling
- Function Decomposition
Keywords: Lagrangian relaxation; cutting plane methods; column generation; delayed constraint generation; decomposition; nonlinear optimization; machine learning.
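To give a flavor of the computational style of the lectures, here is a minimal sketch of the Lagrangian relaxation idea from the second topic above. This is not the course's implementation; it is a toy example with made-up data, where the single coupling constraint of a small linear program is relaxed and the dual function is maximized with a plain subgradient method using a diminishing step size.

```octave
% A toy instance:  minimize c'*x  s.t.  A*x = b,  0 <= x <= u.
% Relaxing A*x = b with multiplier lambda gives the dual function
%   L(lambda) = lambda'*b + min_{0<=x<=u} (c - A'*lambda)'*x,
% whose inner minimizer is read off from the signs of the reduced costs.
c = [1; 2; 3];
A = [1 1 1];  b = 2;            % a single coupling constraint
u = [1; 1; 1];
lambda = 0;                     % multiplier estimate
for k = 1:50
  rc = c - A' * lambda;         % reduced costs of the relaxed problem
  x  = u .* (rc < 0);           % x_j = u_j if rc_j < 0, and 0 otherwise
  g  = b - A * x;               % a subgradient of L at lambda
  lambda = lambda + (1 / k) * g;  % diminishing step size
end
dual_bound = lambda' * b + (c - A' * lambda)' * x
% For this instance the dual bound attains the optimal value 3 at x = [1;1;0].
```

The same pattern, a relaxed subproblem that is easy to solve plus an iterative update of the multipliers, reappears throughout the lectures on decomposition methods.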
4 References
- Williams, H. P., Model Building in Mathematical Programming, 4th Edition, New York: John Wiley, 1999.
- Martin, R. K., Large Scale Linear and Integer Optimization: A Unified Approach, Boston: Kluwer Academic Publishers, 1999.
- Nocedal, J. and Wright, S. J., Numerical Optimization, 2nd Edition, New York: Springer, 2006.
- Bertsekas, D. P., Nonlinear Programming, 2nd Edition, Belmont, MA: Athena Scientific, 2003.
- Faigle, U., Kern, W., and Still, G., Algorithmic Principles of Mathematical Programming, Dordrecht: Kluwer Academic Publishers, 2002.