Structure Optimization
Structure optimization was the central topic of most of my work during my PhD and post-doc, and it is the focus of my publications. Rather than describing what this means for each paper, I want to provide the relevant context here.
What is a “structure”?
Any scientist likes to believe that their particular field has universal importance, but only the materials scientist can say that literally everything around you is made of the subject of their studies - materials. Everything we interact with is a material of some kind from clothes through medications to electronics.
All of these materials are made of atoms¹ and we all have some intuition that the properties of an object depend on the material it is made of. We know that a plastic cup is less brittle than one made of glass. The properties of a material depend on both which atoms it is made of and how those atoms are arranged. I think most people are less aware of how important the arrangement of atoms is. Consider, for example, molecules made of hydrogen, carbon and oxygen. One such molecule is fat, an essential part of our bodies and largely the secret behind tasty food. Another is acetone, a central component in nail polish remover. They consist of the same atomic species and differ only in the number of atoms and their spatial arrangement.
Materials are created through all kinds of processes, some naturally occurring and some through techniques that human materials scientists have discovered over millennia. Steel, a material that has been found in nearly 4000-year-old archaeological sites, is a mixture of iron and carbon. The iron used to create steel comes from iron ores that are most commonly oxides, meaning they are made of iron and oxygen. The oxygen can be removed by offering it a chemical partner it prefers over iron, such as carbon, in a reaction that produces carbon dioxide. Iron on its own is rather soft and ductile; adding a little bit of carbon (up to about 2%) turns it into steel with improved properties. Adding small amounts of other metals can further enhance the steel, e.g. to improve hardness, which is desirable for knife making. Forming different alloys can dramatically change the properties. This is, however, not controlled entirely by the elements that iron is mixed with, but also by the resulting spatial configuration of the constituent atoms.
Typically we cannot control the exact configuration of atoms in a material. They arrange themselves in specific ways, usually ways that minimize their energy². There are levers we can pull to nudge nature towards specific configurations. One such lever is temperature, or really how rapidly we change the temperature. Temperature is a macroscopic property describing the average movement of atoms. At extremely low temperatures atoms don’t have the ability to move much at all and are locked in position with their neighbours, whereas at high temperatures atoms can move around almost entirely freely and interactions with other atoms have relatively little importance. If molten steel is cooled rapidly the atoms won’t have time to arrange themselves completely. If, however, the steel is cooled more slowly the atoms have enough time to move around and find a configuration that they ‘like’ before it gets so cold that they are locked in place³.
Nature more or less takes care of the configuration of atoms, under the conditions that we can impose, such as the cooling rate. The physics at play at the scale of atoms is quantum mechanics, a part of physics that we have little intuition for, as its effects are mostly noticeable at exactly the scale of atoms. One area of physics we all have some intuition for is classical mechanics. We can all predict what would happen in this scenario: you’re putting glasses in the cupboard but on the way a glass slips out of your hand. No one would say, well, the glass is going to fly into the cupboard, land in its intended spot and come to rest⁴. We can all predict that the glass is going to fall to the floor and probably shatter. The glass falls because of gravity or, more specifically, to reduce its gravitational potential energy⁵. The gravitational potential energy of the glass is given by
\[ \begin{equation} E(h) = mgh \label{eq:gravity} \tag{1} \end{equation} \]
Where \(E\) is the energy, \(m\) is the mass of the glass, \(g\) is the gravitational acceleration and \(h\) is its height above the floorboards. Without you resisting gravity on behalf of the glass, nature will act to reduce \(E\), which it does by reducing \(h\). In this example the glass interacts with gravity, or perhaps you’d say it interacts with the world through gravity. On the scale of atoms in a material, interactions are not through gravity but primarily through electrons and protons, and electrons especially have a dysfunctional relationship with each other. Erwin Schrödinger came up with a way of stating this dysfunctional relationship mathematically. When applied to a material this results in the time-independent many-electron Schrödinger equation, which we can write as
\[ \begin{equation} E(\mathbf{R})\Psi = \left[-\sum_{i=1}^N \frac{\hbar^2}{2m_i}\nabla_i^2 + \sum_{i=1}^N V(\mathbf{r}_i, \mathbf{R}) + \sum_{i<j}^N U(\mathbf{r}_i, \mathbf{r}_j) \right]\Psi. \label{eq:many-schrodinger} \tag{2} \end{equation} \]
The left-hand side contains the energy \(E(\mathbf{R})\) as a function of the atomic positions \(\mathbf{R}\), together with the wavefunction \(\Psi\). The right-hand side contains terms that can be interpreted as the kinetic energy of the electrons, the interaction between the electrons and the protons in the atomic nuclei and finally the dysfunctional electron-electron interaction. I won’t dwell on the details and complexities of this equation. We just need to know that it provides a way of calculating the energy based on the positions of the atoms \(\mathbf{R}\), in analogy to how the previous equation provided a way of calculating the energy based on the height of the dropped glass. As with the glass, nature again wants to reduce the energy, now \(E(\mathbf{R})\), by arranging the atoms in certain ways.
For gravity, equation (1) is simple enough that we can find by hand the \(h\) for which \(E\) is minimal: \(E(h=0) = 0\) if we measure \(h\) from the floor. Equation (2), however, cannot generally be solved by hand. We can solve it computationally, albeit only approximately, but with enough accuracy. What this means is that with a powerful enough computer we can take a set of atomic positions \(\mathbf{R}\) and get the computer to calculate the energy of that configuration, \(E(\mathbf{R})\).
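To make the idea of "positions in, energy out" concrete, here is a minimal sketch. It does not solve equation (2). Instead it swaps in the Lennard-Jones pair potential, a much cheaper classical model, purely as a stand-in for \(E(\mathbf{R})\). The function name `lennard_jones_energy` and the parameters `epsilon` and `sigma` (set to 1 in arbitrary units) are assumptions made for this illustration.

```python
import itertools
import math


def lennard_jones_energy(positions, epsilon=1.0, sigma=1.0):
    """Toy stand-in for E(R): sum of Lennard-Jones pair energies.

    positions is a list of (x, y, z) tuples, one per atom.
    This is a classical model, not a solution of the Schrodinger equation.
    """
    energy = 0.0
    for a, b in itertools.combinations(positions, 2):
        r = math.dist(a, b)          # distance between the two atoms
        sr6 = (sigma / r) ** 6
        energy += 4.0 * epsilon * (sr6 ** 2 - sr6)
    return energy


# Two configurations of the same two atoms: only the arrangement differs.
r_min = 2 ** (1 / 6)  # pair distance that minimizes the Lennard-Jones energy
dimer_at_minimum = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
dimer_stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

print(lennard_jones_energy(dimer_at_minimum))
print(lennard_jones_energy(dimer_stretched))
```

The same two atoms give different energies depending on how they are arranged, which is the property that structure optimization exploits.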
What is “optimization”?
Many everyday decisions can be thought of as the result of an optimization algorithm: the route we take to work might be chosen to optimize travel time, or what is served for dinner could be decided by optimizing enjoyment (or following a diet). In more formal terms we might say that we are interested in finding the optimal input values for some objective function. The objective function may be a function of many variables but it has to output a single number⁶, or in math \(f: \mathbb{R}^N \rightarrow \mathbb{R}\).
Depending on the objective function the problem might be discrete, that is one where the input variables can only take specific values, or continuous, where the inputs can take any value in some range. For continuous problems we sometimes have access to the gradient of the function, and by following the negative gradient downhill we can find a local minimum. The simplest way of doing so is gradient descent, where the input variables are updated according to
\[ \mathbf{x}_{i+1} = \mathbf{x}_{i} - \alpha \nabla f(\mathbf{x}_{i}) \]
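Here \(\alpha\) is the step size, which sets how far each update moves along the negative gradient. This is depicted for the Rastrigin function in the figure below, with gradient descent applied to four red points, each finding a local minimum.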
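The update rule can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the figure: the step size `alpha=0.002`, the number of steps and the four starting points are assumptions chosen for this example, and the Rastrigin function is used in its common form with amplitude \(A = 10\).

```python
import math

A = 10.0  # amplitude of the standard Rastrigin function


def rastrigin(x):
    """Objective f: R^N -> R with many local minima."""
    return A * len(x) + sum(xi**2 - A * math.cos(2 * math.pi * xi) for xi in x)


def rastrigin_gradient(x):
    """Analytic gradient of the Rastrigin function."""
    return [2 * xi + 2 * math.pi * A * math.sin(2 * math.pi * xi) for xi in x]


def gradient_descent(x0, alpha=0.002, n_steps=500):
    """Repeatedly apply x_{i+1} = x_i - alpha * grad f(x_i)."""
    x = list(x0)
    for _ in range(n_steps):
        grad = rastrigin_gradient(x)
        x = [xi - alpha * gi for xi, gi in zip(x, grad)]
    return x


# Hypothetical starting points; each one slides into a nearby local minimum.
starts = [(0.4, -0.3), (2.3, -1.4), (-2.6, 0.8), (1.2, 3.3)]
for start in starts:
    end = gradient_descent(start)
    print(start, "->", [round(v, 3) for v in end], "f =", round(rastrigin(end), 3))
```

The step size has to be small here because the Rastrigin function is strongly curved. A larger \(\alpha\) can overshoot the minimum and jump between basins.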
A more difficult problem is finding the global minimum, which is the set of input variables that leads to the lowest value among all possible inputs. Many interesting optimization problems are very high-dimensional and the number of local minima can be very large, so algorithms that can handle these are of interest. Generally these algorithms are iterative, with each iteration being an attempt at finding the globally optimal solution. One of the simplest such algorithms is a random search, where each iteration consists of the following steps
- Make a random set of input variables \(\mathbf{x}_i\).
- Locally optimize starting from \(\mathbf{x}_i\) to find a local minimum \(\mathbf{x}_i^{lm}\).
Where \(i\) denotes the iteration number. In order to determine whether an algorithm’s performance is good we need to formalize what that means. One way of doing so is to consider the iteration \(i_{\mathrm{opt}}\) at which the algorithm discovers the optimal solution for a particular objective. The random search algorithm, surprisingly, has an element of randomness, so the iteration at which it discovers the optimal solution will be different each time it is run. This means that \(i_{\mathrm{opt}}\) is a random variable and its distribution depends on the algorithm and on the objective function the algorithm is applied to. The figure below shows a histogram of \(i_{\mathrm{opt}}\) and the empirical cumulative distribution function (CDF) calculated from this data. The slider controls how many data points for \(i_{\mathrm{opt}}\) the histogram contains, which also influences the CDF.
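The random search and the statistics of \(i_{\mathrm{opt}}\) can be sketched as follows. This is an illustration under stated assumptions, not the code that produced the figure. It assumes the 2D Rastrigin function as the objective, a search box of \([-2.5, 2.5]\) in each dimension, plain gradient descent as the local optimizer, and a threshold `tol=0.5` to decide that the global minimum (value 0) was found rather than one of the next-lowest local minima (value about 1). All function names are hypothetical.

```python
import math
import random

A = 10.0  # amplitude of the standard Rastrigin function


def objective(x):
    return A * len(x) + sum(xi**2 - A * math.cos(2 * math.pi * xi) for xi in x)


def gradient(x):
    return [2 * xi + 2 * math.pi * A * math.sin(2 * math.pi * xi) for xi in x]


def local_optimize(x0, alpha=0.002, n_steps=100):
    """Gradient descent from x0 to a nearby local minimum."""
    x = list(x0)
    for _ in range(n_steps):
        x = [xi - alpha * gi for xi, gi in zip(x, gradient(x))]
    return x


def random_search(rng, n_dim=2, bound=2.5, max_iterations=10_000, tol=0.5):
    """Return i_opt, the first iteration whose local minimum is the global one."""
    for i in range(1, max_iterations + 1):
        x_i = [rng.uniform(-bound, bound) for _ in range(n_dim)]  # random start
        x_lm = local_optimize(x_i)                                # local minimum
        if objective(x_lm) < tol:
            return i
    return None  # global minimum not found within max_iterations


def empirical_cdf(samples, i):
    """Fraction of runs that found the global minimum within i iterations."""
    return sum(1 for s in samples if s <= i) / len(samples)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
runs = [random_search(rng) for _ in range(100)]
i_opt_samples = [i for i in runs if i is not None]

for i in (1, 10, 25, 50, 100):
    print(i, empirical_cdf(i_opt_samples, i))
```

Reading the CDF at a given \(i\) answers the question "what fraction of runs had succeeded after \(i\) iterations?", which makes it a convenient way to compare algorithms on the same objective.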
Footnotes
1. Perhaps a particle physicist would find this statement somewhat primitive, but for much of materials science it is good enough.
2. Consider playing football on a very uneven field with many small ‘hills’. When the ball is kicked around it has enough kinetic energy to overcome all of the hills. However, if left untouched the ball will roll down and come to rest in a valley - a position that minimizes its gravitational potential energy.
3. Imagine a puzzle with 1000 pieces completely scrambled on your desk. If I tell you that you have 5 minutes to build as much of the puzzle as possible, you’re unlikely to finish it but will probably have managed to build part of it. This is analogous to a material being cooled rapidly. On the other hand, if I gave you a full day you’d probably be able to finish the puzzle, having moved every piece into its optimal configuration. This is like the material being cooled only very slowly.
4. I guess maybe this has happened for a few lucky people, but it is at least not a very common occurrence.
5. If you look up the definition of “energy” in a textbook you are likely to get the answer “energy is the ability to do work”. Then you look for the definition of work and get “work is the energy transferred to or from an object via the application of force along a displacement”. Then you might look up the definition of force and displacement, perhaps leading to the need to look up more definitions. Let’s just say that in this case: energy is the ability to fall.
6. Disregarding multi-objective optimization.