Optimizing Your Transportation Network Can Reduce Your Carbon Footprint

In the past few years, I have worked on optimization models for different sectors of the transportation industry that all have one thing in common: the goal always includes minimizing non-revenue trips, either explicitly defined as the objective function or captured as a cost to minimize. A non-revenue trip, sometimes called a ferry or deadhead, is a move to reposition resources at a new location. In a transportation scheduling or routing model, the majority (hopefully) of the moves yield revenue: a commercial flight delivers paying customers to a destination, the post office delivers postage-paid mail to our homes, a delivery truck drops off the TV a customer purchased. These revenue trips require resources: airplanes, trucks, trains, pilots, drivers, fuel, etc. Non-revenue trips require the same resources but without generating the revenue required to pay for them. Regardless of the company and its products or services, unnecessary non-revenue trips are a waste of time, money, and resources.

An objective I have yet to see in any real-world optimization model I have worked on or encountered is to minimize the impact on the environment. That may be a little surprising since corporations are more focused on going green and helping the environment now than ever before. Recycling is commonplace and my mother’s company even provides a reduced fare for employees willing to take the bus to work rather than drive their own cars. While I am certain it is possible to include green objectives in a model, I haven’t had a client say “Minimize my emissions or my carbon footprint.” Or have I?

After all, an unnecessary non-revenue trip wastes more than time and money; most modes of transportation require non-renewable fossil fuels and produce emissions that are harmful to the environment. I do want to distinguish between unnecessary and necessary non-revenue trips. It’s generally impossible to eliminate non-revenue trips completely — sometimes a resource is needed in a location where none already exist. But often an optimization model can find ways to sequence stops to reduce the total number of non-revenue trips and minimize the distance of remaining non-revenue trips. Even just reducing non-revenue miles can save fuel and reduce emissions while saving the company money and time. Companies don’t need to explicitly model “green” objectives to help the environment.
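To make the resequencing idea concrete, here is a minimal sketch rather than a production routing model. It assumes a single vehicle, a single road, and made-up trips given as (origin, destination) mile markers; the names `TRIPS`, `deadhead_miles`, and `best_sequence` are hypothetical. It brute-forces every ordering, which is only sensible for a handful of trips.

```python
from itertools import permutations

# Hypothetical revenue trips as (origin, destination) mile markers on one road.
TRIPS = [(0, 10), (40, 30), (10, 40), (30, 5)]


def deadhead_miles(sequence):
    """Non-revenue miles driven between the end of one trip and the start of the next."""
    return sum(abs(nxt[0] - cur[1]) for cur, nxt in zip(sequence, sequence[1:]))


def best_sequence(trips):
    """Try every ordering and keep the one with the fewest deadhead miles (toy sizes only)."""
    return min(permutations(trips), key=deadhead_miles)


if __name__ == "__main__":
    best = best_sequence(TRIPS)
    print("as listed:  ", deadhead_miles(TRIPS), "deadhead miles")
    print("resequenced:", deadhead_miles(best), "deadhead miles via", best)
```

In this toy data the same four revenue trips are served either way; only the empty repositioning miles between them change.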

Companies that utilize optimization to improve the efficiency of their operations should take an additional step and try to track the overall reduction in non-revenue trips and miles, and other areas of resource waste, as part of their effort to go green and become more environmentally friendly. A carbon footprint calculator like the one at Carbon Footprint can even help a company calculate the reduction in their carbon footprint as a direct result of their optimization efforts. Whether directly modeled in the objective or a by-product of other goals, companies can have a positive impact on the environment through the use of optimization and that benefits everyone.

This blog post is a contribution to INFORMS’ monthly blog challenge. INFORMS will summarize the participating blogs at the end of the month.


When the Data Changes Before the Model Even Finishes Solving

This month’s INFORMS blogging challenge topic is OR and analytics. Last Wednesday I was thinking seriously about this topic, in particular how OR and analytics, although separate fields, can benefit from each other, when I saw Michael Trick’s post outlining what each field can learn from the other. His analysis was very insightful and one of his points really hit home with me — #1 of what Operations Research can learn from Business Analytics:

It is not just the volume of data that is important:  it is the velocity.  There is new data every day/hour/minute/second, making the traditional OR approach of “get data, model, implement” hopelessly old-fashioned.  Adapting in a sophisticated way to changing data is part of the implementation.

Data volumes are increasing, data is coming from multiple sources in a variety of forms and states of completeness and cleanliness, and data is constantly changing. I am starting to see the effect ever-changing data can have on optimization more and more as my clients adapt the way they use our models. Many of our models are operational scheduling models that plan for a 24- to 48-hour planning horizon, which generally starts a day or two in the future. The model sets the schedule for this future time period and was often designed to run only once a day. But when you plan this far in advance, things are bound to change. New tasks or assignments are added, and existing ones are modified or may even be removed. How does this updated data affect the model’s results? Can it be incorporated into the current solution or is a completely new solution needed? What do we do when a model requires 30 minutes or an hour to solve but the data changes every minute? These needs are often not captured in the original business requirements for the optimization model but need to be addressed if the model is going to be effective in a real-time environment with volatile data.

Sometimes the model solves fast enough and schedules far enough in advance that it can be run continuously with data updates incorporated into each new solve. However, this can result in solutions that change dramatically with each run, which can be disruptive in a business environment. Consider a model that schedules workers for shifts. After the first run, a worker could be scheduled for an 8am shift. But after the next model run, the solution now has the worker scheduled for an 8pm shift. This is a pretty significant change. It also prevents the users from being able to notify the worker about his/her upcoming scheduled shifts because the schedule is constantly changing. One way that we have mitigated this problem is to place a higher value on the existing schedule in the objective function, which prevents the optimization model from changing the current solution unless the change would result in substantial savings or benefits.
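The stabilizing idea can be sketched on a toy assignment problem. This is an illustration under stated assumptions, not the client model: the cost matrices, the `change_penalty` value, and the function `solve_shifts` are all hypothetical, and the penalty on moved workers is one simple way to "place a higher value on the existing schedule". The solve is brute force, so it only suits tiny instances.

```python
from itertools import permutations


def solve_shifts(cost, current=None, change_penalty=0.0):
    """Assign worker i to shift result[i], minimizing cost plus a penalty
    for every worker moved off the shift in the already-published schedule."""
    def total(assignment):
        base = sum(cost[w][s] for w, s in enumerate(assignment))
        if current is None:
            return base
        moved = sum(1 for w, s in enumerate(assignment) if s != current[w])
        return base + change_penalty * moved

    return min(permutations(range(len(cost))), key=total)


published = (0, 1, 2)  # schedule from the previous run (hypothetical)

# Updated data where swapping workers 0 and 1 saves only 2 units in total.
small_gain = [[10, 9, 20],
              [9, 10, 20],
              [20, 20, 5]]

# Updated data where the same swap saves 42 units in total.
large_gain = [[30, 9, 20],
              [9, 30, 20],
              [20, 20, 5]]

if __name__ == "__main__":
    print(solve_shifts(small_gain))                # free re-solve reshuffles workers
    print(solve_shifts(small_gain, published, 5))  # penalty keeps the schedule
    print(solve_shifts(large_gain, published, 5))  # a big saving still wins
```

The penalty size is the tuning knob: too small and the schedule still churns, too large and the model ignores genuinely valuable changes.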

It may not be possible to continuously run an optimization model through a “full” solve because of a lengthy run time. One of our scheduling models essentially solves a set partitioning problem where the bulk of the model’s processing time is spent defining the feasible sets and only a fraction of the time is actually spent solving the optimization problem. In this case we need two modes: “full” mode and “update” mode. Full mode generates the entire space of feasible sets and then solves the resulting optimization problem. The model then switches to update mode, where it modifies, adds, and removes sets based on any data modifications that have occurred and solves the new optimization problem. These updates are significantly faster than regenerating all of the feasible sets, so update mode runs in a fraction of the time that full mode requires. We offset an initial long run-time with subsequent quick updates.
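A rough sketch of the two modes follows. It covers only the maintenance of the pool of feasible sets, not the set partitioning solve itself, and it makes simplifying assumptions: a set is feasible when its total duration fits within a hypothetical `MAX_HOURS` limit, and the names `feasible_sets`, `full_mode`, and `update_mode` are invented for the illustration.

```python
from itertools import combinations

MAX_HOURS = 8  # hypothetical limit on the total duration of one set


def feasible_sets(tasks, anchor=None):
    """Enumerate feasible sets of task names (total hours <= MAX_HOURS).
    With an anchor, enumerate only the sets that contain that task."""
    others = sorted(t for t in tasks if t != anchor)
    base = () if anchor is None else (anchor,)
    smallest = 1 if anchor is None else 0
    found = set()
    for size in range(smallest, len(others) + 1):
        for combo in combinations(others, size):
            members = base + combo
            if sum(tasks[t] for t in members) <= MAX_HOURS:
                found.add(frozenset(members))
    return found


def full_mode(tasks):
    """Build the whole pool of feasible sets from scratch (the slow step)."""
    return feasible_sets(tasks)


def update_mode(cached, tasks, changed):
    """Patch a cached pool. `changed` holds the names of tasks that were
    added, modified, or removed since the cache was built."""
    pool = {s for s in cached if not (s & changed)}  # untouched sets stay valid
    for name in changed:
        if name in tasks:  # added or modified; removed tasks simply drop out
            pool |= feasible_sets(tasks, anchor=name)
    return pool


if __name__ == "__main__":
    before = {"A": 3, "B": 4, "C": 5}
    cache = full_mode(before)
    after = {"A": 3, "B": 6, "D": 2}  # B modified, C removed, D added
    print(sorted(map(sorted, update_mode(cache, after, {"B", "C", "D"}))))
```

The patched pool matches what a full rebuild on the new data would produce, but only the sets touching a changed task are regenerated.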

Finally, rather than attempting to retrofit an existing optimization model built to handle a static data set, we have started to assume that our models need to be capable of incorporating fluctuating data and design this into the models from the outset rather than wait for the client to ask for this ability down the road. It is much easier to design and build flexibility into an optimization model from the start than to try to add it at a later date. Our clients’ business needs are constantly evolving and we are working hard to anticipate their future needs and build Operations Research tools that evolve with them. We must adapt to the changing frontier of data: more data from multiple sources that changes frequently and often has not been “sanitized” for use in an optimization model.

Are you seeing an increased need to incorporate changing data into your Operations Research models, and if so, how are you handling this new and difficult requirement?

This blog post is a contribution to INFORMS’ monthly blog challenge. INFORMS will summarize the participating blogs at the end of the month.


Why is SpORts So Popular?

Several years ago I attended a SpORts (OR in Sports) session at the annual INFORMS meeting (I think it was New Orleans –> San Francisco 2005) and the room was packed. All the chairs were taken and latecomers resorted to standing against the back wall. Not only was this one of the most attended sessions of the conference (at least compared to the other sessions I attended), it also had an unusually high level of presenter-audience interaction.

In my experience it is fairly normal for a presenter to receive few, or even zero, comments and questions after a presentation. But in this SpORts session, several members of the audience interacted with the presenters, probing the data and methods, questioning the results, and providing their own ideas for improvement. It made me wonder: What is it about OR applied to sports that made the audience so much more engaged than with other applications?

I think the answer lies in the accessibility of the data and results of a sports application of OR. Often only a handful of people know enough about an OR problem to be able to fully understand the problem’s data and judge the quality of potential solutions. For instance, in an airline’s crew scheduling problem, few people may be able to look at a sequence of flights and immediately realize the sequence won’t work because it exceeds the crew’s available duty hours or the plane’s fuel capacity. The people who do have this expertise are probably heavily involved in the airline industry. It’s unlikely that an outsider could come in and immediately understand the intricacies of the problem and its solution.

But many people, of all ages and occupations, are sports fans. They are familiar with the rules of various sports, the teams that comprise a professional league, and the major players or superstars. This working knowledge of sports makes it easier to understand the data that would go into an optimization model as well as analyze the solutions it produces.

If I remember correctly, one of the presentations during that INFORMS SpORts session was about rating/ranking NFL quarterbacks. Professional football is one of the most popular sports in the United States and even those who aren’t fans can probably name an NFL quarterback. And those who are fans probably have their own opinions about how good each quarterback is and how the various quarterbacks compare to each other. People not connected with the project are familiar with the basic data components and can even generate their own potential solutions and judge the solutions generated by others.

This fluency in the problem makes the methods and results more interesting to the audience because they can understand aspects of the algorithm and determine for themselves its efficacy. The audience doesn’t have to trust the authors that 5.6 is the optimal solution. And the results can even validate their own personal opinions (“I knew Peyton Manning was a better quarterback than Tom Brady!”). I am sure there are other reasons why sports is such a popular application of OR but regardless it is nice to see it generating enthusiasm and interest for OR.

This blog post is a contribution to INFORMS’ monthly blog challenge. INFORMS will summarize the participating blogs at the end of the month.


A Side Effect of Optimization – A Spring Cleaning for Your Data

When executives and managers start to plan for new optimization projects within their operations, they usually consider the basics like hardware requirements, time to design and implement the model, and the need for user testing and acceptance. But data cleanup is an underestimated, and often underappreciated, necessity when implementing a new model, or even when just revamping a current one. Essential data may not be correct, may not be in a usable form, or may not even exist.

Bad or missing data can have a lethal effect on an optimization model. If data is missing completely, the model won’t be able to run. If data is incorrect, the model may give results that are valid based on the inputs but make no sense to the users. Model output problems caused by incorrect data can be particularly difficult to diagnose, especially if the initial assumption is that the data is complete, clean, and correct. This leads the OR team down a path of debugging issues in the model that don’t exist. Often the team will only check the data after exhausting all possibilities within the actual model.

This happened to me recently while working on a new MIP model to replace an older model currently in production. Since the previous model had been in use for years by the client, the team assumed the data was fine. After all, any data issues would have been found during the implementation and continued use of the previous model. However, the previous model had some logic flaws that caused it to sometimes return infeasible results. The users got accustomed to blaming the model for solution problems without investigating the actual cause of a particular issue.

When we started user testing on our new model, we took a structured approach to validating its logic. When the users reported an issue with the model’s output, we tracked the issue to the exact spot in the model’s logic so that we could fix it. Sometimes the issue wasn’t in the model’s logic but was instead in the data. One day we were tracking down an issue reported by the users when we realized that a key piece of input data for the model was not always accurate. In fact, when it was correct it was due to luck and not good data.

Since this was pretty fundamental data for the model, there was the obvious question of how the problem had not been identified in all the years that the previous model was in production. The answer is simply that the previous model’s logic issues masked the data issue. The users saw issues in the model’s output often enough that they became immune to them and just assumed any issues in a solution were the result of bad logic rather than bad data. Only when we began to rigorously identify the cause of reported issues did we start to see the data problems that had existed all along.

The moral of the story? There are two: 1) Never assume the data is complete, clean, and correct – always verify and 2) When starting a new OR project always leave time for data cleanup. Even data that has been used for years can have flaws.


When The Correct Answer Isn’t Right Part 1: Objective

In a previous post I started discussing what to do when an optimization model’s correct answer isn’t the “right” answer according to the user. I am currently working on developing a scheduling MIP for a client that generates an optimal solution but the client did not feel it was the “right” solution. The users had a clear idea of certain tasks that should always be scheduled together, even if it potentially resulted in a sub-optimal schedule. In their minds, pairing the tasks was a requisite part of the schedule.

In my initial post, I discussed why it’s not good enough to be optimal; the model also needs to be “right”. Then I outlined a few areas I could investigate to help tune the model towards the “right” solution. Since then I have probed deeper into these areas and, with the help of the users, been able to tune the model’s optimal solution more towards the solution the users are looking for.

Today I’d like to share the results of my investigation into the first area: the objective function. The objective function defines the goal of a model so it’s important that the objective function clearly reflects the goals of the model. If a model’s goal is to reduce costs, minimizing the various costs of a model is the correct approach. But if the model’s goal is to increase profitability, an objective function that solely minimizes costs may be overlooking key aspects that impact profitability.

Talking with the client, I gathered more detail about the true objective of the model. Originally I had believed the goal of the model was to minimize the cost of the schedule. This cost is made up of several components, including the cost to produce an item, the cost to use each machine, and a slack cost for not completing a task (otherwise the model won’t schedule any tasks because not producing tasks costs $0 without a slack cost).

What I learned from deeper discussions with the client was that the users felt that not scheduling their paired tasks together carried an opportunity cost. Although this wasn’t a concrete cost like the cost of resource time or construction materials, not incorporating these pairings led to a poorer schedule in the minds of the users. So I added a cost for each of these pairs of tasks that was not scheduled together. The cost was large enough to encourage the model to try to schedule them together whenever possible, but not so large as to force the model to not schedule other tasks to make room for the pairs. As a result, the model tends to schedule the majority of these pairs, although it is sometimes neither feasible nor globally optimal to schedule all of the pairs every time.
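The effect of such a pairing cost can be seen on a small made-up instance. This sketch is not the client MIP: it assumes two machines with a hypothetical 8-hour capacity, treats a pair as "scheduled together" when both tasks land on the same machine, and uses invented data and names (`HOURS`, `MACHINE_COST`, `SLACK_COST`, `PAIRS`, `total_cost`, `solve`). It enumerates every assignment instead of calling a solver.

```python
from itertools import product

# All data below is hypothetical and chosen only to illustrate the trade-off.
HOURS = {"A": 4, "B": 4, "C": 4, "D": 4}
MACHINE_COST = {"A": (5, 7), "B": (7, 5), "C": (6, 6), "D": (6, 6)}
CAPACITY = 8        # hours available on each of the two machines
SLACK_COST = 50     # cost of leaving a task unscheduled
PAIRS = [("A", "B")]  # tasks the users want on the same machine


def total_cost(assign, pair_penalty):
    """Machine cost + slack cost + a penalty for each pair not kept together."""
    load = [0, 0]
    cost = 0
    for task, machine in assign.items():
        if machine is None:
            cost += SLACK_COST
        else:
            load[machine] += HOURS[task]
            cost += MACHINE_COST[task][machine]
    if max(load) > CAPACITY:
        return float("inf")  # infeasible: a machine is overloaded
    for a, b in PAIRS:
        if assign[a] is None or assign[a] != assign[b]:
            cost += pair_penalty
    return cost


def solve(pair_penalty):
    """Enumerate every assignment (machine 0, machine 1, or unscheduled)."""
    names = sorted(HOURS)
    options = product((0, 1, None), repeat=len(names))
    candidates = (dict(zip(names, choice)) for choice in options)
    return min(candidates, key=lambda a: total_cost(a, pair_penalty))


if __name__ == "__main__":
    for penalty in (0, 3):
        plan = solve(penalty)
        print(penalty, plan, total_cost(plan, penalty))
```

With no penalty, each task in the pair goes to its cheaper machine and the pair is split. A penalty of 3 is enough to outweigh the 2-unit machine cost difference and bring the pair together, without leaving any task unscheduled.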

In the case where the objective function is mostly correct but is missing a key element or two, it is generally easy enough to add the missing pieces as long as there is enough information to determine their costs, distance, time, etc. This was the case for modeling the opportunity cost of not scheduling pairs of tasks together. But sometimes the basic goal of the objective function needs to be rethought, like when a model is minimizing cost when the real goal is to minimize empty miles. I ran into a similar situation with a scheduling model whose original objective function was maximizing the number of trips scheduled. With this objective, each trip was exactly as important to schedule as every other trip.

During testing I realized that this objective function caused the model to prefer scheduling shorter trips rather than longer trips because more trips could be scheduled if they were all short. However, after talking about the results with the client, it became clear that the revenue from the trips was based on the mileage. This often made longer trips more desirable than a group of shorter trips. Clearly the model was solving towards the wrong goal. Instead of maximizing the number of trips scheduled, the model should maximize the revenue made from the trips. Longer trips with greater revenue should be given scheduling priority over shorter trips that generate less revenue.
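The difference between the two objectives shows up even in a tiny example. The sketch below is an assumption-laden toy, not the actual scheduling model: one vehicle with a hypothetical 8 available hours, four invented trips given as (hours, miles), and a made-up `REVENUE_PER_MILE` rate. It checks every subset of trips, which only works at this scale.

```python
from itertools import combinations

# Hypothetical trips: name -> (hours required, revenue miles).
TRIPS = {
    "short1": (2, 20),
    "short2": (2, 20),
    "short3": (2, 20),
    "long": (6, 150),
}
HOURS_AVAILABLE = 8
REVENUE_PER_MILE = 2.0  # assumed flat rate, for illustration only


def best_subset(trips, hours_available, value):
    """Return the feasible set of trips that maximizes value(combo)."""
    names = sorted(trips)
    best, best_value = (), 0
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if sum(trips[t][0] for t in combo) <= hours_available:
                v = value(combo)
                if v > best_value:
                    best, best_value = combo, v
    return best


def revenue(combo):
    """Mileage-based revenue of a set of trips."""
    return sum(REVENUE_PER_MILE * TRIPS[t][1] for t in combo)


if __name__ == "__main__":
    by_count = best_subset(TRIPS, HOURS_AVAILABLE, len)
    by_revenue = best_subset(TRIPS, HOURS_AVAILABLE, revenue)
    print("max trips:  ", by_count, revenue(by_count))
    print("max revenue:", by_revenue, revenue(by_revenue))
```

Maximizing the trip count fills the day with the three short trips, while maximizing revenue schedules the long trip plus one short trip and earns far more from fewer trips.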

These are just two examples of how the definition of the objective function can steer the model towards an optimal solution that isn’t the “right” solution. Most of the time, the objective function is pretty close to the model’s goal, but not quite 100% correct. Discussing actual and expected results with the users can help clarify the gaps between the objective function and the model’s actual goal and get the model back on track towards achieving the users’ goal. In a comment to my initial post, Paul Rubin reminded us of the importance of maintaining a dialog with the users. A modeler should not create a model and then just throw it over the wall to the users and move on to the next project. The modeler and the users should talk during all stages of the development to make sure the modeler understands the data, the model’s goal, and the users’ expectations. The model is a tool for the users and their insight is invaluable in the modeling process.

Up next: Part 2 Constraints


Optimizing the House of Representatives

The other day I read an op-ed piece in the NY Times that discussed the size of the US House of Representatives. I’m not really here to discuss the politics of resizing the House of Representatives nor what happens within their chamber. There are many ways OR could help the House, but here we will discuss ideas for ways to optimize the size of the House and the districts that make it up.

The first step is to find the optimal number of seats to have in the House. The Constitution sets only an upper bound on the total, no more than one representative for every 30,000 people (each state is also guaranteed at least one). This gives any algorithm built to find the optimal size of the House a lot of freedom. We could use statistical analysis to find what size groups work together best, or look for the number of constituents a member should represent such that each individual’s vote is more meaningful and special interests have less sway over the House.

The second step is to build the optimal districts, and I mean optimal in the sense that districts are evenly represented across the state, not optimal in a political advantage sense. We could build a model that finds the optimal districts for voting equality so that more districts are in play, which would force the representatives to be held accountable for their voting practices. There would obviously be constraints on the lower and upper bound for constituents in each district, and the districts would need to be contiguous. We could even break down each state into very small grid cells (like a city block) and use those as the base units, requiring the districts to be regularly shaped (i.e., fairly square) so as to avoid any appearance of gerrymandering, and using the properties of the population in each cell to satisfy our constraints on constituent makeup.
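As a rough illustration of the two hard constraints mentioned above, the sketch below checks a proposed plan rather than optimizing one. It assumes a state cut into square grid cells indexed by (row, column), made-up populations, and hypothetical names (`is_contiguous`, `valid_plan`); two cells count as adjacent only when they share an edge. A real districting model would need an optimizer on top of a check like this, plus the shape requirement, which is not covered here.

```python
from collections import deque


def is_contiguous(cells):
    """True if every cell can be reached from any other through shared edges."""
    cells = set(cells)
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in cells and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == cells


def valid_plan(population, plan, low, high):
    """population: cell -> people; plan: cell -> district id.
    Checks full coverage, population bounds, and contiguity."""
    if set(plan) != set(population):
        return False
    districts = {}
    for cell, district in plan.items():
        districts.setdefault(district, []).append(cell)
    for cells in districts.values():
        total = sum(population[c] for c in cells)
        if not (low <= total <= high) or not is_contiguous(cells):
            return False
    return True


# Hypothetical 2x2 state.
POPULATION = {(0, 0): 100, (0, 1): 120, (1, 0): 110, (1, 1): 90}
BY_ROWS = {(0, 0): 0, (0, 1): 0, (1, 0): 1, (1, 1): 1}
DIAGONAL = {(0, 0): 0, (1, 1): 0, (0, 1): 1, (1, 0): 1}

if __name__ == "__main__":
    print(valid_plan(POPULATION, BY_ROWS, 190, 230))
    print(valid_plan(POPULATION, DIAGONAL, 190, 230))
```

Both plans satisfy the population bounds, but the diagonal plan fails because its districts only touch at a corner.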

These are just a few potential ways OR could help improve the US House of Representatives. Leave your comments below on any other (non-political) ways OR could help.

This blog post is a contribution to INFORMS’ monthly blog challenge. INFORMS will summarize the participating blogs at the end of the month.


When The Correct Answer Isn’t Right

WordPress has an interesting feature that shows searches that have led to your blog. Recently I saw a search for “what if the correct answer isn’t the right answer?” This search resonated with me because I have recently run into the same problem with a scheduling MIP model I am developing for a client. The model’s objective is to schedule as many of the available tasks as possible to minimize total cost while staying within resource constraints.

The basic functionality of the model is complete and we have started user testing with a power user to help us fine-tune the results. The model is generating good results using production-quality data, but one complaint we keep hearing is that certain task sequences desirable to the users are not found in the optimal schedule. So while the model is generating the correct answer based on its objective, constraints, and input data, the users are not 100% happy with the results. So what’s an Operations Researcher to do?

My first instinct was to try to prove to the users that the model’s globally optimal solution is better than a schedule built around the desired task sequences. After all, if these sequences were really the best way to schedule these tasks, the model would choose them. Maybe I could show how far from optimality a schedule containing the desired sequences could be. I might even be able to show the effect each sequence has on the overall schedule. But this type of analysis is difficult to perform and even more difficult to present to users.

And even though they are presented with facts and data, users may still find it difficult to accept the model’s solution. This is a key issue because the success of a new model often comes down to one thing: Do the users accept the solution? If they don’t, no matter how fast, optimal, or cutting-edge the model is, the users are likely to work around it rather than with it.

Rather than trying to beat a square model into the users’ round hole, I decided to take a step back and reconsider some of the key aspects of the model, like the objective function and the constraints. Over the next few posts I’ll walk through a few of these key areas that I took another look at as I continued to fine-tune my model towards my users’ needs. Hopefully I’ll find some ways to generate an optimal solution that will be more acceptable to the users.
