When the Data Changes Before the Model Even Finishes Solving

This month’s INFORMS blogging challenge topic is OR and analytics. Last Wednesday I was thinking seriously about this topic, in particular how OR and analytics, although separate fields, can benefit from each other, when I saw Michael Trick’s post outlining what each field can learn from the other. His analysis was very insightful and one of his points really hit home with me — #1 of what Operations Research can learn from Business Analytics:

It is not just the volume of data that is important:  it is the velocity.  There is new data every day/hour/minute/second, making the traditional OR approach of “get data, model, implement” hopelessly old-fashioned.  Adapting in a sophisticated way to changing data is part of the implementation.

Data volumes are increasing, data is coming from multiple sources in a variety of forms and states of completeness and cleanliness, and data is constantly changing. I am starting to see the effect ever-changing data can have on optimization more and more as my clients adapt the way they use our models . Many of our models are operational scheduling models that plan for a 24-48 hour planning horizon which generally starts a day or two in the future. The model sets the schedule for this future time period and was often designed to run only once a day. But when you plan this far in advance, things are bound to change. New tasks or assignments are added, existing ones are modified or may even be removed. How does this updated data affect the model’s results? Can it be incorporated into the current solution or is a completely new solution needed? What do we do when a model requires 30 minutes or an hour to solve but the data changes every minute? These needs are often not captured in the original business requirements for the optimization model but need to be addressed if the model is going to be effective in a real-time environment with volatile data.

Sometimes the model solves fast enough and schedules far enough in advance that it can be run continuously with data updates incorporated into each new solve. However this can result in solutions that change dramatically with each run which can be disruptive in a business environment. Consider a model that schedules workers for shifts. After the first run, a worker could be scheduled for a 8am shift. But after the next model run, the solution now has the worker scheduled for a 8pm shift. This is a pretty significant change. It also prevents the users from being able to notify the worker about his/her upcoming scheduled shifts because the schedule is constantly changing. One way that we have mitigated this problem is to place a higher value on the existing schedule in the objective function which prevents the optimization model from changing the current solution unless the change would result in substantial savings or benefits.

It may not be possible to continuously run an optimization model through a “full” solve because of a lengthy run time. One of our scheduling models essentially solves a set partition problem where the bulk of the model’s processing time is spent defining the feasible sets and only a fraction of the time is actually spent solving the optimization problem. In this case we need two modes: “full” mode and “update” mode. Full mode generates the entire space of feasible sets and then solves the resulting optimization problem. The model then switches to update model where it modifies, adds, and removes sets based on any data modifications that have occurred and solves the new optimization problem. These updates are significantly faster than regenerating all of the feasible sets so update mode runs in a fraction of the time that full mode requires.  We offset an initial long run-time with subsequent quick updates.

Finally, rather than attempting to retrofit an existing optimization model built to handle a static data set, we have started to assume that our models need to be capable of incorporating fluctuating data and design this into the models from the outset rather than wait for the client to ask for this ability down the road. It is much easier to design and build flexibility into an optimization model from the start than to try to add it at a later date. Our clients’ business needs are constantly evolving and we are working hard to anticipate their future needs and build Operations Research tools that evolve with them. We must adapt to the changing frontier of data: more data from multiple sources that changes frequently and often has not been “sanitized” for use in an optimization model.

Are you seeing an increased need to incorporate changing data into your Operations Research models, and if so, how are you handling this new and difficult requirement?

This blog post is a contribution to INFORMS’ monthly blog challenge. INFORMS will summarize the participating blogs at the end of the month.

This entry was posted in Uncategorized and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s