Performance considerations
Multipliers
Processing the Raedfast model begins with your baseline data and greatly multiplies it: through multiple years, multiple model steps and multiple scenarios. A single baseline record pulled from ActivityHistory into ActivityModel can easily generate a hundred records of model output once it is multiplied by, say, five years, four scenarios, and an average of five applicable rules in each year.
The model output generated can therefore become very large, and processing times can become seriously extended.
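As a rough illustration of the arithmetic, the sketch below shows how the multipliers compound; the baseline record count is entirely hypothetical and the names do not correspond to real Raedfast objects.

# Hypothetical illustration of how the multipliers compound.
baseline_records = 200_000      # records loaded into ActivityModel (invented figure)
years = 5
scenarios = 4
rules_per_year = 5              # average applicable rules per year

per_record = years * scenarios * rules_per_year
print(f"Each baseline record can generate up to {per_record} output records")
print(f"Potential model output: {baseline_records * per_record:,} records")   # 20,000,000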
Outlined below are some of the ways you can minimise model size and processing time.
How Raedfast holds data
First it will be helpful to understand how Raedfast stores the data.
Raedfast automatically aggregates data on load, and at each model step. This means that the system does not hold multiple records with the same dimension values, but aggregates identical records into a single record, summing up the measure values as it does so.
Raedfast also does not store all-zero records — that is, records of which all the measures are zero.
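A minimal sketch of this behaviour, in generic Python rather than anything Raedfast-specific (the dimension and measure names are invented for illustration):

from collections import defaultdict

def aggregate(records, dimensions, measures):
    # Sum the measure values of records whose dimension values are identical,
    # and drop any aggregated record whose measures are all zero.
    totals = defaultdict(lambda: [0.0] * len(measures))
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        for i, m in enumerate(measures):
            totals[key][i] += rec[m]
    return [
        dict(zip(dimensions, key)) | dict(zip(measures, vals))
        for key, vals in totals.items()
        if any(v != 0 for v in vals)        # all-zero records are not stored
    ]

rows = [
    {"Pod": "Elective", "Hrg": "AA22A", "Cases": 1, "BedDays": 3},
    {"Pod": "Elective", "Hrg": "AA22A", "Cases": 1, "BedDays": 2},
    {"Pod": "DayCase",  "Hrg": "BB11B", "Cases": 0, "BedDays": 0},
]
print(aggregate(rows, ["Pod", "Hrg"], ["Cases", "BedDays"]))
# The two identical Elective records collapse into one (Cases 2, BedDays 5)
# and the all-zero DayCase record disappears.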
Eliminating unnecessary detail
One way to speed up model processing is to load less data into your baseline.
Baseline data is multiplied in the model through steps, scenarios and years, so the more baseline records you have, the more you are multiplying up.
Since Raedfast automatically aggregates data, the higher the level of your baseline data the more it will aggregate and the fewer records it will be stored in.
So do not load data at a lower level than you need.
There are two criteria of need:
you need a given distinction in order to define modelling rules in the appropriate terms
you need a given distinction in order to analyse model output in appropriate terms
If you do not need a given level of detail for either of the above purposes, consider omitting it from the model baseline. On these grounds you may choose to omit a dimension entirely (omitting HRG for example), or you may choose to define a dimension at a higher level (loading only the first four characters of HRGs for example), or you may choose to include certain elements of a dimension which are of particular interest and roll the rest up to say ‘Other’ (for example, you may wish to keep delivery HRGs but roll the others together).
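For instance, a pre-load transformation along these lines could apply the roll-ups described above. This is a sketch only: the field names are invented, and the assumption that delivery HRGs start with "NZ" is illustrative rather than a statement about your data.

def reduce_hrg_detail(record):
    # Hypothetical roll-up: keep HRGs of particular interest (here assumed to
    # be delivery HRGs starting "NZ") and roll every other HRG into 'Other'.
    if not record["Hrg"].startswith("NZ"):
        record["Hrg"] = "Other"
    return record

def truncate_hrg(record):
    # Alternative: load only the first four characters of each HRG.
    record["Hrg"] = record["Hrg"][:4]
    return record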
Adding detail later
You should resist the temptation to model in too much detail. It is very easy to change a Raedfast model further down the line, so we recommend that you start at a higher level and add more detail if and when you find you need to.
Compressing your baseline data
Raedfast’s two baseline functions (BaselineFromHistory and BaselineFromModel) allow you to compress the data you pull in as your baseline, by a percentage you choose.
This facility in effect samples the data randomly. If for example you choose to compress the data by 90%, Raedfast will randomly take one record in ten, and multiply the measure values of these records by 10. The result is that the volume of data being processed in the model will be one tenth of the total, but the measure values should be close to the values the entire data set would have generated.
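In spirit, the compression works like the following random-sampling sketch; this is plain Python for illustration, not Raedfast's actual implementation, and the measure names are invented.

import random

def compress(records, percent, measures):
    # Compressing by e.g. 90% keeps roughly 1 record in 10 and scales the
    # measure values of the kept records by 10, so totals stay comparable.
    keep_fraction = 1 - percent / 100
    factor = 1 / keep_fraction
    sample = [dict(r) for r in records if random.random() < keep_fraction]
    for r in sample:
        for m in measures:
            r[m] *= factor
    return sample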
This facility is particularly useful for developing and testing scenarios quickly. When you are ready to publish your scenario you can if you wish turn off compression and process with the full baseline data set.
Lumpiness of data
Note that the data in the ActivityHistory table is aggregated, and the sample taken is therefore from aggregated data. A very common group of patients might be aggregated to a handful of records, all of which might be missed by the sampling process. Conversely an outlier case (perhaps an exceptionally long stayer) might be picked up by sampling and multiplied.
You can check the effect of compression by comparing a compressed scenario to an uncompressed.
Live use of compression
That said, if the data is sufficiently granular and compression does not distort it much, you might consider running your live model with compressed data, given that even the uncompressed baseline data set is only a sample taken from history.
And if the compressed totals come out lower or higher than the uncompressed totals overall, you can use Raedfast functions to adjust them up or down.
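The adjustment amounts to deriving a correction factor from the two totals, as in the sketch below; the figures are invented and the scaling itself would be applied through Raedfast's own functions rather than code like this.

# Hypothetical check of compression drift: compare the compressed total with
# the uncompressed total and derive a correction factor to apply in the model.
uncompressed_total = 1_250_000      # e.g. bed days in the full baseline
compressed_total = 1_180_000        # e.g. bed days after 90% compression

correction = uncompressed_total / compressed_total
print(f"Scale compressed output by {correction:.3f} to match the full baseline")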
Allocating resources in the baseline year only
You can save processing time by allocating resources (using the AllocateResource function) in the first year of the model only.
Since your model will normally be carrying the total of each year’s steps forward into the baseline of the next year, allocated resources such as Beds and Theatres will be carried forward too, so you do not need to wait for these to be recalculated in every subsequent year.
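Schematically, the recommended sequencing looks like the sketch below; the step names are placeholders for illustration, not Raedfast's internal processing.

# Placeholder outline of the recommended sequencing: resources are allocated
# once, in the baseline year, and carried forward by each year's baseline step.
years = [1, 2, 3, 4, 5]
plan = []
for year in years:
    if year == 1:
        plan.append("Year 1: load baseline and run AllocateResource (e.g. Beds, Theatres)")
    else:
        plan.append(f"Year {year}: baseline = total of year {year - 1} steps, allocations carried forward")
    plan.append(f"Year {year}: apply that year's assumptions, reallocating only where needed")
print("\n".join(plan))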
Reallocating at need
You do however need to watch for modelling changes in subsequent years which should properly affect resource allocation. For example, if you move activity from elective to day case provision, you may need to reallocate Beds for that body of patients in that year, as the existing allocation will be based on the original Elective POD.
Keeping the original allocation
In some cases, however, you may want to keep the original resource allocation. For example, you may judge that elective cases moved to day surgery should keep their original Theatres attribution, as these cases are likely to be more complex on average than the typical day surgery case.
Summary
So the simplest, safest but slowest approach is to allocate resources in every year. The faster way, which is generally recommended, is to allocate resources in the baseline year and let them roll forward, being careful to reallocate when necessary.
Omitting some years from the model
If you are modelling many years ahead you may not need to enter assumptions for every year. For example, in a twenty-year model the only year-specific assumptions entered for many of the years may be assumptions about demographic change.
So you might wish to model years 1, 2, 3, 4 and 5 explicitly, then year 10, then year 15 and finally year 20. Changes pertaining to intervening years not modelled separately can be expressed as assumptions relating to the years which are modelled. So a percentage for demographic change in year 10 would represent the cumulative change in years 6 to 10, and so on.
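For instance, if annual demographic growth were a hypothetical 1.2%, the single percentage entered for year 10 could be compounded from the five intervening years:

# Hypothetical 1.2% annual demographic growth, compounded over years 6 to 10
# and entered as a single year-10 assumption.
annual_growth = 0.012
cumulative = (1 + annual_growth) ** 5 - 1
print(f"Year 10 assumption: {cumulative:.2%}")   # about 6.15%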
Splitting scenarios across years
You may find that your assumptions naturally split into logical groups. For example:
baseline data cleaning
fairly firm assumptions about short-term change
more speculative and variable assumptions about the longer term
This being so, it may be advantageous from the point of view of model processing time to put each group into separate scenarios. You might have a baseline scenario which cleans the baseline data. This might then provide the baseline for one or two scenarios which model short-term changes. And these latter might in turn provide the baseline for a larger set of scenarios modelling different assumptions about the longer term.
Using this approach, you can process your single clean baseline scenario and leave it. Then you can work on your short-term scenarios without waiting for the data cleaning to run all the time. And finally you can work on your longer-term scenarios without waiting for the data cleaning and short-term scenarios to run every time you make a change.
Then, when you come to analyse the output, you can create a roll-up of the data cleaning scenario, one of the short-term scenarios and one of the longer-term scenarios linked to it, so that the impact of all three is added together and your analysis can refer to a single roll-up scenario encompassing data cleaning, short-term assumptions and long-term assumptions.
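Conceptually, the roll-up simply adds the impacts of the linked scenarios together, as in this sketch; the scenario names and figures are invented.

# Invented scenario impacts: the roll-up is the sum of the chained
# data-cleaning, short-term and long-term scenarios.
scenario_output = {
    "Baseline cleaning": {"Cases": 250_000, "BedDays": 900_000},
    "Short-term change": {"Cases": -12_000, "BedDays": -60_000},
    "Long-term change":  {"Cases": 30_000,  "BedDays": 45_000},
}
rollup = {
    m: sum(v[m] for v in scenario_output.values())
    for m in ("Cases", "BedDays")
}
print(rollup)   # {'Cases': 268000, 'BedDays': 885000}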
Collapsing certain dimensions after year X
You may wish to consider reducing the detail of the model in later years.
Your assumptions about (and analysis of) the near future may need to be more detailed than your assumptions about later years, which may be more broad-brush. This being so, you may be able to reduce the volume of data in the model, and therefore the processing time, by taking out some redundant detail from later years. For example, you may need to hold data by HRG for the first few years, but not for later years. So your baseline step for year 5, say, might pull the total of all steps for year 4, as normal, but pull the data into a single HRG element (Hrg NA perhaps), so that for year 5 and subsequent years distinction by HRG will not be available.
This approach need not involve collapsing an entire dimension to a single element. It might involve collapsing the dimension into a smaller number of elements — for example, you might collapse the Age dimension into just Adult and Child.
You can use the BaselineFromModel function to collapse dimensions in this way, or you can use the Move or the Copy functions.
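The effect is along these lines, reusing the aggregate sketch shown earlier; the Age banding and all values are invented, and this is not how Raedfast implements BaselineFromModel.

year4_output = [
    {"Pod": "Elective", "Age": "34", "Hrg": "AA22A", "Cases": 2, "BedDays": 5},
    {"Pod": "Elective", "Age": "67", "Hrg": "AA22A", "Cases": 1, "BedDays": 4},
    {"Pod": "Elective", "Age": "70", "Hrg": "BB11B", "Cases": 3, "BedDays": 9},
]

def collapse_age(record):
    record["Age"] = "Child" if int(record["Age"]) < 18 else "Adult"
    return record

# Re-aggregate with Hrg omitted and Age collapsed: three detailed records
# become a single Adult record (Cases 6, BedDays 18).
year5_baseline = aggregate(
    [collapse_age(dict(r)) for r in year4_output],
    dimensions=["Pod", "Age"],
    measures=["Cases", "BedDays"],
)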
Generating new records
Generating new bodies of activity within the model obviously increases the volume of data, and thereby increases processing time.
New activity may be generated for two principal reasons:
to model activity not available in historical baseline data, by copying available data which has similar characteristics in terms of model dimensions. For example, GP consultation data might be modelled by copying outpatient data, or activity for a hospital for which one has no data might be approximated by copying data from a hospital for which one does, changing the Commissioner and GpPractice values (see the sketch after this list). In such instances one is creating wholly new cases.
to model the splitting of existing cases into two parts, delivered by different providers or points-of-delivery — as when some bed days are moved out to the community, or bed days are split between general wards and critical care, assessment and so on.
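A sketch of the first case, copying records from a hospital with data and re-attributing them to approximate one without; this is purely illustrative, and only the Commissioner and GpPractice names come from the text above.

def approximate_provider(records, commissioner, gp_practice):
    # Copy existing records and overwrite the attribution dimensions so the
    # copies stand in for a provider with no historical data of its own.
    copies = []
    for rec in records:
        copy = dict(rec)
        copy["Commissioner"] = commissioner
        copy["GpPractice"] = gp_practice
        copies.append(copy)
    return copies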
Approaches to modelling resource consumption
The first of these is straightforward, but the second is more complicated, as there are two possible approaches. Take x-ray for example. You can handle this by creating a new measure called x-ray, and copying a percentage of activity from the cases measure to this new measure. The new measure will lengthen the ActivityModel record and therefore increase the size of the model. Or you can create a new sub-pod called x-ray, and copy a percentage of activity from other sub-pods to this one. This latter approach obviously increases the size of the model by generating new records.
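The two approaches can be contrasted as in the sketch below; the 10% x-ray proportion and the field names are invented.

record = {"Pod": "Elective", "SubPod": "Ward", "Cases": 100, "BedDays": 400}

# Approach 1: a new measure. Every record grows wider, but no new records appear.
with_measure = dict(record, XRay=record["Cases"] * 0.10)

# Approach 2: a new sub-pod. The record count grows instead.
xray_record = dict(record, SubPod="XRay", Cases=record["Cases"] * 0.10, BedDays=0)
records = [record, xray_record]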
Which approach to choose depends on a number of factors:
impact on model size and processing time
ease of analysis of outputs
consistency with the approach taken in related issues
If a large volume of activity would be copied to a new sub-pod, it would be more efficient to use a measure instead. If on the other hand only a small volume of activity would be copied, a sub-pod will be more efficient.
As regards ease of analysis, the drawback of creating many measures is that in Analysis Services you cannot treat measures like a dimension: you must select each measure you want to see individually, and you cannot within the scope of the pivot table create a sum of several measures as you could sum several sub-pods (although you can of course use Excel formulae to add the measures together outside the pivot table). So if you have only a single x-ray category, you should probably create a measure for it called, say, ImagingRooms, rather than use a sub-pod. If however you wish to distinguish six different types of imaging, you might prefer to create the single ImagingRooms measure and use sub-pods to distinguish the six types, rather than create six different measures.
In respect of consistency, if you use a measure to hold beds in a certain type of ward — say critical care — you will probably want to take the same approach for other ward types, such as assessment.
If you do make extensive use of sub-pods, consider collapsing unnecessary dimensions when you copy data into them, so that the additional activity generated through the sub-pod will aggregate to a higher level and consume less space. With this approach, the use of sub-pods may be more efficient than the use of measures.