What is happening when FileMaker Server becomes overloaded (and how to avoid it)

nicklightbody

9 years ago

Editor’s Note: Today I am pleased and honored to present the first in what I hope will become a series of articles by guest author Nick Lightbody of Deskspace Systems Ltd.

Summary: we will describe, discuss and illustrate the statistics that enable you to understand the why and how of FileMaker Server performance and suggest means of delivering a predictable and acceptable performance to your users.

Why this is important

FileMaker Server 13 is a wonderful and very reliable product, provided (as with any product) you recognise, understand and work within its limits.

However, Server is a binary product, in the sense that it either performs “good” or it performs “bad” — very slowly, but very reliably — as it grinds through its backlog until its load has reduced sufficiently for it to catch up on its queued calls and return to “good” mode.

The Deskspace server performance test shown in fig 1 illustrates a common scenario as the number of users increases and suddenly performance declines – dramatically.

fig1 – user numbers increase until Server chokes – suddenly and dramatically – with little warning.

There really is very little middle ground, so when you look at the server statistics and watch the graph crawling along the floor — thinking that you are not really using its full capacity — you may in fact be deluding yourself, as we will illustrate.

An understanding of what server hardware resource is required to ensure that a specific number of users receive a consistently good service is clearly essential but such information is — surprisingly — a little hard to come by.

FMI themselves suggest that Server, if you wish to use FileMaker WebDirect, requires a separate CPU core (effectively a separate CPU) to handle each pair of concurrent remote calls efficiently in a smaller deployment, and then proportionally fewer cores as the server power and user numbers increase. It is not entirely clear what resource is required if one does not plan to deploy FileMaker WebDirect.

fig2 – FMI Recommended Hardware Configurations for Server inc FileMaker WebDirect

That is a great deal of horse-power – but we should note that FMI have reduced their recommended cores per user to about half of what they advised just a few months ago.

And that’s a “remote call”, not a “remote user”. Depending on the complexity of what it is being asked to do, Server can handle quite a few calls a second, so it would be interesting to work out how many… and perhaps to relate the number of remote calls to calls per user?

Whilst FMI’s own technical recommendations are a good starting point they can appear more than a little conservative when compared with many people’s own experience — where often 4 cores appear to support 20 or more users. So what is going on?

Research

To investigate this we are using a method of testing FileMaker Server with “virtual clients” — server side scripts whose completion we do not await. Hence we can send off a series of autonomous scripts (each simulating a client using server) from a single client side UI, watch the CPU history and statistics in Server Admin Console, watch the event log recording the statistics for each transaction, and load up the Server to the point of near choking by adding or disconnecting virtual clients.

These current comments apply only to FMP or FMGo connections, and not yet to WebDirect which we will test on another occasion.

Using admin console statistics

The true load factor on FileMaker Server itself (ignoring for these purposes the load caused by slow networks and slow data storage I/O) is not the number of FileMaker clients but the number and frequency of remote calls, one of the 11 statistics observable in the FileMaker Server Admin Console under Statistics. For our investigation we need to turn on the three statistics listed below. Remember that each of these numbers is a sample of a single moment in Server’s operation (the moment the sample is taken), so turn the sample frequency in Admin Console up to every 3 seconds to get a better idea of what is happening. Turn it down again when it is not required: measuring anything also affects what you are measuring, in this case by creating load, so a higher sampling frequency will slow down normal operation.

(1) Remote calls/sec – this represents the server load – each call being a significant set of instructions
(2) Remote calls in Progress – at any single moment – when the sample is taken – often zero
(3) Wait time (ms)/call – this shows the effect of load – the output – delivering the user experience

The point at which performance and hence user experience starts suffering is indicated by spikes in the Wait time (yellow) but is determined by the Remote Calls in Progress (pale blue) moving above the floor of the display and remaining there.

Choking on “busy” users

A typical situation is shown in fig 3, where “Busy” virtual users are being added rapidly to a 4 core MacMini host. There is a small spike at 14:14, but after additional users arrive to bring the total to 15 at 14:15, the Remote Calls in Progress lifts off the floor at 14:16 and stays up with values of 12 – 18. The server has insufficient speed to recover until the load is significantly reduced, hence the queue is congested, Server chokes, and everything slows and nearly stops.

Choking is simply that: the rate of new calls on server exceeding its ability to deal — hence an ever increasing queue builds up which takes time to be dealt with by the Server and then, eventually, cleared.
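
The queue build-up described above can be sketched with a few lines of Python. This is only an illustrative model, not a measurement of FileMaker Server: the function name `simulate_backlog`, the constant capacity of 50 calls per second, and the arrival pattern are all assumptions chosen for the example.

```python
def simulate_backlog(arrivals_per_sec, capacity_per_sec):
    """Return the number of queued calls at the end of each second.

    arrivals_per_sec: new calls arriving in each one-second interval.
    capacity_per_sec: calls the server can complete per second
                      (assumed constant, which a real server is not).
    """
    backlog = 0
    history = []
    for arrivals in arrivals_per_sec:
        # Calls that cannot be served this second stay in the queue.
        backlog = max(0, backlog + arrivals - capacity_per_sec)
        history.append(backlog)
    return history


# Hypothetical load: 5 s below capacity, 5 s above it, then 10 s below it again.
load = [40] * 5 + [60] * 5 + [40] * 10
history = simulate_backlog(load, capacity_per_sec=50)
print(history)
```

Even in this simple linear model the backlog keeps growing for as long as arrivals exceed capacity, and it takes as long again to drain once the load drops. The model does not capture the way delays compound on a real server, so it understates how quickly a real choke escalates.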

This choking characteristic is why folk may be misinforming themselves when they look at the stats and think their server has much more spare capacity available than is in fact the case. The moment Remote Calls in Progress exceeds the number of cores available in the CPU, the risk of a suddenly escalating choke arises. The choke develops very rapidly, each delay multiplying further delays behind it, so the apparent surplus capacity disappears in an instant.
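
As a rough sketch of how that warning sign could be checked against a series of Admin Console samples, the following Python function flags the point where Remote Calls in Progress stays above the core count. The function name `choke_risk`, the sample values, and the threshold of three consecutive samples are assumptions made for illustration; they are not part of FileMaker Server.

```python
def choke_risk(in_progress_samples, cores, sustained=3):
    """Return True if Remote Calls in Progress exceeded the core count
    for `sustained` consecutive samples (an assumed threshold)."""
    run = 0
    for value in in_progress_samples:
        # Count consecutive samples above the core count; reset otherwise.
        run = run + 1 if value > cores else 0
        if run >= sustained:
            return True
    return False


# Hypothetical samples: a brief spike that clears, then a sustained lift-off.
print(choke_risk([0, 0, 6, 0, 0, 12, 15, 18, 14], cores=4))
# Hypothetical samples: a single brief spike only.
print(choke_risk([0, 0, 6, 0, 1, 0], cores=4))
```

Requiring several consecutive samples is a design choice that separates a short spike, which clears by itself, from the value lifting off the floor and remaining there.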

Supporting more, less active users

However, if users with a lower level of activity are introduced, in this case “Fairly Inactive” users, Server will support a much higher number, as shown in fig 4. That holds until many of them do something load-creating at the same time, in which case congestion and a choke will arise. The choke is likely to be short-lived, however: since the general load is low, there is little additional load bearing on top of the congestion to escalate it.

Observations

We can observe from fig 3 that a level of below 50 Remote Calls / sec (dark blue) seems sustainable for this server, but that when the level moves above 50 the server moves beyond its ability to clear the backlog without significant delay. However, there must be more to it than that, since in fig 4 at 15:08 Server suffers a minor congestion with Remote Calls / sec below 30.

As the server has 4 cores we can make an initial theory that each core can safely handle 10 – 13 calls a second, but that when that capacity is exceeded choking will result. This clearly requires refinement based on the inconsistency in the preceding paragraph.
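
The arithmetic behind this initial theory, and the inconsistency it has to explain, can be laid out as follows. The figures come from the observations above; the variable names are ours.

```python
cores = 4

sustainable_calls_per_sec = 50  # fig 3: load below this level seemed sustainable
congested_calls_per_sec = 30    # fig 4: minor congestion occurred below this level

per_core_fig3 = sustainable_calls_per_sec / cores
per_core_fig4 = congested_calls_per_sec / cores

print(per_core_fig3)  # 12.5 calls per second per core
print(per_core_fig4)  # 7.5 calls per second per core
```

So fig 3 suggests a ceiling of about 12.5 calls per second per core, yet fig 4 shows congestion at fewer than 7.5, which indicates that calls per second alone does not determine when choking begins.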

Getting the best out of FileMaker Server

The foundation of the performance we are observing is the speed and capacity of the server hardware (virtual or real) hosting FileMaker Server: the number of CPU cores, their speed in GHz, the amount of RAM in GB, the amount of FMS cache you have selected in the FMS Admin Console, the efficiency of the operating system, the efficiency with which FileMaker Server itself deals with its work and uses the 64-bit architecture it has available and, finally, how well written the solution/app you are running is.

In order to get the best out of our deployment we can consider the following options:

A. ensure that we have a good idea of the load that will be created by our intended user cohort — perhaps split them into less active and busy users and estimate that two less active users are roughly equivalent to one busy user;

B. plan to support no more than, say, 4 busy users (or the equivalent in less active users) per CPU core with something like a mid-range MacMini, provided you fit as much RAM as it will take, which is currently 8 or 16 GB depending on the age of the machine;

C. provided our solution is well written and efficient consider using cloud based hosting on virtual machines so that server resources can easily be increased if required to cope with increased demand;

D. ensure that we are using the most up-to-date version of FileMaker Server available as this software becomes faster with every new version that is released;

E. consider writing a token controlled flow-control system to regulate the load that your solution/app applies to FileMaker Server. This is like trains travelling in opposite directions on a single line: a “train” (a load creating instruction to server) is only permitted to use the line when it has obtained a token. By restricting the supply of tokens we can control the number of calls being heaped upon Server, to protect it from being over burdened. This is what we have done and fig 5 illustrates the detailed performance we obtained when testing several WAN servers recently, including with token flow-control turned on and off.

fig 5 – detailed results of testing three different WAN servers

We assess deployments using a Productivity Index for the system’s capacity (larger is better) and a User Experience Index (smaller is better) to predict the goodness, or otherwise, of the user experience. When we turned token flow-control off you can see (in the red boxes) that the Productivity Index declined from 51 to 45 and the User Experience Index worsened from 25 to 410;

F. consider improving the efficiency of our solution/app. The less we ask FileMaker Server to do, the more it will get done, and solutions which were written or started many years ago will certainly be capable of being much improved with a more modern approach to designing and building. Things that took hundreds of steps a few years ago can now be done in very few, with a commensurate reduction in server load. We need to simplify our solution and play to FileMaker’s strengths. Consider removing all features that are really not required by most users. It can be surprising how much you can improve performance just by removing unnecessary scripting, relationships and layout objects;

G. we must of course use styles and themes well — bite the bullet and get rid of classic if we have delayed.
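
The token idea in option E can be sketched in Python with a semaphore. This is a generic illustration of limiting concurrency with a fixed supply of tokens, not the Deskspace implementation, which is not described here and would be built in FileMaker itself. The supply of 4 tokens, the function `call_server` and the simulated work are all assumptions.

```python
import threading
import time

TOKENS = 4  # assumed token supply, for example one per CPU core

tokens = threading.BoundedSemaphore(TOKENS)
lock = threading.Lock()
in_progress = 0
peak_in_progress = 0


def call_server(work_seconds=0.01):
    """Stand-in for a load-creating call; it only sleeps."""
    global in_progress, peak_in_progress
    with tokens:  # block until a token is free; released on exit
        with lock:
            in_progress += 1
            peak_in_progress = max(peak_in_progress, in_progress)
        time.sleep(work_seconds)  # simulated server work
        with lock:
            in_progress -= 1


# Twenty simulated clients all try to call the server at once.
threads = [threading.Thread(target=call_server) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak calls in progress:", peak_in_progress)
```

However many clients are started, the number of calls in progress at once never exceeds the token supply; the remaining clients wait their turn on the client side instead of joining a queue on the server.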

Nick Lightbody
17 December 2014

A career in international sailboat racing, followed by 15 years in the legal profession, before founding Deskspace Limited at the beginning of the new millennium, informs Nick’s approach to the conception, creation and iterative development of better products. He built his first FileMaker solution in early 1998. Since January 2011 he has been dedicated to developing a more efficient method of deploying FileMaker based Apps over the web — this has led since April 2012 to a focus on simplicity, reduced feature sets and speed.