Since the weekend I’ve been trying to work out why the code to upload data to my Cosm feed was falling over after random periods of time. Sometimes it lasted for a few hours before dying; at other times it lasted just minutes. There was no readily visible pattern, so it was time to get Googling and also accumulate a bit more data so I could get a better handle on where the problem was occurring. All I could see from the Cosm feed graph was that all of a sudden no further updates were being received and the graphs flatlined. I had nothing to tell me whether that was because I was no longer getting any data from the CurrentCost itself, or because I was getting data but failing to post it to Cosm via the EtherCard library.

Cosm data upload flatlines

To validate the data from the CurrentCost, I put in some additional debugging to check that values were still being received and sure enough, every six seconds or so it spat out a line which matched that on the CurrentCost’s display – so they were still being updated, but not being posted.

Not knowing what may be up inside a library that I’d never really used before, I added some more debugging to the console and additional metrics to the Cosm feed. The console now showed me the HTTP request going out and (through the recently added ether.tcpReply() function) the HTTP response coming back. I added current available memory (using the MemoryFree library from the Arduino Playground) and stash.size() & stash.freeCount() (the Stash is implemented in the EtherCard library and uses the RAM inside the ENC28J60 Ethernet controller as a scratchpad) to the Cosm feed, restarted and waited for the data to come rolling in. Sure enough, after twenty minutes or so, it died in a pathetic whimpering kind of way …

There was no apparent drop in available memory on the Arduino, nor was there a change in the Stash size (though I hadn’t expected there to be), but what was apparent was the way the freeCount() on the Stash dropped off and never recovered, until there were no free buffers. At this point, the HTTP debugs showed that the outbound request had become corrupted, which is why the Cosm feed stopped updating. The requests still get sent, it’s just that they’re full of crap, so Cosm ignores them.

Death of the Stash

“So what’s the solution?”, I hear you ask. Well, now that I had an idea of where the problem might be, I tracked down a number of discussions of similar-sounding problems, most of which seemed to be experienced by users in the Nanode community (as the Nanode uses the same ethernet chip and thus the same library). I figured that if the freeCount() kept dropping and never recovered, I could try re-initialising the Stash before it got into a terminal state. Taking inspiration from EtherCard::begin(), I added the following just before I tried to use the Stash, as a quick’n’dirty hack to see if I could fend off the crash:

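// Quick'n'dirty guard: rebuild the Stash map before the free buffers run out
if (stash.freeCount() <= 3) {
  Stash::initMap(56);  // call borrowed from EtherCard::begin()
}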

Success! Here we can see the stash.freeCount() dropping over time (third line in the graph) and then getting re-initialised … and all the while we’re still posting power and temperature readings with no flatlining, which was the original point of all of this, if you remember.

EtherCard library Stash::freeCount() drops over time then recovers

As always seems to be the way with these things, once I’d found a solution (of sorts) I was then able to do a more targeted Google and found a discussion on the nanode-users group where the same solution was mooted. Looking back over the debug log, it would also seem that the freeCount() dropped when a corresponding reply was not found for a request, as noted there by SomeRandomBloke.
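To make that behaviour a bit more concrete, here is a toy model of it in plain C++. It is emphatically not how EtherCard implements the Stash: `ToyPool`, `guard()` and `lowestFreeCount()` are invented for illustration, and it simply assumes that each request takes one buffer and only hands it back when a reply turns up. The pool size of 56 is borrowed from the initMap() argument.

```cpp
#include <cstdint>

// Toy stand-in for the Stash buffer pool; NOT EtherCard's implementation.
struct ToyPool {
    uint8_t capacity;
    uint8_t freeBuffers;

    explicit ToyPool(uint8_t n) : capacity(n), freeBuffers(n) {}

    // Plays the role of Stash::initMap(): every buffer is free again.
    void reset() { freeBuffers = capacity; }

    // One request takes a buffer and only returns it if a reply arrives.
    void request(bool replyArrived) {
        if (freeBuffers == 0) return;   // nothing left to build a request in
        --freeBuffers;
        if (replyArrived) ++freeBuffers;
    }
};

// The guard from the post: re-initialise before the pool runs dry.
inline void guard(ToyPool &pool) {
    if (pool.freeBuffers <= 3) pool.reset();
}

// Sends n requests where every lossEvery-th reply goes missing and
// returns the lowest free count seen along the way.
inline uint8_t lowestFreeCount(ToyPool &pool, int n, int lossEvery, bool useGuard) {
    uint8_t lowest = pool.freeBuffers;
    for (int i = 1; i <= n; ++i) {
        if (useGuard) guard(pool);
        pool.request(i % lossEvery != 0);
        if (pool.freeBuffers < lowest) lowest = pool.freeBuffers;
    }
    return lowest;
}
```

Without the guard the model drains to zero and stays there, which is the flatline; with it the free count sawtooths between the threshold and the full pool, much like the graph above.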

I’ll tidy up the code with the new features in and post that up later on.

UPDATE (01/06/12): The problem with the Stash freeCount() dropping appears to be related to the frequency of updates; if I throttle it so that it only sends every second update to Cosm (i.e. once every 12 seconds instead of every 6) then the freeCount() never drops. Whilst re-initialising the Stash is a bit of a hack, it’ll do just fine till Jean Claude is able to track down the underlying cause of the problem.
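The throttling itself needs nothing more than a counter. A minimal sketch, with `UpdateThrottle` being a name invented here rather than anything from the EtherCard library or my real code:

```cpp
#include <cstdint>

// Hypothetical helper: lets through one reading in every `interval`.
class UpdateThrottle {
public:
    explicit UpdateThrottle(uint8_t interval) : interval_(interval), count_(0) {}

    // Call once per CurrentCost reading; returns true when this
    // reading should be posted and false when it should be skipped.
    bool shouldSend() {
        if (++count_ >= interval_) {
            count_ = 0;
            return true;
        }
        return false;
    }

private:
    uint8_t interval_;
    uint8_t count_;
};
```

With an interval of 2 and a reading arriving roughly every six seconds, shouldSend() comes back true about once every twelve seconds, which is the rate at which the freeCount() stopped dropping for me.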