The real-time data displayed on this website have been interrupted on a number of occasions, most recently by an 8-hour outage overnight. These ongoing issues should now be resolved by the introduction of an automated process to deal with them. Measures are now in place, including automated monitoring of whether data are reaching the website in a timely manner.
These issues started with the introduction of a Weatherlink Live data capture and streaming device, which collects the data from our on-site weather sensors. This device was brought into use to expand the data we collect and publish, which was not possible for technical reasons with its predecessor.
This system relies on our residential home network (including wi-fi connectivity) to a greater degree than the previous implementation of the weather system, so there are occasional temporary dropped connections between the Weatherlink Live device and the Raspberry Pi computer used to collate, store and upload the data within our network. As a consequence, the Raspberry Pi sometimes loses its connection to the Weatherlink Live device and doesn't regain it.
The impact is that the software continues to run and upload data that have not updated correctly. In the most recent outage the data rolled over to the new day but continued to use data from the previous day.
Usually a reboot of the Raspberry Pi computer fixes the issue. However, if the Raspberry Pi has lost its connection to the network, rebooting it becomes more difficult, as it operates in a headless configuration with no keyboard, mouse or monitor. In that case, cutting power to the computer and turning it back on has usually been the only option to get the system back up and running.
In view of this, a bash script was created to manage these issues in an automated manner. The script handles two situations: a) the data collection software is not running because it has crashed for some reason, and b) the data collection software is running but the data are not current because it has lost its connection within the network. When either situation occurs the software is shut down and the system rebooted. The script is run as a scheduled cron task every 5 minutes on the Raspberry Pi computer. On boot up, the Raspberry Pi is configured to auto-start all of the required systems, thereby getting the data back online. The script allows the time thresholds for these two situations to be defined, and writes the actions taken to a log file when they are triggered.
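Both situations come down to asking how long ago a file was last modified. As a minimal sketch of that idea, the helpers below report a file's age and decide whether it is stale. The names `file_age_seconds` and `is_stale` are illustrative only and are not part of the script itself, and the sketch assumes GNU `date`, which supports `-r FILE`.

```shell
#!/bin/bash
# Illustrative helpers only: these names do not appear in the actual script.

# Print the age of a file in whole seconds.
# Assumes GNU date, which reports a file's modification time with -r FILE.
file_age_seconds() {
    local file=$1
    local now modified
    now=$(date +%s)
    modified=$(date -r "$file" +%s) || return 1
    echo $(( now - modified ))
}

# Succeed (exit status 0) if the file is older than the given number of minutes.
# A missing or unreadable file is treated as stale.
is_stale() {
    local file=$1
    local max_minutes=$2
    local age
    age=$(file_age_seconds "$file" 2>/dev/null) || return 0
    [ "$age" -gt $(( max_minutes * 60 )) ]
}
```

A check such as `is_stale /path/to/logfile 5` then succeeds once the file has gone more than 5 minutes without an update. A cron entry along the lines of `*/5 * * * * /home/pi/check_cmx.sh` would run a script every 5 minutes; the script path shown here is hypothetical.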
The bash script is as follows:
```bash
#!/bin/bash

# Declare script variables
CMX_INSTALL_FOLDER='/home/pi/CumulusMX'
CMX_DATA_FOLDER='/data/'
CMX_DIAGS_FOLDER='/MXdiags'
CMX_DATA_STOPPED_AGE_TRIGGER_MINUTES=5
CMX_PROGRAM_STOPPED_AGE_TRIGGER_MINUTES=15
SCRIPT_LOG_FILE='/home/pi/.local/CMXdataStopCheckLog.txt'

# PROD_{START,END}_TIME define the time of day during which the script may run.
# Restarting CMX near the daily rollover can cause data integrity issues.
PROD_START_TIME="001500"
PROD_END_TIME="234500"

# Only run inside the permitted time range.
# The 10# prefix forces base 10; without it, a value with a leading zero
# such as 083000 is read as octal and the comparison fails.
CURR_TIME=$(date +"%H%M%S")
if (( 10#$CURR_TIME < 10#$PROD_START_TIME || 10#$CURR_TIME > 10#$PROD_END_TIME )); then
    exit 0
fi

# Define current month/year and the paths to the CMX data log and diags folder
CURR_MONTH=$(date +%b)
CURR_YEAR=$(date +%y)
LOG_FILE_NAME="${CURR_MONTH}${CURR_YEAR}log.txt"
CURR_LOG_FILE="${CMX_INSTALL_FOLDER}${CMX_DATA_FOLDER}${LOG_FILE_NAME}"
DIAGS_LOG_FILE="${CMX_INSTALL_FOLDER}${CMX_DIAGS_FOLDER}"

# Current time in seconds since the Epoch
CURR_TIME_SINCE_EPOCH_SECONDS=$(date +%s)

# Modification time of the newest diags file, in seconds since the Epoch.
# find prints a fractional value, so the fraction is stripped afterwards.
DIAG_FILE_MODIFIED_SINCE_EPOCH_SECONDS_FLOAT=$(find "$DIAGS_LOG_FILE" -type f -printf "%T@\n" | sort -n | tail -1)
DIAG_FILE_MODIFIED_SINCE_EPOCH_SECONDS=${DIAG_FILE_MODIFIED_SINCE_EPOCH_SECONDS_FLOAT%.*}
DIAG_FILE_LAST_MODIFIED_SECONDS=$((CURR_TIME_SINCE_EPOCH_SECONDS - DIAG_FILE_MODIFIED_SINCE_EPOCH_SECONDS))

# Stop CMX (which runs under mono) if it is running, log the action, and reboot
restart_system() {
    if pidof mono >/dev/null; then
        echo "Kill mono"
        killall mono
    fi
    printf '%s: Shutting down CMX and restarting system\n' "$(date +'%d/%m/%Y %H:%M:%S')" >> "$SCRIPT_LOG_FILE"
    echo "Reboot"
    sudo reboot
}

# CMX has not written to its diags folder for too long: restart the system
if [ "$DIAG_FILE_LAST_MODIFIED_SECONDS" -gt $((CMX_PROGRAM_STOPPED_AGE_TRIGGER_MINUTES * 60)) ]; then
    echo "Not running for at least $CMX_PROGRAM_STOPPED_AGE_TRIGGER_MINUTES minute/s. Shutting down CMX and restarting system"
    restart_system
    exit 0
fi

# CMX appears stopped, but not yet for long enough to act: wait for a later run.
# The data check below is only meaningful while CMX is running.
if [ "$DIAG_FILE_LAST_MODIFIED_SECONDS" -gt $((CMX_DATA_STOPPED_AGE_TRIGGER_MINUTES * 60)) ]; then
    exit 0
fi

# Age of the current monthly data log, in seconds
LOG_FILE_MODIFIED_SINCE_EPOCH_SECONDS=$(date -r "$CURR_LOG_FILE" "+%s")
LOG_FILE_LAST_MODIFIED_SECONDS=$((CURR_TIME_SINCE_EPOCH_SECONDS - LOG_FILE_MODIFIED_SINCE_EPOCH_SECONDS))

# CMX is running but the data log is stale: restart the system
if [ "$LOG_FILE_LAST_MODIFIED_SECONDS" -gt $((CMX_DATA_STOPPED_AGE_TRIGGER_MINUTES * 60)) ]; then
    echo "CMX data have stopped but CMX is running. Shutting down CMX and restarting system"
    printf '%s: CMX has stopped updating data for at least %s minutes. CMX is running.\n' "$(date +'%d/%m/%Y %H:%M:%S')" "$CMX_DATA_STOPPED_AGE_TRIGGER_MINUTES" >> "$SCRIPT_LOG_FILE"
    restart_system
fi
```
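One detail of the time-window guard is worth illustrating. Bash evaluates numeric comparisons on `HHMMSS` values as arithmetic, and a number with a leading zero is read as octal, so a value such as `083000` is rejected as an invalid octal number. Forcing base 10 with the `10#` prefix avoids this. The sketch below shows such a guard in isolation; the function name `in_time_window` is illustrative and does not appear in the script above.

```shell
#!/bin/bash
# Illustrative helper only: in_time_window is a hypothetical name.

# Succeed if the HHMMSS time in $1 lies within [$2, $3] inclusive.
# The 10# prefix makes bash read each value as decimal even with leading zeros.
in_time_window() {
    local now=$1
    local start=$2
    local end=$3
    (( 10#$now >= 10#$start && 10#$now <= 10#$end ))
}

# Example: report whether the current time is inside the permitted window.
if in_time_window "$(date +%H%M%S)" 001500 234500; then
    echo "inside window"
else
    echo "outside window"
fi
```

Keeping the comparison in one small function makes it easy to test boundary times, such as a few minutes after midnight, without waiting for the clock.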
Whilst these measures will need monitoring over time, and possibly some tweaks or improvements, they should help to increase the availability of the data on this website.
If you find this type of blog post of value, please feel free to comment on or share this post. You can also contact us if you have any questions directly related to the information on this website.