You can determine the disk utilization per directory using the commands below:
sudo su
du -ch /var/tmp/log
du -ch /var/tmp/log/tenant-*/VSN0-*
du -ch /var/tmp/archive
du -ch /var/lib/cassandra
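To see at a glance which subdirectories are the largest, the du output can be sorted by size. The following is a minimal sketch, not part of the product tooling: the function name top_dirs and its defaults are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with the product): list the largest
# immediate subdirectories of a path, biggest first.
# Usage: top_dirs <path> [count]
top_dirs() {
    local root="${1:-/var/tmp/archive}"
    local count="${2:-10}"
    # -s gives one total per subdirectory, -k reports KiB so sort -n can compare
    du -sk "$root"/*/ 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, running top_dirs /var/tmp/archive 5 as root would print the five largest tenant directories with their sizes in KiB.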
The disk space is usually taken up by the /var/tmp/archive folder; follow the procedure below to clean up the archives. If the disk utilization in /var/tmp/log is high (for example, more than 1G), please raise a case with TAC: this indicates a backlog build-up that needs analysis.
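The 1G check on /var/tmp/log can be scripted. This is a rough sketch only: the function name over_limit is hypothetical, and the default limit of 1048576 KiB (1 GiB) is an assumption based on the threshold mentioned above.

```shell
#!/bin/bash
# Hypothetical helper: report whether a directory exceeds a size limit.
# Returns 0 if over the limit, 1 if within it, 2 if the size cannot be read.
# Usage: over_limit <dir> [limit_in_KiB]
over_limit() {
    local dir="$1"
    local limit_kb="${2:-1048576}"   # assumed default: 1 GiB expressed in KiB
    local used_kb
    used_kb=$(du -sk "$dir" 2>/dev/null | cut -f1)
    [ -n "$used_kb" ] || return 2    # directory missing or unreadable
    if [ "$used_kb" -gt "$limit_kb" ]; then
        echo "$dir uses ${used_kb} KiB, above the ${limit_kb} KiB limit"
        return 0
    fi
    echo "$dir uses ${used_kb} KiB, within the ${limit_kb} KiB limit"
    return 1
}
```

Running over_limit /var/tmp/log as root prints the current usage; an "above the limit" result is the case where a TAC ticket is appropriate.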
Note: The log collector pushes a copy of each log to the DB and also keeps a copy in the /var/tmp/archive folder. The DB cleans up older data according to its retention limit: 90 days for analytics data and 7 days for "search" logs. The /var/tmp/archive folder must be cleared manually using the procedure below. Archives are not cleared automatically because customers may need the archive data for auditing purposes.
Transfer the archive to an external server using the script below.
versa@versa-analytics:~$ sudo /opt/versa/scripts/van-scripts/log-archive-transfer.py --src /var/tmp/archive --dst /var/tmp --dst-host 10.192.84.112 --user versa
where 10.192.84.112 is an external server that is reachable from the analytics node.
Once the archive files have been transferred to the external server, you can clean up the archive locally on the node.
Copy the script below onto the machine and run it to delete archive files whose age falls within a given range of days.
To run the script:
sudo /var/tmp/tar-del.sh <start> <end>
For example, to delete all archive data that is between 30 and 600 days old:
sudo /var/tmp/tar-del.sh 30 600
You must create the tar-del.sh file first. Copy and paste the lines below into tar-del.sh (you can create the file in the /var/tmp folder), then make it executable with chmod +x /var/tmp/tar-del.sh.
versa@analytics1:/var/tmp$ cat tar-del.sh
#!/bin/bash
startday=$1
endday=$2
tmp1="/tmp/tmp_$$.txt"
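for tenant in /var/tmp/archive/tenant-*; do
echo "$tenant"
for vsn in "$tenant"/VSN0-*; do
echo "$vsn"
# skip this entry if we cannot enter it, so find never runs in the wrong directory
cd "$vsn" || continue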
for i in `seq $startday $endday`
do
date_point=`date +%Y%m%d -d "$i days ago"`
echo $date_point
echo "Deleting files from $tenant $vsn for date $date_point" >> $tmp1
# find . -name "$date_point.*.gz" -print >> $tmp1
# to delete
find . -name "$date_point.*.gz" -type f -delete
done
done
done
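# The exit below stops the script before the log file is removed, so $tmp1 can be reviewed.
# Comment out the exit later if you want the log file deleted automatically.
exit
rm -f "$tmp1"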
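The script above deletes files immediately, so it can be useful to preview what a given range would match before running it. The sketch below is an illustration only: the function name preview_deletions is hypothetical, and it assumes GNU date and find plus the same YYYYMMDD.*.gz file naming that the find pattern in tar-del.sh relies on.

```shell
#!/bin/bash
# Hypothetical dry run: list the archive files that fall in a day range
# without deleting anything.
# Usage: preview_deletions <archive_root> <start_days_ago> <end_days_ago>
preview_deletions() {
    local root="$1" start="$2" end="$3"
    local i date_point
    for i in $(seq "$start" "$end"); do
        # same date prefix that tar-del.sh builds for each day in the range
        date_point=$(date +%Y%m%d -d "$i days ago")
        # -print only lists the matches; no file is removed
        find "$root" -type f -name "$date_point.*.gz" -print
    done
}
```

For example, preview_deletions /var/tmp/archive 30 600 lists the files that sudo /var/tmp/tar-del.sh 30 600 would be expected to remove.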