You can determine the disk utilization per directory using the commands below:


 sudo su

 du -ch /var/tmp/log

 du -ch /var/tmp/log/tenant-*/VSN0-*

 du -ch /var/tmp/archive

 du -ch /var/lib/cassandra
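
To see which subdirectories account for most of the usage, the per-directory totals can be sorted by size. The following is a minimal sketch, assuming GNU `du` and `sort` are available on the node; the `largest_dirs` helper name is hypothetical and not part of the product.

```shell
#!/bin/bash
# Hypothetical helper: list the largest first-level subdirectories of a path.
# Assumes GNU du (--max-depth) and GNU sort (-h for human-readable sizes).
largest_dirs() {
    local dir="$1"
    local count="${2:-10}"
    # Summarize each first-level subdirectory (plus the total for the path
    # itself), sort largest first, and keep the top N lines.
    du -h --max-depth=1 "$dir" 2>/dev/null | sort -rh | head -n "$count"
}

# Example: largest_dirs /var/tmp/archive 5
```

Run it with sudo if the directories are not readable by the current user.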



Disk space is usually consumed by the "/var/tmp/archive" folder; follow the procedure below to clean up the archives. If disk utilization in /var/tmp/log is on the higher side (for example, anything over 1G), please raise a case with TAC, as this indicates a backlog build-up that needs analysis.
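
The rough 1G guideline for /var/tmp/log can be checked with a small script. This is only a sketch under stated assumptions: the `exceeds_threshold` helper is hypothetical, and the comparison is done in kilobytes using `du -sk`.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a directory uses more than limit_kb kilobytes.
exceeds_threshold() {
    local dir="$1"
    local limit_kb="$2"
    local used_kb
    # -s gives a single total for the directory, -k reports it in 1K blocks.
    used_kb=$(du -sk "$dir" 2>/dev/null | awk '{print $1}')
    # Treat an unreadable or missing directory as 0 KB.
    [ "${used_kb:-0}" -gt "$limit_kb" ]
}

# 1G = 1048576 KB, the approximate threshold mentioned above.
if exceeds_threshold /var/tmp/log 1048576; then
    echo "/var/tmp/log is above 1G; consider raising a case with TAC"
fi
```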


Note: The log collector pushes a copy of each log to the DB and also keeps a copy in the /var/tmp/archive folder. The DB cleans up older data according to the "retention limit": 90 days for analytics and 7 days for "search" logs. The /var/tmp/archive folder must be cleared manually using the procedure below; archives are not cleared automatically because customers may need the archive data for auditing purposes.




First, transfer the archive to an external server using the script below:


 versa@versa-analytics:~$ sudo /opt/versa/scripts/van-scripts/log-archive-transfer.py --src /var/tmp/archive --dst /var/tmp --dst-host 10.192.84.112 --user versa


Here, 10.192.84.112 is an external server that is reachable from the analytics node.


Now that the archive files have been transferred to an external server, you can clean up the archive locally on the node.


You can copy the tar-del.sh script (listed further down) onto the machine and run it to delete archive files within a given age range, specified in days.

 

To run the script, where <start> and <end> are ages in days (files dated from <start> to <end> days ago are deleted):

 sudo /var/tmp/tar-del.sh <start> <end>

For example, to delete all archive data that is between 30 and 600 days old:

 sudo /var/tmp/tar-del.sh 30 600


You must create the tar-del.sh file first: copy and paste the lines below into tar-del.sh (you can create this file in the /var/tmp folder) and make sure it is executable, for example with chmod +x /var/tmp/tar-del.sh.


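 versa@analytics1:/var/tmp$ cat tar-del.sh
 #!/bin/bash
 # Usage: tar-del.sh <start> <end>
 # Deletes archive files dated from <start> to <end> days ago.
 if [ $# -ne 2 ]; then echo "Usage: $0 <start> <end>"; exit 1; fi
 startday=$1
 endday=$2
 # Log of the tenants, VSNs and dates processed in this run
 tmp1="/tmp/tmp_$$.txt"
 for tenant in /var/tmp/archive/tenant-*; do
     echo "$tenant"
     for vsn in "$tenant"/VSN0-*; do
         echo "$vsn"
         # Skip this directory if it cannot be entered, so that find
         # never deletes from the wrong location
         cd "$vsn" || continue
         for i in $(seq "$startday" "$endday"); do
             date_point=$(date +%Y%m%d -d "$i days ago")
             echo "$date_point"
             echo "Deleting files from $tenant $vsn for date $date_point" >> "$tmp1"
             # To preview instead of deleting, use -print:
             # find . -name "$date_point.*.gz" -print >> "$tmp1"
             # to delete
             find . -name "$date_point.*.gz" -type f -delete
         done
     done
 done
 # While testing, exit here so that the log in $tmp1 is kept.
 # Comment out this exit later so that the log is removed.
 exit
 rm -f "$tmp1"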
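
Before running the deletion, it may be worth previewing which files fall in the chosen range. The sketch below mirrors the loop in tar-del.sh but only prints matches. It assumes the same `tenant-*/VSN0-*` layout and `YYYYMMDD.*.gz` file naming that tar-del.sh relies on, and GNU `date`. The `list_archives` name and its optional third argument (the archive root) are hypothetical.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print the archive files that tar-del.sh would
# delete for a given day range, without removing anything.
list_archives() {
    local startday="$1"
    local endday="$2"
    local root="${3:-/var/tmp/archive}"
    local i date_point
    for i in $(seq "$startday" "$endday"); do
        # GNU date: the calendar date i days before today, formatted YYYYMMDD.
        date_point=$(date +%Y%m%d -d "$i days ago")
        # Only -print is used here, so nothing is deleted.
        find "$root"/tenant-*/VSN0-* -type f -name "$date_point.*.gz" -print 2>/dev/null
    done
}

# Example: list_archives 30 600
```

If the printed list looks right, the same range can then be passed to tar-del.sh.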