Tuesday, October 13, 2015

Checking file hashes against Palo Alto Networks' WildFire to find their verdicts

I had a list of files I needed to check to see if they were malware. There are a few ways to approach this. The first is to upload each file manually, one at a time, to the WildFire portal and have WildFire check whether they're benign or malicious.

Or automate it in the simplest and most efficient way possible. I chose the latter.

The first thing I did was compute a checksum of each file. I chose MD5 hashes for this example, but you can use SHA-256 as well, since WildFire also supports it.

On a Linux machine I issued the following command on some files.

user@ubuntu-vm:~/$ md5sum *
1842b2365bf67121462d0cc026fb9300  Test.pdf
cd75d3e263ff0d1d13aad24cdb9f2593 flashplayer19_ha_install.exe
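If you'd rather compute the checksums in Python instead of shelling out to md5sum, the standard library's hashlib module produces the same digests. A minimal sketch (the helper name file_digests is mine, not from any library):

```python
import hashlib

def file_digests(path):
    """Return (md5, sha256) hex digests for the file at path."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    # read in chunks so large files don't have to fit in memory
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Running this over a directory gives you the same values md5sum printed, ready to drop into the hash list.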

Now that I have those hashes, I put them into a text file. I also added a few hashes I knew were malware, just to verify my script.

Next I used pan-python to write a script that would query WildFire. There are two ways to approach this. One is to send each hash one at a time and get the results back for each query; the more hashes you have, the faster you'll use up your daily allotment of API queries. The other method is to send a bulk query. This reduces the number of queries you make, but you can only submit up to 500 hashes per bulk query.

Here's the code that I used.

import csv
import pan.wfapi

apikey = 'YOUR_API_KEY'

# loop through all the hashes and put them into a list
with open("sample-hashes.txt", "r") as ins:
    rows = csv.reader(ins)
    L = []
    for row in rows:
        L.append(row[0])

# make an API query to WildFire and do a bulk verdict check
WF = pan.wfapi.PanWFapi(api_key=apikey)
WF.verdicts(hashes=L)
print('hashes %s submitted' % L)

# print the XML response
xml_response = WF.response_body
print(xml_response)

I imported the Python csv library to read the file that contained all my hashes. I used csv because I originally received the hashes as a CSV file, with each filename associated with its hash value (as you can see above, there is a filename after each hash). For this demo I cropped out all the other fields and used only the hashes. If the file were delimited, I could read just the field with the hash value.

Then I imported the pan-python library that would allow me to run the query.

The loop appends every hash found in the text file to a list.
Then I make a bulk query to WildFire using the API key.

The API key can be found in your account on the WildFire portal.

$ python bulk-wf.py
hashes ['1842b2365bf67121462d0cc026fb9300', 'ad9e1502d3fd341608fa4730a1609f8d', 'bf1373d10842e96c85bf73a97ddec699', '36944ab907576c10d217911ee6acc3c9', '4b20dd78c13433f4ec47853bfecddc61', '7309a9b75819dfd1496391fc75016d90'] submitted


A verdict of 1 means the file is malicious, while a verdict of 0 means the file is benign.

If the verdict is -102, the file is unknown and I would have to upload it to WildFire to have it examined further.
I could add more logic to my script to print the actual verdicts, but that would require manipulating the response body. I would need something like the xmltodict library so I could retrieve and evaluate only the verdicts.
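Here's a sketch of that idea using the standard library's xml.etree.ElementTree instead of xmltodict. The element names (get-verdict-info, md5, verdict) and the sample response below are my assumptions about the shape of WildFire's reply, so check them against a real response body before relying on this:

```python
import xml.etree.ElementTree as ET

# verdict codes discussed above; anything else falls through as-is
VERDICTS = {'0': 'benign', '1': 'malware', '-102': 'unknown'}

def parse_verdicts(xml_response):
    """Map each hash in the response to a human-readable verdict."""
    root = ET.fromstring(xml_response)
    results = {}
    # assumed element names; verify against an actual response body
    for info in root.iter('get-verdict-info'):
        h = info.findtext('md5') or info.findtext('sha256')
        code = info.findtext('verdict')
        results[h] = VERDICTS.get(code, 'code %s' % code)
    return results

# a made-up response body in the assumed shape, for illustration only
sample = """<wildfire>
  <get-verdict-info>
    <md5>1842b2365bf67121462d0cc026fb9300</md5>
    <verdict>0</verdict>
  </get-verdict-info>
  <get-verdict-info>
    <md5>ad9e1502d3fd341608fa4730a1609f8d</md5>
    <verdict>1</verdict>
  </get-verdict-info>
</wildfire>"""

print(parse_verdicts(sample))
```

With something like this in place, the script could print benign/malware/unknown next to each hash instead of dumping raw XML.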

This is a good way to cut down the traffic that needs to be sent across the network and to evaluate known-good or known-bad files very quickly. Then, when you do need to upload the unknowns, you've already limited that number of files to a much smaller group.
