Tuesday, October 13, 2015

Checking file hashes against Palo Alto Networks' WildFire to find their verdicts

I had a list of files I needed to check to see if they were malware. There are a few ways to approach this. The first is to upload each file manually, one at a time, to the WildFire portal and have WildFire determine whether it is benign or malicious.

The other is to automate the process in the simplest and most efficient way possible. I chose the latter.

The first thing I did was compute a hash checksum of all the files. I chose MD5 hashes for this example, but you can use SHA-256 as well, since WildFire also supports it.

On a Linux machine I issued the following command on some files.

user@ubuntu-vm:~/$ md5sum *
1842b2365bf67121462d0cc026fb9300  Test.pdf
cd75d3e263ff0d1d13aad24cdb9f2593  flashplayer19_ha_install.exe
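
If you'd rather compute the hashes in Python instead of shelling out to md5sum, here's a minimal sketch using the standard hashlib library (the filename is just an example; swap hashlib.md5 for hashlib.sha256 if you prefer):

import hashlib

# Compute a file's MD5 in chunks so large files don't have to fit in memory.
def file_md5(path):
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

print(file_md5('Test.pdf'))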

Now that I had those hashes, I put them into a text file. I also added a few hashes I knew were malware, just to verify my script.

Next I used pan-python to write a script that would query WildFire. There are two ways to approach this. One is to send each hash one at a time and get the results back for each query; the more hashes you have, though, the faster you'll use up your daily allotment of API queries. The other method is to send a bulk query. This reduces the number of queries you make, but you can only submit up to 500 hashes per bulk query.
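
If you have more than 500 hashes, one way to stay under that limit (a sketch of mine, not part of the original workflow) is to slice the list into batches of 500 and submit one verdicts() call per batch:

import pan.wfapi

# Hypothetical hash list; in practice it comes from the text file read
# in the full script below.
hashes = ['1842b2365bf67121462d0cc026fb9300',
          'cd75d3e263ff0d1d13aad24cdb9f2593']

WF = pan.wfapi.PanWFapi(api_key='YOUR_API_KEY')

# Submit at most 500 hashes per bulk query, the WildFire limit.
for i in range(0, len(hashes), 500):
    WF.verdicts(hashes=hashes[i:i + 500])
    print(WF.response_body)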

Here's the code that I used.



import csv
import pan.wfapi

apikey = 'YOUR_API_KEY'

# Loop through all the hashes in the file and put them into a list.
hashes = []
with open("sample-hashes.txt", "r") as ins:
    for row in csv.reader(ins):
        print(row[0])
        hashes.append(row[0])

# Make a bulk verdict query to WildFire with the API key.
WF = pan.wfapi.PanWFapi(api_key=apikey)
WF.verdicts(hashes=hashes)
print('hashes %s submitted' % hashes)

# Print the XML response.
xml_response = WF.response_body
print(xml_response)


I imported the Python csv library to read the file that contained all my hashes. I used csv because I originally received the hashes as a CSV file, with each filename associated with a hash value (as you can see above, there is a filename after each hash). For this demo I cropped out all the other fields and kept only the hashes, but if the file were still delimited, I could read only the field holding the hash value.
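
For example, if the file were still comma-delimited with the hash in the first field and the filename in the second, a sketch like this would pull out just the hashes (sample-hashes.csv is a hypothetical filename):

import csv

hashes = []
with open('sample-hashes.csv', 'r') as f:
    for row in csv.reader(f):
        # row[0] is the hash, row[1] is the filename; keep only the hash.
        hashes.append(row[0])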

Then I imported the pan-python library that would allow me to run the query.

The loop is used to append all the hashes found in the text file to a list.
Then I make a query to WildFire using the API key.

The API key can be found in your WildFire account on the WildFire portal.



$ python bulk-wf.py
1842b2365bf67121462d0cc026fb9300
bf1373d10842e96c85bf73a97ddec699
36944ab907576c10d217911ee6acc3c9
4b20dd78c13433f4ec47853bfecddc61
7309a9b75819dfd1496391fc75016d90
hashes ['1842b2365bf67121462d0cc026fb9300', 'ad9e1502d3fd341608fa4730a1609f8d', 'bf1373d10842e96c85bf73a97ddec699', '36944ab907576c10d217911ee6acc3c9', '4b20dd78c13433f4ec47853bfecddc61', '7309a9b75819dfd1496391fc75016d90'] submitted

<wildfire>
    <get-verdict-info>
        <sha256>6864e1fa5e0145c2f1ce6f403a3554fe9576287929b0e9e4e5fadb50915bb65e</sha256>
        <verdict>1</verdict>
        <md5>36944ab907576c10d217911ee6acc3c9</md5>
    </get-verdict-info>
    <get-verdict-info>
        <sha256>93cbeff02c16e7e09e41aa94ee37a3dae51849f14d335485fc936297b400ce04</sha256>
        <verdict>1</verdict>
        <md5>4b20dd78c13433f4ec47853bfecddc61</md5>
    </get-verdict-info>
    <get-verdict-info>
        <sha256>5683cc43393bfd01b5533a3c710c39d62387cbd5bdf9588f8b3c1dc13933473c</sha256>
        <verdict>1</verdict>
        <md5>7309a9b75819dfd1496391fc75016d90</md5>
    </get-verdict-info>
    <get-verdict-info>
        <sha256>fd54decc2b89c9ca00f4e6de39a1a9d677bd1c5a8ceb6d6b5b25eedc8d332e28</sha256>
        <verdict>0</verdict>
        <md5>1842b2365bf67121462d0cc026fb9300</md5>
    </get-verdict-info>
</wildfire>


A verdict of 1 means the file is malicious, while a verdict of 0 means the file is benign.

If the verdict is -102, the file is unknown, and I would have to upload it to WildFire to have it examined further.
I could add more logic to my script to print the actual verdict, but that would require manipulating the response. I would need something like the xmltodict library so I could retrieve and evaluate only the verdicts.
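
Here's a rough sketch of what that could look like, assuming xmltodict is installed (pip install xmltodict). The verdict values 0, 1, and -102 are the ones covered above, not an exhaustive list:

import xmltodict

VERDICTS = {'0': 'benign', '1': 'malicious', '-102': 'unknown'}

doc = xmltodict.parse(xml_response)
entries = doc['wildfire']['get-verdict-info']
# xmltodict returns a single dict instead of a list when there is
# only one <get-verdict-info> element, so normalize to a list.
if not isinstance(entries, list):
    entries = [entries]

for entry in entries:
    verdict = VERDICTS.get(entry['verdict'], entry['verdict'])
    print('%s: %s' % (entry['md5'], verdict))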

Now this is a good way to cut down the amount of traffic that needs to be sent across the network and to evaluate known-good or known-bad files very quickly. Then, when you do need to upload the unknowns, you've already limited that number of files to a much smaller group.
