My first IronPython program
After seeing the cool stuff Chris Anderson is doing with Python in his AvPad program, I decided to take his advice and learn Python. I read an ebook on the flight to Redmond, and today I wrote my first program. It uses IronPython and the .NET Framework to make a web spider that searches for MP3s and downloads them.
It works great on the one site (secret site) I tried it on, but I’m sure it will barf on anything else. Here’s the code anyways.
from System.Net import *
from System.IO import *
import System
import re
def GetPage(url):
    # Fetch the page at url and return its body as a string.
    s = WebRequest.Create(url).GetResponse().GetResponseStream()
    sr = StreamReader(s)
    page = sr.ReadToEnd()
    sr.Close()
    return page
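def GetLinks(pagestring):
    # Collect the target of every href="..." attribute on the page.
    links = ()
    rex = re.compile("href=\"([^\"]*)\"")
    m = rex.search(pagestring)
    while m != None:
        links = links + m.groups()
        # Continue searching in the text after this match.
        pagestring = pagestring[m.end():]
        m = rex.search(pagestring)
    return links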
def GetPageLinks(url):
    return GetLinks(GetPage(url))
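def FixURL(url):
    # Undo the HTML escaping of ampersands in links taken from the page source.
    return url.replace("&amp;", "&")
def FixFile(file):
    # Turn URL-encoded spaces back into real spaces for the local file name.
    return file.replace("%20", " ")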
def DownloadFile(file, intofile):
    intofile = FixFile(intofile)
    s = WebRequest.Create(file).GetResponse().GetResponseStream()
    input = BinaryReader(s)
    # Make sure the target directory exists before opening the file.
    Directory.CreateDirectory(Path.GetDirectoryName(intofile))
    output = BinaryWriter(File.Open(intofile, FileMode.OpenOrCreate))
    # Copy the response to disk in 8 KB chunks.
    size = 1024*8
    filepart = input.ReadBytes(size)
    while filepart.get_Length() > 0:
        output.Write(filepart)
        filepart = input.ReadBytes(size)
    output.Close()
    s.Close()
def SpiderPage(baseurl, intodir):
    # urls grows while we iterate over it, so newly found subdirectories get visited too.
    urls = [baseurl]
    for url in urls:
        print "Searching: " + url
        try:
            links = GetPageLinks(FixURL(url))
        except IOError:
            print "failed!!!"
            # Skip this page; otherwise links would be undefined or left over from the previous page.
            continue
        mp3s = [url + link for link in links if link[-4:] == '.mp3']
        newurls = [url + link for link in links if link[-1:] == '/' and link != '/' and link[:1] != '/']
        urls.extend(newurls)
        for mp3 in mp3s:
            print " Downloading: " + mp3
            DownloadFile(mp3, intodir + mp3[len(baseurl):])
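For comparison, here is a minimal sketch of how the link handling might be condensed using only Python's standard re module. It assumes the same href="..." pattern as GetLinks and the same filtering rules as SpiderPage. The names get_links, split_links, and sample_page are made up for illustration, and the HTML is an invented directory listing, not the real site. It makes no .NET calls and downloads nothing.

```python
import re

# Same pattern as GetLinks: capture whatever sits between href=" and the closing quote.
HREF_PATTERN = re.compile(r'href="([^"]*)"')

def get_links(pagestring):
    """Return every href value in pagestring, in document order."""
    # findall walks the whole string, so no manual search/slice loop is needed.
    return HREF_PATTERN.findall(pagestring)

def split_links(links):
    """Separate links into MP3 files and relative subdirectories."""
    mp3s = [link for link in links if link.endswith('.mp3')]
    # A subdirectory link ends with '/' but does not start with one,
    # which also rules out the bare '/' root link.
    subdirs = [link for link in links
               if link.endswith('/') and not link.startswith('/')]
    return mp3s, subdirs

# Hypothetical directory-listing HTML, made up for illustration.
sample_page = ('<a href="/">root</a> <a href="live/">live</a> '
               '<a href="song%20one.mp3">song</a> <a href="notes.txt">notes</a>')
links = get_links(sample_page)
mp3s, subdirs = split_links(links)
```

With a regular expression that has a single group, findall returns a list of the captured strings, so the result can feed the same list comprehensions that SpiderPage uses.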
How’s it look? Anything that I could have done better?