Photobucket gallery downloader
#1
This is just a script I made that downloads every image in a public Photobucket gallery.

[Image: 2lawzm.png]

Just enter a download directory and a Photobucket gallery link.

Code:
# Written for Python 2 (raw_input, print statement, urllib.urlopen).
import urllib, re, os, xml.parsers.expat
imgs = []        # image URLs collected from the RSS feed
DLbarLen = 40    # width of the progress bar in characters
def mkdir():
    # Keep asking until we get a folder we can create (or reuse) and enter.
    global DLlocation
    while True:
        DLlocation = raw_input('Enter download folder:\n')
        if not DLlocation:
            print "Did not enter pathname."
            continue
        try:
            if not os.path.isdir(DLlocation):
                os.makedirs(DLlocation)
            os.chdir(DLlocation)
            DLlocation = os.getcwd()
            return
        except OSError:
            print "Entered bad pathname."
def getimgs():
    def start_element(name, attrs):
        # Each <media:content url="..."> element in the feed is one image.
        if name == 'media:content' and 'url' in attrs:
            imgs.append(attrs['url'])
    site = raw_input('Enter photobucket gallery:\n')
    # Drop the last path component (page name or query) and request the feed.
    site = re.split('[a-zA-Z0-9_.?=-]*$',site)[0] + 'feed.rss'
    print '\n'
    parse = xml.parsers.expat.ParserCreate()
    parse.StartElementHandler = start_element
    gallery = urllib.urlopen(site)
    source = gallery.read()
    gallery.close()
    parse.Parse(source, 1)
    total = len(imgs)
    imgName = re.compile(r'[a-zA-Z0-9_-]+\.(jpeg|gif|png|jpg)$')
    # enumerate keeps the position right even if the same URL appears twice
    for pos, url in enumerate(imgs, 1):
        img = imgName.search(url)
        barnum = pos*DLbarLen//total
        blanknum = DLbarLen-barnum
        bars = '='*barnum
        blanks = ' '*blanknum
        # '\033[A' moves the cursor up a line so the status is redrawn in place
        if pos == total:
            print '\033[ACompleted.'+' '*55
        else:
            print '\033[ADownloading image '+str(pos)+' of '+str(total)
        print "["+bars+blanks+"]"+"\033[A"
        # Skip URLs that do not end in a recognisable image file name
        if img is not None:
            DLpic(url, img.group())
    print "\nFinished downloading all "+str(total)+" images in gallery!\033[A\n"
def DLpic(url,filename):
    # Fetch one image and write it into the download folder.
    sock = urllib.urlopen(url)
    img = sock.read()
    sock.close()
    location = open(os.path.join(DLlocation, filename), "wb")
    location.write(img)
    location.close()
mkdir()
getimgs()
What I'm working on: http://pastebin.com/m672e6491
I'm going to parse PB feed.rss files for all media:content url attributes, and for normal pages parse all a href attributes that end with .jpg.
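
For the "normal pages" half of that plan, here is a rough sketch of collecting a href values that end in .jpg. It is written for Python 3 (the script above is Python 2) and uses the standard library's html.parser. The names JpgLinkParser and find_jpg_links are made up for this example and are not taken from the pastebin version.

```python
from html.parser import HTMLParser

class JpgLinkParser(HTMLParser):
    """Collects href values of <a> tags that end in .jpg (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value and value.lower().endswith('.jpg'):
                self.links.append(value)

def find_jpg_links(html):
    """Return every .jpg link target found in the given HTML text."""
    parser = JpgLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links

sample = ('<a href="pics/cat.jpg">cat</a> '
          '<a href="about.html">about</a> '
          '<a href="DOG.JPG">dog</a>')
print(find_jpg_links(sample))
```

The links come back exactly as written in the page, so relative ones would still need to be resolved against the page URL before downloading.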
Reply
#2
Awesome script ^___^ Great job Nyx-
Reply
#3
Hehe, thanks Fallen ^__^ I've updated it a lot today. Hopefully it will soon work with multiple image gallery hosts.
Reply
#4
Thats...


Reply
#5
Looks pretty neat, would be good in a GUI, good job.
Reply
#6
Nice script Nyx,
I've never visited PB, but will now for sure.
Waiting for the updates! :D
Reply
#7
That is some sexy script. Glad to see you're enjoying what you've been learning. lol
Reply
#8
Hey, does anyone in here have ideas for my problem?

A lot of the script is hardcoded, and I'm trying to stop that, so can anyone think of a way to tell which images are inside a gallery and which are not? Photobucket's gallery images start with th_ (thumbnail), which makes it easy to grab all of them. Photobucket also shows 5 images at the very top that are the user's latest uploads, so I just slice off the first 5 images from the list.
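
Roughly, the th_ filter plus the slice looks like this in Python 3. This is only a sketch: gallery_thumbnails is a made-up name, and the prefix and the number of images to skip are passed in as parameters instead of being hardcoded.

```python
import posixpath
from urllib.parse import urlsplit

def gallery_thumbnails(img_urls, skip_recent=5, prefix='th_'):
    """Keep URLs whose file name starts with prefix, then drop the first skip_recent."""
    thumbs = [u for u in img_urls
              if posixpath.basename(urlsplit(u).path).startswith(prefix)]
    return thumbs[skip_recent:]

# Made-up URLs, just to show the shape of the input and output.
urls = [
    'http://example.com/albums/demo/th_recent.jpg',
    'http://example.com/albums/demo/banner.png',
    'http://example.com/albums/demo/th_one.jpg',
    'http://example.com/albums/demo/th_two.jpg',
]
print(gallery_thumbnails(urls, skip_recent=1))
```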

But that's a terrible way to do it: it slices 5 images off the list even if you are downloading a page without the recent uploads. The page length is hardcoded too, so a page with 40 images per page will mess it up. And only Photobucket uses th_, so I plan to remove that as well. That really leaves me only one choice: figure out a way to tell which images are part of a gallery (on nearly any site) and which are not.

Once I get that part done (it will probably take ages), I will need to figure out how to get to the next page. Right now I have a hardcoded ?start= because that's what Photobucket uses to move between pages, but like I said, not every site I want to use this script on is going to be Photobucket.
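
One way to take the hardcoding out of the paging is to pass the parameter name and the page length in. The Python 3 sketch below assumes the parameter is an image offset (0, 20, 40, ...) and that the total number of images is already known; both are assumptions, and page_urls is a made-up name.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def page_urls(gallery_url, total, per_page=20, param='start'):
    """Yield one URL per page, replacing or adding the offset parameter."""
    parts = urlsplit(gallery_url)
    # Keep every other query parameter, drop any existing offset.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    for offset in range(0, total, per_page):
        new_query = urlencode(query + [(param, str(offset))])
        yield urlunsplit(parts._replace(query=new_query))

for url in page_urls('http://example.com/albums/demo/?sort=new', total=45):
    print(url)
```

A site that pages differently (say page=2 instead of an offset) would need its own small generator, so this only moves the Photobucket-specific part into arguments.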

I know Photobucket has full RSS feeds, but not every site does, so I'm trying to avoid relying on RSS (I might edit just this script to use the RSS method to make it faster).
Reply
#9
How about getting the size of the image?
Thumbnails are small, but then a genuinely small image could be mistaken for a thumbnail.
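
A sketch of the size check in Python 3, reading the dimensions straight from the file header. It only understands PNG and GIF; JPEG needs a walk through its segment markers and is left out here. The 160 pixel cutoff and the names image_size and looks_like_thumbnail are assumptions for illustration.

```python
import struct

def image_size(data):
    """Return (width, height) from PNG or GIF header bytes, or None if unrecognised."""
    # PNG: 8-byte signature, then the IHDR chunk with big-endian width and height.
    if len(data) >= 24 and data[:8] == b'\x89PNG\r\n\x1a\n' and data[12:16] == b'IHDR':
        return struct.unpack('>II', data[16:24])
    # GIF: 6-byte signature, then little-endian 16-bit width and height.
    if len(data) >= 10 and data[:6] in (b'GIF87a', b'GIF89a'):
        return struct.unpack('<HH', data[6:10])
    return None

def looks_like_thumbnail(data, max_side=160):
    """Guess that an image is a thumbnail when both sides fit within max_side."""
    size = image_size(data)
    return size is not None and max(size) <= max_side

# Hand-built headers for a 160x120 PNG and a 640x480 GIF.
png_header = b'\x89PNG\r\n\x1a\n' + struct.pack('>I', 13) + b'IHDR' + struct.pack('>II', 160, 120)
gif_header = b'GIF89a' + struct.pack('<HH', 640, 480)
print(image_size(png_header), looks_like_thumbnail(png_header))
print(image_size(gif_header), looks_like_thumbnail(gif_header))
```

Only the first few dozen bytes of each file are needed, so the check does not require downloading whole images.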

I just don't understand this line:
Quote: images are a part of a gallery (nearly any) and which are not a part of the gallery

It may be because I don't use PB.
Reply
#10
(12-13-2009, 05:10 PM)Master of The Universe Wrote: How about getting the size of the image.
Thumbnails are small, but that can make one small image appear as thumbnail.

I just don't understand this line:
It may be because I don't use PB.

A gallery site will sometimes have pictures on the page that are not part of the gallery, so I need a way to find which images are in the gallery and which are not. But I think there are way too many exceptions, since sites with images range over just about anything, so who's to tell which images are part of a section and which are not, lol. I'll just do if not in imglist: append to imglist so I at least don't download a picture more than once. For Photobucket that will give me a right arrow and a left arrow along with all the images in the gallery, and on other sites it might include banners and stuff of that sort.
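
The "if not in imglist: append" idea could look something like this in Python 3. Resolving relative src values against the page URL first is an extra step added for this sketch, so the same picture written two different ways still counts once; unique_image_urls is a made-up name.

```python
from urllib.parse import urljoin

def unique_image_urls(page_url, srcs):
    """Resolve each src against the page URL and keep the first occurrence of each."""
    seen = set()      # fast membership test
    ordered = []      # preserves the order images appear on the page
    for src in srcs:
        full = urljoin(page_url, src)
        if full not in seen:
            seen.add(full)
            ordered.append(full)
    return ordered

srcs = ['a.jpg', 'http://example.com/gallery/a.jpg', 'b.jpg', 'a.jpg']
print(unique_image_urls('http://example.com/gallery/', srcs))
```

The set is only there so the membership check stays quick on big pages; a plain list check would give the same result.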
Reply

