The Artima Developer Community

Computing Thoughts
Automatically Synchronize Your Site
by Bruce Eckel
April 28, 2007
Summary
The most interesting part of this is not the tool -- although that was fun to create -- but the process of letting go that allowed me to build it.

Backstory: I started using Zope many years ago. I've slowly discovered that Zope is great, amazing, spectacular, as long as you live and breathe Zope (it's especially good for big, industrial-strength, powerful customized sites). But Zope pretty much reinvents the world, so if a web site is something that you do casually on the side, the Zope world is going to look different than your normal world. So you forget the Zope world every time, and either have to relearn it when you come back, or end up just using the bare bones basics of Zope. That's what I did.

As an example, consider the most trivial and basic thing you could want to do: take the fields in an input form and store them. This is very nontrivial in Zope. Using raw Python, PHP, or Perl on your server, it's fairly trivial.

So I've been slowly trying to shift away from Zope into something clean, basic, and straightforward. I've been moving towards the Unix philosophy: instead of one, big, do-all tool (Zope), piece things together using a lot of little tools. So: CSS, HTML, PHP (but only simple PHP for simple things), standalone Python or a framework like TurboGears (for more complex things). Cheap server (GoDaddy, in my case) from a service that's big enough I don't have to worry about uptime and maintenance issues (and so my system can easily be transferred to customers with low budgets). And very simple development tools.

Perhaps they've gotten better, but I'm still smarting horribly from my initial experience with web page development tools. There turned out to be no way they could isolate you from the underlying code, so you ended up hacking on horrible HTML. After looking in vain for a tool that would generate readable (that is, editable) HTML, I reverted to hand-coding everything, just so it was maintainable. I've been fairly happy with that decision.

CSS, although it is far from living up to its promise, was a big aid in this direction. Its designers wanted to allow you to annotate your HTML with styles, defined in a single place so they could be easily changed. Noble and desirable, so probably the stupidest thing in CSS is that you can't actually create your own styles, as in <MyStyle>, but that you must use the hacky and verbose class="MyStyle", which was a sad and costly compromise. And of course there's the fact that CSS doesn't behave the same on all platforms, something else it promised to do. Despite all that, it's a big improvement. And the pages seem to come up a lot faster, which makes sense because pages are now primarily content and carry much less markup (the styling lives in a separate stylesheet, which can be cached), thus there's a lot less to move across the wire.

So now my pages are relatively readable and so not very hard to write and maintain by hand (which is something that Zope also provided). Thus I am able to move away from Zope, and as I do so things actually seem to get better in a number of places (I know, Zope can also do CSS just fine. My problem isn't that, it's the programming issues).

The point of this article is another feature of Zope that I "couldn't live without": through-the-web editing. Zope allows you to create and edit objects from any machine, using only a web browser. I've actually used this feature a handful of times while traveling, via internet cafes. But hardly ever. Nonetheless, I was convinced I needed it and that's what I found fascinating. I didn't think about it very hard before making this decision, because if I had I would have questioned the value of it.

Through-the-web editing is so amazingly primitive that I have often found myself copying the contents of a page into UltraEdit, performing more sophisticated edits there, and pasting it back in. That should have clued me in, but it took a while.

I searched for through-the-web editing tools that might be installed on GoDaddy. These are a bit vague -- the descriptions I read did not make it clear exactly what they did, but the most promising open-source one appeared to be the unfortunately-but-unforgettably-named "fckeditor." I didn't get around to trying it before I had my epiphany.

Which was this: I just want to edit files, using powerful editing tools (including Python tools I might write myself), and just have the results magically appear on the server. Through-the-web editing was holding me back, a lot. And as happens lately when I have one of these epiphanies, I wonder what else I'm stuck on that's holding me back.

Anyway, here's my solution. This program watches the entire subtree from where it's started, and anytime a file becomes newer than the last time the program recorded the modification date, that file is copied to the server. If you add new directories, those are automatically created on the server. The program repeats every second (printing '.' when there's nothing to do, so you know it's alive), which means you get reasonably responsive results.
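
The change check described above can be sketched in isolation. This is a minimal illustration written for Python 3, not the program itself; the names snapshot and changed_since are made up for the example. Like the full program, it treats any difference in the recorded modification time as a change.

```python
import os

def snapshot(root="."):
    """Map each file path under root to its last-modification time."""
    mtimes = {}
    for dirpath, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            mtimes[path] = os.path.getmtime(path)
    return mtimes

def changed_since(old, new):
    """Return the paths that are new, or whose mtime differs from the old record."""
    return sorted(path for path, mtime in new.items() if old.get(path) != mtime)
```

Calling changed_since with the previous snapshot and a fresh one gives the list of files that would need uploading on that pass.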

The directory and file information is stored in a text file using Python's pprint (pretty-printing), so the file is both readable and editable. This information is stored as the text representation of a tuple of a list and a dictionary; to recover this information all I do is eval the contents of that file, which happens on program startup. Every time the directory is successfully updated to the server, the information is stored to disk.
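
Because the saved state contains only a tuple, a list, a dictionary, strings, and numbers, the same round trip can also be done with ast.literal_eval, which accepts Python literals but refuses arbitrary expressions. The sketch below assumes Python 3 and swaps literal_eval in for eval; save_state and load_state are hypothetical names, not part of the program.

```python
import ast
import pprint

def save_state(path, dirs, files):
    """Write (dirs, files) as readable, hand-editable Python literal text."""
    with open(path, "w") as f:
        f.write(pprint.pformat((dirs, files)))

def load_state(path):
    """Read the text back; literal_eval only parses literals, never runs code."""
    with open(path) as f:
        dirs, files = ast.literal_eval(f.read())
    return dirs, files
```

The file on disk looks the same either way, so it stays just as easy to inspect and edit by hand.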


"""
updateSite.py

by Bruce Eckel, www.MindView.net

Mirror (every repeatInterval seconds) the current directory
hierarchy onto a web site using ftp. Only updates files that
have changed since the last update. Does NOT compare this
hierarchy with the destination web site. If something has
changed since last time, it's pushed onto the server.

Place this program at the root of your web site tree on your
computer, then just double-click it to run it. It will run in
the background until you kill the window.

"""
repeatInterval = 1.0 # seconds
fileData = "updateSiteData.txt"
excludeFiles = (fileData, "updateSiteConfig.txt")

site, user, password = eval(file("updateSiteConfig.txt").read())

import os, ftplib, traceback, threading, pprint

class SiteUpdater(object):
    def __init__(self):
        # Read in old file data:
        if os.path.exists(fileData):
            self.oldDirs, self.oldFiles = eval(file(fileData).read())
        else:
            self.oldDirs = []
            self.oldFiles = {}
        self.__connect()
        self.updateSite()

    def __connect(self):
        self.ftp = ftplib.FTP()
        self.ftp.connect(site, port=21)
        self.ftp.login(user, password)
        print "Connected"

    def __del__(self):
        self.ftp.close()

    @staticmethod
    def getTree():
        changedFiles = {}
        newDirs = []
        for dir, subdirs, files in os.walk("."):
            newDirs += [os.path.join(dir, d) for d in subdirs]
            for path in [os.path.join(dir, f) for f in files if f not in excludeFiles]:
                changedFiles[path] = os.path.getmtime(path)
        return (newDirs, changedFiles)

    @staticmethod
    def __flush():
        """
        Force all files and dirs to be up to date
        """
        file(fileData, 'w').write(pprint.pformat(SiteUpdater.getTree()))

    def getUpdateLists(self, newDirs, changedFiles):
        dirUpdates = [path for path in newDirs if path not in self.oldDirs]
        fileUpdates = []
        for path in changedFiles:
            if path in self.oldFiles and self.oldFiles[path] == changedFiles[path]:
                continue
            fileUpdates.append(path)

        # Sorting dirUpdates produces short directories first:
        return sorted(dirUpdates), sorted(fileUpdates)

    def updateSite(self):
        newDirs, changedFiles = self.getTree()
        dirUpdates, fileUpdates = self.getUpdateLists(newDirs, changedFiles)

        if not fileUpdates and not dirUpdates:
            print ".",
        else:
            print
            try:
                for d in dirUpdates: # Create new directories
                    dir = d[1:].replace('\\', '/') # Strip leading '.'
                    self.ftp.mkd(dir)
                    print "created", dir
                for f in fileUpdates:
                    dir, name = os.path.split(f)
                    # Strip leading '.'; files in the root map to '/', so
                    # cwd never stays in a previously visited subdirectory:
                    dir = dir[1:].replace('\\', '/') or '/'
                    self.ftp.cwd(dir)
                    self.ftp.storbinary('STOR ' + name, file(f, "rb"))
                    print "updated", f
                # Only store the updates if we get through the whole list:
                file(fileData, 'w').write(pprint.pformat((newDirs, changedFiles)))
                self.oldDirs = newDirs[:]
                self.oldFiles = changedFiles.copy()
            except:
                traceback.print_exc()
                # Close and restart if there are any ftp problems:
                print "Session timed out; reconnecting"
                self.ftp.close()
                self.ftp = None
                self.__connect()
        threading.Timer(repeatInterval, self.updateSite).start()

if __name__=="__main__":
    try:
        SiteUpdater()
    except:
        traceback.print_exc()
        raw_input("Press Enter...")

Most FTP servers will disconnect after a while. This program detects a disconnect because an exception is thrown when you try to write to a disconnected FTP object; in that case it automatically closes the connection and opens a new one (there may be a more elegant way to do this, and I'd love to hear about it). This way, it can just keep quietly running and updating in the background. All I have to do is edit the files in my tree and they show up on the server.
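
One possible answer to that question, offered as a sketch rather than a tested replacement: before each upload pass, send the FTP NOOP command and reconnect only if it fails. The code assumes Python 3; ensure_connected and its connect argument (a zero-argument function that returns a logged-in ftplib.FTP object) are hypothetical names for the example.

```python
import ftplib

def ensure_connected(ftp, connect):
    """Return a live FTP connection, reconnecting if the old one is dead.

    `connect` is a hypothetical zero-argument factory that returns a
    logged-in ftplib.FTP object.
    """
    if ftp is not None:
        try:
            ftp.voidcmd("NOOP")  # cheap round trip; raises if the session is gone
            return ftp
        # AttributeError is included as a precaution for an FTP object
        # that was already closed locally and has no socket left.
        except ftplib.all_errors + (AttributeError,):
            try:
                ftp.close()
            except ftplib.all_errors:
                pass  # the old connection is unusable either way
    return connect()
```

With this approach the upload loop would call ensure_connected at the top of each pass, so a timed-out session is replaced before any file is sent instead of after a failed transfer.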

I've been using this for a couple of weeks and so far it's been very nice. Now I can edit files with a sophisticated editor, so I'm much more productive. And I don't miss through-the-web editing.

The only other things I'd like to be able to do on the GoDaddy site -- and I've read bits and pieces about Apache servers in general that make me think this is possible -- are:

Pointers appreciated.

About the Blogger

Bruce Eckel (www.BruceEckel.com) provides development assistance in Python with user interfaces in Flex. He is the author of Thinking in Java (Prentice-Hall, 1998, 2nd Edition, 2000, 3rd Edition, 2003, 4th Edition, 2005), the Hands-On Java Seminar CD ROM (available on the Web site), Thinking in C++ (PH 1995; 2nd edition 2000, Volume 2 with Chuck Allison, 2003), C++ Inside & Out (Osborne/McGraw-Hill 1993), among others. He's given hundreds of presentations throughout the world, published over 150 articles in numerous magazines, was a founding member of the ANSI/ISO C++ committee and speaks regularly at conferences.

This weblog entry is Copyright © 2007 Bruce Eckel. All rights reserved.
