An Introduction to Rsync (part 1)

Rsync is a UNIX command that can keep two file folders synchronized. (For Windows users, see Use Linux Commands and Shell Scripts directly in Windows.) In part one of this article, I will describe what rsync is and how it works, and I’ll give examples of how it can be used. In part two, I’ll go through a detailed example, step by step.

Rsync is basically a copy command on steroids. The number of options available for rsync is extensive (something on the order of 100 different switches). See the man pages for details (“man rsync”).

rsync -vrut --filter='. searchcopy_filelist.txt' work /cygdrive/e/work_search


As is common with Unix commands, the command line switches can be spelled out, in which case they begin with two hyphens (“--filter”), or abbreviated to a single letter, in which case they begin with one hyphen (“-f”). Single-letter versions of the switches can be combined, thus “-vrut” is the equivalent of “-v -r -u -t”. The two styles of switches can be intermixed. Case is significant. (“-b” and “-B” are two completely different switches.)

The example above (which should be entered all on one line) consists of five switches, followed by the source folder and the destination folder.

  • -v is the “verbose” switch
  • -r is the “recursive” switch, which says to process subfolders
  • -u is the “update” switch, which says to skip any file that already exists in the destination folder with a newer timestamp than the corresponding source file
  • -t is the “times” switch, which says to preserve the timestamp when making a copy (as opposed to using the current time)
  • --filter is the switch that allows for precise control of determining which files and/or folders are transferred. In this case, we are using a rule that says read more rules from a file, namely “searchcopy_filelist.txt”
  • The first command line argument that does not begin with a dash, namely “work” in this case, is the name of the source folder (relative to the current folder).
  • The last command line argument that does not begin with a dash, namely “/cygdrive/e/work_search” in this case, is the name of the destination folder
  • If there are more than two command line arguments that do not begin with a dash, then all but the last one are considered source folders, which are processed successively.

Lickety Split: Rsync’s main claim to fame is that it’s fast when operating remotely because, if the remote target system already has an old copy of a particular file, then rsync transmits only the differences. This process is referred to as the rsync remote-update protocol. This means that rsync can be executed as frequently as desired without wasting bandwidth.

Precise Control: Rsync is also quite powerful with respect to specifying exactly which files and folders are to be synchronized and how. The two folders being synchronized do not need to be exact copies. One can be a subset, or superset, of the other, according to (partial) matches on subfolder names, the file names, and/or the filename extensions. The matches are defined using wildcard patterns (similar to shell globs, rather than full regular expressions), which still gives quite a bit of power.

Local vs. Remote: The R in “rsync” stands for remote, but the folders could be local to each other as well. As far as rsync is concerned, “local” means that both file folders are controlled by the same computer, either on the same disk drive or different disk drives. “Remote” means that the file folders are controlled by two different computers connected across a network. For rsync to work remotely, a copy of rsync needs to run at each end of the connection. Either the client starts the remote copy itself through a remote shell such as ssh, or rsync is started in daemon mode at one end and another copy of rsync (in client mode) connects to it from the other end.

Possible Uses: Here’s a short list of some possible uses for rsync, ending with an old classic. (Note that the terms “original” and “mirror” are my designations. The rsync documentation refers to them as “source” and “destination”.) In all of these examples, the assumption is that the original location is regularly being updated with changes and new additions, and that the mirror(s) need to keep up with those changes.

| File Types | Original Location (Source) | Mirrored Location (Destination) |
|---|---|---|
| Digital photographs, videos, MP3 files/podcasts, and other media | A main library containing all of the files | An extract of the library, containing only the “interesting ones” |
| Business documents and spreadsheets | The master repository for a company | An extract of the master repository taken on the road by an individual employee, containing only the documents that are pertinent for that employee |
| Website pages, program code, vector graphics, or any other “source” files | Working folders that contain both source files and derivatives (compiled objects, rasterized graphics, etc.) | A shadow copy of the working folder that contains only the source files, without the derivatives, and with or without any library source code and/or any sample code provided with the libraries |
| Software and other files intended to be downloaded | The official repository | Multiple download mirrors, in various physical locations around the world |

This list is by no means exhaustive. It is only intended to spark the imagination. In the case of making a shadow copy of the source code from a set of working folders, for example, there could be many different motivations. Perhaps there is a legal obligation to place a copy of the source code into escrow. Perhaps there is simply a need to make a snapshot of the files for posterity (although a version control system such as Subversion would be a better choice for that). Or perhaps a subset of the working folders, without library code and examples, would be easier to search and navigate for developers newly assigned to the team who are not yet familiar with the codebase.

Comments

  1. The tips the new author of CodeJacked is writing are way out of my interest and league.

  2. Thierry: First of all, thank you for taking the time to say so. This is exactly the kind of feedback we can use. As a matter of fact, we are working on putting together a reader survey, and you can look for that to appear sometime next week. Also, you may have noticed recently that the bylines now appear more prominently. This is because I am no longer the only blogger here. Vladimir and Stephanie will be contributing on a regular basis. Plus, I’m working on lining up some guest bloggers for the future. So, I hope you’ll stick around at least a little bit longer. In the meantime, if you have any specific suggestions for topic areas you would like to see covered in the future, please don’t hesitate to send them to tips@codejacked.com.

  3. Nice article… Informative and not too deep. Exactly what I was looking for to get rsync working between my media centre and file server for my CD rips…


© 2006-2007 Maxim Software Corp.  All rights reserved.