Bazaar

Analysis of bzr update over HTTP for bzr 1.6b3

Measurement

Running bzr update when there is nothing to update takes longer than I would expect, so I ran it with -Dhttp. Here is a summary of the transaction:

Step  Elapsed(s)  Delta(s)  Event (response to previous request, next request)
1     0.341       0.341     Startup, about to connect()
2     0.474       0.133     POST /repo/branch/.bzr/smart
3     0.884       0.410     404 Not Found, Keep-Alive = True
4     0.885       0.001     About to connect()
5     1.101       0.216     GET /repo/branch/.bzr/branch-format
6     1.295       0.194     200 OK, GET /repo/branch/.bzr/branch/format
7     1.498       0.203     200 OK, GET /repo/branch/.bzr/repository/format
8     1.703       0.205     404 Not Found, POST /repo/.bzr/smart
9     2.113       0.410     404 Not Found, GET /repo/.bzr/branch-format
10    2.329       0.216     200 OK, GET /repo/.bzr/repository/format
11    2.522       0.193     200 OK, GET /repo/.bzr/repository/pack-names
12    2.728       0.206     200 OK, HEAD /repo/.bzr/repository/shared-storage
13    2.933       0.205     200 OK, GET /repo/.bzr/repository/pack-names
14    3.137       0.204     200 OK, GET /repo/branch/.bzr/branch/last-revision
15    3.341       0.204     200 OK, <stop>

Analysis

  1. For some reason we connect() twice. The connection is not kept alive after the first POST fails (1-4), which is strange, because later on, when the second POST fails (8-9), we do keep it. Other than that one time, we do manage to keep the connection alive (as seen by the max=XXX value decreasing).
  2. We spend a lot of time finding the repository and opening it. If we only opened the repository when we needed it, we would save 7 round trips (7-13). Out of approximately 15, that is roughly half of the round-trip time.
  3. Even if we do decide to always open the repository, we still shouldn't be reading pack-names twice (11, 13).
  4. Having to probe for .bzr/branch-format before probing for .bzr/branch/format also adds overhead: a whole extra round trip (5).
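The "roughly half" estimate in point 2 can be checked against the elapsed column. This short sketch assumes, as the table layout suggests, that each row logs the response to the previous request together with the next request sent. Under that reading, requests 7 through 13 run from step 7 until the response that arrives on step 14.

```python
# Elapsed times (seconds) from the -Dhttp summary, keyed by step number.
# Each row logs the response to the previous request and the next request sent.
ELAPSED = {
    1: 0.341, 2: 0.474, 3: 0.884, 4: 0.885, 5: 1.101,
    6: 1.295, 7: 1.498, 8: 1.703, 9: 2.113, 10: 2.329,
    11: 2.522, 12: 2.728, 13: 2.933, 14: 3.137, 15: 3.341,
}


def span(first_request: int, last_request: int) -> float:
    """Time from sending request `first_request` until the response to
    request `last_request` arrives on the following row."""
    return ELAPSED[last_request + 1] - ELAPSED[first_request]


network_time = ELAPSED[15] - ELAPSED[1]  # everything after startup
repo_time = span(7, 13)                  # requests that locate and open the repository
repo_round_trips = 13 - 7 + 1

print(f"{repo_round_trips} repository round trips: {repo_time:.3f}s "
      f"of {network_time:.3f}s ({repo_time / network_time:.0%})")
```

This prints "7 repository round trips: 1.639s of 3.000s (55%)".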

So, given our current data layout and the desire to always open the repository, I think we could shave off maybe 2 round trips.

If we are always going to open the repository connected to a branch, I think the smart protocol should be designed so that Branch.open also returns the information about the containing repository. It could trivially return Branch.last_revision_info as well, along with Repository.shared_storage, Repository.no_working_trees, etc. It would be nice if we could return repository-specific information, such as pack-names, at the same time. It should also try to avoid checking for both .bzr/branch-format and .bzr/branch/format. The BzrDir information should be obtained in the same single round trip.
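What such a combined reply might carry can be sketched in Python. The OpenBranchReply type, its field names, the encode helper and all placeholder values are hypothetical. They are not an existing smart-protocol verb. The 'yes'/'no' encoding simply mirrors the replies visible in the bzr+ssh trace.

```python
from dataclasses import astuple, dataclass
from typing import Tuple


@dataclass(frozen=True)
class OpenBranchReply:
    """Hypothetical one-round-trip reply to opening a branch. Each field
    stands in for a probe or call that bzr update currently makes separately."""
    bzrdir_format: str           # .bzr/branch-format
    branch_format: str           # .bzr/branch/format
    repo_path: str               # repository location relative to the branch, e.g. '..'
    repo_format: str             # .bzr/repository/format
    shared_storage: bool         # shared-storage / Repository.is_shared
    no_working_trees: bool
    rich_root: bool
    tree_ref: bool
    external_lookup: bool
    last_revno: int              # Branch.last_revision_info
    last_revision_id: str
    pack_names: Tuple[str, ...]  # repository-specific data, if cheap enough to include


def encode(reply: OpenBranchReply) -> tuple:
    """Flatten to a tuple of strings in the style of hpss replies:
    booleans become 'yes'/'no', pack names are space-separated."""
    fields = []
    for value in astuple(reply):
        if isinstance(value, bool):
            fields.append("yes" if value else "no")
        elif isinstance(value, tuple):
            fields.append(" ".join(value))
        else:
            fields.append(str(value))
    return ("ok",) + tuple(fields)


# Placeholder values only; real format strings and revision ids differ.
reply = OpenBranchReply(
    bzrdir_format="<bzrdir format>", branch_format="<branch format>",
    repo_path="..", repo_format="<repository format>",
    shared_storage=True, no_working_trees=False,
    rich_root=False, tree_ref=False, external_lookup=False,
    last_revno=42, last_revision_id="<revision id>",
    pack_names=("<pack-1>", "<pack-2>"),
)
print(encode(reply))
```

Including pack_names is the "would be nice" case from the paragraph above. It is shown here only to illustrate the shape of the reply.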

Comparison with bzr+ssh

For comparison, here is a similar operation on a local network using bzr+ssh:

Step  Elapsed(s)  Delta(s)  Event (result of previous call, next call)
1     0.322       0.322     Setup, hpss call 'BzrDir.open repo/branch'
2     0.412       0.090     'ssh implementation is OpenSSH'
3     4.125       3.713     result ('yes'), 'BzrDir.open_branch'
4     4.135       0.010     result ('ok'), 'BzrDir.find_repositoryV2'
5     4.160       0.025     result ('ok', '..', 'no', 'no', 'no'), 'BzrDir.open repo'
6     4.169       0.009     result ('yes'), 'BzrDir.find_repositoryV2'
7     4.180       0.011     result ('ok', '', 'no', 'no', 'no'), 'Repository.is_shared'
8     4.211       0.031     result ('yes',), 'Branch.last_revision_info'
9     4.324       0.113     result ('ok', revno, revision_id)
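Splitting the elapsed column shows where the bzr+ssh time goes. This sketch assumes, as with the HTTP summary, that each row logs the reply to the previous call together with the next call. On that reading, the first call's reply arrives on step 3.

```python
# (elapsed seconds, event) from the bzr+ssh trace.
TRACE = [
    (0.322, "call BzrDir.open repo/branch"),
    (0.412, "ssh implementation is OpenSSH"),
    (4.125, "result ('yes'), call BzrDir.open_branch"),
    (4.135, "result ('ok'), call BzrDir.find_repositoryV2"),
    (4.160, "result ('ok', '..', 'no', 'no', 'no'), call BzrDir.open repo"),
    (4.169, "result ('yes'), call BzrDir.find_repositoryV2"),
    (4.180, "result ('ok', '', 'no', 'no', 'no'), call Repository.is_shared"),
    (4.211, "result ('yes',), call Branch.last_revision_info"),
    (4.324, "result ('ok', revno, revision_id)"),
]

calls = [event for _, event in TRACE if "call " in event]
first_call = TRACE[2][0] - TRACE[0][0]    # ssh connect + remote bzr startup + first reply
other_calls = TRACE[-1][0] - TRACE[2][0]  # every call after the first

print(f"{len(calls)} calls; first reply after {first_call:.3f}s, "
      f"remaining {len(calls) - 1} calls took {other_calls:.3f}s")
```

This prints "7 calls; first reply after 3.803s, remaining 6 calls took 0.199s".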

bzr+ssh analysis

  1. We do a bit better, in that we don't read pack-names at all.
  2. However, you can still see that we probe for the BzrDir and then for the branch object. Then we ask where the repository is, find out that it is in the containing directory, and issue another open and find in that parent directory (5-7). (We might do better when the repository is several levels up; I don't know whether it would make a single jump to ../.. or two, as we do for plain HTTP.)
  3. BzrDir.find_repositoryV2 is smart enough to return the location of the repository, as well as extra meta-information about it. However, that extra information is rather limited and doesn't cover what we really care about. Specifically, it returns:

    path, rich_root, tree_ref, external_lookup = self._find(path)

    These are probably still necessary, but it would be good to also return the other simple booleans, such as shared-storage, which currently costs a separate Repository.is_shared call (7-8).

  4. The overall time is dominated by connecting and starting a remote bzr instance (about 3.8 seconds between sending the first call and receiving its reply, 1-3). Partly this is because my remote machine is a bit slow.
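The repository walk in point 2 can be sketched by decoding the two find_repositoryV2 replies from the trace. This is a client-side illustration, not bzrlib's code. The RepoLocation type and decode_find_repository_v2 function are hypothetical. The field order follows the self._find(path) line quoted above. The returned path is assumed to be relative to the directory that was asked. That matches the trace: asking about repo/branch yields '..', and the client then opens repo.

```python
import posixpath
from typing import NamedTuple


class RepoLocation(NamedTuple):
    path: str             # repository directory, relative to the transport root
    rich_root: bool
    tree_ref: bool
    external_lookup: bool
    # Nothing here says whether the repository is shared, so a separate
    # Repository.is_shared call is still needed.


def decode_find_repository_v2(asked: str, reply: tuple) -> RepoLocation:
    """Decode an ('ok', path, rich_root, tree_ref, external_lookup) reply."""
    status, relpath, rich_root, tree_ref, external_lookup = reply
    if status != "ok":
        raise ValueError(f"unexpected reply status {status!r}")
    # Assumption: relpath is relative to the directory we asked about.
    path = posixpath.normpath(posixpath.join(asked, relpath))
    flags = (flag == "yes" for flag in (rich_root, tree_ref, external_lookup))
    return RepoLocation(path, *flags)


# The two replies from the trace (steps 5 and 7).
from_branch = decode_find_repository_v2("repo/branch", ("ok", "..", "no", "no", "no"))
from_repo = decode_find_repository_v2("repo", ("ok", "", "no", "no", "no"))
print(from_branch)
print(from_repo)
```

Both replies resolve to the same directory, repo. The second open and find pair (5-7) is exactly the step point 2 questions.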