On Jun 14, 2007, at 5:03 PM, Kris Kennaway wrote:

>> It's at least arguable that doing queries against a data set
>> including a bunch of repeats is "skewed" in a more realistic
>> fashion. :-) A quick look at some of the data sources I have handy
>> such as http access logs or Squid proxy logs suggests that (for
>> example) out of a database of 17+ million requests, there were only
>> 46000 unique IPs involved.
>
> There were still lots of repeats, just some of them were repeated
> hundreds of thousands of times - I stripped about a dozen of those
> (googlebots, I'm looking at you ;-), leaving a distribution that was
> less biased to the top end.

Heh, yes, it's surprising how happy a webspider is to crawl around a
heavily-interlinked site. :-) Perhaps someone ought to add a:

  Crawl-delay: 600

...statement to http://www.freebsd.org/robots.txt...?

>> You might find it interesting to compare doing queries against your
>> raw and filtered datasets, just to see what kind of difference you
>> get, if any.
>
> Cached queries perform much better, as you might expect. As an
> estimate I was getting query rates exceeding 120000 qps when serving
> entirely out of cache, and I don't think I reached the upper bound yet.

Sure, anything cached or anything the nameserver is authoritative for
is going to be directly answerable without having to do an external
recursive query.

>> What was the external network connectivity in terms of speed? The
>> docs suggest you need something like 16 Mbs up / 8 Mbs down
>> connectivity in order to get up to 50K requests/sec....
>
> I wasn't seeing anything close to this, so I guess it depends how much
> data is being returned by the queries (I was doing PTR lookups). I
> forget the exact numbers but it wasn't exceeding about 10 Mbit in both
> directions, which should have been well within link capacity. Also
> the lock profiling data bears out the interpretation that it was BIND
> that was becoming saturated and not the hardware.
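As an aside, the log reduction Kris describes (count requests per client IP, then strip the dozen or so heaviest repeaters) can be sketched in a few lines of Python. This is my own illustration, not his actual script; it assumes the client IP is the first whitespace-separated field, as in Apache's common/combined log format, so a Squid log would need the field index adjusted:

```python
# Sketch: count unique client IPs in an access log and drop the heaviest
# repeaters -- roughly the "strip about a dozen top offenders" reduction
# described above. Log format is assumed, not taken from the thread.
from collections import Counter


def ip_counts(lines):
    """Tally requests per client IP (first field of each log line)."""
    return Counter(line.split()[0] for line in lines if line.strip())


def strip_top(counts, n=12):
    """Return a copy of counts with the n most frequent IPs removed."""
    trimmed = counts.copy()
    for ip, _ in counts.most_common(n):
        del trimmed[ip]
    return trimmed


if __name__ == "__main__":
    demo = [
        '203.0.113.9 - - [14/Jun/2007] "GET / HTTP/1.0" 200 512',
        '203.0.113.9 - - [14/Jun/2007] "GET /a HTTP/1.0" 200 128',
        '198.51.100.7 - - [14/Jun/2007] "GET / HTTP/1.0" 200 512',
    ]
    counts = ip_counts(demo)
    print(len(counts), "unique IPs")
    print(strip_top(counts, 1).most_common())
```

The resulting trimmed Counter gives the less top-heavy distribution that a query generator can then sample from.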
OK, thanks for the info. Maybe I'll get a chance to run some numbers
of my own, if I can free up some time from WWDC....

>> [ ... ]
>>> It would be interesting to test BIND performance when acting as an
>>> authoritative server, which probably has very different performance
>>> characteristics; the difficulty there is getting access to a
>>> suitably interesting and representative zone file and query data.
>>
>> I suppose you could also set up a test nameserver which claims to be
>> authoritative for all of in-addr.arpa, and set up a bunch (65K?) /16
>> reverse zone files, and then test against real unmodified IPs, but it
>> would be easier to do something like this:
>>
>> Set up a nameserver which is authoritative for 1.10.in-addr.arpa (ie,
>> the reverse zone for 10.1/16), and use a zonefile with the $GENERATE
>> directive to populate your PTR records:
>>
>> [ ...zonefile snipped for brevity... ]
>>
>> ...and then feed it a query database consisting of PTR lookups. If
>> you wanted to, you could take your existing IP database, and glue the
>> last two octets of the real IPs onto 10.1 to produce a reasonable
>> assortment of IPs to perform a reverse lookup upon.
>
> I could construct something like this but I'd prefer a more
> "realistic" workload (i.e. an uneven distribution of queries against
> different subsets of the data). I don't have a good idea what
> "realistic" means here, which makes it hard to construct one from
> scratch. Fortunately I have an offer from someone for access to a
> real large zone file and a large sample of queries.

Ah, very good, then. While I expect there to be quite a difference
between recursive queries vs. authoritative/locally answerable queries
(after all, that seems to be why both dnsperf and resperf were created
as distinct programs), I'm not convinced that there is much difference
between doing reverse lookups for one set of IPs versus another if
those IPs are all in zones the server is authoritative for.
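For what it's worth, the glue-the-last-two-octets step above is mechanical enough to sketch. This is a hypothetical illustration (the function names are mine): it maps real IPs into the 10.1/16 test range and emits one "name type" pair per line, which is the input format dnsperf reads:

```python
# Sketch: turn a list of real client IPs into a dnsperf query file of
# PTR lookups against the 1.10.in-addr.arpa test zone described above.
# The list of input IPs here is made up; any source of dotted quads works.

def to_test_ptr(ip):
    """Glue the last two octets of a real IP onto 10.1 and return the
    reverse-lookup name (reverse zones list octets least-significant
    first, so a.b.c.d maps to d.c.1.10.in-addr.arpa)."""
    o = ip.split(".")
    return f"{o[3]}.{o[2]}.1.10.in-addr.arpa"


def write_queryfile(ips, path):
    """Emit a dnsperf-style query file: one 'name type' pair per line."""
    with open(path, "w") as f:
        for ip in ips:
            f.write(f"{to_test_ptr(ip)} PTR\n")


if __name__ == "__main__":
    sample = ["192.0.2.57", "198.51.100.3"]
    for ip in sample:
        print(to_test_ptr(ip), "PTR")
```

Feeding the real IP list through this preserves whatever uneven per-subnet distribution the log had, which goes some way toward the "realistic" workload Kris is after, within a single /16.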
-- 
-Chuck

Received on Thu Jun 14 2007 - 22:26:06 UTC