I Buy It

http://thisisnthappiness.com/post/3926904033/habitual-drunkenness-is-usually-psychologists

Habitual drunkenness is usually, psychologists inform us, the result of the inability to accommodate oneself wholly to reality. It is often a vice in that unfortunate class of people who have imperfectly coordinated artistic faculties. They yearn vaguely for something other than the world they know but they lack the capacity to create a world nearer to their hearts’ desire. Still more, do they lack the capacity to attain a comprehensive vision of the beauty immanent in this world. Neither the art of escape nor the art of revelation is possible to them. Nevertheless they have perceptions they cannot use and impulses that never come to fruition. Drink, or some other drug, by relieving their sense of impotence and by blurring the unfriendly outlines of the real world brings them solace and becomes a necessity.

 

via Google Books.

What’s Magical About 1272 Bytes?

So here’s a bit of Windows Vista Ultimate 64-bit arcana for you… I was doing some research on the performance and efficiency of relative cluster sizes, and because of this I wanted to know how many files of certain sizes were on my disk. So I started running some searches, with various cluster sizes that I was considering, hoping to get some data points against which to run some statistical analysis. Here’s what I ended up with, running Vista’s file search in “non-indexed” mode, and choosing to include hidden and system files:

File Size    File Count
<64KB        72,781
<16KB        53,480
<8KB         42,696
<4KB         31,542
<2KB         15,822
<1KB         19,528
<0.5KB       10,058

Did you notice something odd? That’s right, the number of files <1KB in size is greater than the number of files <2KB in size! This is mathematically impossible, of course.
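For an independent cross-check that doesn't rely on Vista's search, here is a minimal Python sketch that walks a directory tree and counts files strictly below each size limit. The function names (`collect_sizes`, `count_below`, `report`) are mine, and the sketch assumes that simply skipping unreadable files is acceptable; it is not a reconstruction of whatever Vista does internally.

```python
import os
from bisect import bisect_left


def collect_sizes(root):
    """Return a sorted list of file sizes (in bytes) found under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
            except OSError:
                # Assumption: files we cannot stat are simply skipped.
                continue
    sizes.sort()
    return sizes


def count_below(sorted_sizes, limit):
    """Count files whose size is strictly less than limit bytes."""
    return bisect_left(sorted_sizes, limit)


def report(root, limits):
    """Return (limit, count) pairs for each limit, smallest limit first."""
    sizes = collect_sizes(root)
    return [(limit, count_below(sizes, limit)) for limit in sorted(limits)]
```

Because every file smaller than 1KB is also smaller than 2KB, the counts returned by `report` can never decrease as the limit grows. By the same logic, `count_below(sizes, 1273) - count_below(sizes, 1272)` is exactly the number of files that are 1272 bytes long.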

Using a manual binary search algorithm, I finally arrived at the magic point: something weird happens between the 1272 and 1273 byte count, as the following two screen shots illustrate (click for larger versions, look at the upper right and lower left of each).

search for files <1273 bytes in size

search for files <1272 bytes in size

Logically, the second search should yield slightly fewer results, since the two counts can differ only by the handful of files sitting right at the boundary (for reference, there are exactly 15 1273-byte files on the drive–a delta of that order is what you would expect between the two searches). In fact, the second search yields more than twice as many!
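The manual bisection I used to find the magic point can be sketched in Python roughly as follows. I did this by hand in the search box, so `find_drop` is only an illustration of the idea, and `simulated_search` is a made-up stand-in for the misbehaving search, not real data from my disk.

```python
def find_drop(count, lo, hi):
    """Narrow (lo, hi) to adjacent limits where the count wrongly falls.

    count(limit) is expected to report how many files are smaller than
    limit bytes. The caller must start with lo < hi and count(lo) > count(hi),
    i.e. an already observed anomaly such as the 1KB vs 2KB results.
    """
    c_lo, c_hi = count(lo), count(hi)
    if not (lo < hi and c_lo > c_hi):
        raise ValueError("need lo < hi with count(lo) > count(hi)")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        c_mid = count(mid)
        if c_lo > c_mid:
            # The drop lies somewhere between lo and mid.
            hi, c_hi = mid, c_mid
        else:
            # count(mid) >= count(lo) > count(hi): the drop is between mid and hi.
            lo, c_lo = mid, c_mid
    return lo, hi


def simulated_search(limit):
    """Hypothetical stand-in: inflates the count for limits up to 1272 bytes."""
    return limit + 10_000 if limit <= 1272 else limit


if __name__ == "__main__":
    print(find_drop(simulated_search, 1024, 2048))
```

Each step halves the interval while keeping the "count at the low end is larger than at the high end" condition true, so it finishes on two neighboring byte counts that still show the anomaly.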

I was hoping I could narrow down what was going on by searching for specific file types instead of the *.* pattern, but as soon as I did that, everything seemed to work. Interestingly, when I then went back to the *.* pattern, the 1272B search produced a correct (lower) number! However, if I then ran a 1KB search I got the higher number again, and when I repeated the 1272B search I again got the higher number.

Pretty strange, huh?

In case you’re wondering:

Intel E8400

8GB DDR800

Windows Vista Ultimate, 64-bit, SP1 and all “important” updates applied

Seagate 1TB SATA at default cluster size

Be the Google

Would you like 500MB of web hosting, plus Python, plus Django, plus a lot of Google database and application goodies? Would you like it for free? Then, my friend, what you need is Google App Engine. No mention yet of what the pricing is like after you hit your 5 million hits per month. But trust me, if you’re at 5 million hits per month, you don’t care.

Sorry Amazon S3, but you can basically suck it.

Update: um, yeah, it’s wait-listed. Though if you have Google Apps, you’re probably in.

Return a Record for Each Date Between Two Dates in SQL Server >= 2005

Blogging this so I don’t forget it…

It used to require some fairly ugly, resource-intensive hacks (cursors, temp tables, etc.) to emit an inclusive list of values between two data points when the source data might not include an entry for every point (for example, a calendar, where not every day contains an event). In SQL Server 2005 and above, this is trivially easy with a recursive Common Table Expression (CTE). To emit one record for every date between 1/1/2008 and 1/31/2008, you do this:


-- Anchor member: the first date in the range
WITH datecte(anydate) AS (
    SELECT CAST('1/1/2008' AS datetime) AS anydate
    UNION ALL
    -- Recursive member: add one day until 1/31/2008 is reached
    SELECT anydate + 1 AS anydate
    FROM datecte AS datecte_1
    WHERE (anydate < CAST('2/1/2008' AS datetime) - 1)
)
SELECT anydate
FROM datecte AS datecte_2

If you need more than 100 days (the default recursion limit is 100), add this to the end:

OPTION (MAXRECURSION 1000)

The fact that they stop recursion short at 100 by default would seem to indicate that this is an expensive procedure, but even if you’re just using this to produce a dummy table with all the dates for several years, it’s a nice shortcut.

I just tried the following query, which emits a record for every day between 1/1/2000 and 12/31/2020:


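-- Anchor member: the first date in the range
WITH datecte(anydate) AS (
    SELECT CAST('1/1/2000' AS datetime) AS anydate
    UNION ALL
    -- Recursive member: add one day until 12/31/2020 is reached
    SELECT anydate + 1 AS anydate
    FROM datecte AS datecte_1
    WHERE (anydate < CAST('1/1/2021' AS datetime) - 1)
)
SELECT anydate
FROM datecte AS datecte_2
-- Thousands of rows need more than the default 100 recursion levels
OPTION (MAXRECURSION 10000)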

On my P4-641+ the script emits 7671 records in 0 (that’s zero) seconds and “spikes” the processor to all of 3%. Granted this is not a complex query, but at least we know the recursion (if it really is recursion internally, which I doubt) isn’t expensive by itself.
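If you want to sanity-check the 7671 figure without a SQL Server instance handy, the same recursive-CTE idea can be tried from Python using the bundled SQLite engine. This is a sketch under assumptions: SQLite's dialect differs from T-SQL (it uses `WITH RECURSIVE`, `date()` modifiers and ISO `YYYY-MM-DD` strings), and the helper name `dates_between` is mine.

```python
import sqlite3


def dates_between(start, end):
    """Return every date from start to end inclusive, as ISO strings.

    start and end are 'YYYY-MM-DD' strings with start <= end. Note that
    this runs on SQLite, not SQL Server, so the syntax is not identical
    to the T-SQL above.
    """
    conn = sqlite3.connect(":memory:")
    try:
        rows = conn.execute(
            """
            WITH RECURSIVE datecte(anydate) AS (
                SELECT date(?)                 -- anchor: first date
                UNION ALL
                SELECT date(anydate, '+1 day') -- recursive step: next day
                FROM datecte
                WHERE anydate < date(?)        -- stop once the end date is emitted
            )
            SELECT anydate FROM datecte
            """,
            (start, end),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(len(dates_between("2008-01-01", "2008-01-31")))
    print(len(dates_between("2000-01-01", "2020-12-31")))
```

The second call covers 21 calendar years, six of them leap years (2000, 2004, 2008, 2012, 2016, 2020), so 21 × 365 + 6 = 7671 rows, which agrees with the count SQL Server reported.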

Vista… 64-bit… Where’s My Headroom?

Other than a couple of virtual machine beta builds, I had managed to stay out of Vista entirely until the last month or so. Since then I’ve tried to install it on three machines–a client’s Dell Optiplex, which never was able to boot after install, and two home-built systems. This weekend I built a brand new system out of all Vista-logo components. It booted; Vista reported the hardware as compatible; it even got a 5.8 experience index score. But IE and Windows Explorer both crashed continuously. Also, on what should have been basically the fastest hardware available, the Vista with SP1 install took over 90 minutes.

I’m walking away. My current approach for my development machine is going to be Windows Server 2008 Standard 64-bit. Again I have certified components, but 64-bit in itself represents a struggle in terms of driver and application compatibility. We’ve had 64-bit CPUs in our machines for going on 5 years, and 64-bit Windows options for almost as long, and yet you still cannot run common programs and drivers in the environment–Flash, TWAIN, most VPN software, the list of things you can’t do (or do well) is astoundingly comprehensive.

We’re heading toward a very real wall here: 32-bit versions of Vista (as with other flavors of Windows) are limited to 4GB of RAM. Yet that is simply not enough for Vista plus any serious suite of applications. At the same time, 64-bit Windows still isn’t a truly viable desktop for most users. Out of necessity, I’ll compromise on many fronts–multimedia capability, peripheral compatibility, native software availability–but some of this stuff isn’t easily virtualizable, so I’m looking at the possibility of having to keep 32-bit systems around (for example for scanning, connecting to client VPNs, etc.). I’m really starting to feel hemmed in.

I guess I could take a step back here and look at it from the Mac perspective. It works because it’s broken; it’s broken because it works. That is, by forcing a switch to 64-bit Server I’m pruning the 16- and 32-bit dead wood that’s keeping me in the 4-gigabyte sandbox. Apple users long ago embraced obsolescence as a feature. Vista and 64-bit computing may (finally) force the Windows side of the PC world to wake up to this. Or maybe not. There’s a downside to the Mac example: “performance” in an absolute sense is to some extent irrelevant, and scaling up doesn’t necessarily have to be as smooth or cheap as we’re used to, as long as the chrome is shiny and doesn’t peel off too obviously.

This issue bears similarities to the current Internet Explorer 8 web standards argument–do we break the web (force IE8 standards mode, cripple billions of web pages) to move toward the Platonic ideal of standards? Do we break the PC ecosystem (Vista, 64-bit) for the hope of increased functionality and capacity in the next generation of platforms (available today, but considered unusable by the consumer)? You know it’s a tough question when even Joel Spolsky can’t tell you the answer. But generally, culturally, we’re not long-term investors, certainly not when the benefits are nebulous and far off and the pain points are obvious and immediate. As Joel argues, for web standards under IE this is a late-bound issue–they can throw the switch any time to go back to a more relaxed mode. But the issue of Vista running out of memory and 64-bit versions not being ready for prime time is a lot harder to resolve.

G-Archiver and the Risks of Random Downloading

This is a pretty amazing story about a free utility with a malicious back-end twist.

This is so bad that I assumed it was a hoax. However, I downloaded the program, installed it (on a virtual machine), decompiled it, and verified that it is, in fact, “phoning home” with your Gmail user name and password. Yikes.

The manufacturer’s page has been updated to indicate that this “was in no way intentional,” but does it really matter?

