Kernel Traffic

Kernel Traffic #67 For 15 May 2000

By Zack Brown


Many thanks go to Kenneth Topp, who did some research after last week's issue and found that the article on 'movb' in Issue #66, Section #1 (18 Apr 2000: 'movb' Instruction On Intel) had a precursor in Issue #47, Section #1 (20 Nov 1999: spin_unlock() Optimization On Intel). Good eye, Kenneth! Thanks a lot!

Thanks also go to "Ender" for the suggestion that the printer-friendly pages should not have links to the printer-friendly pages ;-). This should now be fixed. Thanks, Ender!

Mailing List Stats For This Week

We looked at 1213 posts in 5060K.

There were 435 different contributors. 183 posted more than once. 155 posted last week too.

The top posters of the week were:

1. Status Of Tekram DC395U Driver; Development Process Explored

27 Apr 2000 - 2 May 2000 (3 posts) Archive Link: "SCSI Driver Tekram DC395U"

Topics: Disks: SCSI

People: Alan Cox, Kurt Garloff

Mattias Kunkel gave a pointer to the Tekram DC395U driver home page, and asked that it please be included in the main kernel sources. Alan Cox replied, "Its really for the author to ask. Im sure Kurt will decide when his driver is ready for that." And Kurt Garloff (the maintainer) explained, "Yes. I am currently cleaning up the thing and chasing some rare, odd problems. As a bug in a SCSI driver is not exactly fun, I want to be careful on this. I hope I can submit it soon to both 2.2.15 and 2.3.99." End Of Thread.

2. Technical Restrictions On Posts To The linux-kernel Mailing List

28 Apr 2000 - 6 May 2000 (12 posts) Archive Link: "Does the list reject mails with attachments?"

Topics: Disks: SCSI, Spam

People: David Ford, Marc Lehmann, Steve Dodd, Alan Cox, Jamie Lokier

Discussions having to do specifically with the linux-kernel mailing list itself were first covered in Issue #1, Section #1 (11 Jan 1999: Spam On linux-kernel) when some spam hit the list. Then in Issue #4, Section #11 (1 Feb 1999: Mailing List Problems) it came up that subscriptions would be dropped if messages bounced. Some more spam arrived for Issue #6, Section #1 (11 Feb 1999: Spam On linux-kernel) and Issue #7, Section #1 (18 Feb 1999: More Spam From . Then for Issue #11, Section #5 (17 Mar 1999: Hunt For A linux-kernel Mailbomber) someone bombed the list with many 80K messages, and folks tried to hunt the attacker down. A routing problem with the list servers was covered in Issue #12, Section #11 (26 Mar 1999: Routing Problem For linux-kernel). Then Issue #14, Section #1 (5 Apr 1999: linux-kernel Troubles) covered a problem where someone subscribed the linux-kernel list itself to a bunch of Japanese mailing lists. In the same issue, in Issue #14, Section #10 (7 Apr 1999: Aggressive Spam Filtering), there was another discussion of how to filter spam. Issue #16, Section #8 (19 Apr 1999: linux-kernel Slowdown) covered a problem with the mailing list server running out of disk space, causing messages to be delayed.

Over a month passed, and then another discussion of slow propagation of list mail came up in Issue #21, Section #6 (24 May 1999: linux-kernel Mail Delays). Then again over a month later, covered in Issue #27, Section #3 (2 Jul 1999: linux-kernel Mailing List Errors), it appeared that mail written by one person was being accidentally attributed to another in the email headers themselves. As covered in Issue #29, Section #7 (19 Jul 1999: Linus' Email Address Forged By Spammers), someone forged Linus' email address in order to propagate some spam. Then, as covered in Issue #32, Section #18 (17 Aug 1999: linux-kernel Under Attack), someone subscribed linux-kernel to many completely unrelated mailing lists.

Again more than a month later, in Issue #37, Section #16 (27 Sep 1999: Problems With The linux-kernel Mailing List), a number of problems with the linux-kernel list were reported, though no discussion took place. More delays due to backlog were covered in Issue #38, Section #7 (30 Sep 1999: Mailing-list Problems).

Over three months later, in an embarrassing moment, linux-kernel was actually bitten by Y2K in Issue #51, Section #2 (3 Jan 2000: Y2K Strikes linux-kernel Mailing List). Another delay due to lack of disk space was covered in Issue #52, Section #5 (11 Jan 2000: linux-kernel Mailing List Problems). Finally, in Issue #55, Section #8 (10 Feb 2000: Standards Of Behavior In linux-kernel) there was a discussion of standards for behavior in list discussions.

This week, three months after the last report, Christian Zietz noticed that if he sent any attached files with his mail, it wouldn't be redistributed. David Ford explained, "the LM rejects emails that are over 40K in size. try posting a URL to the patch instead of the patch itself, or break it down into multiple patch sets." Marc Lehmann added:

The list software additionally breaks any mime mails send to the list, so people on the digest receice just a piece of junk most of the time.

So do not use attachments at all, please ;) It does not work regardless of the mailer people use :(

(Although this is a problem with the list software)

Alan Cox didn't see this problem, and asked if it only affected the digests; Marc and Jamie Lokier replied that it only affected the digests, which stripped out the MIME headers. Jamie also added that digests broke the "References" header. Steve Dodd went on, "While we're at it, vger seems to drop Resent-* from messages. If it was adding its own, I'd understand (though I don't think it's a good idea). As it is, if I bounce a message from say linux-kernel that I think should have gone to linux-fsdevel too, nobody has any idea how it got there.." but there was no reply.

3. Modularizing Elevator Code

28 Apr 2000 - 2 May 2000 (11 posts) Archive Link: "elevator code in kernel"

People: Jens Axboe, Stephen C. Tweedie, Jeff V. Merkey, Kip Macy, Arjan van de Ven

Kip Macy asked why the elevator sorting code needed to be hard-coded into the kernel rather than modularized to be optional. He pointed out that high-performance I/O devices could organize writes across the disk platter to minimize disk head movement more intelligently themselves. Jens Axboe replied, "You raise some good points, and I am making the elevator code more modular as we speak. That will give different drivers the opportunity to select the elevator (or not select) the one they want. But elevator reordering is really not that expensive and can work on much larger sets of data then the typical drive."

Arjan van de Ven also replied to Kip, pointing out that the elevator code did more than just order write requests; according to his experiments, it also merged about 70% of them. Skipping the sorting, and trying to merge only the current request with the previous one, would only merge about 60% of requests. Stephen C. Tweedie concluded from this, "So for every 30 requests we are left with with the elevator, we have 40 request without it. That's an increase of 33% in the number of requests we have to deal with. That sounds like a worthwhile optimisation to me." Kip agreed, saying he hadn't thought of the elevator code as merging requests, only as sorting them.
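To make the numbers in this exchange concrete, here is a minimal Python sketch (a toy, not kernel code). The first part reproduces Stephen's arithmetic, taking the 70% and 60% merge rates from Arjan's experiments at face value and assuming a batch of 100 requests. The second part shows why sorting helps merging at all: requests are modelled as hypothetical (start sector, length) pairs, and a request is merged only with the one queued immediately before it. The function names and the sample queue are made up for illustration.

```python
def requests_left(total, merge_rate_percent):
    """Number of requests remaining after the given percentage is merged away."""
    return total - total * merge_rate_percent // 100


def merge_adjacent(requests):
    """Merge each request into the previous one when the two are contiguous.

    Each request is a (start_sector, length) tuple. Only the immediately
    preceding request is considered, as in the "no sorting" case above.
    """
    merged = []
    for start, length in requests:
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_length = merged[-1]
            merged[-1] = (prev_start, prev_length + length)
        else:
            merged.append((start, length))
    return merged


# Stephen's arithmetic, for an assumed batch of 100 requests.
with_elevator = requests_left(100, 70)      # sorting plus merging
without_elevator = requests_left(100, 60)   # merging with the previous request only
increase = (without_elevator - with_elevator) / with_elevator
print(with_elevator, without_elevator, f"{increase:.0%}")

# A made-up queue in arrival order: contiguous requests are not neighbours.
queue = [(0, 8), (32, 8), (8, 8), (40, 8), (16, 8)]
print(len(merge_adjacent(queue)))           # arrival order
print(len(merge_adjacent(sorted(queue))))   # sorted by start sector first
```

In the made-up queue, nothing merges in arrival order, while sorting first collapses the five requests into two. Real workloads are far less tidy, which is why Arjan's measured gap (70% against 60%) is much smaller than the toy suggests.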

Elsewhere, Jeff V. Merkey added from a different angle, "For those of us with our own LRU that has to co-exist with the Buffer Cache, having it in the kernel beneath the Linux Buffer Cache handles the problem of preventing mutiple caches from "thrashing" the disk (since all the IO's or ordered beneath the buffer cache, including those injected from other sources in the ystem) . I think what's there is just fine and should be left the hell alone -- it works great."

4. NTFS Troubles; Windows Partition Formats

29 Apr 2000 - 2 May 2000 (11 posts) Archive Link: "crash while reading win2k ntfs partition"

Topics: FS: NTFS, Microsoft

People: Jeff V. Merkey, Steve Dodd, David Weinehall

In the course of discussion, Jeff V. Merkey explained, "The Partition formats are different on W2K vs. NT4.0. If you attempt to write to a W2K partition with the current Linux NTFS driver - YOU WILL CORRUPT THE DRIVE. You might be able to mount it and read from it with the current code (some W2K configurations won't work though), but you should not attempt to write to it. The formats of a W2K partitions are using the Veritas Volume Manager Stuff they developed for W2K. The obvious fix is for someone to study NT4.0 vs. W2K and add the necessary support to **NOT** stomp on the database section Veritas stamps on the W2K partition -- if you overwrite it (which you do -- you think it's free space) -- W2K will be toast the first time it tries to mount the volume after Linux has corrupted it ......."

Steve Dodd replied, "the write code probably mangles NT 4 volumes quite nicely too - the directory handling, anyway." But David Weinehall countered, "NTFS requires CONFIG_EXPERIMENTAL and the write-support is marked DANGEROUS. This _should_ make most people thing twice before mounting any partitions with write-support. But of course, that's in an ideal world..."

In the same post, Steve added with eyes as wide as saucers, "They store data on the volume without marking the used blocks in $Bitmap? That's Evil(tm). Is there any technical justification for it?" Jeff replied to this and to Steve's previous comment:

Microsoft may string me up for this but having their customer's data get mangled beyond recognition is not helping them, is not helping Linux, and is definitely not helping their customers. There's also the fact that despite the fact that the driver is busted and we have told folks that it is, people just seem to keep using it. I am heads down in the Linux page cache chasing bugs with NWFS (yes BUGS in the page cache code of linux - I've found one in generic_commit_write where it will not post data properly unless the page->buffers buffer head is alloc'd from the Linux Buffer Cache) - I've got several items to do yet on NWFS on 2.4, but this NTFS thing just keeps coming back and biting us.

I will not have time to get on this until after I post the full Page Cache version of NWFS (which is very close). I also am still looking at a problem with mkisofs comlaining about "circular diretories when you image an NWFS volume. After I get caught up, I will be happy to help you get the NTFS driver working (to the point where is stops corrupting data). I recommend MINIMAL functionality (i.e. let's drop the indexing records and trying to get fancy with the MFT -- just write out data runs so W2K can mount the FS without corrupting data).

The thread petered out around here.

5. More Fixes After Structural Changes

29 Apr 2000 - 2 May 2000 (4 posts) Archive Link: "Patch to change from char* to char[]."

Topics: USB

People: Nick Holloway, Randy Dunlap, Linus Torvalds

Continuing from Issue #63, Section #7 (4 Apr 2000: Structural Changes Before Stable Series), where Linus had redefined a particular variable from a pointer to an array, requiring some recoding in various drivers, Nick Holloway replied this week:

A while ago, I said I had started a patch that updated the network drivers to support this. As it has been a while, I thought it would be better to publish what I have, rather than sit on it.

The patched kernel compiles, but I haven't actually test booted it!

In addition, the cleanup is partial. I believe that they should all be moving from static net_device structures to using init_etherdev, and other such changes. I didn't want to get into the realms of such large scale cleanup.

There are two known ommissions from this patch.

Firstly, none of the PCMCIA drivers are updated. Whereas the standard net drivers used a "char[] name" member in their private structure for allocating storage (normally insufficient), the PCMCIA drivers use the name member for other purposes. It may be that a simple "strcpy" will suffice, but I'll leave that for now.

Secondly, I haven't fixed the recently introduced lmc wan driver. This is because it is difficult to do without breaking their conditional compilation based on kernel version. They also need to fix "Allan Cox" in their comments :-)

The patch updated 60 files, and can be found at: (10804 bytes)

Please give it a try, and see if it works. I may get a chance to do some testing myself, but my wife has thoughts of decorating :-(

Randy Dunlap replied that the USB drivers also had not been updated, and added, "Please give us a small bit of warning if/when drivers need to be modified...or is this the notification?" Linus Torvalds replied:

Consider this the notification. I'm sorry for the inconvenience, but the alternative patch (that also fixes the bugs with name handling) from Alan was just too ugly for me and left this clean-up for a later date.

The good news being that all the problem spots should pretty much be pinpointed by simple compiler error messages..

6. Some Discussion Of How Often To Task-Switch

30 Apr 2000 - 2 May 2000 (6 posts) Archive Link: "What is the optimum time for task switching?"

People: Michael Poole, Johan Kullstam

Mark Zealey asked what the optimal time for task switching was, and Michael Poole replied, "From a throughput point of view, due to the overhead of a task switch (not only in actual CPU cycles, but in terms of cache dirtying and reloading), you only want to switch tasks when the current task becomes blocked on I/O (or blocked for some other reason). For interactivity and response time, you want to switch tasks periodically. The exact frequency depends on the job mix you're running and how stringent the response time requirements are, and these numbers are almost impossible to determine objectively. So there's a bit of calculation and a whole lot of guessing that goes into determining how often you preempt a task."

Johan Kullstam put it similarly, "from a cycle efficiency standpoint, optimum would be to stay infinitely long with each task and only switch when the task was complete or waiting for input. however, user interaction demands switching once in a while to preserve the illusion of multi-tasking," and went on later, "the cost of switching context is non-zero and it is overhead, i.e., not productive. each switch incurs a fixed cost. thus you want to switch as little possible. note, that when the context switch time is a small fraction of the total, say under 1%, then it doesn't pay to reduce it (since you can only reap what it burns and that's under 1%)."

That was that.
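Johan's point about the fixed cost of a switch can be put in numbers. The following Python sketch is only a back-of-the-envelope model: it assumes a task is preempted at the end of every timeslice, and it folds everything Michael mentions (CPU cycles plus cache reloading) into a single per-switch cost. The 10 microsecond figure is an assumption chosen for illustration, not a measurement from the thread.

```python
def switch_overhead(timeslice_us, switch_cost_us):
    """Fraction of CPU time spent switching when every timeslice ends in a switch."""
    return switch_cost_us / (timeslice_us + switch_cost_us)


# Assumed per-switch cost, for illustration only.
SWITCH_COST_US = 10

for timeslice_us in (100, 1_000, 10_000, 100_000):
    overhead = switch_overhead(timeslice_us, SWITCH_COST_US)
    print(f"{timeslice_us:>7} us timeslice: {overhead:.2%} lost to switching")
```

Under these assumptions the overhead falls below 1% once the timeslice reaches about a hundred times the switch cost. Past that point, lengthening the timeslice buys very little throughput, and what remains is the interactivity trade-off both posters describe.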

7. 'devfs' Change Breaks Current Systems But Mollifies Detractors

30 Apr 2000 - 4 May 2000 (16 posts) Archive Link: "[WARNING] devfs mount default changed"

Topics: FS: devfs

People: Richard Gooch, Jeff Garzik, David Ford, Rask Ingemann Lambertsen

Richard Gooch announced, "The default mounting behaviour for devfs has changed recently :-( If your system no longer boots correctly (a typical message is "Unable to open initial console"), you may have been caught by this change. Look for the boot message: Mounted devfs on /dev. If this is not present, add "devfs=mount" to your boot options."

Rask Ingemann Lambertsen also pointed out that just giving the "devfs=mount" command at bootup wouldn't work; it had to go directly into the 'lilo.conf' file.

Jeff Garzik (who authored the change) also replied to Richard's initial announcement, pointing out, "Thanks to the default mounting behavior change, people can compile devfs into their kernels without being forced to use it." But David Ford complained, "Unfortunately that means everyone who has been using it for the last two years now has to change things. It's really annoying and breaks "least surprise."" Richard also replied to Jeff:

Of course, they could before as well: just add "devfs=nomount". Your patch just made it more convenient to some people, at the cost of breaking existing behaviour (and making it inconvenient to others).

Anyway, I don't really want to argue. I'm not happy about the change, but the King Penguin has decided, and that's that. Unlike some people, I won't keep screaming at the top of my lungs for him to change his mind. I'll just grumble whenever I get a similar bug report ;-)

He mentioned the possibility of adding a config option to control the default, and David pushed for this as well. Elsewhere, Richard mentioned that this was done in 'devfs' version 166.
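The effect of flipping the default can be sketched in a few lines of Python. This is a simplified model, not the kernel's actual option parser: the function name is hypothetical, only the two option strings quoted in the thread ("devfs=mount" and "devfs=nomount") are recognised, and the sample command lines are made up.

```python
def devfs_should_mount(cmdline, default_mount=False):
    """Decide whether devfs is mounted on /dev for a given boot command line.

    default_mount=False models the new default described above;
    default_mount=True models the behaviour before the change.
    """
    mount = default_mount
    for option in cmdline.split():
        if option == "devfs=mount":
            mount = True
        elif option == "devfs=nomount":
            mount = False
    return mount


# A made-up command line with no devfs option at all.
plain = "root=/dev/hda1 ro"

print(devfs_should_mount(plain, default_mount=True))    # old default
print(devfs_should_mount(plain, default_mount=False))   # new default
print(devfs_should_mount(plain + " devfs=mount"))       # explicit override
```

A system that never passed any 'devfs' option gets a different answer under the two defaults, which is the breakage David and Richard complain about. A config option controlling the default, as Richard suggests, would amount to choosing the value of default_mount at build time.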

8. Putting Good Code On Hold While Checking Correctness

3 May 2000 (4 posts) Archive Link: "pre7-3/mm/vmscan.c "#error Do not let this one slip through..""

People: Linus Torvalds, Adam J. Richter

Adam J. Richter noticed an '#error' preprocessor directive in 'linux-2.3.99-pre7-3/mm/vmscan.c' that was not bracketed by a preprocessor conditional, so the code it preceded would always fail to compile. Folks tried compiling with that line taken out, and had no problems either during compilation or after booting the new kernel. But at this point, Linus Torvalds explained:

it will stay in the pre-kernels until the discussion on the mm mailing list has either shown that yes, the code really is safe, or resulted in a re-write.

I make pre-patches regardless, because I want to be able to let other developers see what I've integrated etc etc..

He went on:

The code is quite old and stable and works as well today as it did yesterday or a month ago.

However, there's an on-going discussion on whether the code is actually strictly correct or not, and I didn't want to forget and release a real pre7 without that being resolved..

9. Ancient Cache Bug Found And Fixed In 2.0, 2.2, And 2.3

3 May 2000 - 6 May 2000 (5 posts) Archive Link: "[PATCH] Bug in ext2 in 2.2.15pre20?"

Topics: FS: ext2

People: Stephen C. Tweedie, Andrea Arcangeli, Theodore Y. Ts'o, Malcolm Beattie

Stephen C. Tweedie posted a one-line patch to 'fs/ext2/balloc.c', and explained:

Looks like there's a long-standing thinko in the block bitmap caching in ext2 in 2.2. The inode bitmap cache looks fine.

Ted, want to ack this? It looks wrong in 2.3 too, btw.

Andrea Arcangeli agreed that this was a bug, and explained, "We never noticed because it can trigger only if there's been an I/O error while reading the group stuff in memory (and the probability to have an I/O error exactly in such place is very low). Also it seems to me that even if it triggers it won't cause subtle silent corruption but it will more safely generate immediatly an Oops inside ext2fs. It was probably hurting a bit performances in some case (because we was probably entering load__block_bitmap more than necessary ;)." Theodore Y. Ts'o also replied to Stephen's report, "Yup, it's a bug, all right. We're lucky this hasn't caused us problems on large filesystems." Malcolm Beattie asked if this was a problem in 2.0 as well, and Stephen confirmed that it was.

10. Some Explanation Of '/dev/kmem'

5 May 2000 - 8 May 2000 (14 posts) Archive Link: "/dev/kmem"

People: Tigran Aivazian

Michal Kosek asked for information about '/dev/kmem', and Tigran Aivazian replied:

The idea of /dev/kmem is that file "offsets" in it correspond to kernel virtual addresses, so seeking to the addresses of "well-known" symbols and reading values off there gives you the values of kernel data structures. Of course, these values are not 100% self-consistent because the kernel data structures change while you are reading/writing them.

Linux version of /dev/kmem has one limitation - you cannot write to vmalloc'd range of addresses but you can read from them. Amit Kale (of VERITAS) solved this problem and sent a patch so if you need this ability - look for it in archives.

As for examples of usage of /dev/kmem - some old (and also non-Linux) versions of ps(1) used to use /dev/kmem - nowadays it is much better to access kernel data structures via well-defined interfaces exported by /proc.

Also, kernel programmers sometimes write little programs that automatically test self-consistency of various kernel structures by reading /dev/kmem.

The best example of usage of /dev/kmem is probably crash(1M) which is available on all System V Release 4 UNIX flavours. (there is also a clone of crash for Linux from MCL)
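The "offset equals address" idea Tigran describes can be sketched in Python. Reading the real '/dev/kmem' needs root and a kernel that provides the device, so this example runs against an in-memory stand-in instead. The helper name, the address and the stored value are all hypothetical, and the 32-bit little-endian layout is an assumption (it matches x86, but not every architecture).

```python
import io
import struct


def read_u32_at(mem, address):
    """Seek to `address` in a file-like object and read one 32-bit value.

    With /dev/kmem, `address` would be the kernel virtual address of a
    symbol; little-endian byte order is assumed here.
    """
    mem.seek(address)
    data = mem.read(4)
    if len(data) != 4:
        raise EOFError(f"short read at address {address:#x}")
    return struct.unpack("<I", data)[0]


# In-memory stand-in for the device: 64 zero bytes with one value planted.
fake_kmem = io.BytesIO(bytes(64))
SYMBOL_ADDRESS = 0x20          # hypothetical address of a "well-known" symbol
fake_kmem.seek(SYMBOL_ADDRESS)
fake_kmem.write(struct.pack("<I", 1234))

print(read_u32_at(fake_kmem, SYMBOL_ADDRESS))
```

Against the real device, the file object would come from opening '/dev/kmem' in binary mode and the address from the kernel's symbol table. As Tigran notes, a value read this way is only a snapshot, since the kernel may change the structure while it is being read.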

11. Status Of Intel 810 Chipset Graphics Card

5 May 2000 - 8 May 2000 (5 posts) Archive Link: "Does Linux support Intel 810 chipset graphics card ?"

Topics: Sound: i810

People: Alan Cox, Nils Faerber

The 810 graphics chipset was first covered in Issue #41, Section #8 (19 Oct 1999: New Hardware Mailing List; Some Discussion Of The 810 Graphics Chipset), where it got a very poor review from Alan Cox. This week it did a little better. Someone asked about the status of 810 support under Linux, and Alan replied differently, "The 810 onboard video is supported by XFree 3.3.6 with an additional kernel module thats part of XFree and manages the AGP." Nils Faerber added that it was pretty stable, and went on, "Newer i810 boards even have a BIOS that initializes the panellink interface (digital LCD connection) so that panellink can be used with any videomode (i.e. XFree then runs via panellink!). And even more there is 3D hardware acceleration coming up in utah-glx. So far I would say the i810 is better supported than many other video cards ;)"

12. VMWare Breaks Under Latest Development Kernels

5 May 2000 - 6 May 2000 (7 posts) Archive Link: "pre7-4 to pre7-6 breaks VMWare module build"

People: Rik van Riel, Michael Harnois, Alan Cox, Ari Pollak, Petr Vandrovec

Problems with 'VMWare' were first covered in Issue #34, Section #1 (28 Aug 1999: VMWare Discombobulates The System). Things looked even worse in Issue #39, Section #3 (2 Oct 1999: Vmware Developers Unresponsive To Bug Reports). This week, Ari Pollak reported that ever since 2.3.99pre7-4, the 'vmware' module would no longer compile. Rik van Riel replied indifferently, "I guess the vmware people will have to issue an upgrade, then..."

Michael Harnois went on, "The closest thing the vmware people have to good sense is Petr Vandrovec; it'll probably take them six months to get a labeled fix out, and in the meantime they'll tell you it's your own damn stupidity that leads you to run development kernels. However, you can get Petr's fix at today." But Alan Cox chided, "I think its a bit much to berate someone for not supporting the latest cutting edge snapshot."

(BTW, the link to the patch above was broken at KT press time. Petr sent in a correction. Thanks, Petr! -- Ed: [15 May 2000 10:29:00 -0800])

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.