Kernel Traffic
Latest | Archives | People | Topics

Kernel Traffic #60 For 27 Mar 2000

By Zack Brown



Thanks go to Ben Tilly, who sent this email regarding last week's Issue #59, Section #1 (9 Feb 2000: Capabilities):

Most of the people following your page probably do not realize that the "Capabilities" being added to Linux are a very poor substitute for true capabilities as implemented by systems like EROS.

The discussions at explain what a pure Capabilities model is. The past kernel discussion at explains clearly why what is being added to Linux is a different beast all together.

I think that you should link back to that conversation to help clue in a few people...

Thanks, Ben! The subject certainly has room for many perspectives, and yours is valued as well.

Mailing List Stats For This Week

We looked at 1611 posts in 6986K.

There were 538 different contributors. 266 posted more than once. 197 posted last week too.

The top posters of the week were:

1. Alan Nears 2.2.16

10 Mar 2000 - 20 Mar 2000 (16 posts) Archive Link: "Linux 2.2.15pre14"

Topics: FS: ext2, I2O, Ioctls, Kernel Release Announcement

People: Alan Cox, Rik van Riel, Frank van Maarseveen, Doug Ledford, Arjan van de Ven, Rogier Wolff, Deepak Saxena, Mitchell Blank Jr, Christoph Rohland, Paul Vojta, Dave Jones

Alan Cox announced 2.2.15pre14, and said, "Ok everyone promptly found lots of bugs. The good thing is almost all of these are small but long standing issues, so we are starting to really shake out the obscure bugs the bigger ones masked. The megaraid issue is a work in progress. The folks who matter are working on it. The 1.04 driver is far slower. If you want to run 1.07 it appears good advice is to flash the most recent firmware."

He posted his changelog:

Change (Author)
Revert megaraid driver to 1.04 due to apparent corruption problems some firmware shows. This is a temporary state of affairs I hope, once Dell/AMI have a handle on which firmware and how to either fix it or refuse to boot on those firmwares then we can go back. (me)
Acard scsi shared IRQ fix (hopefully) Folks with Acard stuff please test this one hard (Acard, ported into newer driver by me)
Fix assorted network driver ioctl checks (Mitchell Blank Jr)
Two small updates to the telephony API needed by other vendors (me)
Fix bit masking error on IO port in I2O (Deepak Saxena)
Work around spitfire errata 32 and 54 on Ultrasparc (Dave Miller)
Work around Sparcstation 5 Swift MMU (Dave Miller)
Fix SunQE problem with 32bit sparc (Dave Miller)
Fix breakage of ISA support in SX driver and add EISA support (Rogier Wolff)
Arlan fixes (but not 4500 for 2.2.15) (Elmer Joandi)
Fix EXPERIMENTAL checking in 2.2.15pre (Paul Vojta)
Update AIC7xxx driver to rev 5.1.28 (Doug Ledford)
Simple (ie not strictly correct) fix for the cisco 3600 syncppp problem [Proper fix for 2.2.16 I think] (Madarasz Gergely)
Zero the sin_zero part of sockaddr_in (Frank van Maarseveen)
Correct erase handling in 16 colour text (Jon Mitchell)
Fix typo in videodev.h (fjolliton)
Another small ultrasparc errata fix (Dave Miller)
Semaphore deadlock fix (Christoph Rohland)
Further SX fixes (Rogier Wolff)
SKTR driver fixes (Christoph Goos)
Further small 3ware fix (Adam Radford)
Up the default number of module loadable scsi disks to 16 (me)
Wan config typo fix (Dave Jones)
Sparc blackbird errata fixes (Dave Miller)
Wanpipe needs inet (Arjan van de Ven)
Revert cursor/bh lock patch (breaks Alpha) (me)
Fix ext2 dir race (Al Viro)

Rik van Riel pointed out:

OOM testing people keep complaining to me that their klogd dies in 2.2.15pre13 (where my code got backed out).

Do you have plans of integrating OOM handling code or is my time better spent hacking other things?

Alan replied, "I plan to drop it back in at some point. I didnt want to flip something core like that while adding tons of small driver fixes" and Rik replied, "OK, I'll try to integrate some of Andrea's code and my code when I come back from Spain. Good to hear that some solution is at least being considered for 2.2."

2. Linus On The Verge Of pre-2.4; Status Of reiserfs

10 Mar 2000 - 17 Mar 2000 (163 posts) Archive Link: "Linux-2.3.51, and the pre-2.4 series.."

Topics: FS: NFS, FS: ReiserFS, FS: XFS, FS: ext2, I2O, Kernel Release Announcement, Modems, Networking, POSIX, SMP

People: Linus Torvalds, Richard Gooch, Alan Cox, Ivan Passos, Pavel Machek, Artur Skawina, Jamie Lokier, Hans Reiser, Chris Mason, Chris Evans, Alexander Viro, Manfred Spraul, Stefan Traby, Andrea Arcangeli, Dominik Kubla

Linus Torvalds announced Linux 2.3.51, saying:

I just made a 2.3.51 release, and the next kernel will be the first of the pre-2.4.x kernels. That does NOT mean that I'll apply a lot of last-minute patches: it only means that I'll let 2.3.51 be out there over the weekend to hear about any embarrassing problems so that we can start the pre-2.4 series without the truly stupid stuff.

There's some NFSv3 and other stuff pending, but those who have pending stuff should all know who they are, and for the rest it's just time to say nice try, see you in 2.5.x.

The pre-2.4.x series will probably go on for a while, but these are the "bug fixes only" trees. These are also the "I hope a lot of people test them" trees, because without testing we'll never get to the eventual goal, which is a good and stable 2.4.x in the reasonably near future.

Dominik Kubla replied that, at least on 2.3.50, his Xircom Ethernet/Modem/ISDN cardbus system wouldn't work, failing to allocate resources for the Ethernet. He posted some kernel messages, one of which was "kernel: cs: socket 0 timed out during reset", and Linus replied that that particular error message was probably fixed in 2.3.51; he added that the other problems Dominik mentioned might be unrelated, but at least the socket 0 message appeared to be gone. He explained, "It's due to the TI cardbus controllers not correctly sensing the power of the inserted card, so the higher level layers will try to apply 5V power and it goes to hell from there. I added code in 2.3.51 to notice when the power sense is wrong and force a new VS sense event." Dominik replied that 2.3.51 did indeed get rid of the message, though the system still wouldn't work with that hardware combination. However, Stefan Traby also replied to Linus, saying that on his HP Omnibook 4100, 2.3.51 didn't get rid of the message. There was no reply to him, but Richard Gooch replied with consternation to Linus' explanation, asking, "Argh! Does that mean 5V is being applied to 3V cards and said 3V cards are being toasted?" Linus replied:

No. The controller has over-voltage protection, so it just means that it won't work.

The problem is actually due to a unlucky interface between the low-level slot driver and the card services layer: there's a flag that says "I'm a 3V card", but there is NOT a flag that says "I'm a 5V card". So what happened here is that

The fix in 2.3.51 is for the low-level driver to notice that neither the 5V nor the 3V flags were set, and if that happens it will try to force a re-sense of the card.

I've seen this with the TI1225, and it does not happen with my Ricoh controller in my Sony VAIO that I did most of the development on. So I think it's actually a buglet in the TI core.
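The fix Linus describes can be sketched as a small decision routine. This is a hypothetical Python model, not the actual 2.3.51 driver code: `read_sense` and `force_resense` stand in for whatever the low-level socket driver does with the controller's voltage-sense (VS) bits, and the retry limit is an assumption.

```python
from enum import Enum


class Vcc(Enum):
    V3 = "3.3V"
    V5 = "5V"


def old_probe(is_3v, is_5v):
    # Hypothetical model of the old interface: card services only sees a
    # "3V card" flag, so anything without it is treated as a 5V card.
    return Vcc.V3 if is_3v else Vcc.V5


def new_probe(read_sense, force_resense, max_retries=3):
    """Pick a supply voltage, re-sensing when neither VS bit is set.

    read_sense() -> (is_3v, is_5v); force_resense() asks the controller
    for a fresh voltage-sense event. Both are hypothetical hooks.
    """
    for _ in range(max_retries + 1):
        is_3v, is_5v = read_sense()
        if is_3v:
            return Vcc.V3
        if is_5v:
            return Vcc.V5
        # Neither flag set: the sense was wrong, so ask again instead of
        # falling back to 5V (which over-voltage protection would refuse).
        force_resense()
    return None  # still unknown: give up rather than guess
```

In this model, `old_probe(False, False)` answers 5V, the case Linus describes, while `new_probe` keeps asking the hardware until one of the sense bits shows up.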

Alan Cox replied to Linus' original announcement:

You are more optimistic than me on time scales. I have to agree that making 2.4pre the freeze and fix stuff makes sense. Its a clear line.

I assume the raid stuff is in 'pending'?

From my side I'll push you 2.2.15pre and the remaining unmerged drivers (COMX) this weekend. That makes sure 2.4pre is in sync with 2.2.15 fix ups. I'll also update the jobs list

There is a big i2o block driver update pending from Intel. If its sane I'd like to get it in, if its not sane then I need to get bits of it in fixed up to handle geometry and other compatibility issues. This wont stray outside drivers/i2o

Umm Apache 2.0, Xfree 4.0 and 2.4pre all in a week. This is going to make the ftp archives suffer 8)

Linus also replied to his own original announcement, quoting a private email from Ivan Passos, in which Ivan announced, "I have a new synchronous card driver which I haven't submitted yet to you because there is still one issue to be cleared. If I don't get it on 2.3.x, does it mean it won't get to the 2.4.x src tree?? This is a completely new driver. You mention to "see you in 2.5.x" scared me... :)"

In his reply, Linus said:

I can add it during 2.4.x - adding new drivers is not a problem, the same way we have added new drivers to 2.2.x.

But at this point I'd prefer to not get it for the pre-2.4.x series, just to make the patches smaller and hopefully bug-fixes only.

The only special case would be if somebody comes out with a driver that is for something =SO= popular that I think everybody will want it in 2.4.0 and not have to wait. I can't think of any such thing off-hand, but maybe a softmodem driver would count for psychological reasons or something like that.

Pavel Machek replied peripherally, "FYI softmodem driver for lucent modems is there, but it stays completely userspace, because it is more convenient that way. No, you can't do data transfers with it. You can use it as telephony card, though. But v.34 is userspace thingie, anyways."

Artur Skawina had a problem actually booting 2.3.51; he reported, "it just manages to launch init, but then i get a stream of "respawning too fast, disabling for 5 minutes" and that's it." Linus had a suggestion that didn't pan out, but Artur started to suspect that 'bash' was actually the problem. He mentioned, "turns out bash (ie the dynamic linker) gets killed with a SIGBUS, after it maps a zero-length "/etc/"" and added, "ahh, temporarily removing that file finally gives a bootable system." Linus slapped himself on the forehead, and exclaimed:


This is another change in 2.3.x behaviour: it is a POSIX requirement that I don't particularly like, but there you have it. Any access past the last page of a file should give a SIGBUS. Previous Linux behaviour was to just map in a zero page.

I will leave the SIGBUS behaviour, and if this is the only program that breaks due to new POSIX conformance, I will consider us very lucky indeed.

Fingers crossed. If some other major package breaks we will probably have to forget that particular conformance detail..
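The POSIX rule Linus refers to can be stated as a classification of where a byte access into a file mapping lands. The Python function below is an illustrative model only, assuming a 4096-byte page and a mapping that starts at file offset 0; it does not touch a real mapping.

```python
def classify_access(offset, file_size, map_len, page_size=4096):
    """Classify a byte access at `offset` into a file mapping of `map_len` bytes."""
    if offset < 0 or offset >= map_len:
        return "outside mapping"  # not part of this mapping at all
    if offset < file_size:
        return "file data"
    # The page containing EOF is still backed; its tail reads as zeros.
    eof_page_end = -(-file_size // page_size) * page_size  # round up
    if offset < eof_page_end:
        return "zero-filled"
    # Whole pages past EOF: POSIX says SIGBUS, where older Linux
    # simply mapped in a zero page.
    return "SIGBUS"
```

For a zero-length file every page of the mapping lies past EOF, so even the first byte classifies as "SIGBUS", which matches what the dynamic linker ran into.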

And Jamie Lokier put in, "Fortunately, reading any byte of a zero length /etc/ (a text file) and expect to see a zero looks like a blatant bug in the dynamic linker."

But the reply to Linus' original announcement that spawned the most discussion came from Hans Reiser. He said:

We now have a working port of reiserfs for 2.3.49, and I am not sure whether you consider us pending. Can reiserfs get in? Putting us in as an experimental file system until we are accepted by the community as known stable is just fine. Our 2.2 version seems to be accepted by the users on our reiserfs mailing list as stable.

We'll port it to the new 2.3.51 starting immediately, the 2.3.49 version will hit our webserver in a few hours.

Sorry we tweaked longer than we should have, and created inconvenience for you.

Chris Mason, in reply to this, posted a long announcement for the latest reiserfs patch:

ReiserFS for 2.3.49 is now available for testing, comments, and review. You can get the code from:

This is beta code, and support for the disk format in our code for the 2.2. kernel is turned off right now (more on that below).

We realize this isn't as useful as a 2.3.51 patch, which we are working on right now. Looking through the VFS changes in 2.3.51, I don't anticipate problems porting up, but I wanted to get this code out just in case. Look for the 2.3.51 patch Sunday night, perhaps on Monday if things get really ugly.

I've added a new call in the super_operations struct, called read_inode2. This is a kludge to pass all 64 bits reiserfs_read_inode needs to find something on disk. As I understand things, the VFS changes in progress to make knfsd happy will allow us to get rid of the read_inode2 hack.

Big differences from our 2.2 code:

We've changed our stat data and the keys used to find objects in the tree. We have code in place to support the old disk format, but we concentrated on testing the new format first, so that compatibility code is disabled right now.

You won't be able to mount a 2.2 formatted disk with this patch. The new format was designed around making old format support easy, we are committed to providing it once we are convinced the new code is solid.

The horrible things we did to linux/fs/buffer.c have been removed.

64 bit file offsets, and disk format should be safe for use on alphas. The alpha port is not finished, but it is at least possible now. If you are interested in helping, please let us know.

page cache integration resulted in big changes in when we pack file tails. Files with tails will be slower than they were in the 2.2 code, until we make better use of the address space operations.

What still needs to be done:

2.3.51 port, more testing.

new format support ported back to 2.2.X kernels. This comes after testing the old format support in 2.3.X kernels, so it should take at least a month.

fsck needs to be ported into the new format, so does the resizer.

memory pressure hooks would be very useful. We have a hard to reproduce deadlock under very high swap load, where kswapd ends up waiting on the log while trying to flush inodes to disk. I have a work around planned (all in the journal code), it won't take long to code, but it will take a few days to test right.

Code cleanup (esp. the journal API)

journal tuning code, including per filesystem journal sizes. I've been promising this forever. Now that our 2.3 changes are almost done, I can finally think about adding it. The super has all the fields need to make this happen already.

A few misc bug fixes from the 2.2.X code need to be ported in.

Again, this is beta code, please don't put data on it you care about yet.
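Chris's read_inode2 note is about how many bits it takes to find an object. The sketch below is hypothetical: it assumes, for illustration only, that the 64 bits are a 32-bit parent-directory id followed by a 32-bit object id, and that the tree is sorted by that pair. `pack_key`, `build_tree`, `lookup` and `lookup_by_ino` are invented names, not reiserfs functions.

```python
import bisect

MASK32 = 0xFFFFFFFF


def pack_key(dir_id, objectid):
    """Combine two 32-bit ids into one 64-bit lookup key (assumed layout)."""
    return ((dir_id & MASK32) << 32) | (objectid & MASK32)


def unpack_key(key):
    return key >> 32, key & MASK32


def build_tree(items):
    """items: {(dir_id, objectid): value}; returns a list sorted by 64-bit key."""
    return sorted((pack_key(d, o), v) for (d, o), v in items.items())


def lookup(tree, key):
    # With the full 64-bit key a binary search finds the item directly.
    i = bisect.bisect_left(tree, (key,))
    if i < len(tree) and tree[i][0] == key:
        return tree[i][1]
    return None


def lookup_by_ino(tree, ino):
    # Only the low 32 bits are known, and they are not the leading sort
    # field, so binary search is impossible: scan every entry.
    for key, value in tree:
        if key & MASK32 == ino:
            return value
    return None
```

The contrast between `lookup` and `lookup_by_ino` is the gap a 32-bit inode number leaves when the on-disk ordering needs more than that.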

Chris Evans replied, "Thanks for the detailed post on the status. One piece of information was missing though, which I think might be of interest to readers of this list; how fast is this beast ;-)" Hans replied at length:

We don't yet have a set of valid benchmarks. I can generally say that I don't have confidence in the benchmark performance of this code, there are too many tweaks not yet done, and our goal for Monday is a port to 2.3.51 that does not crash. I hope Linus will put this port in as an experimental filesystem, and then a bit later we'll be able to tell him the performance merits making it a more than experimental use FS.

We have some tail packing algorithms new to 2.3 that are worse in some benchmarks than the 2.2 code. This will be fixed after more time for tweaking (or going back to the old algorithms) passes.

It has long been an architectural weakness of ReiserFS that we allocate block numbers to buffers as we need each buffer. We knew that we should cure this by doing either pre-allocation of blocknrs as ext2 does (meaning allocate many blocks at a time, and then keep them reserved for that file until file close), or allocating block numbers on dirty buffer/page flush as XFS does. It is for sure the case that our current practice of allocating blocks as we need new buffers to write into is a bad architecture. Yura wrote some pre-allocation code that improved our performance by as much as 4x in some benchmarks, but I discouraged it because I prefer allocate on flush. This decision is hurting us at the moment. Pre-allocation is much easier to code (and already written by Yura for an older version), allocate on flush is the right answer long-term. Zam has been working on allocate on flush for about 6 weeks, but it is not ready yet. I think we will have Yura throw in his pre-allocation code until the allocate on flush code is completed. I hypothesize (not enough data yet to consider it an observation) that having good SMP in VFS makes writes more evenly interleaved using our current block allocation bad architecture, and more evenly interleaved is bad for performance.

There is also unfinished work to decrease the lock granularity throughout reiserfs. We don't yet know how much effect that has, but it surely has some.

So, in summary, 2.3.51 ~Monday, not sure how many days for pre-allocation but not a lot. We really need that allocate on flush code and a few more tweaks in various places before we can start smiling at our benchmarks and taking weekends off again, but until then I hope we will at least be admissible as an experimental FS. Everyone is working quite hard.
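Hans's point about interleaving can be seen in a toy model: two files written alternately, one block at a time. The functions below are a hypothetical Python sketch of the three strategies he names (allocate as needed, ext2-style pre-allocation, allocate on flush), not ReiserFS, ext2 or XFS code; the pre-allocation window of 8 blocks is an arbitrary assumption.

```python
def allocate_on_demand(writes):
    """Give each block the next free block number as it is written."""
    layout, next_free = {}, 0
    for f in writes:
        layout.setdefault(f, []).append(next_free)
        next_free += 1
    return layout


def allocate_with_prealloc(writes, window=8):
    """ext2-style: reserve `window` blocks per file and draw from them.

    Releasing unused reservations at file close is not modelled here.
    """
    layout, reserved, next_free = {}, {}, 0
    for f in writes:
        if not reserved.get(f):
            reserved[f] = list(range(next_free, next_free + window))
            next_free += window
        layout.setdefault(f, []).append(reserved[f].pop(0))
    return layout


def allocate_on_flush(writes):
    """XFS-style: count dirty blocks, assign block numbers only at flush."""
    dirty = {}
    for f in writes:
        dirty[f] = dirty.get(f, 0) + 1
    layout, next_free = {}, 0
    for f, count in dirty.items():
        layout[f] = list(range(next_free, next_free + count))
        next_free += count
    return layout


def extents(blocks):
    """Count contiguous runs of block numbers (fewer runs, fewer seeks)."""
    runs, prev = 0, None
    for b in sorted(blocks):
        if prev is None or b != prev + 1:
            runs += 1
        prev = b
    return runs
```

With `writes = ["a", "b"] * 8`, allocating as needed leaves each file in eight one-block extents, while both alternatives give each file a single contiguous run.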

Alexander Viro spoke up with some discomfort (and received several replies), "Ahem. I really hate to say it, but I dearly hope that it" [inclusion in the main tree] "will not happen right now. IMNSHO reiserfs in the official tree right now is the worst thing one can do to VFS. If my opinion on that is of any interest - please, don't do it."

Chris M. asked for some clarification, and Alexander exploded with technical criticisms, culminating in:

Please, start with removing the crud from your code. It's the same picture all over the place. Excuse me, it's kinda hard to consider the patch seriously when it's bloody obvious that nobody cared to check what happened with VFS during the last two years unless the changes broke compile. I _know_ that not all changes of rules were detectable that way. And I wouldn't bet a dime that you got them right (especially since one example of the contrary can be found two paragraphs above). It's not like that was a 2.3-only thing - your code is incorrect for 2.2 since the February/March 1999...

When this sort of crap will be cleaned up and code really audited and _understood_ by your team - fine, there will be a point in talking about this animal more seriously. Right now you are pushing the patch with large chunks of code you've copied years ago and had not maintained since then. Sorry, but it's not a good idea.

Chris M. replied:

Ok, yes, there are places like this, and clearly, they need to go away. We are subscribed to fs-devel, and we are making a real effort to stay more up to date on all the kernel changes (yes, we are on fs-devel). We should have been fixing the problems you described earlier, but it just didn't get done.

I'm sending a patch out in a few hours that will have bug fixes, and a port to 2.3.51. It won't have the cleanup you describe, I'll start on that on Monday.

Many thanks for you comments and review.

Alan Cox also replied to Alexander's initial appraisal, with, "I'd like to see reiserfs merging done in 2.5.x as well not 2.4. People can merge reiserfs and test it with 2.4 happily and it gives both the Reiserfs and the kernel folks time to make a better product." Hans replied that a lot of people wanted a journalling filesystem, and that their latest patch against 2.3.51 was surviving all their in-house torture-tests. Alexander replied, "Torture-testing is no good against somebody who is hunting for races... Get into sufficiently evil state of mind and try to go through the code. Thinking "how could I exploit it". And yes, it requires understanding of what kind of calls can be forced out of VFS. Really. Been there, done that, found quite a few local DoSes and several root exploits. On ext2."

Elsewhere, Hans said to Alexander:

Alexander, if you tell us what you want done when making VFS changes we will do it for you. You don't need to take the responsibility for working on reiserfs code you dislike if you don't want it. Truly. You can go on using ext2 and telling all your friends to use ext2, and when you change VFS just let us know.

On the other hand, I am very happy to get flames from you which are specific in what they complain of. It is bedtime for me now, but I am cc'ing the below to Vladimir and Alexei in Russia. It would be nice if you would frame your remarks in the form of: you do A, B is the right thing, or A is bad because of such and such, rather than you do A, you are clueless.

The first forms lead to a faster fix. If fix is your objective.

Chris M. replied to this:

Hans, I think Alexander has told us exactly what he wants done. He wants us to audit our entire VFS interface, and make sure that we are dealing with boundary conditions, special cases, and normal cases the same way the existing linux filesystems are. I don't think we need a list from him of the broken segments, we should be able to find them on our own. This is not an unreasonable request from him, and we should not treat it as one.

Once we fix what we can find, we'll send the patch in again, and hopefully the people on linux-kernel and fs-devel audit the code again. This is exactly the kind of feedback I was hoping for, and it is appreciated.

Yury Yu. Rupasov did some tests and said he felt reiserfs was ready, but a bunch of people came down on him for ignoring the real issue of the Virtual Filesystem interface. Jamie explained patiently:

Alex Viro's concerns are not about whether reiserfs passes stress tests. They're about subtle, and not too subtle deadlock, livelock and race conditions, and long term maintainability.

As soon as an fs goes into the kernel, even marked as "experimental", then every time Alex modifies VFS he has to look at all the kernel filesystems and verify that his VFS changes are valid. Sometimes this means he has to change the filesystems themselves. Also Alex Viro is not the only person who does this.

That can only be done if every fs in a single kernel release makes the same set of assumptions about what to test, what locks are held for different functions, when to call iput, dput, d_delete and d_move, whether to check S_ISDIR etc.

Currently reiserfs does not make the same assumptions as the current kernel filesystems. So it cannot be included even though it passes its own stress tests: including reiserfs would create too much work for people modifying and auditing VFS.

Sure you could keep a note with it saying "this filesystem is not maintained in the kernel". But then there's no point including it in the kernel tarball.

Others replied more abruptly to Yury, and at one point Linus stepped in, with:

Now, now, don't be too harsh on the reiserfs guys.

Do we suddenly expect code to be bug-free before inclusion into the kernel?

For rather obvious PR reasons I'd love to say "yes, we have a journalling filesystem these days" as part of the 2.4.x release stuff, so it does fall under the "drivers so cool that they might make it into 2.4.x". I don't think I want to see the read_inode() changes, though, that's just too ugly. I may like the PR angle of reiserfs, but that doesn't mean that I'd forget about things like these completely.

But it looks to me as if the read_inode thing plus a few cleanups in reiserfs to take into account that the VFS layer does more these days would certainly make it a candidate for inclusion. Maybe not 2.4.0, but during 2.4.x. Don't be so down on the guys, there are people who really like actively using reiserfs..

Some folks said they didn't mean to sound hostile, and actually wanted the project to be included, but they felt there were problems that needed to be worked on. Hans also replied to Linus, saying he had put 2 coders full time on the VFS fixes. Alexander also clarified the problems he felt should be cleaned up:

OK, let me summarize my position on that:

  1. namespace-related methods _must_ be cleaned up. I definitely can just rip the extra checks out of the thing, but there are questions about correctness of said action. _If_ that was just the case of "replay all global changes that happened during last two years" I would be certainly PO'd but I would do it. However, some of those places do funny things with checks for ->i_count and friends. They may be harmless atavisms. They may be actual bugs. And they may be band-aids that cover real problems 99% of time. I don't want any responsibility for that stuff - analysis of reiserfs wrt races will involve going through the guts of their code and that's about 200Kb of stuff to read. IMO that's work for the reiserfs authors.
  2. some things are obvious bugs and should be fixed both in 2.2 and 2.3 - e.g. d_move() in rename() is _wrong_ since 2.2.<early>.
  3. having ~30Kb of unmaintained code in the patch is Bad Thing(tm). What about the rest? Obviously journalling code _is_ maintained (at least I _seriously_ hope that it is). Ditto for the allocation stuff, but here there may be need to account for big lock changes. And I would be much happier if somebody looked through ->truncate() searching for races. Andrea?
  4. could you please convert symlink.c to pagecache? BTW, that would kill ugly REISERFS_KERNEL_MEM/REISERFS_USER_MEM stuff.
  5. personally to Hans: please, _please_, lose references to Plan 9 namespaces in the documentation. Trust me, they are _NOT_ what you think they are (at least according to what you had written). It wouldn't be related to patch, but you are bringing the URL of that text into the tree. I'm 100% serious - those references may be good for marketing, but they'll make us a laughing stock for everyone who knows what Plan 9 namespaces are about. Hint: namespaces are _not_ about "everything can be viewed as filesystem". It's pure VFS thing and _nothing_ in your patch touches even remotely related areas.

There followed an implementation discussion between him, Hans, Chris M., Matthew Dietrich, Manfred Spraul, and Andrea Arcangeli.

3. Alan's Task List For 2.4: Saga Continues

10 Mar 2000 - 18 Mar 2000 (37 posts) Archive Link: "Running JOBS list."

Topics: Disk Arrays: RAID, Disks: IDE, Disks: SCSI, FS: NFS, FS: NTFS, Framebuffer, I2O, Networking, PCI, Power Management: ACPI, SMP, Virtual Memory

People: Alan Cox, Jes Sorensen, David S. Miller, Jens Benecke, Alexander Viro, James Simmons, Arjan van de Ven, Stephen Frost, Ian Peters, Andrea Arcangeli, Daniel Kobras

Alan Cox's task list was last covered in Issue #56, Section #3 (10 Feb 2000: To Do For 2.4: Saga Continues). This time, Alan posted the latest version of his list of tasks to do before 2.4, but he acknowledged that it might have gotten a little out of date since he had been busy with other things. He invited corrections, and listed:

  1. In Progress
    1. Merge the network fixes (DaveM)
    2. Merge 2.2.13/14 changes (Alan, all done barring COMX and Sk98)
    3. Get RAID 0.90 in (Ingo)

  2. Fix Exists But Isnt Merged
    1. Signals leak kernel memory (security) [JJ has fixes]
    2. msync fails on NFS [Wrong return]
    3. Semaphore races
    4. Semaphore memory leak
    5. Exploitable leak in file locking [Lock limit needed]

  3. To Do
    1. Restore O_SYNC functionality
    2. Fix eth= command line
    3. vmalloc(GFP_DMA) is needed for DMA drivers
    4. VM needs rebalancing
    5. Fix SPX socket code
    6. put_user appears to be broken for i386 machines
    7. Fix module remove race bug
    8. Test other file systems on write
    9. Directory race fix for UFS
    10. Audit all char and block drivers to ensure they are safe with the 2.3 locking - a lot of them are not especially on the open() path.
    11. Stick lock_kernel() calls around drivers with issues too hard to fix nicely for 2.4 itself

  4. To Do But Non Showstopper
    1. Make syncppp use new ppp code
    2. Finish 64bit vfs merges (lockf64 and friends missing)
    3. NCR5380 isnt smp safe
    4. DMFE is not SMP safe
    5. ACPI hangs on boot for some systems
    6. Get the Emu10K merged
    7. Finish I2O merge
    8. Go through as 2.4pre kicks in and figure what we should mark obsolete for the final 2.4
    9. Per Process rtsigio limit

  5. Probably Post 2.4
    1. per super block write_super needs an async flag
    2. address_space needs a VM pressure/flush callback
    3. per file_op rw_kiovec
    4. enhanced disk statistics

  6. To Check
    1. Truncate races (Debian apt shows it nicely) [done ?]
    2. Elevator and block handling queue change errors are all sorted
    3. Check O_APPEND atomicity bug fixing is complete
    4. Incredibly slow loopback tcp bug
    5. Finish softnet driver port over and cleanups
    6. Page cache high on PAE36 boxes is very slow, maybe disable ?
    7. Protection on isize (sct)
    8. Mikulas claims we need to fix the getblk/mark_buffer_uptodate thing for 2.3.x as well
    9. Fbcon races
    10. Fix all remaining PCI code to use new resources and enable_device
    11. VFS/VM - mmap/write deadlock
    12. initrd is bust
    13. rw semaphores on page faults (mmap_sem)
    14. kiobuf separate lock functions/bounce/page_address fixes
    15. Fix routing by fwmark
    16. Some FB drivers check the A000 area and find it busy then bomb out
    17. NTFS needs updating/binning or something
    18. rw semaphores on inodes to fix read/truncate races ?
    19. Not all device drivers are safe now the write inode lock isnt taken on write
    20. File locking needs checking for races
    21. Multiwrite IDE breaks on a disk error
    22. AFFS doesn't work on current page cache

Ian Peters replied that as far as he knew, Andrea Arcangeli took care of item 6.1 (Truncate races (Debian apt shows it nicely) [done ?]) as of 2.3.48; there was no reply.

James Hayden asked why the Pentium III optimizations were not on Alan's task list, and Alan replied, "Fair question. Actually the evidence is that the optimisations are not a win but that the kernel support for user usage would be. I'll tack it onto the list." Jes Sorensen replied, "I saw significant performance increases on Gigabit Ethernet when I played with the patches back in 2.3.10, but it's been a while, I don't know if they still make such a big difference."

Daniel Kobras asked if netfilter had been merged yet, and added that he felt it should definitely be on the task list, given its importance. Stephen Frost replied that it was partially merged, and that David S. Miller was working on it as part of David's work on item 1.1 (Merge the network fixes (DaveM)); David, also in reply to Daniel, said:

Actually, all of the core netfilter support is merged.

The only thing left are the modules themselves, and that should be as simple as adding a new driver so in theory it could even occur during 2.4.x if needed.

Elsewhere, Giacomo Amabile Catenazzi suggested that better documentation should be on Alan's task list as well. Jens Benecke replied sorrowfully, "Yes. That is one of the main problems I have when trying to get some exotic module loaded. For most of the time, I know the parameters by now - but sometimes (e.g. NCR400a SCSI, which I never got to run properly) I had to wade through the source - a real PITback." [...] "I think the current situation (README's scattered through the source tree) is inacceptable for many people. Having to actually compile your own kernel is a seldom task with todays' distros, so browsing the source itself does not even occur to many people when looking for driver information." Several people suggested moving the READMEs into subdirectories of /usr/src/linux/Documentation, and Arjan van de Ven suggested following Alan's lead, and standardizing on 'gdoc', which Alan had been using to document the networking code.

Alexander Viro replied to several items in Alan's task list. To item 3.7 (Fix module remove race bug), he explained, "Infrastructure is in place now and I'm going through the rest of fs-related stuff. Then it will be a lot of mess with drivers/ldiscs/protocol families." To item 6.1 (Truncate races (Debian apt shows it nicely) [done ?]), he replied, "99% done. Remaining 1% is the i_size handling in CODA. We have almost everything to do it now - it's on the list." To item 6.7 (Protection on isize (sct)), he replied, "It's mostly done now. If Stephen is going to return to that stuff - fine, but then we'ld better go through the code together - I've tweaked it a lot and will touch it one more time." To item 6.17 (NTFS needs updating/binning or something), he felt that this was already fixed, though he added that he'd recheck it himself. But he added, "NTFS is unmaintained and I'ld just bitbucket the current variant (sharing codebase between the kernel and userland library is *WHAM* bad *WHAM* idea *WHACK* *SPLASH*)."

To item 6.9 (Fbcon races), James Simmons explained:

I didn't finish the fbdev API changes. Its better but not perfect. You will see. I sent a patch to Geert to look over before he sends it to linus that updates vfb.c to show how a 2.4.X driver should be written. As for races. Its still not really multihead friendly. Neither is the console code as several of us with multihead system discover. SMP and multihead and console system don't mix :( The con2fb mapping is just a mess. Things like fbcon=map:0110 bombs if you don't have a second video card. Their is no way to fix this. Its a chicken and egg problem. fbcon_setup is called before fbmem_init so you can't do any sanity checking. The con2fb mapping stuff is just pure crap. Its just a pure hack to get multihead somewhat working. I hate it and can't wait to see it removed in 2.5.X.

The second problem is vgacon and fbcon. Until this last kernel you can boot vgacon and then inmod a fbdev driver and then fbcon would take over. Now the problem is most video card when they go from VGA mode to a non VGA mode can NOT go back to VGA mode. So if you go to rmmod the fbdev driver you are fubar. I just made a patch that changes the files to have it so you either have to choose fbcon or vgacon. Its one or the other. This brings me to another question. Can prom console run at the same time as the framebuffers on a sparc station ? Same for some of the SGI stations. Can a newport console also run at the same time as a framebuffer console. This way my patch can ensure you can select between fbcon or some other console type.
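James's complaint about fbcon=map:0110 is easier to see with the mapping spelled out. In later kernels' fbcon documentation, each digit of the map names the framebuffer for one console, and the digit string is repeated until every console has an entry; assuming the 2.3 code behaved the same way, the sketch below expands such a string and then performs the sanity check James says fbcon_setup cannot do, since it runs before fbmem_init has registered any framebuffer. NUM_CONSOLES, MAX_FB and both function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical limits for illustration; real kernel values differ by version. */
#define NUM_CONSOLES 63
#define MAX_FB       8

/* Expand a map string such as "0110" into a console-to-framebuffer table,
 * repeating the pattern until every console has an entry.
 * Returns 0 on success, -1 if the pattern is empty or has a non-digit. */
static int expand_con2fb_map(const char *pattern, int map[NUM_CONSOLES])
{
    size_t len = strlen(pattern);
    if (len == 0)
        return -1;
    for (int i = 0; i < NUM_CONSOLES; i++) {
        char c = pattern[i % len];
        if (c < '0' || c > '9')
            return -1;
        map[i] = (c - '0') % MAX_FB;
    }
    return 0;
}

/* The check that needs framebuffers to be registered already: return the
 * index of the first console mapped to a framebuffer that does not exist,
 * or -1 if every mapped framebuffer is present. */
static int first_unbacked_console(const int map[NUM_CONSOLES], int num_registered_fb)
{
    for (int i = 0; i < NUM_CONSOLES; i++)
        if (map[i] >= num_registered_fb)
            return i;
    return -1;
}
```

With a single video card, console index 1 (tty2) is already mapped to the nonexistent fb1, which is the failure James describes; with two cards the same map is consistent.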

And to the related item 6.16 (Some FB drivers check the A000 area and find it busy then bomb out), he went on, "I know cirrus, dnfb, and vga16. I can see vga16 since this is really for a isa card. vga16 should be called last since many cards start in VGA mode program a few registers then switch to MMIO thus freeing this region. Its then safe for the next card to do this. As for cirrus and dnfb. I don't have the docs for these cards but I assume it should be possible to switch to MMIO mode for these cards."

4. More Of Alan's Task List For 2.4: Saga Continues

12 Mar 2000 - 18 Mar 2000 (78 posts) Archive Link: "Linux Jobs: Update"

Topics: BSD: FreeBSD, Compression, Disk Arrays: RAID, Disks: IDE, FS: Coda, FS: NFS, FS: NTFS, FS: UMSDOS, I2O, Networking, PCI, Power Management: ACPI, SMP, Sound: SoundBlaster, Virtual Memory, VisWS

People: Alan Cox, David S. Miller, Chris Wedgwood, Mike A. Harris, Bill Wendling, Victor Khimenko, Stephen C. Tweedie, Gerhard Mack, Wakko Warner, Steve Dodd, Jeff Garzik

Alan posted his latest task list:

  1. In Progress
    1. Merge the network fixes (DaveM)
    2. Merge 2.2.15 changes (Alan)
    3. Get RAID 0.90 in (Ingo)

  2. Fix Exists But Isnt Merged
    1. Signals leak kernel memory (security)
    2. msync fails on NFS
    3. Semaphore races
    4. Sempahore memory leak
    5. Exploitable leak in file locking

  3. To Do
    1. Restore O_SYNC functionality
    2. Fix eth= command line
    3. vmalloc(GFP_DMA) is needed for DMA drivers
    4. VM needs rebalancing
    5. Fix SPX socket code
    6. put_user appears to be broken for i386 machines
    7. Fix module remove race bug
    8. Test other file systems on write
    9. Directory race fix for UFS
    10. Audit all char and block drivers to ensure they are safe with the 2.3 locking - a lot of them are not especially on the open() path.
    11. Stick lock_kernel() calls around driver with issues to hard to fix nicely for 2.4 itself
    12. PCMCIA/Cardbus hangs, IRQ problems, Keyboard/mouse problem (related ?)
    13. Tulip hang on rmmod
    14. Use PCI DMA by default in IDE is unsafe (must not do so on via VPx x<3)
    15. Use PCI DMA 'lost interrupt' problem with some hw [which ?]

  4. To Do But Non Showstopper
    1. Make syncppp use new ppp code
    2. Finish 64bit vfs merges (lockf64 and friends missing)
    3. NCR5380 isnt smp safe
    4. DMFE is not SMP safe
    5. ACPI hangs on boot for some systems
    6. Get the Emu10K merged
    7. Finish I2O merge
    8. Go through as 2.4pre kicks in and figure what we should mark obsolete for the final 2.4
    9. Per Process rtsigio limit
    10. Boot hangs on a range of Dell docking stations (Latitude)
    11. Port SGI VisWS to 2.3.x or mark obsolete
    12. S/390 Merge
    13. HFS is still broken
    14. iget abuse in knfsd
    15. Mark NTFS as obsolete
    16. via rhine oopses under load (softnet ?)
    17. Symbol clashes and other mess from _three_ copies of zlib!
    18. Paride seems to need fixes for the block changes yet

  5. Compatibility Errors
    1. Shared memory changes change the API breaking applications (eg gimp)

  6. Probably Post 2.4
    1. per super block write_super needs an async flag
    2. addres_space needs a VM pressure/flush callback
    3. per file_op rw_kiovec
    4. enhanced disk statistics
    5. AFFS fixups
    6. UMSDOS fixups resync

  7. To Check
    1. Truncate races (Debian apt shows it nicely) [done ? - all but Coda]
    2. Elevator and block handling queue change errors are all sorted
    3. Check O_APPEND atomicity bug fixing is complete
    4. Incredibly slow loopback tcp bug
    5. Make sure all drivers return 1 from their __setup functions
    6. Finish softnet driver port over and cleanups
    7. Page cache high on PAE36 boxes is very slow, maybe disable ?
    8. Protection on isize (sct) [Al Viro mostly done]
    9. Mikulas claims we need to fix the getblk/mark_buffer_uptodate thing for 2.3.x as well
    10. Network block device seems broken by block device changes
    11. Fbcon races
    12. Fix all remaining PCI code to use new resources and enable_Device
    13. VFS?VM - mmap/write deadlock
    14. initrd is bust
    15. rw sempahores on page faults (mmap_sem)
    16. kiobuf seperate lock functions/bounce/page_address fixes
    17. Fix routing by fwmark
    18. Some FB drivers check the A000 area and find it busy then bomb out
    19. rw semaphores on inodes to fix read/truncate races ? [Probably fixed]
    20. Not all device drivers are safe now the write inode lock isnt taken on write
    21. File locking needs checking for races
    22. Multiwrite IDE breaks on a disk error
    23. AFFS doesn't work on current page cache

New in this list were items 3.12 through 3.15, 4.10 through 4.19, section 5 (compatibility errors), items 6.5 and 6.6, 7.5, and 7.10. Items that had changed (not including changes of numbering) were items 1.2, 2.1, 2.2, 2.5, 7.1, 7.8, and 7.19. Item 7.17 (NTFS needs updating/binning or something) from the previous list had migrated from the "To Check" section to item 4.15 in the "To Do But Non Showstopper" section.

David S. Miller replied to item 1.1 (Merge the network fixes (DaveM)), with, "The only thing left are the netfilter modules from Paul Russel, and I imagine I haven't seen that patch yet due to the AU conference last week."

David also replied to item 7.4 (Incredibly slow loopback tcp bug), saying he hadn't been aware of this bug, and asking for more information. Chris Wedgwood replied that, according to the bw_pipe benchmark, Linux's pipe bandwidth was apparently only half that of FreeBSD. After a bit of back and forth, David posted a patch and reported, "Ok, the difference is that they" [FreeBSD] " are doing half as many memcpy's as us for large pipe writes. This should make Linux push bulk pipe data as fast as FreeBSD. Give it a try, let me know how things look now ok?" Chris posted some benchmarks showing David's new version to be 25% faster than FreeBSD, and hollered, "Freeeeooooowwww! I must say Dave, when I saw your patch I though it would help but I didn't expect a better than 2x increase. Impressive"
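Chris's numbers came from bw_pipe, the pipe-bandwidth test in the lmbench suite. The sketch below is not lmbench's code; under assumed parameters it only shows the shape of such a measurement: one process pushes a large amount of data through a pipe in big write() calls while another reads it back, and throughput is bytes divided by wall-clock time. Every byte on that path is copied by the kernel, which is why reducing the copying done for large writes, as David's patch aimed to, shows up directly in the result. The function name and chunk size are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical bw_pipe-style helper: a child writes `total` bytes into a pipe
 * in `chunk`-sized write() calls, the parent reads them back. Returns the
 * bytes received (or -1 on error) and stores throughput in MB/s. */
static long long pipe_bandwidth(size_t total, size_t chunk, double *mb_per_s)
{
    int fds[2];
    char *buf = malloc(chunk);
    if (buf == NULL)
        return -1;
    if (pipe(fds) < 0) {
        free(buf);
        return -1;
    }
    memset(buf, 'x', chunk);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        free(buf);
        return -1;
    }
    if (pid == 0) {                 /* child: the writer */
        close(fds[0]);
        size_t sent = 0;
        while (sent < total) {
            size_t n = total - sent < chunk ? total - sent : chunk;
            ssize_t w = write(fds[1], buf, n);
            if (w <= 0)
                _exit(1);
            sent += (size_t)w;
        }
        _exit(0);                   /* exiting closes the write end: EOF */
    }

    close(fds[1]);                  /* parent: the reader */
    long long received = 0;
    ssize_t r;
    while ((r = read(fds[0], buf, chunk)) > 0)
        received += r;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    clock_gettime(CLOCK_MONOTONIC, &end);
    free(buf);

    double secs = (double)(end.tv_sec - start.tv_sec)
                + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    *mb_per_s = secs > 0 ? (double)received / secs / (1024.0 * 1024.0) : 0.0;
    return received;
}
```

Calling pipe_bandwidth(8u << 20, 64 * 1024, &mbps) moves 8 MB in 64 KB writes; running the same call on two kernels gives the kind of before/after comparison Chris reported.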

Wakko Warner objected to Alan's item 4.15 (Mark NTFS as obsolete), asking what would replace the NTFS driver in that case, since he ran a dual-boot Debian/NT system and needed to read files across the two. Jeff Garzik and Alan replied that any code without a maintainer would be marked obsolete. Steve Dodd volunteered, not to maintain the code, but to maintain an "outstanding issue" list on the web. Several people, in reply and elsewhere, reported that the code seemed to work perfectly. Only Craig Whitmore found it to be very buggy, although there was no follow-up discussion.

Gerhard Mack started a lengthy thread in reply to Alan's list, asking if anyone had fixed autodetection of Sound Blaster cards. Alan replied that there had never been any autodetection to break, although there was plug-and-play detection. Gerhard replied that with 2.3.48 the sound card's IO address was apparently detected automatically, while 2.3.51 failed and insisted that the IO port be specified beforehand. The only difference in how the two kernels had been built was that one had been configured with 'make oldconfig'. Mike A. Harris replied, "I fixed this bug and another in the soundblaster code a billion years ago back in 2.0.35. If I'm not mistaken, Alan either accepted my patch or worked it into 2.0.36 manually." [...] "I can hunt it down and fix it again for 2.3.x if someone hasn't allready. It was a 2 liner for the DMA/midi prob, and one line removed for the irq thing.. simple."

Alan also replied to Gerhard, saying that Gerhard's 2.3.48 kernel had been compiled with the needed values built in, while the 2.3.51 kernel had not, and the values had not been given on the command line either. Gerhard took a look and saw that Alan was right, but he also noticed that 'make menuconfig' for 2.3.51 was missing the entire sub-menu under Sound Blaster support, which 2.3.48 still had. Kjartan Maraas replied that this change was described in '': the settings were now kernel command line options, to be placed in 'lilo.conf'. Bill Wendling let out a scream, and yelled, "WHY?!?! This is horrid! No longer can I simply reboot my machine but I have to use command line options?" But Victor Khimenko replied with some annoyance, "Argh. What's the screams all over the place ? You CAN NOT boot Linux without command line option. I repeat. You CAN NOT boot Linux without using command line option. Never was, never will. You MUST specify at least one option: root=blah-blah-blah ... This option is not specifyed as append=... in LiLo but, for example, in loadlin you must specify it explicitly. And compiled defaults (where you need to RECOMPILE kernel just to reflect changes in jumper settings) are kludges. Thus they were removed (not all I afraid)." And elsewhere, Stephen C. Tweedie put in, "you either install the command line options via lilo, or you use modules. Editing conf.modules once to set your sound module parameters means that you don't have to worry about the state of the kernel config files every time you build a new kernel."

5. Mounting 'shm' Someplace Other Than '/var/shm'

13 Mar 2000 - 15 Mar 2000 (13 posts) Archive Link: "2.3.51-52.pre1, shm and mounting somewhere else than /var/shm..."

Topics: FS: devfs, FS: sysfs

People: Albert D. Cahalan, Christoph Rohland, H. Peter Anvin, Petr Vandrovec, Werner Almesberger, Jeremy Katz

Petr Vandrovec wanted to mount the System V shared memory filesystem somewhere other than '/var/shm', such as '/shmfs', but found that it apparently would not work there. Jurgen Botz recommended 'echo /shmfs > /proc/sys/kernel/shmpath', and Jeremy Katz pointed out that the documentation named '/proc/sys/kernel/shmpath' as the place to change the path. Werner Almesberger, H. Peter Anvin, and Albert D. Cahalan asked why this setting was necessary at all; Albert put it this way:

Documentation doesn't a feature make. It seems odd that the kernel would not know where a filesystem is mounted. The kernel, more than anything else, ought to be aware of filesystem mount points.

BTW, /var/shm is a bad default. Stuff like this should be right on the root because /var may be a separate filesystem that gets mounted after the system already needs shared memory. (it would be even better to let this filesystem hang loose, active but not part of the normal tree at all)

Christoph Rohland, the author of the feature, replied that he'd originally coded it to autodetect the mount point, but that this had turned out to be a really ugly hack, and he'd removed it. He explained, "Unfortunately the fs does not get the mount point from anywhere. The only function called by the VFS ist read_super, which does not get the information. If you want to autodetect, you have to play tricks with dcache and the path to your root inode." But he added, "I am very open to proposals for other mount point defaults. But personally I really do not like any further root directory cluttering. Another idea would be /dev/shm like /dev/pts but I wanted to avoid clashes with devfs." H. Peter liked the idea of '/dev/shm', and explained, "It should be possible to mount on top of devfs just like any other filesystem, although devfs of course needs to make the mount point appropriately. /dev in fact would probably be the correct location these things, being a kernel interface. HOWEVER, sometimes avoiding "root directory cluttering" seems to be a little too much of a goal in itself. Sometimes adding items at the root is entirely appropriate."

6. IPv6, FreeS/WAN, And Crypto In 2.4

13 Mar 2000 - 14 Mar 2000 (5 posts) Archive Link: "Query: advanced routing"

Topics: Networking

People: Sandy Harris, H. Peter Anvin, Lars Marowsky-Bree

Sandy Harris asked, "what about IPv6 and the FreeS/WAN IPSEC code? Any plans to merge those into the mainstream?" H. Peter Anvin replied, "IPv6 still breaks with regularity; however, it's just a stability issue. As far as FreeS/WAN is concerned, with the new policy it's just a matter of submitting patches to Linus."

Lars Marowsky-Bree asked whether crypto support would make it into 2.4 or would have to wait until 2.5; H. Peter replied, "I talked with Linus about this, and he says he thinks it can get merged into 2.4 assuming it is reasonably separate, but 'probably not for 2.4.0.'"

7. Tulip Driver Developer Flame War

13 Mar 2000 - 20 Mar 2000 (71 posts) Archive Link: "2.3.51 tulip broken"

Topics: Backward Compatibility, Clustering: Beowulf, MAINTAINERS File, Networking, PCI, SMP

People: Donald Becker, Jeff Garzik, David Ford, Linus Torvalds

In the course of the argument, Donald Becker said to Jeff Garzik, "you didn't understand the task you were taking on when you decided to take over maintaining the Ethernet drivers. It took years to write the driver set -- it's something you can just pick up in a few months. And expecting me to now fix or maintain your hacked up code branch is just completely unreasonable." Jeff replied with venom:

No one expects anything from you and has not for a long time. If you wanted to actually WORK on the drivers, rather than just complain, then I'm sure many people including myself would find that work very valuable.

It is unarguable that you have more experience with these net drivers, but when was the last time you submitted a patch to Linus? Two years ago? Three? More?

I never claimed to be perfect but at least I am trying to fix some of the the bit-rotten, UNMAINTAINED net driver code currently in the kernel.

Finally, only the Tulip and RTL-8139 drivers are maintained by me. It takes a two-second grep of the MAINTAINERS file to discover this fact. Other drivers are still unmaintained, except when I get spare time to hack on them. (or when others send me patches for them)

Elsewhere, Jeff went on, "Donald, I, and others all seem to agree that having his drivers and the kernel drivers diverge is a poor situation. However, while Donald continues closed source development with periodic code drops, and does not work with other kernel developers when creating infrastructure, I do not see a resolution to the situation any time soon." David Ford replied angrily, "Please explain how his code development is closed source? This is totally BS and you know it. All the code is available, all the list discussion is available, and patches and requests are accepted all the time. Quit it. His development is quite open. Resolutions come when the mud stops being thrown." Linus Torvalds replied:

David, pipe down.

You seem to like the approach Donald has taken. But take it from me, it DOES NOT WORK.

The problem is that maintaining the drivers in their own small universe means that only those people who follow the driver development will ever even test them. Which means that people for whom the old drivers are broken are basically the only ones testing the new drivers - and when a new driver is finally integrated into the base kernel it won't work for a lot of people that had a perfectly working driver in the old version.

And don't even bother telling me it doesn't happen. It happened to _me_ more than once. I decided that I should try to fetch Donalds latest and greatest _despite_ the fact that he hadn't even bothered to send it to me or to Alan, and it simply didn't work on the very basic hardware that I had.

I fixed the tulip driver at least twice to work with the media detection, and sent Donald email about what I had done and why (why: without those fixes it would notever get a link on the machine that I used). I don't know if my fixes ever actually made it into Donalds version, because after the second time I just stopped bothering trying to re-fix the same thing, and I never updated his driver again.

In contrast, what Jeff and others have done have been of the type where immediately when a fix is made, it is released. Which means that if there are problems with it, people who follow new kernel releases will know. Immediately. Not in a few months time when the next "driver release" happens.

This is what Jeff means with "closed source". Yes, the sources are there. Yes, they get released every once in a while. But Donald doesn't let people _participate_. He thinks he is the only one who should actually touch the driver, and then he gets very upset when things change and others fix up "his" drivers to take into account the fact that the interfaces changed.

I accept patches from anybody. I accept patches from people other than Jeff. If somebody sends me a bugfix for a driver, and it is so obvious that even I can tell that it must be better than the current code, I will apply it. Obviously, in many cases I cannot make a good judgement (because I don't know the hardware or other issues well enough), and that is when having a maintainer is important.

If anybody thinks that being the maintainer equals being in 100% control, then I don't think they have understood the TRUE meaning of Open Source. Open source is about letting go of complete control. Accept the fact that other people are wonderful resources to fixing problems, and let them help you.

It's about accepting the fact that open source means that interfaces will change. Not whining about it. And when somebody else steps forward that does a better job of maintaining a driver, accept it gracefully.

Jeff also replied to David:

Donald's development is not open AT ALL. Read Donald's own description of how he developed the 2.3 network drivers and interface (pci-netif). He disappears for many months, creates a design without interfacing with kernel developers, and then appears again with a code drop.

It is classic cathedral style of development. Read Eric Raymond's paper on why the bazaar method is far, far superior. The Linux kernel is the bazaar method, and this is the central conflict which forced the kernel and Donald drivers to diverge.

Yes, the end result of Donald's work is open source, but his development is not open at all. And therein lies the problem [which existed far longer than I have been hacking on the net drivers...]

Donald replied to Jeff:

A quick search of the two very active Tulip mailing lists reveals that you have contributed nothing until this year. Apparently you were not even a subscriber until then, and know nothing about the very open way development has been done. Yet you willing throw around pejorative phrases like "cathedral style" -- a hot button in this community.

For those not interested what superficially appears to be a kernel power grab, there are issue underlying all of what appears to be a personal conflict.

  1. Should the kernel source code interfaces, for well-understood interfaces, be stable? (We are solidly committed not having a binary interface, so bringing that up is a red herring.)
  2. Given that development kernels are frequently unstable in some unexpected way, is is reasonable force testing of driver changes combined with unknown other changes?
  3. Given that the kernel continues to exponentially increasing in size, should all development go through the latest development kernel?

I feel the network driver interface from 1.2.* through 2.2.* is the cleanest interface in the kernel. It's possible to add most new drivers to the kernel without modifying or recompiling the kernel source. I like to think that I influenced the clean design.

Compare that to the filesystem code, which requires that the kernel be reconfigured and recompiled if you wish to add a new filesystem. Or a new block driver, where there is a similar situation with block.h. Both the VFS layer and the block driver interface should be very well understood, but the nicely designed interfaces were quickly corrupted with ugly hacks.

I think the continued viability of a monolithic, single-point kernel source tree should be questioned. The average *compressed* patch size for the 51 kernel patches since 2.3.1 is over 346KB, totaling just under 18MB. When uncompressed, that's over 1MB per patch, far more than it's possible for anyone to reasonably review.

The usual justification for this scale of change is that only the developers should be using the development source tree. But the "official" advice to anyone with device problems, frequently repeated on the kernel mailing list, is to run the latest development kernel.

To Donald's statement that the network driver interface from 1.2.x through 2.2.x was the cleanest in the kernel, Linus replied:

You're basically the only one thinking so.

The fairly recent changes in 2.3.x (the so-called "softnet" changes) are just incredibly more readable and robust than the old crap was that I don't see your point at ALL.

Just about every single network driver out there was SERIOUSLY broken wrt SMP and locking. I know, I had fixed many of them. The games the drives played with timeouts, "dev->interrupt", "dev->tbusy" etc were just incredibly baroque, and had absolutely NOTHING to do with "clean".

All of that crap is gone, and it was much overdue.

And it required every single networking driver to change. Tough. But that's the advantage of open source - in a closed source binary interface world we would have been basically unable to clean things up. We would have had to maintain some ridiculous backwards compatibility layer, making drivers and networking harder to understand.

The PCI layer changes are similar. Yes, we basically got rid of the old code that used to have "slot/fn" arguments to the PCI access functions. Instead, the functions got cleaned up, and you have to use "struct pci_device" instead, forcing drivers to be a bit more structured. Changes.

You seem to think that changes are bad. I disagree. I think a cleaner interface is worth just about ANY changes.
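Linus's "slot/fn" remark refers to how a PCI device's location used to be passed around: a bus number plus one byte packing a 5-bit slot number and a 3-bit function number. The sketch below shows that packing (the same layout the kernel's PCI_DEVFN, PCI_SLOT and PCI_FUNC macros encode) and contrasts loose arguments with a per-device structure. The structure, macro and function names here are hypothetical and far simpler than the kernel's own.

```c
#include <assert.h>
#include <stdint.h>

/* 5 bits of slot, 3 bits of function, packed into one byte. */
#define DEVFN(slot, fn)  ((uint8_t)((((slot) & 0x1f) << 3) | ((fn) & 0x07)))
#define SLOT_OF(devfn)   (((devfn) >> 3) & 0x1f)
#define FUNC_OF(devfn)   ((devfn) & 0x07)

/* Hypothetical, cut-down device record: the location travels with the
 * device instead of being passed as separate arguments. */
struct pci_dev_sketch {
    uint8_t  bus;
    uint8_t  devfn;
    uint16_t vendor;
    uint16_t device;
};

/* Old style: every caller carries bus and devfn around separately. */
static unsigned slot_from_args(uint8_t bus, uint8_t devfn)
{
    (void)bus;                      /* unused here, but every caller must supply it */
    return SLOT_OF(devfn);
}

/* New style: callers hand over one pointer to the device. */
static unsigned slot_from_dev(const struct pci_dev_sketch *dev)
{
    return SLOT_OF(dev->devfn);
}
```

DEVFN(17, 2) packs to 0x8a. With the structure, a driver cannot accidentally pair one device's bus with another device's slot and function.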

Jeff also replied to Donald, saying that Donald's FTP site appeared not to have been updated in three months. He quoted several file date-stamps, to which Donald replied:

There is a reason for that for those dates, which you should have picked up on. That was when the driver development issue last came up, and I was thoroughly flamed for separate driver development lists, especially by Jeff. I was told that my contributions were no longer needed.

I decided to take a few month break from my driver update schedule, which was taking up most of my waking hours, to work on Scyld and the Beowulf software. I provide driver updates to clients that value them and write new drivers, but left the kernel development merges to those that obviously wanted control of them.

Trace back to the beginning of the current thread: this round of finger pointing was started because there are driver bugs that haven't been addressed, and 2.4 is about to come out.

It must have seemed like a good idea in Autumn '99, when the kernel was "just about to be frozen" in preparation for late-'99 2.4 release, and money was pouring into anything with "Linux" in the name, to believe that my Ethernet driver development could be better done elsewhere. There were many companies that could suddenly see that PR value in contributing to the kernel. And certainly there are developers that want to see everything brought under the central planning of the linux-kernel list. But replacing my efforts is perhaps not as easy as it first appears. Nonetheless, the decision was made months ago, and it would be very difficult to reverse it now.

Jeff replied:

Oh good grief, save the conspiracies for X-Files.

I won't speak for the motivations of others, but for me the 2.3.x kernel net drivers weren't getting updated, so I decided to play a small part in correcting that situation.

This has nothing to do with money, or control. Just broken drivers.

Elsewhere, in an entirely different subthread, Donald argued:

I'm in the increasingly untenable position of being expected to maintain drivers for the current and older kernels, but not having any influence over the new development exactly because of that backwards compatibility. It's no fun being responsible for just the old versions, especially after I did years of unpaid development work.

There were many interface changes added incrementally in the 2.3 kernels. Some with added without consideration of, or even in opposition to, cross-version compatibility. And few of those interface changes were designed, as opposed to just hacked in. When I proposed an new PCI detection interface I wrote a skeleton driver, converted several of my drivers, demonstrated that it worked with several hardware classes and wrote a usage guide. But the few day hack was added because the patches were incremental (even if misdesigned and broken).

Linus replied:

Donald, that's not true, and you know it.

Neither I nor anybody else has expected you to maintain the drivers for quite a long time now - you just didn't seem to have the interest, and a lot of people have acknowledged that. That is why there ARE new maintainers for things like tulip and eepro100, whether you like it or not.

You did not lose influence of the drivers because you want to maintain backwards compatibility. You lost influence over the drivers simply because you never bothered to send in your changes. Don't start blaming anybody else.

You were more interested in making sure your drivers worked with old kernels than you were in making sure they worked with new ones. And now you're surprised that they are only used with old kernels? I don't see why.

8. Philosophy Of Having Debugging Code In The Kernel

15 Mar 2000 - 19 Mar 2000 (12 posts) Archive Link: "[bugfix] SMP, shm-2.3.52-A0"

People: Ingo Molnar, Jeremy Fitzhardinge, Linus Torvalds

In the course of the discussion, Ingo Molnar explained, "we want to keep 'permanent debugging code' out of the main kernel, as much as possible. There is no problem in having separate debugging patches (such as IKD, which is a much more capable debugging tool than plain asserts). Permanent debugging code pollutes the kernel over time and degrades readability and maintainability." Jeremy Fitzhardinge replied, "Properly used, asserts are not debugging code so much as executable design constraints. They are really useful as in-line documentation. assert(arg != NULL) is much more powerful than a /* arg cannot be NULL */ comment. The issue of whether the assert actually generates code is secondary; the code *should* run the same either way." But Linus Torvalds explained his approach:

I've had this problem before, and I don't like it.

I've worked on projects where the above happened, and it turned out that people ended up doing extra work just to make sure that the asserts were "right", instead of realizing that the thing that the assert "documented" was no longer worth maintaining.

I donot like asserts. I like temporary debugging aides, and I like them just as long as they are _seen_ as temporary debugging aides.

99% of all asserts I've ever seen have been a complete waste of time. They were useful when the code was written, because the code was buggy (or more often the code was not buggy, but all the "surrounding" code that depended on that piece of functionality had not been updated to new semantics). But they become either a liability or just worthless over time.

That's just my opinion, of course. I know that my views on debugging are pretty much scoffed at by a lot of people: I don't like debuggers either (for somewhat similar reasons - my strongly held personal belief is that debuggers tend to cause the _symptoms_ to be fixed rather than the actual underlying bugs that you fix when you think about the problem on a source level).

This is why I like BUG(). Not because it is any different from "assert()" in any real sense, but because it does not have the psychological mindset associated to it that "assert()" has..
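Jeremy's remark that the code "*should* run the same either way" is about the standard C assert(), which compiles to nothing when NDEBUG is defined. The userspace sketch below contrasts a side-effect-free assert, an assert that wrongly carries a side effect, and MY_BUG_ON, a hypothetical stand-in for the kernel's BUG() that has no NDEBUG switch and therefore checks in every build. None of this is kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical BUG()-like check: always compiled in, reports where it
 * fired, then stops the program. */
#define MY_BUG_ON(cond)                                             \
    do {                                                            \
        if (cond) {                                                 \
            fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);  \
            abort();                                                \
        }                                                           \
    } while (0)

/* Good: the assert only reads state, so behavior is identical whether or
 * not NDEBUG removes the check. */
static int first_byte(const char *arg)
{
    assert(arg != NULL);
    return arg[0];
}

/* Bad: the increment lives inside assert(), so building with -DNDEBUG
 * silently drops it and the program behaves differently. */
static int side_effects;
static void bump_inside_assert(void)
{
    assert(++side_effects > 0);
}

/* The bounds check runs in every build, debug or not. */
static int checked_index(int i, int len)
{
    MY_BUG_ON(i < 0 || i >= len);
    return i;
}
```

Built normally, one call to bump_inside_assert() leaves side_effects at 1; built with -DNDEBUG it stays at 0, which is exactly the divergence a properly written assert must avoid.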

9. Spam On linux-kernel

15 Mar 2000 - 17 Mar 2000 (4 posts) Archive Link: "Fantastic filter for linux-kernel"

Topics: FS: ReiserFS, Spam

People: Dominik Kubla, Andrew Morton, Jason Gunthorpe, Mike A. Harris

Mike A. Harris said that the following procmail recipe would catch virtually all spam coming to linux-kernel:

:0 W
* ^Sender: owner-linux-.*@vger\.rutgers\.edu
* ! ^(((To|Cc):)|( )).*linux.*@vger\.rutgers\.edu

Dominik Kubla replied that

:0 W
* ^Sender: owner-linux-.*@vger\.rutgers\.edu
* ! ^TO_.*linux.*@vger\.rutgers\.edu

might be better "as per procmail manual, or you might lose resent/redistributed or bcc-ed emails, which are rare on lmkl but still should be considered "legal"..."

Andrew Morton thought this was a clever solution, and added, "This, and a resolution of the utterly abysmal lag at vger would make lkml a more pleasant place to be." Jason Gunthorpe also added, "Yeah, the ~6 (?) hour lag sucks" [...] "It is particularly pointed with the recent reiserfs threads, I read a message on the reiser list and then a half day later the cc comes up here. I guess I'm on a bad exploder or something. :<"
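Procmail applies a recipe only when all of its "*" conditions match, and a leading "!" negates a condition, so Mike's recipe selects mail that carries the list's Sender: header but is not addressed to any linux list at vger. The C sketch below re-implements just those two conditions with POSIX <regex.h>, matching case-insensitively as procmail does by default. The function names are hypothetical, and reading the "( )" alternative in Mike's pattern as a space or tab that catches folded header lines is an assumption. Dominik's variant replaces the hand-written To/Cc test with procmail's ^TO_ macro, which also covers other destination headers such as Resent-To:.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* True if any header line matches the extended regex `pattern`,
 * ignoring case as procmail does unless told otherwise. */
static bool any_header_matches(const char *const *headers, size_t n,
                               const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return false;
    bool hit = false;
    for (size_t i = 0; i < n && !hit; i++)
        hit = regexec(&re, headers[i], 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}

/* Mike's two conditions: the list software set Sender:, but no To:/Cc:
 * line (or folded continuation line) names a linux list at vger. */
static bool looks_like_list_spam(const char *const *headers, size_t n)
{
    bool from_list = any_header_matches(headers, n,
        "^Sender: owner-linux-.*@vger\\.rutgers\\.edu");
    bool addressed_to_list = any_header_matches(headers, n,
        "^((To|Cc):|[ \t]).*linux.*@vger\\.rutgers\\.edu");
    return from_list && !addressed_to_list;
}
```

A message whose only destination is an unrelated address is flagged, while one sent To: or Cc: linux-kernel, including on a folded continuation line, is left alone.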

10. Jiffies Wraparound

15 Mar 2000 - 17 Mar 2000 (7 posts) Archive Link: "jiffies wraparound"

People: Andrea Arcangeli, Lech Szychowski

Nicholas Vinen had read in "Linux Device Drivers" that the jiffies counter would wrap after a year of uptime, and that this could cause strange behavior. He asked whether any auditing or fixing had been done to deal with this, and Andrea Arcangeli replied, "We found and fixed all the design problem related to jiffies wrap arounds and audited all the drivers before 2.2.x." Lech Szychowski added anecdotally, "I've got one machine that's been up and running since Sep 24, 1996. That's two wraps already; haven't seen any strange things happening." In a later message, he clarified that his box had been up for around 1270 days, or almost three and a half years.
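The usual way kernel code survives a wrap is to compare tick values by subtracting them in unsigned arithmetic and looking at the sign of the difference, which is what the kernel's time_after() macro does. The userspace sketch below models a 32-bit counter at HZ=100, the i386 default of the time; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t jiffies_t;        /* hypothetical 32-bit tick counter */
#define TICKS_PER_SEC 100u         /* HZ = 100 */

/* Wrap-safe "a is later than b": subtract in unsigned arithmetic, then read
 * the difference as signed. Correct across the 2^32 boundary as long as the
 * two times are less than 2^31 ticks apart. (The unsigned-to-signed
 * conversion is implementation-defined before C23, but behaves as two's
 * complement on mainstream compilers.) */
static int ticks_after(jiffies_t a, jiffies_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Naive comparison that gives the wrong answer once the counter wraps. */
static int ticks_after_naive(jiffies_t a, jiffies_t b)
{
    return a > b;
}

/* Days until a 32-bit counter ticking at TICKS_PER_SEC wraps to zero. */
static double days_to_wrap(void)
{
    return 4294967296.0 / TICKS_PER_SEC / 86400.0;
}
```

At 100 ticks per second the counter wraps after about 497 days, so Lech's roughly 1270 days of uptime spans two full wraps (1270 / 497 ≈ 2.6).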

11. Makefile Bug Fix

18 Mar 2000 - 20 Mar 2000 (6 posts) Archive Link: "[PATCH] Additional patch for toplevel Makefile driver lists"

People: Brian Gerst, Jamie Lokier, Linus Torvalds

Brian Gerst posted a 3-line makefile patch, and explained, "Some shells (bash 2.x specifically) don't like environment variables with dashes in them. This patch makes them not exported. This problem doesn't show up in the other makefiles because they do not export all their variables to the environment like the toplevel Makefile does." He added, "Linus, this is a resend of the patch to you. I'm not certain it made it to you the first time." Linus Torvalds replied that this seemed to work around a misfeature rather than fixing the underlying design properly. Brian replied with a new patch, explaining, "How does this look as a first stab? The magic is in the MAKEFILES variable. It causes the sub-makes to read in .config before the rest of the Makefile instead of having to add it to every single file. I do not know if this is specific to GNU make or not. I tested it with i386, and I checked the rest of the arches for variables that needed to be exported." Linus was very pleased with the patch, even though the config file would be read and parsed multiple times. It seemed fundamentally correct to him, and he asked if anyone disagreed. Jamie Lokier replied, "The alternative is reading and parsing the environment variables, and they have to be copied via the execve too. I should think there's not a lot of difference between the two -- except the config file approach works :-)" End Of Thread.
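The dash problem comes from the shell's naming rule: POSIX shells only recognise words made of letters, digits and underscores, not starting with a digit, as variable names. A make variable with a dash in its name can still be exported into the environment, but a shell cannot refer to it, and, as Brian found with bash 2.x, some shells object to it. A small check for that rule, with a hypothetical function name; "DRIVERS-y" below is only an illustrative dashed name, not a claim about the exact variables in the 2.3 Makefile.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: does `s` follow the POSIX rule for shell variable
 * names (letters, digits and underscores, not starting with a digit)? */
static bool is_shell_name(const char *s)
{
    if (*s == '\0' || isdigit((unsigned char)*s))
        return false;
    for (; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s) && *s != '_')
            return false;
    return true;
}
```

is_shell_name("CONFIG_PCI") is true, while a dashed name such as "DRIVERS-y" is rejected; reading .config through MAKEFILES sidesteps the question by not exporting such names at all.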

12. Removing Tests From ext2 Mounts

20 Mar 2000 (12 posts) Archive Link: "Patch to make ext2 mounts go faster...."

Topics: FS: ext2

People: Theodore Y. Ts'o, Stephen C. Tweedie, Oliver Xymoron, Stefan Monnier, Vojtech Pavlik

Theodore Y. Ts'o posted a patch against 2.3.99-pre2, and announced, "The following patches makes ext2 mounts go faster by removing the pointless counting of all of the free blocks and inodes in the bitmaps to make sure they match up with the block group descriptors. The checks take a huge amount of time, and are completely duplicated by the checks done by fsck. Furthermore, the things which they check (the free blocks/inodes counts), if wrong, won't critically impact ext2 performance. Hence, by removing this check, we speed up the mounting of ext2 filesystems significantly. I've been recommending that people mount filesystems "-o check=none" for a while. This simply makes this the default." Nasser Abbasi felt that the patch was wrong, because 'mount' should always check whether or not it was mounting a clean filesystem; but Stephen C. Tweedie quickly explained, "That still happens: there is an entirely separate set of data structures in the superblock which keep track of uncleanly-umounted filesystems." Nicholas Vinen also replied to Nasser, coming out against the patch. He argued that since users could already skip the check from userspace with the '-o check=none' mount option, that was better than going to the lengths of patching the kernel. Theodore and Vojtech Pavlik both replied that the checks were unnecessary to begin with; and Oliver Xymoron also replied to Nicholas, "No, it's best to have extensive checks in user space rather than simple, yet expensive, checks in the kernel. The checks have been there forever and haven't caught much that isn't caught by the dirty fs flag." But Michael Weller disagreed, pointing out that in the unpatched situation, it would still be easy for experienced users to add the "nocheck" option to their '/etc/fstab', while inexperienced users would not know enough to enable tests that were not on by default. Stefan Monnier reiterated that the tests really didn't belong in the kernel; and Theodore also replied to Michael:

There are only two problems with your thesis:

  1. The check, although it is quite time consuming, only checks the bitmap blocks to see if they match with the superblock group descriptors. On filesystems with 4k blocks, this means that you only catch errors that occur in 0.009% of the filesystem. (On filesystems with 1k blocks, you're checking 0.03% of the filesystem --- in both cases, it's well less than a tenth of a percent.) Granted, the bitmap blocks get written a lot, but it still means that a large number of potential disk corruptions might not get noticed by the check.
  2. If there are problems, the kernel simply prints a warning. Many of the non-experienced users are likely to not notice or ignore the warning messages which appear at boot-time anyway.
  3. If someone *really* wants to do a potentially time-consuming minimalistic check, I can add it to e2fsck (it basically means skipping passes 1 through 4 and only doing pass #5), but quite frankly, I'm still not convinced it's worth the cost/benefit ratio. Still, people who are really paranoid could use this if they really want.

End of thread.

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.