Matthew Garrett ([info]mjg59) wrote,
@ 2009-03-27 00:02:00
Entry tags:advogato, fedora

Reducing disk use
UNIX filesystems generally store three pieces of timing information about files: ctime (when the file was last changed in any way), mtime (when the file contents, as opposed to its metadata, were last changed) and atime (when the file was last accessed by any process). This is a usefully flexible system, but the semantics of atime can be troublesome. atime must be updated every time a file is read, turning a read operation into a read/write operation. This results in a surprising amount of I/O being generated in normal filesystem use, slowing the more relevant I/O and causing disks to spin up because atime updates are required even if the file was read out of cache. It also results in a lot of unnecessary activity on flash media, which may reduce their lifetime.

One option is to disable atime updates entirely. The problem with this approach is that certain applications depend on atime. This is especially common in mail clients which compare atime to mtime in order to determine whether a mailbox has been read since it was last modified. So, unfortunately, disabling atime entirely is impractical as a default. Back in 2006, Valerie Aurora submitted a patch that worked around this issue. The new relatime option meant that atime would only be updated if it would otherwise be older than ctime or mtime. Mail clients became happy and the world rejoiced.
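As a rough illustration of the mail client check described above, here is a minimal Python sketch. The function name `mailbox_has_new_mail` is made up for this example, and real mail clients may apply additional conditions; the only thing taken from the description is the comparison of atime against mtime.

```python
import os

def mailbox_has_new_mail(path):
    """Guess whether a mailbox was modified since it was last read.

    Hypothetical helper: mail delivery appends to the file (bumping
    mtime), while reading it bumps atime. If mtime is newer than atime,
    nothing has read the mailbox since the last delivery.
    """
    st = os.stat(path)  # stat() itself does not update atime
    return st.st_size > 0 and st.st_mtime > st.st_atime
```

With atime updates disabled entirely, atime never catches up with mtime after a delivery, so a check like this would report new mail forever. That is the breakage relatime was designed to avoid.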

Unfortunately, it turned out that there was one other common case of atime being used. Applications like tmpwatch monitor files in /tmp and delete them if they appear unused. In this case, "unused" means "has an atime older than a certain date". Since merely reading files doesn't update the ctime or mtime, relatime wouldn't cause the atime on these files to be updated and tmpwatch would happily delete them - even if users were reading them on a daily basis.
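To make the failure mode concrete, here is a small Python sketch of the kind of test a /tmp cleaner applies. It is not tmpwatch's actual code; `stale_files` and its parameters are names invented for this example, and it only looks at regular files in a single directory.

```python
import os

def stale_files(directory, max_age_seconds, now):
    """Return regular files whose atime is older than max_age_seconds.

    Hypothetical sketch of an atime-based cleaner. Only atime is
    consulted, so a file that is read daily but whose atime is never
    updated looks just as dead as one nobody has touched.
    """
    stale = []
    for entry in os.scandir(directory):
        # Skip directories, sockets and symlinks in this simple sketch.
        if not entry.is_file(follow_symlinks=False):
            continue
        st = entry.stat(follow_symlinks=False)
        if now - st.st_atime > max_age_seconds:
            stale.append(entry.path)
    return sorted(stale)
```

Under the original relatime behaviour, the atime seen here would stay frozen for any file that is read but never written, which is exactly why such files ended up on the deletion list.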

Ingo Molnar submitted a patch to add a further heuristic to the relatime behaviour. With it, the atime of a file will be updated if it's older than mtime, older than ctime or (and this is the important one) more than 24 hours in the past. This deals with the tmpwatch case nicely, while still providing a significant reduction in the quantity of atime updates.
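The heuristic is small enough to write down. The following Python sketch follows the description above rather than the kernel source; the function name, the parameter names and the treatment of exactly equal timestamps as stale are assumptions of this example.

```python
DAY_SECONDS = 24 * 60 * 60

def relatime_needs_update(atime, mtime, ctime, now, max_age=DAY_SECONDS):
    """Decide whether a read should write a new atime to disk.

    All arguments are timestamps in seconds. Illustrative sketch only.
    """
    # Original relatime rules: atime has fallen behind a modification.
    if atime <= mtime:
        return True
    if atime <= ctime:
        return True
    # The additional rule: the stored atime is at least a day old.
    return now - atime >= max_age
```

A file that is read constantly but never written therefore costs at most one atime write per day, while tmpwatch-style tools still see an atime that is never wrong by more than about 24 hours.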

Fedora shipped this patch for several releases, and Ubuntu have used it by default since 8.04. Unfortunately there were some concerns over certain aspects of its behaviour (in respect to its interface as opposed to the relatime functionality itself) and it never got merged. I pushed a trimmed down version that purely implements the change to the relatime behaviour, and earlier today Linus merged it and a further patch that makes relatime the default behaviour on Linux.

Most users won't notice this change in behaviour at all, other than as a small improvement in I/O performance and a reduction in the number of drive spinups. For users that do have issues, a new strictatime mount option has been added. Using it will require an updated mount command, but that's a trivial patch. I'd be surprised if there are any real world use cases that are negatively affected by this, especially since it's been default behaviour in several distributions for a while, but there's always the potential that someone will be tripped up by it. We'll see.




(48 comments) - (Post a new comment)


[info]lionsphil
2009-03-27 12:50 am UTC (link)
Woot. Fairly unpleasant hackery, but pleasantly pragmatic given the constraints of UNIX.

(I love how this human-check thing thinks that "12:60-1" is a word.)

(Reply to this)

Let's stop relying on this misfeature
(Anonymous)
2009-03-27 01:46 am UTC (link)
Shouldn't we simply stop relying on this misfeature? Mutt and tmpwatch be damned! If a few popular distros made noatime the default, app devs would eventually think twice before relying on it. Consider that even relatime still carries a heavy penalty. It's just not worth it.

(Reply to this) (Thread)

Re: Let's stop relying on this misfeature
[info]mjg59
2009-03-27 01:47 am UTC (link)
How would you implement the same functionality?

(Reply to this) (Parent)(Thread)

Re: Let's stop relying on this misfeature
(Anonymous)
2009-03-27 04:27 am UTC (link)
tmpwatch is a heuristic anyway -- we can imagine cases where even the atime would lie about whether something in /tmp is important enough to keep around. So you could use a heuristic that relies on mtime and administrators would just have to live with it by listing exceptions for files that are intended to remain unmodified for long periods of time.

mutt could maintain its own cache indicating when the last time *it* accessed the mailbox was, or it could read through the mail files to find out whether there was new mail in them.

popcon could just not report "vote" popularity

(Reply to this) (Parent)

Re: Let's stop relying on this misfeature
[info]zaitcev
2009-03-27 04:30 am UTC (link)
Maildir is an easy answer for MUA.
The tmpwatch can just use mtime.

(Reply to this) (Parent)(Thread)

Re: Let's stop relying on this misfeature
(Anonymous)
2009-03-27 05:50 am UTC (link)
With the original relatime code, tmpwatch was effectively using just the mtime (since the atime would only be updated when mtime increased). That caused real problems, which explains the changes.

So "just use mtime" would likely reintroduce those same problems.

(Reply to this) (Parent)


[info]king_of_wrong
2009-03-27 09:30 am UTC (link)
Who is relying on "temporary" files being available for reading for more than 24 hours?

(Reply to this) (Parent)(Thread)


[info]tienelle
2009-03-27 11:45 am UTC (link)
If sockets count, my ssh-agent.

(Reply to this) (Parent)(Thread)


[info]sweh
2009-03-27 04:18 pm UTC (link)
Make your agent store the socket in $HOME/.ssh ? /tmp is the wrong place.

(Oh you clever recaptcha.. how do I enter the "1/2" character? Huh? Huh?)

(Reply to this) (Parent)(Thread)


[info]ewx
2009-03-27 04:27 pm UTC (link)
I imagine /tmp is chosen because it's the only place that is both writeable and unique to the local machine ($HOME often failing on the latter). Still, you wouldn't think sticking uname -n into the path was that onerous, would you?

(Reply to this) (Parent)(Thread)


[info]sweh
2009-03-27 05:12 pm UTC (link)
Or specify /var/run/ssh-agent/$USER (similar to /var/run/sudo) or... lots of possibilities. Not /tmp :-)

(Reply to this) (Parent)


[info]tienelle
2009-03-27 04:29 pm UTC (link)
/tmp is the default; what properties make it wrong?

Also, I'm probably evil bad and wrong, but my /tmp is a RAM drive; having my old ssh-agent socket evaporate when the computer's improperly shut down removes a minor inconvenience.

(Reply to this) (Parent)(Thread)


[info]sweh
2009-03-27 05:09 pm UTC (link)
/tmp is a shared resource (so security files, which ssh-agent socket is) definitely don't belong there. Contents of /tmp and /var/tmp may be deleted in a site-specific manner. No strong assumptions can be made about the persistence of data in /tmp or /var/tmp.

Historically, systems would just delete files with an mtime over a few days old, e.g. my old SunOS 4 machine (decommissioned 8 years ago, but the backups are still online!):

0 3 * * * ( cd /tmp ; find . -type f -mtime +2 -exec rm -f {} \; ) > /dev/null 2>&1

If you expect a file to be retained for more than a few hours then use your home directory or /var/run/ssh-agent or similar.

(Reply to this) (Parent)(Thread)


[info]ewx
2009-03-27 05:58 pm UTC (link)
Deleting 'old' files in /tmp is hardly universal behaviour! I'd say it's a problem for people writing /tmp cleaners to arrange for them to cope with things that have different lifetimes to the ones they imagined.

(Reply to this) (Parent)(Thread)


[info]king_of_wrong
2009-03-27 09:41 pm UTC (link)
Which part of "temporary" is unclear?

(Reply to this) (Parent)(Thread)


[info]ewx
2009-03-28 10:42 am UTC (link)
The part you're inferring a hard time limit from. In the case of /tmp it might simply be that it's not backed up, or doesn't survive reboots - both very common de facto policies, and both plainly temporary compared to other parts of the filesystem.

(Reply to this) (Parent)(Thread)


[info]king_of_wrong
2009-03-28 02:24 pm UTC (link)
The problem is that people are inferring "in use" from timestamps rather than reference counts.

Anything in /tmp that isn't "in use" is - and should be - at risk. Most of the stuff in /tmp probably shouldn't be, and certainly should be deleted on exit (obviating cleaners), so this entire discussion is how to work around catastrophically broken code instead of changing it. Why is that?

(Reply to this) (Parent)(Thread)


[info]ewx
2009-03-28 02:27 pm UTC (link)
Where are you getting these "shoulds" about /tmp from? As far as I can see they're just one set of opinions.

(Reply to this) (Parent)(Thread)


[info]king_of_wrong
2009-03-28 03:12 pm UTC (link)
Option 1: /tmp is for temporary files.

When the creating process goes away, or when the file is dropped, it's automatically deleted.

Implications: no need for tmp cleaners, trivial auto-delete, any software which relies on persistence of /tmp is broken.


Option 2: /tmp is just a name.

Files are persistent, existing for as long as desired. Explicit clean-up.

Implications: no need for tmp cleaners, any software which shits files all over /tmp is broken.


Option 3: /tmp is for files which are temporary, except when they're not.

Implications: OS needs to magically know which unused files are dead and which unused files are live - attempted through an increasingly-hairy set of heuristics which satisfy nobody.


Figure out your own "shoulds" from that, or don't, and kick it over the wall as Someone Else's Problem and accept that the OS is fucked because everyone else will do the same.

(Reply to this) (Parent)


[info]mjg59
2009-03-28 02:27 pm UTC (link)
By what definition is a file that's frequently accessed not in use?

(Reply to this) (Parent)(Thread)


[info]king_of_wrong
2009-03-28 02:50 pm UTC (link)
By what definition is a file that's been used, and may be used again, but isn't currently in use, "in use"? And, more to the point, why are people relying on that?

There's a good case for temporary files not being able to be re-opened or shared between processes. It cuts down all the brokenness of files being created and read "frequently", but being "temporary" for some increasingly tenuous definition... If you get to that level, all files are temporary as they will - one day - be replaced or deleted.

(Reply to this) (Parent)(Thread)


[info]mjg59
2009-03-28 02:57 pm UTC (link)
There's a good case for that, and maybe if we redesigned Unix it'd be a worthy design goal. But we're not, so worrying about it is just wanking.

(Reply to this) (Parent)(Thread)


[info]king_of_wrong
2009-03-28 03:24 pm UTC (link)
How hard can it be to change where some app puts files? Or to ensure that it deletes them after it's done?

This isn't a Unix problem, this is an application problem. For the example you gave someone, somewhere has clearly thought "Ooh, I'll create a file in /tmp and then people can use it later". Who? Why? When will it no longer be useful? Where is the code to delete it when it's dead?

(Reply to this) (Parent)(Thread)


[info]mjg59
2009-03-28 03:27 pm UTC (link)
It's not "some app". It's "a large number of apps". Traditional Unix behaviour has been to either delete the contents of /tmp at boot or reap them after some arbitrary period of time, and while that could be changed it'd involve a great deal of effort for no benefit.

(Reply to this) (Parent)


[info]ewx
2009-03-28 02:28 pm UTC (link)
BTW, your /tmp cleaner has a security hole.

(Reply to this) (Parent)(Thread)


[info]sweh
2009-03-28 03:23 pm UTC (link)
Not my cleaner. That was the default "out of the box" crontab entry from SunOS 4, written for a more civilised age before attacks using deliberately malformed filenames etc etc were even considered.

Many things that went unchecked for decades are now clearly and obviously bad; writing secure scripts isn't as easy as people thought, back then!

(Reply to this) (Parent)


[info]xlerb
2009-03-28 04:14 am UTC (link)
Also, I'm probably evil bad and wrong, but my /tmp is a RAM drive

It's a relatively common practice; even the BSD hier(7) man page states that /tmp is “usually” a memory filesystem and not preserved over reboots (by contrast with /var/tmp, which is preserved, and as such used for vi recovery files).

(Reply to this) (Parent)


[info]xlerb
2009-03-28 04:01 am UTC (link)
(Oh you clever recaptcha.. how do I enter the "1/2" character? Huh? Huh?)

Compose 1 2, thusly: ½.

(Oh, great, and then I get a 3½ in mine.)

(Reply to this) (Parent)

Re: Let's stop relying on this misfeature
(Anonymous)
2009-03-27 09:32 am UTC (link)
How do you implement it sanely with old "POSIX" behaviour?
mutt must not use atime because atime doesn't mean user read mail:
- backup, antivirus, or a user grep (to check old email)
- other MUAs or notification programs like xbiff (if a person reads mail only in mutt, noatime is not necessary in mutt): they could look at the mail headers (maybe to display the subject), so I miss new mail in mutt
- I doubt atime could be implemented sanely on a network fs.

So "strictatime" is useful only for sysadmins who know that there is no extraneous access (maybe for backups, for optimization)

popcon case is interesting, but anyway not so precise (backups, checksum backups), for sid user (atime > mtime).

(Reply to this) (Parent)

Re: Let's stop relying on this misfeature
[info]simonkagstrom
2009-03-27 04:14 pm UTC (link)
Can't inotify be used to implement these features without relying on atime?

(Reply to this) (Parent)(Thread)

Re: Let's stop relying on this misfeature
[info]mjg59
2009-03-27 04:17 pm UTC (link)
No, for a variety of reasons including the limited number of inotify handles available per user.

(Reply to this) (Parent)(Thread)

Re: Let's stop relying on this misfeature
[info]simonkagstrom
2009-03-27 04:27 pm UTC (link)
Even for tmpwatch? Isn't it possible to just add a watch for the directory?

But ok, even if there would be enough handles I guess it would cause a lot of processing without much real gain.

(Reply to this) (Parent)

popcon
(Anonymous)
2009-03-27 04:20 am UTC (link)
Popcon will notice. It checks atime to indicate how many people are actually using a package as opposed to merely having it installed. With old relatime behavior, I think Ubuntu had much lower "vote" totals than Debian. With the new relatime behavior, the vote totals should be the same as having full atime updating.

(Reply to this)

Fine
[info]dnivie
2009-03-27 09:07 am UTC (link)
It's probably a reasonable solution in practice, but it still feels a lot like a band-aid for a hack.

relatime feels like a hack to begin with; "atime" is supposed to be "access-time", with relatime it really isn't, instead it's "first-read-after-last-modification" which isn't the same thing at all, even though it *works* the same for some common use-cases such as mutt.

And now this, what's so magical about *24* hours anyway ? Why not 1,6,12 or 48 ? Feels like a completely arbitrary band-aid for something which clearly was a hack to begin with.

"Someone read the mailbox *AFTER* it was last modified" was never a good way to determine that "user has already been informed of the new mail" anyway, arguably even THAT is a hack.

I say fuck it, mount stuff noatime and deal with the resulting (fairly minor) breakage.

(Reply to this) (Thread)

Re: Fine
[info]king_of_wrong
2009-03-27 09:26 am UTC (link)
"Someone read the mailbox *AFTER* it was last modified" was never a good way to determine that "user has already been informed of the new mail" anyway

Indeed! Relying on the filesystem metadata to provide email metadata is ugly and unsound behaviour.

Why not explicitly store the last notification time, and separate "read" flags for each message? That also deals with the case where a user has multiple new mails, but doesn't read them all.

Worst case guarantee from a system crash can (ext4 aside) be that some previously-read mails become "unread" again, and you're told again that you have "new" mail.

(Reply to this) (Parent)(Thread)

Re: Fine
(Anonymous)
2009-03-27 03:59 pm UTC (link)
It is, in fact, only mutt that relies on atime for this. Every other mail server and mailer is sane and keeps its own separate cache.

(Reply to this) (Parent)(Thread)

Re: Fine
[info]valhenson
2009-03-27 06:03 pm UTC (link)
And in fact, if I recall correctly, mutt only uses atime if it is compiled with certain configuration parameters.

(Reply to this) (Parent)

Re: Fine
(Anonymous)
2009-03-27 09:35 am UTC (link)
24h is a reasonable time, but it is only the default value. It could be modified by mount options. (I did not verify this, but it was a requirement on LKML, and the reason relatime was rejected last time.)

(Reply to this) (Parent)


[info]simont
2009-03-27 09:19 am UTC (link)
Unfortunately, it turned out that there was one other common case of atime being used.

On a similar note, last year I wrote a du utility which makes heavy use of atimes to help identify the difference between large data you're actually still using and large data you untarred once and forgot about. So I've been disturbed by rumours that (old-style) relatime was considering becoming the default. But the 24-hour change sounds excellent to me; the loss of resolution won't bother my application noticeably, and having roughly right atimes continue to be available by default is just what I want.

(Reply to this) (Thread)


[info]lionsphil
2009-03-27 03:33 pm UTC (link)
Ooh. That'd be a useful addition to Baobab.

(Reply to this) (Parent)

Some other "natural" atime use cases
[info]bpineau
2009-03-27 10:02 am UTC (link)
I can think of two other use cases for atime:

- Tools like Debian's popularity-contest, i.e. tools that list which installed packages/binaries you use the most (Debian collects those stats to place the more popular packages on the first release CD-ROMs), or which packages aren't used on your system (and which you may want to remove), etc. Those tools use the atime of a package's contents to evaluate usage frequency. That's fragile in other ways (for instance, daily backups may break such a heuristic), and Ingo's improvement should fix the problem for (most of) them anyway.

- Postfix uses atime on deferred mails in the queue to order retries properly (I mean, to avoid hammering the same destination when delivery failed a few seconds ago, and to give high priority to mails that have not been touched for the longest time). Recent Postfix versions include a workaround for noatime mounts, which just forces the atime update with utime(2)/futime(2). This atime-ordered file workqueue method seems natural enough; chances are it is used by other software (MTAs or not), maybe without the Postfix workaround...
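A minimal Python sketch of that atime-ordered workqueue idea, with the explicit update that makes it independent of mount options. This is not Postfix's code; `mark_attempt` and `retry_order` are hypothetical names, and `os.utime` stands in for the utime(2) call mentioned above.

```python
import os

def mark_attempt(path, when):
    """Record a delivery attempt by setting atime explicitly.

    The mtime is preserved. Because the timestamp is written directly,
    this works even on a filesystem mounted noatime.
    """
    st = os.stat(path)
    os.utime(path, (when, st.st_mtime))

def retry_order(paths):
    """Return queue files sorted so the least recently tried comes first."""
    return sorted(paths, key=lambda p: os.stat(p).st_atime)
```

Software that uses the same ordering trick but relies on reads to bump atime implicitly would see its queue order degrade under noatime, and lose resolution under relatime.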

(Reply to this) (Thread)

Re: Some other "natural" atime use cases
(Anonymous)
2009-03-27 10:18 am UTC (link)
mailq will reset all retries? If I have 100 recipients, the mail will be copied 100 times, in order to allow "partial" deliveries? For long term retry (I think RFC recommend to retry for one month) a backup system (6h backups) will block it?

How could a sysadmin control what is wrong, without breaking the queue?

IMHO an MTA should write status to a separate file, with some more information (4xx errors, no route to host, DNS problems), and *read* it for queue management (when data is not already loaded).

IMHO POSIX atimes are too weak to be used by portable programs (but OK for sysadmins and local settings)

(Reply to this) (Parent)

Keep atime interface but change on-disk format?
(Anonymous)
2009-03-27 11:52 am UTC (link)
As I was reading this, I got a weird idea, which I am not sure has any merit, but hey: what about not storing atime verbatim on disk, but in a form that needs less updating?

I.e., you could store a flag with the two meanings: "has been accessed in the last hour" and "has not been used since mtime". I would imagine that this flag changes not very often.

The filesystem code will need to compute the real atime from this, which might be a bit tricky for "has been accessed in the last hour". Maybe there is enough information on disk already to give this a meaning when a filesystem is mounted, e.g., a general filesystem-wide atime that is updated infrequently.

The "one hour" would be tuneable, of course.

Dunno, just an idea.

(Reply to this) (Thread)

Re: Keep atime interface but change on-disk format?
[info]lionsphil
2009-03-27 12:35 pm UTC (link)
'"has been accessed in the last hour"...I would imagine that this flag changes not very often.'

Er. That would involve toggling the flag an hour after the last write for every file written. Not to mention how it would work across unmounts.

I agree that, if atime is going to be wibbled away into basically a couple of flags, maybe it doesn't have to be a timestamp, but storing a time is generally a reasonable way to do "has been X since Y" tests.

(Reply to this) (Parent)


[info]sweh
2009-03-27 04:22 pm UTC (link)
"atime" is a bad idea, anyway. Nightly backups using file-system sweeps (ie not "dump") can reset atime details. Programs like tmpwatch are a kludge for people misusing /tmp.

(Reply to this) (Thread)

file-system sweeps only update atime for directories
[info]joe_buck
2009-03-27 06:14 pm UTC (link)
Doing "stat" on a file doesn't affect the atime.

(Reply to this) (Parent)(Thread)

Re: file-system sweeps only update atime for directories
[info]sweh
2009-03-27 06:52 pm UTC (link)
No, but adding the file to a tarball / Netbackup / whatever does...

(Reply to this) (Parent)(Thread)

Re: file-system sweeps only update atime for directories
(Anonymous)
2009-07-04 06:24 am UTC (link)
so rsync it somewhere before you tar it up/copy it

(Reply to this) (Parent)

