|
|
==============
|
|
|
File Time Type
|
|
|
==============
|
|
|
|
|
|
.. contents::
|
|
|
:local:
|
|
|
|
|
|
.. _file-time-type-motivation:
|
|
|
|
|
|
Motivation
|
|
|
==========
|
|
|
|
|
|
The filesystem library provides interfaces for getting and setting the last
|
|
|
write time of a file or directory. The interfaces use the ``file_time_type``
|
|
|
type, which is a specialization of ``chrono::time_point`` for the
|
|
|
"filesystem clock". According to [fs.filesystem.syn]
|
|
|
|
|
|
trivial-clock is an implementation-defined type that satisfies the
|
|
|
Cpp17TrivialClock requirements ([time.clock.req]) and that is capable of
|
|
|
representing and measuring file time values. Implementations should ensure
|
|
|
that the resolution and range of file_time_type reflect the operating
|
|
|
system dependent resolution and range of file time values.
|
|
|
|
|
|
|
|
|
On POSIX systems, file times are represented using the ``timespec`` struct,
|
|
|
which is defined as follows:
|
|
|
|
|
|
.. code-block:: cpp
|
|
|
|
|
|
struct timespec {
|
|
|
time_t tv_sec;
|
|
|
long tv_nsec;
|
|
|
};
|
|
|
|
|
|
To represent the range and resolution of ``timespec``, we need to (A) have
|
|
|
nanosecond resolution, and (B) use more than 64 bits (assuming a 64 bit ``time_t``).
|
|
|
|
|
|
As the standard requires us to use the ``chrono`` interface, we have to define
|
|
|
our own filesystem clock which specifies the period and representation of
|
|
|
the time points and duration it provides. It will look like this:
|
|
|
|
|
|
.. code-block:: cpp
|
|
|
|
|
|
struct _FilesystemClock {
|
|
|
using period = nano;
|
|
|
using rep = TBD; // What is this?
|
|
|
|
|
|
using duration = chrono::duration<rep, period>;
|
|
|
using time_point = chrono::time_point<_FilesystemClock>;
|
|
|
|
|
|
// ... //
|
|
|
};
|
|
|
|
|
|
using file_time_type = _FilesystemClock::time_point;
|
|
|
|
|
|
|
|
|
To get nanosecond resolution, we simply define ``period`` to be ``std::nano``.
|
|
|
But what type can we use as the arithmetic representation that is capable
|
|
|
of representing the range of the ``timespec`` struct?
|
|
|
|
|
|
Problems To Consider
|
|
|
====================
|
|
|
|
|
|
Before considering solutions, let's consider the problems they should solve,
|
|
|
and how important solving those problems are:
|
|
|
|
|
|
|
|
|
Having a Smaller Range than ``timespec``
|
|
|
----------------------------------------
|
|
|
|
|
|
One solution to the range problem is to simply reduce the resolution of
|
|
|
``file_time_type`` to be less than that of nanoseconds. This is what libc++'s
|
|
|
initial implementation of ``file_time_type`` did; it's also what
|
|
|
``std::system_clock`` does. As a result, it can represent time points about
|
|
|
292 thousand years on either side of the epoch, as opposed to only 292 years
|
|
|
at nanosecond resolution.
|
|
|
|
|
|
``timespec`` can represent time points +/- 292 billion years from the epoch
|
|
|
(just in case you needed a time point 200 billion years before the big bang,
|
|
|
and with nanosecond resolution).
|
|
|
|
|
|
To get the same range, we would need to drop our resolution to that of seconds
|
|
|
to come close to having the same range.
|
|
|
|
|
|
This begs the question, is the range problem "really a problem"? Sane usages
|
|
|
of file time stamps shouldn't exceed +/- 300 years, so should we care to support it?
|
|
|
|
|
|
I believe the answer is yes. We're not designing the filesystem time API, we're
|
|
|
providing glorified C++ wrappers for it. If the underlying API supports
|
|
|
a value, then we should too. Our wrappers should not place artificial restrictions
|
|
|
on users that are not present in the underlying filesystem.
|
|
|
|
|
|
Having a smaller range that the underlying filesystem forces the
|
|
|
implementation to report ``value_too_large`` errors when it encounters a time
|
|
|
point that it can't represent. This can cause the call to ``last_write_time``
|
|
|
to throw in cases where the user was confident the call should succeed. (See below)
|
|
|
|
|
|
|
|
|
.. code-block:: cpp
|
|
|
|
|
|
#include <filesystem>
|
|
|
using namespace std::filesystem;
|
|
|
|
|
|
// Set the times using the system interface.
|
|
|
void set_file_times(const char* path, struct timespec ts) {
|
|
|
timespec both_times[2];
|
|
|
both_times[0] = ts;
|
|
|
both_times[1] = ts;
|
|
|
int result = ::utimensat(AT_FDCWD, path, both_times, 0);
|
|
|
assert(result != -1);
|
|
|
}
|
|
|
|
|
|
// Called elsewhere to set the file time to something insane, and way
|
|
|
// out of the 300 year range we might expect.
|
|
|
void some_bad_persons_code() {
|
|
|
struct timespec new_times;
|
|
|
new_times.tv_sec = numeric_limits<time_t>::max();
|
|
|
new_times.tv_nsec = 0;
|
|
|
set_file_times("/tmp/foo", new_times); // OK, supported by most FSes
|
|
|
}
|
|
|
|
|
|
int main() {
|
|
|
path p = "/tmp/foo";
|
|
|
file_status st = status(p);
|
|
|
if (!exists(st) || !is_regular_file(st))
|
|
|
return 1;
|
|
|
if ((st.permissions() & perms::others_read) == perms::none)
|
|
|
return 1;
|
|
|
// It seems reasonable to assume this call should succeed.
|
|
|
file_time_type tp = last_write_time(p); // BAD! Throws value_too_large.
|
|
|
}
|
|
|
|
|
|
|
|
|
Having a Smaller Resolution than ``timespec``
|
|
|
---------------------------------------------
|
|
|
|
|
|
As mentioned in the previous section, one way to solve the range problem
|
|
|
is by reducing the resolution. But matching the range of ``timespec`` using a
|
|
|
64 bit representation requires limiting the resolution to seconds.
|
|
|
|
|
|
So we might ask: Do users "need" nanosecond precision? Is seconds not good enough?
|
|
|
I limit my consideration of the point to this: Why was it not good enough for
|
|
|
the underlying system interfaces? If it wasn't good enough for them, then it
|
|
|
isn't good enough for us. Our job is to match the filesystems range and
|
|
|
representation, not design it.
|
|
|
|
|
|
|
|
|
Having a Larger Range than ``timespec``
|
|
|
----------------------------------------
|
|
|
|
|
|
We should also consider the opposite problem of having a ``file_time_type``
|
|
|
that is able to represent a larger range than ``timespec``. At least in
|
|
|
this case ``last_write_time`` can be used to get and set all possible values
|
|
|
supported by the underlying filesystem; meaning ``last_write_time(p)`` will
|
|
|
never throw a overflow error when retrieving a value.
|
|
|
|
|
|
However, this introduces a new problem, where users are allowed to attempt to
|
|
|
create a time point beyond what the filesystem can represent. Two particular
|
|
|
values which cause this are ``file_time_type::min()`` and
|
|
|
``file_time_type::max()``. As a result, the following code would throw:
|
|
|
|
|
|
.. code-block:: cpp
|
|
|
|
|
|
void test() {
|
|
|
last_write_time("/tmp/foo", file_time_type::max()); // Throws
|
|
|
last_write_time("/tmp/foo", file_time_type::min()); // Throws.
|
|
|
}
|
|
|
|
|
|
Apart from cases explicitly using ``min`` and ``max``, I don't see users taking
|
|
|
a valid time point, adding a couple hundred billions of years in error,
|
|
|
and then trying to update a file's write time to that value very often.
|
|
|
|
|
|
Compared to having a smaller range, this problem seems preferable. At least
|
|
|
now we can represent any time point the filesystem can, so users won't be forced
|
|
|
to revert back to system interfaces to avoid limitations in the C++ STL.
|
|
|
|
|
|
I posit that we should only consider this concern *after* we have something
|
|
|
with at least the same range and resolution of the underlying filesystem. The
|
|
|
latter two problems are much more important to solve.
|
|
|
|
|
|
Potential Solutions And Their Complications
|
|
|
===========================================
|
|
|
|
|
|
Source Code Portability Across Implementations
|
|
|
-----------------------------------------------
|
|
|
|
|
|
As we've discussed, ``file_time_type`` needs a representation that uses more
|
|
|
than 64 bits. The possible solutions include using ``__int128_t``, emulating a
|
|
|
128 bit integer using a class, or potentially defining a ``timespec`` like
|
|
|
arithmetic type. All three will allow us to, at minimum, match the range
|
|
|
and resolution, and the last one might even allow us to match them exactly.
|
|
|
|
|
|
But when considering these potential solutions we need to consider more than
|
|
|
just the values they can represent. We need to consider the effects they will
|
|
|
have on users and their code. For example, each of them breaks the following
|
|
|
code in some way:
|
|
|
|
|
|
.. code-block:: cpp
|
|
|
|
|
|
// Bug caused by an unexpected 'rep' type returned by count.
|
|
|
void print_time(path p) {
|
|
|
// __int128_t doesn't have streaming operators, and neither would our
|
|
|
// custom arithmetic types.
|
|
|
cout << last_write_time(p).time_since_epoch().count() << endl;
|
|
|
}
|
|
|
|
|
|
// Overflow during creation bug.
|
|
|
file_time_type timespec_to_file_time_type(struct timespec ts) {
|
|
|
// woops! chrono::seconds and chrono::nanoseconds use a 64 bit representation
|
|
|
// this may overflow before it's converted to a file_time_type.
|
|
|
auto dur = seconds(ts.tv_sec) + nanoseconds(ts.tv_nsec);
|
|
|
return file_time_type(dur);
|
|
|
}
|
|
|
|
|
|
file_time_type correct_timespec_to_file_time_type(struct timespec ts) {
|
|
|
// This is the correct version of the above example, where we
|
|
|
// avoid using the chrono typedefs as they're not sufficient.
|
|
|
// Can we expect users to avoid this bug?
|
|
|
using fs_seconds = chrono::duration<file_time_type::rep>;
|
|
|
using fs_nanoseconds = chrono::duration<file_time_type::rep, nano>;
|
|
|
auto dur = fs_seconds(ts.tv_sec) + fs_nanoseconds(tv.tv_nsec);
|
|
|
return file_time_type(dur);
|
|
|
}
|
|
|
|
|
|
// Implicit truncation during conversion bug.
|
|
|
intmax_t get_time_in_seconds(path p) {
|
|
|
using fs_seconds = duration<file_time_type::rep, ratio<1, 1> >;
|
|
|
auto tp = last_write_time(p);
|
|
|
|
|
|
// This works with truncation for __int128_t, but what does it do for
|
|
|
// our custom arithmetic types.
|
|
|
return duration_cast<fs_seconds>().count();
|
|
|
}
|
|
|
|
|
|
|
|
|
Each of the above examples would require a user to adjust their filesystem code
|
|
|
to the particular eccentricities of the representation, hopefully only in such
|
|
|
a way that the code is still portable across implementations.
|
|
|
|
|
|
At least some of the above issues are unavoidable, no matter what
|
|
|
representation we choose. But some representations may be quirkier than others,
|
|
|
and, as I'll argue later, using an actual arithmetic type (``__int128_t``)
|
|
|
provides the least aberrant behavior.
|
|
|
|
|
|
|
|
|
Chrono and ``timespec`` Emulation.
|
|
|
----------------------------------
|
|
|
|
|
|
One of the options we've considered is using something akin to ``timespec``
|
|
|
to represent the ``file_time_type``. It only seems natural seeing as that's
|
|
|
what the underlying system uses, and because it might allow us to match
|
|
|
the range and resolution exactly. But would it work with chrono? And could
|
|
|
it still act at all like a ``timespec`` struct?
|
|
|
|
|
|
For ease of consideration, let's consider what the implementation might
|
|
|
look like.
|
|
|
|
|
|
.. code-block:: cpp
|
|
|
|
|
|
struct fs_timespec_rep {
|
|
|
fs_timespec_rep(long long v)
|
|
|
: tv_sec(v / nano::den), tv_nsec(v % nano::den)
|
|
|
{ }
|
|
|
private:
|
|
|
time_t tv_sec;
|
|
|
long tv_nsec;
|
|
|
};
|
|
|
bool operator==(fs_timespec_rep, fs_timespec_rep);
|
|
|
fs_int128_rep operator+(fs_timespec_rep, fs_timespec_rep);
|
|
|
// ... arithmetic operators ... //
|
|
|
|
|
|
The first thing to notice is that we can't construct ``fs_timespec_rep`` like
|
|
|
a ``timespec`` by passing ``{secs, nsecs}``. Instead we're limited to
|
|
|
constructing it from a single 64 bit integer.
|
|
|
|
|
|
We also can't allow the user to inspect the ``tv_sec`` or ``tv_nsec`` values
|
|
|
directly. A ``chrono::duration`` represents its value as a tick period and a
|
|
|
number of ticks stored using ``rep``. The representation is unaware of the
|
|
|
tick period it is being used to represent, but ``timespec`` is setup to assume
|
|
|
a nanosecond tick period; which is the only case where the names ``tv_sec``
|
|
|
and ``tv_nsec`` match the values they store.
|
|
|
|
|
|
When we convert a nanosecond duration to seconds, ``fs_timespec_rep`` will
|
|
|
use ``tv_sec`` to represent the number of giga seconds, and ``tv_nsec`` the
|
|
|
remaining seconds. Let's consider how this might cause a bug were users allowed
|
|
|
to manipulate the fields directly.
|
|
|
|
|
|
.. code-block:: cpp
|
|
|
|
|
|
template <class Period>
|
|
|
timespec convert_to_timespec(duration<fs_time_rep, Period> dur) {
|
|
|
fs_timespec_rep rep = dur.count();
|
|
|
return {rep.tv_sec, rep.tv_nsec}; // Oops! Period may not be nanoseconds.
|
|
|
}
|
|
|
|
|
|
template <class Duration>
|
|
|
Duration convert_to_duration(timespec ts) {
|
|
|
Duration dur({ts.tv_sec, ts.tv_nsec}); // Oops! Period may not be nanoseconds.
|
|
|
return file_time_type(dur);
|
|
|
file_time_type tp = last_write_time(p);
|
|
|
auto dur =
|
|
|
}
|
|
|
|
|
|
time_t extract_seconds(file_time_type tp) {
|
|
|
// Converting to seconds is a silly bug, but I could see it happening.
|
|
|
using SecsT = chrono::duration<file_time_type::rep, ratio<1, 1>>;
|
|
|
auto secs = duration_cast<Secs>(tp.time_since_epoch());
|
|
|
// tv_sec is now representing gigaseconds.
|
|
|
return secs.count().tv_sec; // Oops!
|
|
|
}
|
|
|
|
|
|
Despite ``fs_timespec_rep`` not being usable in any manner resembling
|
|
|
``timespec``, it still might buy us our goal of matching its range exactly,
|
|
|
right?
|
|
|
|
|
|
Sort of. Chrono provides a specialization point which specifies the minimum
|
|
|
and maximum values for a custom representation. It looks like this:
|
|
|
|
|
|
.. code-block:: cpp
|
|
|
|
|
|
template <>
|
|
|
struct duration_values<fs_timespec_rep> {
|
|
|
static fs_timespec_rep zero();
|
|
|
static fs_timespec_rep min();
|
|
|
static fs_timespec_rep max() { // assume friendship.
|
|
|
fs_timespec_rep val;
|
|
|
val.tv_sec = numeric_limits<time_t>::max();
|
|
|
val.tv_nsec = nano::den - 1;
|
|
|
return val;
|
|
|
}
|
|
|
};
|
|
|
|
|
|
Notice that ``duration_values`` doesn't tell the representation what tick
|
|
|
period it's actually representing. This would indeed correctly limit the range
|
|
|
of ``duration<fs_timespec_rep, nano>`` to exactly that of ``timespec``. But
|
|
|
nanoseconds isn't the only tick period it will be used to represent. For
|
|
|
example:
|
|
|
|
|
|
.. code-block:: cpp
|
|
|
|
|
|
void test() {
|
|
|
using rep = file_time_type::rep;
|
|
|
using fs_nsec = duration<rep, nano>;
|
|
|
using fs_sec = duration<rep>;
|
|
|
fs_nsec nsecs(fs_seconds::max()); // Truncates
|
|
|
}
|
|
|
|
|
|
Though the above example may appear silly, I think it follows from the incorrect
|
|
|
notion that using a ``timespec`` rep in chrono actually makes it act as if it
|
|
|
were an actual ``timespec``.
|
|
|
|
|
|
Interactions with 32 bit ``time_t``
|
|
|
-----------------------------------
|
|
|
|
|
|
Up until now we've only be considering cases where ``time_t`` is 64 bits, but what
|
|
|
about 32 bit systems/builds where ``time_t`` is 32 bits? (this is the common case
|
|
|
for 32 bit builds).
|
|
|
|
|
|
When ``time_t`` is 32 bits, we can implement ``file_time_type`` simply using 64-bit
|
|
|
``long long``. There is no need to get either ``__int128_t`` or ``timespec`` emulation
|
|
|
involved. And nor should we, as it would suffer from the numerous complications
|
|
|
described by this paper.
|
|
|
|
|
|
Obviously our implementation for 32-bit builds should act as similarly to the
|
|
|
64-bit build as possible. Code which compiles in one, should compile in the other.
|
|
|
This consideration is important when choosing between ``__int128_t`` and
|
|
|
emulating ``timespec``. The solution which provides the most uniformity with
|
|
|
the least eccentricity is the preferable one.
|
|
|
|
|
|
Summary
|
|
|
=======
|
|
|
|
|
|
The ``file_time_type`` time point is used to represent the write times for files.
|
|
|
Its job is to act as part of a C++ wrapper for less ideal system interfaces. The
|
|
|
underlying filesystem uses the ``timespec`` struct for the same purpose.
|
|
|
|
|
|
However, the initial implementation of ``file_time_type`` could not represent
|
|
|
either the range or resolution of ``timespec``, making it unsuitable. Fixing
|
|
|
this requires an implementation which uses more than 64 bits to store the
|
|
|
time point.
|
|
|
|
|
|
We primarily considered two solutions: Using ``__int128_t`` and using a
|
|
|
arithmetic emulation of ``timespec``. Each has its pros and cons, and both
|
|
|
come with more than one complication.
|
|
|
|
|
|
The Potential Solutions
|
|
|
-----------------------
|
|
|
|
|
|
``long long`` - The Status Quo
|
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
Pros:
|
|
|
|
|
|
* As a type ``long long`` plays the nicest with others:
|
|
|
|
|
|
* It works with streaming operators and other library entities which support
|
|
|
builtin integer types, but don't support ``__int128_t``.
|
|
|
* Its the representation used by chrono's ``nanosecond`` and ``second`` typedefs.
|
|
|
|
|
|
Cons:
|
|
|
|
|
|
* It cannot provide the same resolution as ``timespec`` unless we limit it
|
|
|
to a range of +/- 300 years from the epoch.
|
|
|
* It cannot provide the same range as ``timespec`` unless we limit its resolution
|
|
|
to seconds.
|
|
|
* ``last_write_time`` has to report an error when the time reported by the filesystem
|
|
|
is unrepresentable.
|
|
|
|
|
|
__int128_t
|
|
|
~~~~~~~~~~~
|
|
|
|
|
|
Pros:
|
|
|
|
|
|
* It is an integer type.
|
|
|
* It makes the implementation simple and efficient.
|
|
|
* Acts exactly like other arithmetic types.
|
|
|
* Can be implicitly converted to a builtin integer type by the user.
|
|
|
|
|
|
* This is important for doing things like:
|
|
|
|
|
|
.. code-block:: cpp
|
|
|
|
|
|
void c_interface_using_time_t(const char* p, time_t);
|
|
|
|
|
|
void foo(path p) {
|
|
|
file_time_type tp = last_write_time(p);
|
|
|
time_t secs = duration_cast<seconds>(tp.time_since_epoch()).count();
|
|
|
c_interface_using_time_t(p.c_str(), secs);
|
|
|
}
|
|
|
|
|
|
Cons:
|
|
|
|
|
|
* It isn't always available (but on 64 bit machines, it normally is).
|
|
|
* It causes ``file_time_type`` to have a larger range than ``timespec``.
|
|
|
* It doesn't always act the same as other builtin integer types. For example
|
|
|
with ``cout`` or ``to_string``.
|
|
|
* Allows implicit truncation to 64 bit integers.
|
|
|
* It can be implicitly converted to a builtin integer type by the user,
|
|
|
truncating its value.
|
|
|
|
|
|
Arithmetic ``timespec`` Emulation
|
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
Pros:
|
|
|
|
|
|
* It has the exact same range and resolution of ``timespec`` when representing
|
|
|
a nanosecond tick period.
|
|
|
* It's always available, unlike ``__int128_t``.
|
|
|
|
|
|
Cons:
|
|
|
|
|
|
* It has a larger range when representing any period longer than a nanosecond.
|
|
|
* Doesn't actually allow users to use it like a ``timespec``.
|
|
|
* The required representation of using ``tv_sec`` to store the giga tick count
|
|
|
and ``tv_nsec`` to store the remainder adds nothing over a 128 bit integer,
|
|
|
but complicates a lot.
|
|
|
* It isn't a builtin integer type, and can't be used anything like one.
|
|
|
* Chrono can be made to work with it, but not nicely.
|
|
|
* Emulating arithmetic classes come with their own host of problems regarding
|
|
|
overload resolution (Each operator needs three SFINAE constrained versions of
|
|
|
it in order to act like builtin integer types).
|
|
|
* It offers little over simply using ``__int128_t``.
|
|
|
* It acts the most differently than implementations using an actual integer type,
|
|
|
which has a high chance of breaking source compatibility.
|
|
|
|
|
|
|
|
|
Selected Solution - Using ``__int128_t``
|
|
|
=========================================
|
|
|
|
|
|
The solution I selected for libc++ is using ``__int128_t`` when available,
|
|
|
and otherwise falling back to using ``long long`` with nanosecond precision.
|
|
|
|
|
|
When ``__int128_t`` is available, or when ``time_t`` is 32-bits, the implementation
|
|
|
provides same resolution and a greater range than ``timespec``. Otherwise
|
|
|
it still provides the same resolution, but is limited to a range of +/- 300
|
|
|
years. This final case should be rather rare, as ``__int128_t``
|
|
|
is normally available in 64-bit builds, and ``time_t`` is normally 32-bits
|
|
|
during 32-bit builds.
|
|
|
|
|
|
Although falling back to ``long long`` and nanosecond precision is less than
|
|
|
ideal, it also happens to be the implementation provided by both libstdc++
|
|
|
and MSVC. (So that makes it better, right?)
|
|
|
|
|
|
Although the ``timespec`` emulation solution is feasible and would largely
|
|
|
do what we want, it comes with too many complications, potential problems
|
|
|
and discrepancies when compared to "normal" chrono time points and durations.
|
|
|
|
|
|
An emulation of a builtin arithmetic type using a class is never going to act
|
|
|
exactly the same, and the difference will be felt by users. It's not reasonable
|
|
|
to expect them to tolerate and work around these differences. And once
|
|
|
we commit to an ABI it will be too late to change. Committing to this seems
|
|
|
risky.
|
|
|
|
|
|
Therefore, ``__int128_t`` seems like the better solution.
|