Feature: cache #3

Closed
opened 2024-09-19 07:46:24 +00:00 by stephenseo · 11 comments
Owner

This can be implemented by doing the following:

  1. Set up a program argument to enable the cache and designate a flag for it (default cache disabled).
  2. Add a function to report all files "used" by a requested path.
  3. Set up a directory to hold cached HTML.
  4. Set up a function to store a requested path's HTML in the cache dir.
  5. Set up logic to compare the cached file's timestamp with the timestamps of the files "used" by a requested path, so that the cache entry is overwritten if any of those files is newer.
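Step 5 could be sketched roughly as follows. This assumes POSIX `stat()` and a caller-supplied list of "used" files; `cache_is_stale` is a hypothetical name, not part of c_simple_http.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if the cache entry at cache_path must be
 * regenerated, 0 if it is still fresh. */
int cache_is_stale(const char *cache_path, const char *const *used_files,
                   size_t used_count) {
  struct stat cache_st;
  if (stat(cache_path, &cache_st) != 0) {
    return 1; /* No cache entry yet (or it is unreadable): regenerate. */
  }
  for (size_t idx = 0; idx < used_count; ++idx) {
    struct stat used_st;
    if (stat(used_files[idx], &used_st) != 0) {
      return 1; /* A "used" file vanished; safest to regenerate. */
    }
    if (used_st.st_mtime > cache_st.st_mtime) {
      return 1; /* A "used" file is newer than the cached HTML. */
    }
  }
  return 0;
}
```

`st_mtime` has one-second resolution, so a file modified within the same second the cache entry was written would not be detected by this comparison.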
stephenseo self-assigned this 2024-09-19 07:46:29 +00:00
stephenseo added the enhancement label 2024-09-19 07:46:33 +00:00
Author
Owner

It may be best to have a single `--enable-cache-dir=<DIR>` flag that both enables the cache and sets the designated cache dir at the same time.
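A minimal sketch of how such a flag could be parsed from `argv`. `CacheArgs` and `parse_cache_args` are hypothetical names. A `NULL` `cache_dir` keeps the cache disabled by default, matching step 1 of the original plan.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option holder; only the cache-related field is shown. */
typedef struct CacheArgs {
  const char *cache_dir; /* NULL means caching is disabled (the default). */
} CacheArgs;

/* Scans argv for "--enable-cache-dir=<DIR>". Returns 0 on success,
 * 1 if the flag was given with an empty directory. */
int parse_cache_args(int argc, char **argv, CacheArgs *out) {
  static const char prefix[] = "--enable-cache-dir=";
  const size_t prefix_len = sizeof(prefix) - 1;
  out->cache_dir = NULL;
  for (int idx = 1; idx < argc; ++idx) {
    if (strncmp(argv[idx], prefix, prefix_len) == 0) {
      const char *dir = argv[idx] + prefix_len;
      if (dir[0] == '\0') {
        return 1;
      }
      out->cache_dir = dir; /* Points into argv; no copy is needed. */
    }
  }
  return 0;
}
```

If the flag is repeated, the last occurrence wins. The sketch does not check that the directory exists.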

Author
Owner

The naming format for cached HTML files still needs to be decided.

Author
Owner

Perhaps also keep track of how old each cache file is, so that stale files are purged after a certain time period. Maybe a day to a week?

This could be set with `--cache-lifetime=<TIME>`. A format for the time will need to be defined; maybe "#d" for days and "#w" for weeks.
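A sketch of the "#d"/"#w" format idea, plus an expiry check. `parse_cache_lifetime` and `cache_entry_expired` are hypothetical names, and the choice to reject a missing unit suffix is an assumption.

```c
#include <limits.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical parser for --cache-lifetime=<TIME> values such as "3d"
 * (days) or "2w" (weeks). Returns the lifetime in seconds, or -1 if the
 * value is malformed or too large. */
long parse_cache_lifetime(const char *text) {
  char *end = NULL;
  long count = strtol(text, &end, 10);
  if (end == text || count <= 0) {
    return -1;
  }
  long unit_secs;
  switch (end[0]) {
    case 'd': unit_secs = 60L * 60L * 24L; break;
    case 'w': unit_secs = 60L * 60L * 24L * 7L; break;
    default: return -1; /* Missing or unknown unit suffix. */
  }
  if (end[1] != '\0' || count > LONG_MAX / unit_secs) {
    return -1; /* Trailing characters, or the result would overflow. */
  }
  return count * unit_secs;
}

/* Returns 1 if a cache file last modified at `written` is older than
 * `lifetime_secs` at time `now`, meaning it should be purged. */
int cache_entry_expired(time_t written, time_t now, long lifetime_secs) {
  return difftime(now, written) > (double)lifetime_secs;
}
```

In practice `written` would come from the cache file's `st_mtime` and `now` from `time(NULL)`.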

Author
Owner

Thinking about it, it would probably be best to modify the function that parses and constructs the HTML so that it also returns the list of files used in the process:

https://git.seodisparate.com/stephenseo/c_simple_http/src/commit/47d7f0396d8ab94b9e297526ac3840ba3b6c4a67/src/http_template.c#L95

char *c_simple_http_path_to_generated(

This function also does something that may be worth separating into its own file: building a string by appending pieces to a linked list and then combining the list items into a single buffer. I would probably do this first, as it may prove useful in other contexts.
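The linked-list string building described above might look something like this minimal sketch. `StrPart`, `StrParts`, and the `str_parts_*` functions are hypothetical names, not the project's actual API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: one string fragment per node. */
typedef struct StrPart {
  char *buf;
  size_t size; /* Length in bytes, excluding the NUL terminator. */
  struct StrPart *next;
} StrPart;

typedef struct StrParts {
  StrPart *head;
  StrPart *tail;
  size_t total; /* Sum of all part sizes, so combining needs one malloc. */
} StrParts;

/* Appends a copy of `len` bytes from `str` (must be non-NULL).
 * Returns 0 on success, 1 on allocation failure. */
int str_parts_append(StrParts *parts, const char *str, size_t len) {
  StrPart *node = malloc(sizeof(StrPart));
  if (!node) return 1;
  node->buf = malloc(len + 1);
  if (!node->buf) { free(node); return 1; }
  memcpy(node->buf, str, len);
  node->buf[len] = '\0';
  node->size = len;
  node->next = NULL;
  if (parts->tail) parts->tail->next = node; else parts->head = node;
  parts->tail = node;
  parts->total += len;
  return 0;
}

/* Combines all parts into one NUL-terminated buffer the caller must free. */
char *str_parts_combine(const StrParts *parts) {
  char *out = malloc(parts->total + 1);
  if (!out) return NULL;
  size_t offset = 0;
  for (const StrPart *node = parts->head; node; node = node->next) {
    memcpy(out + offset, node->buf, node->size);
    offset += node->size;
  }
  out[offset] = '\0';
  return out;
}

/* Frees every node and resets the list to empty. */
void str_parts_free(StrParts *parts) {
  StrPart *node = parts->head;
  while (node) {
    StrPart *next = node->next;
    free(node->buf);
    free(node);
    node = next;
  }
  parts->head = parts->tail = NULL;
  parts->total = 0;
}
```

Tracking `total` while appending means the final buffer is allocated once, instead of being repeatedly reallocated as pieces arrive.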
Author
Owner

An ideal way of handling cached HTML might be:

  • On the first HTML load for each path, store the HTML_FILE's filename (if used) and every VAR_FILE's filename (if used).
  • On a load where matching cached HTML exists, check the relevant files' timestamps.
  • For cases where HTML is directly in the config file, track whether it has changed after a config reload, and invalidate the matching cached HTML entry if a change is detected. Comparing a hash of the contents should be adequate.

Perhaps there may be a better way of doing this, but this is what I'm thinking for now.
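For the inline-HTML case, a hash comparison could be sketched as below. FNV-1a is a swapped-in choice here (any stable hash would do), and `fnv1a_64` and `inline_html_changed` are hypothetical names. A non-cryptographic hash is enough because it only needs to detect ordinary edits, although collisions remain possible in principle.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over `len` bytes of `data`. */
uint64_t fnv1a_64(const char *data, size_t len) {
  uint64_t hash = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
  for (size_t idx = 0; idx < len; ++idx) {
    hash ^= (unsigned char)data[idx];
    hash *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
  }
  return hash;
}

/* Returns 1 if the inline HTML differs from what was hashed before the
 * config reload (so its cache entry should be invalidated), 0 otherwise. */
int inline_html_changed(uint64_t previous_hash, const char *html, size_t len) {
  return fnv1a_64(html, len) != previous_hash;
}
```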

Author
Owner

Perhaps the cache file shouldn't be just the raw HTML, but a plain-text formatted file:

  1. A "header" designating the start of relevant filenames.
  2. The filenames whose timestamps are compared against the cache entry (format like `FILE: <filename>`).
  3. Possibly also variable names followed by a hash (like `VAR: 012345...`).
  4. A "header-ending" designating the end of the relevant filenames and the start of the actual cached HTML.

I'm not sure whether the HTML's size should be stored in this "header" or obtained with fseek/ftell. Either way, the size is needed, because the current implementation responds to an HTTP request with a header containing the size of the HTML being sent.

EDIT: Actually, one can count the bytes read after reaching the end of the "header" when loading the HTML into memory.
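The byte-counting idea could be sketched like this. `cache_read_html` is a hypothetical name, and the header-ending marker string is a placeholder assumption. Header lines (such as `FILE: ...` entries) are simply skipped here, and they are assumed to be shorter than the line buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder marker ending the "header"; everything after it is HTML. */
#define CACHE_HEADER_END "--- BEGIN HTML ---\n"

/* Skips header lines up to CACHE_HEADER_END, then reads the remaining bytes
 * into a malloc'd, NUL-terminated buffer. The HTML size is just the number
 * of bytes read, so it never has to be stored in the header.
 * Returns NULL if the marker is missing, on read error, or on OOM. */
char *cache_read_html(FILE *fp, size_t *size_out) {
  char line[1024];
  int found = 0;
  while (fgets(line, sizeof(line), fp)) {
    if (strcmp(line, CACHE_HEADER_END) == 0) { found = 1; break; }
  }
  if (!found) return NULL;

  size_t cap = 4096, size = 0, got;
  char *buf = malloc(cap);
  if (!buf) return NULL;
  while ((got = fread(buf + size, 1, cap - size, fp)) > 0) {
    size += got;
    if (size == cap) { /* Full: grow so there is always room for the NUL. */
      char *grown = realloc(buf, cap * 2);
      if (!grown) { free(buf); return NULL; }
      buf = grown;
      cap *= 2;
    }
  }
  if (ferror(fp)) { free(buf); return NULL; }
  buf[size] = '\0';
  *size_out = size;
  return buf;
}
```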

Author
Owner

Actually, the non-FILE variables don't need to be checked individually: if a variable has changed, then the config file has changed, so the config file's timestamp alone accounts for them.

Author
Owner

A format has to be decided for the cache file. Perhaps something like:

```
--- CACHE ENTRY ---
some_filename.html
another_filename.html
--- BEGIN HTML ---
<html>
...
</html>
```
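Writing an entry in this proposed layout could look like the following sketch. `cache_write_entry` is a hypothetical name. Filenames containing a newline are rejected, because they would break the line-based header.

```c
#include <stdio.h>
#include <string.h>

/* Writes a cache entry: a start marker, one "used" filename per line, an
 * end marker, then the raw HTML. Returns 0 on success, 1 on a write error
 * or a filename containing '\n'. */
int cache_write_entry(FILE *fp, const char *const *filenames, size_t count,
                      const char *html, size_t html_size) {
  if (fputs("--- CACHE ENTRY ---\n", fp) == EOF) return 1;
  for (size_t idx = 0; idx < count; ++idx) {
    if (strchr(filenames[idx], '\n') != NULL) return 1;
    if (fprintf(fp, "%s\n", filenames[idx]) < 0) return 1;
  }
  if (fputs("--- BEGIN HTML ---\n", fp) == EOF) return 1;
  if (fwrite(html, 1, html_size, fp) != html_size) return 1;
  return 0;
}
```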
Author
Owner

A format needs to be decided for cache filenames.

Maybe a path of `/` corresponds to "ROOT", a path of `/outer` corresponds to "outer", and a path of `/outer/inner` corresponds to "outer0x2Finner". (0x2F is the `/` character.)

Hopefully the string `0x2F` won't appear in an actual path. If it does, then perhaps an alternate delimiter string could be supported as well?

URLs use `%2F` as the escaped form of `/`, so perhaps that can be used as an alternative delimiter.
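A minimal sketch of the mapping proposed above, including the `%2F` fallback. `path_to_cache_filename` is a hypothetical name, and the sketch follows this comment's proposal (leading slash dropped) rather than any committed code.

```c
#include <stdlib.h>
#include <string.h>

/* "/" -> "ROOT", "/outer" -> "outer", "/outer/inner" -> "outer0x2Finner".
 * If the path already contains "0x2F", "%2F" is used as the delimiter.
 * Returns a malloc'd string the caller must free, or NULL on OOM. */
char *path_to_cache_filename(const char *path) {
  if (strcmp(path, "/") == 0) {
    char *root = malloc(5);
    if (root) memcpy(root, "ROOT", 5);
    return root;
  }
  const char *delim = strstr(path, "0x2F") ? "%2F" : "0x2F";
  const size_t delim_len = strlen(delim);
  if (path[0] == '/') ++path; /* The leading slash is dropped. */

  size_t slashes = 0;
  for (const char *c = path; *c; ++c) {
    if (*c == '/') ++slashes;
  }
  /* Each '/' (1 byte) becomes delim_len bytes, plus room for the NUL. */
  char *out = malloc(strlen(path) + slashes * (delim_len - 1) + 1);
  if (!out) return NULL;
  char *dst = out;
  for (const char *c = path; *c; ++c) {
    if (*c == '/') {
      memcpy(dst, delim, delim_len);
      dst += delim_len;
    } else {
      *dst++ = *c;
    }
  }
  *dst = '\0';
  return out;
}
```

One limitation of this exact scheme is that "ROOT" would collide with a request for `/ROOT`.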

Author
Owner

As of 206cad6f57 (https://git.seodisparate.com/stephenseo/c_simple_http/commit/206cad6f57daf5641f4f79555fb87a6c54db7887), the alternate delimiter `%2F` is used in place of `0x2F` if `0x2F` appears in the given path. When turning a filename back into a path, this can be detected by checking whether the filename starts with `%`, since the first slash is also converted into whichever delimiter is used.
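The reverse conversion described here might look like the following. It assumes, as the comment states, that the leading slash is encoded as well, so every filename begins with either `0x2F` or `%2F`. `cache_filename_to_path` is a hypothetical name, and this is not the code from the referenced commit.

```c
#include <stdlib.h>
#include <string.h>

/* Converts a cache filename back to a request path. A leading '%' means
 * "%2F" was the delimiter; otherwise "0x2F" was. Returns a malloc'd string,
 * or NULL if the name does not start with the delimiter (or on OOM). */
char *cache_filename_to_path(const char *name) {
  const char *delim = name[0] == '%' ? "%2F" : "0x2F";
  const size_t delim_len = strlen(delim);
  if (strncmp(name, delim, delim_len) != 0) return NULL;

  char *out = malloc(strlen(name) + 1); /* Decoding never grows the string. */
  if (!out) return NULL;
  char *dst = out;
  while (*name) {
    if (strncmp(name, delim, delim_len) == 0) {
      *dst++ = '/';
      name += delim_len;
    } else {
      *dst++ = *name++; /* Includes a literal "0x2F" when "%2F" is in use. */
    }
  }
  *dst = '\0';
  return out;
}
```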
Author
Owner

Thinking about it, I'm not sure whether an HTTP client will pass escaped characters directly in the request. It would probably be best to convert such escaped characters during processing. This might be worth a separate issue in this issue tracker, but for now it would probably be best to add functions to convert to/from the escaped form and to use a delimiter other than `%2F`.
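The to/from conversion functions could be sketched as plain percent-encoding (RFC 3986 style). `percent_decode` and `percent_encode` are hypothetical names. Rejecting `%00` is an assumption, made because an embedded NUL would silently truncate a C string.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Decodes "%XX" escapes (e.g. "%20" -> ' ', "%2F" -> '/'). Returns a
 * malloc'd string, or NULL on a malformed escape or an encoded NUL. */
char *percent_decode(const char *in) {
  char *out = malloc(strlen(in) + 1); /* Decoding never grows the string. */
  if (!out) return NULL;
  char *dst = out;
  while (*in) {
    if (*in == '%') {
      int hi = hex_value(in[1]);
      int lo = hi < 0 ? -1 : hex_value(in[2]); /* Never reads past a NUL. */
      if (hi < 0 || lo < 0 || (hi == 0 && lo == 0)) { free(out); return NULL; }
      *dst++ = (char)(hi * 16 + lo);
      in += 3;
    } else {
      *dst++ = *in++;
    }
  }
  *dst = '\0';
  return out;
}

/* Encodes every byte outside RFC 3986's unreserved set as "%XX",
 * leaving '/' intact so path segments stay separated. */
char *percent_encode(const char *in) {
  static const char hex[] = "0123456789ABCDEF";
  char *out = malloc(strlen(in) * 3 + 1); /* Worst case: every byte escaped. */
  if (!out) return NULL;
  char *dst = out;
  for (; *in; ++in) {
    unsigned char c = (unsigned char)*in;
    int keep = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
               (c >= '0' && c <= '9') || c == '-' || c == '_' ||
               c == '.' || c == '~' || c == '/';
    if (keep) {
      *dst++ = (char)c;
    } else {
      *dst++ = '%';
      *dst++ = hex[c >> 4];
      *dst++ = hex[c & 0x0F];
    }
  }
  *dst = '\0';
  return out;
}
```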

stephenseo referenced this issue from a commit 2024-09-26 04:03:11 +00:00
Reference: stephenseo/c_simple_http#3