Add documentation for the HousekeepingInterval parameter and enforce validation for it. #3517
In my Kubernetes test environment, I observed that when cadvisor's HousekeepingInterval is set very high via --housekeeping-interval (for example, longer than 2 minutes), container data obtained through kubelet comes back as null.
![image](https://private-user-images.githubusercontent.com/8870947/323141691-7e26fe5b-8294-4f27-93fe-baeb32827b4e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAxMDk1ODYsIm5iZiI6MTcyMDEwOTI4NiwicGF0aCI6Ii84ODcwOTQ3LzMyMzE0MTY5MS03ZTI2ZmU1Yi04Mjk0LTRmMjctOTNmZS1iYWViMzI4MjdiNGUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDcwNCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MDRUMTYwODA2WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ODc0ODQ2OTk1YjRhNGM2OWUxMGVhMjVjZGI1NzM3NjMxY2VhNzg4N2RmYzc5OTM4Y2Y5ZTkzZTYwMWY4ZDE5MiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.3K2pHeHfJ3HZguHk6vBoLKTN3WhVJPmEh6gKq47PIeo)
After analyzing the cadvisor code, I found that this happens because the RecentStats of a container contain only one stat entry in cadvisor's timed_store: https://github.com/google/cadvisor/blob/master/manager/manager.go#L529
Consequently, the container's CpuStats cannot be calculated, and stat.CpuInst ends up nil (https://github.com/google/cadvisor/blob/master/info/v2/conversion.go#L192, https://github.com/google/cadvisor/blob/master/info/v2/conversion.go#L234).
When kubelet aggregates metrics for all containers on a node, it filters the container data: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cadvisor_stats_provider.go#L97. Because stat.CpuInst is nil, kubelet considers the container terminated (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cadvisor_stats_provider.go#L399), which results in a null value for 'containers' in the final output.
The RecentStats of a container in cadvisor's timed_store contain only one data point because the HousekeepingInterval set by --housekeeping-interval exceeds the expiration period of data in timed_store. In kubelet, this expiration period defaults to 2 minutes, so the memoryCache's maxAge is 2 minutes: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cadvisor/cadvisor_linux.go#L55
Therefore, each time housekeeping writes data to timed_store, the previous data has already expired and been deleted: https://github.com/google/cadvisor/blob/master/utils/timed_store.go#L78
Based on this analysis, I believe we should document the HousekeepingInterval parameter, which is set by --housekeeping-interval, and enforce validation for it.