Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation for the HousekeepingInterval parameter and enforce validation for it. #3517

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dsxing
Copy link
Contributor

@dsxing dsxing commented Apr 17, 2024

In my testing Kubernetes environment, I observed that if the HousekeepingInterval of cadvisor is set very high by using --housekeeping-interval, for example, greater than 2 minutes, then obtaining container data via kubelet returns null.
image

Upon analyzing the cadvisor code, I found that this is due to the fact that the RecentStats of containers only contain one stat data in cadvisor's timed_store. https://github.com/google/cadvisor/blob/master/manager/manager.go#L529;
Consequently, it becomes impossible to calculate the CpuStats of containers, resulting in a nil value for stat.CpuInst(https://github.com/google/cadvisor/blob/master/info/v2/conversion.go#L192, https://github.com/google/cadvisor/blob/master/info/v2/conversion.go#L234);

When kubelet aggregates metrics data for all containers on a node, it filters container data. https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cadvisor_stats_provider.go#L97 ; Due to stat.CpuInst=nil,kubelet considers the container is Terminated, https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cadvisor_stats_provider.go#L399 ,resulting in a null value for 'containers' in the final.

The reason why RecentStats of containers in cadvisor's timed_store only contain one data point is because the HousekeepingInterval set by --housekeeping-interval exceeds the expiration period of data in timed_store, In kubelet, this expiration period is defined as 2 minutes by default so the memoryCache's maxAge is 2minutes.
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cadvisor/cadvisor_linux.go#L55 ;Therefore, each time housekeeping polls to write data to timed_store, previous data has already expired and been deleted. https://github.com/google/cadvisor/blob/master/utils/timed_store.go#L78

Based on this analysis, I believe we should add documentation for the HousekeepingInterval parameter and enforce validation for it, which is setted by --housekeeping-interval.

@k8s-ci-robot
Copy link
Collaborator

Hi @dsxing. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dsxing
Copy link
Contributor Author

dsxing commented Apr 18, 2024

@iwankgb Is it possible to handle the code logic in this way, or should we just add a note for clarification?

Signed-off-by: dsxing <[email protected]>
@dsxing dsxing changed the title Add descriptive information for HousekeepingInterval and add a mechanism to lower the HousekeepingInterval Add documentation for the HousekeepingInterval parameter and enforce validation for it. Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants