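Introduction
When we use cloud platform services built on top of OpenStack, the overview page usually has a prominent area showing the cluster's current resource usage: the total, used, and remaining amounts of CPU, memory, disk, and so on. Whenever we scale out the cluster, the resource totals on the overview page increase automatically as well. We all know that the Nova service in OpenStack manages these compute resources, but have you ever wondered how Nova obtains them?
How Nova collects resource statistics
Resource accounting is an internal mechanism of the Nova service. Given how much later operations (such as creating a virtual machine or a disk) depend on its results, we infer that this mechanism must run before the service starts handling any other work.
From this brief analysis, plus some necessary debugging, we find the following.
The mechanism is triggered in the nova.service.WSGIService.start method: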
    def start(self):
        """Start serving this service using loaded configuration.

        Also, retrieve updated port number in case "0" was passed in, which
        indicates a random port should be used.

        :returns: None
        """
        if self.manager:
            self.manager.init_host()
            self.manager.pre_start_hook()
            if self.backdoor_port is not None:
                self.manager.backdoor_port = self.backdoor_port
        self.server.start()
        if self.manager:
            self.manager.post_start_hook()
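To make the ordering in start() easier to see, here is a minimal sketch that replays the same sequence of calls with recording stand-ins. RecordingManager, RecordingServer, and start_service are hypothetical names invented for this illustration; they are not part of Nova.

```python
class RecordingManager:
    """Hypothetical stand-in for a Nova manager; it only records call order."""

    def __init__(self):
        self.calls = []

    def init_host(self):
        self.calls.append("init_host")

    def pre_start_hook(self):
        # In the article's flow, this is where the first resource audit runs.
        self.calls.append("pre_start_hook")

    def post_start_hook(self):
        self.calls.append("post_start_hook")


class RecordingServer:
    """Hypothetical stand-in for the server object that starts listening."""

    def __init__(self, calls):
        self.calls = calls

    def start(self):
        self.calls.append("server.start")


def start_service(manager, server):
    """Replay the call order of the start() method quoted above."""
    if manager:
        manager.init_host()
        manager.pre_start_hook()
    server.start()
    if manager:
        manager.post_start_hook()


manager = RecordingManager()
start_service(manager, RecordingServer(manager.calls))
print(manager.calls)
```

The recorded list shows that pre_start_hook runs before the server starts accepting requests, which is why resource data is already available when the first request arrives.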
In start(), the call self.manager.pre_start_hook() is what fetches the resource information. It resolves directly to nova.compute.manager.pre_start_hook, as follows:
    def pre_start_hook(self):
        """After the service is initialized, but before we fully bring
        the service up by listening on RPC queues, make sure to update
        our available resources (and indirectly our available nodes).
        """
        self.update_available_resource(nova.context.get_admin_context())

    ...

    @periodic_task.periodic_task
    def update_available_resource(self, context):
        """See driver.get_available_resource()

        Periodic process that keeps that the compute host's understanding of
        resource availability and usage in sync with the underlying hypervisor.

        :param context: security context
        """
        new_resource_tracker_dict = {}
        nodenames = set(self.driver.get_available_nodes())
        for nodename in nodenames:
            rt = self._get_resource_tracker(nodename)
            rt.update_available_resource(context)
            new_resource_tracker_dict[nodename] = rt

        # Delete orphan compute node not reported by driver but still in db
        compute_nodes_in_db = self._get_compute_nodes_in_db(context,
                                                            use_slave=True)

        for cn in compute_nodes_in_db:
            if cn.hypervisor_hostname not in nodenames:
                LOG.audit(_("Deleting orphan compute node %s") % cn.id)
                cn.destroy()

        self._resource_tracker_dict = new_resource_tracker_dict
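The periodic task above does two things: it rebuilds one tracker per node reported by the driver, and it flags database records whose node the driver no longer reports. The sketch below isolates that set logic with plain Python data. sync_trackers and make_tracker are hypothetical names for this illustration, and the node names are made up.

```python
def sync_trackers(driver_nodes, db_nodes, make_tracker):
    """Return (trackers, orphans) for one pass of the periodic task.

    driver_nodes: node names the virt driver currently reports
    db_nodes: node names currently stored in the database
    make_tracker: hypothetical factory building one tracker per node
    """
    nodenames = set(driver_nodes)
    trackers = {name: make_tracker(name) for name in nodenames}
    # Nodes still in the database but no longer reported are orphans.
    orphans = sorted(name for name in db_nodes if name not in nodenames)
    return trackers, orphans


trackers, orphans = sync_trackers(
    driver_nodes=["node-1", "node-2"],
    db_nodes=["node-1", "node-2", "node-3"],
    make_tracker=lambda name: {"node": name},
)
print(sorted(trackers))
print(orphans)
```

In Nova the orphan records are destroyed rather than returned; returning them here simply keeps the sketch free of database access.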
In the code above, rt.update_available_resource() resolves directly to nova.compute.resource_tracker.update_available_resource(), as follows:
    def update_available_resource(self, context):
        """Override in-memory calculations of compute node resource usage based
        on data audited from the hypervisor layer.

        Add in resource claims in progress to account for operations that have
        declared a need for resources, but not necessarily retrieved them from
        the hypervisor layer yet.
        """
        LOG.audit(_("Auditing locally available compute resources"))
        resources = self.driver.get_available_resource(self.nodename)

        if not resources:
            # The virt driver does not support this function
            LOG.audit(_("Virt driver does not support "
                        "'get_available_resource' Compute tracking is disabled."))
            self.compute_node = None
            return
        resources["host_ip"] = CONF.my_ip

        # TODO(berrange): remove this once all virt drivers are updated
        # to report topology
        if "numa_topology" not in resources:
            resources["numa_topology"] = None

        self._verify_resources(resources)

        self._report_hypervisor_resource_view(resources)

        return self._update_available_resource(context, resources)
In the code above, self._update_available_resource synchronizes the database records with the actual resource usage on the compute node; we will not expand on it here. self.driver.get_available_resource() is what retrieves the node's hardware resource information, and the call actually lands in:
    class LibvirtDriver(driver.ComputeDriver):

        def get_available_resource(self, nodename):
            """Retrieve resource information.

            This method is called when nova-compute launches, and
            as part of a periodic task that records the results in the DB.

            :param nodename: will be put in PCI device
            :returns: dictionary containing resource info
            """
            # Temporary: convert supported_instances into a string, while keeping
            # the RPC version as JSON. Can be changed when RPC broadcast is removed
            stats = self.get_host_stats(refresh=True)
            stats["supported_instances"] = jsonutils.dumps(
                stats["supported_instances"])
            return stats

        def get_host_stats(self, refresh=False):
            """Return the current state of the host.

            If "refresh" is True, run update the stats first.
            """
            return self.host_state.get_host_stats(refresh=refresh)

        def _get_vcpu_total(self):
            """Get available vcpu number of physical computer.

            :returns: the number of cpu core instances can be used.
            """
            if self._vcpu_total != 0:
                return self._vcpu_total

            try:
                total_pcpus = self._conn.getInfo()[2]
            except libvirt.libvirtError:
                LOG.warn(_LW("Cannot get the number of cpu, because this "
                             "function is not implemented for this platform. "))
                return 0

            if CONF.vcpu_pin_set is None:
                self._vcpu_total = total_pcpus
                return self._vcpu_total

            available_ids = hardware.get_vcpu_pin_set()
            if sorted(available_ids)[-1] >= total_pcpus:
                raise exception.Invalid(_("Invalid vcpu_pin_set config, "
                                          "out of hypervisor cpu range."))
            self._vcpu_total = len(available_ids)
            return self._vcpu_total

    .....

    class HostState(object):
        """Manages information about the compute node through libvirt."""

        def __init__(self, driver):
            super(HostState, self).__init__()
            self._stats = {}
            self.driver = driver
            self.update_status()

        def get_host_stats(self, refresh=False):
            """Return the current state of the host.

            If "refresh" is True, run update the stats first.
            """
            if refresh or not self._stats:
                self.update_status()
            return self._stats

        def update_status(self):
            """Retrieve status info from libvirt."""
            ...
            data["vcpus"] = self.driver._get_vcpu_total()
            data["memory_mb"] = self.driver._get_memory_mb_total()
            data["local_gb"] = disk_info_dict["total"]
            data["vcpus_used"] = self.driver._get_vcpu_used()
            data["memory_mb_used"] = self.driver._get_memory_mb_used()
            data["local_gb_used"] = disk_info_dict["used"]
            data["hypervisor_type"] = self.driver._get_hypervisor_type()
            data["hypervisor_version"] = self.driver._get_hypervisor_version()
            data["hypervisor_hostname"] = self.driver._get_hypervisor_hostname()
            data["cpu_info"] = self.driver._get_cpu_info()
            data["disk_available_least"] = _get_disk_available_least()
            ...
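HostState follows a simple cache-with-refresh pattern: stats are collected once at construction, served from memory afterwards, and recollected only when refresh=True is passed or the cache is empty. The sketch below reproduces that pattern in isolation. CachedHostState and fake_collect are hypothetical names, and the sample numbers are made up.

```python
class CachedHostState:
    """Simplified stand-in for HostState: caches stats until a refresh."""

    def __init__(self, collect):
        self._collect = collect  # hypothetical callable returning a dict
        self._stats = {}
        self.update_status()

    def get_host_stats(self, refresh=False):
        # Recollect only on request or when nothing has been cached yet.
        if refresh or not self._stats:
            self.update_status()
        return self._stats

    def update_status(self):
        self._stats = self._collect()


calls = []


def fake_collect():
    """Pretend hardware probe that counts how often it is invoked."""
    calls.append(1)
    return {"vcpus": 8, "memory_mb": 16384, "local_gb": 500}


state = CachedHostState(fake_collect)
state.get_host_stats()              # served from the cache
state.get_host_stats(refresh=True)  # forces a new collection
print(len(calls))
```

Because get_available_resource always passes refresh=True, every run of the periodic task probes the hypervisor again instead of reusing the cached values.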
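Note the docstring of get_available_resource, which matches our initial inference exactly. Below we take vcpus alone as the example and keep tracing the accounting flow. self.driver._get_vcpu_total resolves to LibvirtDriver._get_vcpu_total (shown in the code above). If the vcpu_pin_set option is not set, the resulting _vcpu_total is self._conn.getInfo()[2]. Here self._conn can be understood as the libvirt adapter: it represents an abstract connection to underlying virtualization tools such as kvm and qemu. getInfo() is a thin wrapper around libvirtmod.virNodeGetInfo; it returns a list whose third element is the host's CPU count, which Nova takes as the number of vcpus. This is far enough for our purposes: below this point lies libvirt's C code rather than Python.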
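As a small illustration of why index 2 is the CPU count, the sketch below labels the positional values of a getInfo()-style list. The field order follows libvirt's virNodeInfo structure as exposed by the Python binding; NODE_INFO_FIELDS and describe_node_info are hypothetical names, and the sample list is made up rather than read from a real host.

```python
# Field order of the list returned by getInfo() in the libvirt Python binding.
NODE_INFO_FIELDS = (
    "model", "memory", "cpus", "mhz",
    "nodes", "sockets", "cores", "threads",
)


def describe_node_info(info):
    """Pair each positional value from getInfo() with a readable name."""
    return dict(zip(NODE_INFO_FIELDS, info))


# Made-up sample; on a real host this would come from conn.getInfo().
sample = ["x86_64", 16384, 8, 2400, 1, 1, 4, 2]
node = describe_node_info(sample)
print(node["cpus"])
```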
On the other hand, if vcpu_pin_set is configured, hardware.get_vcpu_pin_set parses it into a set of usable CPU indexes, and taking the length of that set gives the final vcpus count.
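The following is a simplified sketch of that second path, assuming the documented vcpu_pin_set syntax of comma-separated indexes, ranges, and ^-prefixed exclusions (for example "4-12,^8,15"). parse_pin_set and vcpu_total are hypothetical helpers written for this illustration; they do not reproduce Nova's own parser or its validation.

```python
def parse_pin_set(spec):
    """Parse a spec such as "4-12,^8,15" into a set of CPU indexes.

    Simplified illustration; it does not reproduce Nova's validation.
    """
    included, excluded = set(), set()
    for rule in spec.split(","):
        rule = rule.strip()
        if not rule:
            continue
        target = included
        if rule.startswith("^"):
            # A leading caret removes the index (or range) from the set.
            target, rule = excluded, rule[1:]
        if "-" in rule:
            start, end = (int(part) for part in rule.split("-", 1))
            target.update(range(start, end + 1))
        else:
            target.add(int(rule))
    return included - excluded


def vcpu_total(spec, total_pcpus):
    """Count usable vCPUs, rejecting indexes outside the host's CPU range."""
    available = parse_pin_set(spec)
    if max(available) >= total_pcpus:
        raise ValueError("vcpu_pin_set is out of hypervisor cpu range")
    return len(available)


print(vcpu_total("4-12,^8,15", total_pcpus=16))
```

The range check mirrors the comparison in _get_vcpu_total: CPU indexes start at 0, so the highest pinned index must be strictly less than the host's CPU count.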
That is the whole logic by which Nova collects a node's hardware resource statistics, using vcpus as the example.
When reposting, please cite this article's address: http://specialneedsforspecialkids.com/yun/38158.html