Optimal resource allocation in HIV self-testing secondary distribution among Chinese MSM: data-driven integer programming models

Human immunodeficiency virus self-testing (HIVST) is an innovative and effective strategy for expanding HIV testing coverage. Several innovative implementations of HIVST have been developed and piloted among populations at high risk of HIV, such as men who have sex with men (MSM), to help meet the global testing target. One such strategy is the secondary distribution of HIVST, in which individuals (defined as indexes) are given multiple testing kits both for self-use (i.e. self-testing) and for distribution to other people in their MSM social networks (defined as alters). Studies of secondary HIVST distribution have mainly concentrated on developing new intervention approaches that further increase the effectiveness of this relatively new strategy, from the perspective of traditional public health. Yet there are many aspects of HIVST secondary distribution in which mathematical modelling can play an important role. In this study, we considered secondary HIVST kit distribution in a resource-constrained setting and proposed two data-driven integer linear programming models to maximize the overall economic benefit of secondary HIVST kit distribution, based on implementation data from our program among Chinese MSM. The objective function took into account both the expansion of normal alters and the detection of HIV-positive and newly tested alters. Starting from solver solutions, we developed greedy algorithms to find final solutions to our integer programming models. Results showed that the proposed data-driven approach could improve the total health economic benefit of HIVST secondary distribution. This article is part of the theme issue ‘Data science approaches to infectious disease surveillance’.


Introduction
Men who have sex with men (MSM) are currently among the populations most vulnerable to the human immunodeficiency virus (HIV). In China, the proportion of new HIV infections recorded among MSM is still rising [1]. Unawareness of HIV status is the main factor driving the current HIV epidemic among MSM. Providing routine testing, so that infected individuals learn their HIV-positive status and start treatment, contributes significantly to HIV prevention [2]. Over 40% of Chinese MSM have never been tested [3], and over 30% of HIV-infected individuals remain unaware of their serostatus [4]. HIV self-testing (HIVST), an innovative and effective strategy that better protects testers' privacy and increases their willingness to test, has been recommended by the World Health Organization (WHO) for global consideration [5,6]. HIVST is a promising strategy for addressing persistent barriers to HIV testing, such as stigma and lack of trust in medical institutions [7], and for reaching marginalized populations at high risk of infection.
Building on the progress made by HIVST, secondary distribution further expands HIV testing coverage, and it has recently been developed and piloted in many countries [8,9,10,11,12]. In the secondary distribution of HIVST among MSM, individuals (defined as indexes) obtain several HIVST kits and then distribute them to members of their social networks, such as stable sexual partners, casual sex partners and MSM community friends (defined as 'alters'). In our previous work, we implemented a secondary distribution program of HIVST in China from September 2019 to September 2020 [11,12], which demonstrated the effectiveness of HIVST secondary distribution in expanding testing coverage.
Because secondary HIVST distribution is an emerging form of distribution, current research mainly focuses on developing and validating intervention approaches that better promote it (e.g. using online social media platforms [10], or employing monetary and psychological incentives [11,12]), and on verifying these approaches through traditional public health methodologies such as a randomized controlled trial (RCT) [11] or a quasi-experiment [13]. Nonetheless, there is still much room for optimizing secondary HIVST distribution, and this is where mathematical modelling could make a difference. In the current model, each index is usually dispatched the number of kits he applied for, for further distribution to alters. However, kits often go undistributed and are wasted because some indexes fail to find suitable alters willing to use them. For example, in our previous study, the 309 indexes applied for 759 kits for distribution, but they distributed only 269 kits (i.e. they found 269 unique alters), so over 60% of the kits (490 test kits) were wasted [11,12]. In developed and some developing countries, resources for HIV testing kits might be sufficient. However, the cost of self-testing is relatively high in low-income and least-developed countries, where healthcare resources are constrained [14]. Hence, such waste of self-testing kits could hinder the further implementation of secondary HIVST distribution.
Considering a resource-constrained supply of testing kits in the secondary distribution of HIVST, this paper proposes two original data-driven integer programming models. We use mathematical models, instead of self-application, to determine the number of test kits dispatched to each index participant so as to achieve optimal resource distribution. In our integer programming models, the objective function captures the overall health economic benefit of HIVST secondary distribution based on our trial among Chinese MSM [11,12], taking into account the economic benefits of reaching regular alters as well as HIV-positive and newly tested alters. Starting from the solvers' solutions, we developed greedy algorithms to obtain final solutions to the integer programming models. The final results showed that the proposed data-driven approach could enhance the health economic benefit of HIVST secondary distribution.
The remainder of this paper is organized as follows: §2 reviews related work, §3 presents the two integer linear programming models derived from our actual data, §4 explains our greedy algorithms, §5 describes the results and §6 presents the conclusions.

Literature review
Our paper relates to the existing literature on resource-allocation problems for HIV prevention and control. Most previous studies address such problems from a macroscopic view, considering the globally optimal allocation of resources for HIV prevention and treatment rather than for a specific HIV prevention or control program. For example, Zaric & Brandeau [15] built a mathematical optimization model to determine how to optimally allocate HIV prevention funds across many HIV prevention programs and evaluated healthcare outcomes under different optimization methods. Kaplan & Merson [16] discussed the trade-off between efficiency and equity when allocating HIV prevention resources. Brandeau et al. [17] used information on the production functions of HIV prevention programs in similar limited-resource settings to determine how many resources to dispatch to each program. Lasry et al. [18,19] proposed an optimization model with linear constraints for allocating HIV prevention resources for the Centers for Disease Control and Prevention (CDC) of the United States [18], and showed that public health outcomes could improve if the model informed the CDC's decisions [19]. Alistar et al. [20] studied the allocation of limited resources between HIV prevention and treatment; their objective function was a function of the reproduction number $R_0$, which they aimed to minimize. That work targeted overall resource allocation across two stages of the HIV control process (i.e. prevention and treatment). Deo et al. [21] optimized resource allocation across three aspects of HIV control at the Veterans Health Administration: HIV screening, HIV testing and HIV care.
On the other hand, few studies of limited resource allocation in HIV control focus on a specific strategy or program from a micro point of view. For instance, to better control infant HIV infection, Deo & Sohoni [22] showed how to allocate point-of-care devices in resource-limited settings, and Jónasson et al. [23] determined optimal locations for diagnostic equipment and laboratories. Both approaches improved the efficiency of early infant diagnosis of HIV. The studies referenced above share the motivation of the present study, as they all consider allocating a fixed budget or resource for optimal HIV prevention and control. However, our work differs in that it examines an unconventional problem with distinct dynamics (i.e. secondary distribution of HIVST).

Integer programming models
We based our data-driven integer programming models on two of our previous studies [12,24]. In our HIVST secondary distribution program among Chinese MSM, there were 309 indexes and 269 alters. Some alters were first-time testers, and some tested HIV-positive. Distributing an HIVST kit to a first-time tester is valuable, as previous studies showed that over 40% of Chinese MSM have never been tested [3]. Increasing the HIV testing yield and identifying more HIV-infected alters unaware of their status play a vital role in HIV infection control. Thus, reaching a newly tested or HIV-infected alter can be more valuable than reaching other alters. In our previous study [12], we estimated, from a health economics perspective, the economic benefit of distributing an HIVST kit to an alter and of that alter being a first-time tester or HIV-positive.

(a) Model I
In Model I, we allocate HIV testing kits only to key influential indexes; indexes not identified as key influencers by the ensemble machine learning model [24] are not considered. We thus concentrate on key influencers as the factor that optimizes the secondary distribution of HIVST kits. In practice, this approach reduces the workload of healthcare service providers or social workers, as they need to focus only on the roughly 20% of indexes identified as key influencers.
Model I determines the number of kits to allocate to each key influential index, with the goal of maximizing the economic benefit. This benefit combines the effect of normal alter expansion (i.e. distributing one kit to an alter for HIV testing) and of detecting HIV-positive and newly tested alters (i.e. an alter who is later confirmed as an HIV-positive case or a first-time tester). The sets, parameters and variables are defined in table 1. These settings are derived from our actual implementation of an HIVST secondary distribution program, which is the sense in which our model is data-driven. Model I is defined as follows.
Objective: maximize the total health economic benefit of HIVST secondary distribution over all key influential indexes,
$$\max \sum_{k\in K}\sum_{a\in A} \left\lfloor \Pr_{a,k}\, x_k \right\rfloor E_a, \quad (3.1)$$
where $\Pr_{a,k} \in \mathbb{Q}$ and $\Pr_{a,k} \in [0,1]$. Here, $\lfloor \Pr_{a,k} x_k \rfloor$ denotes the greatest integer not exceeding $\Pr_{a,k} x_k$ (i.e. the Gaussian greatest integer function, or floor function). For example, if $\Pr_{a,k} = 0.25$ and $x_k = 7$, then $\Pr_{a,k} x_k = 1.75$ and $\lfloor \Pr_{a,k} x_k \rfloor = \lfloor 1.75 \rfloor = 1$. We adopt the Gaussian greatest integer function because the output number of alters of each type must also be an integer. The objective is subject to constraints on the integer decision variables, on the limited resources, and on the upper bound of the number of kits each index can distribute.
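As a minimal sketch (not the authors' implementation), objective (3.1) can be evaluated for a given allocation in Python. The dictionary layout, the alter-type label and the benefit value are hypothetical placeholders for the quantities in table 1. Each $\Pr_{a,k}$ is stored as a `fractions.Fraction`, matching the assumption $\Pr_{a,k} \in \mathbb{Q}$, so that the floor is computed exactly.

```python
from fractions import Fraction
from math import floor


def model1_objective(x, pr, e):
    """Objective (3.1): sum over alter types a and indexes k of floor(Pr[a][k] * x[k]) * E[a].

    x  -- dict: key influential index k -> number of kits x_k (hypothetical layout)
    pr -- dict: alter type a -> {k: Pr_{a,k}}, each Pr_{a,k} a Fraction in [0, 1]
    e  -- dict: alter type a -> economic benefit E_a per alter of that type
    """
    total = 0.0
    for a, pr_a in pr.items():
        for k, x_k in x.items():
            # Exact rational product, then the greatest integer not exceeding it;
            # a float product could fall just below an integer and lose one alter.
            total += floor(pr_a[k] * x_k) * e[a]
    return total


# Worked example from the text: Pr_{a,k} = 0.25 and x_k = 7 give floor(1.75) = 1.
pr = {"normal": {"k1": Fraction(1, 4)}}  # hypothetical alter-type label
e = {"normal": 100.0}                    # hypothetical benefit value
print(model1_objective({"k1": 7}, pr, e))  # 100.0
```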
Constraint (3.2) is our core constraint: it fixes the total resource available for the optimal allocation of HIV prevention interventions.
Constraint (3.3) sets an upper bound on the number of HIVST kits that each key influential index $k$ can distribute in the secondary distribution program. In general, an index cannot distribute more kits than the number of members of his social network. Our survey data recorded, for each index, the number of stable sexual partners and casual sex partners in the past three months, and the number of MSM community friends he could contact in the next three months. Constraint (3.4) ensures that each decision variable, the number of HIVST kits allocated to key influential index $k$, is a non-negative integer.
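The formulas of constraints (3.2) to (3.4) are not reproduced here, so the following Python check is a sketch under two loudly stated assumptions: that (3.2) reads $\sum_k x_k \le M$, and that the upper bound in (3.3) is the sum of the three survey counts. The function names and example counts are hypothetical.

```python
def network_upper_bound(stable, casual, friends):
    """Assumed U_k for (3.3): stable partners + casual partners (past three months)
    + MSM friends the index could contact (next three months)."""
    return stable + casual + friends


def model1_feasible(x, network, M):
    """Check constraints (3.2)-(3.4) of Model I for a candidate allocation x.

    x       -- dict: index k -> kits x_k
    network -- dict: index k -> (stable, casual, friends) counts from the survey
    M       -- total kits available; (3.2) is ASSUMED here to be sum_k x_k <= M
    """
    # (3.4): every decision variable is a non-negative integer
    if any(not isinstance(v, int) or v < 0 for v in x.values()):
        return False
    # (3.2): fixed resource budget (assumed inequality form)
    if sum(x.values()) > M:
        return False
    # (3.3): no index receives more kits than his network can absorb
    return all(v <= network_upper_bound(*network[k]) for k, v in x.items())


network = {"k1": (1, 2, 3), "k2": (0, 1, 1)}  # hypothetical survey counts
print(model1_feasible({"k1": 6, "k2": 2}, network, M=8))  # True
print(model1_feasible({"k1": 7, "k2": 1}, network, M=8))  # False: k1 exceeds 6
```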

(b) Model II
Model II determines the allocation of HIV test kits to all indexes (both key influencers and non-key influencers) who signed up for the secondary HIVST distribution program. Key influencers are indexes more likely to distribute more than one kit (i.e. at least 2 kits), so we add a constraint requiring that each key influencer be allocated at least two kits. Additionally, considering all indexes increases the workload of the gay-led community-based organization. Hence, we mainly considered indexes residing in the same city as the gay-led community organization or in the same province (i.e. other cities within the same province). This made it convenient to follow up participants for feedback and healthcare management after self-testing. In practice, this strategy could expand HIV testing coverage, although it may increase the workload of healthcare service providers or social workers.
Model II determines how many kits should be allocated to each index, whether or not the index is a key influencer, with the same goal of maximizing the economic benefit, which combines the effect of normal alter expansion and of detecting HIV-positive and newly tested alters. The sets, parameters and variables of Model II are defined in table 2. These settings are also derived from our actual implementation experience of an HIVST secondary distribution program. Model II is defined as follows.
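Objective: maximize the total health economic benefit of HIVST secondary distribution over both key influential indexes and non-key indexes,
$$\max \sum_{k\in K}\sum_{a\in A}\sum_{l\in L} \left\lfloor \Pr^{l}_{a,k}\, x_{k,l} \right\rfloor E_a + \sum_{i\in I}\sum_{a\in A}\sum_{l\in L} \left\lfloor \Pr^{l}_{a,i}\, x_{i,l} \right\rfloor E_a, \quad (3.5)$$
where $\lfloor \Pr^{l}_{a,k} x_{k,l} \rfloor$ again denotes the greatest integer not exceeding $\Pr^{l}_{a,k} x_{k,l}$, and likewise for $\lfloor \Pr^{l}_{a,i} x_{i,l} \rfloor$. As in Model I, we adopt the Gaussian greatest integer function to ensure that the output number of alters of each type is an integer.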
The objective is subject to constraints on the integer decision variables, on the limited resources, on the upper and lower bounds of the number of kits each index can distribute, and on location, i.e. constraints (3.6) to (3.13); for example, constraint (3.12) reads $\forall k \in K,\ \forall l \in L:\ x_{k,l} \in \mathbb{N}^{*}$. Constraint (3.6) expresses the motivation of our study, namely the optimal allocation of a limited number of HIVST secondary distribution kits among indexes.
Constraints (3.7) and (3.8) both address location. As mentioned before, Model II increases the workload of healthcare service providers (i.e. a gay-led community organization in our program), so it is desirable to reduce workload such as follow-up collection of testing feedback and health management. It is more convenient for social workers in our gay-led community organization to carry out follow-up and health management for indexes who live in the same city as the organization (i.e. location 1) or in other cities within the same province (i.e. location 2). Constraints (3.7) and (3.8) were therefore set following the suggestions of social workers in our HIVST secondary distribution implementation.
Constraint (3.9) sets an upper bound on the number of HIVST kits that each key influential index $k$ can distribute in the secondary distribution program. In general, an index cannot distribute more test kits than the number of members of his social network. From our survey data, we counted his stable sexual partners and casual sex partners in the latest three months and the other MSM community friends he could contact in the forthcoming three months.
Constraint (3.10) sets the corresponding upper bound on the number of HIVST kits that each non-key index $i$ can distribute; its rationale is the same as for constraint (3.9).
Constraint (3.11) sets a lower bound on the number of HIVST kits allocated to each key influential index $k$. It follows from our previous study, which identified key influencers via an ensemble machine learning approach [24] and defined them as indexes who could distribute at least two kits. In other words, the label in that classification task was whether an index distributed at least two kits, and this constraint follows from that definition.
Constraint (3.12) ensures that each decision variable for a key influential index $k$, i.e. the number of HIVST kits allocated to it, is a non-negative integer.
Constraint (3.13) ensures that each decision variable for a non-key index $i$, i.e. the number of HIVST kits allocated to it, is a non-negative integer.
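The bound and budget constraints of Model II can be sketched as a feasibility check in Python. This is a hypothetical simplification: each index is assumed to reside in exactly one location, so a single kit count per index replaces $x_{k,l}$ and $x_{i,l}$; (3.6) is assumed to read $\sum x \le M$; and the location constraints (3.7) and (3.8), whose formulas are not reproduced here, are deliberately left out.

```python
def model2_feasible(x, upper, key, M):
    """Check the budget and bound constraints of Model II for an allocation x.

    x     -- dict: index -> kits (each index ASSUMED to reside in one location)
    upper -- dict: index -> network-based upper bound, as in (3.9) and (3.10)
    key   -- set of key influential indexes, which need at least 2 kits (3.11)
    M     -- total kits available; (3.6) is ASSUMED here to be sum(x) <= M
    The location constraints (3.7) and (3.8) are not modelled in this sketch.
    """
    for idx, kits in x.items():
        if not isinstance(kits, int):            # (3.12)/(3.13): integer kits
            return False
        lower = 2 if idx in key else 0           # (3.11) applies to key indexes only
        if not lower <= kits <= upper[idx]:      # (3.9)/(3.10) upper bounds
            return False
    return sum(x.values()) <= M                  # (3.6), assumed form


upper = {"k1": 5, "i1": 3}  # hypothetical bounds for one key and one non-key index
print(model2_feasible({"k1": 2, "i1": 3}, upper, {"k1"}, M=10))  # True
print(model2_feasible({"k1": 1, "i1": 3}, upper, {"k1"}, M=10))  # False: k1 below 2
```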

Greedy algorithm
Existing solvers cannot be applied directly to Model I or Model II, because their objective functions contain the Gaussian greatest integer function. However, if we temporarily remove the Gaussian greatest integer function, changing objective function (3.1) to
$$\sum_{k\in K}\sum_{a\in A} \Pr_{a,k}\, x_k\, E_a \quad (4.1)$$
and objective function (3.5) to
$$\sum_{k\in K}\sum_{a\in A}\sum_{l\in L} \Pr^{l}_{a,k}\, x_{k,l}\, E_a + \sum_{i\in I}\sum_{a\in A}\sum_{l\in L} \Pr^{l}_{a,i}\, x_{i,l}\, E_a, \quad (4.2)$$
then both models become standard integer linear programming models, which existing integer programming solvers such as CPLEX, or packages in Python or R, can solve. Suppose we have obtained the solvers' solutions for objective functions (4.1) and (4.2) (i.e. values of $x_k$ for Model I, and of $x_{k,l}$ and $x_{i,l}$ for Model II), which we call temporary solver solutions. We can then evaluate (3.1) and (3.5) at these temporary solver solutions. Because $\lfloor z \rfloor \le z$ and $E_a > 0$, for any feasible allocation
$$\sum_{k\in K}\sum_{a\in A} \left\lfloor \Pr_{a,k}\, x_k \right\rfloor E_a \le \sum_{k\in K}\sum_{a\in A} \Pr_{a,k}\, x_k\, E_a, \quad (4.3)$$
and a similar inequality holds between (3.5) and (4.2). These two inequalities imply that the optimal values of (3.1) and (3.5) are less than or equal to the optimal values of (4.1) and (4.2), respectively. However, the values of (3.1) and (3.5) at the temporary solver solutions may still be improved. Our greedy algorithm therefore fine-tunes the temporary solver solutions to obtain greater values of objective functions (3.1) and (3.5) than those given by the temporary solver solutions.
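For Model I specifically, and only under the assumption that (3.2) reads $\sum_k x_k \le M$ with (3.3) and (3.4) as the only other constraints, the relaxed problem with objective (4.1) is a bounded knapsack in which every kit uses one unit of budget. In that special case, filling indexes in descending order of $w_k = \sum_a \Pr_{a,k} E_a$ is exactly optimal, so a sort-and-fill routine can stand in for a general solver such as CPLEX. The sketch below is that alternative, not the authors' pipeline, and includes a brute-force check for small hypothetical instances.

```python
from itertools import product


def relaxed_weights(pr, e):
    """Coefficient of x_k in (4.1): w_k = sum over alter types a of Pr_{a,k} * E_a."""
    indexes = next(iter(pr.values())).keys()
    return {k: sum(pr[a][k] * e[a] for a in pr) for k in indexes}


def solve_relaxed_model1(w, ub, M):
    """Maximise sum_k w_k x_k s.t. sum_k x_k <= M, 0 <= x_k <= ub_k, x_k integer.

    Every kit uses one unit of the budget, so giving kits to indexes in
    descending order of w_k (up to each bound) is optimal.
    """
    x = {k: 0 for k in w}
    left = M
    for k in sorted(w, key=w.get, reverse=True):
        if left == 0:
            break
        x[k] = min(ub[k], left)
        left -= x[k]
    return x


def brute_force_value(w, ub, M):
    """Exhaustive optimum of the same problem, for checking small instances."""
    ks = list(w)
    return max(
        sum(w[k] * v for k, v in zip(ks, xs))
        for xs in product(*(range(ub[k] + 1) for k in ks))
        if sum(xs) <= M
    )


w = {"a": 3.0, "b": 5.0, "c": 1.0}  # hypothetical weights w_k
ub = {"a": 2, "b": 1, "c": 3}       # hypothetical upper bounds from (3.3)
x = solve_relaxed_model1(w, ub, M=3)
print(x)  # {'a': 2, 'b': 1, 'c': 0}
```

Model II's location constraints break this simple structure, so a general integer programming solver remains the natural choice there.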

(a) Algorithm development
We write (4.1) and (4.2) in a uniform way as
$$\sum_{j=1}^{n} w_j x_j, \quad (4.4)$$
where $w_j$ is the product of the corresponding $\Pr_j$ and $E_j$ for $x_j$, $j = 1, 2, \ldots, n$, that is,
$$w_j = \Pr_j E_j. \quad (4.5)$$
Likewise, (3.1) and (3.5) can be rewritten uniformly as
$$\sum_{j=1}^{n} \left\lfloor \Pr_j x_j \right\rfloor E_j. \quad (4.6)$$
Then $w_1, w_2, \ldots, w_n$ are sorted in descending order, $w_{(1)} \ge w_{(2)} \ge \cdots \ge w_{(n)}$, with corresponding variables $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$.
Let $x^*_{(j)}$ be the temporary solution for $x_{(j)}$ returned by the solver when maximizing (4.4) subject to linear constraints (3.2) to (3.4) or (3.6) to (3.13). By the property of the Gaussian greatest integer function used in (4.3), the value of (4.6) at $x^*_{(j)}$ is less than or equal to the value of (4.4) at $x^*_{(j)}$. We therefore aim to increase the value of (4.6) beyond its value at $x^*_{(j)}$, noting that any such increase is bounded above by the optimal value of (4.4).
We fine-tune $x^*_{(j)}$ by the following rule: replace $x^*_{(1)}$ with $x^*_{(1)} + 1$ and $x^*_{(n)}$ with $x^*_{(n)} - 1$; if all linear constraints are still met and the value of (4.6) increases, we apply this update and repeat the step; otherwise, we check the pair $x^*_{(2)} + 1$ and $x^*_{(n-1)} - 1$ in the same way, and so on. Our greedy algorithm for Model I and Model II is given as algorithm 1.
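The pairing rule can be sketched in Python as follows. This is one reading of the rule, not the authors' algorithm 1 verbatim, and it rests on stated assumptions: each variable carries a list of (Pr, E) terms (one per alter type, so several floor terms can share one $x_j$ as in (3.1)); $w_j$ is taken as the sum of Pr·E over those terms; a caller-supplied `feasible` function stands in for the linear constraints; and each pair is retried until its move stops improving (4.6) before the next pair is examined.

```python
from fractions import Fraction
from math import floor


def floor_objective(x, terms):
    """(4.6): sum over variables j and their (Pr, E) terms of floor(Pr * x_j) * E."""
    return sum(floor(p * x[j]) * e for j in x for p, e in terms[j])


def greedy_fine_tune(x, terms, feasible):
    """Fine-tune a temporary solver solution x (dict: variable -> int kits).

    Variables are ranked by w_j = sum(Pr * E) in descending order. Pair i moves one
    kit from the i-th smallest weight to the i-th largest; a move is kept only if
    feasible(trial) holds and (4.6) strictly increases. A pair is retried until its
    move stops helping, then the next of the floor(n/2) pairs is examined.
    """
    x = dict(x)
    w = {j: sum(p * e for p, e in terms[j]) for j in x}
    order = sorted(x, key=w.get, reverse=True)  # x_(1), ..., x_(n)
    n = len(order)
    for i in range(n // 2):
        hi, lo = order[i], order[n - 1 - i]
        while x[lo] > 0:
            trial = dict(x)
            trial[hi] += 1
            trial[lo] -= 1
            if feasible(trial) and floor_objective(trial, terms) > floor_objective(x, terms):
                x = trial
            else:
                break
    return x


# Hypothetical instance: "a" has w = 5, "b" has w = 3, and "a" may receive at most 5 kits.
terms = {"a": [(Fraction(1, 2), 10.0)], "b": [(Fraction(1, 3), 9.0)]}


def feasible(x):
    return min(x.values()) >= 0 and x["a"] <= 5


start = {"a": 1, "b": 4}  # (4.6) value: 0 + 9 = 9
tuned = greedy_fine_tune(start, terms, feasible)
print(tuned, floor_objective(tuned, terms))  # {'a': 2, 'b': 3} 19.0
```

Every move transfers exactly one kit, so the total number of kits never changes and a binding budget constraint stays satisfied throughout.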
For Model II, we further propose a sub-grouping greedy algorithm (algorithm 2) that accounts for constraints (3.7) and (3.8). The main idea is to fine-tune $x$ within three subgroups, with the indexes at each of the three locations forming one subgroup. Running the greedy algorithm separately in each of the three subgroups moves kits only between indexes at the same location, so constraints (3.7) and (3.8) hold automatically.
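A minimal Python sketch of the sub-grouping idea, under stated assumptions: location labels (1 for the same city, 2 for the same province, 3 for elsewhere) are hypothetical, a caller-supplied `feasible` function stands in for the remaining linear constraints, and the pairing rule is applied independently inside each subgroup. Because kits only move between indexes sharing a label, each subgroup's kit total is unchanged by construction. This is an illustration, not the authors' algorithm 2.

```python
from fractions import Fraction
from math import floor


def floor_objective(x, terms):
    """(4.6): sum over variables j and their (Pr, E) terms of floor(Pr * x_j) * E."""
    return sum(floor(p * x[j]) * e for j in x for p, e in terms[j])


def subgroup_greedy(x, terms, location, feasible):
    """Run the pairwise fine-tuning separately inside each location subgroup.

    location -- dict: index -> location label (hypothetical, e.g. 1, 2, 3)
    Kits only move between indexes that share a label, so each subgroup total is kept.
    """
    x = dict(x)
    w = {j: sum(p * e for p, e in terms[j]) for j in x}
    for loc in sorted(set(location.values())):
        order = sorted((j for j in x if location[j] == loc), key=w.get, reverse=True)
        n = len(order)
        for i in range(n // 2):  # floor(size/2) pairs in this subgroup
            hi, lo = order[i], order[n - 1 - i]
            while x[lo] > 0:
                trial = dict(x)
                trial[hi] += 1
                trial[lo] -= 1
                if feasible(trial) and floor_objective(trial, terms) > floor_objective(x, terms):
                    x = trial
                else:
                    break
    return x


terms = {  # hypothetical (Pr, E) terms per index
    "a": [(Fraction(1, 2), 10.0)], "b": [(Fraction(1, 3), 9.0)],
    "c": [(Fraction(1, 2), 10.0)], "d": [(Fraction(1, 3), 9.0)],
}
location = {"a": 1, "b": 1, "c": 2, "d": 2}
start = {"a": 1, "b": 4, "c": 0, "d": 3}
result = subgroup_greedy(start, terms, location, lambda x: min(x.values()) >= 0)
print(result)  # {'a': 2, 'b': 3, 'c': 0, 'd': 3}
```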

(b) Mathematical derivation
We now explain why our greedy algorithms can find better solutions than the temporary solutions from solvers. Theorem 4.1. Assume that $\Pr_j \in \mathbb{Q}$, $\Pr_j \in [0,1]$, $E_j \in \mathbb{R}^+$ and $w_j = \Pr_j E_j$. Consider a standard integer programming problem with objective function $w_1x_1 + w_2x_2 + w_3x_3 + \cdots + w_nx_n$ under linear constraints on the non-negative integer variables $x_j$, and suppose the solutions maximizing this objective are $x_j^*$. A new objective function is defined through the Gaussian greatest integer function as $\lfloor \Pr_1 x_1 \rfloor E_1 + \lfloor \Pr_2 x_2 \rfloor E_2 + \cdots + \lfloor \Pr_n x_n \rfloor E_n$. The value of the new objective function at $x_j^*$ (i.e. $\sum_{j=1}^{n} \lfloor \Pr_j x_j^* \rfloor E_j$) [...] Write $\Pr_{j_1} = b_1/a_1$ and, similarly, $\Pr_{j_2} = b_2/a_2$, where $a_1, a_2$ are positive integers and $b_1, b_2$ are non-negative integers with $b_1 \le a_1$ and $b_2 \le a_2$. Then $\lfloor \Pr_{j_1} x^*_{j_1} \rfloor = \lfloor (b_1/a_1)\, x^*_{j_1} \rfloor$. Let $\lfloor (b_1/a_1)\, x^*_{j_1} \rfloor = y_1$, where $y_1 \in \mathbb{N}$. Then $b_1 x^*_{j_1} = a_1 y_1 + r_1$, where the integer $r_1$ is the remainder of this division, i.e. $0 \le r_1 < a_1$. Now consider $\lfloor (b_1/a_1)(x^*_{j_1} + 1) \rfloor$. If $b_1(x^*_{j_1} + 1) = a_1(y_1 + 1) + r_1'$ with $0 \le r_1' < a_1$, then $\lfloor (b_1/a_1)(x^*_{j_1} + 1) \rfloor = y_1 + 1$. This requires the new remainder $r_1' = r_1 + b_1 - a_1$ to satisfy $0 \le r_1 + b_1 - a_1 < a_1$, i.e. $a_1 - b_1 \le r_1 < 2a_1 - b_1$; because $a_1 \le 2a_1 - b_1$ and $r_1 < a_1$, the original remainder $r_1$ must satisfy
$$a_1 - b_1 \le r_1 < a_1. \quad (4.9)$$
Similarly, let $\lfloor (b_2/a_2)\, x^*_{j_2} \rfloor = y_2$, where $y_2 \in \mathbb{N}$, so that $b_2 x^*_{j_2} = a_2 y_2 + r_2$ with remainder $0 \le r_2 < a_2$. Now consider $\lfloor (b_2/a_2)(x^*_{j_2} - 1) \rfloor$. If $b_2(x^*_{j_2} - 1) = a_2 y_2 + r_2'$ with $0 \le r_2' < a_2$, then $\lfloor (b_2/a_2)(x^*_{j_2} - 1) \rfloor = y_2$, i.e. the value stays the same. This requires the new remainder $r_2' = r_2 - b_2$ to satisfy $0 \le r_2 - b_2 < a_2$; hence the original remainder $r_2$ must satisfy
$$b_2 \le r_2 < a_2. \quad (4.10)$$
Also assume that all linear constraints are still satisfied after replacing $x^*_{j_1}$ with $x^*_{j_1} + 1$ and $x^*_{j_2}$ with $x^*_{j_2} - 1$.
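Finally, under these conditions, inequality (4.8) holds, and the value of the new objective function at $x_j^*$ (i.e. $\lfloor \Pr_1 x_1^* \rfloor E_1 + \lfloor \Pr_2 x_2^* \rfloor E_2 + \cdots + \lfloor \Pr_n x_n^* \rfloor E_n$) is NOT the maximum value of the new objective function: replacing $x^*_{j_1}$ with $x^*_{j_1} + 1$ and $x^*_{j_2}$ with $x^*_{j_2} - 1$ increases the $j_1$ term by $E_{j_1} > 0$ and leaves the $j_2$ term unchanged, giving a feasible point with a strictly larger value.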
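The remainder conditions (4.9) and (4.10) can be checked exhaustively on small integers. The sketch below is a numerical sanity check, not part of the proof: it writes $\Pr = b/a$ with $0 \le b \le a$, uses Python's exact integer floor division, and the search ranges `max_a` and `max_x` are arbitrary choices.

```python
def floor_rises_by_one(a, b, x):
    """Does floor(b(x+1)/a) equal floor(bx/a) + 1?"""
    return (b * (x + 1)) // a == (b * x) // a + 1


def floor_unchanged_after_decrement(a, b, x):
    """Does floor(b(x-1)/a) equal floor(bx/a)?"""
    return (b * (x - 1)) // a == (b * x) // a


def check_remainder_conditions(max_a=12, max_x=40):
    """Exhaustively confirm (4.9) and (4.10) for Pr = b/a with 0 <= b <= a <= max_a."""
    for a in range(1, max_a + 1):
        for b in range(a + 1):
            for x in range(1, max_x + 1):
                r = (b * x) % a  # remainder r in b*x = a*y + r
                assert floor_rises_by_one(a, b, x) == (a - b <= r < a)           # (4.9)
                assert floor_unchanged_after_decrement(a, b, x) == (b <= r < a)  # (4.10)
    return True


print(check_remainder_conditions())  # True
print(floor_rises_by_one(4, 1, 7))   # True: floor(7/4) = 1 and floor(8/4) = 2
```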

Results
We compared the economic benefit of the conventional self-application method with that of our mathematical optimization method.
First, we considered only key influential indexes. We compared the results of their conventional self-application with those of the mathematical optimization approach for the same indexes under the same limited resource setting $M$ (i.e. a fixed number of test kits equal to the total number of test kits the indexes received by self-application during the actual implementation). The results are shown in table 3.
The economic benefit of conventional kit self-application was 31 345.71 USD. The temporary solver solution yielded 38 804.82 USD, so the integer programming model increased the total economic benefit by about 24%. The greedy algorithm further raised the economic benefit to about 39 545.35 USD, about 26% above the conventional approach, which supports the effectiveness of our algorithm.
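The relative increases can be recomputed from the three table 3 values quoted above; this short check uses no other data.

```python
# Economic benefits for Model I reported in table 3 (USD).
conventional = 31345.71  # conventional self-application
solver = 38804.82        # temporary solver solution
greedy = 39545.35        # after greedy fine-tuning


def pct_increase(new, old):
    """Relative increase of new over old, in percent."""
    return 100 * (new - old) / old


print(f"solver vs conventional: {pct_increase(solver, conventional):.1f}%")  # 23.8%
print(f"greedy vs conventional: {pct_increase(greedy, conventional):.1f}%")  # 26.2%
print(f"greedy vs solver: {pct_increase(greedy, solver):.1f}%")              # 1.9%
```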
Second, when both key influential and non-key indexes were considered, the limited resource $M$ was again set to the total number of test kits all indexes received by self-application during the actual implementation. Table 4 shows the details of the results. With Model II allocating resources optimally across all indexes, the economic benefit increased by around 45%. In addition, Model II enforces constraints (3.7) and (3.8), which can reduce the workload of healthcare service providers (e.g. a gay-led community organization in our program), including following up participants after secondary distribution and self-testing for feedback on the testing experience, managing participants' health, and so on. In the conventional self-application pattern, by contrast, the follow-up workload of healthcare service providers is heavier.
Note also that the general greedy algorithm is less effective for Model II than for Model I, whereas our sub-grouping greedy algorithm performs better. This may be because the fine-tuning process of the general greedy algorithm is restricted by constraints (3.7) and (3.8): the general greedy algorithm considers $\lfloor n/2 \rfloor$ pairs of $x$, and in our real data over 25% of these pairs violate constraints (3.7) and (3.8). In the sub-grouping greedy algorithm, the fine-tuning considers $\lfloor c/2 \rfloor + \lfloor p/2 \rfloor + \lfloor o/2 \rfloor$ pairs of $x$ (where $c + p + o = n$), all of which automatically satisfy constraints (3.7) and (3.8). Hence, we only need to check the other constraints and whether the new objective value exceeds the current one. The results of the sub-grouping greedy algorithm for Model II support this explanation.

Conclusion
Secondary HIVST kit distribution has proven to be an effective HIV prevention strategy and should be scaled up in more countries [10,12]. For low-income countries (LIC) with limited healthcare resources, implementing secondary HIVST distribution may require further consideration. This study proposed and evaluated two data-driven integer programming models for determining the optimal allocation of the initial test kit dispatch in secondary HIVST distribution among Chinese MSM. Our results showed that the greedy algorithms we developed to solve these optimization problems increased the economic benefit of secondary HIVST distribution. Such a data-driven approach to resource allocation in limited-resource settings could serve as a reference for implementing secondary HIVST distribution in LIC. As this is a retrospective modelling study, future research could conduct a quasi-experimental trial from a traditional public health perspective to compare the actual economic benefit of conventional self-application with that of our models, and to analyse the managerial and policy implications.
Ethics. Our dataset was collected from a survey, and ethical approval for biomedical research was obtained from the Ethics Committee of Zhuhai Center for Disease Control and Prevention prior to study enrollment (Number: ZhuhaiCDC-201901). For the survey data collection, all participants provided online consent and signed it electronically before taking part in our studies.