论文标题
渐近上的最佳界限,用于估算均方根时间的H索引,并应用于子图计数
Asymptotically Optimal Bounds for Estimating H-Index in Sublinear Time with Applications to Subgraph Counting
论文作者
论文摘要
$ h $ index是一种用于衡量用户在出版物环境中的影响的度量,例如社交网络的成员,拥有许多高度喜欢的帖子或在学术领域中的研究人员,其中许多引用了许多引用的出版物。具体而言,用户的$ h $ index是最大的整数$ h $,因此用户的至少$ h $出版物至少具有$ h $ h $单位的积极反馈。 我们设计了一种算法,该算法对用户的$ n $出版物的查询访问以及每个出版物的相应反馈号码,输出$(1 \ pm \ pm \ varepsilon)$ - 该用户的$ h $ index的近似值,概率至少是$ 1-δ$。 o(\ frac {n \ cdot \ ln {(1/δ)}}}} {\ varepsilon^2 \ cdot h}), \]其中$ h $是算法A-Priori未知的实际$ h $ index。然后,我们设计了一种新颖的下限技术,使我们能够证明该界限实际上在所有参数中对此问题均无最佳,$ n,h,h,\ varepsilon,$和$δ$。 我们的工作是跨金属时间算法中的第一个,该算法解决了获得渐近最佳界限的问题,尤其是在误差和置信参数方面。因此,我们专注于为这项任务设计新技术。特别是,我们的下边界技术似乎很笼统 - 为了展示这一点,我们还使用方法证明了渐近性估算图中三角形数量的渐近最佳下限,现在这在误差和置信参数中也是最佳的。对于此问题的Eden,Levi,Ron和Seshadhri(focs'15)的先前下限,此结果改善了,以及多次扩展到其他子图计数问题的后续行动。
The $h$-index is a metric used to measure the impact of a user in a publication setting, such as a member of a social network with many highly liked posts or a researcher in an academic domain with many highly cited publications. Specifically, the $h$-index of a user is the largest integer $h$ such that at least $h$ publications of the user have at least $h$ units of positive feedback. We design an algorithm that, given query access to the $n$ publications of a user and each publication's corresponding positive feedback number, outputs a $(1\pm \varepsilon)$-approximation of the $h$-index of this user with probability at least $1-δ$ in time \[ O(\frac{n \cdot \ln{(1/δ)}}{\varepsilon^2 \cdot h}), \] where $h$ is the actual $h$-index which is unknown to the algorithm a-priori. We then design a novel lower bound technique that allows us to prove that this bound is in fact asymptotically optimal for this problem in all parameters $n,h,\varepsilon,$ and $δ$. Our work is one of the first in sublinear time algorithms that addresses obtaining asymptotically optimal bounds, especially in terms of the error and confidence parameters. As such, we focus on designing novel techniques for this task. In particular, our lower bound technique seems quite general -- to showcase this, we also use our approach to prove an asymptotically optimal lower bound for the problem of estimating the number of triangles in a graph in sublinear time, which now is also optimal in the error and confidence parameters. This result improves upon prior lower bounds of Eden, Levi, Ron, and Seshadhri (FOCS'15) for this problem, as well as multiple follow-ups that extended this lower bound to other subgraph counting problems.