
Quant Interview "Real Questions" Series: Part 2

量化投资与机器学习 WeChat official account
Published on 2022-05-23

量化投资与机器学习 ("Quantitative Investment and Machine Learning", QIML) is a leading WeChat-based media outlet dedicated to quantitative investing, hedge funds, fintech, artificial intelligence, and big data. The account has 300K+ followers from mutual funds, private funds, brokerages, futures, banking, insurance, and academia, and has been named "Best Author of the Year" by the Tencent Cloud+ Community for two consecutive years.

In 2022, the QIML account has launched yet another brand-new series:

QIML has gathered real interview questions from the world's top hedge funds and big tech companies, hoping to bring readers a fresh job-hunting and learning experience!

Previous installments: Part 1

This installment's questions come from: Citadel, Two Sigma, Morgan Stanley

Part 2

▌Source: Citadel

▌Difficulty: Medium

Question

Compare and contrast Gaussian Naive Bayes (GNB) and logistic regression. When would you use one over the other?

Answer

Both Gaussian naive Bayes (GNB) and logistic regression can be used for classification. Each model has advantages and disadvantages, which together suggest when to choose one over the other. These are discussed below, along with the models' similarities and differences:

Advantages:  

1. GNB requires only a small number of observations to be adequately trained; it is also easy to use and reasonably fast to implement, and the results it produces can be highly interpretable.

2. Logistic regression has a simple interpretation in terms of class probabilities, and it allows inferences to be made about features (i.e., variables) and identification of those most relevant to prediction.

Disadvantages:  

1. By assuming features (i.e., variables) to be independent, GNB can be wrongly employed in problems where that assumption does not hold, a very common occurrence.

2. Not being highly flexible, logistic regression may fail to capture interactions between features and so may lose predictive power. It can also overfit if very little data are available for training.

Differences:

1. Since logistic regression directly learns $P(y \mid x)$, it is a discriminative classifier, whereas GNB directly estimates the prior $P(y)$ and the class-conditional distribution $P(x \mid y)$ and so is a generative classifier.

2. Logistic regression requires an optimization setup (its weights cannot be learned directly through counts), whereas GNB requires no such setup.

Similarities:

1. Both methods produce linear decision functions generated from the training data.

2. GNB's implied $P(y \mid x)$ has the same functional form as that of logistic regression (but with particular parameters).

Given these advantages and disadvantages, logistic regression is preferable when training data size is not an issue, since GNB's assumption of conditional independence breaks down if features are correlated. However, in cases where training data are limited or the data-generating process includes strong priors, GNB may be preferable.
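To make the small-sample point concrete, here is a minimal sketch (assuming scikit-learn; the synthetic dataset, training sizes, and hyperparameters are illustrative, not part of the question):

```python
# Minimal sketch: GNB vs. logistic regression as training size grows.
# Assumes scikit-learn; dataset and sizes are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

for n in (20, 100, 2000):  # small -> large training sets
    gnb = GaussianNB().fit(X_train[:n], y_train[:n])
    lr = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    print(f"n={n:5d}  GNB acc={gnb.score(X_test, y_test):.3f}  "
          f"LR acc={lr.score(X_test, y_test):.3f}")
```

Typically GNB is competitive at the smallest training sizes, while logistic regression catches up or pulls ahead as more data become available.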

---

▌Source: Two Sigma

▌Difficulty: Hard

Question

Describe the kernel trick in SVMs and give a simple example. How do you decide what kernel to choose?

Answer

The idea behind the kernel trick is that data which cannot be separated by a hyperplane in its current dimensionality can become linearly separable when projected into a higher-dimensional space. We can take any data and map it to a higher dimension through a variety of functions $\phi$. However, if $\phi$ is expensive to compute, then we have a problem; instead, it is desirable to compute the kernel value $k(x, y) = \langle \phi(x), \phi(y) \rangle$ directly, without blowing up the computation. For instance, say we have two examples, $x = (x_1, x_2)$ and $y = (y_1, y_2)$, and want to map them to a quadratic space.

We have the following feature map:

$$\phi(x) = \left(x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2\right)$$

and we can use the following kernel, computed entirely in the original space:

$$k(x, y) = \langle \phi(x), \phi(y) \rangle = (x^\top y)^2$$

If we now change the degree $n = 2$ (quadratic) to arbitrary $n$, i.e., $k(x, y) = (x^\top y)^n$, we obtain an arbitrarily complex implicit $\phi$. As long as we perform computations in the original feature space (without the explicit feature transformation), we avoid the long compute time while still mapping our data to a higher dimension!
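A quick numerical check of the identity above, using the quadratic feature map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ (a minimal sketch; the example vectors are arbitrary):

```python
# Verify <phi(x), phi(y)> == (x . y)^2 for the quadratic feature map.
import numpy as np

def phi(v):
    # Explicit map from R^2 to R^3: (v1^2, sqrt(2)*v1*v2, v2^2)
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 3.0])
y = np.array([2.0, -1.0])

explicit = phi(x) @ phi(y)   # inner product in the expanded 3-D space
kernel = (x @ y) ** 2        # same value, computed in the original 2-D space
print(explicit, kernel)      # both print 1.0: (1*2 + 3*(-1))^2 = 1
```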

In terms of which kernel to choose, we can choose between linear and nonlinear kernels, which suit linear and nonlinear problems, respectively. For linear problems, we can try a linear kernel; for nonlinear problems, we can try a radial basis function (RBF, also known as Gaussian) kernel. In real-life problems, domain knowledge can be handy; in the absence of such knowledge, these defaults are good starting points.

We could also try many kernels by setting up a hyperparameter search (a grid search, for example) and comparing different kernels to one another. Based on the loss function at hand, or on performance metrics (accuracy, F1, AUC of the ROC curve, etc.), we can determine which kernel is most appropriate.
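A minimal sketch of such a kernel search (assuming scikit-learn; the dataset, parameter grid, and scoring choice are illustrative):

```python
# Minimal sketch: compare SVM kernels via cross-validated grid search.
# Assumes scikit-learn; the grid and scoring metric are illustrative.
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

grid = GridSearchCV(
    SVC(),
    param_grid={"kernel": ["linear", "poly", "rbf"], "C": [0.1, 1.0, 10.0]},
    scoring="accuracy",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)  # e.g., RBF tends to win on moons
```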

---

▌Source: Morgan Stanley

▌Difficulty: Hard

Question

Say we have N observations for some variable which we model as being drawn from a Gaussian distribution. What are your best guesses for the parameters of the distribution?

Answer

Assume we have a dataset $X$ consisting of $n$ i.i.d. observations $x_1, \dots, x_n$ drawn from $\mathcal{N}(\mu, \sigma^2)$.

Our likelihood function is $L(\mu, \sigma^2) = \prod_{i=1}^{n} p(x_i \mid \mu, \sigma^2)$, where

$$p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

and therefore the log-likelihood is given by:

$$\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

Taking the derivative of the log-likelihood with respect to $\mu$ and setting the result to 0 yields the following:

$$\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0$$

Simplifying the result yields $\sum_{i=1}^{n} x_i = n\mu$, and therefore the maximum likelihood estimate for $\mu$ is given by:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

To obtain the variance, we take the derivative of the log-likelihood with respect to $\sigma^2$ and set the result equal to 0:

$$\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0$$

Simplifying yields the following:

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2$$
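A quick numerical check of these closed-form estimates (a minimal sketch; note that scipy's `norm.fit` maximizes the same likelihood numerically):

```python
# Check the closed-form MLEs: mu_hat = sample mean,
# sigma2_hat = biased (1/n) sample variance.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)

mu_hat = x.mean()                        # (1/n) * sum(x_i)
sigma2_hat = ((x - mu_hat) ** 2).mean()  # note 1/n, not 1/(n-1)

loc, scale = norm.fit(x)                 # maximum likelihood fit via scipy
print(mu_hat, sigma2_hat)
print(loc, scale ** 2)                   # should match the closed forms
```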

---

▌Source: Two Sigma

▌Difficulty: Hard

Question

Suppose you are running a linear regression and model the error terms as being normally distributed. Show that, in this setup, maximizing the likelihood of the data is equivalent to minimizing the sum of the squared residuals.

Answer

In matrix form, we assume $y$ is distributed as multivariate Gaussian:

$$y \sim \mathcal{N}(X\beta,\ \sigma^2 I)$$

The likelihood of $y$ given the above is:

$$L(\beta) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left(-\frac{1}{2\sigma^2}(y - X\beta)^\top (y - X\beta)\right)$$

of which we can take the log in order to optimize:

$$\ell(\beta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)^\top (y - X\beta)$$

Note that, when taking a derivative with respect to $\beta$, the first term is a constant, so we can ignore it, making our optimization problem the following:

$$\max_{\beta}\; -\frac{1}{2\sigma^2}(y - X\beta)^\top (y - X\beta)$$

We can ignore the constant and flip the sign to rewrite this as:

$$\min_{\beta}\; (y - X\beta)^\top (y - X\beta) = \min_{\beta}\sum_{i=1}^{n}\left(y_i - x_i^\top \beta\right)^2$$

which is exactly equivalent to minimizing the sum of the squared residuals.
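To make the equivalence concrete, here is a minimal numerical sketch (assuming numpy/scipy; the data are synthetic, and $\sigma^2$ is held fixed since it only scales the objective without moving the argmax):

```python
# Minimal sketch: the beta maximizing the Gaussian likelihood equals
# the least-squares beta. Data are synthetic.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=200)

def neg_log_likelihood(beta, sigma2=1.0):
    # Negative Gaussian log-likelihood; the constant term does not
    # affect the argmin over beta.
    r = y - X @ beta
    return 0.5 * len(y) * np.log(2 * np.pi * sigma2) + (r @ r) / (2 * sigma2)

beta_mle = minimize(neg_log_likelihood, x0=np.zeros(3)).x
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_mle)   # the two estimates agree up to numerical tolerance
print(beta_ols)
```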

---

▌Source: Citadel

▌Difficulty: Hard

Question

Describe the model formulation behind logistic regression. How do you maximize the log-likelihood of a given model (using the two-class case)?

Answer

Logistic regression aims to classify $x$ into one of $K$ classes by modeling the log-odds of each class (relative to a reference class $K$) as a linear function of $x$:

$$\log \frac{P(y = k \mid x)}{P(y = K \mid x)} = \beta_k^\top x, \quad k = 1, \dots, K-1$$

Therefore, the model is equivalent to the following, where the denominator normalizes the numerator over the $K$ classes:

$$P(y = k \mid x) = \frac{\exp(\beta_k^\top x)}{1 + \sum_{j=1}^{K-1}\exp(\beta_j^\top x)}$$

The log-likelihood over $n$ observations, in general, is the following:

$$\ell(\beta) = \sum_{i=1}^{n} \log P(y_i \mid x_i; \beta)$$

Use the following notation to denote classes 1 and 2 for the two-class case: $y_i \in \{0, 1\}$, with $y_i = 1$ for class 1 and $y_i = 0$ for class 2.

Then we have the following, writing $p(x_i) = P(y_i = 1 \mid x_i)$:

$$P(y_i \mid x_i) = p(x_i)^{y_i}\,\bigl(1 - p(x_i)\bigr)^{1 - y_i}$$

so that the log-likelihood can be written as follows:

$$\ell(\beta) = \sum_{i=1}^{n}\Bigl[y_i \log p(x_i) + (1 - y_i)\log\bigl(1 - p(x_i)\bigr)\Bigr]$$

Simplifying yields the following:

$$\ell(\beta) = \sum_{i=1}^{n}\Bigl[\log\bigl(1 - p(x_i)\bigr) + y_i \log \frac{p(x_i)}{1 - p(x_i)}\Bigr]$$

Substituting for the probabilities, $p(x_i) = \dfrac{e^{\beta^\top x_i}}{1 + e^{\beta^\top x_i}}$, yields the following:

$$\ell(\beta) = \sum_{i=1}^{n}\Bigl[y_i\,\beta^\top x_i - \log\bigl(1 + e^{\beta^\top x_i}\bigr)\Bigr]$$

To maximize this log-likelihood, take the derivative and set it equal to 0:

$$\frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{n} x_i\bigl(y_i - p(x_i)\bigr) = 0$$

We note that:

$$\frac{\partial}{\partial \beta}\log\bigl(1 + e^{\beta^\top x_i}\bigr) = \frac{e^{\beta^\top x_i}}{1 + e^{\beta^\top x_i}}\, x_i = p(x_i)\, x_i$$

which is equivalent to the latter half of the above expression.

The solutions to these equations are not closed-form, however, and hence the above must be solved iteratively (e.g., via Newton-Raphson) until convergence.
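A minimal sketch of such an iteration, using Newton-Raphson (equivalently, iteratively reweighted least squares) on the two-class log-likelihood above (assuming numpy; the data are synthetic):

```python
# Minimal sketch: Newton-Raphson on the two-class logistic log-likelihood.
# Gradient: X^T (y - p); Hessian: -X^T W X with W = diag(p * (1 - p)).
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 features
beta_true = np.array([-0.5, 2.0, -1.0])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

beta = np.zeros(3)
for _ in range(25):                      # iterate until convergence
    p = 1 / (1 + np.exp(-X @ beta))      # p(x_i) under the current beta
    grad = X.T @ (y - p)                 # score: sum_i x_i (y_i - p(x_i))
    W = p * (1 - p)
    hess = -(X * W[:, None]).T @ X       # Hessian of the log-likelihood
    step = np.linalg.solve(hess, grad)
    beta -= step                         # Newton update: beta - H^{-1} g
    if np.linalg.norm(step) < 1e-10:
        break

print(beta)  # should be close to beta_true for this synthetic sample
```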

---

Related Reading

Crushing the Machine Learning Interview!

Pure Substance! Job-Hunting Experience Shared by a Quant Currently at Citadel

G-Research: Real Quant Researcher Interview Questions

We Did Our Best! Answers to the G-Research Quant Interview Questions Are Out!

Quant Puzzle: A Premium Treat!

Exclusive! China Quant Fund Interview Q&A Series: 鸣石投资

Exclusive! China Quant Fund Interview Q&A Series: 白鹭资管

Quant Job-Hunting Series: Jane Street Brain-Teaser Puzzles (2019-2020)

Two Sigma: The Interview Is Pretty Hard (Interview Notes Included)!

How Many Can You Solve? Jane Street Brain-Teaser Interview Questions!

Exclusive! A Compilation of LeetCode Interview Questions from Top Global Hedge Funds

Challenge Man Group! 10 Python Interview Questions from a Top Hedge Fund
