Consider this SQL statement: SELECT product, avg(prod_cost) FROM product_detail GROUP BY product. The GROUP BY clause implies what type of function?
Which analytic technique would be appropriate to estimate home sale price in U.S. dollars as a function of square footage, number of bedrooms, and lot size?
What action occurs during feature selection in the model building phase of the data analytics lifecycle?
What is a benefit of Spark in-memory data processing as opposed to using MapReduce?
You build a decision tree to classify five different types of customers based on their browsing history from a sample of 500. The resulting decision tree has 17 layers. One of the leaf nodes has only three customers.
What do you conclude?
A logistic regression model is built to determine the probability of a credit card borrower defaulting on a credit loan. A threshold value of 0.3 is selected. Which statement can be used to predict a borrower will default?
You have the data from a popular e-commerce website. You are exploring the time spent (in seconds) on the website by 100,000 customers across 14 different product categories.
What visualization can be used to represent the relationship between time spent and product category?
In the data preparation phase of the data analytics lifecycle, what does the term “data conditioning” refer to?