[null,null,["Last updated (UTC): 2025-05-12."],[[["\u003cp\u003eGood feature vectors require features that are clearly named and have obvious meanings to anyone on the project.\u003c/p\u003e\n"],["\u003cp\u003eData should be checked for bad values and outliers, such as inappropriate values, before being used for training.\u003c/p\u003e\n"],["\u003cp\u003eFeatures should be sensible, avoiding \"magic values\" that create discontinuities; instead, use separate Boolean features or new discrete values to indicate missing data.\u003c/p\u003e\n"],["\u003cp\u003eContinuous features should not have magic values representing the absence of measurement, but rather use separate Boolean features or discrete values.\u003c/p\u003e\n"],["\u003cp\u003eDiscrete numerical features with missing values should be assigned a new value within the finite set, enabling the model to learn a weight for each value, including the value that marks missing data.\u003c/p\u003e\n"]]],[],null,["# Numerical data: Qualities of good numerical features\n\nThis unit has explored ways to map raw data into suitable\n[**feature vectors**](/machine-learning/glossary#feature_vector).\nGood numerical [**features**](/machine-learning/glossary#feature) share the\nqualities described in this section.\n\nClearly named\n-------------\n\nEach feature should have a clear, sensible, and obvious meaning to any human on\nthe project. 
For example, the meaning of the following feature value is\nconfusing:\n\nNot recommended\n\u003e house_age: 851472000\n\nIn contrast, the following feature name and value are far clearer:\n\nRecommended\n\u003e house_age_years: 27\n| **Note:** Although your co-workers will rebel against confusing feature and label names, the model won't care (assuming you normalize values properly).\n\nChecked or tested before training\n---------------------------------\n\nAlthough this module has devoted a lot of time to\n[**outliers**](/machine-learning/glossary#outliers), the topic is\nimportant enough to warrant one final mention. In some cases, bad data\n(rather than bad engineering choices) causes unclear values. For example,\nthe following `user_age_in_years` came from a source that didn't check for\nappropriate values:\n\nNot recommended\n\u003e user_age_in_years: 224\n\nBut people *can* be 24 years old:\n\nRecommended\n\u003e user_age_in_years: 24\n\nCheck your data!\n\nSensible\n--------\n\nA \"magic value\" is a purposeful discontinuity in an otherwise continuous\nfeature. For example, suppose a continuous feature named `watch_time_in_seconds`\ncan hold any floating-point value between 0 and 30 but represents the *absence*\nof a measurement with the magic value -1:\n\nNot recommended\n\u003e watch_time_in_seconds: -1\n\nA `watch_time_in_seconds` of -1 would force the model to try to figure\nout what it means to watch a movie backwards in time. The resulting model would\nprobably not make good predictions.\n\nA better technique is to create a separate Boolean feature that indicates\nwhether or not a `watch_time_in_seconds`\nvalue is supplied. For example:\n\nRecommended\n\u003e watch_time_in_seconds: 4.82 \n\u003e\n\u003e is_watch_time_in_seconds_defined=True\n\u003e\n\u003e watch_time_in_seconds: 0 \n\u003e\n\u003e is_watch_time_in_seconds_defined=False\n\nThis is a way to handle a continuous dataset with missing values. 
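
The Boolean-flag technique above can be sketched in Python as follows. This is a minimal illustration rather than part of the course material: the helper `encode_watch_time` is hypothetical, and it assumes that a missing measurement arrives as `None` in the raw data.

```python
from typing import Optional


def encode_watch_time(raw_value: Optional[float]) -> dict:
    # Hypothetical helper: None is assumed to mark a missing measurement.
    is_defined = raw_value is not None
    return {
        # Use 0.0 as a placeholder when the value is missing, mirroring the
        # example above. The Boolean flag lets the model distinguish this
        # placeholder from a genuinely measured 0.0.
        'watch_time_in_seconds': raw_value if is_defined else 0.0,
        'is_watch_time_in_seconds_defined': is_defined,
    }


raw_watch_times = [4.82, None, 0.0]
encoded = [encode_watch_time(value) for value in raw_watch_times]
for row in encoded:
    print(row)
```

In this sketch, the second and third rows both carry a `watch_time_in_seconds` of 0.0, and only the flag tells them apart. That is the reason for keeping the flag as a separate feature instead of overloading the numeric value.
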
Now consider\na [**discrete**](/machine-learning/glossary#discrete_feature)\nnumerical feature, like `product_category`, whose values must belong to a finite\nset of values. In this\ncase, when a value is missing, signify that missing value using a new value in\nthe finite set. With a discrete feature, the model will learn different weights\nfor each value, including a weight for the value that marks missing data.\n\nFor example, the possible values might form the following set:\n\u003e {0: 'electronics', 1: 'books', 2: 'clothing', 3: 'missing_category'}\n| **Key terms:**\n|\n| - [Outliers](/machine-learning/glossary#outliers)\n| - [Feature](/machine-learning/glossary#feature)\n| - [Feature vector](/machine-learning/glossary#feature_vector)"]]