
Machine Learning Glossary: ML Fundamentals

This page contains ML Fundamentals glossary terms.

A

accuracy

#fundamentals

#Metric

The number of correct classification predictions divided by the total number of predictions. That is:

$$\text{Accuracy} = \frac{\text{correct predictions}} {\text{correct predictions + incorrect predictions }}$$

For example, a model that made 40 correct predictions and 10 incorrect predictions would have an accuracy of:

$$\text{Accuracy} = \frac{40}{40 + 10} = 80\%$$

Binary classification provides specific names for the different categories of correct predictions and incorrect predictions. So, the accuracy formula for binary classification is as follows:

$$\text{Accuracy} = \frac{\text{TP} + \text{TN}} {\text{TP} + \text{TN} + \text{FP} + \text{FN}}$$

where:

TP is the number of true positives (correct predictions).
TN is the number of true negatives (correct predictions).
FP is the number of false positives (incorrect predictions).
FN is the number of false negatives (incorrect predictions).

Compare and contrast accuracy with precision and recall.

Although accuracy is a valuable metric for some situations, it is highly misleading for others. Notably, accuracy is usually a poor metric for evaluating classification models that process class-imbalanced datasets.

For example, suppose snow falls only 25 days per century in a certain subtropical city. Since days without snow (the negative class) vastly outnumber days with snow (the positive class), the snow dataset for this city is class-imbalanced. Imagine a binary classification model that is supposed to predict either snow or no snow each day but simply predicts "no snow" every day. This model is highly accurate but has no predictive power. The following table summarizes the results for a century of predictions:

Category	Number
TP	0
TN	36499
FP	0
FN	25

The accuracy of this model is therefore:

accuracy = (TP + TN) / (TP + TN + FP + FN)
accuracy = (0 + 36499) / (0 + 36499 + 0 + 25) = 0.9993 = 99.93%

Although 99.93% accuracy seems like a very impressive percentage, the model actually has no predictive power.

Precision and recall are usually more useful metrics than accuracy for evaluating models trained on class-imbalanced datasets.

See Classification: Accuracy, recall, precision, and related metrics in Machine Learning Crash Course for more information.
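
The snow example can be checked with a few lines of Python. This is a minimal sketch: the function names `accuracy` and `recall` are illustrative choices, and recall is computed with its usual definition, TP / (TP + FN), which this entry mentions but does not spell out.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model identified.

    Returns 0.0 when there are no actual positives (an assumption made
    here to avoid dividing by zero).
    """
    actual_positives = tp + fn
    return tp / actual_positives if actual_positives else 0.0


# Counts from the century of "always predict no snow" predictions.
snow_accuracy = accuracy(tp=0, tn=36499, fp=0, fn=25)
snow_recall = recall(tp=0, fn=25)

print(f"accuracy = {snow_accuracy:.4f}")
print(f"recall   = {snow_recall:.4f}")
```

The accuracy comes out near 1.0 while the recall is exactly 0, which is the numerical form of "highly accurate but no predictive power."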

activation function

#fundamentals

A function that enables neural networks to learn nonlinear (complex) relationships between features and the label.

Popular activation functions include:

ReLU
Sigmoid

The plots of activation functions are never single straight lines. For example, the plot of the ReLU activation function consists of two straight lines:

A Cartesian plot of two lines. The first line has a constant y value of 0, running along the x-axis from -infinity,0 to 0,0. The second line starts at 0,0. This line has a slope of +1, so it runs from 0,0 to +infinity,+infinity.

A plot of the sigmoid activation function looks as follows:

A two-dimensional curved plot with x values spanning the domain -infinity to +infinity, while y values span the range almost 0 to almost 1. When x is 0, y is 0.5. The slope of the curve is always positive, with the highest slope at 0,0.5 and gradually decreasing slopes as the absolute value of x increases.

In a neural network, activation functions manipulate the weighted sum of all the inputs to a neuron. To calculate a weighted sum, the neuron adds up the products of the relevant values and weights. For example, suppose the relevant input to a neuron consists of the following:

input value	input weight
2	-1.3
-1	0.6
3	0.4

The weighted sum is therefore:

weighted sum = (2)(-1.3) + (-1)(0.6) + (3)(0.4) = -2.0

Suppose the designer of this neural network chooses the sigmoid function to be the activation function. In that case, the neuron calculates the sigmoid of -2.0, which is approximately 0.12. Therefore, the neuron passes 0.12 (rather than -2.0) to the next layer in the neural network. The following figure illustrates the relevant part of the process:

See Neural networks: Activation functions in Machine Learning Crash Course for more information.
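
The weighted-sum example can be reproduced with the standard definitions of the two activation functions named in this entry. The sketch below is illustrative only; the helper names (`weighted_sum`, `sigmoid`, `relu`) are not taken from any particular library.

```python
import math


def weighted_sum(values, weights):
    """Sum of value * weight over all inputs to a neuron."""
    return sum(v * w for v, w in zip(values, weights))


def sigmoid(x: float) -> float:
    """Standard sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    """Standard ReLU: 0 for negative input, the input itself otherwise."""
    return max(0.0, x)


# Input values and weights from the table in this entry.
input_values = [2, -1, 3]
input_weights = [-1.3, 0.6, 0.4]

z = weighted_sum(input_values, input_weights)
print(f"weighted sum = {z:.1f}")
print(f"sigmoid      = {sigmoid(z):.2f}")
print(f"relu         = {relu(z):.1f}")
```

With the sigmoid, the neuron passes on roughly 0.12; with ReLU, the same negative weighted sum would be passed on as 0.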

artificial intelligence

#fundamentals

A non-human program or model that can solve sophisticated tasks. For example, a program or model that translates text or a program or model that identifies diseases from radiologic images both exhibit artificial intelligence.

Formally, machine learning is a sub-field of artificial intelligence. However, in recent years, some organizations have begun using the terms artificial intelligence and machine learning interchangeably.

AUC (Area under the ROC curve)

#fundamentals

#Metric

A number between 0.0 and 1.0 representing a binary classification model's ability to separate positive classes from negative classes. The closer the AUC is to 1.0, the better the model's ability to separate classes from each other.

For example, the following illustration shows a classification model that separates positive classes (green ovals) from negative classes (purple rectangles) perfectly. This unrealistically perfect model has an AUC of 1.0:

A number line with 8 positive examples on one side and 9 negative examples on the other side.

Conversely, the following illustration shows the results for a classification model that generated random results. This model has an AUC of 0.5:

A number line with 6 positive examples and 6 negative examples. The sequence of examples is positive, negative, positive, negative, positive, negative, positive, negative, positive, negative, positive, negative.

Yes, the preceding model has an AUC of 0.5, not 0.0.

Most models are somewhere between the two extremes. For instance, the following model separates positives from negatives somewhat, and therefore has an AUC somewhere between 0.5 and 1.0:

A number line with 6 positive examples and 6 negative examples. The sequence of examples is negative, negative, negative, negative, positive, negative, positive, negative, positive, positive, positive, positive.

AUC ignores any value you set for the classification threshold. Instead, AUC considers all possible classification thresholds.

AUC represents the area under an ROC curve. For example, the ROC curve for a model that perfectly separates positives from negatives looks as follows:

AUC is the area of the gray region in the preceding illustration. In this unusual case, the area is simply the length of the gray region (1.0) multiplied by the width of the gray region (1.0). So, the product of 1.0 and 1.0 yields an AUC of exactly 1.0, which is the highest possible AUC score.

Conversely, the ROC curve for a classifier that can't separate classes at all is as follows. The area of this gray region is 0.5.

A more typical ROC curve looks approximately like the following:

It would be painstaking to calculate the area under this curve manually, which is why a program typically calculates most AUC values.

More formally, AUC is the probability that a classifier will be more confident that a randomly chosen positive example is actually positive than that a randomly chosen negative example is positive.

See Classification: ROC and AUC in Machine Learning Crash Course for more information.
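
The probabilistic definition suggests a direct, if slow, way to compute AUC: compare every positive example with every negative example and count how often the positive one gets the higher score. The sketch below does that in plain Python. It rests on two assumptions that are not stated in this entry: an example's position on the number line is used as its score (further right means more confident that the example is positive), and tied scores count as half a win.

```python
def pairwise_auc(scores, labels):
    """Estimate AUC as the share of (positive, negative) pairs in which
    the positive example has the higher score. Ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Perfect separation: 9 negatives on the left, 8 positives on the right.
perfect_labels = [0] * 9 + [1] * 8
perfect_auc = pairwise_auc(list(range(len(perfect_labels))), perfect_labels)

# Partial separation: the negative/positive sequence described above.
partial_labels = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
partial_auc = pairwise_auc(list(range(len(partial_labels))), partial_labels)

print(f"perfect model AUC = {perfect_auc:.3f}")
print(f"partial model AUC = {partial_auc:.3f}")
```

The pairwise loop is quadratic in the number of examples, so it is only suitable for small illustrations; production libraries typically sort the scores instead.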

B

backpropagation

#fundamentals

The algorithm that implements gradient descent in neural networks.

Training a neural network involves many iterations of the following two-pass cycle:

During the forward pass, the system processes a batch of examples to yield prediction(s). The system compares each prediction to each label value. The difference between the prediction and the label value is the loss for that example. The system aggregates the losses for all the examples to compute the total loss for the current batch.
During the backward pass (backpropagation), the system reduces loss by adjusting the weights of all the neurons in all the hidden layer(s).

Neural networks often contain many neurons across many hidden layers. Each of those neurons contributes to the overall loss in different ways. Backpropagation determines whether to increase or decrease the weights applied to particular neurons.

The learning rate is a multiplier that controls the degree to which each backward pass increases or decreases each weight. A large learning rate will increase or decrease each weight more than a small learning rate.

In calculus terms, backpropagation implements the chain rule. That is, backpropagation calculates the partial derivative of the error with respect to each parameter.

Years ago, ML practitioners had to write code to implement backpropagation. Modern ML APIs like Keras now implement backpropagation for you. Phew!

See Neural networks in Machine Learning Crash Course for more information.
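
To make the two-pass cycle concrete, here is a hand-written sketch for the smallest possible network: one input, one hidden neuron with a ReLU activation, and one output neuron. Everything specific in it is an assumption chosen for illustration: the squared-error loss, the starting weights, the single training example (x = 1.0, label = 2.0), and the learning rate of 0.1. Real frameworks automate these derivative computations.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def train_step(w1, w2, x, y, learning_rate):
    """One forward pass and one backward pass for a 1-1-1 network."""
    # Forward pass: compute the prediction and its loss.
    pre_activation = w1 * x
    hidden = relu(pre_activation)
    prediction = w2 * hidden
    loss = (prediction - y) ** 2  # squared error (assumed loss function)

    # Backward pass: apply the chain rule from the loss back to each weight.
    d_loss_d_prediction = 2.0 * (prediction - y)
    d_loss_d_w2 = d_loss_d_prediction * hidden
    d_loss_d_hidden = d_loss_d_prediction * w2
    d_hidden_d_pre = 1.0 if pre_activation > 0 else 0.0  # ReLU slope
    d_loss_d_w1 = d_loss_d_hidden * d_hidden_d_pre * x

    # The learning rate scales how far each weight moves.
    new_w1 = w1 - learning_rate * d_loss_d_w1
    new_w2 = w2 - learning_rate * d_loss_d_w2
    return new_w1, new_w2, loss


w1, w2 = 0.5, 0.5  # hypothetical starting weights
losses = []
for _ in range(50):
    w1, w2, loss = train_step(w1, w2, x=1.0, y=2.0, learning_rate=0.1)
    losses.append(loss)

print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.2e}")
```

Each iteration moves both weights in the direction that lowers the loss, so the recorded loss shrinks toward zero over the 50 steps.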

batch

#fundamentals

The set of examples used in one training iteration. The batch size determines the number of examples in a batch.

See epoch for an explanation of how a batch relates to an epoch.

See Linear regression: Hyperparameters in Machine Learning Crash Course for more information.

batch size

#fundamentals

The number of examples in a batch. For instance, if the batch size is 100, then the model processes 100 examples per iteration.

The following are popular batch size strategies:

Stochastic Gradient Descent (SGD), in which the batch size is 1.
Full batch, in which the batch size is the number of examples in the entire training set. For instance, if the training set contains a million examples, then the batch size would be a million examples. Full batch is usually an inefficient strategy.
Mini-batch, in which the batch size is usually between 10 and 1000. Mini-batch is usually the most efficient strategy.

bias (ethics/fairness)

#responsible

#fundamentals

1. Stereotyping, prejudice, or favoritism towards some things, people, or groups over others. These biases can affect the collection and interpretation of data, the design of a system, and how users interact with a system.

2. Systematic error introduced by a sampling or reporting procedure.

Not to be confused with the bias term in machine learning models or prediction bias.

See Fairness: Types of bias in Machine Learning Crash Course for more information.

bias (math) or bias term

#fundamentals

An intercept or offset from an origin. Bias is a parameter in machine learning models, which is symbolized by either of the following:

b
w₀

For example, bias is the b in the following formula:

$$y' = b + w_1x_1 + w_2x_2 + \ldots + w_nx_n$$

In a simple two-dimensional line, bias just means "y-intercept." For example, the bias of the line in the following illustration is 2.

The plot of a line with a slope of 0.5 and a bias (y-intercept) of 2.

Bias exists because not all models start from the origin (0,0). For example, suppose an amusement park costs 2 Euros to enter and an additional 0.5 Euro for every hour a customer stays. Therefore, a model mapping the total cost has a bias of 2 because the lowest cost is 2 Euros.

Bias is not to be confused with bias in ethics and fairness or prediction bias.

See Linear Regression in Machine Learning Crash Course for more information.
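
The amusement park example maps directly onto the one-feature form of the formula, y' = b + w₁x₁. A minimal sketch, where the function name `total_cost` is an illustrative choice:

```python
def total_cost(hours: float, bias: float = 2.0, weight: float = 0.5) -> float:
    """Linear model y' = b + w1 * x1 for the amusement park example.

    bias   -- entry fee in Euros (the y-intercept)
    weight -- additional cost in Euros per hour of stay (the slope)
    """
    return bias + weight * hours


for hours in (0, 1, 3):
    print(f"{hours} hour(s): {total_cost(hours):.2f} Euros")
```

A visitor who stays zero hours still pays the bias of 2 Euros, which is why the line does not pass through the origin.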

binary classification

#fundamentals

A type of classification task that predicts one of two mutually exclusive classes:

the positive class
the negative class

For example, the following two machine learning models each perform binary classification:

A model that determines whether email messages are spam (the positive class) or not spam (the negative class).
A model that evaluates medical symptoms to determine whether a person has a particular disease (the positive class) or doesn't have that disease (the negative class).

Contrast with multi-class classification.

See also logistic regression and classification threshold.

See Classification in Machine Learning Crash Course for more information.

bucketing

#fundamentals

Converting a single feature into multiple binary features called buckets or bins, typically based on a value range. The chopped feature is typically a continuous feature.

For example, instead of representing temperature as a single continuous floating-point feature, you could chop ranges of temperatures into discrete buckets, such as:

<= 10 degrees Celsius would be the "cold" bucket.
11 - 24 degrees Celsius would be the "temperate" bucket.
>= 25 degrees Celsius would be the "warm" bucket.

The model will treat every value in the same bucket identically. For example, the values 13 and 22 are both in the temperate bucket, so the model treats the two values identically.
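
The three temperature buckets can be sketched as a small function that returns one binary feature per bucket. One detail is an assumption made here: the ranges above leave fractional values such as 10.5 or 24.5 unassigned, so this sketch places anything above 10 and below 25 in the temperate bucket.

```python
BUCKETS = ("cold", "temperate", "warm")


def bucket_name(temperature_c: float) -> str:
    """Map a temperature in degrees Celsius to one of three buckets."""
    if temperature_c <= 10:
        return "cold"
    if temperature_c < 25:  # assumption: covers the gaps between ranges
        return "temperate"
    return "warm"


def bucketize(temperature_c: float) -> list:
    """Return one binary feature per bucket (1.0 for the matching bucket)."""
    name = bucket_name(temperature_c)
    return [1.0 if bucket == name else 0.0 for bucket in BUCKETS]


for t in (4, 13, 22, 31):
    print(t, bucket_name(t), bucketize(t))
```

Because 13 and 22 produce the same binary features, a model trained on the bucketed representation cannot tell them apart.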

If you represent temperature as a continuous feature, then the model treats temperature as a single feature. If you represent temperature as three buckets, then the model treats each bucket as a separate feature. That is, a model can learn separate relationships of each bucket to the label. For example, a linear regression model can learn separate weights for each bucket.

Increasing the number of buckets makes your model more complicated by increasing the number of relationships that your model must learn. For example, the cold, temperate, and warm buckets are essentially three separate features for your model to train on. If you decide to add two more buckets (for example, freezing and hot), your model would now have to train on five separate features.

How do you know how many buckets to create, or what the ranges for each bucket should be? The answers typically require a fair amount of experimentation.

See Numerical data: Binning in Machine Learning Crash Course for more information.

C

categorical data

#fundamentals

Features having a specific set of possible values. For example, consider a categorical feature named traffic-light-state, which can only have one of the following three possible values:

red
yellow
green

By representing traffic-light-state as a categorical feature, a model can learn the differing impacts of red, green, and yellow on driver behavior.

Categorical features are sometimes called discrete features.

Contrast with numerical data.

See Working with categorical data in Machine Learning Crash Course for more information.

class

#fundamentals

A category that a label can belong to. For example:

In a binary classification model that detects spam, the two classes might be spam and not spam.
In a multi-class classification model that identifies dog breeds, the classes might be poodle, beagle, pug, and so on.

A classification model predicts a class. In contrast, a regression model predicts a number rather than a class.

See Classification in Machine Learning Crash Course for more information.

classification model

#fundamentals

A model whose prediction is a class. For example, the following are all classification models:

A model that predicts an input sentence's language (French? Spanish? Italian?).
A model that predicts tree species (Maple? Oak? Baobab?).
A model that predicts the positive or negative class for a particular medical condition.

In contrast, regression models predict numbers rather than classes.

Two common types of classification models are:

binary classification
multi-class classification

classification threshold

#fundamentals

In a binary classification, a number between 0 and 1 that converts the raw output of a logistic regression model into a prediction of either the positive class or the negative class. Note that the classification threshold is a value that a human chooses, not a value chosen by model training.

A logistic regression model outputs a raw value between 0 and 1. Then:

If this raw value is greater than the classification threshold, then the positive class is predicted.
If this raw value is less than the classification threshold, then the negative class is predicted.

For example, suppose the classification threshold is 0.8. If the raw value is 0.9, then the model predicts the positive class. If the raw value is 0.7, then the model predicts the negative class.

The choice of classification threshold strongly influences the number of false positives and false negatives.
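
The thresholding rule is a single comparison. In the sketch below, the function name `predict_class` is illustrative, and one choice is an assumption: this entry does not say what happens when the raw value exactly equals the threshold, so the sketch treats that case as the negative class.

```python
def predict_class(raw_value: float, threshold: float) -> str:
    """Convert a raw logistic-regression output into a class prediction."""
    if not 0.0 <= raw_value <= 1.0:
        raise ValueError("raw_value must be between 0 and 1")
    # Assumption: a raw value equal to the threshold is treated as negative.
    return "positive" if raw_value > threshold else "negative"


print(predict_class(0.9, threshold=0.8))
print(predict_class(0.7, threshold=0.8))
```

Raising the threshold while the raw value stays fixed can flip a prediction: a raw value of 0.95 is positive at a threshold of 0.94 but negative at 0.97.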

As models or datasets evolve, engineers sometimes also change the classification threshold. When the classification threshold changes, positive class predictions can suddenly become negative classes and vice versa.

For example, consider a binary classification disease prediction model. Suppose that when the system runs in the first year:

The raw value for a particular patient is 0.95.
The classification threshold is 0.94.

Therefore, the system diagnoses the positive class. (The patient gasps, "Oh no! I'm sick!")

A year later, perhaps the values now look as follows:

The raw value for the same patient remains at 0.95.
The classification threshold changes to 0.97.

Therefore, the system now reclassifies that patient as the negative class. ("Happy day! I'm not sick.") Same patient. Different diagnosis.

See Thresholds and the confusion matrix in Machine Learning Crash Course for more information.

classifier

#fundamentals

A casual term for a classification model.

class-imbalanced dataset

#fundamentals

A dataset for a classification in which the total number of labels of each class differs significantly. For example, consider a binary classification dataset whose two labels are divided as follows:

1,000,000 negative labels
10 positive labels

The ratio of negative to positive labels is 100,000 to 1, so this is a class-imbalanced dataset.

In contrast, the following dataset is class-balanced because the ratio of negative labels to positive labels is relatively close to 1:

517 negative labels
483 positive labels

Multi-class datasets can also be class-imbalanced. For example, the following multi-class classification dataset is also class-imbalanced because one label has far more examples than the other two:

1,000,000 labels with class "green"
200 labels with class "purple"
350 labels with class "orange"

Training on class-imbalanced datasets can present special challenges. See Imbalanced datasets in Machine Learning Crash Course for details.

See also entropy, majority class, and minority class.

clipping

#fundamentals

A technique for handling outliers by doing either or both of the following:

Reducing feature values that are greater than a maximum threshold down to that maximum threshold.
Increasing feature values that are less than a minimum threshold up to that minimum threshold.

For example, suppose that <0.5% of values for a particular feature fall outside the range 40-60. In this case, you could do the following:

Clip all values over 60 (the maximum threshold) to be exactly 60.
Clip all values under 40 (the minimum threshold) to be exactly 40.
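
Both clipping steps can be written as one expression per value. A minimal sketch using the 40 and 60 thresholds from the example; the function name `clip_values` and the sample values are illustrative assumptions.

```python
def clip_values(values, minimum=40.0, maximum=60.0):
    """Clip each value into the closed range [minimum, maximum]."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return [min(max(v, minimum), maximum) for v in values]


raw = [12.0, 40.0, 55.5, 60.0, 73.0]  # hypothetical feature values
print(clip_values(raw))
```

Values already inside the range pass through unchanged; only the outliers move, and they move exactly to the nearest threshold.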

Outliers can damage models, sometimes causing weights to overflow during training. Some outliers can also dramatically spoil metrics like accuracy. Clipping is a common technique to limit the damage.

Gradient clipping forces gradient values within a designated range during training.

See Numerical data: Normalization in Machine Learning Crash Course for more information.

confusion matrix

#fundamentals

An NxN table that summarizes the number of correct and incorrect predictions that a classification model made. For example, consider the following confusion matrix for a binary classification model:

	Tumor (predicted)	Non-Tumor (predicted)
Tumor (ground truth)	18 (TP)	1 (FN)
Non-Tumor (ground truth)	6 (FP)	452 (TN)

The preceding confusion matrix shows the following:

Of the 19 predictions in which ground truth was Tumor, the model correctly classified 18 and incorrectly classified 1.
Of the 458 predictions in which ground truth was Non-Tumor, the model correctly classified 452 and incorrectly classified 6.

The confusion matrix for a multi-class classification problem can help you identify patterns of mistakes. For example, consider the following confusion matrix for a 3-class multi-class classification model that categorizes three different iris types (Virginica, Versicolor, and Setosa). When the ground truth was Virginica, the confusion matrix shows that the model was far more likely to mistakenly predict Versicolor than Setosa:

	Setosa (predicted)	Versicolor (predicted)	Virginica (predicted)
Setosa (ground truth)	88	12	0
Versicolor (ground truth)	6	141	7
Virginica (ground truth)	2	27	109

As yet another example, a confusion matrix could reveal that a model trained to recognize handwritten digits tends to mistakenly predict 9 instead of 4, or mistakenly predict 1 instead of 7.

Confusion matrices contain sufficient information to calculate a variety of performance metrics, including precision and recall.
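
As a rough illustration of how metrics fall out of a confusion matrix, the sketch below uses the counts from the two tables in this entry. The formulas are the standard ones (precision = TP / (TP + FP), recall = TP / (TP + FN)); the function names are illustrative.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, and recall from a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


def most_common_mistake(matrix, labels, truth):
    """For one ground-truth row, return the wrong label predicted most often."""
    row = matrix[labels.index(truth)]
    mistakes = {label: n for label, n in zip(labels, row) if label != truth}
    return max(mistakes, key=mistakes.get)


# Tumor example: TP=18, FN=1, FP=6, TN=452.
tumor_metrics = binary_metrics(tp=18, fn=1, fp=6, tn=452)
for name, value in tumor_metrics.items():
    print(f"{name}: {value:.3f}")

# Iris example: rows are ground truth, columns are predictions.
iris_labels = ["Setosa", "Versicolor", "Virginica"]
iris_matrix = [
    [88, 12, 0],
    [6, 141, 7],
    [2, 27, 109],
]
print(most_common_mistake(iris_matrix, iris_labels, "Virginica"))
```

For the tumor model, precision (0.75) is noticeably lower than accuracy, because 6 of the 24 tumor predictions were false positives.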

continuous feature

#fundamentals

A floating-point feature with an infinite range of possible values, such as temperature or weight.

Contrast with discrete feature.

convergence

#fundamentals

A state reached when loss values change very little or not at all with each iteration. For example, the following loss curve suggests convergence at around 700 iterations:

Cartesian plot. The x-axis is the number of training iterations. The y-axis is loss. Loss is very high during the first few iterations, but drops sharply. After about 100 iterations, loss is still descending but far more gradually. After about 700 iterations, loss stays flat.

A model converges when additional training won't improve the model.

In deep learning, loss values sometimes stay constant or nearly so for many iterations before finally descending. During a long period of constant loss values, you may temporarily get a false sense of convergence.

See also early stopping.

See Model convergence and loss curves in Machine Learning Crash Course for more information.

D

DataFrame

#fundamentals

A popular pandas data type for representing datasets in memory.

A DataFrame is analogous to a table or a spreadsheet. Each column of a DataFrame has a name (a header), and each row is identified by a unique number.

A DataFrame is structured like a 2D array, except that each column can be assigned its own data type.

See also the official pandas.DataFrame reference page.

data set or dataset

#fundamentals

A collection of raw data, commonly (but not exclusively) organized in one of the following formats:

a spreadsheet
a file in CSV (comma-separated values) format

deep model

#fundamentals

A neural network containing more than one hidden layer.

A deep model is also called a deep neural network.

Contrast with wide model.

dense feature

#fundamentals

A feature in which most or all values are nonzero, typically a Tensor of floating-point values. For example, the following 10-element Tensor is dense because 9 of its values are nonzero:

8	3	7	5	2	4	9	6

Contrast with sparse feature.

depth

#fundamentals

The sum of the following in a neural network:

the number of hidden layers
the number of output layers, which is typically 1
the number of any embedding layers

For example, a neural network with five hidden layers and one output layer has a depth of 6.

Notice that the input layer doesn't influence depth.

discrete feature

#fundamentals

A feature with a finite set of possible values. For example, a feature whose values may only be animal, vegetable, or mineral is a discrete (or categorical) feature.

Contrast with continuous feature.

dynamic

#fundamentals

Something done frequently or continuously. The terms dynamic and online are synonyms in machine learning. The following are common uses of dynamic and online in machine learning:

A dynamic model (or online model) is a model that is retrained frequently or continuously.
Dynamic training (or online training) is the process of training frequently or continuously.
Dynamic inference (or online inference) is the process of generating predictions on demand.

dynamic model

#fundamentals

A model that is frequently (maybe even continuously) retrained. A dynamic model is a "lifelong learner" that constantly adapts to evolving data. A dynamic model is also known as an online model.

Contrast with static model.

E

early stopping

#fundamentals

A method for regularization that involves ending training before training loss finishes decreasing. In early stopping, you intentionally stop training the model when the loss on a validation dataset starts to increase; that is, when generalization performance worsens.

Early stopping may seem counterintuitive. After all, telling a model to halt training while the loss is still decreasing may seem like telling a chef to stop cooking before the dessert has fully baked. However, training a model for too long can lead to overfitting. That is, if you train a model too long, the model may fit the training data so closely that the model doesn't make good predictions on new examples.

Contrast with early exit.
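
The stopping rule can be sketched as a scan over per-epoch validation losses. The `patience` parameter (how many non-improving epochs to tolerate before stopping) is a common convention assumed here, not something this entry defines, and the loss values are made up for illustration.

```python
def early_stopping_epoch(validation_losses, patience=1):
    """Return (stop_epoch, best_epoch) for a list of validation losses.

    Training stops at the first epoch where the validation loss has
    failed to improve for `patience` consecutive epochs. If that never
    happens, stop_epoch is the last epoch.
    """
    if not validation_losses:
        raise ValueError("validation_losses must not be empty")

    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(validation_losses) - 1, best_epoch


# Hypothetical validation losses: improving, then getting worse.
val_losses = [0.90, 0.62, 0.48, 0.41, 0.43, 0.47, 0.55]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(f"stop at epoch {stop_epoch}, keep weights from epoch {best_epoch}")
```

In practice the weights saved at the best epoch are usually the ones kept, since later epochs fit the training set more closely but generalize worse.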

embedding layer

#fundamentals

A special hidden layer that trains on a high-dimensional categorical feature to gradually learn a lower-dimension embedding vector. An embedding layer enables a neural network to train far more efficiently than training just on the high-dimensional categorical feature.

For example, Earth currently supports about 73,000 tree species. Suppose tree species is a feature in your model, so your model's input layer includes a one-hot vector 73,000 elements long. For example, perhaps baobab would be represented something like this:

An array of 73,000 elements. The first 6,232 elements hold the value 0. The next element holds the value 1. The final 66,767 elements hold the value zero.

A 73,000-element array is very long. If you don't add an embedding layer to the model, training is going to be very time consuming due to multiplying 72,999 zeros. Perhaps you pick the embedding layer to consist of 12 dimensions. Consequently, the embedding layer will gradually learn a new embedding vector for each tree species.

In certain situations, hashing is a reasonable alternative to an embedding layer.

See Embeddings in Machine Learning Crash Course for more information.

epoch

#fundamentals

A full training pass over the entire training set such that each example has been processed once.

An epoch represents N / batch size training iterations, where N is the total number of examples.

For instance, suppose the following:

The dataset consists of 1,000 examples.
The batch size is 50 examples.

Therefore, a single epoch requires 20 iterations:

1 epoch = (N/batch size) = (1,000 / 50) = 20 iterations
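
The same arithmetic in code, together with a generator that slices a dataset into batches. One detail is an assumption: when N is not an exact multiple of the batch size, this sketch counts the final, smaller batch as one more iteration, which this entry does not address.

```python
import math


def iterations_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of training iterations needed to see every example once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Assumption: a final partial batch still counts as an iteration.
    return math.ceil(num_examples / batch_size)


def batches(examples, batch_size):
    """Yield consecutive batches; together they cover one epoch."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


dataset = list(range(1000))  # stand-in for 1,000 examples
print(iterations_per_epoch(len(dataset), batch_size=50))
print(sum(1 for _ in batches(dataset, batch_size=50)))
```

With a batch size of 1 (SGD) the same dataset would need 1,000 iterations per epoch, and with full batch it would need exactly one.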

example

#fundamentals

The values of one row of features and possibly a label. Examples in supervised learning fall into two general categories:

A labeled example consists of one or more features and a label. Labeled examples are used during training.
An unlabeled example consists of one or more features but no label. Unlabeled examples are used during inference.

For instance, suppose you are training a model to determine the influence of weather conditions on student test scores. Here are three labeled examples:

Features			Label
Temperature	Humidity	Pressure	Test score
15	47	998	Good
19	34	1020	Excellent
18	92	1012	Poor

Here are three unlabeled examples:

Temperature	Humidity	Pressure
12	62	1014
21	47	1017
19	41	1021

The row of a dataset is typically the raw source for an example. That is, an example typically consists of a subset of the columns in the dataset. Furthermore, the features in an example can also include synthetic features, such as feature crosses.

See Supervised Learning in the Introduction to Machine Learning course for more information.

F

false negative (FN)

#fundamentals

#Metric

An example in which the model mistakenly predicts the negative class. For example, the model predicts that a particular email message is not spam (the negative class), but that email message actually is spam.

false positive (FP)

#fundamentals

#Metric

An example in which the model mistakenly predicts the positive class. For example, the model predicts that a particular email message is spam (the positive class), but that email message is actually not spam.

false positive rate (FPR)

#fundamentals

#Metric

The proportion of actual negative examples for which the model mistakenly predicted the positive class. The following formula calculates the false positive rate:

$$\text{false positive rate} = \frac{\text{false positives}}{\text{false positives} + \text{true negatives}}$$

The false positive rate is the x-axis in an ROC curve.

See Classification: ROC and AUC in Machine Learning Crash Course for more information.
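
The formula translates directly into code. In this sketch the counts are borrowed from examples elsewhere in this glossary (the tumor confusion matrix and the "always predict no snow" model); the function name is an illustrative choice.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Share of actual negatives that were wrongly predicted positive."""
    actual_negatives = false_positives + true_negatives
    if actual_negatives == 0:
        raise ValueError("no actual negative examples")
    return false_positives / actual_negatives


# Tumor confusion matrix: FP = 6, TN = 452.
tumor_fpr = false_positive_rate(false_positives=6, true_negatives=452)

# A model that never predicts the positive class has no false positives.
snow_fpr = false_positive_rate(false_positives=0, true_negatives=36499)

print(f"tumor model FPR = {tumor_fpr:.4f}")
print(f"snow model FPR  = {snow_fpr:.4f}")
```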

feature

#fundamentals

An input variable to a machine learning model. An example consists of one or more features. For instance, suppose you are training a model to determine the influence of weather conditions on student test scores. The following table shows three examples, each of which contains three features and one label:

Features			Label
Temperature	Humidity	Pressure	Test score
15	47	998	92
19	34	1020	84
18	92	1012	87

Contrast with label.

feature cross

#fundamentals

A synthetic feature formed by "crossing" categorical or bucketed features.

For example, consider a "mood forecasting" model that represents temperature in one of the following four buckets:

freezing
chilly
temperate
warm

And represents wind speed in one of the following three buckets:

still
light
windy

Without feature crosses, the linear model trains independently on each of the preceding seven buckets. So, the model trains on, for example, freezing independently of the training on, for example, windy.

Alternatively, you could create a feature cross of temperature and wind speed. This synthetic feature would have the following 12 possible values:

freezing-still
freezing-light
freezing-windy
chilly-still
chilly-light
chilly-windy
temperate-still
temperate-light
temperate-windy
warm-still
warm-light
warm-windy

With feature crosses, the model can learn mood differences between a freezing-windy day and a freezing-still day.

If you create a synthetic feature from two features that each have a lot of different buckets, the resulting feature cross will have a huge number of possible combinations. For example, if one feature has 1,000 buckets and the other feature has 2,000 buckets, the resulting feature cross has 2,000,000 buckets.

Formally, a cross is a Cartesian product.

Feature crosses are mostly used with linear models and are rarely used with neural networks.

See Categorical data: Feature crosses in Machine Learning Crash Course for more information.
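
Because a cross is a Cartesian product, the 12 crossed values can be generated with `itertools.product` from the Python standard library. The helper `one_hot_cross` is a hypothetical name used only for this sketch; it shows how one crossed value would become a 12-element one-hot vector.

```python
from itertools import product

TEMPERATURE_BUCKETS = ["freezing", "chilly", "temperate", "warm"]
WIND_BUCKETS = ["still", "light", "windy"]

# Cartesian product of the two bucket lists: 4 x 3 = 12 crossed values.
CROSSED_VALUES = [
    f"{temperature}-{wind}"
    for temperature, wind in product(TEMPERATURE_BUCKETS, WIND_BUCKETS)
]


def one_hot_cross(temperature: str, wind: str) -> list:
    """One-hot encode a (temperature, wind) pair over the crossed values."""
    crossed = f"{temperature}-{wind}"
    if crossed not in CROSSED_VALUES:
        raise ValueError(f"unknown crossed value: {crossed}")
    return [1.0 if value == crossed else 0.0 for value in CROSSED_VALUES]


print(len(CROSSED_VALUES))
print(CROSSED_VALUES[:3])
print(one_hot_cross("freezing", "windy"))
```

Since freezing-windy and freezing-still occupy different positions in the vector, a linear model can learn a separate weight for each combination.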

feature engineering

#fundamentals

#TensorFlow

A process that involves the following steps:

Determining which features might be useful in training a model.
Converting raw data from the dataset into efficient versions of those features.

For example, you might determine that temperature might be a useful feature. Then, you might experiment with bucketing to optimize what the model can learn from different temperature ranges.

Feature engineering is sometimes called feature extraction or featurization.

In TensorFlow, feature engineering often means converting raw log file entries to tf.Example protocol buffers. See also tf.Transform.

See Numerical data: How a model ingests data using feature vectors in Machine Learning Crash Course for more information.

feature set

#fundamentals

The group of features your machine learning model trains on. For example, a simple feature set for a model that predicts housing prices might consist of postal code, property size, and property condition.

feature vector

#fundamentals

The array of feature values comprising an example. The feature vector is input during training and during inference. For example, the feature vector for a model with two discrete features might be:

[0.92, 0.56]

চারটি স্তর: একটি ইনপুট স্তর, দুটি লুকানো স্তর এবং একটি আউটপুট স্তর। ইনপুট স্তরে দুটি নোড রয়েছে, একটিতে মান 0.92 এবং অন্যটিতে মান 0.56 রয়েছে।

প্রতিটি উদাহরণ ফিচার ভেক্টরের জন্য আলাদা আলাদা মান সরবরাহ করে, তাই পরবর্তী উদাহরণের জন্য ফিচার ভেক্টরটি এরকম কিছু হতে পারে:

[0.73, 0.49]

ফিচার ইঞ্জিনিয়ারিং নির্ধারণ করে কিভাবে ফিচার ভেক্টরে ফিচার উপস্থাপন করতে হয়। উদাহরণস্বরূপ, পাঁচটি সম্ভাব্য মান সহ একটি বাইনারি ক্যাটাগরিকাল ফিচারকে ওয়ান-হট এনকোডিং দিয়ে উপস্থাপন করা যেতে পারে। এই ক্ষেত্রে, একটি নির্দিষ্ট উদাহরণের জন্য ফিচার ভেক্টরের অংশে চারটি শূন্য এবং তৃতীয় অবস্থানে একটি একক 1.0 থাকবে, যা নিম্নরূপ:

[0.0, 0.0, 1.0, 0.0, 0.0]

আরেকটি উদাহরণ হিসেবে, ধরুন আপনার মডেলটিতে তিনটি বৈশিষ্ট্য রয়েছে:

একটি বাইনারি ক্যাটাগরিকাল বৈশিষ্ট্য যার পাঁচটি সম্ভাব্য মান এক-গরম এনকোডিং দ্বারা প্রতিনিধিত্ব করা হয়; উদাহরণস্বরূপ: [0.0, 1.0, 0.0, 0.0, 0.0]
আরেকটি বাইনারি ক্যাটাগরিকাল বৈশিষ্ট্য যার তিনটি সম্ভাব্য মান এক-গরম এনকোডিং দ্বারা প্রতিনিধিত্ব করা হয়েছে; উদাহরণস্বরূপ: [0.0, 0.0, 1.0]
একটি ভাসমান-বিন্দু বৈশিষ্ট্য; উদাহরণস্বরূপ: 8.3 ।

এই ক্ষেত্রে, প্রতিটি উদাহরণের জন্য বৈশিষ্ট্য ভেক্টর নয়টি মান দ্বারা প্রতিনিধিত্ব করা হবে। পূর্ববর্তী তালিকার উদাহরণ মানগুলি বিবেচনা করে, বৈশিষ্ট্য ভেক্টরটি হবে:

0.0
1.0
0.0
0.0
0.0
0.0
0.0
1.0
8.3
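The nine-value vector above can be reproduced with a minimal sketch that concatenates the encoded features in order. The helper name `one_hot` is hypothetical and not part of any library.

```python
def one_hot(index, size):
    """Return a list of `size` floats with 1.0 at `index` and 0.0 elsewhere."""
    vector = [0.0] * size
    vector[index] = 1.0
    return vector


# The three features from the example above.
first_feature = one_hot(1, 5)    # categorical feature, five possible values
second_feature = one_hot(2, 3)   # categorical feature, three possible values
third_feature = 8.3              # floating-point feature

# The feature vector is the concatenation of all encoded features.
feature_vector = first_feature + second_feature + [third_feature]
print(feature_vector)
```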

feedback loop

#fundamentals

In machine learning, a situation in which a model's predictions influence the training data for the same model or another model. For example, a model that recommends movies will influence the movies that people see, which will then influence subsequent movie recommendation models.

See Production ML systems: Questions to ask in Machine Learning Crash Course for more information.

G

generalization

#fundamentals

A model's ability to make correct predictions on new, previously unseen data. A model that can generalize is the opposite of a model that is overfitting.

You train a model on the examples in the training set. Consequently, the model learns the peculiarities of the data in the training set. Generalization essentially asks whether your model can make good predictions on examples that are not in the training set.

To encourage generalization, regularization helps a model train less exactly to the peculiarities of the data in the training set.

See Generalization in Machine Learning Crash Course for more information.

generalization curve

#fundamentals

A plot of both training loss and validation loss as a function of the number of iterations.

A generalization curve can help you detect possible overfitting. For example, the following generalization curve suggests overfitting because validation loss ultimately becomes significantly higher than training loss.

A Cartesian graph in which the y-axis is labeled loss and the x-axis is labeled iterations. Two plots appear. One plot shows the training loss and the other shows the validation loss. The two plots start off similarly, but the training loss eventually dips far lower than the validation loss.

See Generalization in Machine Learning Crash Course for more information.

gradient descent

#fundamentals

A mathematical technique to minimize loss. Gradient descent iteratively adjusts weights and biases, gradually finding the best combination to minimize loss.

Gradient descent is much, much older than machine learning.

See Linear regression: Gradient descent in Machine Learning Crash Course for more information.
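As a minimal sketch of the iterative adjustment described above, the following code fits one weight and one bias by gradient descent on Mean Squared Error. The dataset, the learning rate of 0.05, and the iteration count are assumptions chosen for illustration, not values from this glossary.

```python
def mean_squared_error(w, b, xs, ys):
    """Average squared difference between predictions w*x + b and labels."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, iterations=2000):
    """Iteratively adjust a weight and a bias to reduce Mean Squared Error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the loss with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

initial_loss = mean_squared_error(0.0, 0.0, xs, ys)
w, b = gradient_descent(xs, ys)
final_loss = mean_squared_error(w, b, xs, ys)
print(w, b, initial_loss, final_loss)
```

Real training loops usually compute the gradient on mini-batches and rely on a framework's automatic differentiation rather than hand-written derivatives.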

ground truth

#fundamentals

Reality.

The thing that actually happened.

For example, consider a binary classification model that predicts whether a student in their first year of university will graduate within six years. Ground truth for this model is whether or not that student actually graduated within six years.

We assess model quality against ground truth. However, ground truth is not always completely, well, truthful. For example, consider the following examples of potential imperfections in ground truth:

In the graduation example, are we certain that the graduation records for each student are always correct? Is the university's record-keeping flawless?
Suppose the label is a floating-point value measured by instruments (for example, barometers). How can we be sure that each instrument is calibrated identically or that each reading was taken under the same circumstances?
If the label is a matter of human opinion, how can we be sure that each human rater is evaluating events in the same way? To improve consistency, expert human raters sometimes intervene.

H

hidden layer

#fundamentals

A layer in a neural network between the input layer (the features) and the output layer (the prediction). Each hidden layer consists of one or more neurons. For example, the following neural network contains two hidden layers, the first with three neurons and the second with two neurons:

A deep neural network contains more than one hidden layer. For example, the preceding illustration is a deep neural network because the model contains two hidden layers.

See Neural networks: Nodes and hidden layers in Machine Learning Crash Course for more information.

hyperparameter

#fundamentals

The variables that you or a hyperparameter tuning service adjust during successive runs of training a model. For example, learning rate is a hyperparameter. You could set the learning rate to 0.01 before one training session. If you determine that 0.01 is too high, you could perhaps set the learning rate to 0.003 for the next training session.

In contrast, parameters are the various weights and biases that the model learns during training.

I

independently and identically distributed (i.i.d.)

#fundamentals

Data drawn from a distribution that doesn't change, and where each value drawn doesn't depend on values that have been drawn previously. An i.i.d. is the ideal gas of machine learning: a useful mathematical construct but almost never exactly found in the real world. For example, the distribution of visitors to a web page may be i.i.d. over a brief window of time; that is, the distribution doesn't change during that brief window and one person's visit is generally independent of another's visit. However, if you expand that window of time, seasonal differences in the web page's visitors may appear.

inference

#fundamentals

#generativeAI

In traditional machine learning, the process of making predictions by applying a trained model to unlabeled examples. See Supervised Learning in the Intro to ML course to learn more.

In large language models, inference is the process of using a trained model to generate a response to an input prompt.

Inference has a somewhat different meaning in statistics. See the Wikipedia article on statistical inference for details.

input layer

#fundamentals

The layer of a neural network that holds the feature vector. That is, the input layer provides examples for training or inference. For example, the input layer in the following neural network consists of two features:

Four layers: an input layer, two hidden layers, and an output layer.

interpretability

#fundamentals

The ability to explain or to present an ML model's reasoning in understandable terms to a human.

Most linear regression models, for example, are highly interpretable. (You merely need to look at the trained weights for each feature.) Decision forests are also highly interpretable. Some models, however, require sophisticated visualization to become interpretable.

You can use the Learning Interpretability Tool (LIT) to interpret ML models.

iteration

#fundamentals

A single update of a model's parameters (the model's weights and biases) during training. The batch size determines how many examples the model processes in a single iteration. For instance, if the batch size is 20, then the model processes 20 examples before adjusting the parameters.

When training a neural network, a single iteration involves the following two passes:

A forward pass to evaluate loss on a single batch.
A backward pass (backpropagation) to adjust the model's parameters based on the loss and the learning rate.

See Gradient descent in Machine Learning Crash Course for more information.

L

L₀ regularization

#fundamentals

A type of regularization that penalizes the total number of nonzero weights in a model. For example, a model having 11 nonzero weights would be penalized more than a similar model having 10 nonzero weights.

L₀ regularization is sometimes called L0-norm regularization.

L₀ regularization is generally impractical in large models because L₀ regularization turns training into a nonconvex optimization problem.

L₁ loss

#fundamentals

#Metric

A loss function that calculates the absolute value of the difference between actual label values and the values that a model predicts. For example, here's the calculation of L₁ loss for a batch of five examples:

Actual value of example	Model's predicted value	Absolute value of delta
7	6	1
5	4	1
8	11	3
4	6	2
9	8	1
		8 = L₁ loss

L₁ loss is less sensitive to outliers than L₂ loss.

The Mean Absolute Error is the average L₁ loss per example.

$$ L_1\ \text{loss} = \sum_{i=1}^n | y_i - \hat{y}_i |$$

where:

$n$ is the number of examples.
$y$ is the actual value of the label.
$\hat{y}$ is the value that the model predicts for $y$.
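The table and formula above can be checked with a short sketch. The function names `l1_loss` and `mean_absolute_error` are illustrative and not taken from any particular library.

```python
def l1_loss(actual, predicted):
    """Sum of absolute differences between labels and predictions."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted))


def mean_absolute_error(actual, predicted):
    """Average L1 loss per example."""
    return l1_loss(actual, predicted) / len(actual)


# The batch of five examples from the table above.
actual = [7, 5, 8, 4, 9]
predicted = [6, 4, 11, 6, 8]

print(l1_loss(actual, predicted))
print(mean_absolute_error(actual, predicted))
```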

See Linear regression: Loss in Machine Learning Crash Course for more information.

L₁ regularization

#fundamentals

A type of regularization that penalizes weights in proportion to the sum of the absolute value of the weights. L₁ regularization helps drive the weights of irrelevant or barely relevant features to exactly 0. A feature with a weight of 0 is effectively removed from the model.

Contrast with L₂ regularization.

L₂ loss

#fundamentals

#Metric

A loss function that calculates the square of the difference between actual label values and the values that a model predicts. For example, here's the calculation of L₂ loss for a batch of five examples:

Actual value of example	Model's predicted value	Square of delta
7	6	1
5	4	1
8	11	9
4	6	4
9	8	1
		16 = L₂ loss

Due to squaring, L₂ loss amplifies the influence of outliers. That is, L₂ loss reacts more strongly to bad predictions than L₁ loss. For example, the L₁ loss for the preceding batch would be 8 rather than 16. Notice that a single outlier accounts for 9 of the 16.

Regression models typically use L₂ loss as the loss function.

The Mean Squared Error is the average L₂ loss per example. Squared loss is another name for L₂ loss.

$$ L_2\ \text{loss} = \sum_{i=1}^n {(y_i - \hat{y}_i)}^2$$

where:

$n$ is the number of examples.
$y$ is the actual value of the label.
$\hat{y}$ is the value that the model predicts for $y$.
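A short sketch can confirm the table above and show how much of the total comes from the single outlier. The function names are illustrative, not from a specific library.

```python
def l2_loss(actual, predicted):
    """Sum of squared differences between labels and predictions."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))


def mean_squared_error(actual, predicted):
    """Average L2 loss per example."""
    return l2_loss(actual, predicted) / len(actual)


# The batch of five examples from the table above.
actual = [7, 5, 8, 4, 9]
predicted = [6, 4, 11, 6, 8]

squared_deltas = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
largest_contribution = max(squared_deltas)

print(l2_loss(actual, predicted))
print(mean_squared_error(actual, predicted))
print(largest_contribution)
```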

See Logistic regression: Loss and regularization in Machine Learning Crash Course for more information.

L₂ regularization

#fundamentals

A type of regularization that penalizes weights in proportion to the sum of the squares of the weights. L₂ regularization helps drive outlier weights (those with high positive or low negative values) closer to 0 but not quite to 0. Features with values very close to 0 remain in the model but don't influence the model's prediction very much.

L₂ regularization typically improves generalization in linear models.

Contrast with L₁ regularization.

See Overfitting: L2 regularization in Machine Learning Crash Course for more information.

label

#fundamentals

In supervised machine learning, the "answer" or "result" portion of an example.

Each labeled example consists of one or more features and a label. For example, in a spam detection dataset, the label would probably be either "spam" or "not spam." In a rainfall dataset, the label might be the amount of rain that fell during a certain period.

See Supervised Learning in Introduction to Machine Learning for more information.

labeled example

#fundamentals

An example that contains one or more features and a label. For example, the following table shows three labeled examples from a house valuation model, each with three features and one label:

Number of bedrooms	Number of bathrooms	House age	House price (label)
3	2	15	$345,000
2	1	72	$179,000
4	2	34	$392,000

In supervised machine learning, models train on labeled examples and make predictions on unlabeled examples.

Contrast labeled example with unlabeled examples.

See Supervised Learning in Introduction to Machine Learning for more information.

lambda

#fundamentals

Synonym for regularization rate.

Lambda is an overloaded term. Here we're focusing on the term's definition within regularization.

layer

#fundamentals

A set of neurons in a neural network. Three common types of layers are as follows:

The input layer, which provides values for all the features.
One or more hidden layers, which find nonlinear relationships between the features and the label.
The output layer, which provides the prediction.

For example, the following illustration shows a neural network with one input layer, two hidden layers, and one output layer:

A neural network with one input layer, two hidden layers, and one
output layer. The input layer consists of two features. The first
hidden layer consists of three neurons and the second hidden layer
consists of two neurons. The output layer consists of a single node.

In TensorFlow, layers are also Python functions that take Tensors and configuration options as input and produce other tensors as output.

learning rate

#fundamentals

A floating-point number that tells the gradient descent algorithm how strongly to adjust weights and biases on each iteration. For example, a learning rate of 0.3 would adjust weights and biases three times more powerfully than a learning rate of 0.1.

Learning rate is a key hyperparameter. If you set the learning rate too low, training will take too long. If you set the learning rate too high, gradient descent often has trouble reaching convergence.

During each iteration, the gradient descent algorithm multiplies the learning rate by the gradient. The resulting product is called the gradient step.
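The gradient step can be sketched in a few lines. Subtracting the step from the current weight is the usual gradient descent convention and is an assumption here, as are the function names and the sample gradient of 2.0.

```python
def gradient_step(learning_rate, gradient):
    """Product of the learning rate and the gradient."""
    return learning_rate * gradient


def updated_weight(weight, learning_rate, gradient):
    """Move the weight against the gradient by one gradient step."""
    return weight - gradient_step(learning_rate, gradient)


# Hypothetical gradient value for illustration.
gradient = 2.0
small_step = gradient_step(0.1, gradient)
large_step = gradient_step(0.3, gradient)

print(small_step, large_step)
print(updated_weight(1.0, 0.1, gradient))
```

With the same gradient, a learning rate of 0.3 produces a step three times as large as a learning rate of 0.1.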

linear

#fundamentals

A relationship between two or more variables that can be represented solely through addition and multiplication.

The plot of a linear relationship is a line.

Contrast with nonlinear.

linear model

#fundamentals

A model that assigns one weight per feature to make predictions. (Linear models also incorporate a bias.) In contrast, the relationship of features to predictions in deep models is generally nonlinear.

Linear models are usually easier to train and more interpretable than deep models. However, deep models can learn complex relationships between features.

Linear regression and logistic regression are two types of linear models.

A linear model follows this formula:

$$y' = b + w_1x_1 + w_2x_2 + \ldots + w_nx_n$$

where:

y' is the raw prediction. (In certain kinds of linear models, this raw prediction will be further modified. For example, see logistic regression.)
b is the bias.
w is a weight, so w₁ is the weight of the first feature, w₂ is the weight of the second feature, and so on.
x is a feature, so x₁ is the value of the first feature, x₂ is the value of the second feature, and so on.

For example, suppose a linear model for three features learns the following bias and weights:

b = 7
w₁ = -2.5
w₂ = -1.2
w₃ = 1.4

Therefore, given three features (x₁, x₂, and x₃), the linear model uses the following equation to generate each prediction:

y' = 7 + (-2.5)(x₁) + (-1.2)(x₂) + (1.4)(x₃)

Suppose a particular example contains the following values:

x₁ = 4
x₂ = -10
x₃ = 5

Plugging those values into the formula yields a prediction for this example:

y' = 7 + (-2.5)(4) + (-1.2)(-10) + (1.4)(5)
y' = 16
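The arithmetic above can be reproduced with a minimal sketch. The function name `linear_prediction` is hypothetical.

```python
def linear_prediction(bias, weights, features):
    """Raw prediction of a linear model: bias plus the weighted sum of features."""
    return bias + sum(w * x for w, x in zip(weights, features))


# Bias, weights, and feature values from the worked example above.
bias = 7
weights = [-2.5, -1.2, 1.4]
features = [4, -10, 5]

prediction = linear_prediction(bias, weights, features)
print(prediction)
```

Because some of the weights are not exactly representable in floating point, compare the result to 16 with a small tolerance rather than with strict equality.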

Linear models include not only models that use only a linear equation to make predictions but also a broader set of models that use a linear equation as just one component of the formula that makes predictions. For example, logistic regression post-processes the raw prediction (y') to produce a final prediction value between 0 and 1, exclusive.

linear regression

#fundamentals

A type of machine learning model in which both of the following are true:

The model is a linear model.
The prediction is a floating-point value. (This is the regression part of linear regression.)

Contrast linear regression with logistic regression. Also, contrast regression with classification.

See Linear regression in Machine Learning Crash Course for more information.

logistic regression

#fundamentals

A type of regression model that predicts a probability. Logistic regression models have the following characteristics:

The label is categorical. The term logistic regression usually refers to binary logistic regression, that is, to a model that calculates probabilities for labels with two possible values. A less common variant, multinomial logistic regression, calculates probabilities for labels with more than two possible values.
The loss function during training is Log Loss. (Multiple Log Loss units can be placed in parallel for labels with more than two possible values.)
The model has a linear architecture, not a deep neural network. However, the remainder of this definition also applies to deep models that predict probabilities for categorical labels.

For example, consider a logistic regression model that calculates the probability of an input email being either spam or not spam. During inference, suppose the model predicts 0.72. Therefore, the model is estimating:

A 72% chance of the email being spam.
A 28% chance of the email not being spam.

A logistic regression model uses the following two-step architecture:

The model generates a raw prediction (y') by applying a linear function of input features.
The model uses that raw prediction as input to a sigmoid function, which converts the raw prediction to a value between 0 and 1, exclusive.

Like any regression model, a logistic regression model predicts a number. However, this number typically becomes part of a binary classification model as follows:

If the predicted number is greater than the classification threshold, the binary classification model predicts the positive class.
If the predicted number is less than the classification threshold, the binary classification model predicts the negative class.
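The two-step architecture and the thresholding rule can be sketched as follows. The bias, weights, feature values, and the threshold of 0.5 are hypothetical, and treating a prediction exactly equal to the threshold as negative is an assumption of this sketch.

```python
import math


def sigmoid(raw_prediction):
    """Convert a raw prediction to a value between 0 and 1, exclusive."""
    return 1.0 / (1.0 + math.exp(-raw_prediction))


def predict_probability(bias, weights, features):
    """Step 1: linear function of the features. Step 2: sigmoid of the result."""
    raw_prediction = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(raw_prediction)


def classify(probability, threshold=0.5):
    """Predict the positive class only when the probability exceeds the threshold."""
    return "positive" if probability > threshold else "negative"


# Hypothetical model parameters and feature values.
probability = predict_probability(bias=-1.0, weights=[0.8, 0.4], features=[2.0, 1.5])
print(probability, classify(probability))
```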

See Logistic regression in Machine Learning Crash Course for more information.

Log Loss

#fundamentals

The loss function used in binary logistic regression.

The following formula calculates Log Loss:

$$\text{Log Loss} = \sum_{(x,y)\in D} -y\log(y') - (1 - y)\log(1 - y')$$

where:

$(x,y)\in D$ is the dataset containing many labeled examples, which are $(x,y)$ pairs.
$y$ is the label in a labeled example. Since this is logistic regression, every value of $y$ must either be 0 or 1.
$y'$ is the predicted value (somewhere between 0 and 1, exclusive), given the set of features in $x$.
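The formula above translates directly into a short sketch using the natural logarithm. The function name `log_loss` and the sample labels and predictions are illustrative assumptions.

```python
import math


def log_loss(labels, predictions):
    """Sum of -y*log(y') - (1 - y)*log(1 - y') over all examples."""
    total = 0.0
    for y, y_pred in zip(labels, predictions):
        total += -y * math.log(y_pred) - (1 - y) * math.log(1 - y_pred)
    return total


# Hypothetical labels (0 or 1) and predicted probabilities (strictly between 0 and 1).
confident_and_right = log_loss([1, 0], [0.9, 0.1])
confident_and_wrong = log_loss([1, 0], [0.1, 0.9])

print(confident_and_right, confident_and_wrong)
```

Predictions must stay strictly between 0 and 1, because the logarithm of 0 is undefined.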

See Logistic regression: Loss and regularization in Machine Learning Crash Course for more information.

log-odds

#fundamentals

The logarithm of the odds of some event.

If the event is a binary probability, then odds refers to the ratio of the probability of success (p) to the probability of failure (1-p). For example, suppose that a given event has a 90% probability of success and a 10% probability of failure. In this case, odds is calculated as follows:

$$ {\text{odds}} = \frac{\text{p}} {\text{(1-p)}} = \frac{.9} {.1} = {\text{9}} $$

The log-odds is simply the logarithm of the odds. By convention, "logarithm" refers to natural logarithm, but logarithm could actually be any base greater than 1. Sticking to convention, the log-odds of our example is therefore:

$$ {\text{log-odds}} = \ln(9) \approx 2.2 $$

The log-odds function is the inverse of the sigmoid function.
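A brief sketch can check both the worked example and the inverse relationship with the sigmoid function. The function names are illustrative.

```python
import math


def log_odds(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1 - p))


def sigmoid(z):
    """Map a real number to a value between 0 and 1, exclusive."""
    return 1.0 / (1.0 + math.exp(-z))


example = log_odds(0.9)
round_trip = sigmoid(example)

print(example)
print(round_trip)
```

Applying the sigmoid to the log-odds recovers the original probability, up to floating-point rounding.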

loss

#fundamentals

#Metric

During the training of a supervised model, a measure of how far a model's prediction is from its label.

A loss function calculates the loss.

See Linear regression: Loss in Machine Learning Crash Course for more information.

loss curve

#fundamentals

A plot of loss as a function of the number of training iterations. The following plot shows a typical loss curve:

A Cartesian graph of loss versus training iterations, showing a
rapid drop in loss for the initial iterations, followed by a gradual
drop, and then a flat slope during the final iterations.

Loss curves can help you determine when your model is converging or overfitting.

Loss curves can plot all of the following types of loss:

training loss
validation loss
test loss

loss function

#fundamentals

#Metric

During training or testing, a mathematical function that calculates the loss on a batch of examples. A loss function returns a lower loss for models that make good predictions than for models that make bad predictions.

The goal of training is typically to minimize the loss that a loss function returns.

Many different kinds of loss functions exist. Pick the appropriate loss function for the kind of model you are building. For example:

L₂ loss (or Mean Squared Error) is the loss function for linear regression.
Log Loss is the loss function for logistic regression.

M

machine learning

#fundamentals

A program or system that trains a model from input data. The trained model can make useful predictions from new (never-before-seen) data drawn from the same distribution as the one used to train the model.

Machine learning also refers to the field of study concerned with these programs or systems.

See the Introduction to Machine Learning course for more information.

majority class

#fundamentals

The more common label in a class-imbalanced dataset. For example, given a dataset containing 99% negative labels and 1% positive labels, the negative labels are the majority class.

Contrast with minority class.

See Datasets: Imbalanced datasets in Machine Learning Crash Course for more information.

mini-batch

#fundamentals

A small, randomly selected subset of a batch processed in one iteration. The batch size of a mini-batch is usually between 10 and 1,000 examples.

For example, suppose the entire training set (the full batch) consists of 1,000 examples. Further suppose that you set the batch size of each mini-batch to 20. Therefore, each iteration determines the loss on a random 20 of the 1,000 examples and then adjusts the weights and biases accordingly.

It is much more efficient to calculate the loss on a mini-batch than the loss on all the examples in the full batch.
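Drawing a mini-batch can be sketched with Python's standard `random` module. The training set here is a stand-in list of 1,000 integers, and the seed and the helper name `sample_mini_batch` are assumptions for illustration.

```python
import random


def sample_mini_batch(examples, batch_size, rng):
    """Pick `batch_size` distinct examples at random from the full batch."""
    return rng.sample(examples, batch_size)


# Stand-in for a training set of 1,000 examples (the full batch).
training_set = list(range(1_000))

# A seeded generator keeps the sketch reproducible.
rng = random.Random(0)
mini_batch = sample_mini_batch(training_set, 20, rng)

print(len(mini_batch))
```

Each training iteration would compute the loss on a fresh `mini_batch` and then adjust the weights and biases.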

minority class

#fundamentals

The less common label in a class-imbalanced dataset. For example, given a dataset containing 99% negative labels and 1% positive labels, the positive labels are the minority class.

Contrast with majority class.

A training set with a million examples sounds impressive. However, if the minority class is poorly represented, then even a very large training set might be insufficient. Focus less on the total number of examples in the dataset and more on the number of examples in the minority class.

If your dataset doesn't contain enough minority class examples, consider using downsampling (the definition in the second bullet) to supplement the minority class.

See Datasets: Imbalanced datasets in Machine Learning Crash Course for more information.

model

#fundamentals

In general, any mathematical construct that processes input data and returns output. Phrased differently, a model is the set of parameters and structure needed for a system to make predictions. In supervised machine learning, a model takes an example as input and infers a prediction as output. Within supervised machine learning, models differ somewhat. For example:

A linear regression model consists of a set of weights and a bias.
A neural network model consists of:
- A set of hidden layers, each containing one or more neurons.
- The weights and bias associated with each neuron.
A decision tree model consists of:
- The shape of the tree; that is, the pattern in which the conditions and leaves are connected.
- The conditions and leaves.

You can save, restore, or make copies of a model.

Unsupervised machine learning also generates models, typically a function that can map an input example to the most appropriate cluster.

An algebraic function such as the following is a model:

  f(x, y) = 3x -5xy + y² + 17

The preceding function maps input values (x and y) to output.

Similarly, a programming function like the following is also a model:

def half_of_greater(x, y):
  # Return half of whichever argument is greater.
  if x > y:
    return x / 2
  else:
    return y / 2

A caller passes arguments to the preceding Python function, and the Python function generates output (via the return statement).

Although a deep neural network has a very different mathematical structure than an algebraic or programming function, a deep neural network still takes input (an example) and returns output (a prediction).

A human programmer codes a programming function manually. In contrast, a machine learning model gradually learns the optimal parameters during automated training.

multi-class classification

#fundamentals

In supervised learning, a classification problem in which the dataset contains more than two classes of labels. For example, the labels in the Iris dataset must be one of the following three classes:

Iris setosa
Iris virginica
Iris versicolor

A model trained on the Iris dataset that predicts Iris type on new examples is performing multi-class classification.

In contrast, classification problems that distinguish between exactly two classes are binary classification problems. For example, an email model that predicts either spam or not spam is a binary classification model.

In clustering problems, multi-class classification refers to more than two clusters.

See Neural networks: Multi-class classification in Machine Learning Crash Course for more information.

N

negative class

#fundamentals

#Metric

In binary classification, one class is termed positive and the other is termed negative. The positive class is the thing or event that the model is testing for and the negative class is the other possibility. For example:

The negative class in a medical test might be "not tumor."
The negative class in an email classification model might be "not spam."

Contrast with positive class.

neural network

#fundamentals

A model containing at least one hidden layer. A deep neural network is a type of neural network containing more than one hidden layer. For example, the following diagram shows a deep neural network containing two hidden layers.

A neural network with an input layer, two hidden layers, and an
output layer.

Each neuron in a neural network connects to all of the nodes in the next layer. For example, in the preceding diagram, notice that each of the three neurons in the first hidden layer separately connect to both of the two neurons in the second hidden layer.

Neural networks implemented on computers are sometimes called artificial neural networks to differentiate them from neural networks found in brains and other nervous systems.

Some neural networks can mimic extremely complex nonlinear relationships between different features and the label.

See Neural networks in Machine Learning Crash Course for more information.

neuron

#fundamentals

In machine learning, a distinct unit within a hidden layer of a neural network. Each neuron performs the following two-step action:

Calculates the weighted sum of input values multiplied by their corresponding weights.
Passes the weighted sum as input to an activation function.
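The two-step action can be sketched as follows. ReLU is used here only as one common activation function, and the optional `bias` term, the function names, and the sample weights are assumptions of this sketch.

```python
def relu(value):
    """A common activation function: pass positive values, clamp negatives to 0."""
    return max(0.0, value)


def neuron_output(inputs, weights, bias=0.0, activation=relu):
    """Step 1: weighted sum of the inputs. Step 2: apply the activation function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


# Hypothetical inputs and weights.
print(neuron_output([1.0, 2.0], [0.5, 0.25]))
print(neuron_output([0.92, 0.56], [1.0, -2.0]))
```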

A neuron in the first hidden layer accepts inputs from the feature values in the input layer. A neuron in any hidden layer beyond the first accepts inputs from the neurons in the preceding hidden layer. For example, a neuron in the second hidden layer accepts inputs from the neurons in the first hidden layer.

The following illustration highlights two neurons and their inputs.

A neural network with an input layer, two hidden layers, and an
output layer. Two neurons are highlighted: one in the first
hidden layer and one in the second hidden layer. The highlighted
neuron in the first hidden layer receives inputs from both features
in the input layer. The highlighted neuron in the second hidden layer
receives inputs from each of the three neurons in the first hidden
layer.

A neuron in a neural network mimics the behavior of neurons in brains and other parts of nervous systems.

node (neural network)

#fundamentals

A neuron in a hidden layer.

See Neural Networks in Machine Learning Crash Course for more information.

nonlinear

#fundamentals

A relationship between two or more variables that can't be represented solely through addition and multiplication. A linear relationship can be represented as a line; a nonlinear relationship can't be represented as a line. For example, consider two models that each relate a single feature to a single label. The model on the left is linear and the model on the right is nonlinear:

Two plots. One plot is a line, so this is a linear relationship.
The other plot is a curve, so this is a nonlinear relationship.

See Neural networks: Nodes and hidden layers in Machine Learning Crash Course to experiment with different kinds of nonlinear functions.

nonstationarity

#fundamentals

A feature whose values change across one or more dimensions, usually time. For example, consider the following examples of nonstationarity:

The number of swimsuits sold at a particular store varies with the season.
The quantity of a particular fruit harvested in a particular region is zero for much of the year but large for a brief period.
Due to climate change, annual mean temperatures are shifting.

Contrast with stationarity.

normalization

#fundamentals

Broadly speaking, the process of converting a variable's actual range of values into a standard range of values, such as:

-1 to +1
0 to 1
Z-scores (roughly, -3 to +3)

For example, suppose the actual range of values of a certain feature is 800 to 2,400. As part of feature engineering, you could normalize the actual values down to a standard range, such as -1 to +1.

Normalization is a common task in feature engineering. Models usually train faster (and produce better predictions) when every numerical feature in the feature vector has roughly the same range.
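Two of the standard ranges above can be sketched with the standard library. Linear scaling and population Z-scores are shown as two possible choices, not as the only ways to normalize; the function names and sample values are assumptions.

```python
import statistics


def scale_to_range(value, low, high, new_low=-1.0, new_high=1.0):
    """Linearly map a value from [low, high] to [new_low, new_high]."""
    fraction = (value - low) / (high - low)
    return new_low + fraction * (new_high - new_low)


def z_scores(values):
    """Express each value as a number of standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [(value - mean) / stdev for value in values]


# A feature whose actual range is 800 to 2,400, as in the example above.
print(scale_to_range(800, 800, 2400))
print(scale_to_range(1600, 800, 2400))
print(scale_to_range(2400, 800, 2400))
print(z_scores([800, 1600, 2400]))
```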

numerical data

#fundamentals

Features represented as integers or real-valued numbers. For example, a house valuation model would probably represent the size of a house (in square feet or square meters) as numerical data. Representing a feature as numerical data indicates that the feature's values have a mathematical relationship to the label. That is, the number of square meters in a house probably has some mathematical relationship to the value of the house.

Not all integer data should be represented as numerical data. For example, postal codes in some parts of the world are integers; however, integer postal codes shouldn't be represented as numerical data in models. That's because a postal code of 20000 is not twice (or half) as potent as a postal code of 10000. Furthermore, although different postal codes do correlate to different real estate values, we can't assume that real estate values at postal code 20000 are twice as valuable as real estate values at postal code 10000. Postal codes should be represented as categorical data instead.

Numerical features are sometimes called continuous features.

See Working with numerical data in Machine Learning Crash Course for more information.

O

offline

#fundamentals

Synonym for static .

offline inference

#fundamentals

The process of a model generating a batch of predictions and then caching (saving) those predictions. Apps can then access the inferred prediction from the cache rather than rerunning the model.

For example, consider a model that generates local weather forecasts (predictions) once every four hours. After each model run, the system caches all the local weather forecasts. Weather apps retrieve the forecasts from the cache.
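The caching pattern can be sketched as follows. The model here is a hypothetical stand-in (`forecast_model`), and the in-memory dictionary is an assumption made for illustration; a production system would more likely use a database or key-value store.

```python
# Counts how often the (hypothetical) model actually runs.
stats = {"model_runs": 0}

def forecast_model(city):
    """Stand-in for an expensive model call."""
    stats["model_runs"] += 1
    return f"forecast for {city}"

forecast_cache = {}

def run_batch(cities):
    """Offline step: run the model once per city and cache every prediction."""
    for city in cities:
        forecast_cache[city] = forecast_model(city)

def get_forecast(city):
    """Serving step: read from the cache without rerunning the model."""
    return forecast_cache[city]

run_batch(["Oslo", "Lima"])
print(get_forecast("Oslo"))
print(get_forecast("Oslo"))
print(stats["model_runs"])  # 2
```

The model ran only during the batch step, no matter how many times apps read the cached forecast.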

Offline inference is also called static inference .

Contrast with online inference . See Production ML systems: Static versus dynamic inference in Machine Learning Crash Course for more information.

one-hot encoding

#fundamentals

Representing categorical data as a vector in which:

One element is set to 1.
All other elements are set to 0.

One-hot encoding is commonly used to represent strings or identifiers that have a finite set of possible values. For example, suppose a certain categorical feature named Scandinavia has five possible values:

"Denmark"
"Sweden"
"Norway"
"Finland"
"Iceland"

One-hot encoding could represent each of the five values as follows:

Country	Vector
"Denmark"	1	0	0	0	0
"Sweden"	0	1	0	0	0
"Norway"	0	0	1	0	0
"Finland"	0	0	0	1	0
"Iceland"	0	0	0	0	1

Thanks to one-hot encoding, a model can learn different connections based on each of the five countries.
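A short sketch of the encoding for the Scandinavia feature, written in plain Python. The helper name `one_hot` is illustrative; ML libraries provide their own equivalents.

```python
COUNTRIES = ["Denmark", "Sweden", "Norway", "Finland", "Iceland"]

def one_hot(value, vocabulary):
    """Return a list with 1 at the position of value and 0 everywhere else."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if item == value else 0 for item in vocabulary]

print(one_hot("Norway", COUNTRIES))  # [0, 0, 1, 0, 0]
```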

Representing a feature as numerical data is an alternative to one-hot encoding. Unfortunately, representing the Scandinavian countries numerically is not a good choice. For example, consider the following numeric representation:

"Denmark" is 0
"Sweden" is 1
"Norway" is 2
"Finland" is 3
"Iceland" is 4

With numeric encoding, a model would interpret the raw numbers mathematically and would try to train on those numbers. However, Iceland isn't actually twice as much (or half as much) of something as Norway, so the model would come to some strange conclusions.

See Categorical data: Vocabulary and one-hot encoding in Machine Learning Crash Course for more information.

one-vs.-all

#fundamentals

Given a classification problem with N classes, a solution consisting of N separate binary classification models, one for each possible outcome. For example, given a model that classifies examples as animal, vegetable, or mineral, a one-vs.-all solution would provide the following three separate binary classification models:

animal versus not animal
vegetable versus not vegetable
mineral versus not mineral
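One way to picture the three binary problems is to derive a separate set of binary labels for each class. The sketch below only builds those label sets; training the binary models themselves is left out, and the function name is illustrative.

```python
CLASSES = ["animal", "vegetable", "mineral"]

def one_vs_all_labels(labels, classes):
    """For each class, build binary labels: 1 if the example is that class, else 0."""
    return {c: [1 if label == c else 0 for label in labels] for c in classes}

labels = ["animal", "mineral", "animal", "vegetable"]
binary_targets = one_vs_all_labels(labels, CLASSES)
print(binary_targets["animal"])   # [1, 0, 1, 0]
print(binary_targets["mineral"])  # [0, 1, 0, 0]
```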

online

#fundamentals

Synonym for dynamic .

online inference

#fundamentals

Generating predictions on demand. For example, suppose an app passes input to a model and issues a request for a prediction. A system using online inference responds to the request by running the model (and returning the prediction to the app).

Contrast with offline inference .

See Production ML systems: Static versus dynamic inference in Machine Learning Crash Course for more information.

output layer

#fundamentals

The "final" layer of a neural network. The output layer contains the prediction.

The following illustration shows a small deep neural network with an input layer, two hidden layers, and an output layer:

overfitting

#fundamentals

Creating a model that matches the training data so closely that the model fails to make correct predictions on new data.

Regularization can reduce overfitting. Training on a large and diverse training set can also reduce overfitting.

Overfitting is like strictly following advice from only your favorite teacher. You'll probably be successful in that teacher's class, but you might "overfit" to that teacher's ideas and be unsuccessful in other classes. Following advice from a mixture of teachers will enable you to adapt better to new situations.

See Overfitting in Machine Learning Crash Course for more information.

P

pandas

#fundamentals

A column-oriented data analysis API built on top of numpy . Many machine learning frameworks, including TensorFlow, support pandas data structures as inputs. See the pandas documentation for details.

parameter

#fundamentals

The weights and biases that a model learns during training . For example, in a linear regression model, the parameters consist of the bias ( b ) and all the weights ( w ₁ , w ₂ , and so on) in the following formula:

$$y' = b + w_1x_1 + w_2x_2 + \ldots + w_nx_n$$

In contrast, hyperparameters are the values that you (or a hyperparameter tuning service) supply to the model. For example, learning rate is a hyperparameter.

positive class

#fundamentals

#metric

The class you are testing for.

For example, the positive class in a cancer model might be "tumor." The positive class in an email classification model might be "spam."

Contrast with negative class .

The term positive class can be confusing because the "positive" outcome of many tests is often an undesirable result. For example, the positive class in many medical tests corresponds to tumors or diseases. In general, you want a doctor to tell you, "Congratulations! Your test results were negative." Regardless, the positive class is the event that the test is seeking to find.

Admittedly, you're simultaneously testing for both the positive and negative classes.

post-processing

#responsible

#fundamentals

Adjusting the output of a model after the model has been run. Post-processing can be used to enforce fairness constraints without modifying the models themselves.

For example, one might apply post-processing to a binary classification model by setting a classification threshold such that equality of opportunity is maintained for some attribute, checking that the true positive rate is the same for all values of that attribute.

precision

#fundamentals

#metric

A metric for classification models that answers the following question:

When the model predicted the positive class , what percentage of the predictions were correct?

Here is the formula:

$$\text{Precision} = \frac{\text{true positives}} {\text{true positives} + \text{false positives}}$$

where:

true positive means the model correctly predicted the positive class.
false positive means the model mistakenly predicted the positive class.

For example, suppose a model made 200 positive predictions. Of these 200 positive predictions:

150 were true positives.
50 were false positives.

In this case:

$$\text{Precision} = \frac{\text{150}} {\text{150} + \text{50}} = 0.75$$
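The same calculation as a small Python function. This is a sketch; raising an error when the model made no positive predictions is a design choice assumed here, since precision is undefined in that case.

```python
def precision(true_positives, false_positives):
    """Fraction of positive predictions that were correct."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        raise ValueError("precision is undefined without positive predictions")
    return true_positives / predicted_positives

print(precision(150, 50))  # 0.75
```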

Contrast with accuracy and recall .

prediction

#fundamentals

A model's output. For example:

The prediction of a binary classification model is either the positive class or the negative class.
The prediction of a multi-class classification model is one class.
The prediction of a linear regression model is a number.

proxy labels

#fundamentals

Data used to approximate labels not directly available in a dataset.

For example, suppose you must train a model to predict employee stress level. Your dataset contains a lot of predictive features but doesn't contain a label named stress level. Undaunted, you pick "workplace accidents" as a proxy label for stress level. After all, employees under high stress get into more accidents than calm employees. Or do they? Maybe workplace accidents actually rise and fall for multiple reasons.

As a second example, suppose you want is it raining? to be a Boolean label for your dataset, but your dataset doesn't contain rain data. If photographs are available, you might establish pictures of people carrying umbrellas as a proxy label for is it raining? Is that a good proxy label? Possibly, but people in some cultures may be more likely to carry umbrellas to protect against sun than the rain.

Proxy labels are often imperfect. When possible, choose actual labels over proxy labels. That said, when an actual label is absent, pick the proxy label very carefully, choosing the least horrible proxy label candidate.

See Datasets: Labels in Machine Learning Crash Course for more information.

R

RAG

#fundamentals

Abbreviation for retrieval-augmented generation .

rater

#fundamentals

A human who provides labels for examples . "Annotator" is another name for rater.

See Categorical data: Common issues in Machine Learning Crash Course for more information.

recall

#fundamentals

#metric

A metric for classification models that answers the following question:

When ground truth was the positive class , what percentage of predictions did the model correctly identify as the positive class?

Here is the formula:

$$\text{Recall} = \frac{\text{true positives}} {\text{true positives} + \text{false negatives}}$$

where:

true positive means the model correctly predicted the positive class.
false negative means that the model mistakenly predicted the negative class .

For instance, suppose your model made 200 predictions on examples for which ground truth was the positive class. Of these 200 predictions:

180 were true positives.
20 were false negatives.

In this case:

$$\text{Recall} = \frac{\text{180}} {\text{180} + \text{20}} = 0.9$$

Recall is particularly useful for determining the predictive power of classification models in which the positive class is rare. For example, consider a class-imbalanced dataset in which the positive class for a certain disease occurs in only 10 patients out of a million. Suppose your model makes five million predictions that yield the following outcomes:

30 True Positives
20 False Negatives
4,999,000 True Negatives
950 False Positives

The recall of this model is therefore:

recall = TP / (TP + FN)
recall = 30 / (30 + 20) = 0.6 = 60%

By contrast, the accuracy of this model is:

accuracy = (TP + TN) / (TP + TN + FP + FN)
accuracy = (30 + 4,999,000) / (30 + 4,999,000 + 950 + 20) = 99.98%

That high value of accuracy looks impressive but is essentially meaningless. Recall is a much more useful metric for class-imbalanced datasets than accuracy.
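The two calculations above can be reproduced in a few lines of Python, using the four counts from the example:

```python
# Counts from the class-imbalanced example above.
tp, fn, tn, fp = 30, 20, 4_999_000, 950

recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"recall = {recall:.2%}")      # recall = 60.00%
print(f"accuracy = {accuracy:.2%}")  # accuracy = 99.98%
```

The model misses 40% of the positive cases, yet accuracy barely registers it because true negatives dominate the total.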

See Classification: Accuracy, recall, precision and related metrics for more information.

Rectified Linear Unit (ReLU)

#fundamentals

An activation function with the following behavior:

If input is negative or zero, then the output is 0.
If input is positive, then the output is equal to the input.

For example:

If the input is -3, then the output is 0.
If the input is +3, then the output is 3.0.
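A one-line Python sketch of this behavior (the conversion to `float` is only there so that both outputs print as floating-point values):

```python
def relu(x):
    """Return 0 for negative or zero input, otherwise the input itself."""
    return max(0.0, float(x))

print(relu(-3))  # 0.0
print(relu(3))   # 3.0
```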

Here is a plot of ReLU:

ReLU is a very popular activation function. Despite its simple behavior, ReLU still enables a neural network to learn nonlinear relationships between features and the label .

regression model

#fundamentals

Informally, a model that generates a numerical prediction. (In contrast, a classification model generates a class prediction.) For example, the following are all regression models:

A model that predicts a certain house's value in Euros, such as 423,000.
A model that predicts a certain tree's life expectancy in years, such as 23.2.
A model that predicts the amount of rain in inches that will fall in a certain city over the next six hours, such as 0.18.

Two common types of regression models are:

Linear regression , which finds the line that best fits label values to features.
Logistic regression , which generates a probability between 0.0 and 1.0 that a system typically then maps to a class prediction.

Not every model that outputs numerical predictions is a regression model. In some cases, a numeric prediction is really just a classification model that happens to have numeric class names. For example, a model that predicts a numeric postal code is a classification model, not a regression model.

regularization

#fundamentals

Any mechanism that reduces overfitting . Popular types of regularization include:

L ₁ regularization
L ₂ regularization
dropout regularization
early stopping (this is not a formal regularization method, but can effectively limit overfitting)

Regularization can also be defined as the penalty on a model's complexity.

Regularization is counterintuitive. Increasing regularization usually increases training loss, which is confusing because, well, isn't the goal to minimize training loss?

Actually, no. The goal isn't to minimize training loss. The goal is to make excellent predictions on real-world examples. Remarkably, even though increasing regularization increases training loss, it usually helps models make better predictions on real-world examples.

See Overfitting: Model complexity in Machine Learning Crash Course for more information.

regularization rate

#fundamentals

A number that specifies the relative importance of regularization during training. Raising the regularization rate reduces overfitting but may reduce the model's predictive power. Conversely, reducing or omitting the regularization rate increases overfitting.

The regularization rate is usually represented as the Greek letter lambda. The following simplified loss equation shows lambda's influence:

$$\text{minimize(loss function + }\lambda\text{(regularization))}$$

where regularization is any regularization mechanism, including:

L ₁ regularization
L ₂ regularization
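To make lambda's role concrete, here is a sketch that assumes L₂ regularization, with the penalty taken as the sum of squared weights. The loss value and weights are made up for illustration.

```python
def regularized_loss(data_loss, weights, regularization_rate):
    """Return data_loss + lambda * (sum of squared weights), an L2 penalty."""
    l2_penalty = sum(w * w for w in weights)
    return data_loss + regularization_rate * l2_penalty

weights = [0.5, -2.0, 1.0]  # hypothetical model weights
print(regularized_loss(1.0, weights, 0.0))  # 1.0
print(regularized_loss(1.0, weights, 0.5))  # 3.625
```

With a rate of 0, the penalty vanishes and only the data loss is minimized. A higher rate makes large weights more costly.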

See Overfitting: L2 regularization in Machine Learning Crash Course for more information.

ReLU

#fundamentals

Abbreviation for Rectified Linear Unit .

retrieval-augmented generation (RAG)

#fundamentals

A technique for improving the quality of large language model (LLM) output by grounding it with sources of knowledge retrieved after the model was trained. RAG improves the accuracy of LLM responses by providing the trained LLM with access to information retrieved from trusted knowledge bases or documents.

Common motivations to use retrieval-augmented generation include:

Increasing the factual accuracy of a model's generated responses.
Giving the model access to knowledge it was not trained on.
Changing the knowledge that the model uses.
Enabling the model to cite sources.

For example, suppose that a chemistry app uses the PaLM API to generate summaries related to user queries. When the app's backend receives a query, the backend:

Searches for ("retrieves") data that's relevant to the user's query.
Appends ("augments") the relevant chemistry data to the user's query.
Instructs the LLM to create a summary based on the appended data.
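The retrieve and augment steps can be sketched without calling any real LLM. Everything below is a toy assumption: the documents, the word-overlap scoring, and the prompt wording are illustrative only. Real systems typically retrieve with embeddings and a vector index, and then send the prompt to a model API.

```python
DOCUMENTS = [
    "Sodium chloride dissolves readily in water.",
    "Noble gases rarely form compounds.",
    "Water boils at 100 degrees Celsius at sea level.",
]

def retrieve(query, documents, top_k=1):
    """Toy retrieval: rank documents by how many query words they share."""
    query_words = set(query.lower().split())

    def overlap(doc):
        return len(query_words & set(doc.lower().rstrip(".").split()))

    return sorted(documents, key=overlap, reverse=True)[:top_k]

def augment(query, retrieved):
    """Append the retrieved text to the user's query to form the LLM prompt."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nSummarize using only the context."

query = "at what temperature does water boil"
prompt = augment(query, retrieve(query, DOCUMENTS))
print(prompt)
```

The resulting `prompt` string is what the backend would pass to the LLM in the final step.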

ROC (receiver operating characteristic) Curve

#fundamentals

#metric

A graph of true positive rate versus false positive rate for different classification thresholds in binary classification.

The shape of an ROC curve suggests a binary classification model's ability to separate positive classes from negative classes. Suppose, for example, that a binary classification model perfectly separates all the negative classes from all the positive classes:

Figure: A number line with 8 positive examples on the right side and 7 negative examples on the left.

The ROC curve for the preceding model looks as follows:

Figure: An ROC curve. The x-axis is False Positive Rate and the y-axis is True Positive Rate. The curve has an inverted L shape. The curve starts at (0.0,0.0) and goes straight up to (0.0,1.0). Then the curve goes from (0.0,1.0) to (1.0,1.0).

In contrast, the following illustration graphs the raw logistic regression values for a terrible model that can't separate negative classes from positive classes at all:

Figure: A number line with positive examples and negative examples completely intermixed.

The ROC curve for this model looks as follows:

Figure: An ROC curve, which is actually a straight line from (0.0,0.0) to (1.0,1.0).

Meanwhile, back in the real world, most binary classification models separate positive and negative classes to some degree, but usually not perfectly. So, a typical ROC curve falls somewhere between the two extremes:

Figure: An ROC curve. The x-axis is False Positive Rate and the y-axis is True Positive Rate. The ROC curve approximates a shaky arc traversing the compass points from West to North.

The point on an ROC curve closest to (0.0,1.0) theoretically identifies the ideal classification threshold. However, several other real-world issues influence the selection of the ideal classification threshold. For example, perhaps false negatives cause far more pain than false positives.

A numerical metric called AUC summarizes the ROC curve into a single floating-point value.
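As a rough sketch of how the points on an ROC curve arise, the following computes one (false positive rate, true positive rate) pair per classification threshold. The scores and labels are hypothetical and perfectly separated, so the points trace the inverted L shape described above.

```python
def roc_points(scores, labels, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        predicted = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical model scores; label 1 is the positive class.
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
print(roc_points(scores, labels, thresholds=[1.1, 0.5, 0.0]))
# [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
```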

Root Mean Squared Error (RMSE)

#fundamentals

#metric

The square root of the Mean Squared Error .
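A minimal sketch of the calculation, using made-up labels and predictions in which a single prediction is off by 4:

```python
import math

def rmse(labels, predictions):
    """Square root of the mean of the squared errors."""
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    mean_squared_error = sum(squared_errors) / len(squared_errors)
    return math.sqrt(mean_squared_error)

print(rmse([2.0, 4.0, 6.0, 8.0], [2.0, 4.0, 6.0, 12.0]))  # 2.0
```

Three of the four predictions are exact, yet the RMSE is 2.0, because squaring makes the single large error dominate.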

S

sigmoid function

#fundamentals

A mathematical function that "squishes" an input value into a constrained range, typically 0 to 1 or -1 to +1. That is, you can pass any number (two, a million, negative billion, whatever) to a sigmoid and the output will still be in the constrained range. A plot of the sigmoid activation function looks as follows:

The sigmoid function has several uses in machine learning, including:

Converting the raw output of a logistic regression or multinomial regression model to a probability.
Acting as an activation function in some neural networks.

The sigmoid function over an input number x has the following formula:

$$\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

In machine learning, x is generally a weighted sum .
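A sketch of the formula in Python. The branch on the sign of `x` is an implementation detail assumed here to avoid floating-point overflow for large negative inputs; it computes the same function.

```python
import math

def sigmoid(x):
    """Compute 1 / (1 + e^-x) without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x is negative, so exp(x) is at most 1
    return z / (1.0 + z)

for x in (-1_000_000, 0, 2, 1_000_000):
    print(x, sigmoid(x))
```

However extreme the input, the output stays within the range 0 to 1.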

softmax

#fundamentals

A function that determines probabilities for each possible class in a multi-class classification model . The probabilities add up to exactly 1.0. For example, the following table shows how softmax distributes various probabilities:

Image is a...	Probability
dog	0.85
cat	0.13
horse	0.02

Softmax is also called full softmax .

Contrast with candidate sampling .

The softmax equation is as follows:

$$\sigma_i = \frac{e^{\text{z}_i}} {\sum_{j=1}^{j=K} {e^{\text{z}_j}}} $$

where:

$\sigma$ is the output vector, and $\sigma_i$ is its $i$-th element, which specifies the probability of class $i$. The sum of all the elements in the output vector is 1.0. The output vector contains the same number of elements as the input vector, $z$.
$z$ is the input vector. Each element of the input vector contains a floating-point value.
$K$ is the number of elements in the input vector (and the output vector).

For example, suppose the input vector is:

[1.2, 2.5, 1.8]

Therefore, softmax calculates the denominator as follows:

$$\text{denominator} = e^{1.2} + e^{2.5} + e^{1.8} = 21.552$$

The softmax probability of each element is therefore:

$$\sigma_1 = \frac{e^{1.2}}{21.552} = 0.154 $$
$$\sigma_2 = \frac{e^{2.5}}{21.552} = 0.565 $$
$$\sigma_3 = \frac{e^{1.8}}{21.552} = 0.281 $$

So, the output vector is therefore:

$$\sigma = [0.154, 0.565, 0.281]$$

The sum of the three elements in $\sigma$ is 1.0. Phew!
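The worked example can be checked with a short Python sketch. Subtracting the largest input before exponentiating is a common numerical-stability step assumed here; it leaves the resulting probabilities unchanged.

```python
import math

def softmax(z):
    """Return exp(z_i) / sum_j exp(z_j) for every element of z."""
    largest = max(z)
    exps = [math.exp(v - largest) for v in z]  # shifting does not change the ratios
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.2, 2.5, 1.8])
print([round(p, 3) for p in probs])  # [0.154, 0.565, 0.281]
```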

See Neural networks: Multi-class classification in Machine Learning Crash Course for more information.

sparse feature

#fundamentals

A feature whose values are predominantly zero or empty. For example, a feature containing a single 1 value and a million 0 values is sparse. In contrast, a dense feature has values that are predominantly not zero or empty.

In machine learning, a surprising number of features are sparse features. Categorical features are usually sparse features. For example, of the 300 possible tree species in a forest, a single example might identify just a maple tree . Or, of the millions of possible videos in a video library, a single example might identify just "Casablanca."

In a model, you typically represent sparse features with one-hot encoding . If the one-hot encoding is big, you might put an embedding layer on top of the one-hot encoding for greater efficiency.

sparse representation

#fundamentals

Storing only the position(s) of nonzero elements in a sparse feature.

For example, suppose a categorical feature named species identifies the 36 tree species in a particular forest. Further assume that each example identifies only a single species.

You could use a one-hot vector to represent the tree species in each example. A one-hot vector would contain a single 1 (to represent the particular tree species in that example) and 35 0s (to represent the 35 tree species not in that example). So, the one-hot representation of maple might look something like the following:

Figure: A vector in which positions 0 through 23 hold the value 0, position 24 holds the value 1, and positions 25 through 35 hold the value 0.

Alternatively, sparse representation would simply identify the position of the particular species. If maple is at position 24, then the sparse representation of maple would simply be:

24

Notice that the sparse representation is much more compact than the one-hot representation.
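A sketch of converting between the two forms for the 36-species example, with maple at position 24. The helper names are illustrative.

```python
def to_dense(positions, length):
    """Build a one-hot style vector with 1 at each listed position."""
    vector = [0] * length
    for i in positions:
        vector[i] = 1
    return vector

def to_sparse(vector):
    """Keep only the positions of the nonzero elements."""
    return [i for i, v in enumerate(vector) if v != 0]

maple = to_dense([24], length=36)
print(len(maple))        # 36
print(to_sparse(maple))  # [24]
```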

Suppose each example in your model must represent the words—but not the order of those words—in an English sentence. English consists of about 170,000 words, so English is a categorical feature with about 170,000 elements. Most English sentences use an extremely tiny fraction of those 170,000 words, so the set of words in a single example is almost certainly going to be sparse data.

Consider the following sentence:

My dog is a great dog

You could use a variant of one-hot vector to represent the words in this sentence. In this variant, multiple cells in the vector can contain a nonzero value. Furthermore, in this variant, a cell can contain an integer other than one. Although the words "my", "is", "a", and "great" appear only once in the sentence, the word "dog" appears twice. Using this variant of one-hot vectors to represent the words in this sentence yields the following 170,000-element vector:

A sparse representation of the same sentence would simply be:

The term "sparse representation" confuses a lot of people because sparse representation is itself not a sparse vector . Rather, sparse representation is actually a dense representation of a sparse vector . The synonym index representation is a little clearer than "sparse representation."

See Working with categorical data in Machine Learning Crash Course for more information.

sparse vector

#fundamentals

A vector whose values are mostly zeroes. See also sparse feature and sparsity .

squared loss

#fundamentals

#metric

Synonym for L ₂ loss .

static

#fundamentals

Something done once rather than continuously. The terms static and offline are synonyms. The following are common uses of static and offline in machine learning:

static model (or offline model ) is a model trained once and then used for a while.
static training (or offline training ) is the process of training a static model.
static inference (or offline inference ) is a process in which a model generates a batch of predictions at a time.

Contrast with dynamic .

static inference

#fundamentals

Synonym for offline inference .

stationarity

#fundamentals

A feature whose values don't change across one or more dimensions, usually time. For example, a feature whose values look about the same in 2021 and 2023 exhibits stationarity.

In the real world, very few features exhibit stationarity. Even features synonymous with stability (like sea level) change over time.

Contrast with nonstationarity .

stochastic gradient descent (SGD)

#fundamentals

A gradient descent algorithm in which the batch size is one. In other words, SGD trains on a single example chosen uniformly at random from a training set .
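The following sketch applies SGD to a one-feature linear model with squared loss. The data, learning rate, and step count are all assumptions chosen for illustration; the data is noise-free so that the true weight (3.0) and bias (1.0) are recoverable.

```python
import random

def sgd_linear(examples, learning_rate=0.01, steps=5000, seed=0):
    """Fit y' = b + w*x, updating on one randomly chosen example per step."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        x, y = rng.choice(examples)  # batch size of one, chosen uniformly at random
        error = (b + w * x) - y
        # Gradient of the squared loss (error ** 2) with respect to w and b.
        w -= learning_rate * 2 * error * x
        b -= learning_rate * 2 * error
    return w, b

# Hypothetical noise-free data generated from y = 3x + 1.
examples = [(x, 3.0 * x + 1.0) for x in range(-5, 6)]
w, b = sgd_linear(examples)
print(round(w, 3), round(b, 3))
```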

supervised machine learning

#fundamentals

Training a model from features and their corresponding labels . Supervised machine learning is analogous to learning a subject by studying a set of questions and their corresponding answers. After mastering the mapping between questions and answers, a student can then provide answers to new (never-before-seen) questions on the same topic.

Compare with unsupervised machine learning .

See Supervised Learning in the Introduction to ML course for more information.

synthetic feature

#fundamentals

A feature not present among the input features, but assembled from one or more of them. Methods for creating synthetic features include the following:

Bucketing a continuous feature into range bins.
Creating a feature cross .
Multiplying (or dividing) one feature value by other feature value(s) or by itself. For example, if a and b are input features, then the following are examples of synthetic features:
- ab
- a^2
Applying a transcendental function to a feature value. For example, if c is an input feature, then the following are examples of synthetic features:
- sin(c)
- ln(c)

Features created by normalizing or scaling alone are not considered synthetic features.
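A brief sketch of the multiplication and transcendental-function examples above, using hypothetical input features `a`, `b`, and `c`:

```python
import math

def synthetic_features(a, b, c):
    """Assemble new features from the input features a, b, and c."""
    return {
        "ab": a * b,
        "a^2": a ** 2,
        "sin(c)": math.sin(c),
        "ln(c)": math.log(c),  # natural log; requires c > 0
    }

features = synthetic_features(a=3.0, b=4.0, c=1.0)
print(features["ab"], features["a^2"], features["ln(c)"])  # 12.0 9.0 0.0
```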

T

test loss

#fundamentals

#metric

A metric representing a model's loss against the test set . When building a model , you typically try to minimize test loss. That's because a low test loss is a stronger quality signal than a low training loss or low validation loss .

A large gap between test loss and training loss or validation loss sometimes suggests that you need to increase the regularization rate .

training

#fundamentals

The process of determining the ideal parameters (weights and biases) comprising a model . During training, a system reads in examples and gradually adjusts parameters. Training uses each example anywhere from a few times to billions of times.

See Supervised Learning in the Introduction to ML course for more information.

training loss

#fundamentals

#metric

A metric representing a model's loss during a particular training iteration. For example, suppose the loss function is Mean Squared Error . Perhaps the training loss (the Mean Squared Error) for the 10th iteration is 2.2, and the training loss for the 100th iteration is 1.9.

A loss curve plots training loss versus the number of iterations. A loss curve provides the following hints about training:

A downward slope implies that the model is improving.
An upward slope implies that the model is getting worse.
A flat slope implies that the model has reached convergence .

For example, the following somewhat idealized loss curve shows:

A steep downward slope during the initial iterations, which implies rapid model improvement.
A gradually flattening (but still downward) slope until close to the end of training, which implies continued model improvement at a somewhat slower pace than during the initial iterations.
A flat slope towards the end of training, which suggests convergence.

Figure: The plot of training loss versus iterations. This loss curve starts with a steep downward slope. The slope gradually flattens until the slope becomes zero.

Although training loss is important, see also generalization .

training-serving skew

#fundamentals

The difference between a model's performance during training and that same model's performance during serving .

training set

#fundamentals

The subset of the dataset used to train a model .

Traditionally, examples in the dataset are divided into the following three distinct subsets:

a training set
a validation set
a test set

Ideally, each example in the dataset should belong to only one of the preceding subsets. For example, a single example shouldn't belong to both the training set and the validation set.
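One simple way to produce three non-overlapping subsets is to shuffle once and slice. The 80/10/10 proportions and the fixed seed below are assumptions for illustration, not a recommendation from the glossary.

```python
import random

def split_dataset(examples, train_fraction=0.8, validation_fraction=0.1, seed=0):
    """Shuffle once, then slice into disjoint training, validation, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test

train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))  # 80 10 10
```

Because each example lands in exactly one slice, no example is shared between subsets.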

See Datasets: Dividing the original dataset in Machine Learning Crash Course for more information.

true negative (TN)

#fundamentals

#metric

An example in which the model correctly predicts the negative class . For example, the model infers that a particular email message is not spam , and that email message really is not spam .

true positive (TP)

#fundamentals

#metric

An example in which the model correctly predicts the positive class . For example, the model infers that a particular email message is spam, and that email message really is spam.

true positive rate (TPR)

#fundamentals

#metric

Synonym for recall . That is:

$$\text{true positive rate} = \frac {\text{true positives}} {\text{true positives} + \text{false negatives}}$$

True positive rate is the y-axis in an ROC curve .

U

underfitting

#fundamentals

Producing a model with poor predictive ability because the model hasn't fully captured the complexity of the training data. Many problems can cause underfitting, including:

Training on the wrong set of features .
Training for too few epochs or at too low a learning rate .
Training with too high a regularization rate .
Providing too few hidden layers in a deep neural network.

See Overfitting in Machine Learning Crash Course for more information.

unlabeled example

#fundamentals

An example that contains features but no label . For example, the following table shows three unlabeled examples from a house valuation model, each with three features but no house value:

Number of bedrooms	Number of bathrooms	House age
3	2	15
2	1	72
4	2	34

In supervised machine learning , models train on labeled examples and make predictions on unlabeled examples .

In semi-supervised and unsupervised learning, unlabeled examples are used during training.

Contrast unlabeled example with labeled example .

unsupervised machine learning

#clustering

#fundamentals

Training a model to find patterns in a dataset, typically an unlabeled dataset.

The most common use of unsupervised machine learning is to cluster data into groups of similar examples. For example, an unsupervised machine learning algorithm can cluster songs based on various properties of the music. The resulting clusters can become an input to other machine learning algorithms (for example, to a music recommendation service). Clustering can help when useful labels are scarce or absent. For example, in domains such as anti-abuse and fraud, clusters can help humans better understand the data.

Contrast with supervised machine learning .

Another example of unsupervised machine learning is principal component analysis (PCA) . For example, applying PCA on a dataset containing the contents of millions of shopping carts might reveal that shopping carts containing lemons frequently also contain antacids.

See What is Machine Learning? in the Introduction to ML course for more information.

V

validation

#fundamentals

The initial evaluation of a model's quality. Validation checks the quality of a model's predictions against the validation set .

Because the validation set differs from the training set , validation helps guard against overfitting .

You might think of evaluating the model against the validation set as the first round of testing and evaluating the model against the test set as the second round of testing.

validation loss

#fundamentals

#metric

A metric representing a model's loss on the validation set during a particular iteration of training.

validation set

#fundamentals

The subset of the dataset that performs initial evaluation against a trained model . Typically, you evaluate the trained model against the validation set several times before evaluating the model against the test set .

Traditionally, you divide the examples in the dataset into the following three distinct subsets:

a training set
a validation set
a test set

Ideally, each example in the dataset should belong to only one of the preceding subsets. For example, a single example shouldn't belong to both the training set and the validation set.

See Datasets: Dividing the original dataset in Machine Learning Crash Course for more information.

W

weight

#fundamentals

A value that a model multiplies by another value. Training is the process of determining a model's ideal weights; inference is the process of using those learned weights to make predictions.

Imagine a linear model with two features. Suppose that training determines the following weights (and bias):

The bias, b, has a value of 2.2.
The weight w₁, associated with one feature, is 1.5.
The weight w₂, associated with the other feature, is 0.4.

Now imagine an example with the following feature values:

The value of one feature, x₁, is 6.
The value of the other feature, x₂, is 10.

This linear model uses the following formula to generate a prediction, y':

$$y' = b + w_1x_1 + w_2x_2$$

Therefore, the prediction is:

$$y' = 2.2 + (1.5)(6) + (0.4)(10) = 15.2$$

If a weight is 0, then the corresponding feature doesn't contribute to the model. For example, if w₁ is 0, then the value of x₁ is irrelevant.
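The worked example can be reproduced with a short Python sketch. The function name `predict` is hypothetical; the bias, weights, and feature values are the ones given in the example.

```python
def predict(bias, weights, features):
    """Linear model: y' = b + w1*x1 + w2*x2 + ..."""
    return bias + sum(w * x for w, x in zip(weights, features))


# Values from the example: b = 2.2, w1 = 1.5, w2 = 0.4, x1 = 6, x2 = 10.
y_prime = predict(2.2, [1.5, 0.4], [6, 10])
print(f"{y_prime:.1f}")  # prints "15.2"

# With w1 set to 0, changing x1 has no effect on the prediction.
same_a = predict(2.2, [0.0, 0.4], [6, 10])
same_b = predict(2.2, [0.0, 0.4], [999, 10])
print(same_a == same_b)  # prints "True"
```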

See Linear regression in Machine Learning Crash Course for more information.

weighted sum

#fundamentals

The sum of all the relevant input values multiplied by their corresponding weights. For example, suppose the relevant inputs consist of the following:

input value	input weight
2	-1.3
-1	0.6
3	0.4

The weighted sum is therefore:

weighted sum = (2)(-1.3) + (-1)(0.6) + (3)(0.4) = -2.0

A weighted sum is the input argument to an activation function.
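A small Python sketch of the same calculation follows. The function names are hypothetical, and ReLU is used only as one familiar activation function to show where the weighted sum goes next; the entry itself does not specify which activation is applied.

```python
def weighted_sum(values, weights):
    """Sum of each input value multiplied by its corresponding weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))


def relu(x):
    """Rectified linear unit, chosen here purely for illustration."""
    return max(0.0, x)


# Input values and weights from the table above.
total = weighted_sum([2, -1, 3], [-1.3, 0.6, 0.4])
print(f"{total:.1f}")  # prints "-2.0"

# The weighted sum is negative, so ReLU maps it to 0.0.
activated = relu(total)
```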

Z

Z-score normalization

#fundamentals

A scaling technique that replaces a raw feature value with a floating-point value representing the number of standard deviations from that feature's mean. For example, consider a feature whose mean is 800 and whose standard deviation is 100. The following table shows how Z-score normalization would map the raw value to its Z-score:

Raw value	Z-score
800	0
950	+1.5
575	-2.25

The machine learning model then trains on the Z-scores for that feature instead of on the raw values.
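The mapping in the table can be checked with a few lines of Python. This sketch assumes the mean and standard deviation are already known (800 and 100, as in the example); the function name `z_score` is hypothetical.

```python
def z_score(raw_value, mean, std_dev):
    """Number of standard deviations that raw_value lies from the mean."""
    return (raw_value - mean) / std_dev


# Feature statistics from the example above.
MEAN = 800
STD_DEV = 100

raw_values = [800, 950, 575]
z_scores = [z_score(v, MEAN, STD_DEV) for v in raw_values]
print(z_scores)  # prints "[0.0, 1.5, -2.25]"
```

For a real feature, the mean and standard deviation would first be computed from the training data and then reused unchanged when normalizing validation, test, and inference inputs.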

See Numerical data: Normalization in Machine Learning Crash Course for more information.
