Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Topics - Asif Khan Shakir

1
You can compare a neural network to a chess game against a computer. It has algorithms, according to which it determines its tactics, depending on your moves and actions. The programmer enters data on how each piece moves into the computer's database, defines the boundaries of the chessboard, and feeds in a huge number of strategies that chess players use. At the same time, the computer may, for example, be able to learn from you and other people, and so it can become a deep neural network. After a while, playing against different players, it can become invincible.

A neural network is not a creative system, but a deep neural network is much more complex than a simple one. It can recognize voice commands, recognize sound and graphics, perform expert review, and carry out many other actions that require prediction, creative thinking, and analytics, abilities once thought to belong only to the human brain. A plain neural network produces one result (a word, an action, a number, or a solution), while a deep neural network approaches the problem more globally and can draw conclusions or make predictions depending on the information supplied and the desired result. A plain neural network requires a specific input of data and solution algorithms, whereas a deep neural network can solve a problem without a significant amount of labeled data.
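To make the contrast a bit more concrete, here is a minimal, hedged sketch in Keras (the layer sizes and input shape are illustrative assumptions, not a recipe): the same classifier expressed once as a "plain" network with a single hidden layer, and once as a deep network with several stacked layers.

```python
# Illustrative only: a shallow network vs. a deep one in Keras.
from keras.models import Sequential
from keras.layers import Dense

# "Neural network": a single hidden layer mapping inputs to one output.
shallow = Sequential([
    Dense(16, activation='relu', input_shape=(10,)),
    Dense(1, activation='sigmoid'),
])

# "Deep neural network": several stacked hidden layers, which let the
# model build increasingly abstract representations of its input.
deep = Sequential([
    Dense(64, activation='relu', input_shape=(10,)),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid'),
])
```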

2
Software Engineering / The Data Science Puzzle — 2020 Edition
« on: February 20, 2020, 08:18:04 PM »
With a new year upon us, let's take a fresh look at the current state of the data science puzzle. What are the most important constituent concepts of the data science landscape? How do they fit together? Which of these have been elevated in importance since the previous installment, and which are less important?

As a few years have passed since I last treated this particular topic, it might be worth having a look at this out of interest, and for comparison. We will proceed by first looking at the concept definitions from last time, and then look at how things have changed since then.

We start with the perceived original driver of the data science revolution, big data. What I said in 2017:

Big Data is still important to data science. Take your pick of metaphors, but any way you look at it, Big Data is the raw material that [...] continues to fuel the data science revolution.

As relates to Big Data, I believe that justification of data-acquisition and -retention from a business point of view, expectations that Big Data projects start providing actual financial returns, and the challenges related to data privacy and security will become the big Big Data stories not only of 2017 but moving forward in general. In short, it's time for big returns from, and big protections for, Big Data.

However, as others have opined, Big Data now "just is," and is perhaps no longer an entity deserving of the special attention it has received for the better part of a decade.

While I don't condone the capitalization of most key terms in general, "big data" previously seemed to demand this treatment given its near-fabled status and brand-name-like station. Notice that this time around I have revoked that status, which goes hand in hand with the idea that big data is no longer top-level data science terminology. As alluded to in the final sentence, moving forward big data is simply "data," and we could reword part of that excerpt to read, "data is the raw material that continues to fuel the data science revolution."

Look, at this point we should all be aware of how important data is to the process of data science (it's right there in the name). Whether our data is big or small or lies somewhere else on the data sizing spectrum really doesn't require distinguishing from the outset. We all want to science the data and provide value, whether the data is a lot or a little. "Big data" may provide us with more or unique opportunities for the types of analytics and modeling to employ, but this seems akin to distinguishing the size of our nails from the get-go just so we know what size and type of hammer to bring along for a given job.

Data is everywhere. Much of it is big. It's time we stopped emphasizing this, just like it's time we stopped saying "smart" phone. Phones are all basically smart now, and making special note of it really says more about you than it does about the phone.

One thing I stand by, however, is that the challenges related to data privacy and security will only grow in importance as the years march on, and we can add ethics into that mix as well, though seriously treating these topics is beyond the scope of this article.

Here's what I said about machine learning as a component of data science last time:

Machine learning is one of the primary technical drivers of data science. The goal of data science is to extract insight from data, and machine learning is the engine which allows this process to be automated. Machine learning algorithms continue to facilitate the automatic improvement of computer programs from experience, and these algorithms are becoming increasingly vital to a variety of diverse fields.

I stand by this, and would only make the argument that machine learning is more than one of the primary technical drivers of extracting insight from data; it is the primary technical driver.

There are a variety of aspects to data science; we are discussing a number of them in this very article. However, when thinking about extracting insight from data which cannot be seen with the "naked eye" via descriptive statistics or the visualization of these stats or some type of business intelligence reporting — all of which can be very useful and provide invaluable illumination in the proper circumstance — machine learning is the natural path to take, a path which has automation baked in.

Machine learning is not synonymous with data science; however, given the reliance on machine learning to extract insight from data, you can forgive the many who often make this mistake.


3
Software Engineering / Top 5 must-have Data Science skills for 2020
« on: February 20, 2020, 07:52:08 PM »
Data Science is a competitive field, and people are quickly building more and more skills and experience. This has given rise to the booming job description of Machine Learning Engineer, and therefore, my advice for 2020 is that all Data Scientists need to be developers as well.

To stay competitive, make sure to prepare yourself for new ways of working that come with new tools.

 

1. Agile
Agile is a method of organizing work that is already much used by dev teams. Data Science roles are increasingly filled by people whose original skill set is pure software development, and this gives rise to the role of Machine Learning Engineer. More and more, Data Scientists/Machine Learning Engineers are managed as developers: continuously making improvements to Machine Learning elements in an existing codebase. For this type of role, Data Scientists have to know the Agile way of working based on the Scrum method. It defines several roles for different people, and this role definition makes sure that continuous improvement can be implemented smoothly.

 

2. Github
Git and GitHub are tools for developers that are of great help in managing different versions of software. They track all changes made to a code base, and in addition they make collaboration much easier when multiple developers change the same project at the same time. With the role of Data Scientist becoming more dev-heavy, it becomes key to be able to handle those dev tools. Git is becoming a serious job requirement, and it takes time to get used to best practices for using Git. It is easy to start working with Git when you're alone or when your co-workers are new to it, but when you join a team of Git experts while you're still a newbie, you might struggle more than you think.

3. Industrialization
What is also changing in Data Science is the way we think about our projects. The Data Scientist is still the person who answers business questions with machine learning, as it has always been. But Data Science projects are more and more often developed for production systems, for example as a micro-service inside a larger software product. At the same time, advanced types of models are becoming more and more CPU- and RAM-intensive to execute, especially when working with Neural Networks and Deep Learning.

In terms of job descriptions of a Data Scientist, it is becoming more important to not only think about the accuracy of your model but also take into account the time of execution or other industrialization aspects of your project.

4. Cloud and Big Data
While industrialization of Machine Learning is becoming a more serious constraint for Data Scientists, it has also become a serious constraint for Data Engineers and IT in general. Where the Data Scientist can work on reducing the time needed by a model, the IT people can contribute by changing to faster compute services, which are generally obtained in one or both of the following ways:

Cloud: moving compute resources to external vendors like AWS, Microsoft Azure, or Google Cloud makes it very easy to set up a very fast Machine Learning environment that can be accessed remotely. This requires Data Scientists to have a basic understanding of how the cloud works, for example working with remote servers instead of your own computer, or working on Linux rather than on Windows/Mac.

Big Data: a second aspect of faster IT is using Hadoop and Spark, tools that allow tasks to be parallelized across many computers at the same time (worker nodes). As a Data Scientist, this demands a different approach to implementing models, because your code must allow for parallel execution; a minimal sketch follows below.
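As a hedged illustration of what "code that allows for parallel execution" looks like, here is a minimal PySpark sketch (the file name and column names are hypothetical placeholders): the aggregation is expressed against Spark's distributed DataFrame API, so the engine can split the work across worker nodes.

```python
# Minimal PySpark sketch: the groupBy/agg below is executed in parallel
# across worker nodes. "sales.csv", "region", and "amount" are
# hypothetical placeholders for your own data.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("parallel-example").getOrCreate()

df = spark.read.csv("sales.csv", header=True, inferSchema=True)

result = df.groupBy("region").agg(F.avg("amount").alias("avg_amount"))
result.show()

spark.stop()
```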
 

5. NLP, Neural Networks, and Deep Learning
Until recently, it was still acceptable for a Data Scientist to treat NLP and image recognition as mere specializations of Data Science that not everyone has to master. But the use cases for image classification and NLP are becoming more and more frequent even in 'regular' business. It has now become unacceptable not to have at least a basic knowledge of such models.

Even if you do not have direct applications of such models in your job, hands-on projects are easy to find and will let you understand the steps involved in image and text projects.


4
Software Engineering / Top 5 Data Science Trends for 2020
« on: February 20, 2020, 04:18:09 PM »
Technology is evolving continuously, and so are we. In the upcoming years, there will be massive growth in the AI and Machine Learning field. There is already a considerable amount of data to be managed, and with new technological advancements, we can utilize big data in many ways. For that, we have to stay up to date with the latest trends in data science.

Data Science is not a single topic; it covers a variety of fields and technologies, such as the Internet of Things, Deep Learning, AI, etc. In simple terms, we can describe data science as a complete blend of data inference, algorithmic computation, analysis, and technology that helps in solving multifaceted business problems.

Moreover, it provides businesses with advanced tools and technologies that allow them to automate complicated business processes linked with extracting, analyzing, and presenting raw data. With so much happening in the technical field, and the data being generated at a rapid speed, it is crucial to know about the latest as well as the upcoming trends in data science.

To keep you up-to-date with the trends in data science, we have created a list of top 5 data science trends that are set to push your business to achieve great success.

1. Access to Artificial Intelligence and intelligent apps

AI has become a mainstream technology for both small and large businesses, and it will bloom in the next few years. At present, we are at the initial stage of using artificial intelligence, but in 2020 we will see more advanced applications of AI in all fields. The reason AI is growing rapidly is that it allows enterprises to improve their overall business processes and provides a better way of handling both customer and client data.

Utilizing AI will still remain a challenge for many, though, as keeping up with the advancement of this technology is not that simple. In 2020, we will find more advanced apps developed with AI, Machine Learning, and other technologies that can improve the way we work. Another trend that will take over the market is automated machine learning, which will help transform data science with better data management. You might therefore need specialized training for executing deep learning.

2. Rapid growth in the IoT

According to a report by IDC, investment in IoT technology is expected to reach $1 trillion by the end of 2020, which clearly signals the growth of smart and connected devices. Even in 2019, we used apps and devices that allow us to control home appliances like the AC, TV, etc. Many of you might not know that this is only possible via the IoT.

If you have ever come across smart assistants like Google Assistant or Microsoft Cortana that let us automate routine things, you will have an idea of how the Internet of Things keeps grabbing users' attention. This will encourage businesses to invest in the technology, especially in smartphone-related development, which uses the IoT the most.

3. Evolution of Big Data analytics

When it comes to data science, we simply cannot ignore Big Data analysis, which helps businesses gain a competitive edge from their data and achieve their objectives. Nowadays, enterprises use different tools and technologies, especially Python, to analyze big data. Businesses are also focused on identifying the reasons behind events that take place in the present, and that is where predictive analytics comes in: it helps companies identify what can happen in the future.

For instance, predictive analytics helps you identify the interests of your customers from their purchase or browsing history. Based on that, you will be able to create smarter strategies to target new customers and retain the current ones.

4. Edge Computing will be on the rise

At present, edge computing is propelled by sensors, but with the growth of the IoT, edge computing will take over from mainstream cloud systems. Edge computing allows businesses to store streaming data close to its sources so that it can be analyzed in real time. Moreover, it offers a great alternative to Big Data analytics, which requires high-end storage devices and high network bandwidth.

As the number of devices and sensors for collecting data is increasing rapidly, companies are adopting edge computing, as it is capable of resolving issues related to bandwidth, latency, and connectivity. When combined with cloud technology, edge computing can provide a synchronized structure that will be helpful in minimizing the risks involved in data analysis and management.

5. Demand for Data science Security Professionals

The adoption of artificial intelligence and machine learning will give rise to many new roles in the industry. One role that will be in high demand is the data science security professional. Both AI and ML depend entirely on data, and to process this data efficiently, data scientists must have expertise in data science as well as a command of computer science.

Though the business market already has access to many experts proficient in data science and computer science, there is still a need for more data security professionals who can process customer data securely. For that, data security scientists must be well versed in the latest technologies of data science and big data analysis. For example, Python is among the most used languages in data science and data analysis, so a clear understanding of Python concepts can help you tackle problems related to data science security.

 

Conclusion
Data Science has become one of the fastest-growing fields across industries, especially the IT industry. Businesses adopting data science techniques and technologies must therefore stay up-to-date with the latest trends. In this article, we covered five data science trends that will be at the top of the list in 2020. You can use these trends to analyze where you need to improve your business processes in order to achieve maximum growth and ROI.

5
Evening Program (FSIT) / Python Programming: What, Where & How to Learn
« on: July 14, 2019, 07:17:56 PM »
While Python 3 is the new standard and most companies will eventually want to replace all code with Python 3, a lot of applications are written in Python 2, and I think the language will stay relevant for quite some time.

 

I'd say you can't really go very wrong choosing to learn either of the two Python versions. There are indeed some differences, but it is still the same language, and it would be good to learn the differences between the two anyway; a small sketch of them follows below.
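For the curious, here is a hedged sketch of two of the classic differences (the snippet itself targets Python 3, with the Python 2 forms shown as comments):

```python
# Python 2 (shown as comments, since this file targets Python 3):
#   print "hello"    # print was a statement
#   3 / 2            # integer division: the result was 1

# Python 3:
print("hello")   # print is a function
print(3 / 2)     # true division: 1.5
print(3 // 2)    # explicit floor division: 1
```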

 

As for tutorials there are these:

- Code Academy Python - this is a very basic course to learn Python syntax and programming principles, but it explains it well
https://www.codecademy.com/en/tracks/python

- Learn Python the Hard Way - a more detailed explanation of Python 2
https://shop.learncodethehardway.org/access/buy/2/

- Also the official Python site has tutorials
https://wiki.python.org/moin/BeginnersGuide

 

However, I think the best way to learn a language is by doing a project. There are lots of Python meetups (depending on where you live, though) - maybe join a group that writes a project in Python?

 

As for an IDE, I'm a great fan of PyCharm; in my opinion it is by far the best editor for Python. It has a free Community version which will give you most of the great features. I don't know of any other editor that is comparable to PyCharm for Python coding.

6
In our digital lives, unwanted incidents happen out of the blue. Before we understand what is going on, the damage is already done. Only then do we think we should perhaps have been a little more careful. Anyway, you know as well as I do how little all the wisdom that arrives after the thief has left can do to repair the loss. So it is worth being a little careful about everything while there is still time; that way, some unwanted events can perhaps be avoided. Your PC or mine can crash at any moment; you could hardly even call it unexpected. You are sitting at the PC working on an urgent report, or you are only moments away from a record score in an online game. Suddenly the PC freezes! No response at all. You wait a while, hope against hope, and try again. Nothing happens. The PC will not talk to you anymore.

And then...

That dreaded blue-screen-of-death error. It tells you to reboot the PC, or else your system may crash. Many of you probably have some idea of the harmful effects of this freezing. Let's take a brief look at the causes of freezing and the ways we can protect ourselves from freezing/crashing:

Software problems

Software problems appear when an application misbehaves or a device driver develops a fault. When such a problem appears, what we should do is uninstall the faulty software or reinstall it, and update it if possible.

Faulty RAM

Many of our PCs have more than one RAM module. Problems with these modules appear when their configuration is wrong or a fault remains from fabrication. The worrying part is that such module problems can be very dangerous and cause fatal errors, which can keep your PC from functioning normally and can even cause a crash.

To check for these RAM problems, you can use a trusted memory-testing tool that lets you find errors through a variety of tests. You can also remove the modules (at least one of them), start the PC, and see what happens to those errors; that way, you may be able to track the fault down.

Hard disk problems

Many of us face this problem. Usually, cluttering the PC by installing unnecessary tools can also cause it, so new PC users need to be a little more careful, as they tend to do this the most. In some cases disk defragmentation can also unexpectedly slow the PC down, and in some cases cause physical damage. For this reason you have to install software thoughtfully. For uninstalling, you can use a good third-party uninstaller such as Revo Uninstaller, so that no leftover files remain after a piece of software is removed. You can also try a disk cleanup tool; one such disk cleaner is CCleaner, which will remove the junk files from your PC.

Virus and spyware infections

Malicious infections such as viruses, trojans, worms, and various kinds of spyware can quietly give your system a taste of a very bad experience. Such malware can generate all sorts of computer errors, including system crashes, theft of personal data, and, worst of all, a hard disk crash.

To stay safe from these problems, you need to choose a good and trusted antivirus. Be careful here, or you may find yourself facing all sorts of unnecessary trouble again. So let me say it once more: be careful! You have to choose your antivirus yourself, and you have to keep a close eye on keeping it updated.

Registry problems

Many of us do not take the registry seriously, although in reality it is a very important section of the system. If your system crashes because of an unstable, fragmented, and slow registry, please don't be surprised. To avoid such registry problems, you can use a good registry cleaner and clean your registry regularly.

7
Software Engineering / Top 20 Python libraries for Data Science
« on: July 06, 2019, 02:54:42 AM »
1. NumPy
NumPy is the first choice among developers and data scientists who work with data-oriented technologies. It is a Python package for performing scientific computations, registered under the BSD license.

Through NumPy, you can leverage n-dimensional array objects; integration tools for C, C++, and Fortran programs; and functions for performing complex mathematical operations such as Fourier transforms, linear algebra, and random number generation. One can also use NumPy as a multi-dimensional container for generic data, so you can effectively integrate it with your database through a wide choice of operations.

NumPy sits underneath TensorFlow and other complex machine learning platforms, powering their operations internally. Since it provides an array interface, it gives us many options for reshaping large datasets. It can be used for processing images, sound wave representations, and other binary operations. If you are new to the data science or ML field, you should build a good understanding of NumPy to process your real-world data sets.
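As a quick, hedged illustration of that array interface:

```python
# Minimal NumPy sketch: n-dimensional arrays, reshaping, and one of the
# mathematical routines (FFT) mentioned above.
import numpy as np

a = np.arange(12)         # 1-D array: [0, 1, ..., 11]
m = a.reshape(3, 4)       # the same data viewed as a 3x4 matrix

print(m.mean(axis=0))     # column means
print(np.fft.fft(a)[:3])  # first few Fourier coefficients
```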

2. Theano
Theano is another useful Python library that assists data scientists in performing computations over large multi-dimensional arrays. It is similar to TensorFlow, though not quite as efficient.

It is used for distributed and parallel computing tasks. Through it, you can express, optimize, and evaluate your array-based mathematical operations. It is tightly coupled with NumPy, building on the numpy.ndarray implementation.

Due to its GPU-based infrastructure, it can process operations faster than on a CPU alone. It is a good fit for speed and stability optimizations, delivering the expected outcomes.

For faster evaluation, its dynamic C code generator is popular among data scientists, and they can perform unit testing to identify flaws in the whole model.
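A minimal sketch of the symbolic workflow, assuming Theano is installed (the expression is compiled before it is evaluated):

```python
# Minimal Theano sketch: define a symbolic expression, compile, evaluate.
import theano
import theano.tensor as T

x = T.dscalar('x')           # symbolic double-precision scalar
y = x ** 2 + 2 * x           # symbolic expression built from x
f = theano.function([x], y)  # compiled callable

print(f(3.0))                # 15.0
```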

3. Keras
Keras is one of the most powerful Python libraries, providing high-level neural network APIs. These APIs execute on top of TensorFlow, Theano, and CNTK. Keras was created to reduce the challenges faced in complex research and to make computation faster. For anyone using deep learning libraries in their work, Keras is a great option.

It allows fast prototyping, supports recurrent and convolutional networks individually as well as in combination, and executes on both GPU and CPU.

Keras provides a user-friendly environment, reducing cognitive load with simple APIs that give the required results. Due to its modular nature, one can combine a variety of modules, such as neural layers, optimizers, and activation functions, to develop a new model.

It is an open source library written in Python. For data scientists who need to add new modules, Keras is a good option: new modules can simply be added as classes and functions.
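A hedged sketch of that modular style (the layer sizes and input shape are arbitrary assumptions):

```python
# Minimal Keras sketch: layers, an optimizer, and activation functions
# composed into a small Sequential model.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
```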

4. PyTorch
PyTorch is considered one of the largest machine learning libraries for data scientists and researchers. It helps them design dynamic computational graphs, perform fast tensor computations accelerated by GPUs, and handle various other complex tasks. In neural network algorithms, the PyTorch APIs play an effective role.

The hybrid front-end of the PyTorch platform is very easy to use and allows transitioning to graph mode for optimizations. For achieving accurate results in asynchronous collective operations and establishing peer-to-peer communication, it provides native support to its users.

With native ONNX (Open Neural Network Exchange) support, one can export models to leverage visualizers, platforms, runtimes, and various other resources. The best part of PyTorch is that it enables a cloud-based environment for easy scaling of resources used in deployment or testing.

It is developed from the concepts of another ML library called Torch. Over the past few years, PyTorch has grown more popular among data scientists due to trending data-centric demands.
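A minimal sketch of the dynamic-graph idea: the graph is built as the computation runs, and gradients flow back through it.

```python
# Minimal PyTorch sketch: tensors plus automatic differentiation.
import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()  # the graph is constructed dynamically here
y.backward()        # backpropagate through it

print(x.grad)       # dy/dx = 2x
```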

5. SciPy
SciPy is another Python library for researchers, developers, and data scientists. (Do not confuse the SciPy library with the SciPy stack.) It provides packages for statistics, optimization, integration, and linear algebra, and it builds on NumPy concepts to deal with complex mathematical problems.

It provides numerical routines for optimization and integration, and it contains a variety of sub-modules to choose from. If you have just started your data science career, SciPy can be very helpful in guiding you through the whole numerical-computation side of things.
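A small, hedged sketch of those integration and optimization routines:

```python
# Minimal SciPy sketch: numerical integration and scalar optimization.
import numpy as np
from scipy import integrate, optimize

area, _ = integrate.quad(np.sin, 0, np.pi)                # integral of sin on [0, pi]
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2)

print(area)      # ~2.0
print(result.x)  # ~2.0, the minimizer
```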

We can see how Python programming is assisting data scientists in crunching and analyzing large and unstructured data sets. Other libraries like TensorFlow, SciKit-Learn, Eli5 are also available to assist them throughout this journey.

6. PANDAS
PANDAS stands for the Python Data Analysis Library. PANDAS is another open source Python library providing high-performance data structures and analysis tools. It is developed on top of the NumPy package, and its main data structure is the DataFrame.

With a DataFrame you can store and manage tabular data, performing manipulations over rows and columns. Conveniences like square-bracket notation reduce a person's effort in data analysis tasks. You also get tools for reading and writing data between in-memory data structures and multiple formats, such as CSV, SQL, HDF5, or Excel.
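A minimal sketch of that DataFrame workflow (the column names are illustrative):

```python
# Minimal pandas sketch: a DataFrame, square-bracket access, and CSV output.
import pandas as pd

df = pd.DataFrame({'name': ['Ana', 'Ben', 'Cal'],
                   'score': [88, 92, 79]})

high = df[df['score'] > 80]           # square-bracket filtering
df['passed'] = df['score'] >= 80      # add a derived column

print(high)
df.to_csv('scores.csv', index=False)  # write out to one of many formats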

7. PyBrain
PyBrain is another powerful modular ML library for Python. PyBrain stands for Python-Based Reinforcement Learning, Artificial Intelligence, and Neural Network Library. For entry-level data scientists, it offers flexible modules and algorithms for advanced research. It includes a variety of algorithms for evolution, neural networks, and supervised and unsupervised learning. For real-life tasks, it has emerged as a strong tool built around neural networks in its kernel.

8. SciKit-Learn
Scikit-Learn is a simple tool for data analysis and mining-related tasks. It is open source and licensed under the BSD license, so anyone can access or reuse it in various contexts. SciKit is built on NumPy, SciPy, and Matplotlib. It is used for classification, regression, and clustering, to manage spam, image recognition, drug response, stock pricing, customer segmentation, etc. It also supports dimensionality reduction, model selection, and pre-processing.
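A minimal sketch of the classification workflow, using a toy dataset bundled with the library:

```python
# Minimal scikit-learn sketch: fit a classifier and score it on held-out data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # accuracy on the held-out split
```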

9. Matplotlib
This 2D plotting library for Python is very popular among data scientists for producing a variety of figures in multiple formats across their respective platforms. One can easily use it in Python code, IPython shells, Jupyter notebooks, or application servers. With Matplotlib, you can make histograms, plots, bar charts, scatter plots, etc.
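A minimal sketch producing two of the figure types mentioned above:

```python
# Minimal Matplotlib sketch: a scatter plot and a histogram side by side.
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(200)
y = 2 * x + np.random.randn(200)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(x, y, s=10)
ax1.set_title('scatter')
ax2.hist(x, bins=20)
ax2.set_title('histogram')
plt.savefig('figures.png')  # or plt.show() in an interactive session
```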

10. Tensorflow
This open source library was designed by Google to compute dataflow graphs with its machine learning algorithms. It was built to meet the high demand for training neural networks. It is not limited to Google's own scientific computations; rather, it is widely used in popular real-world applications.

Due to its high performance and flexible architecture, deployment on CPUs, GPUs, or TPUs becomes an easy task, from PC server clusters all the way down to edge devices.
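A minimal sketch, assuming TensorFlow 2.x where eager execution is the default (in 1.x the same graph would run inside a session):

```python
# Minimal TensorFlow sketch: a small matrix multiplication.
import tensorflow as tf

a = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])
b = tf.constant([[1.0],
                 [1.0]])

print(tf.matmul(a, b))  # evaluated eagerly in TF 2.x
```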

11. Seaborn
Seaborn was designed to visualize complex statistical models. It can deliver accurate graphs such as heat maps. Seaborn builds on the concepts of Matplotlib and is highly dependent on it. Even subtle data distributions can be easily visualized through this library, which is why it has become familiar to data scientists and developers.
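A minimal sketch of such a heat map (random data, purely illustrative):

```python
# Minimal Seaborn sketch: a heat map of a correlation matrix.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.savefig('heatmap.png')
```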

12. Bokeh
Bokeh is one more visualization library for designing interactive plots. Unlike the last one, it is not built on Matplotlib; it targets the web browser directly, presenting interactive designs in the spirit of data-driven documents (D3.js).
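A minimal sketch of an interactive plot rendered to a standalone HTML page:

```python
# Minimal Bokeh sketch: an interactive line plot for the web browser.
from bokeh.plotting import figure, output_file, show

p = figure(title='Interactive example', x_axis_label='x', y_axis_label='y')
p.line([1, 2, 3, 4], [4, 7, 2, 5], line_width=2)

output_file('plot.html')  # a standalone, interactive HTML page
show(p)                   # opens it in the browser
```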

13. Plotly
Let's talk about Plotly, one of the most famous web-based frameworks for data scientists. This toolbox offers visualization models through a variety of APIs supporting multiple programming languages, including Python. You can easily use the interactive, robust graphics available through its main website, plot.ly. To use Plotly with your working model, you need to set up the available API keys properly. The graphics are processed on the server side, and once successfully executed, they appear on your browser screen.
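Note that newer Plotly versions (4.x and later) can also render entirely offline, without API keys; a minimal sketch under that assumption:

```python
# Minimal Plotly sketch: an interactive figure written to a local HTML file.
import plotly.graph_objects as go

fig = go.Figure(go.Scatter(x=[1, 2, 3], y=[4, 6, 5], mode='lines+markers'))
fig.write_html('figure.html')  # open in any browser; fully interactive
```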

14. NLTK
NLTK stands for the Natural Language Toolkit. As its name suggests, this library is very helpful for accomplishing natural language processing tasks. Initially, it was developed to support teaching models and other NLP research, such as cognitive theories of artificial intelligence and linguistic models, and it has become a successful resource in its field, driving real-world innovations in artificial intelligence.

With NLTK one can perform operations like text tagging, stemming, classification, tokenization, corpus tree creation, named entity recognition, semantic reasoning, and various other complex AI tasks. Challenging work requiring large building blocks, like semantic analysis, automation, or summarization, has become much easier to complete with NLTK.
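A minimal sketch of tokenization, tagging, and stemming (the download calls fetch the required models on first run):

```python
# Minimal NLTK sketch: tokenization, POS tagging, and stemming.
import nltk
from nltk.stem import PorterStemmer

nltk.download('punkt', quiet=True)
nltk.download('averaged_perceptron_tagger', quiet=True)

text = "Data scientists are building smarter models."
tokens = nltk.word_tokenize(text)                  # tokenization
tags = nltk.pos_tag(tokens)                        # text tagging
stems = [PorterStemmer().stem(t) for t in tokens]  # stemming

print(tags)
print(stems)
```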

15. Gensim
Gensim is an open source Python-based library for topic modeling and vector space computations, implemented with a variety of tools. It handles large texts with efficient streaming and in-memory processing. It uses the NumPy and SciPy modules to provide an efficient and easy-to-use environment.

It takes unstructured digital texts and processes them with built-in algorithms like word2vec, hierarchical Dirichlet processes (HDP), latent Dirichlet allocation (LDA), and latent semantic analysis (LSA).
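A minimal sketch training a toy word2vec model (the sentences are obviously far too small for meaningful vectors; this only shows the shape of the API):

```python
# Minimal Gensim sketch: a tiny word2vec model on toy sentences.
from gensim.models import Word2Vec

sentences = [["data", "science", "is", "fun"],
             ["machine", "learning", "uses", "data"],
             ["science", "needs", "data"]]

model = Word2Vec(sentences, min_count=1)      # library defaults elsewhere
print(model.wv.most_similar("data", topn=2))  # nearest neighbours
```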

16. Scrapy

Scrapy is also known for its spider bots. This library enables crawling programs that retrieve structured data from web applications. This open source library is written in Python. As the name suggests, it was designed for scraping. It is a complete framework with the potential to collect data through APIs and act as a crawler.

Through it, one can write code, reuse universal programs, and create scalable crawlers for an application. Scrapy is built around the Spider class, which contains the instructions for a crawler; a minimal sketch follows.
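A hedged sketch of such a Spider class (quotes.toscrape.com is a public practice site; run with `scrapy runspider quotes_spider.py`):

```python
# Minimal Scrapy sketch: a Spider subclass with crawling instructions.
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com"]

    def parse(self, response):
        # yield one structured item per quote found on the page
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
```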

17. Statsmodels
This Python library provides data exploration modules with multiple methods for performing statistical analysis and assertions. Its regression techniques, robust linear models, analysis-of-variance models, time series tools, and discrete choice models make it popular among data science libraries. It also has plotting functions for statistical analysis, achieving high-performance outcomes while processing large statistical data sets.
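A minimal sketch of an ordinary least squares fit with its statistical summary:

```python
# Minimal statsmodels sketch: OLS regression on synthetic data.
import numpy as np
import statsmodels.api as sm

x = np.random.randn(100)
y = 1.5 * x + 0.3 * np.random.randn(100)

X = sm.add_constant(x)  # add an intercept term
model = sm.OLS(y, X).fit()
print(model.summary())  # coefficients, p-values, R-squared, ...
```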

18. Kivy
This open-source Python library provides a natural user interface that can be easily deployed on Android, iOS, Linux, or Windows. It is open source and licensed under MIT. The library is very helpful for building mobile apps and multi-touch applications.

Initially, it was developed for Kivy iOS. It provides elements such as a graphics library, extensive support for hardware like the mouse and keyboard, and a wide range of widgets. One can also use its intermediate language (Kv) to create custom widgets.

19. PyQt
PyQt is a Python binding toolkit for the cross-platform Qt GUI framework. It is implemented as a Python plugin. PyQt is free software licensed under the GNU General Public License. PyQt has almost 440 classes and more than 6,000 functions to make a user's journey easier. It includes classes for accessing SQL databases, an XML parser, ActiveX controller classes, SVG support, and many more useful resources to reduce the user's challenges.

20. OpenCV
OpenCV is designed to drive the growth of real-time computer vision applications. It was created by Intel. This open-source platform is licensed under BSD and free for anyone to use. It includes 2D and 3D feature toolkits, object identification algorithms, and support for mobile robotics, face recognition, gesture recognition, motion tracking, segmentation, structure from motion (SfM), AR, boosting, gradient boosting trees, the Naive Bayes classifier, and many other useful packages.
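A minimal, hedged sketch of the face-detection side ("photo.jpg" is a hypothetical input file; the Haar cascade ships with the opencv-python package):

```python
# Minimal OpenCV sketch: grayscale conversion plus Haar-cascade face detection.
import cv2

img = cv2.imread("photo.jpg")  # hypothetical input file
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

print("found %d face(s)" % len(faces))
```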

8
In general, the more trees you use, the better the results get. However, the improvement diminishes as the number of trees increases; i.e., at a certain point the benefit in prediction performance from learning more trees will be lower than the cost in computation time for learning these additional trees.
Random forests are ensemble methods: you average over many trees. Similarly, if you want to estimate the average of a real-valued random variable (e.g. the average height of a citizen in your country), you can take a sample. The expected standard error decreases with the square root of the sample size, and at a certain point the cost of collecting a larger sample will be higher than the benefit in accuracy obtained from that larger sample.
In your case, you observe that, in a single experiment on a single test set, a forest of 10 trees performs better than a forest of 500 trees. This may be due to statistical variance. If it happened systematically, I would hypothesize that there is something wrong with the implementation.
Typical values for the number of trees are 10, 30, or 100. I think that in only very few practical cases does the benefit of more than 300 trees outweigh the cost of learning them (well, except maybe if you have a really huge dataset). A quick way to see the saturation yourself is sketched below.
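A hedged illustration (scikit-learn on synthetic data; the sample size and tree counts are arbitrary assumptions), showing how accuracy typically saturates as trees are added:

```python
# Minimal sketch: mean cross-validated accuracy as the forest grows.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for n_trees in [10, 30, 100, 300]:
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print("%4d trees: mean CV accuracy = %.3f" % (n_trees, score))
```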

10
Software Engineering / Code Simplicity
« on: April 20, 2019, 12:40:37 AM »
Kindness and Code
It is very easy to think of software development as being an entirely technical activity, where humans don’t really matter and everything is about the computer. However, the opposite is actually true.

Software engineering is fundamentally a human discipline.

Many of the mistakes made over the years in trying to fix software development have been made by focusing purely on the technical aspects of the system without thinking about the fact that it is human beings who write the code. When you see somebody who cares about optimization more than readability of code, when you see somebody who won’t write a comment but will spend all day tweaking their shell scripts to be fewer lines, when you have somebody who can’t communicate but worships small binaries, you’re seeing various symptoms of this problem.

In reality, software systems are written by people. They are read by people, modified by people, understood or not by people. They represent the mind of the developers that wrote them. They are the closest thing to a raw representation of thought that we have on Earth. They are not themselves human, alive, intelligent, emotional, evil, or good. It’s people that have those qualities. Software is used entirely and only to serve people. They are the product of people, and they are usually the product of a group of those people who had to work together, communicate, understand each other, and collaborate effectively. As such, there’s an important point to be made about working with a group of software engineers:

There is no value to being cruel to other people in the development community.

It doesn’t help to be rude to the people that you work with. It doesn’t help to angrily tell them that they are wrong and that they shouldn’t be doing what they are doing. It does help to make sure that the laws of software design are applied, and that people follow a good path in terms of making systems that can be easily read, understood, and maintained. It doesn’t require that you be cruel to do this, though. Sometimes you do have to tell people that they haven’t done the right thing. But you can just be matter of fact about it—you don’t have to get up in their face or attack them personally for it.

For example, let’s say somebody has written a bad piece of code. You have two ways you could comment on this:

“I can’t believe you think this is a good idea. Have you ever read a book on software design? Obviously you don’t do this.”

That’s the rude way—it’s an attack on the person themselves. Another way you could tell them what’s wrong is this:

“This line of code is hard to understand, and this looks like code duplication. Can you refactor this so that it’s clearer?”

In some ways, the key point here is that you’re commenting on the code, and not on the developer. But also, the key point is that you’re not being a jerk. I mean, come on. The first response is obviously rude. Does it make the person want to work with you, want to contribute more code, or want to get better? No. The second response, on the other hand, lets the person know that they’re taking a bad path and that you’re not going to let that bad code into the codebase.

The whole reason that you’re preventing that programmer from submitting bad code has to do with people in the first place. Either it’s about your users or it’s about the other developers who will have to read the system. Usually, it’s about both, since making a more maintainable system is done entirely so that you can keep on helping users effectively. But one way or another, your work as a software engineer has to do with people.

Yes, a lot of people are going to read the code and use the program, and the person whose code you’re reviewing is just one person. So it’s possible to think that you can sacrifice some kindness in the name of making this system good for everybody. Maybe you’re right. But why be rude or cruel when you don’t have to be? Why create that environment on your team that makes people scared of doing the wrong thing, instead of making them happy for doing the right thing?

This extends beyond just code reviews, too. Other software engineers have things to say. You should listen to them, whether you agree or not. Acknowledge their statements politely. Communicate your ideas to them in some constructive fashion.

And look, sometimes people get angry. Be understanding. Sometimes you’re going to get angry too, and you’d probably like your teammates to be understanding when you do.

This might all sound kind of airy-fairy, like some sort of unimportant psychobabble BS. But look. I’m not saying, “Everybody is always right! You should agree with everybody all the time! Don’t ever tell anybody that they are wrong! Nobody ever does anything bad!” No, people are frequently wrong and there are many bad things in the world and in software engineering that you have to say no to. The world is not a good place, always. It’s full of stupid people. Some of those stupid people are your co-workers. But even so, you’re not going to be doing anything effective by being rude to those stupid people. They don’t need your hatred—they need your compassion and your assistance. And most of your co-workers are probably not stupid people. They are probably intelligent, well-meaning individuals who sometimes make mistakes, just like you do. Give them the benefit of the doubt. Work with them, be kind, and make better software as a result.

11
Science and Information / Android Database Creation
« on: April 18, 2019, 02:03:12 PM »
Prerequisites: experience working with a PC, the Internet, an email ID, and Excel worksheets.

Step 1: First, log in with your own email ID, or with the email ID created for whatever purpose the database will serve.
Step 2: Go to Google Forms and create a question for each kind of data you want to keep in your database. Give the form a nice name (this will serve as the database name). For clarity, here is a screenshot of a Google Form for a database named Contact Information.

Step 3: Then copy that form's link and save it somewhere. Go to the output spreadsheet and design it however you like. Then save that sheet's link as well.
Step 4: Now use AppsGeyser or any other website-to-APK creator to build two apps, adding as their links the form link and the sheet link you saved earlier.
Step 5: Now download the two APKs to an Android phone and install them.
That's it: your database app is ready.

Enter data as you like in the app built from the form link, and view the output in the app built from the sheet link. (You must, of course, be connected to the Internet via a data connection/Wi-Fi.)

15
Software Engineering / HOW IS THE JOB OUTLOOK FOR DATA SCIENTISTS?
« on: April 18, 2019, 01:09:30 PM »
Data science is an interdisciplinary field of scientific processes, methods, systems, and algorithms used to extract knowledge or insight from data in an array of forms. The field combines data analysis, statistics, and machine learning and their associated methods to understand and evaluate data. It draws on theories and procedures from a variety of fields, and data scientists are analytical data experts who possess the technical abilities to solve multifaceted problems. Data scientists take a large number of complicated data points, both structured and unstructured, and use their abilities to clean and organize them. They solve business challenges by applying their analytic skills, including industry knowledge, contextual understanding, and skepticism of current assumptions. Individuals interested in working in the data science field often wonder about the job outlook for data scientists.

Job Duties of Data Scientists
Data scientists conduct independent research and extract large volumes of data from various internal and external sources. To assess and interpret this data, data scientists implement advanced analytics programs, statistical methods, and machine learning to prepare it for use in modeling. When examining data, data scientists thoroughly clean and condense the information to discard anything irrelevant to the task. They look for trends, opportunities, and hidden weaknesses within the data. Data scientists also communicate their findings to management and recommend cost-effective modifications to current strategies and procedures.

Education and Skills for Data Scientists
To become a data scientist, individuals typically need a master's degree in data science or a related area. The master's degree program provides a thorough understanding of the field as well as opportunities for internships and networking activities. Aspiring data scientists also commonly earn professional certifications to remain competitive in the field, such as Certified Analytics Professional and Cloudera Certified Professional (CCP Data Engineer). In addition to a high-quality education, data scientists need analytic problem-solving skills, intellectual curiosity, effective communication, and sound knowledge of the industry.

Job Outlook for Data Scientists
As stated by the United States Bureau of Labor Statistics, the employment of computer and information research scientists is expected to rise 19 percent by the year 2026, which is deemed much faster than the average for all professions. About 5,400 new jobs are projected over the decade. As demand for new and improved technology increases in the data science field, the demand for qualified data scientists will rise. The rapid growth in data collection will result in a heightened need for data-mining services.

Salary for Data Scientists
The increasing demand for data scientists results in competitive starting and continuing salaries. According to PayScale, the average pay for data scientists is about $91,000 per year, with the top 10 percent earning more than $120,000 and the lowest 10 percent making less than $62,000 per year. Actual annual pay varies with an array of factors, including location, industry, employer, skills, experience, and job duties.

Data scientists are vital to a variety of organizations, as they are part computer scientist, part mathematician, and part trend spotter. The increasing popularity of this profession reflects how entities think about big data. These professionals help increase revenue and discover business insights that boost production. For any organization seeking to enrich its business and become more data-driven, data scientists are the answer.


