Topics - Tapushe Rabaya Toma

Data Mining and Big Data / OLAP (online analytical processing)
« on: January 13, 2019, 12:31:54 PM »
OLAP (online analytical processing) is a computing method that enables users to easily and selectively extract and query data in order to analyze it from different points of view. OLAP business intelligence queries often aid in trends analysis, financial reporting, sales forecasting, budgeting and other planning purposes.

For example, a user can request that data be analyzed to display a spreadsheet showing all of a company's beach ball products sold in Florida in the month of July, compare revenue figures with those for the same products in September and then see a comparison of other product sales in Florida in the same time period.
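The beach ball request above can be sketched as a slice followed by a roll-up. This is only an illustration: the rows, revenue figures, and the `rollup` helper are invented for the example and do not come from any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, state, month, revenue). All values are invented.
SALES = [
    ("beach ball", "Florida", "July", 1200.0),
    ("beach ball", "Florida", "September", 450.0),
    ("beach ball", "Texas", "July", 800.0),
    ("sunscreen", "Florida", "July", 900.0),
    ("sunscreen", "Florida", "September", 600.0),
]

def rollup(rows, dims):
    """Sum revenue grouped by the chosen dimensions (a roll-up)."""
    index = {"product": 0, "state": 1, "month": 2}
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[index[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)

# Slice: keep only Florida, then view revenue by product and month.
florida = [r for r in SALES if r[1] == "Florida"]
by_product_month = rollup(florida, ["product", "month"])

print(by_product_month[("beach ball", "July")])       # 1200.0
print(by_product_month[("beach ball", "September")])  # 450.0
```

Calling `rollup` with a different list of dimensions, such as `["state"]`, gives another point of view on the same data without changing the stored rows.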

To Read More:

Data Mining and Big Data / OLAP and data mining: What’s the difference?
« on: January 13, 2019, 12:29:06 PM »
Defining OLAP and data mining

OLAP is a design paradigm, a way to seek information out of the physical data store. OLAP is all about summation. It aggregates information from multiple systems, and stores it in a multi-dimensional format. This could be a star schema, a snowflake schema or a hybrid schema.

Data mines leverage information within and without the organization to aid in answering business questions. They involve ratios and algorithms like decision trees, nearest neighbor classification and neural networks, along with clustering of data.
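As a rough illustration of one of the algorithms named above, here is a minimal nearest neighbor classifier. The customer features, segment labels, and the `nearest_neighbor` function are assumptions made up for this sketch, not part of the original article.

```python
import math

# Hypothetical labelled customers: (monthly_spend, visits_per_month) -> segment.
TRAINING = [
    ((20.0, 1.0), "occasional"),
    ((25.0, 2.0), "occasional"),
    ((200.0, 12.0), "loyal"),
    ((180.0, 10.0), "loyal"),
]

def nearest_neighbor(point, training):
    """Return the label of the closest training example (1-NN, Euclidean distance)."""
    features, label = min(training, key=lambda item: math.dist(point, item[0]))
    return label

print(nearest_neighbor((190.0, 11.0), TRAINING))  # loyal
```

A real data mining workflow would normally scale the features and use more than one neighbor; this sketch leaves both out to keep the idea visible.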

To Read More:

Data Mining and Big Data / Why Your Big Data Needs to Be Agile
« on: January 13, 2019, 12:00:49 PM »
When we talk about the digital transformation, we often refer to the importance of agile businesses. Enterprises willing to adapt to new technology are better poised to come out on top, but companies aren’t the only things that benefit from agility that allows them to pivot in real time.

Big data, as we know it, is just getting started. The mainstream adoption of the Internet of Things (IoT) has led to massive amounts of data accruing in centers. The more businesses adopt IoT, the sooner they’ll realize that data comes with responsibility. Big data, just like big business, must be agile. But how to move mountains of data, and with agility? That is a heavy question to ponder.

To Read More:

Data Mining and Big Data / The Big Opportunity of Big Data as a Service
« on: January 13, 2019, 11:59:12 AM »
Over the past few years, we’ve seen the as-a-Service (aaS) model emerging as a popular business practice. With the booming trend of moving on-premise functions to the cloud, almost every imaginable business function is being delivered as-a-service. There’s software-as-a-service, infrastructure-as-a-service, data-as-a-service, and the newest members of the clan—marketing-as-a-service, operations-as-a-service, and even desktop-as-a-service have arrived on the scene. Can big data be far behind? No. In fact, many vendors like Amazon, Microsoft, and Google have started offering cloud-based big data services.

Several reports have found that growing business investment in the cloud is pushing overall IT spending into massive figures. While it’s hard to put an exact estimate on the global big data market, we do know one thing for sure—big data technology and the service sector is a multi-million dollar market that’s growing at a rapid pace.

To Read More:

A Gantt chart can be very useful in planning and carrying out a project. There are a number of ways to create a Gantt chart: from pen and paper or whiteboard to very complex software programs.

This article will discuss the five basic steps that are required to create a basic Gantt chart. The examples presented make use of the automated tools in SmartDraw to create a Gantt chart.

Continue Reading:

Software Project Management / Secrets of Successful Project Managers
« on: September 10, 2018, 01:15:30 PM »
Managing a project means facing a lot of challenging issues. There is a lot of effort involved. It begins with creating the right plan, then directing its progress to keep it on track, and finally seeing it through to fruition.

Whether you're a project manager by title or through necessity, there are some things you can do to ensure your project's success. Here are 10 secrets we were able to obtain from some of the world's best project managers.

Continue Reading:

Software Project Management / 10 Icebergs That Will Sink Your Project
« on: September 10, 2018, 01:12:51 PM »
The Titanic was said to be unsinkable. We know what happened.
Many projects seem just as solid at the outset, buoyed by the optimism that naturally comes with new things. But there are a myriad of obstacles that can quickly turn a project into a disaster.
The ten potential icebergs that can sink your ship:
1. Unclear goals
2. Insufficient detail
3. Scope creep
4. Wrong people for the job
5. Accountability issues
6. Inconsistent processes
7. Poor communication
8. Unrealistic deadlines
9. Risk management
10. Stakeholder apathy

Full Article: 

Psychological Support / Take the Stress Out of Deadlines
« on: July 25, 2018, 03:04:48 PM »
Three Simple Steps to Getting Any Project Done on Time
Here are the three steps the world's most successful people use to complete a project of any size on time.

1. Identify the biggest, most important tasks first. Begin by breaking each of them into smaller tasks. Do this until each task can be accomplished in no more than two days. Break up big, vague tasks like "update the website" into a larger number of more specific tasks. This will enable you to estimate, with much greater accuracy, the time you need to complete the entire project.

2. Assign tasks and chart the schedule. Make a Gantt chart—a table showing each task, the person who will do it, the start date and how long it will take. Be realistic about the work load of each person. Spread the tasks out so that a new one doesn't begin until the previous one is complete. If the work of one person has to be complete before a task performed by someone else can start, build this in, too. Be sure to take holidays and weekends into account.

3. Estimate the completion date and monitor progress. The end date of your last task tells you when your project will be complete. Update your chart with the actual dates of completion for each task as the project progresses. This will tell you, at a glance, whether your project is on schedule. If it's not, you will know precisely how far it is behind so you can make better management decisions.
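The three steps above can be sketched as a small scheduling routine. Everything here is a hypothetical example: the task names, owners, durations, start date, and holiday are invented, and the rule that one person works on one task at a time is a simplifying assumption.

```python
from datetime import date, timedelta

HOLIDAYS = {date(2018, 7, 4)}  # hypothetical holiday calendar

def is_workday(day):
    return day.weekday() < 5 and day not in HOLIDAYS

def next_workday(day):
    """Return day itself if it is a working day, else the next working day."""
    while not is_workday(day):
        day += timedelta(days=1)
    return day

def add_workdays(start, duration):
    """Last working day of a task that starts on `start` and takes `duration` working days."""
    day = next_workday(start)
    for _ in range(duration - 1):
        day = next_workday(day + timedelta(days=1))
    return day

def schedule(tasks, project_start):
    """tasks: (name, owner, duration_days, dependencies), listed in dependency order."""
    plan, owner_free = {}, {}
    for name, owner, duration, deps in tasks:
        # A task cannot start before its owner is free or before its dependencies end.
        earliest = owner_free.get(owner, project_start)
        for dep in deps:
            earliest = max(earliest, plan[dep][1] + timedelta(days=1))
        start = next_workday(earliest)
        end = add_workdays(start, duration)
        plan[name] = (start, end)
        owner_free[owner] = end + timedelta(days=1)
    return plan

TASKS = [
    ("draft copy", "Ana", 2, []),
    ("design layout", "Ben", 2, []),
    ("build pages", "Ben", 2, ["draft copy", "design layout"]),
    ("review", "Ana", 1, ["build pages"]),
]

plan = schedule(TASKS, date(2018, 7, 2))
for name, (start, end) in plan.items():
    print(f"{name:15} {start} -> {end}")
print("Estimated completion:", max(end for _, end in plan.values()))
```

The end date of the last task is the estimated completion date. Replacing planned dates with actual ones as work finishes and rerunning the schedule shows how far the project has slipped.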

Further details:

Faculty Forum / Strategic Planning for Bottom Line Results
« on: July 02, 2018, 04:05:04 PM »
Can a strategic plan truly yield bottom-line results? Or is it just a waste of time?

The answer to both questions can be yes. It's critical to plan for success. More importantly, there are steps to follow to make it happen.

Most businesses don't do any strategic planning. And, of the ones who do, most treat it as nothing more than an academic exercise. This is unfortunate. With some effort and a willingness to make hard decisions, it's possible to create more than just strategy.

It's possible to realize bottom-line results from a well-developed strategic plan that is then put into action and followed through.

Continue Reading:

Faculty Forum / Three Steps to Take Today to Avert Disaster Tomorrow
« on: July 01, 2018, 11:50:44 AM »
Your best insurance policy for the loss of key employees is to document their jobs as thoroughly as possible. This can be done in three steps.

1. Make a list of the regular tasks that each employee does. A flowchart diagram is the most effective and efficient way to do this.
2. For each regular task, carefully document the steps that must be performed. A flowchart is ideal for showing the different steps that occur as the result of an action or decision. Anyone can easily follow the steps as they occur in an orderly, logical progression.
3. Make the information easily accessible to other employees. This makes it much easier to train people in backup roles or to onboard new hires.


Faculty Forum / Three Steps to Better Meetings
« on: June 25, 2018, 11:35:33 AM »
Your meetings will be shorter and more productive if you follow this simple formula.

1. Create a visual agenda ahead of time using a mind map you can edit in real time. Display it on the conference room projector (or on your screen using an online meeting platform such as GoToMeeting™ or WebEx™).

2. Move through the agenda items, noting decisions on the mind map. If action items are needed, decide in the meeting who is responsible. There are a few ways to do this in SmartDraw. You can assign the item to that person using Trello, change the shape of the task so it has a row that can show assignment, or simply add a shape representing the person and drag the task under their name in the hierarchy. Everyone in the meeting sees the action item assigned. There is no room for different interpretations of action taken and the person assigned the task is publicly accountable for completing it.

3. Upon conclusion of the meeting, distribute the action item document, in visual format (which becomes the minutes of the meeting), to each attendee by sending them a link.


Software Engineering / Best Database Certifications 2018
« on: May 09, 2018, 06:49:25 PM »
During the past three decades, we've seen a lot of database platforms come and go, but there's never been any question that database technology can be a crucial component for all kinds of applications and computing tasks.

Database certifications may not be as sexy or bleeding edge as cloud computing, storage or computer forensics. But the reality is that there has been, is, and always will be a need for knowledgeable database professionals at all levels and in many related job roles.

To get a better grasp of the available database certifications, it's useful to group them around specific database-related job roles. In part, this reflects the maturity of database technology, and its integration into most aspects of commercial, scientific and academic computing. As you read about the various database certification programs, keep these job roles in mind:
Database Administrator (DBA): Responsible for installing, configuring and maintaining a database management system (DBMS). Often tied to a specific platform such as Oracle, MySQL, DB2, SQL Server and others.
Database Developer: Works with generic and proprietary APIs to build applications that interact with DBMSs (also platform specific, as with DBA roles).
Database Designer/Database Architect: Researches data requirements for specific applications or users, and designs database structures and application capabilities to match.
Data Analyst/Data Scientist: Responsible for analyzing data from multiple disparate sources to discover previously hidden insight, determine meaning behind the data and make business-specific recommendations.
Data Mining/Business Intelligence (BI) Specialist: Specializes in dissecting, analyzing and reporting on important data streams, such as customer data, supply chain data, transaction data and histories, and others.
Data Warehousing Specialist: Specializes in assembling and analyzing data from multiple operational systems (orders, transactions, supply chain information, customer data and so forth) to establish data history, analyze trends, generate reports and forecasts and support general ad hoc queries.
Careful attention to these database job roles highlights two important kinds of technical issues for would-be database professionals to consider. First, a good general background in relational database management systems, including an understanding of the Structured Query Language (SQL), is a basic prerequisite for all database professionals.

Second, although various efforts to standardize database technology exist, much of the whiz-bang capability that databases and database applications can deliver comes from proprietary, vendor-specific technologies. Most serious, heavy-duty database skills and knowledge are tied to specific platforms, including various Oracle products (such as the open source MySQL environment), Microsoft SQL Server, IBM DB2 and more.

It's important to note that NoSQL databases – referred to as "not only SQL" and sometimes "non-relational" – handle many types of data, such as structured, semi-structured, unstructured and polymorphic. NoSQL databases are increasingly used in big data applications, which tend to be associated with certifications for data scientists, data mining/warehousing and business intelligence. Although there is some natural overlap, for the most part, we cover those types of certifications in our annually updated Best Big Data Certifications article.

Before you look at each of our featured certifications in detail, consider their popularity with employers. The results of an informal job search conducted on several high-traffic job boards show which database certifications employers look for when hiring new employees. The results vary from day to day (and job board to job board), but such numbers provide perspective on database certification demand.


Key-value, document-oriented, column family, graph, relational... Today we seem to have as many kinds of databases as there are kinds of data. While this may make choosing a database harder, it makes choosing the right database easier. Of course, that does require doing your homework. You’ve got to know your databases.

One of the least-understood types of databases out there is the graph database. Designed for working with highly interconnected data, a graph database might be described as more “relational” than a relational database. Graph databases shine when the goal is to capture complex relationships in vast webs of information.
Here is a closer look at what graph databases are, why they’re unlike other databases, and what kinds of data problems they’re built to solve.
Graph database vs. relational database
In a traditional relational or SQL database, the data is organized into tables. Each table records data in a specific format with a fixed number of columns, each column with its own data type (integer, time/date, freeform text, etc.).

This model works best when you’re dealing mainly with data from any one table. It also doesn’t work too badly when you’re aggregating data stored across multiple tables. But that behavior has some notable limits.

Consider a music database, with albums, bands, labels, and performers. If you want to report all the performers that were featured on this album by that band released on these labels—four different tables—you have to explicitly describe those relationships. With a relational database, you accomplish this by way of new data columns (for one-to-one or one-to-many relationships), or new tables (for many-to-many relationships).
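A minimal sketch of the music example, using Python's built-in `sqlite3` module. The table layout, column names, and sample rows are assumptions for illustration, and the sketch simplifies to one label per album. Foreign-key columns model the one-to-many relationships and a junction table models the many-to-many one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE labels     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bands      (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE performers (id INTEGER PRIMARY KEY, name TEXT);
-- One-to-many: each album points at one band and one label via new columns.
CREATE TABLE albums (
    id INTEGER PRIMARY KEY, title TEXT,
    band_id INTEGER REFERENCES bands(id),
    label_id INTEGER REFERENCES labels(id)
);
-- Many-to-many: performers and albums are linked through a new table.
CREATE TABLE album_performers (
    album_id INTEGER REFERENCES albums(id),
    performer_id INTEGER REFERENCES performers(id)
);
INSERT INTO labels VALUES (1, 'Example Records');
INSERT INTO bands VALUES (1, 'The Placeholders');
INSERT INTO performers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO albums VALUES (1, 'First Album', 1, 1);
INSERT INTO album_performers VALUES (1, 1), (1, 2);
""")

# Every relationship has to be spelled out as an explicit join.
rows = conn.execute("""
    SELECT p.name
    FROM performers p
    JOIN album_performers ap ON ap.performer_id = p.id
    JOIN albums a ON a.id = ap.album_id
    JOIN bands b  ON b.id = a.band_id
    JOIN labels l ON l.id = a.label_id
    WHERE a.title = ? AND b.name = ? AND l.name = ?
    ORDER BY p.name
""", ("First Album", "The Placeholders", "Example Records")).fetchall()

performer_names = [name for (name,) in rows]
print(performer_names)  # ['Alice', 'Bob']
```

Four joins answer a single question here, which hints at why deeper chains of relationships become costly in this model.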

This is practical as long as you’re managing a modest number of relationships. If you’re dealing with millions or even billions of relationships—friends of friends of friends, for instance—those queries don’t scale well.
In short, if the relationships between data, not the data itself, are your main concern, then a different kind of database—a graph database—is in order.

Graph database features
The term “graph” comes from the use of the word in mathematics. There it’s used to describe a collection of nodes (or vertices), each containing information (properties), and with labeled relationships (or edges) between the nodes.

A social network is a good example of a graph. The people in the network would be the nodes, the attributes of each person (such as name, age, and so on) would be properties, and the lines connecting the people (with labels such as “friend” or “mother” or “supervisor”) would indicate their relationship.
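The vocabulary above (nodes, properties, labeled edges) maps onto a very small data structure. The `Graph` and `Node` classes below are hypothetical helpers written for this sketch, and the people and ages are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)   # node id -> Node
    edges: list = field(default_factory=list)   # (source id, relationship label, target id)

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(label, properties)

    def add_edge(self, source, relationship, target):
        self.edges.append((source, relationship, target))

    def related(self, source, relationship):
        """Ids of nodes reached from `source` by edges with the given label."""
        return [t for s, r, t in self.edges if s == source and r == relationship]

g = Graph()
g.add_node("scott", "Person", name="Scott", age=41)
g.add_node("dana", "Person", name="Dana", age=38)
g.add_node("lee", "Person", name="Lee", age=52)
g.add_edge("scott", "friend", "dana")
g.add_edge("lee", "supervisor", "scott")

print(g.related("scott", "friend"))  # ['dana']
```

This only models the concepts. It scans a flat edge list, which is not how a graph database stores or looks up relationships.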

In a conventional database, queries about relationships can take a long time to process. This is because relationships are implemented with foreign keys and queried by joining tables. As any SQL DBA can tell you, performing joins is expensive, especially when you must sort through large numbers of objects—or, worse, when you must join multiple tables to perform the sorts of indirect (e.g. “friend of a friend”) queries that graph databases excel at.

Graph databases work by storing the relationships along with the data. Because related nodes are physically linked in the database, accessing those relationships is as immediate as accessing the data itself. In other words, instead of calculating the relationship as relational databases must do, graph databases simply read the relationship from storage. Satisfying queries is a simple matter of walking, or “traversing,” the graph. 
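A rough sketch of what "walking the graph" means, assuming each person keeps a list of direct links to friends. The names and the `within_hops` function are invented for the example; a real graph database does this inside its storage engine.

```python
from collections import deque

# Each node stores its own outgoing "friend" links (adjacency lists), so following
# a relationship is a direct lookup rather than a search over a separate table.
FRIENDS = {
    "scott": ["dana", "lee"],
    "dana": ["scott", "maria"],
    "lee": ["scott", "omar"],
    "maria": ["dana"],
    "omar": ["lee"],
}

def within_hops(graph, start, max_hops):
    """Breadth-first traversal: map each reachable person to their hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(person, []):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]
    return distance

reachable = within_hops(FRIENDS, "scott", 2)
friends_of_friends = sorted(p for p, d in reachable.items() if d == 2)
print(friends_of_friends)  # ['maria', 'omar']
```

The work done depends on how many links are actually followed from the starting node, not on the total number of people stored.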

A graph database not only stores the relationships between objects in a native way, making queries about relationships fast and easy, but allows you to include different kinds of objects and different kinds of relationships in the graph. Like other NoSQL databases, a graph database is schema-less. Thus, in terms of performance and flexibility, graph databases hew closer to document databases or key-value stores than they do relational or table-oriented databases.

Graph database use cases
Graph databases work best when the data you’re working with is highly connected and should be represented by how it links or refers to other data, typically by way of many-to-many relationships.

Again, a social network is a useful example. Graph databases reduce the amount of work needed to construct and display the data views found in social networks, such as activity feeds, or determining whether or not you might know a given person due to their proximity to other friends you have in the network.

Another application for graph databases is finding patterns of connection in graph data that would be difficult to tease out via other data representations. Fraud detection systems use graph databases to bring to light relationships between entities that might otherwise have been hard to notice.
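One way to picture the fraud detection case is to treat accounts and the identifiers they share as a graph, then look for connected groups. The account data and the `linked_groups` function below are hypothetical, and real fraud systems apply far richer rules than this.

```python
from collections import defaultdict

# Hypothetical records: account -> identifiers it has used (all values invented).
ACCOUNTS = {
    "acct-1": {"phone:555-0100", "device:A"},
    "acct-2": {"phone:555-0100", "device:B"},
    "acct-3": {"device:B"},
    "acct-4": {"phone:555-0199", "device:C"},
}

def linked_groups(accounts):
    """Group accounts connected, directly or indirectly, through shared identifiers."""
    by_identifier = defaultdict(set)
    for account, identifiers in accounts.items():
        for identifier in identifiers:
            by_identifier[identifier].add(account)

    seen, groups = set(), []
    for account in accounts:
        if account in seen:
            continue
        group, stack = set(), [account]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            # Follow every identifier to the other accounts that used it.
            for identifier in accounts[current]:
                stack.extend(by_identifier[identifier] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

suspicious = [group for group in linked_groups(ACCOUNTS) if len(group) > 1]
print(suspicious)  # [['acct-1', 'acct-2', 'acct-3']]
```

Note that acct-1 and acct-3 share nothing directly. The link only appears through acct-2, which is the kind of indirect relationship that is hard to spot in separate tables.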

Similarly, graph databases are a natural fit for applications that manage the relationships or interdependencies between entities. You will often find graph databases behind recommendation engines, content and asset management systems, identity and access management systems, and regulatory compliance and risk management solutions.

Graph database queries
Graph databases—like other NoSQL databases—typically use their own custom query methodology instead of SQL.

One commonly used graph query language is Cypher, originally developed for the Neo4j graph database. Since late 2015 Cypher has been developed as a separate open source project, and a number of other vendors have adopted it as a query system for their products (e.g., SAP HANA).

Here is an example of a Cypher query that returns a search result for everyone who is a friend of Scott:

MATCH (a:Person {name:'Scott'})-[:FRIENDOF]->(b)
RETURN b

The arrow symbol (->) is used in Cypher queries to represent a directed relationship in the graph.

Another common graph query language, Gremlin, was devised for the Apache TinkerPop graph computing framework. Gremlin syntax is similar to that used by some languages’ ORM database access libraries.

Here is an example of a “friends of Scott” query in Gremlin:

g.V().has('name', 'Scott').out('FRIENDOF')

Many graph databases have support for Gremlin by way of a library, either built-in or third-party.

Yet another query language is SPARQL. It was originally developed by the W3C to query data stored in the Resource Description Framework (RDF) format for metadata. In other words, SPARQL wasn’t devised for graph database searches, but can be used for them. On the whole, Cypher and Gremlin have been more broadly adopted.

SPARQL queries have some elements reminiscent of SQL, namely SELECT and WHERE clauses, but the rest of the syntax is radically dissimilar. Don’t think of SPARQL as being related to SQL at all, or for that matter to other graph query languages.

Popular graph databases
Because graph databases serve a relatively niche use case, there aren’t nearly as many of them as there are relational databases. On the plus side, that makes the standout products easier to identify and discuss.

Neo4j
Neo4j is easily the most mature (11 years and counting) and best-known of the graph databases for general use. Unlike previous graph database products, it doesn’t use a SQL back-end. Neo4j is a native graph database that was engineered from the inside out to support large graph structures, as in queries that return hundreds of thousands of relations and more.

Neo4j comes in both free open-source and for-pay enterprise editions, with the latter having no restrictions on the size of a dataset (among other features). You can also experiment with Neo4j online by way of its Sandbox, which includes some sample datasets to practice with.

See InfoWorld’s review of Neo4j for more details.

Microsoft Azure Cosmos DB
The Azure Cosmos DB cloud database is an ambitious project. It’s intended to emulate multiple kinds of databases—conventional tables, document-oriented, column family, and graph—all through a single, unified service with a consistent set of APIs.

To that end, a graph database is just one of the various modes Cosmos DB can operate in. It uses the Gremlin query language and API for graph-type queries, and supports the Gremlin console created for Apache TinkerPop as another interface.

Another big selling point of Cosmos DB is that indexing, scaling, and geo-replication are handled automatically in the Azure cloud, without any knob-twiddling on your end. It isn’t clear yet how Microsoft’s all-in-one architecture measures up to native graph databases in terms of performance, but Cosmos DB certainly offers a useful combination of flexibility and scale.

See InfoWorld’s review of Azure Cosmos DB for more details.

JanusGraph
JanusGraph was forked from the TitanDB project, and is now under the governance of the Linux Foundation. It uses any of a number of supported back ends (Apache Cassandra, Apache HBase, Google Cloud Bigtable, Oracle BerkeleyDB) to store graph data, supports the Gremlin query language (as well as other elements from the Apache TinkerPop stack), and can also incorporate full-text search by way of the Apache Solr, Apache Lucene, or Elasticsearch projects.

IBM, one of the JanusGraph project’s supporters, offers a hosted version of JanusGraph on IBM Cloud, called Compose for JanusGraph. Like Azure Cosmos DB, Compose for JanusGraph provides autoscaling and high availability, with pricing based on resource usage.

Software Engineering / Faster data is a safer bet for risk exposure
« on: May 09, 2018, 06:24:10 PM »
 “Rather than just adding up scores at the end of a game, we tried using real-time data. For the last five years, we have been doing sport betting.”
FSB now provides a platform for online gambling and casino sites including 188Bet, BlackType online casino and bookmaker Toals in Northern Ireland.

In effect, it offers sports betting as a service. “Sports betting technology is expensive,” says Lawrence. “Data is becoming a bigger part of this cost.”

The speed of data has a direct impact on the company’s bottom line, he says. “With sports betting, we have the problem of data arriving into a platform that needs to be processed quickly. This data has to be processed fast as it will directly affect margins, so there is always a driver to make data processing faster.”

The company runs the open source PostgreSQL database as its main transactional data store. Lawrence says: “As the company grew organically, we chose open source software that we could get up and running quickly with a commercial support contract.”

But given the need for speedy data processing, he points out: “We needed fast complex query support and the ability to read and write data fast and to scale.”

The company has begun using GridGain, an in-memory database built on Apache Ignite, as a data cache.
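The general idea of a data cache in front of a transactional store can be sketched as follows. This is not GridGain's or Apache Ignite's API: the `ReadThroughCache` class is a hypothetical stand-in, with plain dictionaries playing the part of both the database and the in-memory tier.

```python
class ReadThroughCache:
    """In-memory cache in front of a slower store; dicts stand in for both tiers."""

    def __init__(self, database):
        self.database = database   # stand-in for the transactional store
        self.memory = {}           # stand-in for the in-memory tier
        self.database_reads = 0

    def get(self, key):
        if key not in self.memory:             # cache miss: load from the database once
            self.database_reads += 1
            self.memory[key] = self.database[key]
        return self.memory[key]

    def put(self, key, value):
        # Write to the database and refresh the cached copy so both tiers stay in sync.
        self.database[key] = value
        self.memory[key] = value

odds = ReadThroughCache({"match-1:home-win": 2.10})
odds.get("match-1:home-win")
odds.get("match-1:home-win")          # second read is served from memory
odds.put("match-1:home-win", 1.95)
print(odds.get("match-1:home-win"), odds.database_reads)  # 1.95 1
```

Only the first read touches the database. Later reads, including the one after the update, come from memory.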

GridGain says its in-memory computing technology enables massive scale-out of data-intensive applications. It promises to dramatically improve transaction times compared with application architectures with disk-based databases.

Huge amounts of event data
At FSB Technology, huge amounts of event data must be updated constantly, and must be immediately available to a vast number of clients. For example, as a game progresses, new betting opportunities emerge. Odds must be calculated and presented to users in real time, and bets and event results must be processed instantly.

“We have progressed from application database architecture to a cache,” says Lawrence. Data is offloaded and kept in sync on a second data tier for fast in-memory database reads. “For the most part, we use GridGain for reading,” he adds. “On the risk side, when a bet is placed, we have to do fast calculations to ensure we are not exposed.”
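A simplified picture of the kind of exposure check described above might look like this. The `ExposureBook` class, the liability limit, and the rule of summing potential payouts per outcome are all assumptions for illustration, not FSB's actual risk model.

```python
from collections import defaultdict

class ExposureBook:
    """Tracks the worst-case payout per outcome and rejects bets that exceed a limit."""

    def __init__(self, max_liability):
        self.max_liability = max_liability
        self.liability = defaultdict(float)   # outcome -> payout owed if that outcome wins

    def place_bet(self, outcome, stake, decimal_odds):
        potential_payout = stake * decimal_odds
        if self.liability[outcome] + potential_payout > self.max_liability:
            return False                       # accepting would over-expose the book
        self.liability[outcome] += potential_payout
        return True

book = ExposureBook(max_liability=1000.0)
print(book.place_bet("home-win", 100.0, 2.5))   # True: liability becomes 250.0
print(book.place_bet("home-win", 400.0, 2.0))   # False: 250.0 + 800.0 exceeds the limit
```

Because the check sits on the path of every bet, keeping the running totals in memory is what makes it fast enough to run before the bet is accepted.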

FSB can pull complex data structures from its database rapidly and reliably. For example, if a bettor wants to see all current opportunities for betting on a particular sport, GridGain supports a single, fast query that pulls all the current odds for all the current bets for all the current events for that sport.

“GridGain is extremely flexible,” says Lawrence. “We have been able to create a great user experience for a variety of devices, providing the exact information users need when they need it.”

Although FSB started out using GridGain as a data cache, it is now being used as a layer between the database, says Lawrence. However, there has not been a complete shift to in-memory technology.

For instance, GridGain offers transactional support, but this is an area that FSB is not yet looking at. Lawrence is confident the company will make more use of in-memory technology going forward. “Having developers on the team who have knowledge of GridGain is a great way to get an understanding of new technology,” he says.

Private and public clouds
The company operates across both the private and public clouds, using Rackspace and Google’s public cloud, which means it can add capacity dynamically. This allows it to spin up extra GridGain nodes as and when extra data processing is needed.

Lawrence explains: “Generally, we spin up extra nodes at weekends or during a Champions League game.” This enables FSB to support the extra demands on its systems from peaks in betting, which often occur during major sporting events.

“GridGain’s ability to dynamically add and subtract nodes in the cloud has been critical to cost-effectively scaling our business and meeting our performance goals, even during extreme usage spikes, such as the Grand National,” says Lawrence. “And distributing our cluster across multiple datacentres has ensured availability.”

With GridGain, FSB also easily spun up another instance of the in-memory computing platform for a partner that wanted to run on a dedicated instance of the managed service.

In-memory processing is becoming more mainstream. At the end of last year, Stephen Hawking’s Centre for Theoretical Cosmology (Cosmos) announced it would be using HPE’s in-memory technology to further its research into the early universe and black holes. And in July last year, Japanese telco NTT Docomo rolled out SAP’s Hana in-memory database to analyse data from retail outlets and customer call centres in a bid to improve service levels.
