Recent Posts

51
Are College Professors Still Relevant In The Age Of AI?


A Robot Teacher teaching a bored and confused student.

In 2025, ChatGPT can give you a full-length lecture within seconds, and TikTok is far more fun than listening to a professor read through a slideshow they haven’t edited in years. So what is the point of showing up to class anyway?

Even for instructors who care about teaching, keeping students’ attention is increasingly challenging, from elementary school teachers to graduate professors at elite universities, as students show up distracted and on their phones. Many are rightfully questioning why they got into the profession in the first place.

AI and the global pandemic have only deepened the problem, and many schools will continue to rely more heavily on delivering education via new artificial intelligence tools to cut down on the rising cost of education. So what are instructors to do when so much is stacked against them?

Some schools have taken drastic measures to eradicate at least part of the problem at its source, banning cell phones in the classroom or during school hours altogether, with some U.S. states working to write this into law.

But if students are showing up to class with already depleted dopamine levels from scrolling all morning, what else can be done to get their attention back?

Increased usage of technology in the classroom is only likely to exacerbate the issue. Self-paced learning, while convenient, has already proven to have lower completion rates (often falling below 10%) and, in some cases, poorer outcomes, especially when specific support systems aren’t put in place and students don’t structure their study time correctly.

There’s also strong empirical evidence to support the need for humans in the classroom. A 2021 study in Frontiers in Psychology found that student motivation is significantly impacted by nonverbal behaviors like eye contact, tone, and body language, leading to increased student attention, engagement, and confidence.

Furthermore, social presence, defined as the feeling of connection with a real person, has proven to improve critical thinking and overall student satisfaction in the learning experience.

It’s safe to assume, then, that human-led instruction is here to stay in some form or another, a fair assumption given that, especially in a higher-education setting, students are still likely to pay a premium for access to experts and individualized support.


Bringing curiosity back into the classroom means creating unexpected and delightful opportunities for engagement for students who are otherwise likely to tune out of the same old model of teaching.

Periodically inviting guest speakers who are industry experts or deeply knowledgeable about the topic being taught is a great way to create a pattern interrupt for students. While it’s important to vet the speaker ahead of time to make sure their background and insight are compelling enough, experts tend to bring a unique perspective into the classroom that piques the curiosity of students keen on getting a first-person glimpse of what a professional version of their life could look like.

These sessions can be relatively easy to facilitate if run in a Q&A format with the instructor as the moderator. Students can also be prompted to ask questions throughout the session or the guest can be directed to come ready with certain discussion prompts for the students, further alleviating the work of the instructor.

Case-based learning can also be an effective way to bring practical application to the lessons being taught, something students increasingly crave to ensure what they are learning has utility in the real world.

This can be done in a single classroom session or spread out across a semester, and the practical nature of the material creates opportunities for dynamic classroom activity formats like peer role-playing between the students, and calling on volunteers to role-play with the instructor in front of the room.

Gamifying the learning experience can also help create accountability in the classroom. This can be an analog or digital dashboard that tracks contributions across modalities like speaking, listening, or helping peers. This can be organized as a ranked “Top 10” list to avoid singling out students that are shy or unable to contribute to a particular class.

Other ways to gamify the experience include issuing badges to recognize mastery of skills, such as “Team Researcher” or “Master Negotiator,” with the ability for students to unlock new titles or levels as they demonstrate competency in new areas, or providing students with learning credits in the form of a classroom currency that can unlock introductions to professionals, 1:1 resume reviews, or mock interview sessions.

When it comes to assignments, another way to meet students where they are is to incorporate social media formats like TikToks or Reels, allowing students to summarize or dramatize a lecture concept in a 30- to 60-second video. Students will be challenged to think about how to create compelling content around class material in a short-form video, and instructors can even create a competition around which content gets the most organic views.

Learning is not only about downloading concepts to pass a test and graduate with a passable GPA. It’s about creating memorable experiences that help solidify the information being taught while shaping a well-rounded individual who is equipped to make informed decisions about their future and the impact they want to have on the communities they belong to.

As educators it’s our job to continuously experiment with how we connect to students even as technology or cultural trends challenge the effectiveness of how things used to be done. After all, the ability to make a mark on our students is what makes our job more fulfilling.

Source: https://www.forbes.com/sites/sergeirevzin/2025/07/23/are-college-professors-still-relevant-in-the-age-of-ai/
52
Why heart attacks happen at the gym

A K Ratul, vocalist, bassist, and sound engineer of the band ‘Owned’, suffered a sudden heart attack at the gym yesterday, July 27. Although he was rushed to a hospital, he could not be saved. Doctors recommend exercise precisely for the sake of heart health, yet every now and then we hear of someone having a heart attack while working out, and a sudden heart attack can even cause death. Does exercise actually increase the risk of a heart attack?


A K Ratul, vocalist, bassist, and sound engineer of the band ‘Owned’, died of a sudden heart attack at the gym yesterday, July 27. Photo: Facebook

Exercise increases blood circulation in the body. The heart has to work a little harder to drive that blood flow, and the heart rate rises. Regular exercise therefore lowers the risk of heart disease. In a small number of cases, however, the heart can get into trouble precisely while doing this extra work, says Dr. Saif Hossain Khan, medicine consultant at Popular Diagnostic Centre, Dhanmondi.

Why heart attacks happen at the gym
Heavy exercise is usually done at the gym. During any heavy or fast-paced workout, the heart has to work much harder to increase blood flow through the body. To do that extra work, the heart itself needs a larger blood supply. Certain physical conditions can obstruct the heart's own blood flow, and in such a situation a heart attack can occur. The same thing can happen when this kind of exercise is done anywhere else, not just at the gym.

Who is at risk
People who already have a heart condition are at higher risk. Because there may be no noticeable symptoms, many do not even know they have a heart problem. People with high blood pressure, diabetes, or pre-diabetes, as well as smokers, are also at risk. Excessive caffeine and energy drinks can raise the risk too, as can a family history of heart disease. In some cases, dehydration or a salt deficit can be the cause.

Caution is needed
Many people develop chronic diseases at a young age, and in many cases these diseases show hardly any symptoms. So even if you consider yourself perfectly healthy, consult a doctor before taking up heavy or fast-paced exercise. And if you suffer from any chronic illness, be sure to ask your doctor which types of exercise are safe for you. Even if you are used to exercising, pay a little attention to yourself during a workout, and stop immediately if you feel unwell.

Source: https://forum.daffodilvarsity.edu.bd/index.php?action=post;board=1792.0
53
In today's competitive business world, having a good product or service alone is not enough to succeed. Success requires market analysis, relationship building, spotting opportunities, and the strategy to make decisions accordingly. The person who does these things skillfully is a business development expert.



But the question is: what skills do you need to grow into a successful business development expert?


🔹 1. Communication Skills
Any successful business developer must be able to speak clearly and listen well.
👉 Understanding the client, expressing ideas, and running meetings professionally: this is where that skill shows.

🔹 2. Relationship Building
Trust and relationships are the real foundation of business.
👉 A skilled business developer knows how to build long-term relationships with clients, partners, and stakeholders.

🔹 3. Market Research & Data-driven Thinking
👉 If you cannot properly analyze the market and your competitors, finding new opportunities becomes difficult.
Understanding which customers are where and what they want, through data analysis, is essential.

🔹 4. Sales & Negotiation
👉 Clients often hesitate after receiving a proposal. Carrying them from that point all the way to a signed deal is the business developer's job, and it calls for negotiation that is gentle yet firm.

🔹 5. Proficiency with Digital Platforms and Tools
👉 These days, if you don't know how to use LinkedIn, CRM software, email marketing, and social media, you will fall behind.
The more you learn about digital tools, the easier your work becomes.

🔹 6. Problem-Solving Attitude
👉 When a client or company runs into trouble, looking for a solution instead of just criticizing is one of the great strengths of a business development expert.

🔹 7. Leadership & Initiative
👉 You need to make decisions at the right time, guide your team, and have the mindset to take risks on something new.

54
Prompts Engineering / ChatGPT Prompts For Businesses!
« Last post by Shamim Ansary on July 21, 2025, 10:11:09 AM »
Top 20 plug-and-play prompts to instantly boost your productivity, marketing, and strategy game.

Here’s what’s inside:

1. Business Planning & Strategy
From turning messy meeting notes into clear action plans to drafting investor-ready proposals, these prompts help streamline decision-making.

2. Marketing & Branding
Generate SEO blog outlines, LinkedIn captions, product names, and 30-day social content calendars tailored to your audience.

3. Communication & Copywriting
Polish your messages, improve website copy, create cold emails, and reword complex topics in CEO-friendly language.

4. Customer Experience
Handle complaints with empathy, build product FAQ sections, and analyze feedback for recurring themes, all with ChatGPT.

5. Team Management & Internal Ops
Draft meeting agendas, summarize reports, and share key industry trends to keep your team aligned and informed.

These aren’t just prompts, they’re time-saving teammates for your business!

Copy, paste, and start getting results in minutes!


Source: LinkedIn
55
Commerce / Why Bangladesh should rethink its Crypto ban
« Last post by Imrul Hasan Tusher on July 19, 2025, 12:37:16 PM »
Why Bangladesh should rethink its Crypto ban

The first light over the Buriganga breaks in bronze ripples, catching the hulls of wooden launches that have ferried commerce for centuries, yet on those same decks young traders now scroll price charts for assets their own government has declared forbidden. Since Bangladesh Bank's September 2022 circular re-affirmed that "virtual currencies or assets are not permitted" under the Foreign Exchange Regulation Act of 1947, the nation has rowed against a tide that is swelling elsewhere. Globally the crypto-asset market, once dismissed as a speculative side-show, is again worth more than US $2 trillion, buoyed by the launch of spot Bitcoin ETFs in New York and a blossoming stable-coin economy stretching from São Paulo to Seoul. Ten other countries still enforce full bans, but most peers have shifted to "regulate, not prohibit," persuaded by the economic mathematics of remittances, online labour, and next-generation fintech. For Bangladesh, ranked among the world's ten largest recipients of migrant income, holding that line risks forfeiting hard currency, talent, and future relevance in the digital marketplace.

Consider the diaspora dividend. Official figures show Bangladeshi workers remitted US $23.9 billion in FY 2023-24, a ten-per-cent year-on-year surge that helped families cushion food-price shocks and service rural micro-loans. Yet the World Bank's Remittance Prices Worldwide portal still lists an average corridor cost of around 6 % for South Asia, double the UN Sustainable Development Goal of three per cent. If even one-third of those transfers migrated to audited USD-backed stable-coins carrying a 1.5 % fee, Bangladeshi households could retain roughly US $360 million a year, more than the annual development budgets of several public universities. That money would not sleep under mattresses; it would circulate through grocery stalls in Rangpur, machine-shops in Gazipur, and tuition fees in Mymensingh, widening the consumption base on which VAT and income-tax collections ultimately depend. A total ban diverts such flows either back to costlier conventional wires or deeper into hawala channels invisible to regulators, forfeiting both savings and oversight.

The logic extends to Bangladesh's quietly booming freelance sector. Payoneer's 2025 index ranks the country eighth in global online-labour earnings, with a 26 % growth rate-second only to India among lower-middle-income economies. Yet seasoned developers and graphic artists report net fees of 7-9 % plus two-week delays when withdrawing dollars through legacy payment processors. Many platforms now offer US-dollar stable-coin payouts that settle in minutes for pennies. Denied legal on-ramps, Bangladeshi freelancers move their profiles to addresses in Dubai or Kuala Lumpur, syphoning intellectual property-and tax potential-out of Dhaka's jurisdiction. Licensing a handful of domestic exchanges under strict KYC/AML rules could keep that revenue at home and feed the Central Bank's foreign-exchange coffers.

Those coffers are thinner than policymakers would like. Bangladesh Bank data pinpoints official reserves at US $20.5 billion in May 2025, down from nearly 27 billion a year earlier, despite the introduction of a crawling-peg regime to slow taka depreciation. At the same time the World Bank notes the Bank's interventions drained almost US $4 billion in the first eight months of FY 24. Tokenised T-bill pilots in Singapore and the UAE show how regulated, on-chain short-term securities can widen foreign-investor participation without immediate convertibility risk. Prohibition prevents Bangladesh from even rehearsing such instruments in a sandbox, while neighbours like India and Pakistan study wholesale Central-Bank Digital Currencies (w-CBDCs) that can mesh with private tokens yet preserve sovereign control.

Critics will retort that crypto also invites volatility, scams, and capital flight, and they are right. But the IMF-FSB synthesis paper delivered to G-20 finance ministers in September 2023 outlines twelve policy pillars, from licensing and custody standards to cross-border information-sharing, that transform those risks into manageable parameters rather than existential threats. India provides a regional template: a flat 30 % tax on digital-asset gains curbed froth yet preserved lawful innovation; its exchanges now report granular data directly to the tax authorities. Nigeria, after double-checking money-laundering patterns, reversed its own ban in late 2024, citing the need for "regulatory capture, not regulatory vacuum." Bangladesh, endowed with a technologically literate central bank (its National Blockchain Strategy was drafted back in 2020), is more than capable of enacting similarly tiered safeguards.

The upside of such prudence is tangible: the Chainalysis 2024 Global Crypto Adoption Index lists seven CSAO countries in the top twenty, with India #1, Indonesia #3, Vietnam #5, and the Philippines #8. Each of those economies logged record remittance inflows and hosted venture capital for Web3 start-ups in 2024. Meanwhile Bangladesh's brightest Solidity programmers depart for Bangalore or Dubai, taking with them what economists call "brain-currency." A modest domestic Web3 corridor (ring-fenced wallets, capped retail exposure, mandatory key escrow inside Bangladeshi data centres) could, according to conservative ASEAN multipliers, generate 15,000 skilled jobs and US $800 million in yearly value-added exports by 2030. That is not utopian speculation; it is a slice of a regional pie already baking next door.

Nor must a crypto thaw undermine Bangladesh Bank's own digital-taka ambitions. The Bank's strategic plan for FY 24-26 includes feasibility studies for a retail CBDC designed to streamline subsidy disbursement and clamp down on grey-market dollars. Global evidence suggests CBDCs flourish when they interoperate with carefully licensed private tokens: Brazil's drex, Singapore's Ubin+, China's digital yuan in Hong Kong. A parallel network of supervised stable-coins could let Bangladeshi exporters, e-commerce firms, and migrants swap e-taka for dollar tokens under the central bank's watchful eye, amplifying monetary sovereignty rather than diluting it.

Quantitatively the calculus is stark. Trimming 4.5 percentage points off the average remittance fee preserves roughly US $1.1 billion over three years. Capturing a 0.5 % slice of global Web3 venture flows (US $14 billion in 2024) nets another US $70 million in FDI. Add India-style tax on exchange gains and a digital-asset sandbox fee, and the annual fiscal upside comfortably clears US $1 billion, a figure that dwarfs projected enforcement costs of an outright ban, which include digital-forensics outlays, court backlogs, and the intangible but real erosion of investor sentiment.
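For readers who want to check the back-of-envelope arithmetic behind those figures, here is a minimal sketch. The assumptions (all taken from the paragraphs above, not from any official model) are the US $23.9 billion remittance base, one-third of flows moving to a 1.5 % stable-coin rail versus a ~6 % corridor cost, and a 0.5 % share of 2024's US $14 billion Web3 venture flows.

# Back-of-envelope check of the cited figures (illustrative assumptions only).
remittances = 23.9e9            # FY 2023-24 remittances, USD
migrated_share = 1 / 3          # portion assumed to move to stable-coin rails
fee_saving = 0.06 - 0.015       # ~6% corridor cost vs. 1.5% stable-coin fee

annual_saving = remittances * migrated_share * fee_saving
print(f"Annual household saving: ${annual_saving / 1e6:.0f}M")    # ~ $360M
print(f"Three-year saving:       ${3 * annual_saving / 1e9:.2f}B")  # ~ $1.1B

web3_flows_2024 = 14e9
print(f"0.5% of Web3 venture flows: ${0.005 * web3_flows_2024 / 1e6:.0f}M")  # ~ $70M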

So what might a uniquely Bangladeshi middle path look like? First, a regulatory sandbox under the Securities and Exchange Commission, capped at US $50 million total volume and open to diaspora remittance corridors. Second, licensed exchanges restricted to approved fiat-backed stable-coins and tokenised T-bills, with customer keys stored in FIPS-certified hardware within Dhaka. Third, graduated tax slabs (seven per cent on first-year realised gains, scaling to twenty per cent above a threshold) to fund a sovereign innovation trust. Finally, mandatory on-chain analytics agreements with compliance firms so Bangladesh Bank receives live dashboards of flows, addresses, and risk scores. The tools exist; the question is whether policymakers choose to wield them.

Jibanananda Das dreamed of a Bengal where golden paddy fields soaked up the evening sun; in the twenty-first century those paddies are cross-hatched by fibre-optic lines and microwave towers. Value now moves at the speed of light, and nations that station tollbooths on that traffic will fund their futures; those that erect barricades risk watching prosperity ripple away like a boat they failed to board. Crypto is not an ideological banner but a technological current, as unstoppable as emails once seemed in the fax era. Bangladesh, resilient and ambitious, can steer with prudent sails rather than chain itself to the dock.

Source: https://www.observerbd.com/news/534940
56
One way to optimize an AI agent is to design its architecture with multiple sub-agents to improve accuracy. However, in conversational AI, optimization doesn’t stop there: memory becomes even more crucial.

    As your conversation with the AI agent gets longer and deeper, it uses more memory.

This is due to components like previous context storage, tool calling, database searches, and other dependencies your AI agent relies on.

In this blog, we will code and evaluate 9 beginner-to-advanced memory optimization techniques for AI agents.

You will learn how to apply each technique, along with its advantages and drawbacks, from simple sequential approaches to advanced, OS-like memory management implementations.
Summary about Techniques

To keep things clear and practical, we will use a simple AI agent throughout the blog. This will help us observe the internal mechanics of each technique and make it easier to scale and implement these strategies in more complex systems.

All the code (theory + notebook) is available in my GitHub repo:

Setting up the Environment

To optimize and test different memory techniques for AI agents, we need to initialize several components before starting the evaluation. But before initializing, we first need to install the necessary Python libraries.

We will need:

    openai: The client library for interacting with the LLM API.
    numpy: For numerical operations, especially with embeddings.
    faiss-cpu: A library from Facebook AI for efficient similarity search, which will power our retrieval memory. It's a perfect in-memory vector database.
    networkx: For creating and managing the knowledge graph in our Graph-Based Memory strategy.
    tiktoken: To accurately count tokens and manage context window limits.

Let’s install these modules.

pip install openai numpy faiss-cpu networkx tiktoken

Now we need to initialize the client module, which will be used to make LLM calls. Let’s do that.

import os
from openai import OpenAI

API_KEY = "YOUR_LLM_API_KEY"

BASE_URL = "https://api.studio.nebius.com/v1/"

client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY
)

print("OpenAI client configured successfully.")

We will be using open-source models through an API provider such as Nebius or Together AI. Next, we need to import and decide which open-source LLM will be used to create our AI agent.

import tiktoken
import time

GENERATION_MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"
EMBEDDING_MODEL = "BAAI/bge-multilingual-gemma2"

For the main tasks, we are using the LLaMA 3.1 8B Instruct model. Some of the optimizations depend on an embedding model, for which we will be using the multilingual BGE Gemma-2 embedding model (BAAI/bge-multilingual-gemma2).

Next, we need to define multiple helpers that will be used throughout this blog.

Creating Helper Functions

To avoid repetitive code and follow good coding practices, we will define three helper functions that will be used throughout this guide:

    generate_text: Generates content based on the system and user prompts passed to the LLM.
    generate_embeddings: Generates embeddings for retrieval-based approaches.
    count_tokens: Counts the total number of tokens for each retrieval-based approach.

Let’s start by coding the first function, generate_text, which will generate text based on the given input prompt.

def generate_text(system_prompt: str, user_prompt: str) -> str:
    """
    Calls the LLM API to generate a text response.

        Args:
        system_prompt: The instruction that defines the AI's role and behavior.
        user_prompt: The user's input to which the AI should respond.

            Returns:
        The generated text content from the AI, or an error message.
    """
   
    response = client.chat.completions.create(
        model=GENERATION_MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
   
    return response.choices[0].message.content

Our generate_text function takes two inputs: a system prompt and a user prompt. Based on our text generation model, LLaMA 3.1 8B, it generates a response using the client module.

Next, let’s code the generate_embeddings function. We have chosen the Gemma-2 model for this purpose, and we will use the same client module to generate embeddings.

def generate_embedding(text: str) -> list[float]:
    """
    Generates a numerical embedding for a given text string using the embedding model.

        Args:
        text: The input string to be converted into an embedding.

            Returns:
        A list of floats representing the embedding vector, or an empty list on error.
    """
   
    response = client.embeddings.create(
        model=EMBEDDING_MODEL,
        input=text
    )
   
    return response.data[0].embedding

Our embedding function returns the embedding of the given input text using the selected Gemma-2 model.

Now, we need one more function that will count tokens based on the entire AI and user chat history. This helps us understand the overall flow and how it has been optimized.

We will use the most common and modern tokenizer used in many LLM architectures, OpenAI cl100k_base, which is a Byte Pair Encoding (BPE) tokenizer.

BPE, in simpler terms, is a tokenization algorithm that efficiently splits text into sub-word units.

"lower", "lowest" → ["low", "er"], ["low", "est"]

So let’s initialize the tokenizer using the tiktoken module:

tokenizer = tiktoken.get_encoding("cl100k_base")

We can now create a function to tokenize the text and count the total number of tokens.

def count_tokens(text: str) -> int:
    """
    Counts the number of tokens in a given string using the pre-loaded tokenizer.

        Args:
        text: The string to be tokenized.

            Returns:
        The integer count of tokens.
    """

    return len(tokenizer.encode(text))
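
As a quick sanity check (illustrative only; the exact token IDs and counts depend on the cl100k_base vocabulary), we can confirm the sub-word behaviour described above and the helper's output:

# Illustrative check of the tokenizer helpers.
print(tokenizer.encode("lower lowest"))            # sub-word token IDs, e.g. "low"+"er", " low"+"est"
print(count_tokens("Hi there! My name is Sam."))   # total token count for a short message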

Great! Now that we have created all the helper functions, we can start exploring different techniques to learn and evaluate them.

Creating Foundational Agent and Memory Class

Now we need to create the core design structure of our agent so that it can be used throughout the guide. Regarding memory, there are three important components that play a key role in any AI agent:

    Adding past messages to the AI agent’s memory to make the agent aware of the context.
    Retrieving relevant content that helps the AI agent generate responses.
    Clearing the AI agent’s memory after each strategy has been implemented.

Object-Oriented Programming (OOP) is the best way to build this memory-based feature, so let’s create that.

import abc


class BaseMemoryStrategy(abc.ABC):
    """Abstract Base Class for all memory strategies."""

    @abc.abstractmethod
    def add_message(self, user_input: str, ai_response: str):
        """
        An abstract method that must be implemented by subclasses.
        It's responsible for adding a new user-AI interaction to the memory store.
        """
        pass

    @abc.abstractmethod
    def get_context(self, query: str) -> str:
        """
        An abstract method that must be implemented by subclasses.
        It retrieves and formats the relevant context from memory to be sent to the LLM.
        The 'query' parameter allows some strategies (like retrieval) to fetch context
        that is specifically relevant to the user's latest input.
        """
        pass

    @abc.abstractmethod
    def clear(self):
        """
        An abstract method that must be implemented by subclasses.
        It provides a way to reset the memory, which is useful for starting new conversations.
        """
        pass

We are using @abstractmethod, which is a common coding style when subclasses are reused with different implementations. In our case, each strategy (which is a subclass) includes a different kind of implementation, so it is necessary to use abstract methods in the design.
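
As a small, hypothetical illustration (the IncompleteMemory class below is not part of the agent), the abstract base class stops us from shipping a strategy that forgets one of the three required methods:

# Hypothetical example: a strategy missing get_context() and clear()
# cannot even be instantiated, which surfaces the mistake early.
class IncompleteMemory(BaseMemoryStrategy):
    def add_message(self, user_input: str, ai_response: str):
        pass

try:
    IncompleteMemory()
except TypeError as err:
    print(err)  # Can't instantiate abstract class IncompleteMemory ...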

Now, based on the memory state we recently defined and the helper functions we’ve created, we can build our AI agent structure using OOP principles. Let’s code that and then understand the process.

class AIAgent:
    """The main AI Agent class, designed to work with any memory strategy."""

    def __init__(self, memory_strategy: BaseMemoryStrategy, system_prompt: str = "You are a helpful AI assistant."):
        """
        Initializes the agent.

                Args:
            memory_strategy: An instance of a class that inherits from BaseMemoryStrategy.
                             This determines how the agent will remember the conversation.
            system_prompt: The initial instruction given to the LLM to define its persona and task.
        """

        self.memory = memory_strategy
        self.system_prompt = system_prompt
        print(f"Agent initialized with {type(memory_strategy).__name__}.")

    def chat(self, user_input: str):
        """
        Handles a single turn of the conversation.

                Args:
            user_input: The latest message from the user.
        """

        print(f"\n{'='*25} NEW INTERACTION {'='*25}")
        print(f"User > {user_input}")
               
       
        start_time = time.time()
        context = self.memory.get_context(query=user_input)
        retrieval_time = time.time() - start_time
               
       
        full_user_prompt = f"### MEMORY CONTEXT\n{context}\n\n### CURRENT REQUEST\n{user_input}"

               
       
        prompt_tokens = count_tokens(self.system_prompt + full_user_prompt)
        print("\n--- Agent Debug Info ---")
        print(f"Memory Retrieval Time: {retrieval_time:.4f} seconds")
        print(f"Estimated Prompt Tokens: {prompt_tokens}")
        print(f"\n[Full Prompt Sent to LLM]:\n---\nSYSTEM: {self.system_prompt}\nUSER: {full_user_prompt}\n---")
               
       
        start_time = time.time()
        ai_response = generate_text(self.system_prompt, full_user_prompt)
        generation_time = time.time() - start_time
             
       
        self.memory.add_message(user_input, ai_response)

               
        print(f"\nAgent > {ai_response}")
        print(f"(LLM Generation Time: {generation_time:.4f} seconds)")
        print(f"{'='*70}")

So, our agent is based on 6 simple steps.

    First, it retrieves the context from memory based on the strategy we are using, timing how long the retrieval takes.
    Then it merges the retrieved memory context with the current user input, preparing it as a complete prompt for the LLM.
    Then it prints some debug info, things like how many tokens the prompt might use and how long context retrieval took.
    Then it sends the full prompt (system + user + context) to the LLM and waits for a response.
    Then it updates the memory with this new interaction, so it’s available for future context.
    And finally, it shows the AI’s response along with how long it took to generate, wrapping up this turn of the conversation.

Great! Now that we have coded every component, we can start understanding and implementing each of the memory optimization techniques.

Problem with Sequential Optimization Approach

The very first optimization approach is the most basic and simplest, commonly used by many developers. It was one of the earliest methods to manage conversation history, often used by early chatbots.

This method involves adding each new message to a running log and feeding the entire conversation back to the model every time. It creates a linear chain of memory, preserving everything that has been said so far. Let’s visualize it.

Sequential Approach

Sequential approach works like this …

    User starts a conversation with the AI agent.
    The agent responds.
    This user-AI interaction (a “turn”) is saved as a single block of text.
    For the next turn, the agent takes the entire history (Turn 1 + Turn 2 + Turn 3…) and combines it with the new user query.
    This massive block of text is sent to the LLM to generate the next response.

Using our Memory class, we can now implement the sequential optimization approach. Let's code that.


class SequentialMemory(BaseMemoryStrategy):
    def __init__(self):
        """Initializes the memory with an empty list to store conversation history."""
        self.history = []

    def add_message(self, user_input: str, ai_response: str):
        """
        Adds a new user-AI interaction to the history.
        Each interaction is stored as two dictionary entries in the list.
        """
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": ai_response})

    def get_context(self, query: str) -> str:
        """
        Retrieves the entire conversation history and formats it into a single
        string to be used as context for the LLM. The 'query' parameter is ignored
        as this strategy always returns the full history.
        """
       
        return "\n".join([f"{turn['role'].capitalize()}: {turn['content']}" for turn in self.history])

    def clear(self):
        """Resets the conversation history by clearing the list."""
        self.history = []
        print("Sequential memory cleared.")

Now you might understand what our base Memory class is doing here. Our subclasses (each approach) will implement the same abstract methods that we define throughout the guide.

Let’s quickly go over the code to understand how it works.

    __init__(self): Initializes an empty self.history list to store the conversation.
    add_message(...): Adds the user's input and AI's response to the history.
    get_context(...): Formats and joins the history into a single "Role: Content" string as context.
    clear(): Resets the history for a new conversation.

We can initialize the memory class and build the AI agent on top of it.

sequential_memory = SequentialMemory()

agent = AIAgent(memory_strategy=sequential_memory)

To test our sequential approach, we need to create a multi-turn chat conversation. Let’s do that.


agent.chat("Hi there! My name is Sam.")

agent.chat("I'm interested in learning about space exploration.")

agent.chat("What was the first thing I told you?")


==== NEW INTERACTION ====
User: Hi there! My name is Sam. 
Bot: Hello Sam! Nice to meet you. What brings you here today? 

>>>> Tokens: 23 | Response Time: 2.25s

==== NEW INTERACTION ====

User: I am interested in learning about space exploration. 
Bot: Awesome! Are you curious about:
- Mars missions 
- Space agencies 
- Private companies (e.g., SpaceX) 
- Space tourism 
- Search for alien life?
...

>>>> Tokens: 92 | Response Time: 4.46s

==== NEW INTERACTION ====
User: What was the first thing I told you? 
Bot: You said, "Hi there! My name is Sam." 
...

>>>> Tokens: 378 | Response Time: 0.52s

The conversation is pretty smooth, but if you pay attention to the token calculation, you’ll notice that it gets bigger and bigger after each turn. Our agent isn’t dependent on any external tool that would significantly increase the token size, so this growth is purely due to the sequential accumulation of messages.

While the sequential approach is easy to implement, it has a major drawback:

    The bigger your agent conversation gets, the more expensive the token cost becomes, so a sequential approach is quite costly.

Sliding Window Approach

To avoid the issue of a large context, the next approach we will focus on is the sliding window approach, where our agent doesn’t need to remember all previous messages, but only the context from a certain number of recent messages.

Instead of retaining the entire conversation history, the agent keeps only the most recent N messages as context. As new messages arrive, the oldest ones are dropped, and the window slides forward.
Sliding Window Approach

The process is simple:

    Define a fixed window size, say N = 2 turns.
    The first two turns fill up the window.
    When the third turn happens, the very first turn is pushed out of the window to make space.
    The context sent to the LLM is only what’s currently inside the window.

Now, we can implement the Sliding Window Memory class.


from collections import deque

class SlidingWindowMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 4):
        """
        Initializes the memory with a deque of a fixed size.

        Args:
            window_size: The number of conversational turns to keep in memory.
                         A single turn consists of one user message and one AI response.
        """
        self.history = deque(maxlen=window_size)

    def add_message(self, user_input: str, ai_response: str):
        """
        Adds a new conversational turn to the history. If the deque is full,
        the oldest turn is automatically removed.
        """
        self.history.append([
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": ai_response}
        ])

    def get_context(self, query: str) -> str:
        """
        Retrieves the conversation history currently within the window and
        formats it into a single string. The 'query' parameter is ignored.
        """
        context_list = []
        for turn in self.history:
            for message in turn:
                context_list.append(f"{message['role'].capitalize()}: {message['content']}")
        return "\n".join(context_list)

    def clear(self):
        """Resets the window by clearing the deque."""
        self.history.clear()
        print("Sliding window memory cleared.")

Our sequential and sliding memory classes are quite similar. The key difference is that we’re adding a window to our context. Let’s quickly go through the code.

    __init__(self, window_size): Sets up a deque with a fixed maximum length (4 turns by default, 2 in our test below), enabling automatic sliding of the context window.
    add_message(...): Adds a new turn, old entries are dropped when the deque is full.
    get_context(...): Builds the context from only the messages within the current sliding window.

Let’s initialize the sliding window state memory and build the AI agent on top of it.


sliding_memory = SlidingWindowMemory(window_size=2)

agent = AIAgent(memory_strategy=sliding_memory)

We are using a small window size of 2, which means the agent will remember only the last two messages. To test this optimization approach, we need a multi-turn conversation. So, let’s first try a straightforward conversation.


agent.chat("My name is Priya and I'm a software developer.")

agent.chat("I work primarily with Python and cloud technologies.")


agent.chat("My favorite hobby is hiking.")


==== NEW INTERACTION ====
User: My name is Priya and I am a software developer. 
Bot: Nice to meet you, Priya! What can I assist you with today?

>>>> Tokens: 27 | Response Time: 1.10s

==== NEW INTERACTION ====
User: I work primarily with Python and cloud technologies. 
Bot: That is great! Given your expertise...

>>>> Tokens: 81 | Response Time: 1.40s

==== NEW INTERACTION ====
User: My favorite hobby is hiking.
Bot: It seems we had a nice conversation about your background...

>>>> Tokens: 167 | Response Time: 1.59s

The conversation is quite similar and simple, just like we saw earlier in the sequential approach. However, now if the user asks the agent something that doesn’t exist within the sliding window, let’s observe the expected output.


agent.chat("What is my name?")


==== NEW INTERACTION ====
User: What is my name?
Bot: I apologize, but I dont have access to your name from our recent
conversation. Could you please remind me?

>>>> Tokens: 197 | Response Time: 0.60s

The AI agent couldn’t answer the question because the relevant context was outside the sliding window. However, we did see a reduction in token count due to this optimization.

The downside is clear, important context may be lost if the user refers back to earlier information. The sliding window is a crucial factor to consider and should be tailored based on the specific type of AI agent we are building.

Summarization Based Optimization

As we’ve seen earlier, the sequential approach suffers from a gigantic context issue, while the sliding window approach risks losing important context.

Therefore, there’s a need for an approach that can address both problems, by compacting the context without losing essential information. This can be achieved through summarization.
Summarization Approach

Instead of simply dropping old messages, this strategy periodically uses the LLM itself to create a running summary of the conversation. It works like this:

    Recent messages are stored in a temporary holding area, called a “buffer”.
    Once this buffer reaches a certain size (a “threshold”), the agent pauses and triggers a special action.
    It sends the contents of the buffer, along with the previous summary, to the LLM with a specific instruction: “Create a new, updated summary that incorporates these recent messages”.
    The LLM generates a new, consolidated summary. This new summary replaces the old one, and the buffer is cleared.

Let’s implement the summarization optimization approach and observe how it affects the agent’s performance.


class SummarizationMemory(BaseMemoryStrategy):
    def __init__(self, summary_threshold: int = 4):
        """
        Initializes the summarization memory.

                Args:
            summary_threshold: The number of messages (user + AI) to accumulate in the
                             buffer before triggering a summarization.
        """

       
        self.running_summary = ""
       
        self.buffer = []
       
        self.summary_threshold = summary_threshold

    def add_message(self, user_input: str, ai_response: str):
        """
        Adds a new user-AI interaction to the buffer. If the buffer size
        reaches the threshold, it triggers the memory consolidation process.
        """
       
        self.buffer.append({"role": "user", "content": user_input})
        self.buffer.append({"role": "assistant", "content": ai_response})

       
        if len(self.buffer) >= self.summary_threshold:
           
            self._consolidate_memory()

    def _consolidate_memory(self):
        """
        Uses the LLM to summarize the contents of the buffer and merge it
        with the existing running summary.
        """
        print("\n--- [Memory Consolidation Triggered] ---")
       
        buffer_text = "\n".join([f"{msg['role'].capitalize()}: {msg['content']}" for msg in self.buffer])

                               
        summarization_prompt = (
            f"You are a summarization expert. Your task is to create a concise summary of a conversation. "
            f"Combine the 'Previous Summary' with the 'New Conversation' into a single, updated summary. "
            f"Capture all key facts, names, and decisions.\n\n"
            f"### Previous Summary:\n{self.running_summary}\n\n"
            f"### New Conversation:\n{buffer_text}\n\n"
            f"### Updated Summary:"
        )

               
        new_summary = generate_text("You are an expert summarization engine.", summarization_prompt)
       
        self.running_summary = new_summary
       
        self.buffer = []
        print(f"--- [New Summary: '{self.running_summary}'] ---")

    def get_context(self, query: str) -> str:
        """
        Constructs the context to be sent to the LLM. It combines the long-term
        running summary with the short-term buffer of recent messages.
        The 'query' parameter is ignored as this strategy provides a general context.
        """
        buffer_text = "\n".join([f"{msg['role'].capitalize()}: {msg['content']}" for msg in self.buffer])
        return f"### Summary of Past Conversation:\n{self.running_summary}\n\n### Recent Messages:\n{buffer_text}"

    def clear(self):
        """Resets both the running summary and the buffer."""
        self.running_summary = ""
        self.buffer = []
        print("Summarization memory cleared.")

Our summarization memory component is a bit different compared to the previous approaches. Let’s break down and understand the component we’ve just coded.

    __init__(...): Sets up an empty running_summary string and an empty buffer list.
    add_message(...): Adds messages to the buffer. If the buffer size meets our summary_threshold, it calls the private _consolidate_memory method.
    _consolidate_memory(): This is the new, important part. It formats the buffer content and the existing summary into a special prompt, asks the LLM to create a new summary, updates self.running_summary, and clears the buffer.
    get_context(...): Provides the LLM with both the long-term summary and the short-term buffer, giving it a complete picture of the conversation.

Let’s initialize the summary memory component and build the AI agent on top of it.


summarization_memory = SummarizationMemory(summary_threshold=4)

agent = AIAgent(memory_strategy=summarization_memory)

The initialization is done in the same way as we saw earlier. We’ve set the summary threshold to 4, which means after every 2 turns, a summary will be generated and passed as context to the AI agent, instead of the entire or sliding window conversation history.

This aligns with the core goal of the summarization approach, saving tokens while retaining important information.

Let’s test this approach and evaluate how efficient it is in terms of token usage and preserving relevant context.



agent.chat("I'm starting a new company called 'Innovatech'. Our focus is on sustainable energy.")


agent.chat("Our first product will be a smart solar panel, codenamed 'Project Helios'.")


==== NEW INTERACTION ====
User: I am starting a new company called 'Innovatech'. Ou...
Bot: Congratulations on starting Innovatech! Focusing o ...
>>>> Tokens: 45 | Response Time: 2.55s

==== NEW INTERACTION ====
User: Our first product will be a smart solar panel....
--- [Memory Consolidation Triggered] ---
--- [New Summary: The user started a compan ...
Bot: That is exciting news about  ....

>>>> Tokens: 204 | Response Time: 3.58s

So far, we’ve had two basic conversation turns. Since we’ve set the summary threshold to 4 messages (two turns), a summary will now be generated for those previous turns.

Let’s proceed with the next turn and observe the impact on token usage.


agent.chat("The marketing budget is set at $50,000.")


agent.chat("What is the name of my company and its first product?")


...

==== NEW INTERACTION ====
User: What is the name of my company and its first product?
Bot: Your company is called 'Innovatech' and its first product is codenamed 'Project Helios'.

>>>> Tokens: 147 | Response Time: 1.05s

Did you notice that in our fourth conversation, the token count dropped to nearly half of what we saw in the sequential and sliding window approaches? That’s the biggest advantage of the summarization approach, it greatly reduces token usage.

However, for it to be truly effective, your summarization prompts need to be carefully crafted to ensure they capture the most important details.

The main downside is that critical information can still be lost in the summarization process. For example, if you continue a conversation for up to 40 turns and include numeric or factual details, such as balance sheet data, there’s a risk that earlier key info (like the gross sales mentioned in the 4th turn) may not appear in the summary anymore.

Let’s take a look at this example, where you had a 40-turn conversation with the AI agent and included several numeric details.

The summary used as context failed to include the gross sales figure from the 4th conversation, which is a clear limitation of this approach.



agent.chat("what was the gross sales of our company in the fiscal year?")


...

==== NEW INTERACTION ====
User: what was the gross sales of our company in the fiscal year?
Bot: I am sorry but I do not have that information. Could you please provide the gross sales figure for the fiscal year?

>>>> Tokens: 1532 | Response Time: 2.831s

You can see that although the summarized information uses fewer tokens, the answer quality and accuracy can decrease significantly or even drop to zero because of problematic context being passed to the AI agent.

This highlights the importance of creating a sub-agent dedicated to fact-checking the LLM’s responses. Such a sub-agent can verify factual accuracy and help make the overall agent more reliable and powerful.
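
As a rough sketch of that idea (hedged: verify_response is a hypothetical helper built on the generate_text function above, not part of the original notebook), the sub-agent can be as simple as a second LLM call that compares the draft answer against the context it was given:

def verify_response(context: str, draft_answer: str) -> str:
    """
    Hypothetical fact-checking sub-agent: asks the LLM to flag claims in the
    draft answer that are not supported by the memory context it was given.
    """
    verification_prompt = (
        f"### Context provided to the assistant:\n{context}\n\n"
        f"### Assistant's draft answer:\n{draft_answer}\n\n"
        f"List any factual claims in the draft that are NOT supported by the context. "
        f"If every claim is supported, reply with exactly 'VERIFIED'."
    )
    return generate_text("You are a strict fact-checking engine.", verification_prompt)

# Example usage with the summarization agent's current context (illustrative):
# report = verify_response(summarization_memory.get_context(""), "Gross sales were $2M last year.")
# print(report)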

Retrieval Based Memory

This is the most powerful strategy used in many AI agent use cases: RAG-based AI agents. As we saw earlier, previous approaches reduce token usage but risk losing relevant context. RAG, however, is different: it retrieves relevant context based on the current user query.

The context is stored in a database, where embedding models play a crucial role by transforming text into vector representations that make retrieval efficient.

Let’s visualize how this process works.

RAG Based Memory

Let’s understand the workflow of RAG-based memory:

    Every time a new interaction happens, it’s not just stored in a list, it’s saved as a “document” in a specialized database. We also generate a numerical representation of this document’s meaning, called an embedding, and store it.
    When the user sends a new message, the agent first converts this new message into an embedding as well.
    It then uses this query embedding to perform a similarity search against all the document embeddings stored in its memory database.
    The system retrieves the top k most semantically relevant documents (e.g., the 3 most similar past conversation turns).
    Finally, only these highly relevant, retrieved documents are injected into the LLM’s context window.

We will be using FAISS for vector storage in this approach. Let’s code this memory component.


import numpy as np
import faiss


class RetrievalMemory(BaseMemoryStrategy):
    def __init__(self, k: int = 2, embedding_dim: int = 3584):
        """
        Initializes the retrieval memory system.

                Args:
            k: The number of top relevant documents to retrieve for a given query.
            embedding_dim: The dimension of the vectors generated by the embedding model.
                           For BAAI/bge-multilingual-gemma2, this is 3584.
        """

       
        self.k = k
       
        self.embedding_dim = embedding_dim
       
        self.documents = []
       
       
        self.index = faiss.IndexFlatL2(self.embedding_dim)

    def add_message(self, user_input: str, ai_response: str):
        """
        Adds a new conversational turn to the memory. Each part of the turn (user
        input and AI response) is embedded and indexed separately for granular retrieval.
        """
       
       
       
        docs_to_add = [
            f"User said: {user_input}",
            f"AI responded: {ai_response}"
        ]
        for doc in docs_to_add:
           
            embedding = generate_embedding(doc)
           
            if embedding:
               
               
                self.documents.append(doc)
               
                vector = np.array([embedding], dtype='float32')
               
                self.index.add(vector)

    def get_context(self, query: str) -> str:
        """
        Finds the k most relevant documents from memory based on semantic
        similarity to the user's query.
        """
        if self.index.ntotal == 0:
            return "No information in memory yet."

        query_embedding = generate_embedding(query)
        if not query_embedding:
            return "Could not process query for retrieval."

        query_vector = np.array([query_embedding], dtype='float32')

        # Search the FAISS index for the k nearest stored vectors.
        distances, indices = self.index.search(query_vector, self.k)

        # Map the returned indices back to the original text documents.
        retrieved_docs = [self.documents[i] for i in indices[0] if i != -1]

        if not retrieved_docs:
            return "Could not find any relevant information in memory."

        return "### Relevant Information Retrieved from Memory:\n" + "\n---\n".join(retrieved_docs)

    def clear(self):
        """Resets the memory by clearing the documents and re-creating the FAISS index."""
        self.documents = []
        self.index = faiss.IndexFlatL2(self.embedding_dim)
        print("Retrieval memory cleared.")

Let’s go through what’s happening in the code.

    __init__(...): We initialize a list for our text documents and a faiss.IndexFlatL2 to store and search our vectors. We must specify the embedding_dim, which is the size of the vectors our embedding model produces.
    add_message(...): For each turn, we generate an embedding for both the user and AI messages, add the text to our documents list, and add the corresponding vector to our FAISS index.
    get_context(...): This is important. It embeds the user's query, uses self.index.search to find the k most similar vectors, and then uses their indices to pull the original text from our documents list. This retrieved text becomes the context.

As before, we initialize our memory state and build the AI agent using it.


retrieval_memory = RetrievalMemory(k=2)

agent = AIAgent(memory_strategy=retrieval_memory)

We are setting k = 2, which means we fetch only the two most relevant chunks related to the user's query. When dealing with larger datasets, we typically set k to a higher value such as 5, 7, or even more, especially if the chunk size is very small.

Let's test our AI agent with this setup.



agent.chat("I am planning a vacation to Japan for next spring.")

agent.chat("For my software project, I'm using the React framework for the frontend.")

agent.chat("I want to visit Tokyo and Kyoto while I'm on my trip.")

agent.chat("The backend of my project will be built with Django.")


...

==== NEW INTERACTION ====
User: I want to visit Tokyo and Kyoto while I'm on my trip.
Bot: You're interested in visiting Tokyo and Kyoto...

...

These are just basic conversations that we typically run with an AI agent. Now, let’s try a newer conversation based on past information and see how well the relevant context is retrieved and how optimized the token usage is in that scenario.


agent.chat("What cities am I planning to visit on my vacation?")


==== NEW INTERACTION ====
User: What cities am I planning to visit on my vacation?
--- Agent Debug Info ---
[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: MEMORY CONTEXT
Relevant Information Retrieved from Memory:
User said: I want to visit Tokyo and Kyoto while I am on my trip.
---
User said: I am planning a vacation to Japan for next spring.
...

Bot: You are planning to visit Tokyo and Kyoto while on your vacation to Japan next spring.

>>>> Tokens: 65 | Response Time: 0.53s

You can see that the relevant context has been successfully fetched, and the token count is extremely low because we’re retrieving only the pertinent information.

The choice of embedding model and the vector storage database plays a crucial role here. Optimizing that database is another important step to ensure fast and accurate retrieval. FAISS is a popular choice because it offers these capabilities.

However, the downside is that this approach is more complex to implement than it seems. As the database grows larger, the AI agent’s complexity increases significantly.

You’ll likely need parallel query processing and other optimization techniques to maintain performance. Despite these challenges, this approach remains the industry standard for optimizing AI agents.
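
As one example of such an optimization (a sketch under the assumption that the same 3584-dimensional embeddings are used; it is not part of the agent above), FAISS lets you swap the exhaustive IndexFlatL2 for an approximate IVF index once the memory grows large:

import numpy as np
import faiss

embedding_dim = 3584   # assumed to match the BGE embedding model used above
nlist = 100            # number of clusters; a rough rule of thumb is ~sqrt(num_vectors)

# IndexIVFFlat buckets vectors into 'nlist' clusters and only scans a few of
# them per query (nprobe), trading a little recall for much faster search.
quantizer = faiss.IndexFlatL2(embedding_dim)
ivf_index = faiss.IndexIVFFlat(quantizer, embedding_dim, nlist)

sample_vectors = np.random.random((5000, embedding_dim)).astype("float32")  # stand-in data
ivf_index.train(sample_vectors)   # IVF indexes must be trained before adding vectors
ivf_index.add(sample_vectors)
ivf_index.nprobe = 10             # clusters scanned per query; higher = better recall, slower

distances, indices = ivf_index.search(sample_vectors[:1], 2)
print(indices)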

Memory Augmented Transformers

Beyond these core strategies, AI systems are implementing even more sophisticated approaches that push the boundaries of what’s possible.

We can understand this technique through an example: imagine a regular AI as a student with just one small notepad. They can only write a little bit at a time. So in a long test, they have to erase old notes to make room for new ones.

Now, memory-augmented transformers are like giving that student a bunch of sticky notes. The notepad still handles the current work, but the sticky notes help them save key info from earlier.

    For example: you’re designing a video game with an AI. Early on, you say you want it to be set in space with no violence. Normally, that would get forgotten after a long talk. But with memory, the AI writes “space setting, no violence” on a sticky note.
    Later, when you ask, “What characters would fit our game?”, it checks the note and gives ideas that match your original vision, even hours later.
    It’s like having a smart helper who remembers the important stuff without needing you to repeat it.

Let’s visualize this:

Memory Augmented Transformers

We will create a memory class that:

    Uses a SlidingWindowMemory for recent chat.
    After each turn, uses the LLM to act as a “fact extractor.” It will analyze the conversation and decide if it contains a core fact, preference, or decision.
    If an important fact is found, it’s stored as a memory token (a concise string) in a separate list.
    The final context provided to the agent is a combination of the recent chat window and all the persistent memory tokens.



class MemoryAugmentedMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 2):
        """
        Initializes the memory-augmented system.

                Args:
            window_size: The number of recent turns to keep in the short-term memory.
        "
""
       
        self.recent_memory = SlidingWindowMemory(window_size=window_size)
       
        self.memory_tokens = []

    def add_message(self, user_input: str, ai_response: str):
        """
        Adds the latest turn to recent memory and then uses an LLM call to decide
        if a new, persistent memory token should be created from this interaction.
        """
       
        self.recent_memory.add_message(user_input, ai_response)

               
       
        fact_extraction_prompt = (
            f"Analyze the following conversation turn. Does it contain a core fact, preference, or decision that should be remembered long-term? "
            f"Examples include user preferences ('I hate flying'), key decisions ('The budget is $1000'), or important facts ('My user ID is 12345').\n\n"
            f"Conversation Turn:\nUser: {user_input}\nAI: {ai_response}\n\n"
            f"If it contains such a fact, state the fact concisely in one sentence. Otherwise, respond with 'No important fact.'"
        )

               
        extracted_fact = generate_text("You are a fact-extraction expert.", fact_extraction_prompt)

               
        if "no important fact" not in extracted_fact.lower():
           
            print(f"--- [Memory Augmentation: New memory token created: '{extracted_fact}'] ---")
            self.memory_tokens.append(extracted_fact)

    def get_context(self, query: str) -> str:
        """
        Constructs the context by combining the short-term recent conversation
        with the list of all long-term, persistent memory tokens.
        """
       
        recent_context = self.recent_memory.get_context(query)
       
        memory_token_context = "\n".join([f"- {token}" for token in self.memory_tokens])

               
        return f"### Key Memory Tokens (Long-Term Facts):\n{memory_token_context}\n\n### Recent Conversation:\n{recent_context}"

Our augmented class might look confusing at first glance, so let's break it down:

    __init__(...): Initializes both a SlidingWindowMemory instance and an empty list for memory_tokens.
    add_message(...): This method now has two jobs. It adds the turn to the sliding window and makes an extra LLM call to see if a key fact should be extracted and added to self.memory_tokens.
    get_context(...): Constructs a rich prompt by combining the "sticky notes" (memory_tokens) with the recent chat history from the sliding window.

Let’s initialize this memory-augmented state and AI agent.



mem_aug_memory = MemoryAugmentedMemory(window_size=2)

agent = AIAgent(memory_strategy=mem_aug_memory)

We are using a window size of 2, just as we set previously. Now, we can simply test this approach using a multi-turn chat conversation and see how well it performs.



agent.chat("Please remember this for all future interactions: I am severely allergic to peanuts.")


agent.chat("Okay, let's talk about recipes. What's a good idea for dinner tonight?")



agent.chat("That sounds good. What about a dessert option?")


==== NEW INTERACTION ====
User: Please remember this for all future interactions: I am severely allergic to peanuts.
--- [Memory Augmentation: New memory token created: 'The user has a severe allergy to peanuts.'] ---
Bot: I have taken note of your long-term fact: You are severely allergic to peanuts. I will keep this in mind...

>>>> Tokens: 45 | Response Time: 1.32s

...

The conversation so far looks the same as with an ordinary AI agent. Now, let's test the memory-augmented technique with a request that depends on the stored fact.



agent.chat("Could you suggest a Thai green curry recipe? Please ensure it's safe for me.")


==== NEW INTERACTION ====
User: Could you suggest a Thai green curry recipe? Please ensure it is safe for me.
--- Agent Debug Info ---
[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: MEMORY CONTEXT
Key Memory Tokens (Long-Term Facts):
- The user has a severe allergy to peanuts.

...

Recent Conversation:
User: Okay, lets talk about recipes...
...

Bot: Of course. Given your peanut allergy, it is very important to be careful with Thai cuisine as many recipes use peanuts or peanut oil. Here is a peanut-free Thai green curry recipe...

>>>> Tokens: 712 | Response Time: 6.45s

This approach deserves a deeper evaluation on a larger dataset, but even this small test shows its value: the critical allergy fact is retrieved and applied correctly long after it has left the recent chat window.

It is a more complex and expensive strategy due to the extra LLM calls for fact extraction, but its ability to retain critical information over long, evolving conversations makes it incredibly powerful for building truly reliable and intelligent personal assistants.

Hierarchical Optimization for Multi-tasks

So far, we have treated memory as a single system. But what if we could build an agent that thinks more like a human, with different types of memory for different purposes?

This is the idea behind Hierarchical Memory. It’s a composite strategy that combines multiple, simpler memory types into a layered system, creating a more sophisticated and organized mind for our agent.

Think about how you remember things:

    Working Memory: The last few sentences someone said to you. It’s fast, but fleeting.
    Short-Term Memory: The main points from a meeting you had this morning. You can recall them easily for a few hours.
    Long-Term Memory: Your home address or a critical fact you learned years ago. It’s durable and deeply ingrained.

Hierarchical Optimization

The hierarchical approach works like this:

    It starts with capturing the user message into working memory.
    Then it checks if the information is important enough to promote to long-term memory.
    After that, promoted content is stored in a retrieval memory for future use.
    On new queries, it searches long-term memory for relevant context.
    Finally, it injects relevant memories into context to generate better responses.

Let’s build this component.


class HierarchicalMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 2, k: int = 2, embedding_dim: int = 3584):
        """
        Initializes the hierarchical memory system.

                Args:
            window_size: The size of the short-term working memory (in turns).
            k: The number of documents to retrieve from long-term memory.
            embedding_dim: The dimension of the embedding vectors for long-term memory.
        """

        print("Initializing Hierarchical Memory...")
       
        self.working_memory = SlidingWindowMemory(window_size=window_size)
       
        self.long_term_memory = RetrievalMemory(k=k, embedding_dim=embedding_dim)
       
        self.promotion_keywords = ["remember", "rule", "preference", "always", "never", "allergic"]

    def add_message(self, user_input: str, ai_response: str):
        """
        Adds a message to working memory and conditionally promotes it to long-term
        memory based on its content.
        """
       
        self.working_memory.add_message(user_input, ai_response)

               
       
        if any(keyword in user_input.lower() for keyword in self.promotion_keywords):
            print(f"--- [Hierarchical Memory: Promoting message to long-term storage.] ---")
           
            self.long_term_memory.add_message(user_input, ai_response)

    def get_context(self, query: str) -> str:
        """
        Constructs a rich context by combining relevant information from both
        the long-term and short-term memory layers.
        """
       
        working_context = self.working_memory.get_context(query)
       
        long_term_context = self.long_term_memory.get_context(query)

               
        return f"### Retrieved Long-Term Memories:\n{long_term_context}\n\n### Recent Conversation (Working Memory):\n{working_context}"

So, to break this down:

    __init__(...): Initializes an instance of SlidingWindowMemory and an instance of RetrievalMemory. It also defines a list of promotion_keywords.
    add_message(...): Adds every message to the short-term working_memory. It then checks if the user_input contains any of the special keywords. If it does, the message is also added to the long_term_memory.
    get_context(...): This is where the hierarchy comes together. It fetches context from both memory systems and combines them into one rich prompt, giving the LLM both recent conversational flow and relevant deep facts.

Let’s now initialize the memory component and AI agent.


hierarchical_memory = HierarchicalMemory()

agent = AIAgent(memory_strategy=hierarchical_memory)

We can now create a multi-turn chat conversation for this technique.



agent.chat("Please remember my User ID is AX-7890.")

agent.chat("Let's chat about the weather. It's very sunny today.")


agent.chat("I'm planning to go for a walk later.")



agent.chat("I need to log into my account, can you remind me of my ID?")

We are testing this with a scenario where the user provides an important piece of information (a User ID) using a keyword (“remember”).

Then we have a few turns of unrelated chat. In the last turn, we ask the agent to recall the ID. Let's look at the agent's output.


==== NEW INTERACTION ====
User: Please remember my User ID is AX-7890.
--- [Hierarchical Memory: Promoting message to long-term storage.] ---
Bot: You have provided your User ID as AX-7890, which has been stored in long-term memory for future reference.

...

==== NEW INTERACTION ====
User: I need to log into my account, can you remind me of my ID?
--- Agent Debug Info ---
[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER:


User said: Please remember my User ID is AX-7890.
...

User: Let's chat about the weather...
User: I'm planning to go for a walk later...

Bot: Your User ID is AX-7890. You can use this to log into your account. Is there anything else I can assist you with?

>>>> Tokens: 452 | Response Time: 2.06s

As you can see, the agent successfully combines different memory types. It uses the fast working memory for the flow of conversation but correctly queries its deep, long-term memory to retrieve the critical User ID when asked.

This hybrid approach is a powerful pattern for building sophisticated agents.

Graph Based Optimization

So far, our memory has stored information as chunks of text, whether it’s the full conversation, a summary, or a retrieved document. But what if we could teach our agent to understand the relationships between different pieces of information? This is the leap we take with Graph-Based Memory.

This strategy moves beyond storing unstructured text and represents information as a knowledge graph.

A knowledge graph consists of:

    Nodes (or Entities): These are the "things" in our conversation, like people (Clara), companies (FutureScape), or concepts (Project Odyssey).
    Edges (or Relations): These are the connections that describe how the nodes relate to each other, like works_for, is_based_in, or manages.

The result is a structured, web-like memory. Instead of a simple fact like "Clara works for FutureScape," the agent stores a connection: (Clara) --[works_for]--> (FutureScape).
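As a quick standalone illustration (separate from the GraphMemory class we build below), here is how that single fact becomes an edge in a networkx directed graph and how it can be read back:


import networkx as nx

graph = nx.DiGraph()

# Store the triple (Clara, works_for, FutureScape) as a directed, labeled edge.
graph.add_edge("Clara", "FutureScape", relation="works_for")

# Read back every outgoing fact we know about "Clara".
for subject, obj, data in graph.out_edges("Clara", data=True):
    print(f"{subject} --[{data['relation']}]--> {obj}")
# Prints: Clara --[works_for]--> FutureScape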

Graph Based Approach

This is incredibly powerful for answering complex queries that require reasoning about connections. The main challenge is populating the graph from unstructured conversation.

For this, we can use a powerful technique: using the LLM itself as a tool to extract structured (Subject, Relation, Object) triples from the text.

For our implementation, we’ll use the networkx library to build and manage our graph. The core of this strategy will be a helper method, _extract_triples, that calls the LLM with a specific prompt to convert conversational text into structured (Subject, Relation, Object) data.


class GraphMemory(BaseMemoryStrategy):
    def __init__(self):
        """Initializes the memory with an empty NetworkX directed graph."""
       
        self.graph = nx.DiGraph()

    def _extract_triples(self, text: str) -> list[tuple[str, str, str]]:
        """
        Uses the LLM to extract knowledge triples (Subject, Relation, Object) from a given text.
        This is a form of "LLM as a Tool" where the model's language understanding is
        used to create structured data.
        """
        print("--- [Graph Memory: Attempting to extract triples from text.] ---")
       
       
        extraction_prompt = (
            f"You are a knowledge extraction engine. Your task is to extract Subject-Relation-Object triples from the given text. "
            f"Format your output strictly as a list of Python tuples. For example: [('Sam', 'works_for', 'Innovatech'), ('Innovatech', 'focuses_on', 'Energy')]. "
            f"If no triples are found, return an empty list [].\n\n"
            f"Text to analyze:\n\"""{text}\""""
        )

               
        response_text = generate_text("You are an expert knowledge graph extractor.", extraction_prompt)

               
        try:
                       
           
            found_triples = re.findall(r"\(['\"](.*?)['\"],\s*['\"](.*?)['\"],\s*['\"](.*?)['\"]\)", response_text)
            print(f"--- [Graph Memory: Extracted triples: {found_triples}] ---")
            return found_triples
        except Exception as e:
           
            print(f"Could not parse triples from LLM response: {e}")
            return []

    def add_message(self, user_input: str, ai_response: str):
        """Extracts triples from the latest conversation turn and adds them to the knowledge graph."""
       
        full_text = f"User: {user_input}\nAI: {ai_response}"
       
        triples = self._extract_triples(full_text)
       
        for subject, relation, obj in triples:
           
           
           
            self.graph.add_edge(subject.strip(), obj.strip(), relation=relation.strip())

    def get_context(self, query: str) -> str:
        """
        Retrieves context by finding entities from the query in the graph and
        returning all their known relationships.
        """
       
        if not self.graph.nodes:
            return "The knowledge graph is empty."

               
       
       
        query_entities = [word.capitalize() for word in query.replace('?','').split() if word.capitalize() in self.graph.nodes]

               
        if not query_entities:
            return "No relevant entities from your query were found in the knowledge graph."

        context_parts = []
       
        for entity in set(query_entities):
           
            for u, v, data in self.graph.out_edges(entity, data=True):
                context_parts.append(f"{u} --[{data['relation']}]--> {v}")
           
            for u, v, data in self.graph.in_edges(entity, data=True):
                context_parts.append(f"{u} --[{data['relation']}]--> {v}")

               
        return "### Facts Retrieved from Knowledge Graph:\n" + "\n".join(sorted(list(set(context_parts))))

    _extract_triples(…): This is the engine of the strategy. It sends the conversation text to the LLM with a highly specific prompt, asking it to return structured data.
    add_message(…): This method orchestrates the process. It calls _extract_triples on the new conversation turn and then adds the resulting subject-relation-object pairs as edges to the networkx graph.
    get_context(…): This performs a simple search. It looks for entities from the user's query that exist as nodes in the graph. If it finds any, it retrieves all known relationships for those entities and provides them as structured context.

Let’s see if our agent can build a mental map of a scenario and then use it to answer a question that requires connecting the dots.

You’ll see the [Graph Memory: Extracted triples] log after each turn, showing how the agent is building its knowledge base in real time.

The final context won’t be conversational text but rather a structured list of facts retrieved from the graph.


graph_memory = GraphMemory()
agent = AIAgent(memory_strategy=graph_memory)


agent.chat("A person named Clara works for a company called 'FutureScape'.")
agent.chat("FutureScape is based in Berlin.")
agent.chat("Clara's main project is named 'Odyssey'.")



agent.chat("Tell me about Clara's project.")

The output we get after this multi-turn chat is:

############ OUTPUT ############
==== NEW INTERACTION ====
User: A person named Clara works for a company called 'FutureScape'.
--- [Graph Memory: Attempting to extract triples from text.] ---
--- [Graph Memory: Extracted triples: [('Clara', 'works_for', 'FutureScape')]] ---
Bot: Understood. I've added the fact that Clara works for FutureScape to my knowledge graph.

...

==== NEW INTERACTION ====
User: Clara's main project is named 'Odyssey'.
--- [Graph Memory: Attempting to extract triples from text.] ---
--- [Graph Memory: Extracted triples: [('Clara', 'manages_project', 'Odyssey')]] ---
Bot: Got it. I've noted that Clara's main project is Odyssey.

==== NEW INTERACTION ====
User: Tell me about Clara's project.
--- Agent Debug Info ---
[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT
### Facts Retrieved from Knowledge Graph:
Clara --[manages_project]--> Odyssey
Clara --[works_for]--> FutureScape
...

Bot: Based on my knowledge graph, Clara's main project is named 'Odyssey', and Clara works for the company FutureScape.

>>>> Tokens: 78 | Response Time: 1.5s

The agent didn’t just find a sentence containing “Clara” and “project”; it navigated its internal graph to present all known facts related to the entities in the query.

    This opens the door to building highly knowledgeable expert agents.

Compression & Consolidation Memory

We have seen that summarization is a good way to manage long conversations, but what if we could be even more aggressive in cutting down token usage? This is where Compression & Consolidation Memory comes into play. It’s like summarization’s more intense sibling.

Instead of creating a narrative summary that tries to preserve the conversational flow, the goal here is to distill each piece of information into its most dense, factual representation.

Think of it like converting a long, verbose paragraph from a meeting transcript into a single, concise bullet point.

Compression Approach

The process is straightforward:

    After each conversational turn (user input + AI response), the agent sends this text to the LLM.
    It uses a specific prompt that asks the LLM to act like a “data compression engine”.
    The LLM’s task is to re-write the turn as a single, essential statement, stripping out all conversational fluff like greetings, politeness, and filler words.
    This highly compressed fact is then stored in a simple list.

The memory of the agent becomes a lean, efficient list of core facts, which can be significantly more token-efficient than even a narrative summary.


class CompressionMemory(BaseMemoryStrategy):
    def __init__(self):
        """Initializes the memory with an empty list to store compressed facts."""
        self.compressed_facts = []

    def add_message(self, user_input: str, ai_response: str):
        """Uses the LLM to compress the latest turn into a concise factual statement."""
       
        text_to_compress = f"User: {user_input}\nAI: {ai_response}"

               
       
        compression_prompt = (
            f"You are a data compression engine. Your task is to distill the following text into its most essential, factual statement. "
            f"Be as concise as possible, removing all conversational fluff. For example, 'User asked about my name and I, the AI, responded that my name is an AI assistant' should become 'User asked for AI's name.'\n\n"
            f"Text to compress:\n\"{text_to_compress}\""
        )

               
        compressed_fact = generate_text("You are an expert data compressor.", compression_prompt)
        print(f"--- [Compression Memory: New fact stored: '{compressed_fact}'] ---")
       
        self.compressed_facts.append(compressed_fact)

    def get_context(self, query: str) -> str:
        """Returns the list of all compressed facts, formatted as a bulleted list."""
        if not self.compressed_facts:
            return "No compressed facts in memory."

               
        return "### Compressed Factual Memory:\n- " + "\n- ".join(self.compressed_facts)

    __init__(...): Simply creates an empty list, self.compressed_facts.
    add_message(...): The core logic. It takes the latest turn, sends it to the LLM with the compression prompt, and stores the concise result.
    get_context(...): Formats the list of compressed facts into a clean, bulleted list to be used as context.

Let’s test this strategy with a simple planning conversation.

After each turn, you will see the [Compression Memory: New fact stored] log, showing the very short, compressed version of the interaction. Notice how the final context sent to the LLM is just a terse list of facts, which is highly token-efficient.


compression_memory = CompressionMemory()
agent = AIAgent(memory_strategy=compression_memory)


agent.chat("Okay, I've decided on the venue for the conference. It's going to be the 'Metropolitan Convention Center'.")
agent.chat("The date is confirmed for October 26th, 2025.")
agent.chat("Could you please summarize the key details for the conference plan?")

Once we perform this multi-turn chat conversation, we can take a look at the output. Let’s do that.

############ OUTPUT ############
==== NEW INTERACTION ====
User: Okay, I've decided on the venue for the conference. It's going to be the 'Metropolitan Convention Center'.
--- [Compression Memory: New fact stored: 'The conference venue has been decided as the 'Metropolitan Convention Center'.'] ---
Bot: Great! The Metropolitan Convention Center is an excellent choice. What's next on our planning list?

...

==== NEW INTERACTION ====
User: The date is confirmed for October 26th, 2025.
--- [Compression Memory: New fact stored: 'The conference date is confirmed for October 26th, 2025.'] ---
Bot: Perfect, I've noted the date.

...

==== NEW INTERACTION ====
User: Could you please summarize the key details for the conference plan?
--- Agent Debug Info ---
[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT
### Compressed Factual Memory:
- The conference venue has been decided as the 'Metropolitan Convention Center'.
- The conference date is confirmed for October 26th, 2025.
...

Bot: Of course. Based on my notes, here are the key details for the conference plan:
- **Venue:** Metropolitan Convention Center
- **Date:** October 26th, 2025

>>>> Tokens: 48 | Response Time: 1.2s

As you can see, this strategy is extremely effective at reducing token count while preserving core facts. It’s a great choice for applications where long-term factual recall is needed on a tight token budget.

However, for conversations that rely heavily on nuance and personality, this aggressive compression might be too much.

OS-Like Memory Management

What if we could build a memory system for our agent that works just like the memory in your computer?

This advanced concept borrows directly from how a computer’s Operating System (OS) manages RAM and a hard disk.

Let’s use an analogy:

    RAM (Random Access Memory): This is the super-fast memory your computer uses for active programs. It’s expensive and you don’t have a lot of it. For our agent, the LLM’s context window is its RAM — it’s fast to access but very limited in size.
    Hard Disk (or SSD): This is your computer’s long-term storage. It’s much larger and cheaper than RAM, but also slower to access. For our agent, this can be an external database or a simple file where we store old conversation history.

OS Like Memory Management

This memory strategy works by intelligently moving information between these two tiers:

    Active Memory (RAM): The most recent conversation turns are kept here, in a small, fast-access buffer.
    Passive Memory (Disk): When the active memory is full, the oldest information is moved out to the passive, long-term storage. This is called “paging out.”
    Page Fault: When the user asks a question that requires information not currently in the active memory, a “page fault” occurs.
    The system must then go to its passive storage, find the relevant information, and load it back into the active context for the LLM to use. This is called “paging in.”

Our simulation will create an active_memory (a deque, like a sliding window) and a passive_memory (a dictionary). When the active memory is full, we'll page out the oldest turn.

To page in, we will use a simple keyword search to simulate a retrieval from passive memory.



class OSMemory(BaseMemoryStrategy):
    def __init__(self, ram_size: int = 2):
        """
        Initializes the OS-like memory system.

        Args:
            ram_size: The maximum number of conversational turns to keep in active memory (RAM).
        """

        self.ram_size = ram_size
       
        self.active_memory = deque()
       
        self.passive_memory = {}
       
        self.turn_count = 0

    def add_message(self, user_input: str, ai_response: str):
        """Adds a turn to active memory, paging out the oldest turn to passive memory if RAM is full."""
        turn_id = self.turn_count
        turn_data = f"User: {user_input}\nAI: {ai_response}"

               
        if len(self.active_memory) >= self.ram_size:
           
            lru_turn_id, lru_turn_data = self.active_memory.popleft()
           
            self.passive_memory[lru_turn_id] = lru_turn_data
            print(f"--- [OS Memory: Paging out Turn {lru_turn_id} to passive storage.] ---")

               
        self.active_memory.append((turn_id, turn_data))
        self.turn_count += 1

    def get_context(self, query: str) -> str:
        """Provides RAM context and simulates a 'page fault' to pull from passive memory if needed."""
       
        active_context = "\n".join([data for _, data in self.active_memory])

               
       
        paged_in_context = ""
        for turn_id, data in self.passive_memory.items():
            if any(word in data.lower() for word in query.lower().split() if len(word) > 3):
                paged_in_context += f"\n(Paged in from Turn {turn_id}): {data}"
                print(f"--- [OS Memory: Page fault! Paging in Turn {turn_id} from passive storage.] ---")

               
        return f"### Active Memory (RAM):\n{active_context}\n\n### Paged-In from Passive Memory (Disk):\n{paged_in_context}"

    def clear(self):
        """Clears both active and passive memory stores."""
        self.active_memory.clear()
        self.passive_memory = {}
        self.turn_count = 0
        print("OS-like memory cleared.")

    __init__(...): Sets up an active_memory deque with a fixed size and an empty passive_memory dictionary.
    add_message(...): Adds new turns to active_memory. If active_memory is full, it calls popleft() to get the oldest turn and moves it into the passive_memory dictionary. This is "paging out."
    get_context(...): Always includes the active_memory. It then performs a search on passive_memory. If it finds a match for the query, it "pages in" that data by adding it to the context.

Let’s run a scenario where the agent is told a secret code. We’ll then have enough conversation to force that secret code to be “paged out” to passive memory. Finally, we’ll ask for the code and see if the agent can trigger a “page fault” to retrieve it.
You’ll see two key logs:

    [Paging out Turn 0] after the third turn
    [Page fault! Paging in Turn 0] when we ask the final question


os_memory = OSMemory(ram_size=2)
agent = AIAgent(memory_strategy=os_memory)


agent.chat("The secret launch code is 'Orion-Delta-7'.")
agent.chat("The weather for the launch looks clear.")
agent.chat("The launch window opens at 0400 Zulu.")


agent.chat("I need to confirm the launch code.")

As shown previously, we can now run this multi-turn chat conversation with our AI agent. This is the output we get.

############ OUTPUT ############
...

==== NEW INTERACTION ====
User: The launch window opens at 0400 Zulu.
--- [OS Memory: Paging out Turn 0 to passive storage.] ---
Bot: PROCESSING NEW LAUNCH WINDOW INFORMATION...

...

==== NEW INTERACTION ====
User: I need to confirm the launch code.
--- [OS Memory: Page fault! Paging in Turn 0 from passive storage.] ---
--- Agent Debug Info ---
[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT
### Active Memory (RAM):
User: The weather for the launch looks clear.
...
User: The launch window opens at 0400 Zulu.
...
### Paged-In from Passive Memory (Disk):
(Paged in from Turn 0): User: The secret launch code is 'Orion-Delta-7'.
...

Bot: CONFIRMING LAUNCH CODE: The stored secret launch code is 'Orion-Delta-7'.

>>>> Tokens: 539 | Response Time: 2.56s

It works perfectly! The agent successfully moved the old, “cold” data to passive storage and then intelligently retrieved it only when the query demanded it.

This is a conceptually powerful model for building large-scale systems with virtually limitless memory while keeping the active context small and fast.

Choosing the Right Strategy

We have gone through nine distinct memory optimization strategies, from the simple to the highly complex. There is no single “best” strategy, the right choice is a careful balance of your agent’s needs, your budget, and your engineering resources.

    So, when should you choose which strategy?

    For simple, short-lived bots: Sequential or Sliding Window are perfect. They are easy to implement and get the job done.
    For long, creative conversations: Summarization is a great choice to maintain the general flow without a massive token overhead.
    For agents needing precise, long-term recall: Retrieval-Based memory is the industry standard. It’s powerful, scalable, and the foundation of most RAG applications.
    For highly reliable personal assistants: Memory-Augmented or Hierarchical approaches provide a robust way to separate critical facts from conversational chatter.
    For expert systems and knowledge bases: Graph-Based memory is unparalleled in its ability to reason about relationships between data points.

The most powerful agents in production often use hybrid approaches, combining these techniques. You might use a hierarchical system where the long-term memory is a combination of both a vector database and a knowledge graph.
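As a hedged sketch of what such a hybrid could look like, the class below reuses the strategies built earlier in this article (SlidingWindowMemory, RetrievalMemory, and GraphMemory) and simply fans each turn out to every layer. A production version would be more selective about what it promotes to long-term storage, but the composition pattern is the same.


class HybridMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 2, k: int = 2):
        # Short-term conversational flow.
        self.working = SlidingWindowMemory(window_size=window_size)
        # Long-term layer 1: semantic search over past turns (assumes RetrievalMemory's default embedding_dim).
        self.vector_store = RetrievalMemory(k=k)
        # Long-term layer 2: structured facts and relationships.
        self.graph = GraphMemory()

    def add_message(self, user_input: str, ai_response: str):
        # Every turn goes to all layers; each layer decides what it keeps.
        self.working.add_message(user_input, ai_response)
        self.vector_store.add_message(user_input, ai_response)
        self.graph.add_message(user_input, ai_response)

    def get_context(self, query: str) -> str:
        # Combine structured facts, retrieved snippets, and the recent window.
        return (
            f"{self.graph.get_context(query)}\n\n"
            f"{self.vector_store.get_context(query)}\n\n"
            f"### Recent Conversation:\n{self.working.get_context(query)}"
        )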

The key is to start with a clear understanding of what you need your agent to remember, for how long, and with what level of precision. By mastering these memory strategies, you can move beyond building simple chatbots and start creating truly intelligent agents that learn, remember, and perform better over time.

Source

57
I've always been a tinkerer. If I weren't, there's almost no chance I'd be an entrepreneur.

When I released my first product in college, my goal wasn't to make money — it was to build something for the sake of it. I saw a problem and decided to see if I could create a solution.

Turns out, I could. Not everything I've built has worked out the way I wanted it to, but that's okay. The tinkerer mindset doesn't require a 100 percent success rate. You might think that my love of experimenting would have been tempered once my business grew. But actually, I've only become more firm in my conviction that great things come from those who tinker.

Even better? Recent leaps in AI capabilities have only made tinkering easier. Here's why.

Why experimentation is essential

If there's one trait every founder needs, it's a willingness to experiment. Great products aren't born fully formed — they're shaped by trial, error, feedback and iteration.

When I launched Jotform, I wasn't trying to build a company. I was trying to solve a problem. That curiosity led to our first tagline, "The Easiest Form Builder." I obsessed over usability and kept tweaking the product until it felt effortless to use. That mindset — build, test, improve — has guided every version since.

I often tell the founders I mentor: You don't need to get it perfect, you just need to get it in front of people. The feedback you get will tell you what to fix, what to double down on and what to scrap.

My 50/50 rule — spending half your time on product and half on growth — is built on the same principle. You're constantly experimenting on two fronts: what you're building and how you're getting it into users' hands. It's a push-pull dynamic that inherently requires trial and error.
Why AI is a tinkerer's dream

Here's the thing about tinkering: It doesn't work under duress.

Today, experimentation is easier and more accessible than ever thanks to AI. In the past, it was extremely difficult to carve out the time and space to be creative, because who has several uninterrupted hours just to play around with a project that may ultimately yield nothing? For me, early mornings and late nights were the golden times for working on my startup, when I didn't have to focus on my day job or any other obligations nagging for my attention.

For many people, those precious off-hours are still the ticket to unlocking creative thinking. But instead of wasting them on exasperating tasks like debugging code, designing a UI or writing copy from scratch, you can offload those responsibilities to an AI assistant. Want to build a landing page, translate it and generate five headline variations? That's now a 30-minute exercise, not a full weekend.

That kind of efficiency is a game-changer. It lowers the cost of experimentation, and more importantly, it removes the friction between idea and execution. You can move straight from "what if?" to "let's find out," which is exactly what tinkering is all about.

Amplifying creativity

There's a misconception that AI will do all the work for you. It won't. AI, at least not yet, cannot replicate human creativity and ingenuity. What it will do is eliminate the bottlenecks that keep you from doing your best work.

Recently, I returned from an eight-month break from my business. I'd had my third child, and I wanted to take the opportunity to spend time with my family. Once back in the office, I realized I didn't want to return to the way I'd been working before, getting pulled in several directions at once and being stretched too thin to focus on what I cared about.

Instead, I decided to dramatically limit the areas of my business I would focus on. Recently, that's meant working with our architect to design a new office space. It's something I enjoy, but couldn't fully commit to previously thanks to a pileup of other distractions.

In the past, I might have had to let it go — just because I wanted to be involved didn't mean I'd have the bandwidth to do it. It was a project that interested me, but didn't require my participation. That's the thing about tinkering — most of it isn't strictly necessary.

Since I've returned, I've been able to focus on blueprints and layout concepts for uninterrupted stretches of time. How?

One reason is that I have an executive team that has been able to take over many of the day-to-day functions that previously absorbed my attention. The second is that I've deputized AI to take on some of my most annoying, time-consuming busywork. For example, I've refined my already-effective email filtering technique even further with the help of an AI agent, which autonomously sorts and, in some cases, even responds to routine queries so I don't have to. That means less time fighting the onslaught of emails, more time investing my energy where it counts.

My goal isn't to have AI figure out window placements for me, make hiring decisions or determine the strategic direction of my company. Instead, it's to clear my plate of the time-consuming tasks that have distracted me from what I want to do.

For entrepreneurs, AI has afforded us more of the most valuable resource we have: the space to tinker. And in my experience, that's where everything worthwhile happens.

Source

By Aytekin Tank
Entrepreneur Leadership Network® VIP
Entrepreneur; Founder and CEO, Jotform
58
Agentic AI / Agentic AI Architecture Framework for Enterprises
« Last post by Imrul Hasan Tusher on July 14, 2025, 10:38:43 AM »
Agentic AI Architecture Framework for Enterprises

Key Takeaways

To deploy agentic AI responsibly and effectively in the enterprise, organizations must progress through a three-tier architecture (Foundation Tier, Workflow Tier, and Autonomous Tier) in which trust, governance, and transparency precede autonomy.

First, build trust by establishing foundations and governance through tool orchestration, reasoning transparency, and data lifecycle patterns. Next, the Workflow Tier delivers automation through five core patterns (Prompt Chaining, Routing, Parallelization, Evaluator-Optimizer, Orchestrator-Workers).
In the final phase, the Autonomous Tier enables goal-directed planning. Deploying Constrained Autonomy Zones with validation checkpoints, rather than fully autonomous systems, enables AI flexibility within governance boundaries while maintaining human oversight.

Prioritize explainability and continuous monitoring over performance, as enterprise success depends on stakeholder trust and regulatory compliance rather than technical capability.

Customize by industry. Financial services need bias testing and human checkpoints. Healthcare requires personal health information (PHI) and Fast Health Interoperability Resources (FHIR) compliance. Retail needs fairness monitoring. Manufacturing integrates safety and workforce impact assessment.

AI systems are transitioning from a reactive, input/output model to a new generation that actively reasons, plans, and executes actions autonomously. This represents the emergence of agentic AI, fundamentally transforming how organizations approach intelligent automation.

Yet deploying agentic systems in enterprise environments requires more than adopting the latest LLMs or vibe-coding techniques. Success demands architectural patterns that balance cutting-edge capabilities with organizational realities: governance requirements, audit trails, security protocols, and ethical accountability.

Organizations successfully deploying agentic systems share a common insight: they prioritize simple, composable architectures over complex frameworks, effectively managing complexity while controlling costs and maintaining performance standards.

Agentic systems operate across a capability spectrum. At one end, workflows orchestrate LLMs through predefined execution paths with deterministic outcomes. At the other end, autonomous agents dynamically determine their own approaches and tool usage.

The critical decision point lies in understanding when predictability and control take precedence versus when flexibility and autonomous decision-making deliver greater value. This understanding leads to a fundamental principle: start with the simplest effective solution, adding complexity only when clear business value justifies the additional operational overhead and risk.

Recent implementation-focused guidance from Anthropic's agentic patterns provides valuable tactical approaches for building specific AI workflows. The referenced article addresses the foundational question that precedes implementation: How would an enterprise architect comprehensive agentic AI systems that balance capability with governance? Our focus on architectural patterns establishes the strategic framework that guides implementation decisions across the entire enterprise AI ecosystem.

Three-Tier Framework

Enterprise deployment of agentic AI creates an inherent tension between AI autonomy and organizational governance requirements. Our analysis of successful MVPs and ongoing production implementations across multiple industries reveals three distinct architectural tiers, each representing different trade-offs between capability and control while anticipating emerging regulatory frameworks like the EU AI Act and others to come.


Enterprise Agentic AI Architecture Three Tier Framework

These tiers form a systematic maturity progression, so organizations can build competency and stakeholder trust incrementally before advancing to more sophisticated implementations.

Foundation Tier: Establishing Controlled Intelligence

The Foundation Tier creates the essential infrastructure for enterprise agentic AI deployment. These patterns deliver intelligent automation while maintaining strict operational controls, establishing the governance framework required for production systems where auditability, security, and ethical compliance are non-negotiable.


Tier 1: Establishing Controlled Intelligence

Tool Orchestration with Enterprise Security forms the cornerstone of this approach. Rather than granting broad system access, this pattern creates secure gateways between AI systems and enterprise applications and infrastructure. Implementation includes role-based permissions, adversarial input detection, supply chain validation, and behavioral monitoring.

API gateways equipped with authentication frameworks and threat detection capabilities control all AI models and tool interactions, while circuit breakers automatically prevent cascade failures and maintain system availability through graceful degradation.

The monitoring infrastructure at this level proves critical for enterprise adoption. Organizations must track API costs, token usage, and security events from the outset. Many enterprises discover post-deployment that inadequate cost tracking led to budget overruns or that insufficient security monitoring exposed them to novel attack vectors.

Reasoning Transparency with Continuous Evaluation addresses the accountability requirements that distinguish enterprise AI from experimental deployments. This pattern structures AI decision-making into auditable processes with integrated bias detection, hallucination monitoring, and confidence scoring.

Automated quality assessment continuously tracks reasoning consistency while capturing decision rationale, alternative approaches, and demographic impact indicators. This capability proves essential for regulatory compliance and model risk management.

In enterprise environments, explainability consistently outweighs raw performance in determining deployment success. Systems that clearly demonstrate their reasoning processes earn broader organizational adoption than more accurate but opaque alternatives.

Data Lifecycle Governance with Ethical Safeguards completes the foundational framework by implementing systematic information protection. This pattern manages data through classification schemes, encryption protocols, purpose limitation, and automated consent management.

Public information remains accessible while personally identifiable information (PII) and PHI receive differential privacy protection. Highly sensitive data undergoes pseudonymization techniques that facilitate compliance verification without exposing underlying information.

Automated retention enforcement is critical to long-term success. Manual processes for right-to-deletion and data lifecycle management cannot scale with enterprise AI deployments. Systems must think about data relationships without retaining sensitive information in active memory, ensuring both functional capability and regulatory compliance.

Together, these Foundation Tier patterns lay the governance infrastructure with embedded security monitoring, continuous quality assessment, and ethical safeguards. This is essential to enable all the subsequent AI capabilities that we cover next.

Workflow Tier: Implementing Structured Autonomy

Once the Foundation Tier has established trust and demonstrated value, organizations can advance to Workflow Tier implementations where meaningful business transformation begins. In this tier, orchestration patterns manage multiple AI interactions across flexible execution paths, while preserving the determinism and oversight needed for complex business operations.


Tier 2: Implementing Structured Autonomy

Here, Constrained Autonomy Zones with Change Management bridges foundational controls with business process automation. This approach defines secure operational boundaries where AI systems can operate independently while leveraging the cost controls, performance monitoring, and governance frameworks established in the Foundation Tier.

The Workflow Tier incorporates mandatory checkpoints for validation, compliance verification, and human oversight, with automated escalation procedures that account for organizational change resistance patterns. Between these checkpoints, AI systems optimize their approaches, retry failed operations, and adapt to changing conditions within predefined constraints for cost, ethics, and performance.

The key insight is to expand autonomy gradually, based on measured outcomes and demonstrated user confidence, while tracking adoption rates alongside technical performance metrics.

Workflow Orchestration with Comprehensive Monitoring represents the operational core of this tier, decomposing complex business processes into coordinated components with real-time quality assessment. This orchestration enables independent optimization of individual steps while ensuring proper sequencing, error handling, and bias detection throughout the complete workflow.

Five essential orchestration patterns emerge within this Workflow Tier (two of them are sketched in code after the list):

Prompt Chaining extends the reasoning transparency from Foundation Tier across multi-step task sequences. Complex work decomposes into predictable steps with validation gates, accuracy verification, and bias assessments between each component. Continuous monitoring tracks output quality and reasoning consistency across the complete execution chain, ensuring reliability and maintaining auditability.
Routing leverages established security and governance frameworks to classify inputs using confidence thresholds and fairness criteria. Tasks route to specialized agents while monitoring systems track demographic disparities and ensure optimal cost-capability matching with equitable treatment across user populations. This pattern enables organizations to balance expensive, capable models with efficient, targeted solutions.
Parallelization utilizes the robust monitoring infrastructure to process independent subtasks simultaneously with sophisticated result aggregation, conflict resolution, and consensus validation. Bias detection prevents systematic discrimination while load balancing ensures efficient resource utilization.
Evaluator-Optimizer extends continuous evaluation capabilities into iterative refinement processes. Self-correction loops operate with convergence detection, cost controls, and quality improvement tracking while preventing infinite iterations and ensuring productive outcomes that justify computational investment.
Orchestrator-Workers employs the comprehensive monitoring framework for dynamic planning with load balancing, failure handling, and adaptive replanning based on intermediate results. This pattern provides efficient resource utilization while maintaining visibility into distributed decision-making processes.
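To ground two of these patterns, here is a minimal, illustrative Python sketch of Prompt Chaining with a validation gate and cost-aware Routing. The call_llm(model, prompt) function and the model names are placeholders rather than a real vendor API; wire them to your own LLM client.


def call_llm(model: str, prompt: str) -> str:
    # Placeholder: connect this to your model provider of choice.
    raise NotImplementedError

def prompt_chain(ticket_text: str) -> str:
    # Prompt Chaining: decompose the task into steps with a validation gate between them.
    summary = call_llm("small-model", f"Summarize this support ticket:\n{ticket_text}")
    if len(summary.split()) < 5:  # gate: escalate if the intermediate output looks unusable
        raise ValueError("Summary failed validation; escalate to human review.")
    return call_llm("small-model", f"Draft a polite reply based on this summary:\n{summary}")

def route(ticket_text: str) -> str:
    # Routing: classify first, then send the work to the cheapest model that can handle it.
    label = call_llm("small-model", f"Classify this ticket as SIMPLE or COMPLEX:\n{ticket_text}")
    model = "large-model" if "COMPLEX" in label.upper() else "small-model"
    return call_llm(model, f"Resolve this support ticket:\n{ticket_text}")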
This orchestrated approach transforms solid foundational infrastructure into dynamic business capability, enabling AI systems to handle complex processes while operating within governance boundaries that maintain enterprise confidence. And, this naturally brings us to the final tier.

Autonomous Tier: Enabling Dynamic Intelligence

The progression from structured workflows leads naturally to the Autonomous Tier (i.e., advanced implementations that allow agentic AI systems to determine their own execution strategies based on high-level objectives). This autonomy becomes feasible only through the sophisticated monitoring, safety constraints, and ethical boundaries established in previous tiers.


Tier 3: Enabling Dynamic Intelligence

Goal-Directed Planning with Ethical Boundaries represents the culmination of Foundation Tier ethical safeguards and Workflow Tier orchestration capabilities. Systems receive strategic objectives and operate within ethical constraints, safety boundaries, cost budgets, and performance targets established through lower-tier implementations.

Planning processes incorporate uncertainty quantification, alternative strategy development, and comprehensive stakeholder impact assessment while continuous monitoring ensures autonomous decisions align with organizational values and regulatory requirements.

Adaptive Learning with Bias Prevention extends the continuous evaluation frameworks from previous tiers into self-improvement capabilities. Systems refine their approaches based on environmental feedback including tool execution results, user satisfaction metrics, and fairness indicators across demographic groups.

Learning mechanisms incorporate active bias correction to enhance performance without amplifying existing inequalities or creating new forms of discrimination.

Multi-Agent Collaboration with Conflict Resolution coordinates specialized agents through the structured communication protocols established in Workflow Tier implementations, enhanced with sophisticated conflict resolution, consensus mechanisms, and ethical arbitration. Agents manage planning, execution, testing, and analysis while maintaining shared context and synchronized ethical standards that prevent echo chambers or biased consensus formation.

In short, the Autonomous Tier requires the sophisticated monitoring, cost controls, and governance frameworks that the Foundation and Workflow tiers provide. These systems operate most effectively in controlled environments with strict resource limits, comprehensive safety monitoring, and explicit regulatory approval, demanding robust exception handling and clear escalation procedures that only mature foundational infrastructure can support.

Industry-Specific Implementation Approaches

Our three-tier progression manifests differently across industries, reflecting unique regulatory environments, risk tolerances, customer expectations and operational requirements. Understanding these industry-specific approaches enables organizations to tailor their implementation strategies while maintaining systematic capability development. Let’s look at some industry examples:

Financial services represents perhaps the most challenging environment for agentic AI deployment. Financial institutions leverage AI capabilities for fraud detection, risk assessment, and customer service while operating under increasingly strict regulatory oversight focused on algorithmic fairness and discriminatory impact prevention.

This creates a natural emphasis on Foundation Tier implementations with comprehensive Tool Orchestration providing strict governance, threat detection, and bias monitoring for all financial system interactions. Reasoning transparency becomes critical for defensible decision-making with demographic impact tracking, while Data Lifecycle Governance incorporates aggressive tokenization, consent management, and fairness verification protocols.

For example, Workflow Tier advancement for loan underwriting and algorithmic trading requires mandatory human checkpoints, comprehensive bias testing, and equitable outcome monitoring. Autonomous patterns remain largely experimental due to regulatory constraints that demand the transparency and control only mature Foundation implementations provide.

Agentic AI deployment in healthcare carries the highest stakes: patient safety and health equity concerns make the systematic tier progression essential. Healthcare organizations must ensure AI systems augment clinical judgment while maintaining strict compliance with privacy regulations and ethical standards.

Here, Foundation Tier implementation prioritizes Data Lifecycle Governance for PHI with FHIR compliance and comprehensive consent management, Tool Orchestration with stringent access controls for electronic health records (EHRs) and medical devices, and Reasoning Transparency for AI-assisted diagnosis with clinical evidence tracking and fairness validation.

Workflow Tier progression then focuses on administrative automation and clinical workflow support with mandatory human oversight, health equity assessments, and patient safety checkpoints that leverage Foundation monitoring capabilities. The Autonomous Tier remains highly restricted, requiring the mature governance frameworks that comprehensive Foundation and Workflow implementations provide.

Retail organizations demonstrate how tier progression enables personalization at scale while ensuring customer fairness across diverse populations. Retailers must balance intimate personalization with global optimization while preventing discriminatory practices that could harm brand reputation or violate emerging regulations.

Implementation leverages Foundation Tier for comprehensive PII protection and secure access with bias detection throughout customer-facing systems. Workflow Tier provides sophisticated customer service routing with fairness validation, order processing with integrated fraud detection, personalized content generation with demographic equity verification, and inventory management with demand forecasting capabilities.

Autonomous pattern exploration in dynamic pricing and supply chain optimization becomes viable within controlled contexts because Foundation-level fairness monitoring ensures equitable treatment across customer segments and geographic regions.

Manufacturing showcases how systematic tier progression manages the intersection of AI capabilities with physical safety requirements and workforce impact considerations. Manufacturing organizations must maintain absolute precision and safety while managing workforce transitions as AI augments human capabilities.

The Foundation Tier focuses on operational technology/information technology (OT/IT) security integration with comprehensive threat detection, workforce impact and safety monitoring, and Tool Orchestration for machinery and sensor integration with safety protocols, creating the safety framework required for advanced automation.

Workflow Tier enables production sequence automation with quality validation, computer vision quality control with bias and anomaly detection systems, and predictive maintenance coordination with workforce planning considerations. Autonomous patterns supporting predictive maintenance and dynamic scheduling become feasible within strict safety boundaries because Foundation monitoring capabilities ensure comprehensive workforce impact assessments and ethical automation guidelines that consider broader community effects.

Implementation Strategy and Guiding Principles

Successful deployment of this three-tier progression depends on combining technical excellence with ethical responsibility and strong change management. These four implementation phases help organizations move safely through each capability tier while keeping security, governance, and trust at the center.

Establish Foundation Tier Patterns

Implement Tool Orchestration with Enterprise Security, Reasoning Transparency with Continuous Evaluation, and Data Lifecycle Governance with Ethical Safeguards. Include threat modeling, bias testing, and human-AI collaboration rules to earn trust early.

Demonstrate Foundation Tier Value

Execute controlled pilots using foundation infrastructure to prove security compliance, cost visibility, and trust building. Begin in non-critical areas, train teams, and measure adoption alongside technical performance before scaling.

Expand Workflow Tier Patterns

Deploy Constrained Autonomy Zones and the five core orchestration patterns we discussed in this article earlier (Prompt Chaining, Routing, Parallelization, Evaluator-Optimizer, Orchestrator-Workers) for business integration. Advance only when foundation value is proven, maintaining comprehensive monitoring.

Explore Autonomous Tier Capabilities

Test Goal-Directed Planning with Ethical Boundaries, Adaptive Learning with Bias Prevention, and Multi-Agent Collaboration in controlled environments with regulatory approval. Require comprehensive safety monitoring while planning for emerging regulations like the EU AI Act.


Agentic AI Implementation Roadmap

Act Now, Build Sustainably

The enterprise agentic AI landscape is at an inflection point. Early implementations reveal a clear pattern: organizations that prioritize governance foundations consistently outperform those chasing autonomous capabilities first. Our three-tier progression isn't theoretical; it reflects the successful deployment patterns emerging across industries.

We strongly recommend progressing deliberately through each tier. Prove security compliance and stakeholder trust before expanding scope. The companies building systematic capabilities now will dominate the next phase of enterprise AI, while those rushing to autonomy face increasing regulatory scrutiny and operational risk.

The competitive advantage of Agentic AI belongs to organizations that master governance-enabled autonomy, not ungoverned automation.

Source: https://www.infoq.com/articles/agentic-ai-architecture-framework/
59
Agentic AI / Agentic AI Is Quietly Replacing Developers
« Last post by Imrul Hasan Tusher on July 13, 2025, 04:06:39 PM »
Agentic AI Is Quietly Replacing Developers

Discover how agentic AI is transforming software development from manual coding to intelligent automation.


Photo by Arnold Francisca on Unsplash

For decades, elegant code has been the hallmark of great software development — code that is streamlined, efficient, well-documented, reusable and expertly crafted. Early in my career, I was trained to prioritize writing tools and program generators over directly writing programs. This approach led to my involvement in numerous innovative initiatives that automated software modernization and improved productivity. By developing tools that generated transpilers, we were able to modernize millions of lines of code efficiently.

Today, generative AI models can create code from human language, making development more intuitive. It’s not surprising that concepts like vibe coding, which emphasizes intuition, aesthetics and the experiential impact of code, have gained traction. Yet despite the expansion of low-code and no-code systems, most software is still crafted by engineers.

How Agentic AI Is Revolutionizing Software Development
With the rapid evolution of AI, from traditional AI to generative AI and now agentic AI, the focus is shifting from merely coding logic to automating the entire software development process. Automated testing, continuous documentation, automated pull requests and CI/CD systems, and AI-driven performance and security issue tracking and patching are showing great promise.

Building on these advancements, modern IT systems, including on-premises, hybrid cloud and hyperscaler environments, offer opportunities beyond simple reporting and analysis. These systems enable advanced problem detection, localization, correction, optimization, resilience and prevention, all facilitated by modern AI technologies.

For example, our clients use IBM’s suite of automation products to leverage both traditional and generative AI to transform raw data into intelligent systems. This powerful combination enhances IT operations by enabling sophisticated, data-driven solutions across the entire technology stack.

Agentic AI takes this a step further by executing tasks independently. Instead of writing code, it can handle the elements of building and running software automatically. This allows engineers to design AI teams where individual AI agents specialize in different tasks but collaborate to validate, refine and improve solutions.

These agents, powered by large language models and other AI systems, can communicate, share data and iteratively enhance their outputs. This continuous cycle of problem-solving will support correctness, scalability and efficiency.
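
A minimal Python sketch of that generate-review-refine loop might look like the following; call_model again stands in for any model backend, and the stopping rule is deliberately naive.

def call_model(prompt):
    # Placeholder: plug in your own model client here.
    raise NotImplementedError("connect an LLM backend")

def collaborate(task, max_rounds=3):
    # One agent proposes a solution; a second critiques it; the first revises.
    solution = call_model(f"Propose a solution for: {task}")
    for _ in range(max_rounds):
        review = call_model(f"Critique this solution for correctness and efficiency:\n{solution}")
        if "no issues" in review.lower():
            break
        solution = call_model(f"Revise the solution to address this critique:\n{review}\n\n{solution}")
    return solution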

Ultimately, the true engineering excellence of the future will lie in elegant automation, driven by the capabilities of advanced AI systems.

The Rise of Automation-First Development
This paradigm shift brings new opportunities and challenges for IT leaders. They will need to address key questions, such as: How can they encourage their team to use AI to achieve end-to-end automation? How can they ensure that the right controls are in place to enhance efficiencies and unlock innovation, rather than creating new problems? What tools and experiences can facilitate AI integration earlier in the software development life cycle?

3 Steps to Pivot Your Team from Code to Automation
1. Focus on Outcomes, Not Lines of Code
To achieve excellence in automation, shift your focus from the amount of code written to the number of processes automated and time saved. Implement metrics that measure these outcomes to track progress effectively. Create a culture that relies on data and experimentation to drive decision-making. Encourage engineers to think about how automation, especially with AI, can improve the speed and scalability of their work, rather than just focusing on the technical details. Promote a culture where automation is seen as a way to enhance and support the team, rather than replace it.
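
As a toy illustration of outcome-focused metrics, the snippet below counts hours saved per automated run rather than lines of code written; the numbers in the comment are invented for the example.

def hours_saved(runs, manual_minutes, automated_minutes):
    # Time recovered by automation, expressed in hours.
    return runs * (manual_minutes - automated_minutes) / 60

# Example: 5,000 automated runs that each replace 20 minutes of manual work
# with a 2-minute job recover 5000 * 18 / 60 = 1,500 hours.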

Last year, a small team at IBM consisting of experienced and newly hired product developers took over code repositories that contained approximately 750 JavaScript files without any documentation. The team’s goal was to figure out what the code was for and exactly what it did.

To start, the team members created a proof of concept that used watsonx Code Assistant to document about 1,000 lines of code across nine different files. This allowed them to understand and document the content of each file in seconds, resulting in a time savings of over 90% for this specific task.

The time saved on this specific task has the potential to scale to other use cases across the enterprise, culminating in thousands of hours saved.

2. Build with Automation-First Principles
Integrate automation into the core development workflow from the start. Encourage teams to automate deployment, testing, and monitoring as a regular part of their development process. Utilize AI-powered development assistants to speed up the creation of high-quality software systems. Prioritize low-code and no-code solutions for repetitive tasks, freeing up engineers to focus on high-value work and boosting overall efficiency and innovation.
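
One lightweight way to practice this, sketched below in Python with placeholder commands, is a single entry point that runs tests, builds an artifact, and then performs a post-deploy health check so nothing ships without the automated gates passing. This is an assumption-laden sketch, not a prescribed toolchain.

import subprocess
import sys
import urllib.request

# Placeholder commands; swap in whatever the project actually uses.
STEPS = [
    ["pytest", "-q"],                           # automated tests
    ["docker", "build", "-t", "app:dev", "."],  # automated build
]

def run_pipeline(health_url="http://localhost:8080/health"):
    for cmd in STEPS:
        if subprocess.run(cmd).returncode != 0:
            print("step failed:", " ".join(cmd))
            return 1
    try:
        # Assumes the deployed service is reachable at `health_url`.
        urllib.request.urlopen(health_url, timeout=5)
    except Exception as exc:
        print("health check failed:", exc)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(run_pipeline())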

For example, Ensono, a global managed service provider, leveraged IBM watsonx Code Assistant for Red Hat Ansible Lightspeed to consolidate the company’s automation platform, perform direct database migrations, and establish a single source of truth for its Ansible content.

This streamlined approach allowed Ensono to complete 28 million tasks and save developers more than an estimated 100,000 hours in just one year, significantly enhancing client incident response, application performance and overall efficiency.

3. Develop AI-Ready Skills and Mindsets
Transform developers into automation engineers by focusing on how AI and automation tools can augment their expertise. Invest in AI and automation training to help teams build expertise in AI-driven development, observability, and self-healing systems. Encourage cross-functional collaboration between IT, operations and business teams to ensure automation aligns with strategic goals. Establish a feedback loop to encourage continuous iteration and improvement of automation strategies.

From Developer to Automation Engineer
The next era of IT isn’t about coding harder; it’s about automating smarter. The best IT leaders recognize that their teams’ greatest value is in building intelligent, adaptive systems that scale effortlessly and respond proactively to change.

By shifting focus to elegant automation, developers are empowered to solve bigger problems, create greater impact, and future-proof organizations.

The transition from elegant code to elegant automation represents a significant shift in the software development landscape. It requires a new mindset, a new set of skills, and a willingness to embrace AI as a collaborative tool rather than a replacement for human expertise.

As we move forward, successful IT teams will be those that effectively harness the power of AI to create elegant, efficient, and scalable automated solutions.

Source: https://thenewstack.io/agentic-ai-is-quietly-replacing-developers/
60
Startup / Banks allowed to make equity investment in startups
« Last post by Imrul Hasan Tusher on July 13, 2025, 10:12:36 AM »
Banks allowed to make equity investment in startups


The Bangladesh Bank (BB) has allowed banks to make equity investments in the startup sector in addition to providing loans at 4 percent interest.

The regulator issued a circular in this regard yesterday, stating that it will establish a venture capital company to facilitate investment. The company will be financed by one percent of the annual net profit of all banks.

The circular details that banks will have to make the equity investment solely from their self-established startup fund.

Additionally, banks are required to distribute loans by availing refinancing from Bangladesh Bank's Tk 500 crore refinance fund.

No new loans or investments may be disbursed to startups outside this fund, it states, adding that disbursements under previously approved loans or investments may continue.

Furthermore, the loan ceiling for entrepreneurs has been set in phases, ranging from Tk 2 crore to Tk 8 crore, up from the previous limit of Tk 1 crore.

Entrepreneurs must be at least 21 years old, with no upper age limit, to be eligible for the financing, according to the BB circular. Existing businesses will also be eligible for the financing, provided their registration was completed within the last 12 years.

Startup companies play a supportive role in driving growth, generating employment, and fostering innovation in the country's economy, as per the BB circular.

These ventures are contributing to the creation of innovative business infrastructure, establishing connections with global investment opportunities, and opening new avenues for employment, which aligns with one of the core goals of the Sustainable Development Goals (SDGs), it further states.

To ensure more dynamic financing of high-potential startups by banks and financial institutions, several timely amendments and revisions have been made to the existing startup financing policy.

Source: https://www.thedailystar.net/business/news/banks-allowed-make-equity-investment-startups-3936266