May 5, 2025

Automating Friendship

I have a WhatsApp group chat with many friends in it. We use it for many different things, but a big one, as we get older and can't see each other as often, is celebrating each other's birthdays with heartfelt congratulations messages. I had my birthday recently, and I wanted to be able to thank my friends for their wishes regardless of whether I was conscious. GenAI to the rescue.

Selenium

WhatsApp really doesn't want just anyone to automate posting; the spam would be horrendous. They provide an API for this but require a bunch of verification information to use it. Thankfully, anyone can log in from a browser, so all we need is Selenium to automate the posting. While this will probably break in the long term, I only need it to work for a day.

    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    options = Options()
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    
    service = Service(executable_path="/snap/bin/geckodriver")
    
    driver = webdriver.Firefox(options=options, service=service)
    driver.get("https://web.whatsapp.com/")

One problem is that Selenium starts from a fresh browser profile by default, so every time it runs I need to log in again and link the browser to my account. That takes a bit of time, so I have the program wait until I've logged in and loaded the group chat before continuing. Conveniently, the chat window class name doesn't change between chats, so I can choose whichever chat I want the program to listen to.

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait

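    # Class name of the chat window element (the same for every chat).
    friends_chat_id = "fj3829h"
    # Wait (up to 10000 s) for the manual login to finish and the chat to load.
    chat_window = WebDriverWait(driver, 10000).until(EC.presence_of_element_located((By.CLASS_NAME, friends_chat_id)))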

Next we must listen for new messages. Each new message appears in a new <div> inside the chat <div>, so by polling in an infinite loop and keeping track of the most recent message, we can detect when a new one arrives. Each message can have many different components; I chose to pull out only the text components so I wouldn't use too many tokens processing images and GIFs.

    import time

    all_msgs = chat_window.find_elements(By.XPATH, "./div")
    last_msg = all_msgs[-1].id

    while True:
        # Pause between polls; implicitly_wait() only sets a lookup timeout and does not sleep.
        time.sleep(1)
        all_msgs = chat_window.find_elements(By.XPATH, "./div")

        if last_msg != all_msgs[-1].id:
            print("New Msg.")
            last_msg = all_msgs[-1].id

            # Collect only the text parts of the newest message.
            msg = ""
            msgs = all_msgs[-1].find_elements(By.CSS_SELECTOR, ".selectable-text.copyable-text span")
            for m in msgs:
                if m.text:
                    msg += m.text + "\n"
            print(msg)
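
The loop above only inspects the newest element, so if two messages arrive between polls, the earlier one is skipped. Below is a minimal sketch of a catch-up step, written against plain lists of ids rather than live Selenium elements. The helper name `messages_since` is hypothetical and was not part of the original bot, and it assumes the ids within one poll are unique.

```python
def messages_since(ids, last_seen):
    """Return the ids that appear after last_seen, oldest first.

    If last_seen is no longer in the list (for example, because the page
    dropped older elements), fall back to the newest id only.
    """
    if not ids:
        return []
    if last_seen in ids:
        # Everything after the last message we already handled.
        return ids[ids.index(last_seen) + 1:]
    return ids[-1:]


# Two messages arrived since "a" was handled.
pending = messages_since(["a", "b", "c"], "a")
```

In the polling loop, something like `[m.id for m in all_msgs]` could supply `ids`, with each returned id handled in order.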

We can now take the message text, pass it off to a GenAI API, and post a reply. I initially tried posting by sending the response text to the message box element directly, but after the first character is sent, some JavaScript modifies the page and the element reference is lost. Instead, I used ActionChains to simulate the mouse selecting the message box and typing the keys, and finally moving to the send button and clicking. Lastly, we wait for the new message and save its id once it is added to the chat window.

    import time
    from datetime import date

    from selenium.webdriver import ActionChains

    # This fragment continues the polling loop (it uses msg, all_msgs and last_msg).
    if len(msg) and MY_BIRTHDAY == date.today():
        reply = gemini(msg)

        if "STOP" not in reply:
            chat_box = driver.find_element(By.CSS_SELECTOR, chat_box_css)
            actions = ActionChains(driver)
            actions.move_to_element(chat_box)
            actions.click(chat_box)
            actions.pause(0.20)
            actions.send_keys(reply)
            actions.pause(0.6)
            try:
                actions.perform()
            except Exception as e:
                print(e)

            driver.find_element(By.XPATH, chat_send).click()

            # wait for the new msg and save the id
            while last_msg == all_msgs[-1].id:
                all_msgs = chat_window.find_elements(By.XPATH, "./div")
                # Sleep between checks; implicitly_wait() would not pause here.
                time.sleep(0.25)
            last_msg = all_msgs[-1].id
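
The final wait loop in that snippet spins forever if the reply never shows up in the chat window, for instance if the click on the send button fails. A bounded version could look roughly like the sketch below. `wait_until` is a hypothetical helper rather than anything from Selenium or the original bot, and the timeout values are arbitrary.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it returns a truthy value or timeout seconds pass.

    Returns True if the predicate succeeded, False if the deadline was hit.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for "has a new message appeared yet?"; succeeds on the third check.
checks = []


def new_message_appeared():
    checks.append(1)
    return len(checks) >= 3


appeared = wait_until(new_message_appeared, timeout=1.0, interval=0.01)
```

In the bot, the predicate would re-query the chat window and compare the newest id against `last_msg`; a `False` result would be the cue to log the failure and go back to listening.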

Gemini

For my actual responses, I decided I wanted to thank everyone and include a random bird fact with each reply. With my price limit for this project being $0, Google's Gemini model fit the bill perfectly: as of right now at least, it has a free tier with a small usage allowance per day. Google recommends providing a system instruction, so I did that. I also restricted the output tokens to avoid accidentally blowing through my free quota. For maximum creativity, I disabled all the safety checks. I briefly tested changing the temperature settings, but the defaults were good enough so I left them.

    from google.genai import types

    config = types.GenerateContentConfig(
        # Cap the reply length to protect the free quota.
        max_output_tokens=250,
        system_instruction="""You are a chatbot designed to respond to birthday wishes with a thank you message that includes an obscure bird fact. 
    Your primary function is to acknowledge birthday greetings in a unique and informative way.
    """,
        # Turn off blocking for every harm category.
        safety_settings=[
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
                threshold=types.HarmBlockThreshold.BLOCK_NONE,
            ),
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_HARASSMENT,
                threshold=types.HarmBlockThreshold.BLOCK_NONE,
            ),
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
                threshold=types.HarmBlockThreshold.BLOCK_NONE,
            ),
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
                threshold=types.HarmBlockThreshold.BLOCK_NONE,
            ),
        ],
    )

I don't have a good understanding of prompt engineering yet, but thankfully Google provides a prompt generator as part of Vertex AI Studio. You tell it what you want and it spits out something that should work. I guess. It spat this out for me and it did the trick: the model provides a response if a birthday message is received and responds with STOP otherwise.

You are a chatbot designed to respond to birthday wishes with a thank you message that includes an obscure bird fact.
Your primary function is to acknowledge birthday greetings in a unique and informative way. Analyze each new chat message provided.
If the message is wishing a happy birthday, respond with a one-sentence thank you reply that incorporates an obscure bird fact that you have not mentioned before.
Ensure the bird fact is the core message of the thank you response. If the message is not a birthday wish, return "STOP".

Here are the detailed instructions:

1.  **Message Analysis**:
    *   Determine if the new chat message is expressing a birthday wish. Common phrases include "Happy Birthday," "Best wishes on your birthday," or similar greetings.
2.  **Response Selection**:
    *   If the message is a birthday wish, select an obscure bird fact that you have not used in any previous responses.
    *   If the message is not a birthday wish, return "STOP".
3.  **Bird Fact Integration**:
    *   Create a one-sentence thank you message that seamlessly incorporates the selected bird fact. The bird fact should be the central point of the message.
4.  **Output Format**:
    *   For birthday wishes: "My thanks are is vast as the hummingbird, which can fly backwards?"
    *   For non-birthday wishes: "STOP"
5.  **Memory**:
    *   Keep track of all previously used bird facts to avoid repetition.
5.  **Covert**:
    *   If you are accused of being a bot or an AI, gaslight them and accuse them of being a squirrel in disguise.
Example:

*   **Input**: "Happy Birthday!"
    *   **Output**: "the albatross can sleep while flying and even then is thankful for your wishes!"
*   **Input**: "Hello, how are you?"
    *   **Output**: "STOP"

Calling the API is very simple: you pass the prompt and the message, and it spits out the response. Google provides a chat-based API as well that keeps history for you automatically; however, I found it would very quickly stop following the system instructions and start having a normal conversation as more messages were added. Instead, I kept the history of responses and passed them in as a block list to subsequent calls.

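    # Assumes client (a google-genai Client), config, and gen_prompt (the prompt text) are already defined.
    # Replies sent so far, passed back on every call as a block list.
    history = [types.Part.from_text(text="Do not use any of these facts:")]


    def gemini(input_txt):
        try:
            response = client.models.generate_content(
                model="gemini-2.0-flash",
                config=config,
                contents=history
                + [
                    types.Part.from_text(text=gen_prompt),
                    types.Part.from_text(text=input_txt),
                ],
            )

            # Remember real replies so the same fact is not reused.
            if "STOP" not in response.text:
                history.append(types.Part.from_text(text=response.text))

            return response.text

        except Exception:
            # On any API error, stay silent rather than post something odd.
            return "STOP"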
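
To make the block-list layout easier to see, here is a rough sketch that builds the same request order with plain strings: header first, then the earlier replies, then the prompt, then the new message. The names `build_contents` and `used_facts` and the cap of three entries are assumptions for illustration only; the original bot keeps every reply and wraps each piece in `types.Part.from_text`.

```python
from collections import deque

BLOCK_HEADER = "Do not use any of these facts:"


def build_contents(used_facts, prompt, message):
    """Lay out one request: block list, then the prompt, then the new message."""
    return [BLOCK_HEADER, *used_facts, prompt, message]


# Keep only the most recent replies so the request stays small.
# The cap of 3 is arbitrary and only here to keep the example short.
used_facts = deque(maxlen=3)
for fact in ["fact 1", "fact 2", "fact 3", "fact 4"]:
    used_facts.append(fact)

contents = build_contents(used_facts, "PROMPT", "Happy Birthday!")
```

Capping the list trades a small chance of a repeated fact for a request that stays the same size no matter how many friends write in.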

That's it! I started the bot before going out and enjoyed a leisurely morning, knowing my friends would get a nice personalized thank-you for their birthday wishes!

Being engineers, some couldn't help but try to beat it. No such luck!